Right, but how often is that actually done? Unless they're a content creator or archiving their media collection, there aren't a whole lot of use cases where a consumer would necessarily benefit from having more hardware threads.
Depends on what they're doing.
Are they editing video and encoding at the same time?
Are they rendering and doing other tasks in the background?
What is their system doing, and how many different things are going on at the same time?
The current workload can vary drastically.
Or one can look at Task Manager and see if their CPU utilization is close to 100% on something like a 12+ thread CPU. If it is, then sure, they could probably stand to have more cores or threads. But even as I'm typing this with a YouTube video playing, Discord, and various other apps running, I'm sitting at under 10% CPU utilization.
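For what it's worth, if you'd rather sample that than eyeball Task Manager, here's a rough sketch of the idea in Python (assuming the third-party psutil package is installed; the 50% threshold and the sampling window are arbitrary choices of mine):

```python
# Quick-and-dirty utilization sampler, assuming the psutil package is installed.
import psutil

def sample_cpu(samples=10, interval=1.0):
    """Print overall and per-core CPU utilization every `interval` seconds."""
    physical = psutil.cpu_count(logical=False)
    logical = psutil.cpu_count(logical=True)
    print(f"{physical} physical cores, {logical} hardware threads")
    for _ in range(samples):
        per_core = psutil.cpu_percent(interval=interval, percpu=True)  # blocks for `interval`
        total = sum(per_core) / len(per_core)
        busy = sum(1 for p in per_core if p > 50)
        print(f"total: {total:5.1f}%   cores over 50% busy: {busy}/{len(per_core)}")

if __name__ == "__main__":
    sample_cpu()
```

If the "cores over 50% busy" count rarely gets anywhere near your thread count, more hardware threads probably aren't what's holding you back.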
You might also be running loads that aren't very demanding of CPU resources. Even if you run 100 different apps, if all of them are lightly threaded and aren't demanding workloads, then it really doesn't matter.
You're not taxing the CPU with workloads that are demanding enough.
The core issue I see with your argument is that you treat this as a free lunch. A flexible design would require something in the system to keep tabs on thread residency in order to predict how many threads to schedule on a core. I feel this is almost getting into Halting Problem territory.
It can be determined by the CPU/OS/End User.
Which one you choose is up to you. I'd prefer to manually benchmark each program and set a limit on how many threads per core it should generate before moving on to another core.
Given that we're coming into the era of many-core consumer CPUs, I don't see that as an issue.
Assuming you know what you're doing, you can schedule or limit how many threads per core run simultaneously, then scale up in core count as appropriate for your workload, along the lines of the sketch below.
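Just as a sketch of what the "end user sets the limiter" option could look like: the process name and core list below are placeholders, it relies on the third-party psutil package, and Process.cpu_affinity() is only supported on Linux and Windows, not macOS.

```python
# Pin a running program to a fixed set of logical CPUs, one blunt way for an
# end user to cap how far an app can spread across the machine.
# Assumes the psutil package; Process.cpu_affinity() works on Linux/Windows only.
import psutil

def pin_process(name_fragment, cores):
    """Set CPU affinity for every process whose name contains `name_fragment`."""
    for proc in psutil.process_iter(["pid", "name"]):
        name = proc.info["name"] or ""
        if name_fragment.lower() in name.lower():
            try:
                proc.cpu_affinity(cores)  # restrict this process to the listed logical CPUs
                print(f"pinned {name} (pid {proc.info['pid']}) to cores {cores}")
            except (psutil.AccessDenied, psutil.NoSuchProcess):
                print(f"could not pin pid {proc.info['pid']} (insufficient permissions?)")

if __name__ == "__main__":
    # Hypothetical example: confine an encode job to the first four logical CPUs
    # so everything else on the system stays responsive.
    pin_process("ffmpeg", [0, 1, 2, 3])
```

The OS scheduler still decides what runs within those cores; this just fences the program in so the rest of the system isn't starved.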
A major hurdle with this is that code running on the CPU tends to be unpredictable. Even if the software knew exactly what it was going to do next, on a typical consumer system the human also plays a role. Every time I press a key, I fire off an interrupt, which ruins the determinism of how to schedule things. And if I have things coming in over the network (and things are always coming in over the network), those have to be serviced as well.
That's why we have the many-core paradigm in this day and age.
You can be doing whatever you're doing, network traffic can be handled in the background for whatever apps you have running, and you can still have certain workloads hammering multiple cores.
All auto-managed for you. You can even put limiters on each program.
The only reason GPUs can get away with a deep SMT design is because the workloads GPUs handle are highly predictable and deterministic. And they have to be, otherwise the GPU wouldn't work as well as it does.
Their workloads also don't branch very much, or at all. The fact that they're largely linear and computation-heavy with little to no branching makes them perfect for GPUs.
If your code branches like crazy and jumps around, GPUs might not be the right solution for you. Everything depends on what you're trying to do. That's the kicker.
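If you want a feel for what "little to no branching" looks like, here's a rough CPU-side illustration in Python with NumPy (not actual GPU code, just the shape of it): both functions compute the same thing, but the second expresses it as uniform math over a whole array with no data-dependent control flow, which is the kind of formulation that maps well onto a GPU's wide SMT design.

```python
# Branchy vs. branch-free formulations of the same computation.
# Sketched with NumPy on the CPU purely to illustrate the code shape;
# real GPU kernels would be written in something like CUDA or a shader language.
import numpy as np

x = np.random.rand(100_000)

def branchy(values):
    """Element-at-a-time loop where control flow depends on each value."""
    out = []
    for v in values:
        if v > 0.5:
            out.append(v * 2.0)
        else:
            out.append(v * 0.5)
    return np.array(out)

def branch_free(values):
    """One uniform, data-parallel expression over the whole array."""
    return np.where(values > 0.5, values * 2.0, values * 0.5)

assert np.allclose(branchy(x), branch_free(x))
```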