I don't understand this comment at all, it's completely random.
It wouldn't seem that way if you'd actually done multithreaded programming. The fact that two threads are running on the same core doesn't change how you synchronize and communicate between them, or the things you have to do for load-balancing, etc. Plus, you don't control exactly when or where each of your threads gets to run, so you can't build in assumptions about exactly which thread of your program is running where or when.
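To make that concrete, here's a minimal sketch (C++17; sched_getcpu() is Linux/glibc-specific, and the worker count is just whatever the machine reports, so treat the details as assumptions, not a recipe). The locking is exactly the same whether two of these workers end up as HTT siblings on one physical core or on two separate cores, and it's the OS scheduler, not the program, that decides where they land:

```cpp
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>
#include <sched.h>   // sched_getcpu(): Linux/glibc specific

int main() {
    std::mutex m;
    long shared_total = 0;

    // hardware_concurrency() counts logical CPUs, i.e. it already includes HTT siblings
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 4;  // the standard allows 0 ("unknown"), so assume something sane

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([&m, &shared_total, i] {
            long local = 0;
            for (int j = 0; j < 1000000; ++j)   // private work: no locking needed here
                local += j;

            // Identical synchronization no matter which logical CPU this thread landed on
            std::lock_guard<std::mutex> lock(m);
            shared_total += local;
            std::printf("worker %u finished on logical CPU %d\n", i, sched_getcpu());
        });
    }
    for (auto& t : workers) t.join();
    std::printf("total = %ld from %u hardware threads\n", shared_total, n);
}
```

You can pin threads with affinity APIs if you really want to, but unless you do that explicitly, placement can change from run to run, so the program can't bake in assumptions about which sibling shares a core with which.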
If you are after latency, then HTT gives you that many extra threads to dispatch work to right away.
If you are after parallelization, then HTT gives you that many extra threads for higher throughput.
Communicating, synchronizing, & balancing work between them incurs overhead. This increases as you involve more threads or seek to achieve better utilization. In throughput-oriented systems, that overhead can more easily be amortized. In latency-oriented systems, that overhead is more difficult to hide.
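Purely as a back-of-the-envelope sketch, with made-up numbers (the 5 µs dispatch cost and the 8 hardware threads are assumptions, not measurements), here's why that overhead amortizes over big batches of work but eats short, latency-sensitive tasks alive:

```cpp
#include <cstdio>
#include <initializer_list>

int main() {
    const double dispatch_overhead_us = 5.0;  // assumed fixed cost to hand work out and sync the result
    const int hw_threads = 8;                 // e.g. 4 cores x 2 HTT siblings (assumption)

    for (double work_us : {10.0, 100.0, 10000.0}) {
        double serial   = work_us;
        double parallel = work_us / hw_threads + dispatch_overhead_us;
        std::printf("%8.0f us of work: ideal speedup %dx, with overhead %.1fx\n",
                    work_us, hw_threads, serial / parallel);
    }
}
```

The fixed cost is the same in every row; it just fades into noise once the work per hand-off is large enough, which is exactly the throughput case.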
See also:
Well, what is the extent of what you have seen?
I've never seen an atom myself, but I'm still pretty sure everything is made out of them.
The way you describe it, we should expect 100% scaling to be the norm for hyper-threading/SMT. Instead, we get results like what AnandTech found.
Also, I don't need to prove math: if a thread only uses half the units of a core, then another thread can use the other half.
There is nothing about that which needs any proving.
I didn't say math, I said real-world data. You're talking about a conceptual model of how something works, which involves lots of simplifications and assumptions. But what actually matters isn't how nice a model is; it's real-world performance.
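And just to spell out what that conceptual model actually predicts, with utilization numbers I'm making up purely for illustration: if a lone thread keeps a fraction u of the core's issue slots busy, the naive two-thread SMT prediction is

$$\text{combined throughput} = \min(2u,\,1), \qquad \text{scaling} = \frac{\min(2u,\,1)}{u}$$

At u = 0.5 that comes out to exactly 2.0, which is the 100% scaling I mentioned above. At u = 0.8 it's already down to 1.25, and that's before counting the contention for caches, the front end, and the branch predictors that the model ignores entirely.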
It would need a buttload of statistics to show how many threads use more or less than half the units of any one core (any model or SKU), but that is a completely different thing.
I'm just talking about some benchmarks. You're pretty good at finding those, when it suits you.
And it has never changed since!
We never got HTT 2, the way we got Turbo 2, or AVX2, or so many other things.
That's because Hyper-Threading is not an instruction set. So, Intel can change and refine it, along with the rest of the microarchitecture, without having to version it the way they do with their instruction set extensions.
Also, a real-world benchmark would be just as hypothetical as this, because it would be different software from what you are using, or from what you would like to see.
It would at least tell us if there's ever any truth to what you're claiming. People run software on actual hardware, not conceptual models. It's how the hardware behaves & performs that actually matters.
And what do you think that OoO changes?
Other than filling up even more gaps by taking code from farther ahead.
That's the point, isn't it? The better a core is at out-of-order instruction scheduling, the less dependent it is on another thread to achieve good utilization of the core's backend.
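Here's the kind of contrast I mean, as a sketch rather than a benchmark (the constants are just arbitrary LCG-style multipliers, and how much an SMT sibling actually gains from either loop depends on the specific core):

```cpp
#include <cstdint>
#include <cstdio>

// (a) One long dependency chain: every multiply-add needs the previous result,
//     so even a wide out-of-order core leaves execution slots idle; the classic
//     case where an HTT sibling has spare backend capacity to soak up.
uint64_t chained(uint64_t x, int n) {
    for (int i = 0; i < n; ++i)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return x;
}

// (b) Four independent chains: the OoO scheduler can overlap them and keep
//     multiple ALUs busy from one thread, leaving much less slack for a sibling.
uint64_t independent(uint64_t x, int n) {
    uint64_t a = x, b = x + 1, c = x + 2, d = x + 3;
    for (int i = 0; i < n; ++i) {
        a = a * 6364136223846793005ULL + 1;
        b = b * 6364136223846793005ULL + 2;
        c = c * 6364136223846793005ULL + 3;
        d = d * 6364136223846793005ULL + 4;
    }
    return a ^ b ^ c ^ d;
}

int main() {
    std::printf("%llx %llx\n",
                (unsigned long long)chained(1, 1 << 20),
                (unsigned long long)independent(1, 1 << 20));
}
```

Same instruction count, very different backend utilization from a single thread, which is why the benefit of a second hardware thread is not some fixed "other half of the units".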
Are you even listening to yourself?!
How compute-dense something is doesn't change the amount of work?
So more compute is not more work?
If we're talking about pipeline under-utilization, then how long a task executes is immaterial. In CPU terms, even a millisecond is a long time. At 5 GHz, that's 5 million clock cycles. So, whether a task completes in a few milliseconds tells you nothing about its pipeline utilization.
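For the record, the arithmetic:

$$5\ \text{GHz} \times 1\ \text{ms} = 5\times10^{9}\ \tfrac{\text{cycles}}{\text{s}} \times 10^{-3}\ \text{s} = 5\times10^{6}\ \text{cycles}$$

Millions of cycles is more than enough time for pipeline utilization to be anywhere from terrible to near-perfect.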