I don't understand this comment at all, it's completely random.
It wouldn't seem that way if you'd actually done multithreaded programming. The fact that two threads are running on the same core doesn't change how you synchronize and communicate between them, or the things you have to do for load-balancing, etc. Plus, you don't control exactly when or where each of your threads gets to run, so you can't build in assumptions about exactly which thread of your program is running where or when.
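To make that concrete, here's a minimal sketch (C++17; sched_getcpu() is Linux/glibc-specific, and the worker count is just whatever the machine reports, so treat the details as assumptions, not a recipe). The locking is exactly the same whether two of these workers end up as HTT siblings on one physical core or on two separate cores, and it's the OS scheduler, not the program, that decides where they land:

```cpp
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>
#include <sched.h>   // sched_getcpu(): Linux/glibc specific

int main() {
    std::mutex m;
    long shared_total = 0;

    // hardware_concurrency() counts logical CPUs, i.e. it already includes HTT siblings
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 4;  // the standard allows 0 ("unknown"), so assume something sane

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([&m, &shared_total, i] {
            long local = 0;
            for (int j = 0; j < 1000000; ++j)   // private work: no locking needed here
                local += j;

            // Identical synchronization no matter which logical CPU this thread landed on
            std::lock_guard<std::mutex> lock(m);
            shared_total += local;
            std::printf("worker %u finished on logical CPU %d\n", i, sched_getcpu());
        });
    }
    for (auto& t : workers) t.join();
    std::printf("total = %ld from %u hardware threads\n", shared_total, n);
}
```

You can pin threads with affinity APIs if you really want to, but unless you do that explicitly, placement can change from run to run, so the program can't bake in assumptions about which sibling shares a core with which.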
If you are after latency, then HTT gives you that many extra threads to dispatch work to right away.
If you are after parallelization, then HTT gives you that many extra threads for higher throughput.
Communicating, synchronizing, & balancing work between them incurs overhead. This increases as you involve more threads or seek to achieve better utilization. In throughput-oriented systems, that overhead can more easily be amortized. In latency-oriented systems, that overhead is more difficult to hide.
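Purely as a back-of-the-envelope sketch, with made-up numbers (the 5 µs dispatch cost and the 8 hardware threads are assumptions, not measurements), here's why that overhead amortizes over big batches of work but eats short, latency-sensitive tasks alive:

```cpp
#include <cstdio>
#include <initializer_list>

int main() {
    const double dispatch_overhead_us = 5.0;  // assumed fixed cost to hand work out and sync the result
    const int hw_threads = 8;                 // e.g. 4 cores x 2 HTT siblings (assumption)

    for (double work_us : {10.0, 100.0, 10000.0}) {
        double serial   = work_us;
        double parallel = work_us / hw_threads + dispatch_overhead_us;
        std::printf("%8.0f us of work: ideal speedup %dx, with overhead %.1fx\n",
                    work_us, hw_threads, serial / parallel);
    }
}
```

The fixed cost is the same in every row; it just fades into noise once the work per hand-off is large enough, which is exactly the throughput case.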
See also:
Well, what is the extent of what you have seen?
I've never seen an atom myself, but I'm still pretty sure everything is made out of them.
The way you describe it, we should expect 100% scaling to be the norm for hyper-threading/SMT. Instead, we get results like what AnandTech found.
Also, I don't need to prove math: if a thread only uses half the units of a core, then another thread can use the other half.
There is nothing about that which needs any proving.
I didn't say math, I said real-world data. You're talking about a conceptual model of how something works, which involves lots of simplifications and assumptions. But what actually matters isn't how nice a model is; it's real-world performance.
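And just to spell out what that conceptual model actually predicts, with utilization numbers I'm making up purely for illustration: if a lone thread keeps a fraction u of the core's issue slots busy, the naive two-thread SMT prediction is

$$\text{combined throughput} = \min(2u,\,1), \qquad \text{scaling} = \frac{\min(2u,\,1)}{u}$$

At u = 0.5 that comes out to exactly 2.0, which is the 100% scaling I mentioned above. At u = 0.8 it's already down to 1.25, and that's before counting the contention for caches, the front end, and the branch predictors that the model ignores entirely.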
It would need a buttload of statistics to show how many threads use more or less than half the units of any one core (any model or SKU), but that is a completely different thing.
I'm just talking about some benchmarks. You're pretty good at finding those, when it suits you.
And it has never changed since!
We never got HTT 2, the way we got Turbo 2, or AVX2, or so many other things.
That's because Hyper-Threading is not an instruction set. So, Intel can change and refine it, along with the rest of the microarchitecture, without having to version it the way they do with their instruction set extensions.
Also, a real-world benchmark would be just as hypothetical as this, because it would be different software from what you are using, or from what you would like to see.
It would at least tell us if there's ever any truth to what you're claiming. People run software on actual hardware, not conceptual models. It's how the hardware behaves & performs that actually matters.
And what do you think that OoO changes?
Other than filling up even more gaps by taking code from farther ahead.
That's the point, isn't it? The better a core is at out-of-order instruction scheduling, the less dependent it is on another thread to achieve good utilization of the core's backend.
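Here's the kind of contrast I mean, as a sketch rather than a benchmark (the constants are just arbitrary LCG-style multipliers, and how much an SMT sibling actually gains from either loop depends on the specific core):

```cpp
#include <cstdint>
#include <cstdio>

// (a) One long dependency chain: every multiply-add needs the previous result,
//     so even a wide out-of-order core leaves execution slots idle; the classic
//     case where an HTT sibling has spare backend capacity to soak up.
uint64_t chained(uint64_t x, int n) {
    for (int i = 0; i < n; ++i)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return x;
}

// (b) Four independent chains: the OoO scheduler can overlap them and keep
//     multiple ALUs busy from one thread, leaving much less slack for a sibling.
uint64_t independent(uint64_t x, int n) {
    uint64_t a = x, b = x + 1, c = x + 2, d = x + 3;
    for (int i = 0; i < n; ++i) {
        a = a * 6364136223846793005ULL + 1;
        b = b * 6364136223846793005ULL + 2;
        c = c * 6364136223846793005ULL + 3;
        d = d * 6364136223846793005ULL + 4;
    }
    return a ^ b ^ c ^ d;
}

int main() {
    std::printf("%llx %llx\n",
                (unsigned long long)chained(1, 1 << 20),
                (unsigned long long)independent(1, 1 << 20));
}
```

Same instruction count, very different backend utilization from a single thread, which is why the benefit of a second hardware thread is not some fixed "other half of the units".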
Are you even listening to yourself?!
How compute-dense something is doesn't change the amount of work?
So more compute is not more work?
If we're talking about pipeline under-utilization, then how long a task executes is immaterial. In CPU terms, even a millisecond is a long time. At 5 GHz, that's 5 million clock cycles. So, whether a task completes in a few milliseconds tells you nothing about its pipeline utilization.
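For the record, the arithmetic:

$$5\ \text{GHz} \times 1\ \text{ms} = 5\times10^{9}\ \tfrac{\text{cycles}}{\text{s}} \times 10^{-3}\ \text{s} = 5\times10^{6}\ \text{cycles}$$

Millions of cycles is more than enough time for pipeline utilization to be anywhere from terrible to near-perfect.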