Question Would it be a good idea to disable Hyper Threading?

Matthew Dilks

Honorable
Sep 22, 2017
My current processor is equipped with 6 cores and 12 threads. If I were to disable Hyper Threading, would it lower my CPU temperature?
 
Dec 6, 2019
When you turn off Hyper-Threading, your CPU will run at the same load, just maybe a little bit cooler. With it on, your CPU will get things done quicker and be able to cool off sooner.
 

InvalidError

Titan
Moderator
My current processor is equipped with 6 cores and 12 threads. If I were to disable Hyper Threading, would it lower my CPU temperature?
Back in the Netburst days, Intel said that HT added ~5% to power and area cost. If your CPU is under less than 50% load, you may save some of that ~5%. If your CPU is under significantly more than 50% load (enough active threads to overwhelm a CPU without HT/SMT), the power reduction may be larger from having more under-used execution resources but you'll lose 30-40% in potential performance.
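As a rough sketch of the arithmetic behind those figures: the snippet below treats HT on as the baseline (power 1.0, throughput 1.0) and reads "lose 30-40% in potential performance" as throughput dropping to 0.6-0.7 of baseline, which is an assumption. The 20% power-saving row is a made-up value standing in for the "may be larger" case, not a measurement.

```python
def energy_per_task(rel_power, rel_throughput):
    """Energy per unit of work, relative to the HT-on baseline (1.0)."""
    return rel_power / rel_throughput

# Each entry: (relative power, relative throughput) with HT off.
# All numbers are illustrative assumptions, not measurements.
scenarios = {
    "HT off, 5% less power, 30% slower": (0.95, 0.70),
    "HT off, 5% less power, 40% slower": (0.95, 0.60),
    "HT off, 20% less power, 30% slower": (0.80, 0.70),  # hypothetical larger saving
}

for label, (power, throughput) in scenarios.items():
    print(f"{label}: {energy_per_task(power, throughput):.2f}x energy per task")
```

Temperature follows instantaneous power rather than energy per task, so under these assumptions the CPU may run slightly cooler with HT off but stays busy for longer on the same job.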

Disabling HT/SMT also forces the OS scheduler to preempt threads more often to time-share the CPU between everything that is requesting CPU time. That means more time wasted on context-switch overhead, which often translates to higher frame-rate jitter (aka frame-rate variance or micro-stutter) in games. That said, some games that heavily favor single-threaded performance still fare better with HT/SMT off.
 
My current processor is equipped with 6 cores and 12 threads. If I were to disable Hyper Threading, would it lower my CPU temperature?

Yes. How much though is debatable. It might be very close to 0, or it might be 20 or more degrees C.

The question is, why are you worried about heat? If you are below 90 °C you are fine, although 80 °C should be your target temperature for long-term reliability. If you are over that, then something is wrong with your setup (voltages, BIOS, airflow, or cooler).
 
Back in the Netburst days, Intel said that HT added ~5% to power and area cost. If your CPU is under less than 50% load, you may save some of that ~5%.

That's a very interesting statistic. I wonder if that's under load or idle? If under load, that means only a maximum 5% of resources are duplicated in the pipeline which I find hard to believe for HT to be effective. At the very least the ALU and some of the FPU would have to be duplicated as these comprise the vast majority of instructions.
 

InvalidError

Titan
Moderator
That's a very interesting statistic. I wonder if that's under load or idle? If under load, that means only a maximum 5% of resources are duplicated in the pipeline which I find hard to believe for HT to be effective. At the very least the ALU and some of the FPU would have to be duplicated as these comprise the vast majority of instructions.
The 5% difference is overhead, as in CPU with HT physically absent from the architecture vs CPU with HT disabled. With HT enabled, you have to add the extra power of execution units being active more often when the CPU is under heavy and sufficiently multi-threaded load to make HT do its thing.

There is no ALU/FPU duplication, the main thing HT adds is tagging of instructions and resources to keep tabs on which thread owns what. As CPUs get wider to increase instructions-per-clock, a single instruction flow (thread) becomes increasingly unlikely to contain a sufficiently diverse instruction mix with resolved dependencies within the re-order window to allow the scheduler to use every possible execution unit on every single clock tick. HT/SMT's performance gains come from enabling the scheduler to interleave instructions from two threads (more in some architectures like IBM's POWER8 which has SMT8) to increase its chances of filling more execution units more often without having to go as deep out-of-order or speculative to find eligible instructions.
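A toy model of that interleaving, as a sketch only: `WIDTH` is a hypothetical number of execution slots per clock, and the per-cycle "ready instruction" counts for each thread are invented to represent limited instruction-level parallelism. Real schedulers are far more involved; this only shows why a second thread can fill slots the first one leaves idle.

```python
WIDTH = 4  # hypothetical number of execution slots available per clock

def run(threads, cycles):
    """Return the fraction of execution slots filled over `cycles` clocks.

    `threads` is a list of per-cycle ready-instruction counts, one list
    per hardware thread. Earlier threads in the list get first pick.
    """
    used = 0
    for c in range(cycles):
        free = WIDTH
        for ready in threads:
            take = min(free, ready[c % len(ready)])
            free -= take
        used += WIDTH - free
    return used / (WIDTH * cycles)

# Made-up ready-instruction counts per cycle for two threads.
thread_a = [2, 1, 3, 0, 2, 1]
thread_b = [1, 2, 0, 3, 1, 2]

print(run([thread_a], 6))            # 0.375 (one thread alone)
print(run([thread_a, thread_b], 6))  # 0.75  (two threads interleaved)
```

No execution unit is added in the second case; the same four slots are simply kept busier.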
 
The 5% difference is overhead, as in CPU with HT physically absent from the architecture vs CPU with HT disabled. With HT enabled, you have to add the extra power of execution units being active more often when the CPU is under heavy and sufficiently multi-threaded load to make HT do its thing.

There is no ALU/FPU duplication, the main thing HT adds is tagging of instructions and resources to keep tabs on which thread owns what. As CPUs get wider to increase instructions-per-clock, a single instruction flow (thread) becomes increasingly unlikely to contain a sufficiently diverse instruction mix with resolved dependencies within the re-order window to allow the scheduler to use every possible execution unit on every single clock tick. HT/SMT's performance gains come from enabling the scheduler to interleave instructions from two threads (more in some architectures like IBM's POWER8 which has SMT8) to increase its chances of filling more execution units more often without having to go as deep out-of-order or speculative to find eligible instructions.

I may be mistaken, but I think we are discussing the same thing just a matter of semantics. There is a pool of resources the pipeline can use for micro-instructions, and the threads pull from those resources. One source of a stall condition is when both pipelines need the same resource and no other is available. Since there is an excess of ALU and FPU operations, the micro-op resources for these ops are often duplicated to reduce the chance of conflict. For example, a left/right bit shift and rollover (Carry) is a common op on the (E)AX/(E)BX/(E)CX/(E)DX registers. But there is more than one set of AX/BX/CX/DX etc. registers inside an HT CPU core so that each can execute concurrently.
 

InvalidError

Titan
Moderator
I may be mistaken, but I think we are discussing the same thing just a matter of semantics.
Yes, you got the whole concept backwards. Mainstream CPUs are NOT getting wider and deeper to run multiple threads, they are getting wider to run one thread faster and have a SURPLUS of resources most of the time. SMT taps those under-used resources for extra performance at low architectural overhead cost.

Also, modern CPUs don't have fixed general-purpose registers like EAX/EBX/ECX/etc., they have a 128+ entries deep renamed register file, a reservation station that allocates a register file entry to each register write and then that renamed register entry becomes the new whichever-register in the context of subsequent instructions until the next time the whichever-register gets written.

Modern x86 CPUs are fundamentally RISC inside.
 
Yes, you got the whole concept backwards. Mainstream CPUs are NOT getting wider and deeper to run multiple threads, they are getting wider to run one thread faster and have a SURPLUS of resources most of the time. SMT taps those under-used resources for extra performance at low architectural overhead cost.

Also, modern CPUs don't have fixed general-purpose registers like EAX/EBX/ECX/etc., they have a 128+ entries deep renamed register file, a reservation station that allocates a register file entry to each register write and then that renamed register entry becomes the new whichever-register in the context of subsequent instructions until the next time the whichever-register gets written.

Modern x86 CPUs are fundamentally RISC inside.
That's kind of what I meant. :D Maybe I'm just wording it poorly. (And yes, I knew about register renaming.) The concept of limited registers is long dead. But there's plenty of code that still uses them, and CPUs that remap them through renaming. Is an AX an AX on an HT CPU? I'll agree and say "No, it isn't". It is remapped. But I wrote it that way to illustrate the point that resources are duplicated. In this case an AX from Thread 0 and an AX from Thread 1 are both renamed. But they are still a duplication of resources from the resource pool from which operations are performed.
 

InvalidError

Titan
Moderator
Is an AX an AX on an HT CPU? I'll agree and say "No, it isn't". It is remapped. But I wrote it that way to illustrate the point that resources are duplicated. In this case an AX from Thread 0 and an AX from Thread 1 are both renamed. But they are still a duplication of resources from the resource pool from which operations are performed.
Renamed registers still aren't duplicates of anything, regardless of how many hardware threads a CPU core supports. They have absolutely nothing to do with SMT and predate SMT (at least the Netburst flavor) by something like a decade. What does get duplicated is the tracking of which renamed registers represent each hardware thread's current state; perhaps that is what you meant to say.

Register renaming is there so that out-of-order CPUs, especially with speculative execution, aren't paralyzed whenever an OoO instruction wants to overwrite a register that still has dependencies on it - other instructions that still need the old value. Assign a new renamed register to every register write, then refer all subsequent reads to the new renamed register entry so you never need to worry about overwriting a register that is still needed elsewhere or waiting for it. When speculative execution is unrolling a tight loop, you may end up with a dozen instances of every register the loop writes to in the renamed register file. These aren't duplicates either, merely local temporary variables so execution units don't get held back by easily avoidable register contention.
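A minimal sketch of that idea, assuming a hypothetical `Renamer` class and a 16-entry physical register file (real ones are much larger, and entries are recycled once instructions retire, which this toy skips). Every register write takes a fresh entry from one shared pool; the only per-thread state is the map from architectural name to current entry.

```python
class Renamer:
    """Toy rename stage: each register write gets a fresh physical register."""

    def __init__(self, num_physical=16):
        self.free = list(range(num_physical))  # one pool shared by all threads
        self.maps = {}  # per hardware thread: architectural name -> physical entry

    def write(self, thread, reg):
        """Allocate a new physical entry for a write to `reg` and return it."""
        phys = self.free.pop(0)
        self.maps.setdefault(thread, {})[reg] = phys
        return phys

    def read(self, thread, reg):
        """Return the physical entry currently holding `reg` for `thread`."""
        return self.maps[thread][reg]

r = Renamer()
first = r.write(0, "eax")   # thread 0 writes EAX
second = r.write(0, "eax")  # thread 0 overwrites EAX; older readers keep `first`
other = r.write(1, "eax")   # thread 1 writes its own EAX from the same pool

print(first, second, other)                # 0 1 2
print(r.read(0, "eax"), r.read(1, "eax"))  # 1 2
```

Adding a second hardware thread here adds another small map, not another register file.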