Calculatron:
Pinhedd:
This is incorrect.
Hyperthreading is Intel's proprietary implementation of SMT, or Simultaneous Multithreading. SMT multiplies the frontend portion of a CPU core, allowing more than one logical context (also called a logical processor) to be tracked at once. This smooths out variations in instruction type and flow that would otherwise reduce execution efficiency. The backend can execute any mixture of instructions from any of the frontends attached to it, even in the same cycle. Instructions are selected dynamically to optimize usage of the microprocessor's execution ports, to which the execution units are connected.
The cores are not divided into "real and fake" threads or "physical and logical" threads; each core with SMT enabled exposes two or more logical processors which are equal in capability. It is the job of the operating system and the application designer to properly assign threads to these logical processors in order to optimize for their particular design parameters.
Intel's implementation of SMT only duplicates the frontend (two logical processors per core), as this is optimal for real-time applications. IBM's brand-new POWER8 architecture, by contrast, has eight logical processors per core and twelve cores per CPU, for a total of 96 contexts per CPU.
Barring the number of threads, this sounds similar to what I described, just worded differently; but perhaps I am lost in the semantics?
Either way - you learn something new every day!
What you describe is closer to coarse-grained multithreading, not simultaneous multithreading. I'll cover the big three concepts behind hardware threading:
CGMT, or Coarse-Grained MultiThreading
FGMT, or Fine-Grained MultiThreading
SMT, or Simultaneous MultiThreading
From the perspective of user-mode software (I won't discuss kernel-mode stuff, as that can get a bit tricky), all three appear the same: two or more logical processors. The relationship between a logical processor and its physical hardware can usually be divined by looking at various identifiers such as the APIC ID, and very high-quality software will usually calibrate itself based on the arrangement of sockets, cores, and logical processors. However, there is no need to do this unless one wants to optimize performance.
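As a concrete illustration of that calibration step, here is a minimal sketch of grouping logical processors by physical core. It assumes Linux's sysfs topology layout (`/sys/devices/system/cpu`); on other platforms you would query CPUID/APIC IDs or an OS-specific API instead, and the paths here will simply not exist.

```python
# Sketch: group logical processors by physical core using Linux sysfs.
# Assumes the /sys/devices/system/cpu layout exposed by the Linux kernel;
# returns an empty mapping where that layout is unavailable.
import glob
import os

def logical_to_core_map():
    """Return {(package_id, core_id): [logical CPU numbers]}."""
    topology = {}
    for cpu_dir in glob.glob("/sys/devices/system/cpu/cpu[0-9]*"):
        cpu = int(os.path.basename(cpu_dir)[3:])
        try:
            with open(os.path.join(cpu_dir, "topology/physical_package_id")) as f:
                package = int(f.read())
            with open(os.path.join(cpu_dir, "topology/core_id")) as f:
                core = int(f.read())
        except OSError:
            continue  # CPU offline or topology not exposed
        topology.setdefault((package, core), []).append(cpu)
    return topology

if __name__ == "__main__":
    for (package, core), cpus in sorted(logical_to_core_map().items()):
        print(f"package {package} core {core}: logical CPUs {sorted(cpus)}")
```

On an SMT machine, two logical CPU numbers will show up under the same (package, core) pair; that pairing is exactly what affinity-aware software keys off.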
Under the CGMT scheme, the microprocessor tracks two or more thread contexts (one on each logical processor) and works on each thread in chunks. The microprocessor switches logical processors whenever a logical processor stalls (such as on a cache miss), blocks (such as on a lengthy IO operation), or exhausts an allotted number of cycles, the last of which prevents starvation.
Under a non-multithreaded scheme, a cache miss or a block can be resolved in two ways: either the microprocessor inserts stall cycles until the thread can continue, or the operating system performs a context switch, replacing the running thread on the logical processor with a new one. In most cases a cache miss will be resolved faster than the operating system can load a new context onto the logical processor (a switch which is itself highly likely to cause further cache misses), so a cache miss or an uncompensable hazard in a non-threaded environment almost always results in stall cycles. A CGMT microprocessor, however, can switch logical processors very, very quickly, and in a fashion that is completely transparent to the operating system. Thus, instead of stalling, the microprocessor switches to another context that is already loaded on another logical processor.
Under CGMT, whenever a certain stall threshold is met, the microprocessor switches execution to another logical processor so that it has something to do. When that logical processor stalls or exceeds its cycle allocation, it switches again, either back to the first thread or on to the next one. Even while the microprocessor is executing instructions from a separate logical processor, the memory manager continues to resolve the miss for the processor that stalled.
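The switch-on-stall behaviour described above can be sketched as a toy scheduler. Everything here is illustrative, not real hardware: the string 'MISS' stands in for a cache miss that forces a context switch, and the quantum stands in for the cycle allocation that prevents starvation.

```python
# Toy model of coarse-grained multithreading: one execution unit,
# several hardware contexts; switch only when the running context
# "misses" or exhausts its cycle quantum. Illustrative only.

def run_cgmt(threads, quantum=4, total_cycles=10):
    """threads: list of instruction lists; 'MISS' simulates a cache miss.

    Returns a trace of (cycle, context, event) tuples.
    """
    trace = []
    ctx, budget = 0, quantum
    pc = [0] * len(threads)
    for cycle in range(total_cycles):
        stream = threads[ctx]
        if pc[ctx] >= len(stream):           # this context is out of work
            ctx, budget = (ctx + 1) % len(threads), quantum
            continue
        instr = stream[pc[ctx]]
        if instr == 'MISS' or budget == 0:   # stall or quantum expired
            if instr == 'MISS':
                pc[ctx] += 1                 # miss resolves in the background
            ctx, budget = (ctx + 1) % len(threads), quantum
            trace.append((cycle, ctx, 'switch'))
            continue
        trace.append((cycle, ctx, instr))
        pc[ctx] += 1
        budget -= 1
    return trace

trace = run_cgmt([['a1', 'MISS', 'a2'], ['b1', 'b2', 'b3']])
```

Running the example, context 0 executes a1, hits the miss, and the core immediately switches to context 1's b1/b2/b3 instead of inserting stall cycles; a2 runs once execution rotates back.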
FGMT takes this concept a little further and alternates execution between logical processors on every cycle. Ideally a stall is resolved by the time the microprocessor returns to the thread that stalled, but if it is not, that thread can simply be skipped. This is advantageous when execution resources are much cheaper and more plentiful than fast memory. Processors of this style are known as barrel processors because they rotate execution over the logical processors in a cyclic fashion. This style of execution was very popular in the 1990s and early 2000s.
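The barrel rotation differs from CGMT in only one line of the toy model: the context is chosen by the cycle counter, not by stall events. Again purely illustrative; the 3-cycle miss latency is an arbitrary number.

```python
# Toy barrel processor (FGMT): rotate over contexts every cycle,
# skipping any context whose simulated miss has not yet resolved.

def run_fgmt(threads, total_cycles=12, miss_latency=3):
    """threads: instruction lists; 'MISS' stalls that context briefly."""
    trace = []
    pc = [0] * len(threads)
    stalled_until = [0] * len(threads)
    for cycle in range(total_cycles):
        ctx = cycle % len(threads)           # rotate every cycle
        if cycle < stalled_until[ctx] or pc[ctx] >= len(threads[ctx]):
            continue                         # skip: stalled or finished
        instr = threads[ctx][pc[ctx]]
        pc[ctx] += 1
        if instr == 'MISS':
            stalled_until[ctx] = cycle + miss_latency
            continue
        trace.append((cycle, ctx, instr))
    return trace

trace = run_fgmt([['a1', 'MISS', 'a2'], ['b1', 'b2', 'b3']])
```

With two contexts, the trace alternates a1, b1, b2, ... and context 0 is quietly skipped on the cycles its miss is still outstanding, which is exactly the "skip the stalled thread" behaviour described above.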
SMT is the apex of multithreading and requires a superscalar microarchitecture (FGMT and CGMT can also run on superscalar designs, but do not depend on them). The microprocessor issues instructions to the execution pipes (a feature of superscalar microarchitectures) from all logical processors in a dynamic fashion. If one logical processor stalls, blocks, or is idled for performance reasons, the microprocessor dedicates all resources to the remaining logical processors. As the number of logical processors per core grows, it becomes easier and easier to keep that core's execution pipes busy 100% of the time. This is very desirable in throughput-sensitive environments such as application servers and databases, but it can be detrimental in real-time-sensitive environments such as gaming. This is why many game developers configure thread affinity on Hyperthreaded microprocessors, to avoid fighting over resources.
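For reference, pinning a thread away from a sibling logical processor is a one-liner on Linux. This sketch uses Python's `os.sched_setaffinity`, which is Linux-specific; Windows exposes the same idea through `SetThreadAffinityMask`, and the CPU number to pin to would come from the kind of topology discovery shown earlier.

```python
# Sketch: pin the current process to a single logical processor, the
# kind of affinity tuning used to keep a latency-critical thread from
# competing with work on its sibling hyperthread.
# os.sched_setaffinity is Linux-only; we degrade gracefully elsewhere.
import os

def pin_to_cpu(cpu):
    """Pin the calling process to one logical CPU; return the new mask."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})       # 0 = the calling process
        return os.sched_getaffinity(0)
    return None  # affinity API not available on this platform

if __name__ == "__main__":
    print("now restricted to logical CPUs:", pin_to_cpu(0))
```

A game engine would do the equivalent per thread, placing its render and simulation threads on separate physical cores rather than on two logical processors of the same core.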
I hope that this was informative.