vmN :
Pinhedd :
As for the second part, about there not being any software capable of detecting which threads are "supposed to be the hyper-threads", that's provably false. Each logical processor in a system is identified by a unique APIC ID (APIC stands for Advanced Programmable Interrupt Controller) and the APIC ID of each logical processor contains within it the physical processor ID. If the physical processor ID is the same for two logical processors, then they share the same core. Accessing this is done through the CPUID instruction which is unprivileged and can thus be performed by any application regardless of operating system.
I'm pretty sure there is a language barrier between us.
"If the physical processor ID is the same for two logical processors, then they share the same core."
This is what I meant, this very line: you end up with 2 identical IDs. You don't get one ID for the physical core and a different one for the "hyper-thread"; you get the same ID for both.
What my statement was about was people's general assumption that games will utilize the 8t/8c of, let's say, an FX-8320, but won't utilize more than 4t/4c of an 8t/4c Intel i7 processor because of hyper-threading.
The ID for the logical processor, which is the device on which threads are actually scheduled, is unique. However, the physical processor ID can be extracted from it and compared against that of another logical processor to see whether the two share the same physical processor. If they do, then those two logical processors correspond to the two hardware threads exposed by a single core (assuming hyperthreading is enabled; if hyperthreading is not enabled, each logical processor will have a physical processor ID of its own).
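To make the comparison concrete, here's a minimal sketch in Python. It assumes the APIC IDs have already been read (one per logical processor) and that the number of low-order bits selecting the hardware thread within a core is known; on real hardware that width comes from CPUID's topology leaves, which this sketch does not query. The names `group_by_core`, `share_core` and `smt_bits` are mine, purely for illustration.

```python
from collections import defaultdict

def group_by_core(apic_ids, smt_bits):
    """Group logical processor indices by the core ID embedded in their APIC ID.

    apic_ids: one APIC ID per logical processor, in enumeration order.
    smt_bits: number of low-order APIC ID bits that select the hardware
              thread within a core (assumed to be known already).
    """
    cores = defaultdict(list)
    for logical_index, apic_id in enumerate(apic_ids):
        core_id = apic_id >> smt_bits  # strip the per-thread bits
        cores[core_id].append(logical_index)
    return dict(cores)

def share_core(apic_a, apic_b, smt_bits):
    """True if two distinct logical processors sit on the same physical core."""
    return apic_a != apic_b and (apic_a >> smt_bits) == (apic_b >> smt_bits)

# Hypothetical 4-core, 8-thread part: APIC IDs 0..7 with one SMT bit.
example_ids = list(range(8))
print(group_by_core(example_ids, smt_bits=1))
```

Every logical processor keeps its own unique APIC ID; only the part left after shifting out the thread bits is shared between the two hardware threads of a core.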
This information is visible to the application, but that does not automatically mean that the application can make use of it. The operating system controls thread scheduling, not the application. Most modern operating systems provide a huge amount of leeway in how each process can configure its own threads and desired execution attributes but few guarantees are actually made.
A process may be able to deduce that out of 16 logical processors, numbers 0/1, 2/3, 4/5, and 6/7 are pairs exposed by 4 physical cores in the processor package in socket 0, and numbers 8/9, 10/11, 12/13, and 14/15 are pairs exposed by 4 physical cores in a second processor package in socket 1. As such, it may wish to create 4 threads with affinity set to logical processors 0, 2, 4, and 6 respectively. This ensures that threads do not share core resources with each other (effectively disabling hyperthreading from the perspective of the application), and that threads do not cross socket boundaries.
This configuration is simply a request made by the process to the kernel. It is entirely up to the kernel designers to decide to what degree they wish to honour the process's request (all will, but that's a matter of proper design rather than a guarantee).
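Here's a rough sketch of that selection in Python. The `TOPOLOGY` table is hypothetical and simply hard-codes the 16-processor layout from the example; a real program would have to build it from CPUID or from whatever the OS exposes. `os.sched_setaffinity` is only available on Linux in the standard library, so the pinning helper is guarded and treats a refusal as a normal outcome, since the kernel has the final say.

```python
import os

# Hypothetical topology table matching the 16-processor example:
# (logical processor, socket, core within socket).
TOPOLOGY = [(lp, lp // 8, (lp % 8) // 2) for lp in range(16)]

def one_per_core(topology, socket):
    """Pick the lowest-numbered logical processor of each core on one socket."""
    chosen = {}
    for logical, sock, core in sorted(topology):
        if sock == socket and core not in chosen:
            chosen[core] = logical
    return sorted(chosen.values())

def pin_current_thread(logical):
    """Ask the kernel to run the caller on a single logical processor.

    Returns False if the request cannot be made or is refused.
    """
    if not hasattr(os, "sched_setaffinity"):  # not available on this platform
        return False
    try:
        os.sched_setaffinity(0, {logical})  # pid 0 means the caller
    except OSError:
        return False
    return True

targets = one_per_core(TOPOLOGY, socket=0)
print(targets)
```

Each worker thread would call `pin_current_thread` with one entry of `targets` as its first action. The sketch deliberately does not call it at import time, because the machine running it may not have those logical processors at all.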
Setting up processes like this is awfully difficult, and in many cases awfully unnecessary. Kernel schedulers are very well developed and very mature. In the above example the scheduler may schedule the threads in the exact same way by default without any additional prompting on behalf of the application.
As for why some people assume that games will utilize all 8 threads on an 8 thread FX-8000 series microprocessor, but only 4 threads on an 8 thread i7-4000 series microprocessor... there are several possible explanations.
1. As I mentioned above, many PC applications will spawn worker threads based on the number of physical processors rather than logical processors. Intel's HT has been around for more than a decade, so application developers have gotten used to it.
One of the biggest benefits of HT over AMD's CMT is that HT balances two threads on one big back end. When one of the threads is idle, or HT is disabled, the complementary thread can monopolize the entire core including all of the core's backend execution capability.
AMD's FX series microprocessors cannot do this. Disabling one of the cores in a module disables both the frontend and backend of that core. This frees up some of the shared L1 and L2 cache capacity and bandwidth, but it also cuts the module's combined backend capacity in half.
With proper optimization, there's little difference between running 4 workers on an i7-4000 series microprocessor, and running 8 workers. Without proper optimization, 8 workers will win out with a 15%-30% margin as HT takes care of inefficiencies in the code.
2. AMD's FX series microprocessors appear as 8 physical cores with one thread each. AMD introduced a new layer of organization, the "module", that wasn't taken into consideration by any operating system or application. As such, with #1 above in mind, an application that spawns threads on the basis of physical processors rather than logical processors will spawn twice as many on an FX-8000 series microprocessor as it will on an i7-4000 series microprocessor. The threads that are spawned on the i7 will execute approximately twice as fast, though.
3. They're full of crap. Many games over the past 8 years have been ported over from the PS3 and Xbox 360. While the PS3 used a process model similar to some mainframes (don't ask), the 360 at least used a process model similar to the PC and was used as the basis for a large number of ports. As such, many games were designed with somewhere between 3 and 6 logical processors in mind. When the OS bounces these threads around to load balance, it may appear as if 8 threads are heavily loaded. This is partly an illusion caused by threads waiting for shared resources.
4. They're speculating. Both the PS4 and Xbox One are built around 8-core AMD processors (Jaguar cores, strictly speaking, rather than the FX series microarchitecture). Given that both consoles expose 8 cores, it stands to reason that developers wishing to squeeze performance out of the consoles would follow a similar model on the PC.
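To put numbers on explanations 1 and 2, here's a small Python sketch of the two worker-count policies. The function name `worker_count` and the two dictionaries are made up for illustration, and the values are just the core and thread counts each part reports according to the description above.

```python
def worker_count(reported_cores, threads_per_core, per="physical"):
    """Number of workers to spawn from what the system reports about the CPU."""
    if per == "physical":
        return reported_cores
    if per == "logical":
        return reported_cores * threads_per_core
    raise ValueError("per must be 'physical' or 'logical'")

# What each part reports, per the description above (illustrative values).
fx_8000 = {"reported_cores": 8, "threads_per_core": 1}
i7_4000 = {"reported_cores": 4, "threads_per_core": 2}

for name, cpu in (("FX-8000", fx_8000), ("i7-4000", i7_4000)):
    physical = worker_count(**cpu, per="physical")
    logical = worker_count(**cpu, per="logical")
    print(name, "physical policy:", physical, "logical policy:", logical)
```

Under the per-physical-core policy the FX gets 8 workers and the i7 gets 4, while the per-logical-processor policy gives 8 on both. That's the whole discrepancy, and the application never has to know that modules exist.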