Having benchmarked this, hard-locking a single-threaded application to a core will give you a slight performance boost. With multiple cores, Windows will not preempt a thread off a core if another core is already free. Hard locking won't run into issues with NT taking your time slice, as NT will just run on another available core instead. NT will only evict you if all other cores are at 100%, in which case you really don't need to be hard locking, as you're already maxing CPU performance. Hard locking is just a trick to squeeze maximum performance out of poorly threaded applications.
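For concreteness, "hard locking" here means setting a CPU affinity mask. A minimal sketch, using the Linux call os.sched_setaffinity since it's easy to try anywhere; the Windows equivalents are SetProcessAffinityMask / SetThreadAffinityMask:

```python
import os

# Pin the calling process to core 0 ("hard lock").
# On Windows you'd call SetThreadAffinityMask(GetCurrentThread(), 1) instead;
# this sketch uses the Linux equivalent, os.sched_setaffinity.
os.sched_setaffinity(0, {0})        # pid 0 = the calling process
print(os.sched_getaffinity(0))      # the new affinity set, e.g. {0}
```

From this point on, the scheduler may only dispatch the process to core 0, no matter what else is competing for it.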
Not true. Understand the first rule of threading in Windows:
Windows, in all cases, without exception, will run the highest priority thread(s) that are capable of being run.
Granted, the switch may be for a few ns or so, so you don't notice. But it happens, and it happens a lot. Every time you access RAM? Your thread gets booted. Look at some variable stored in the L3? Thread gets booted. L2? Same deal. If your thread cannot run, or if some higher-priority thread can, you get booted. And guess what? Every time you get booted, there's a chance it may resume on a different CPU core.
Basic example: You have a priority of 5. Some other task has a priority of 1. So your thread runs for two cycles. Due to how the scheduler works, the priorities will have changed so that you have a priority of 3 and the other task has a priority of 3 (your priority gets decremented while you run, while the other thread gets a boost while waiting). Windows decides to run the other task, and your thread stops running. One cycle later, your thread has a priority of 4, and the other thread a priority of 2. So you figure your thread will resume on the same core, right? Unless some other thread on a different core has a priority of 1, in which case, guess what? He's the one who gets booted, due to having the lowest priority.
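The decay/boost dance above can be sketched as a toy simulation (the rules and numbers are illustrative only, not Windows' actual quantum accounting):

```python
# Toy model of the decay/boost dynamic described above. Illustrative only:
# the real Windows scheduler uses quantums and priority classes, not this.
def simulate(cycles):
    prio = {"ours": 5, "other": 1}   # higher number = higher priority
    last = None
    history = []
    for _ in range(cycles):
        # Highest priority runs; on a tie, the waiting thread wins.
        running = max(prio, key=lambda t: (prio[t], t != last))
        history.append(running)
        for t in prio:
            prio[t] += -1 if t == running else 1  # decay running, boost waiting
        last = running
    return history

print(simulate(6))   # -> ['ours', 'ours', 'other', 'ours', 'other', 'ours']
```

Even in this cartoon version, our "higher priority" thread keeps getting interleaved with the lower-priority one, exactly the pattern described: two cycles of work, a switch, a cycle off, and so on.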
So you say "OK, I'll hard-lock the core to avoid having to dump the cache." Let's look at this example:
Let's take the case of a 4-core system: you have 4 threads; 3 do a lot of work, 1 not so much. You query the processor, see it has four cores, and hard lock each thread to a core at runtime.
At some point, you alt-tab out of the application, do some stuff on the internet, and so on. Meanwhile your AV, seeing no major CPU activity, kicks off a scan and loads its main thread on core 0, because it also hard-locked its threads to cores.
You come back, and blissfully unaware of what just happened, return to your application.
You are now officially screwed from a performance standpoint; the thread on the first core (likely the game's main thread) is now competing against the AV scan. Even if your thread has the higher priority (it gets a boost due to being in a foreground process), it is going to get booted some percentage of the time by the AV. And heaven forbid the AV is coded so that its main thread gets "high" priority by default.
When you hard lock threads, you make a major assumption: No other heavy workload application will come around and prevent an application critical thread from running.
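A back-of-the-envelope model of the AV scenario (the core counts and work units are made up purely for illustration):

```python
# Toy model of the AV-contention scenario above; all numbers are invented.
# 8 cores; the game has 4 heavy threads, each needing 100 units of CPU time;
# the AV scanner's thread also needs 100 units and is pinned to core 0.
def game_finish_time(game_pinned):
    if game_pinned:
        # Game threads hard-locked to cores 0-3: core 0 is now shared
        # with the AV, so it must grind through 200 units of work.
        per_core_work = [200, 100, 100, 100, 0, 0, 0, 0]
    else:
        # No affinity: the scheduler migrates the contended game thread
        # to an idle core, so no core has more than 100 units queued.
        per_core_work = [100, 100, 100, 100, 100, 0, 0, 0]
    # Wall-clock time = busiest core, at 1 unit of work per tick.
    return max(per_core_work)

print(game_finish_time(True), game_finish_time(False))   # -> 200 100
```

In this simplistic model, the pinned game takes twice as long as the unpinned one, not because the machine lacks capacity, but because the affinity mask forbids the scheduler from using it.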
The worst case, of course, would be the oft-mentioned theoretical example of two games running side by side in windowed mode. Imagine if both games were coded to lock their three or four heavy-workload threads on the first four CPU cores, blissfully unaware you are running an 8-core system and the last four cores are sitting idle. But because you hard-locked a 1:1 core:thread ratio, the OS can't dispatch to the unused cores. Another good example: FRAPS. [And no, querying the CPU for current load is not a good idea, because the CPU core will NEVER be doing nothing at the time you query it. (sarcasm implied)]
So yes, if no other process-heavy app is running, hard locking would likely lead to very minimal (~5%) gains, simply due to not flushing the CPU cache. Run another process-heavy app alongside, though, and performance suffers for both applications.
Now, what you describe is common for consoles, because there is NO threat of other threads running. [As far as the PS3, which I am more familiar with: you have just over 200MB RAM and 6 SPEs to play with; the rest (~56MB RAM and the 7th SPE) is reserved for the OS. As only one task (OS aside) can run at a time, you can hard lock threads to cores without negative impacts, unlike on a multitasking OS.]
This feature is actually incredibly important when dealing with NUMA architectures: scheduling a task to run on one CPU when its memory lives in a different physical CPU's memory pool is very bad.
Windows actually has NUMA-specific API calls for just that situation. For one, you can set logical processor groups (up to 64 logical processors per group). Point being, you aren't going to see applications move onto a different processor node on their own.
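To put rough numbers on why node-local scheduling matters, here's a toy cost model (the latencies are invented round figures; real local-to-remote ratios vary by platform, commonly somewhere around 1.5x-3x):

```python
# Toy NUMA cost model; latencies are illustrative, not measured.
LOCAL_NS, REMOTE_NS = 100, 300   # assumed per-access latencies (made up)

def total_access_ns(n_accesses, on_local_node):
    return n_accesses * (LOCAL_NS if on_local_node else REMOTE_NS)

# A thread doing a million memory accesses pays 3x in this model when
# the scheduler parks it on a node other than the one holding its data.
print(total_access_ns(1_000_000, True))    # -> 100000000
print(total_access_ns(1_000_000, False))   # -> 300000000
```

Which is exactly why the scheduler keeps threads on their home node, and why affinity actually earns its keep on NUMA hardware in a way it doesn't on a single-socket desktop.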