Question Affinity to Preferred Processor

Maybe this has been asked/answered before so bear with me...

While games are becoming more multi-threaded, there are always one or two threads that are the real bottleneck, and that's pretty obvious from watching the processors in Task Manager while playing modern triple-A titles. But today many processors identify their strongest cores for boosting; you can see the strongest cores flagged in RyzenMaster for Ryzen processors, for instance.

So the question is: how does the system know which thread is the bottleneck and should be given affinity to one of the strongest cores, which can be boosted the highest for the longest?

I'm aware there are SetThreadIdealProcessor and SetThreadIdealProcessorEx functions which are used to tell the Windows 10 scheduler to 'favor' a certain processor for specific threads. I assume (not being a programmer) that a program would query processor capabilities at startup and then use those functions to steer what it knows to be the bottleneck thread to the strongest cores (processors).
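From the Microsoft documentation the call itself looks simple enough. A minimal sketch of the Ex version might be something like this; the core number 2 is a pure placeholder, not a recommendation, and a real engine would first have to figure out which core it actually wants:

```cpp
// Hedged sketch: hinting the Windows scheduler that this thread would like to
// run on a particular logical processor. The chosen core (2) is made up for
// illustration only.
#include <windows.h>
#include <stdio.h>

int main()
{
    PROCESSOR_NUMBER ideal = {};   // group-relative processor identifier
    ideal.Group = 0;               // group 0 (systems with <= 64 logical processors have only one)
    ideal.Number = 2;              // hypothetical "preferred" logical processor

    PROCESSOR_NUMBER previous = {};
    if (SetThreadIdealProcessorEx(GetCurrentThread(), &ideal, &previous)) {
        printf("Ideal processor hint set (was group %u, number %u)\n",
               (unsigned)previous.Group, (unsigned)previous.Number);
    } else {
        printf("SetThreadIdealProcessorEx failed: %lu\n", GetLastError());
    }

    // Note: this is only a hint. The scheduler prefers the ideal processor
    // when it's available but is still free to run the thread elsewhere.
    return 0;
}
```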

Does that mean games have to be coded correctly to do this? Is it the graphics driver that would do it (providing somewhat better assurance it will happen)?

Or is it something in the OS itself that can identify the heaviest loading thread and set affinity to favor the strongest core(s)?

When I read benchmarks and game performance reviews, nobody ever says whether the games are using the strongest cores or not. It seems to me that it would be good to know your hardware is being properly and efficiently utilized.

Does anyone have any information on this?
 
IMO, you have a basic error in your assumption that there is a "strongest core". There is no differentiation.
Perhaps there's confusion over 'strongest'...

As I understand it, some cores are identified as having the best thermal and power characteristics and are flagged by the manufacturer. You can see those flagged cores identified in RyzenMaster with little gold stars. The cores have the same inherent performance, i.e. IPC; it's just that they can run at a higher frequency, under a heavier processing load, for longer without exceeding the defined current, thermal and/or power constraints at which the frequency has to be reduced. This is essentially what I understand XFR to be all about.

Check out Tom's article on RyzenMaster:
https://www.tomshardware.com/reviews/amd-ryzen-7-2700x-review,5571-2.html
"....The fastest cores are identified during the binning process and flagged by Ryzen Master with gold stars on a per-CCX basis. The third- and fourth-fastest cores are marked with a circle. "

With a processor that boosts (XFR) one or two cores at a time to a higher frequency, it makes sense you'd want that core (or those cores) to be executing the heaviest-loaded thread(s), especially in an application where everything else depends on the performance of that thread.

So my question is: how does this happen? What mechanism 'steers' those one or two threads to those cores so that they can benefit from the longer boost frequency? Or is there no mechanism, and is this whole XFR2 thing kind of meaningless for games?
 
So my question is: how does this happen? What mechanism 'steers' those one or two threads to those cores so that they can benefit from the longer boost frequency? Or is there no mechanism, and is this whole XFR2 thing kind of meaningless for games?
Normally the OS would put a high-usage, high-priority thread on one of these cores automatically, but this only works for short periods because the OS will also start to shuffle all threads around all cores to keep individual cores from overheating and degrading.
Also, having the main thread(s) running faster than the rest of the threads can cause a lot of sync problems; you see it in forums a lot, where the suggested solution for stutter is often to lock all cores to the same clock speed instead of having some cores boosted to different speeds.
 
Normally the OS would put a high-usage, high-priority thread on one of these cores automatically, but this only works for short periods because the OS will also start to shuffle all threads around all cores to keep individual cores from overheating and degrading.

That raises several questions: first, does that mean the Windows scheduler is indeed cognizant of the processor's 'gold star' capabilities and acts accordingly?

And second: isn't it rather reactive, and risky from a resource-cost perspective, to move execution threads around like that in a time-dependent application? Wouldn't it be better, then, to schedule a high-usage thread on a 'gold star' core at the outset? Isn't that something the game developer would know and be able to account for in code?

Also, having the main thread(s) running faster than the rest of the threads can cause a lot of sync problems; you see it in forums a lot, where the suggested solution for stutter is often to lock all cores to the same clock speed instead of having some cores boosted to different speeds.

That's the argument I hear for why games don't really utilize multi-core processors effectively. Sure, they are more multi-threaded these days, but there's always one thread that is the most heavily utilized, and it stays that way pretty consistently. I'd assumed that's to ensure temporal dependencies aren't violated.
 
That raises several questions: first, does that mean the Windows scheduler is indeed cognizant of the processor's 'gold star' capabilities and acts accordingly?
The CPU knows which cores are better (and can tell Windows), and Windows knows which threads are the most demanding; that's why the OS can get CPU-specific updates and the CPU can get microcode updates to handle things better.
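As a hedged illustration of the kind of per-core information Windows exposes to applications, the CPU-set API below lists each logical processor along with an EfficiencyClass value. Whether AMD's 'gold star' ranking actually surfaces through this field is an assumption I can't confirm, so treat this strictly as a sketch of what the OS keeps track of:

```cpp
// Hedged sketch: enumerate the logical processors (CPU sets) Windows knows
// about. EfficiencyClass distinguishes faster core types from slower ones on
// heterogeneous CPUs; how (or whether) AMD's preferred-core ranking shows up
// here is not confirmed. Requires Windows 10.
#include <windows.h>
#include <stdio.h>
#include <vector>

int main()
{
    ULONG length = 0;
    GetSystemCpuSetInformation(nullptr, 0, &length, GetCurrentProcess(), 0);

    std::vector<unsigned char> buffer(length);
    auto* info = reinterpret_cast<PSYSTEM_CPU_SET_INFORMATION>(buffer.data());
    if (!GetSystemCpuSetInformation(info, length, &length, GetCurrentProcess(), 0)) {
        printf("GetSystemCpuSetInformation failed: %lu\n", GetLastError());
        return 1;
    }

    for (ULONG offset = 0; offset < length; ) {
        auto* entry = reinterpret_cast<PSYSTEM_CPU_SET_INFORMATION>(buffer.data() + offset);
        if (entry->Type == CpuSetInformation) {
            printf("CPU set %lu: core %u, efficiency class %u\n",
                   entry->CpuSet.Id,
                   (unsigned)entry->CpuSet.CoreIndex,
                   (unsigned)entry->CpuSet.EfficiencyClass);
        }
        offset += entry->Size;
    }
    return 0;
}
```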
And second: isn't it rather reactive, and risky from a resource-cost perspective, to move execution threads around like that in a time-dependent application? Wouldn't it be better, then, to schedule a high-usage thread on a 'gold star' core at the outset? Isn't that something the game developer would know and be able to account for in code?
Yes it is. If you follow tech news you will know about Linux benchmarks popping up from time to time that show how much faster CPUs (mostly AMD) are in Linux; that's because Linux doesn't care about wear leveling.

Yes, what you propose gives the fastest results, and some games do that just by giving the main thread a high priority, which makes Windows concentrate on that thread more than the others. That's why, for example, GTA V stutters like crazy on low-to-mid core count CPUs and has artifacts even on high core count CPUs: due to how Windows is designed, the scheduler drops everything until that high-priority thread is done, but in a game it's never done.

https://www.youtube.com/watch?v=VN-mdoMDuSQ

If a higher-priority thread becomes available to run, the system ceases to execute the lower-priority thread (without allowing it to finish using its time slice), and assigns a full time slice to the higher-priority thread.
...
Use HIGH_PRIORITY_CLASS with care. If a thread runs at the highest priority level for extended periods, other threads in the system will not get processor time.
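For concreteness, a minimal sketch of the priority approach being described; the calls are standard Windows APIs, but how a real engine structures this is my assumption:

```cpp
// Hedged sketch: raise the process to HIGH_PRIORITY_CLASS and favor the main
// thread over its siblings. As the Microsoft note quoted above warns, a
// high-priority thread that never blocks can starve everything else.
#include <windows.h>
#include <stdio.h>

int main()
{
    // Raise the whole process; its threads now outrank normal-priority work.
    if (!SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS)) {
        printf("SetPriorityClass failed: %lu\n", GetLastError());
    }

    // Additionally favor this thread over the process's other threads.
    if (!SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST)) {
        printf("SetThreadPriority failed: %lu\n", GetLastError());
    }

    // In a game loop that spins without ever blocking, lower-priority threads
    // competing for the same cores may hardly get scheduled at all.
    return 0;
}
```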
 
...
Yes, what you propose gives the fastest results, and some games do that just by giving the main thread a high priority, which makes Windows concentrate on that thread more than the others. That's why, for example, GTA V stutters like crazy on low-to-mid core count CPUs and has artifacts even on high core count CPUs: due to how Windows is designed, the scheduler drops everything until that high-priority thread is done, but in a game it's never done.
...

What all this suggests to me is that they (game developers, etc.) don't really know how to utilize the hardware we make available to play games on. I can understand some of the issues, as they are time-honored (or reviled), e.g., with PCs being open platforms you really don't know what's 'under the hood'. But it seems they should be getting around to using these high core-count processors better, as that seems the way forward in the wake of the repeal of Moore's Law.

Are you familiar with the SetThreadIdealProcessor function? When I was reading about it in Microsoft's documentation, it sounded like a solution to the problems you relate from using HIGH_PRIORITY_CLASS. I have to imagine it comes with its own set of baggage. But I'm not a programmer, so all it did was raise my original questions.
 
But it seems they should be getting around to using these high core-count processors better, as that seems the way forward in the wake of the repeal of Moore's Law.
You can't split up a workload and put it back together unless you synchronize everything to the slowest thread, which negates any performance gain.
Because of that, there are a lot of games that just don't synchronize anything. If you ever see trees or bushes (Tomb Raider), street signs (Agents of Mayhem), or whatever popping into the picture whenever they feel like it, that's because they are prepared on separate threads and displayed whenever they are ready. Far Cry 5 has the different AIs separated, which is why it became a meme that you couldn't have a conversation without something coming at you and interrupting you; the separate threads had no idea what the others were doing, so everything would happen at once, sometimes.
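A minimal sketch of that 'prepare it on its own thread, show it whenever it's ready' pattern; the names and the delays are invented for illustration and have nothing to do with how those particular games implement it:

```cpp
// Hedged sketch: a worker thread prepares an object and publishes a ready
// flag; the main loop only draws it once the flag is set, so the object can
// "pop in" a few frames late. All names and timings are made up.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

struct Prop { const char* name; };

std::atomic<bool> g_propReady{false};
Prop g_prop;

void PrepareProp()
{
    // Stand-in for expensive asset/AI preparation on its own thread.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    g_prop = Prop{"street sign"};
    g_propReady.store(true, std::memory_order_release);
}

int main()
{
    std::thread worker(PrepareProp);

    for (int frame = 0; frame < 10; ++frame) {
        if (g_propReady.load(std::memory_order_acquire)) {
            std::printf("frame %d: drawing %s\n", frame, g_prop.name);
        } else {
            std::printf("frame %d: prop not ready yet, drawing without it\n", frame);
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(16)); // ~60 fps frame
    }

    worker.join();
    return 0;
}
```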
 
You can't split up a workload and put it back together...

I can see that, but just splitting the workload out to all the cores isn't the only way to do it. Another way would be to arrange for the thread with the greatest workload to execute on a core that can operate at the highest frequency for the longest time, and then keep as many inter-dependent tasks on that thread as possible. That's what I thought the SetThreadIdealProcessor function would allow for.

The advantage is, of course, that users wouldn't have to run at massive heat-generating, power-consuming all-core overclocks.
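For what it's worth, here is a minimal sketch of the 'hard pin' version of that idea using SetThreadAffinityMask, as opposed to the softer ideal-processor hint sketched earlier in the thread; the core index is a made-up placeholder, and whether pinning actually helps depends on things (core parking, background load) this ignores:

```cpp
// Hedged sketch: lock the heaviest thread to one logical processor so all of
// its dependent work stays on a single fast core. Core 2 is made up.
#include <windows.h>
#include <stdio.h>

int main()
{
    const DWORD core = 2;                         // hypothetical "gold star" core
    const DWORD_PTR mask = DWORD_PTR(1) << core;  // one bit per logical processor

    DWORD_PTR previous = SetThreadAffinityMask(GetCurrentThread(), mask);
    if (previous == 0) {
        printf("SetThreadAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }

    // Unlike an ideal-processor hint, the thread can now run *only* on this
    // core, even if that core is busy or parked, so a hard pin can hurt as
    // easily as it helps.
    printf("Heavy thread pinned to logical processor %lu\n", core);
    return 0;
}
```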