Question: Does inter-CCD latency cause any significant performance hit on Ryzen 5900X/5950X and 7900X/7950X for gaming?

Wolverine2349

Prominent
Apr 26, 2022
145
13
585
Is it an issue where the 1% or 0.1% lows could drop severely in a CPU-thread-intensive game if a thread has to communicate with a thread on the other CCD?

Or has it been fixed, or is it not an issue with Ryzen 7000 anymore?

And is it true that all Ryzen 7900X CPUs are two 6-core CCDs and all 7950X CPUs are two 8-core CCDs? Or are there any 7900X chips with an 8-core CCD and a 4-core CCD?
 

kanewolf

Titan
Moderator
Is it an issue where the 1% or 0.1% lows could drop severely in a CPU-thread-intensive game if a thread has to communicate with a thread on the other CCD?

Or has it been fixed, or is it not an issue with Ryzen 7000 anymore?

And is it true that all Ryzen 7900X CPUs are two 6-core CCDs and all 7950X CPUs are two 8-core CCDs? Or are there any 7900X chips with an 8-core CCD and a 4-core CCD?
Do you already disable SMT to ensure you have one thread per CPU?
Are you that close to the maximum possible performance that this is an issue for you?
 
It's an issue no matter what due to locality. That is, the further something has to travel, the more latency it adds.

The only way this can be mitigated is to try to keep related threads in the same "node" (in this case, a CCD), but I recently read that Microsoft hasn't optimized the scheduler for Zen 4 CPUs on Windows 11. It probably treats all the cores as having the same locality.
 

Wolverine2349

It's an issue no matter what due to locality. That is, the further something has to travel, the more latency it adds.

The only way this can be mitigated is to try to keep related threads in the same "node" (in this case, a CCD), but I recently read that Microsoft hasn't optimized the scheduler for Zen 4 CPUs on Windows 11. It probably treats all the cores as having the same locality.

They have not optimized it on WIN11 for this? What about WIN10? I mean, it's the same exact config as the Ryzen 5000 dual-CCD CPUs. What is there to optimize?
 
And is it true that all Ryzen 7900X CPUs are two 6-core CCDs and all 7950X CPUs are two 8-core CCDs? Or are there any 7900X chips with an 8-core CCD and a 4-core CCD?
AMD 5000 and 7000 use 8-core CCDs; in the case of the 7900X it's 2 x 8-core CCDs with 2 cores disabled on each, giving you a total of 12 cores. The 7700X is a single 8-core CCD and the 7950X is 2 x 8-core CCDs with all cores enabled.

Is it an issue where the 1% or 0.1% lows could drop severely in a CPU-thread-intensive game if a thread has to communicate with a thread on the other CCD?
There is a penalty for communicating across CCDs, but the Windows scheduler is aware of this and will attempt to fill up the first CCD before moving to the second. On the previous AMD chips the difference between a 5900X and 5800X has been minimal. It does seem there is an issue with Ryzen 7000, though; however, I would expect Windows optimisations for this. It certainly wouldn't put me off buying a Ryzen 9; they are still the better CPUs. Some games are actually smoother on the Ryzen 9s as well.
 

Wolverine2349

AMD 5000 and 7000 use 8-core CCDs; in the case of the 7900X it's 2 x 8-core CCDs with 2 cores disabled on each, giving you a total of 12 cores. The 7700X is a single 8-core CCD and the 7950X is 2 x 8-core CCDs with all cores enabled.


There is a penalty for communicating across CCDs, but the Windows scheduler is aware of this and will attempt to fill up the first CCD before moving to the second. On the previous AMD chips the difference between a 5900X and 5800X has been minimal. It does seem there is an issue with Ryzen 7000, though; however, I would expect Windows optimisations for this. It certainly wouldn't put me off buying a Ryzen 9; they are still the better CPUs. Some games are actually smoother on the Ryzen 9s as well.


Why would some games be smoother on a Ryzen 9 if there are 2 CCDs and games do not take advantage of more than 8 cores? Especially in the case of the 7900X, which only has 6 cores per CCD. At least the 7950X has an 8-core CCD you can lock the game threads to, whereas the 7700X only has one 8-core CCD anyway.

Are some games programmed well enough that the cross-CCD latency penalty doesn't matter? I thought game threads always needed rapid communication with each other, whereas for productivity apps that parallelize to any number of threads, cross-communication latency did not matter?
 
Why would some games be smoother on a Ryzen 9 if there are 2 CCDs and games do not take advantage of more than 8 cores? Especially in the case of the 7900X, which only has 6 cores per CCD. At least the 7950X has an 8-core CCD you can lock the game threads to, whereas the 7700X only has one 8-core CCD anyway.

Are some games programmed well enough that the cross-CCD latency penalty doesn't matter? I thought game threads always needed rapid communication with each other, whereas for productivity apps that parallelize to any number of threads, cross-communication latency did not matter?
Games that still have any cross-CCD traffic are extremely rare; it only happens when there is "a bug", so to say.
Some games run smoother on Ryzen because of the much bigger cache, which helps games that are made for consoles since it simulates the merged RAM pool much better.
13th-gen Intel is supposed to have much better minimums; they also increased the cache.

Also, game threads do not need rapid communication with each other; each thread does its own job and sends the result to wherever it needs to go (GPU, sound card, and so on), and they do not check on each other.
The most you can hope for is for the main thread to wait for everything to be ready before continuing to the next loop.
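That pattern, where workers do independent jobs and the main thread only synchronizes once per loop, can be sketched roughly like this. Plain Python threads stand in for engine worker threads, and the job names are made up for illustration, not taken from any real engine:

```python
import threading

# Hypothetical, independent per-frame jobs. Each writes only its own result
# and never looks at what the other jobs are doing.
def simulate_physics(results):
    results["physics"] = "positions updated"

def mix_audio(results):
    results["audio"] = "buffer submitted"

def build_draw_calls(results):
    results["render"] = "command list ready"

def run_frame():
    """One loop iteration: spawn the independent workers, then wait for all."""
    results = {}
    jobs = [simulate_physics, mix_audio, build_draw_calls]
    threads = [threading.Thread(target=job, args=(results,)) for job in jobs]
    for t in threads:
        t.start()
    # The main thread's only synchronization point: wait for every worker
    # to finish before the next frame can begin.
    for t in threads:
        t.join()
    return results
```

The point is that the threads never talk to each other mid-frame; the only communication is the join at the end, which is why cross-CCD placement rarely matters for this layout.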
 
Are some games programmed well enough that the cross-CCD latency penalty doesn't matter? I thought game threads always needed rapid communication with each other, whereas for productivity apps that parallelize to any number of threads, cross-communication latency did not matter?
It depends on what the data dependencies are. If a thread references data that's in another CCX's cache, or it writes to a shared piece of data that has to propagate to other caches (known as cache coherency), this incurs more latency. Since games are typically serialized in overall execution (though they may have components that can run in parallel), it stands to reason that there's a lot of data dependency throughout the course of a game's life.

This is likely why the 5800X3D performs so well: it has a lot of cache space to hold all of that data.

You also can't program games to avoid latency penalties; cache is transparent to software. You might be able to set the process affinity, but the problem is that in AMD processors there's actually a preference for which core and CCX to run threads on. The OS scheduler is supposed to be able to take advantage of this, but the earlier bug with AMD and Windows 11 was that the CPU wasn't reporting this correctly to the OS, so the scheduler didn't know how to schedule threads for optimal performance.
 
You also can't program games to avoid latency penalties; cache is transparent to software. You might be able to set the process affinity, but the problem is that in AMD processors there's actually a preference for which core and CCX to run threads on. The OS scheduler is supposed to be able to take advantage of this, but the earlier bug with AMD and Windows 11 was that the CPU wasn't reporting this correctly to the OS, so the scheduler didn't know how to schedule threads for optimal performance.
https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-setthreadaffinitymask

It's something the devs set while coding, but because they code the games for consoles, they only use the mask for the console CPU.
The OS can't do anything about that; it sends the threads where the game tells it to.
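As a minimal sketch of what such a mask looks like: the helper below builds the bitmask for the first CCD, under the assumption that logical CPUs 0-7 map to CCD 0 and ignoring SMT sibling numbering, which differs between systems. The function names are made up for illustration; the bitmask form is what the Win32 `SetThreadAffinityMask` call linked above takes, while Linux's `os.sched_setaffinity` takes a set of CPU ids instead:

```python
CORES_PER_CCD = 8  # assumption: an 8-core CCD, as on a 5950X/7950X

def first_ccd_mask(cores_per_ccd=CORES_PER_CCD):
    """Bitmask with one bit set per core of the first CCD.
    This is the kind of value SetThreadAffinityMask expects on Windows."""
    return (1 << cores_per_ccd) - 1

def mask_to_cpu_ids(mask):
    """The same mask expressed as a set of CPU ids, the form that
    os.sched_setaffinity takes on Linux."""
    return {i for i in range(mask.bit_length()) if (mask >> i) & 1}

mask = first_ccd_mask()       # 0xFF: one bit each for cores 0-7
cpus = mask_to_cpu_ids(mask)  # {0, 1, 2, 3, 4, 5, 6, 7}
```

A hard-coded mask like this is exactly the console-centric habit described above: it is only correct if the target machine's CPU numbering matches the assumption baked into the code.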
 
That's if you want to set the affinity. But you normally don't have to and really shouldn't have to do this.
Devs are doing it.
As a user you can change the affinity on threads that are already running, and most threads only set the affinity once, so it does stick. But if you look at the link, it is for coding: it sets the affinity mask inside the code so that the threads start on the specified cores and stay there.
The OS is only responsible when there is no affinity mask in the code; in that case the OS has free rein to put any thread anywhere it chooses.

So as far as your previous post goes, yes, they can code a game in a way that it will stay on a single CCD and avoid latency penalties.
 
Devs are doing it.
Do you have proof?

As a user you can change the affinity on threads that are already running, and most threads only set the affinity once, so it does stick. But if you look at the link, it is for coding: it sets the affinity mask inside the code so that the threads start on the specified cores and stay there.
The OS is only responsible when there is no affinity mask in the code; in that case the OS has free rein to put any thread anywhere it chooses.

So as far as your previous post goes, yes, they can code a game in a way that it will stay on a single CCD and avoid latency penalties.
The problem with doing this manually is you also have to know which CPU you're on. Doing this incorrectly will result in worse performance. For instance, Ryzen CPUs have preferred cores that should have work scheduled on them for optimal performance. You can't set a static mask because preferred cores are different for each CPU. And you certainly shouldn't be doing this on Alder Lake because you might throw something on an E-core when it should be on a P-core.

It's better to just let the OS do the work for you. It knows what the CPU is, it knows what the preferred cores are (assuming that mechanism is working correctly), why do the extra work?
 
Do you have proof?
Have you heard of Far Cry 4?
It starts running on the 3rd core no matter how many cores you have.
The community had to come up with an injector that forces the game to start on the first core for it to run on anything with fewer than 3 cores.
Also:
Games get 6 cores on the PS4; the first two are left for the OS, and that's the reason for Far Cry 4 starting on the 3rd core.
They sort it out in a way that all cores are as full as possible without reaching 100%, with the main thread running alone and as fast as possible, which is why single-core speed is still important no matter how many cores a game uses.

http://advances.realtimerendering.com/destiny/gdc_2015/Tatarchuk_GDC_2015__Destiny_Renderer_web.pdf

The problem with doing this manually is you also have to know which CPU you're on. Doing this incorrectly will result in worse performance. For instance, Ryzen CPUs have preferred cores that should have work scheduled on them for optimal performance. You can't set a static mask because preferred cores are different for each CPU. And you certainly shouldn't be doing this on Alder Lake because you might throw something on an E-core when it should be on a P-core.

It's better to just let the OS do the work for you. It knows what the CPU is, it knows what the preferred cores are (assuming that mechanism is working correctly), why do the extra work?
That's why I always say that all games are made for consoles.
If you are lucky and your CPU is close enough to the console layout, your games will run fine; otherwise, good luck to you.
 
Have you heard of Far Cry 4?
It starts running on the 3rd core no matter how many cores you have.
The community had to come up with an injector that forces the game to start on the first core for it to run on anything with fewer than 3 cores.
Also:
Games get 6 cores on the PS4; the first two are left for the OS, and that's the reason for Far Cry 4 starting on the 3rd core.
They sort it out in a way that all cores are as full as possible without reaching 100%, with the main thread running alone and as fast as possible, which is why single-core speed is still important no matter how many cores a game uses.

http://advances.realtimerendering.com/destiny/gdc_2015/Tatarchuk_GDC_2015__Destiny_Renderer_web.pdf


That's why I always say that all games are made for consoles.
If you are lucky and your CPU is close enough to the console layout, your games will run fine; otherwise, good luck to you.
This sounds like lazy programming more than programming for performance.

Given you mentioned Ubisoft, no wonder.
 

Wolverine2349

Games that still have any cross-CCD traffic are extremely rare; it only happens when there is "a bug", so to say.
Some games run smoother on Ryzen because of the much bigger cache, which helps games that are made for consoles since it simulates the merged RAM pool much better.
13th-gen Intel is supposed to have much better minimums; they also increased the cache.

Also, game threads do not need rapid communication with each other; each thread does its own job and sends the result to wherever it needs to go (GPU, sound card, and so on), and they do not check on each other.
The most you can hope for is for the main thread to wait for everything to be ready before continuing to the next loop.

https://forums.tomshardware.com/thr...chmarks-and-more.3766309/page-4#post-22745932

In this thread you were saying the cross-CCD latency was extremely bad and super ouch, and I had thought well, maybe game threads did not need rapid communication with each other and could utilize more good cores even on a separate CCX? But you were saying it does not work like that unless the game code was really small, and that made me think they did need rapid communication with each other?

Your comment made me think about that, and I researched it and it did concern me. Maybe it is not much of a concern, or maybe it is. In a perfect world, a 12- to 16-P-core CPU with all cores on a single CCD/ring would be ideal. Then even with HT/SMT off it would blaze so fast.

I am thinking about adopting the AM5 platform and going with more than 8 cores just for very CPU-intensive games that could possibly use more. Plus the AM5 platform has upgrade paths to Zen 4 X3D and then probably Zen 5. But the dual-CCD thing from your comment a few months ago in that thread does concern me. Though there are no other choices. Well, we have Intel and those e-cores, but I am not at all crazy about going with a hybrid arch. As for asking why, that is just how I feel.
 

Wolverine2349

Devs are doing it.
As a user you can change the affinity on threads that are already running, and most threads only set the affinity once, so it does stick. But if you look at the link, it is for coding: it sets the affinity mask inside the code so that the threads start on the specified cores and stay there.
The OS is only responsible when there is no affinity mask in the code; in that case the OS has free rein to put any thread anywhere it chooses.

So as far as your previous post goes, yes, they can code a game in a way that it will stay on a single CCD and avoid latency penalties.


So they can code a game in a way that it will stay on a single CCD. If it does have to take advantage of an extra core or more on the other CCD because the cores are saturated, is it a bad hit, or can game devs code it to do so properly?
 

Wolverine2349

It does seem from reading that there is a bad issue with WIN11 22H2.

What about Windows 10 21H2? Any issues there?

I assume the OS tries to ensure it will not happen if it is working right. But if a thread has to use the other CCD, does the OS know to schedule it in a way that avoids the latency hit, or does it schedule it correctly so it goes on the other CCD to begin with and there is no latency hit? Or is it only a hit if the game threads swap between the CCDs? Meaning, could a game use more than 8 cores and have to put threads on 2 CCDs, but do that ahead of time when the game is launched, so that there is no swapping of game threads between CCDs during gameplay and thus no penalty? Or is there still a penalty from cross-CCD communication between the threads that matters?
 
I had thought well, maybe game threads did not need rapid communication with each other and could utilize more good cores even on a separate CCX? But you were saying it does not work like that unless the game code was really small, and that made me think they did need rapid communication with each other?
Again, it depends on what the threads are doing and what data they need. If the data they need is completely independent for each thread, then there isn't much of an issue with related threads being on different CCDs. But if they do a lot of sharing between each other, then it can be a problem depending on how often they do it. There isn't really a black-and-white answer.
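That distinction can be sketched with a toy example. These are plain Python threads, and the real cache-coherency cost is a hardware effect this doesn't measure; the point is the two communication patterns, and all the names here are made up for illustration. Both functions compute the same sum, but one communicates constantly and the other only once:

```python
import threading

N_THREADS, N_ITEMS = 4, 1000

def shared_design():
    """Threads keep updating one shared total under a lock: every single
    update is cross-thread communication (the analogue of the cross-CCD
    traffic discussed above)."""
    total = [0]
    lock = threading.Lock()
    def work(k):
        for i in range(k, N_ITEMS, N_THREADS):
            with lock:  # a synchronization point on every update
                total[0] += i
    threads = [threading.Thread(target=work, args=(k,)) for k in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

def independent_design():
    """Each thread fills only its own slot; results are merged once at the
    end, so threads never touch each other's data mid-run."""
    partial = [0] * N_THREADS
    def work(k):
        partial[k] = sum(range(k, N_ITEMS, N_THREADS))
    threads = [threading.Thread(target=work, args=(k,)) for k in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)
```

With the first layout, splitting the threads across CCDs means the shared data keeps bouncing between the two caches; with the second, placement barely matters because the threads only meet at the final merge.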

So they can code a game in a way that it will stay on a single CCD. If it does have to take advantage of an extra core or more on the other CCD because the cores are saturated, is it a bad hit, or can game devs code it to do so properly?
The thing to note here for application development, especially with games, is that they have to target a lower end system and make sure it performs well for that. After all, if you have a game that only runs well on the latest and greatest hardware, you basically limit who your audience is.

Now you can have some room for growth to let your application perform better on better hardware, but there's only so much that actually scales because doing anything more would cause the lower-end hardware to perform worse.

So say for instance someone targets 30 FPS on a quad core. Sure it might have enough work for an 8-core to achieve much higher performance, but it's not going to get better after that because scaling higher (as a requirement anyway, not for optional cosmetic extras) would cause the quad core to perform worse.

It does seem from reading that there is a bad issue with WIN11 22H2.

What about Windows 10 21H2? Any issues there?

I assume the OS tries to ensure it will not happen if it is working right. But if a thread has to use the other CCD, does the OS know to schedule it in a way that avoids the latency hit, or does it schedule it correctly so it goes on the other CCD to begin with and there is no latency hit? Or is it only a hit if the game threads swap between the CCDs? Meaning, could a game use more than 8 cores and have to put threads on 2 CCDs, but do that ahead of time when the game is launched, so that there is no swapping of game threads between CCDs during gameplay and thus no penalty? Or is there still a penalty from cross-CCD communication between the threads that matters?
As far as I know, Windows 10 isn't affected because AMD worked with Microsoft back in the 1903 release for Windows to take advantage of the "preferred core" system Ryzen processors implement.

Windows 11 has a different or updated scheduler. The standout hardware this was supposed to support was Intel's Thread Director. But something may have happened along the way that made Windows 11's scheduler somehow forget about AMD's preferred core system. But who really knows what happened?
 

Wolverine2349

As far as I know, Windows 10 isn't affected because AMD worked with Microsoft back in the 1903 release for Windows to take advantage of the "preferred core" system Ryzen processors implement.

Windows 11 has a different or updated scheduler. The standout hardware this was supposed to support was Intel's Thread Director. But something may have happened along the way that made Windows 11's scheduler somehow forget about AMD's preferred core system. But who really knows what happened?

Well, the AMD preferred-core system: does that still apply if you are doing manual overclocking? I imagine with dual CCDs it still does, but not with a single CCD like the 7700X/7600X?


Again, it depends on what the threads are doing and what data they need. If the data they need is completely independent for each thread, then there isn't much of an issue with related threads being on different CCDs. But if they do a lot of sharing between each other, then it can be a problem depending on how often they do it. There isn't really a black-and-white answer.

In general, do threads for most modern games produced from 2012 to 2022 (the last 10 years) need lots of data sharing between each other, or do related game threads have data completely independent of each thread?
 
https://forums.tomshardware.com/thr...chmarks-and-more.3766309/page-4#post-22745932

In this thread you were saying the cross-CCD latency was extremely bad and super ouch, and I had thought well, maybe game threads did not need rapid communication with each other and could utilize more good cores even on a separate CCX? But you were saying it does not work like that unless the game code was really small, and that made me think they did need rapid communication with each other?

Your comment made me think about that, and I researched it and it did concern me. Maybe it is not much of a concern, or maybe it is. In a perfect world, a 12- to 16-P-core CPU with all cores on a single CCD/ring would be ideal. Then even with HT/SMT off it would blaze so fast.

I am thinking about adopting the AM5 platform and going with more than 8 cores just for very CPU-intensive games that could possibly use more. Plus the AM5 platform has upgrade paths to Zen 4 X3D and then probably Zen 5. But the dual-CCD thing from your comment a few months ago in that thread does concern me. Though there are no other choices. Well, we have Intel and those e-cores, but I am not at all crazy about going with a hybrid arch. As for asking why, that is just how I feel.
The latency is bad, but it doesn't really matter, because any time it happens either MS or the game company brings out a patch that keeps the game on one CCD.
With the first gen of Ryzen this was happening pretty often; lately either it doesn't happen anymore or there is no coverage of it anymore.
https://www.guru3d.com/news-story/r...with-new-rise-of-the-tomb-raider-patch,2.html

And the same can be said about the e-cores: if there is a game that runs really badly because of the e-cores, you can be pretty sure that it will be fixed, at least for new and popular games that still make them money.
 

Wolverine2349

The latency is bad, but it doesn't really matter, because any time it happens either MS or the game company brings out a patch that keeps the game on one CCD.
With the first gen of Ryzen this was happening pretty often; lately either it doesn't happen anymore or there is no coverage of it anymore.
https://www.guru3d.com/news-story/r...with-new-rise-of-the-tomb-raider-patch,2.html

And the same can be said about the e-cores: if there is a game that runs really badly because of the e-cores, you can be pretty sure that it will be fixed, at least for new and popular games that still make them money.


So basically, having game threads on 2 CCDs is bad, just as having game threads on e-cores is bad. But patches from Microsoft, AMD, Intel, or the game company fix it so the game will stay only on the cores of one CCD and never touch the other, or, in Intel's case, stay on only the P-cores and avoid touching the e-cores entirely.

But maybe having the extra CCD with extra cores, and even the e-cores, can benefit not the game itself but slightly smoother gameplay, as they pick up background tasks, whereas an 8-core single-CCD chip, or Intel with the e-cores disabled, does not have that extra slack. Though the difference is probably almost nil unless you have lots of heavy background stuff running that spikes CPU usage beyond 5% on one core, or a CPU-intensive foreground task you just minimize, like streaming or video editing.

Otherwise, for gaming only, with a minimal Windows 10 install with only HWiNFO64, MSI Afterburner and NOD32 basic AV running in the background and Windows Updates disabled, it makes more sense just to go with 8 cores / 16 threads, as those are light background tasks anyway, Windows Update cannot start to interfere, and the AV will not update since it has a game mode.
 