News New Ryzen 9000X3D CPU could deliver EPYC levels of game-boosting L3 cache — rumored chip reportedly sports 16 Zen 5 cores, 192MB L3 cache, 200W TDP

The V-cache chiplets may not be nearly as expensive as the price they sell them at. I've heard a $20 figure back in Zen 3 days, and with experience and industrialization that price most likely hasn't increased. That's quite a bit less than the 100% uplift they charged on those V-cache EPYCs.

What has changed is the speed loss from the V-cache chiplet sitting on top: there is now very little in terms of clock constraints from adding V-cache. I'm pretty sure the 200W TDP rating isn't really due to V-cache power consumption, but about creating a monster chip for a premium price, one that leaves each and every potential competitor in the dust... until Zen 6 finally arrives.

It's again taking a page from Intel's playbook, filling the gap between generations with a faster refresh to keep speed addicts buying.

Will this have a significant impact for gamers? Not bloody likely [excuse my French]: CCD-to-CCD overhead won't improve, and you'd really need games to be topology aware and to need more CPU resources in the first place.

Since games are still primarily designed to respect console constraints and any heavy lifting is outsourced to GPUs, and perhaps NPUs sooner or later, there simply isn't any demand. In fact, not even X3D is a true requirement for anyone happy with ~100 FPS, especially when running at 4K: all of that is mostly GPU bound.

As to scientific workloads, genetics, EDA etc., in other words the workloads that spawned V-cache on EPYCs: sure, they'll gain, potentially a lot. But people who run those will most likely have the budgets for truly EPYC servers, which scale much further.

So this chip is mostly for people who are happy to cruise downtown with a 1000HP muscle car, and compared to that, it's a lot cheaper, and just as necessary.

If I hadn't sworn I'd wait for Zen 6 for my next upgrade, and if the price were reasonable (that's a hard if), I might actually be tempted. Because knowing you got the best also relieves all upgrade anxiety.

I appreciate that AMD is finally giving consumers the full range of choices, especially since it costs them basically nothing. It's great to have a full menu, even if you don't tend to go for the caviar.
 
Oh dear god, I would be so moist if this happens.

I have a 5950X and am starting a build around a 9950X3D, so this would be the perfect CPU for that generational leap. Since I've stagnated on core count for a number of years, the biggest boost I can expect is increased per-thread performance.
If all you do is gaming, you might consider going back to a 5800X3D or just sticking with what you've got.

If you have a second screen, just pop up a couple of monitoring tools like HWiNFO while you game and see what happens on the CPU side: 12 out of 16 cores might just get bored to death, while not even a single one may hit top clocks.
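
If you'd rather script that than stare at HWiNFO, a rough Python sketch with psutil does the same per-core headcount (the 25% "busy" threshold is just something I picked, and per-core clocks aren't reliably available this way on Windows anyway):

```python
# Rough sketch (assumes the psutil package): log per-core load once a second while gaming.
# psutil can't reliably report per-core clocks on Windows, so for clocks HWiNFO is still the tool.
import psutil

while True:
    loads = psutil.cpu_percent(interval=1.0, percpu=True)  # % utilization per logical CPU
    busy = sum(1 for load in loads if load > 25)            # arbitrary "doing something" threshold
    print(f"busy logical CPUs: {busy}/{len(loads)}, max core load: {max(loads):.0f}%")
```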

The role of CPUs in gaming is greatly exaggerated, mostly via synthetic benchmarks at low resolutions.

So if you game at 4K, even with a 4090 or 5090, you're mostly GPU bound.

Of course there are some stupid games, FS2020 and FS2024 being among the worst in my library, where only the speed of the fastest single core really seems to matter. But in VR mode even my 7950X3D in combination with a 4090 isn't able to keep the world from stuttering: the one thing that never happens in real life, where wings may flutter but the crash to death is fluid motion.

If you find any way to compare or are able to take advantage of an e-tailer return window, I can only recommend that you use that opportunity.

I swapped a 5800X for a 5800X3D and then a 5950X. The 5800X was really about as power hungry as the 5950X turned out to be, because it was a far worse bin.

The 5800X3D delivered very similar speed, but I just had to try V-cache. It was a much better bin, too, using significantly less power than the original 5800X. It didn't hurt anything I was doing, but I can't say it was more than just a safe bet for gaming at the time: mostly I was assured that I had gotten the best there was, with tuning or overclocking neither possible nor required.

I then upgraded to the 16-core, because my workstations are primarily for work and it had become cheap enough. My kids were happy with the 5800X3D, and I basically upgraded three out of four machines to that level once it had become affordable.

They aren't complaining yet, and they're much pickier than I am.

I went with the 7950X3D for the other workstation, because it was a safe bet: it would deliver the best gaming performance possible, yet get extremely close to optimal 16-core workstation performance as well. But the only walls I've ever hit seemed to be the GPU and stupid software. It's not running rings around the 5950X in general, but since that one is only running a 5070, they aren't exactly comparable. For my normal simulation work the difference is barely noticeable; 8 extra cores help a lot more than the extra clocks.

On the CPU side, speed records are still being broken, but unless you have a CPU-specific bottleneck (which is becoming rare in anything currently hot), the returns on investment are becoming very small indeed.
 
With the 10000 series around the corner, I'm kinda wondering what's the point!

Don't get me wrong, even I wanted dual stacked cache on the 9950X3D, as I've held back on the 9800X3D and now the 9950X3D by sticking with my 7800X3D for this very reason.

I wonder though if X3D is starting to overhype itself into oblivion.

With GPUs getting better and better, the days of 1080p gaming are winding down, and obviously once you start going up in resolution, cache becomes less and less of a factor.

There are obvious gains from more cache at every resolution, but they lessen.

Then the other question: what are the diminishing returns on cache for games anyway?

If you double the cache or even triple it, is there a point where the benefit of all that extra cache does little?
Is it right around the corner? It could be another 10-12 months. Although if this thing is real, it also may not be out for some months.

GPUs getting better and better actually means the CPU becomes more important, since it can eventually shift the bottleneck back to the CPU. So the 4K 240 Hz monitors that are starting to come out can have their potential realized. Although if I'm not mistaken, frame generation is taking more work away from the CPU.

The effect of cache is dependent on the game. It should continuously benefit some games until a steep drop off at some point where more doesn't help. So when you average all games together, tripling the cache seems to improve things by 15% on average, but it's more like many games have 0% or small improvements, and some games have gigantic improvements of 50% or more.
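
To make that concrete with made-up numbers: if seven out of ten games gain nothing and three gain 50%, the simple average still reads as 15%.

```python
# Illustrative numbers only (not benchmarks): seven games gain nothing, three gain 50%.
gains = [0.00] * 7 + [0.50] * 3

arithmetic_mean = sum(gains) / len(gains)

geometric_mean = 1.0
for g in gains:
    geometric_mean *= 1.0 + g
geometric_mean = geometric_mean ** (1.0 / len(gains)) - 1.0

print(f"arithmetic mean gain: {arithmetic_mean:.0%}")  # 15%
print(f"geometric mean gain: {geometric_mean:.1%}")    # ~12.9%
```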

Factorio is a classic example of a game that loves big L3 cache. It will probably benefit from 144 MiB (Zen 6 single CCD).

Game developers are living in the post-5800X3D world, so they could move to exploit big L3 more when it's available, or engines could become more scalable to take advantage of large caches. And hopefully, Intel will have its own version with Nova Lake. Ultimately, the sky is the limit. If you had 1 GiB of L3, that's 1 GiB of data that doesn't have to be fetched from slower L4 or DRAM.
 
GPUs getting better and better actually means the CPU becomes more important, since it can eventually shift the bottleneck back to the CPU. So the 4K 240 Hz monitors that are starting to come out can have their potential realized. Although if I'm not mistaken, frame generation is taking more work away from the CPU.
The question is where 240 Hz provides its main benefits and whether that's worth likely twice the energy expense (apart from the silicon cost). To my knowledge it would be mostly competitive shooters, not anything more eye-candy oriented or heavy on details. The trading of Hz vs. resolution found on modern displays makes both economic and physiological sense; paying for everything, always, isn't a universal mindset.
The effect of cache is dependent on the game. It should continuously benefit some games until a steep drop off at some point where more doesn't help. So when you average all games together, tripling the cache seems to improve things by 15% on average, but it's more like many games have 0% or small improvements, and some games have gigantic improvements of 50% or more.

Factorio is a classic example of a game that loves big L3 cache. It will probably benefit from 144 MiB (Zen 6 single CCD).
Factorio and perhaps other simulation games are probably closest to V-cache's original purpose, like EDA tools. But those would also be the type of games where I really don't see much of a benefit beyond 60-120 Hz, since they are more about details and thinking than about competitive human response times.
Game developers are living in the post-5800X3D world, so they could move to exploit big L3 more when it's available, or engines could become more scalable to take advantage of large caches. And hopefully, Intel will have its own version with Nova Lake. Ultimately, the sky is the limit. If you had 1 GiB of L3, that's 1 GiB of data that doesn't have to be fetched from slower L4 or DRAM.
Well, at work, game developers certainly appreciate 192 cores or more, because Unreal requires extraordinary time for initial compiles but manages to spread the load over all available cores.

In terms of target market, they have to follow the client population, and I don't know if the Steam survey accounts for V-cache, but I'd hazard the numbers are small. And currently GPU-related accelerations, DLSS and the like, offer far greater gains and already far too many headaches for game developers to also invest in the CPU side of things: what the game engine won't do on its own, game developers will only do with big bribes from CPU vendors. At least that's what I take from YouTube interviews on the subject.

When it comes to game engines automatically scaling to CPU hardware, IMHO that's mostly wishful thinking, certainly for any of the older engines. I have no idea if even Unreal 5 invests in V-cache-specific optimizations, because there is no money in it and it's very hard to do generically. The main attraction of AMD's X3D is its transparent gains in games; if it required special coding branches, it would see about as wide an adoption as 3DNow! (an old AMD ISA extension from Socket 7 days).
 
When it comes to game engines automatically scaling to CPU hardware, IMHO that's mostly wishful thinking, certainly for any of the older engines.
Older engines don't really need such optimizations.

Engines in general target a wide range of PC hardware, and so they necessarily have to scale to some extent. Although in practice they will be aiming to support rather low-end hardware for broad sales potential, and particularly whatever the latest console is, with more not helping much other than driving up frame rates from an accepted baseline.
The main attraction of AMD's X3D is its transparent gains in games; if it required special coding branches, it would see about as wide an adoption as 3DNow! (an old AMD ISA extension from Socket 7 days).
It already helps without any work by developers (or hurts negligibly from lower clock speeds, higher L3 latency) and that won't change. But if it were possible to make a few simple tweaks to utilize a big L3 cache if present, why not? This would be more likely for an Unreal Engine 6 and other broadly used engines.

We've seen rumors of Intel's Nova Lake introducing bLLC (big last level cache) on a single compute tile. But now we have a new rumor:
AMD Ryzen Dual-X3D and Intel Nova Lake Dual-BLLC leaks surface almost simultaneously

We're up to three rumored X3D-like SKUs from Intel, with one of them featuring two compute tiles equipped with extra cache, like this rumored AMD product.

Intel jumping in will provide more legitimacy to 3D cache, and competition will drive prices down (9800X3D's $480 MSRP is very high). Overall, many millions of office PCs will not be shipping with 3D cache CPUs, but it could become increasingly common among DIY gamers, where it is one of AMD's most popular offerings. In fact, it might become mandatory to add cache via 3D stacking some time in the future, since cache does not scale well and dominates the die area of CCDs made on expensive nodes.
 
If all you do is gaming, you might consider going back to a 5800X3D or just sticking with what you've got.

If you have a second screen, just pop up a couple of monitoring tools like HWiNFO while you game and see what happens on the CPU side: 12 out of 16 cores might just get bored to death, while not even a single one may hit top clocks.

The role of CPUs in gaming is greatly exaggerated, mostly via synthetic benchmarks at low resolutions.

So if you game at 4K, even with a 4090 or 5090, you're mostly GPU bound.

Of course there are some stupid games, FS2020 and FS2024 being among the worst in my library, where only the speed of the fastest single core really seems to matter. But in VR mode even my 7950X3D in combination with a 4090 isn't able to keep the world from stuttering: the one thing that never happens in real life, where wings may flutter but the crash to death is fluid motion.

If you find any way to compare or are able to take advantage of an e-tailer return window, I can only recommend that you use that opportunity.

I swapped a 5800X for a 5800X3D and then a 5950X. The 5800X was really about as power hungry as the 5950X turned out to be, because it was a far worse bin.

The 5800X3D delivered very similar speed, but I just had to try V-cache. It was a much better bin, too, using significantly less power than the original 5800X. It didn't hurt anything I was doing, but I can't say it was more than just a safe bet for gaming at the time: mostly I was assured that I had gotten the best there was, with tuning or overclocking neither possible nor required.

I then upgraded to the 16-core, because my workstations are primarily for work and it had become cheap enough. My kids were happy with the 5800X3D, and I basically upgraded three out of four machines to that level once it had become affordable.

They aren't complaining yet, and they're much pickier than I am.

I went with the 7950X3D for the other workstation, because it was a safe bet: it would deliver the best gaming performance possible, yet get extremely close to optimal 16-core workstation performance as well. But the only walls I've ever hit seemed to be the GPU and stupid software. It's not running rings around the 5950X in general, but since that one is only running a 5070, they aren't exactly comparable. For my normal simulation work the difference is barely noticeable; 8 extra cores help a lot more than the extra clocks.

On the CPU side, speed records are still being broken, but unless you have a CPU-specific bottleneck (which is becoming rare in anything currently hot), the returns on investment are becoming very small indeed.
I do more than gaming... thus I would never regress to a 5800X3D.

AM4 is also an outgoing platform: some of us need or want more than just PCIe 4.0 for storage and GPU compute, and more than just USB 3.2 Gen 2x1. It's not just about the CPU.

The 5800X3D is not the be-all, end-all.
 
That and a good supply of the cache chiplets. Do you know what node the latest ones used with Zen 5 CPUs are on?


This is a good part for workstation-lite people who have workloads that benefit from 3D cache, or people who want 16 cores, 3D cache, and zero scheduling issues.

If there is a game that could actually show a significant benefit from two CCDs having 3D cache, that would be wild. I doubt it exists, but if you programmed specifically for this processor, it wouldn't need to be heavily multi-threaded. You would just use at least one core on each CCD, with those cores using up as much cache as they can.
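
Something like this untested sketch is what I have in mind, assuming the usual mapping where logical CPUs 0-15 sit on CCD0 and 16-31 on CCD1 (that mapping is an assumption, check your own chip):

```python
# Untested sketch (Windows only): pin one worker thread to each CCD so both L3 pools get used.
# The CCD-to-logical-CPU mapping is an assumption for a 16-core part with SMT enabled.
import ctypes
import threading

kernel32 = ctypes.windll.kernel32
kernel32.GetCurrentThread.restype = ctypes.c_void_p
kernel32.SetThreadAffinityMask.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
kernel32.SetThreadAffinityMask.restype = ctypes.c_size_t

CCD0_MASK = 0x0000FFFF   # logical CPUs 0-15  (assumed CCD0)
CCD1_MASK = 0xFFFF0000   # logical CPUs 16-31 (assumed CCD1)

def worker(affinity_mask, name):
    # Pin the calling thread; GetCurrentThread() returns a pseudo-handle for it.
    kernel32.SetThreadAffinityMask(kernel32.GetCurrentThread(), affinity_mask)
    # ... cache-heavy work for this CCD would go here ...
    print(f"{name} pinned to mask {affinity_mask:#010x}")

for mask, name in ((CCD0_MASK, "ccd0-worker"), (CCD1_MASK, "ccd1-worker")):
    threading.Thread(target=worker, args=(mask, name)).start()
```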

Maybe the scheduler can make that happen, i.e. if at least one core on each CCD can boost really high at the same time, because the thermal load is spread out, then they will naturally use both pools of cache.

Having a larger cache pool (Zen 6 X3D should have 12 cores with 144 MiB on a CCD instead of 96 MiB) is going to be more broadly beneficial. But if some developer really wanted to show off, they might be able to force dual cache to be leveraged.

This is a rumor though, so don't get too excited.
You will still get scheduling issues with this. Gaming workloads hate latency, and cross-CCD latency is big enough to cause stutter.

This will only boost performance for productivity workloads, or productivity-like gaming workloads such as large-scale simulation games.

So you still need to park CCD1 for most games.
 
You will still get scheduling issues with this. Gaming workloads hate latency, and cross-CCD latency is big enough to cause stutter.

This will only boost performance for productivity workloads, or productivity-like gaming workloads such as large-scale simulation games.

So you still need to park CCD1 for most games.
False. The "which CCD should we use" problem was solved long ago. It's not such a hard problem to solve, because NUMA is a thing and has been for decades. Windows has just been lagging behind because it's AMD tech. The problem is that NUMA, IIRC, does not include cache topology and hasn't been updated/extended for it.

I know, because I went from the 5900X to the 9950X3D and the 5900X never had issues, even in VR. The Achilles heel of the 9950X3D is VR games, and it's a pain. Symmetric CCDs would make the problem go away entirely.

Regards.
 
False. The "which CCD should we use" problem was solved long ago. It's not such a hard problem to solve, because NUMA is a thing and has been for decades. Windows has just been lagging behind because it's AMD tech. The problem is that NUMA, IIRC, does not include cache topology and hasn't been updated/extended for it.

I know, because I went from the 5900X to the 9950X3D and the 5900X never had issues, even in VR. The Achilles heel of the 9950X3D is VR games, and it's a pain. Symmetric CCDs would make the problem go away entirely.

Regards.
Wrong. Both the 5900X and 9950X3D are UMA.
They are not NUMA, as the memory controller is in the IOD, not the CCDs.

They just have CPU core complexes that incur a big latency penalty when cache lines bounce between them.

Only first-gen EPYC/Threadripper are actually NUMA, with the IMCs spread across multiple dies.

With identical CCDs you still need to limit your game to run on one CCD only, or you will have a bad time when threads are dispatched across CCDs.

BTW, Windows supports NUMA perfectly.
You will only utilize one NUMA node if your software is NUMA-unaware. That support has existed since NT4 at least. Windows can keep track of node-local memory and has APIs for applications to handle cross-node allocations.
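
For reference, a rough and untested ctypes sketch of the kind of call I mean; on a UMA desktop Ryzen it will simply report a single node 0:

```python
# Rough, untested ctypes sketch (Windows only) of the NUMA-aware APIs mentioned above.
import ctypes
from ctypes import wintypes

kernel32 = ctypes.windll.kernel32
kernel32.GetCurrentProcess.restype = ctypes.c_void_p
kernel32.VirtualAllocExNuma.restype = ctypes.c_void_p
kernel32.VirtualAllocExNuma.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t,
                                        wintypes.DWORD, wintypes.DWORD, wintypes.DWORD]

MEM_RESERVE, MEM_COMMIT, PAGE_READWRITE = 0x2000, 0x1000, 0x04

highest = wintypes.ULONG(0)
kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest))  # reports 0 on a UMA desktop Ryzen
print(f"highest NUMA node: {highest.value}")

# A NUMA-aware program can ask for memory backed by a specific node (node 0 here).
addr = kernel32.VirtualAllocExNuma(kernel32.GetCurrentProcess(), None, 64 * 1024,
                                   MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE, 0)
print(f"64 KiB committed on node 0 at {addr:#x}" if addr else "allocation failed")
```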
 
Wrong. Both the 5900X and 9950X3D are UMA.
They are not NUMA, as the memory controller is in the IOD, not the CCDs.

They just have CPU core complexes that incur a big latency penalty when cache lines bounce between them.

Only first-gen EPYC/Threadripper are actually NUMA, with the IMCs spread across multiple dies.

With identical CCDs you still need to limit your game to run on one CCD only, or you will have a bad time when threads are dispatched across CCDs.

BTW, Windows supports NUMA perfectly.
You will only utilize one NUMA node if your software is NUMA-unaware. That support has existed since NT4 at least. Windows can keep track of node-local memory and has APIs for applications to handle cross-node allocations.
They expose a unified memory interface, yes, but they're still NUMA in the sense you can group the logical nodes. But it's a fair thing to point out, I guess.

And Windows Server has had NUMA support for a while, but I don't recall the consumer-side kernel having NUMA support at all. Or so I recall reading some time ago when I was looking to upgrade to the 5900X. That's why I mentioned that.

If you know more, then please give more details on the specifics as they're escaping me and I haven't found any reliable ways to get the exact topology of the CPU in Win11 Pro.

Regards.
 
They expose a unified memory interface, yes, but they're still NUMA in the sense you can group the logical nodes. But it's a fair thing to point out, I guess.

And Windows Server has had NUMA support for a while, but I don't recall the consumer-side kernel having NUMA support at all. Or so I recall reading some time ago when I was looking to upgrade to the 5900X. That's why I mentioned that.

If you know more, then please give more details on the specifics as they're escaping me and I haven't found any reliable ways to get the exact topology of the CPU in Win11 Pro.

Regards.
Client Windows has supported NUMA for quite some time. What recently changed is the processor group limitation: with super-high-core-count UMA EPYCs being a thing now, they have to be exposed as NUMA with multiple processor groups, due to the per-group core count limit. And starting from Windows 11, NUMA-unaware software will run on all groups, i.e. all cores, by default; previously it was constrained to a single processor group.

This is not related to dual-CCD Ryzen, though, even if those are also UMA with variable core-to-core latency. They are far from reaching the processor group limits and appear as a single processor group.

Most productivity workloads don't care about core-to-core latency that much and will run across CCDs by default, starting from the high-performance preferred cores marked by the BIOS and chipset drivers.

Gaming workloads need to be limited to one CCD to avoid cross-CCD communication. Plus, parking CCD1 leaves more power budget for CCD0 to boost higher. Game Bar basically detects whitelisted games, overwrites the preferred cores on the fly to force Windows to dispatch threads to the CCD0 X3D cores first, and parks the CCD1 cores to prevent thread creation on them.

So for a dual-X3D chip, or the 9900X/9950X, the preferred cores don't need any on-the-fly modification anymore, but the core parking is still needed.

That's why AMD mandated the X3D chipset drivers for the 9900X and 9950X, even though they are not X3D chips.
 
Client Windows has supported NUMA for quite some time. What recently changed is the processor group limitation: with super-high-core-count UMA EPYCs being a thing now, they have to be exposed as NUMA with multiple processor groups, due to the per-group core count limit. And starting from Windows 11, NUMA-unaware software will run on all groups, i.e. all cores, by default; previously it was constrained to a single processor group.

This is not related to dual-CCD Ryzen, though, even if those are also UMA with variable core-to-core latency. They are far from reaching the processor group limits and appear as a single processor group.

Most productivity workloads don't care about core-to-core latency that much and will run across CCDs by default, starting from the high-performance preferred cores marked by the BIOS and chipset drivers.

Gaming workloads need to be limited to one CCD to avoid cross-CCD communication. Plus, parking CCD1 leaves more power budget for CCD0 to boost higher. Game Bar basically detects whitelisted games, overwrites the preferred cores on the fly to force Windows to dispatch threads to the CCD0 X3D cores first, and parks the CCD1 cores to prevent thread creation on them.

So for a dual-X3D chip, or the 9900X/9950X, the preferred cores don't need any on-the-fly modification anymore, but the core parking is still needed.

That's why AMD mandated the X3D chipset drivers for the 9900X and 9950X, even though they are not X3D chips.
So, what I read from your explanation: "consumer" Windows doesn't expose much in the way of config, other than the logical cores (0-31 in the case of the 9950X3D) and I have no way to query the OS to find out where the extra L3 cache is?
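
In case it helps, the closest thing I've found on paper (an untested ctypes sketch, so take it with salt) is kernel32's GetLogicalProcessorInformation, which is supposed to report each L3 together with the mask of logical CPUs sharing it:

```python
# Untested sketch (Windows only): list each L3 cache and the logical CPUs sharing it.
import ctypes
from ctypes import wintypes

class CACHE_DESCRIPTOR(ctypes.Structure):
    _fields_ = [("Level", ctypes.c_ubyte), ("Associativity", ctypes.c_ubyte),
                ("LineSize", ctypes.c_ushort), ("Size", wintypes.DWORD),
                ("Type", ctypes.c_int)]

class _INFO_UNION(ctypes.Union):
    _fields_ = [("Cache", CACHE_DESCRIPTOR), ("Reserved", ctypes.c_ulonglong * 2)]

class SYSTEM_LOGICAL_PROCESSOR_INFORMATION(ctypes.Structure):
    _fields_ = [("ProcessorMask", ctypes.c_size_t), ("Relationship", ctypes.c_int),
                ("u", _INFO_UNION)]

RELATION_CACHE = 2
kernel32 = ctypes.windll.kernel32

length = wintypes.DWORD(0)
kernel32.GetLogicalProcessorInformation(None, ctypes.byref(length))  # query required buffer size
count = length.value // ctypes.sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION)
buf = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION * count)()
kernel32.GetLogicalProcessorInformation(buf, ctypes.byref(length))

for info in buf:
    if info.Relationship == RELATION_CACHE and info.u.Cache.Level == 3:
        print(f"L3: {info.u.Cache.Size // 2**20:4d} MiB shared by CPUs {info.ProcessorMask:#010x}")
```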

The additional information is appreciated, for sure, and I can say I'm partially validated in my impression, but it was more AMD thinking it's not needed (yet) on the consumer side and not so much Microsoft sucking donkey marbles as usual.

Regards.
 
I know, because I went from the 5900X to the 9950X3D and the 5900X never had issues, even in VR. The Achilles heel of the 9950X3D is VR games, and it's a pain. Symmetric CCDs would make the problem go away entirely.

Regards.
Have you tried using Process Lasso?

I can't say it has the best of UIs, but at least it gives you some control over which chiplets your game is allowed to use, to avoid cross-CCD context switches.
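
For the scripting-inclined, the core of what Lasso does can be approximated with psutil. A rough sketch, where the CCD0 core list is an assumption for a 16-core part and the game's process name is just a placeholder:

```python
# Rough sketch: confine an already-running game to CCD0's logical CPUs (0-15 assumed),
# similar to what Process Lasso does. The process name below is a placeholder.
import psutil

CCD0_CPUS = list(range(16))        # assumed: logical CPUs 0-15 live on CCD0
GAME_EXE = "SomeGame.exe"          # hypothetical process name

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] == GAME_EXE:
        proc.cpu_affinity(CCD0_CPUS)   # restrict scheduling to CCD0
        print(f"pinned PID {proc.pid} to CPUs {CCD0_CPUS}")
```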

I'm not sure it will fix VR games, because it may not be those context switches that are the root of the problem.

E.g. with FS2020 and FS2024, IMHO it's the engine that's broken so badly that I've given up on M$ getting it right.

I believe VR is the only way to fly, but M$ doesn't seem to care.
 
I figured this would happen the moment I gave up and just bought a 9950X3D, and would you look at that - I was right. I'd been waiting for this since they first introduced the X3D processors, and they just had to do it now.

Damn it.