AMD's Ryzen 9000 won't beat the previous-gen X3D models in gaming, but they'll be close — improved 3D V-Cache coming, too

It's easier to just extrapolate the performance numbers from AMD's IPC gaming chart (the arithmetic is sketched after the list):

  • From the Horizon benchmarks, the 9950X would be about 14% faster than the 7800X3D...
  • From the F1 2023 benchmarks, the 9950X would be about 16% faster than the 7800X3D...
  • A surprising result: for DOTA 2, the 9950X would be about 11% slower than the 7800X3D...
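
For what it's worth, here's a minimal sketch of that extrapolation in Python. All inputs are placeholder values (chosen to land near the percentages above), not AMD's actual chart numbers: the Zen 5 uplifts and the 7800X3D-vs-7700X gaps would come from AMD's chart and from reviews, respectively.

```python
# Sketch of the extrapolation described above. All inputs are placeholders,
# not official figures.
zen5_uplift_vs_7700x = {   # hypothetical per-game Zen 5 gains from AMD's chart
    "Horizon": 0.25,
    "F1 2023": 0.30,
    "DOTA 2": 0.10,
}
x3d_gain_vs_7700x = {      # hypothetical measured 7800X3D gains over the 7700X
    "Horizon": 0.10,
    "F1 2023": 0.12,
    "DOTA 2": 0.24,
}

for game, uplift in zen5_uplift_vs_7700x.items():
    # Both chips are expressed relative to the same 7700X baseline, so
    # dividing the two scaling factors gives the estimated head-to-head gap.
    est = (1 + uplift) / (1 + x3d_gain_vs_7700x[game]) - 1
    print(f"{game}: estimated 9950X vs 7800X3D = {est:+.1%}")
```

With these placeholder inputs the script prints roughly +14%, +16%, and -11%, matching the bullets above.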
 

Cooe
"Woligroski points to a slimmer margin between the X3D and non-X3D chips this time around"

🤦 He's referring to next-gen non-X3D vs last-gen X3D CPUs a la R7 9700X vs R7 7800X3D here, NOT current-gen non-X3D vs current-gen X3D a la R7 9700X vs R7 9700X3D! 😑

He's basically just saying that the gaming performance gap between the R7 9700X and the R7 7800X3D will be smaller than the gap between the R7 7800X and the R7 5800X3D was. Aka, instead of the ≈-8% gap you saw last time, maybe it's more like ≈0-5%. In fact, I totally expect standard Zen 5 to come out on top in many people's reviews simply based on differences in game selection.
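
To make that gap arithmetic concrete, a tiny sketch with hypothetical fps numbers (not review data):

```python
# Hypothetical average-fps numbers to illustrate the gap arithmetic above;
# a "-8% gap" means the non-X3D part trails the X3D part by 8%.
fps_7800x = 171.0    # hypothetical
fps_5800x3d = 186.0  # hypothetical

print(f"7800X vs 5800X3D: {fps_7800x / fps_5800x3d - 1:+.1%}")  # ≈ -8%
```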

As far as the gaming performance gains of Zen 5 X3D vs regular Zen 5 go, I'm expecting even LARGER performance gains than what we saw last time thanks to the upcoming new 3rd Gen 3D V-Cache, along with whatever "new surprises" they've been very loudly hinting at. Each generation of 3D V-Cache has been more performant than the last, and I fully expect this generation will be no different. 🤷

I also expect the frequency regression between X3D and non-X3D this generation to be the smallest we've yet seen. Small enough to put 3D V-Cache on BOTH CCDs for the Ryzen 9 parts? Probably not, but it should still close the gap by a notable amount.
 

Cooe
"Moving to a newer process node, like 6nm or maybe even 5nm, could enable AMD to cram in even more L3 cache capacity."

🤦 Come on Paul!!! You've been in the game WAAAAAAY too long to be putting out this semiconductor-illiterate garbage! SRAM (aka cache) transistor density scaling absolutely hit a BRICK FREAKING WALL at 7nm! Only logic transistors are still shrinking past that point!!!

Aka, going to 5nm would do practically NOTHING to increase cache capacity OR reduce die size, but would still make the chip about 2x more expensive! 6nm (aka refined 7nm w/ more EUV layers) might make sense strictly for power efficiency gains, but that's just about the limit of what you can manufacture a die of pure SRAM cache on without EXTREME levels of waste.

AMD's X3D dies are likely to stay 7nm or at best 6nm for basically the foreseeable future unless TSMC or Samsung have a MASSIVE breakthrough on pushing SRAM transistor scaling forward again that as of right now looks to be nothing but a wishful pipe dream... 🤷
 

PaulAlcorn
Managing Editor: News and Emerging Technology
"Moving to a newer process node, like 6nm or maybe even 5nm, could enable AMD to cram in even more L3 cache capacity."

🤦 Come on Paul!!! You've been in the game WAAAAAAY too long to be putting out this semiconductor-illiterate garbage! SRAM (aka cache) transistor density scaling absolutely hit a BRICK FREAKING WALL at 7nm! Only logic transistors are still shrinking past that point!!!

Aka, going to 5nm would do practically NOTHING to increase cache capacity OR reduce die size, but would still make the chip about 2x more expensive! 6nm (aka refined 7nm w/ more EUV layers) might make sense strictly for power efficiency gains, but that's just about the limit of what you can manufacture a die of pure SRAM cache on without EXTREME levels of waste.

AMD's X3D dies are likely to stay 7nm or at best 6nm for basically the foreseeable future unless TSMC or Samsung have a MASSIVE breakthrough on pushing SRAM transistor scaling forward again that as of right now looks to be nothing but a wishful pipe dream... 🤷
Incorrect. TSMC SRAM scaling hits the wall at the transition from 5nm to 3nm, with the caveat that this occurs with its standard libraries. That means moving from 7nm to 6nm or 5nm would still make sense.

Besides, we aren't talking about a standard node here. AMD uses a specialized density-optimized 7nm TSMC node for the SRAM die, which, in fact, makes it significantly denser than the 5nm die that it's placed atop. I would expect AMD to take a similar density-optimized approach in future iterations. TSMC N6 has an 18% increase in logic density over DUV 7nm, so it's also logical to expect density gains for an optimized SRAM process.

As an aside, this SRAM scaling wall only makes it all the more attractive to put SRAM on an older node and preserve die area on the smaller process node.
 

Cooe
PaulAlcorn said:
Incorrect. TSMC SRAM scaling hits the wall at the transition from 5nm to 3nm. That means moving from 7nm to 6nm or 5nm would make plenty of sense.

Besides, we aren't talking about a standard node here. AMD uses a specialized density-optimized 7nm TSMC node for the SRAM die, which, in fact, makes it significantly denser than the 5nm die that it's placed atop. I would expect AMD to take a similar approach in future iterations. Additionally, TSMC N6 has an 18% increase in logic density over DUV 7nm, so it's also logical to expect density gains for an optimized SRAM process.

As an aside, this SRAM scaling wall only makes it all the more attractive to put SRAM on an older node and preserve die area on the smaller process node.
Lol this is just utterly irrelevant semantics. 5nm's SRAM scaling improvement over 7nm is literally like ≈+10%... For double the cost at a MINIMUM! (While 3nm vs 5nm has a mere ≈+5% improvement. 🤦). Aka the brick wall has already been hit, and that is simply not debatable. If AMD moves to 6nm this generation it'll be strictly for thermal/clock-speed reasons, NOT density or die size.

5nm's MINISCULE density gains for SRAM simply don't justify the HUMONGOUS cost increase for a die of pure SRAM, or at least not for a couple more years until 5nm gets non-crowded and relatively affordable.

So sure, 5nm 3D V Cache chips might happen EVENTUALLY, but I wouldn't expect it until around Zen 7. Otherwise it just simply does not math. And even when 5nm DOES become relatively affordable it still only just BARELY maths! A mere +10% increase to either cache capacity or reduction in die size simply doesn't move the needle.

AMD would essentially need to go from 7/6nm all the way down to 3nm to get even a SOMEWHAT notable improvement in SRAM cache density, and that would still only be a whopping..... +15% gain lol. 🤷 For like 4-5x the price... Compare that to pre-7nm nodes, which increased SRAM density by at least ≈30-50% every, single, generation.

AMD's basically stuck with <=64MB L3 cache chips made on 7nm-family nodes for the foreseeable future. 5nm might happen eventually when it's not so stinking expensive, but it won't even come CLOSE to being enough of a change to actually increase the cache capacity. Not even 3nm would make a >64MB chip possible. Due to how AMD's L3 is laid out you'd need to hit 96MB, which is simply a bridge MUCH too far for even 3nm's mere ≈+15% SRAM density gain over 7nm without a SIGNIFICANT die size increase.
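
Running the numbers claimed in this post through simple cost-per-capacity arithmetic shows the shape of the argument; note that the density gains and cost multipliers below are this post's claims, not confirmed TSMC pricing:

```python
# Cost-per-capacity arithmetic using the figures claimed above. The density
# gains and wafer-cost multipliers are the poster's numbers, not foundry data.
nodes = {
    # node: (SRAM density vs 7nm, wafer cost vs 7nm)
    "5nm": (1.10, 2.0),  # "≈+10% density... for double the cost at a MINIMUM"
    "3nm": (1.15, 4.5),  # "≈+15% gain... for like 4-5x the price"
}

for node, (density, cost) in nodes.items():
    # A pure-SRAM die shrinks in proportion to density, so the relative
    # cost per MB of cache scales as (wafer cost) / (density gain).
    print(f"{node}: ~{cost / density:.1f}x the 7nm cost per MB of cache")
```

Under those assumptions, 5nm cache costs roughly 1.8x and 3nm roughly 3.9x as much per MB as 7nm cache.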
 

rluker5
It is good AMD is clearing this up ahead of time.
A lot of people probably suspected this was the case, as it also was last time. Now the new Ryzens can be portrayed as the definite improvements they are, instead of as falling short of expectations.
 
Zen 5 not beating Zen 4 X3D shouldn't be particularly surprising, given that AMD's benchmark testing against the 14900K was done with the performance power profile, which does impact gaming performance in some games. Zen 4 X3D is around 16% faster than Zen 4 and around 5% faster than the 14900K, which means I wouldn't be surprised if Zen 5 lands around the 14900KS in gaming performance.

Zen 5's multithreaded performance is what looks set to shine this generation. Between the higher IPC and the efficiency improvements, it's looking very good.
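
Writing that chain of ratios out explicitly (every percentage is a rough estimate from this post, not a benchmark result):

```python
# Chain of estimates from the post above; all percentages are rough figures.
zen4 = 1.00                # Zen 4 (non-X3D) gaming baseline
zen4_x3d = zen4 * 1.16     # "Zen 4 X3D is around 16% faster than Zen 4"
i14900k = zen4_x3d / 1.05  # "...and around 5% faster than 14900K"

# The 14900K ends up ~10.5% above plain Zen 4, so a Zen 5 uplift in that
# neighborhood would land right around 14900K/14900KS territory.
print(f"14900K vs Zen 4: {i14900k / zen4 - 1:+.1%}")
```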
 

PaulAlcorn
Managing Editor: News and Emerging Technology
Why not just put the cache die under the CPU die? Solve the heat problem that way?
Power and signals are routed from the PCB to the bottom of the chip, which would cause issues passing through the cache die. Backside power delivery (wherein they move this circuitry to the other side of the transistors) could fix this issue, but that isn't coming until future TSMC nodes (A16 is the first iirc).
 

usertests
What does everyone think the 9000X3D improvements will look like?

I hope they do a 24-core Zen 5 X3D + Zen 5c part, or at least don't make another model with V-Cache on only a single CCD like the 7950X3D/7900X3D.

TSMC has always had the ability to do multiple layers of cache. Maybe it's time to pull that out and differentiate the top chips from the 6/8-core X3D, kind of like how Intel gives you progressively more L3 cache as you move up from the bottom to the flagship. So if they make a 9950X3D with cache only on 1 CCD, that CCD could have 160 MiB instead of 96 MiB, while the 9800X3D keeps 96 MiB. Suddenly the 8-core doesn't game as well as the 16-core, barring scheduling issues.
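
A quick tally of those hypothetical configurations (speculative, not announced products):

```python
# Tallying the hypothetical per-CCD L3 configurations floated above.
# These are speculative configs, not announced AMD products.
BASE_L3_PER_CCD = 32   # MiB of L3 on the CCD itself
VCACHE_PER_LAYER = 64  # MiB per stacked V-Cache die

def ccd_l3(layers: int) -> int:
    """L3 visible to one CCD with `layers` stacked V-Cache dies."""
    return BASE_L3_PER_CCD + layers * VCACHE_PER_LAYER

print(ccd_l3(1))  # 96 MiB  -> current single-layer X3D CCD
print(ccd_l3(2))  # 160 MiB -> hypothetical 2-high stack for a flagship
```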

Cooe said:
Lol this is just utterly irrelevant semantics. 5nm's SRAM scaling improvement over 7nm is literally like ≈+10%... For double the cost at a MINIMUM! (While 3nm vs 5nm has a mere ≈+5% improvement. 🤦). Aka the brick wall has already been hit, and that is simply not debatable. If AMD moves to 6nm this generation it'll be strictly for thermal/clock-speed reasons, NOT density or die size.
I think TSMC wants to transition most N7 production to N6, and it gets mild efficiency gains that weren't touted (they only talked about +18% density for logic), so I could see that node being used and resulting in some measurable benefit. 1st-gen to 2nd-gen V-Cache increased bandwidth by 25% (2.0 to 2.5 TB/s).

Everyone made a big deal about the SRAM brick wall at 5nm, but I think it's possible that it only starts scaling again with a post-FinFET technology. Obviously there are GAAFETs at TSMC N2, but maybe something beyond that would be better. But I agree that a mature node should be used. In fact, maybe ALL L3 cache should be moved off of core dies and onto cache chiplets. Could be part of the secret sauce for Zen 6 or later.
 

umeng2002_2
When AMD posted the 5800X3D benchmarks a few years back, I knew this would be the new trend.

As Intel benefited 15 years ago, cache performance is king, in games at least. Have it lightning fast, or compensate for lower speed with massive amounts of it.
 

phxrider
Sooooo, the tests I've seen put the 14900K and 7800X3D pretty neck and neck in games, with only a very slight edge going to AMD in the aggregate, so does this mean the 9000X fails to beat Intel's current CPU at gaming?

If it comes within a few % with vastly lower power consumption, it will still be a feat, but I was hoping these would at least beat the 14th gen... I guess the real question, though, is how much any of it will matter outside of benchmarking games at low resolutions and low graphical detail with extremely powerful GPUs, which, other than competitive e-sports players, no one actually does in the real world...
 

rluker5
phxrider said:
Sooooo, the tests I've seen put the 14900K and 7800X3D pretty neck and neck in games, with only a very slight edge going to AMD in the aggregate, so does this mean the 9000X fails to beat Intel's current CPU at gaming?

If it comes within a few % with vastly lower power consumption, it will still be a feat, but I was hoping these would at least beat the 14th gen... I guess the real question, though, is how much any of it will matter outside of benchmarking games at low resolutions and low graphical detail with extremely powerful GPUs, which, other than competitive e-sports players, no one actually does in the real world...
You also have to remember that the vastly higher power consumption Intel shows in gaming only occurs at stock settings, in CPU-limited scenarios that aren't at all realistic. Since Intel also has lower power consumption at very low load, the additional power consumption in a typical gaming scenario is probably somewhere between significant at higher framerates and insignificant at 4K60.

Edit: lucky comment number
 

TheHerald
rluker5 said:
You also have to remember that the vastly higher power consumption Intel shows in gaming only occurs at stock settings, in CPU-limited scenarios that aren't at all realistic. Since Intel also has lower power consumption at very low load, the additional power consumption in a typical gaming scenario is probably somewhere between significant at higher framerates and insignificant at 4K60.

Edit: lucky comment number
Well, not really true. Unless you manually change something in the BIOS, a CPU like the 14900K is a power hog in games. It just is. Yes, even at 4K. Yes, obviously it's fixable by spending 3 minutes in the BIOS, but if you don't, I'd argue the 14900K is almost unusable for games.
 
TheHerald said:
Well, not really true. Unless you manually change something in the BIOS, a CPU like the 14900K is a power hog in games. It just is. Yes, even at 4K. Yes, obviously it's fixable by spending 3 minutes in the BIOS, but if you don't, I'd argue the 14900K is almost unusable for games.
At unlimited power settings, the 14900K uses about 145W in games compared to the 89W of the 7950X, and while that is a big percentage difference, we have to keep in mind that this is the power draw of the CPU alongside a 4090 that can itself use like 500W, so 50W more from the CPU isn't going to be the thing that makes the difference there.
Also, the 14900K is about 12% faster for that 50W (720p):
https://www.techpowerup.com/review/intel-core-i9-14900k/22.html
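
For whole-system perspective, a rough sketch; the CPU numbers are from the linked TechPowerUp test, while the GPU and rest-of-system figures are placeholders:

```python
# Putting the CPU power figures in whole-system perspective. CPU numbers
# are from the linked TechPowerUp gaming test; GPU and rest-of-system
# figures are rough placeholders, not measurements.
cpu_w = {"14900K": 145.0, "7950X": 89.0}
gpu_w, rest_w = 500.0, 60.0  # placeholder: heavily loaded 4090 + platform

total = {name: w + gpu_w + rest_w for name, w in cpu_w.items()}
print(f"CPU-only gap:     {cpu_w['14900K'] - cpu_w['7950X']:.0f} W "
      f"({cpu_w['14900K'] / cpu_w['7950X'] - 1:+.0%})")
print(f"Whole-system gap: {total['14900K'] - total['7950X']:.0f} W "
      f"({total['14900K'] / total['7950X'] - 1:+.0%})")
```

With these assumptions the same 56W CPU gap is a ~63% difference at the socket but under 10% at the wall.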
 

TheHerald
At unlimited power settings, the 14900K uses about 145W in games compared to the 89W of the 7950X, and while that is a big percentage difference, we have to keep in mind that this is the power draw of the CPU alongside a 4090 that can itself use like 500W, so 50W more from the CPU isn't going to be the thing that makes the difference there.
Also, the 14900K is about 12% faster for that 50W (720p):
https://www.techpowerup.com/review/intel-core-i9-14900k/22.html
And in some games it hits 200 watts. I know, I have the damn thing.
 

bit_user
Cooe said:
🤦 Come on Paul!!! You've been in the game WAAAAAAY too long to be putting out this semiconductor-illiterate garbage! SRAM (aka cache) transistor density scaling absolutely hit a BRICK FREAKING WALL at 7nm! Only logic transistors are still shrinking past that point!!!
Instead of casting aspersions and using lots of ALL CAPS and exclamation points, please try to make your points land by citing high-quality sources, especially when addressing authors and editors with whom we have the infrequent privilege of interacting on the forums.

In fact, for the number of very specific claims and sweeping proclamations you make, not citing even a single source is really poor form.

Cooe said:
Aka, going to 5nm would do practically NOTHING to increase cache capacity OR reduce die size,
According to this graph from WikiChip, TSMC N5 SRAM is 28.6% denser than N7's.

[Chart: SRAM bit-cell density across TSMC nodes, N3B/N3E]


Source: https://fuse.wikichip.org/news/7343/iedm-2022-did-we-just-witness-the-death-of-sram/
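
That 28.6% figure drops straight out of the high-density bit-cell sizes shown in the chart (≈0.027 µm² for N7 vs ≈0.021 µm² for N5):

```python
# Reproducing the 28.6% figure from the bit-cell sizes in the WikiChip
# chart (high-density SRAM cells); density is the inverse of cell area.
bitcell_um2 = {"N7": 0.027, "N5": 0.021, "N3E": 0.021}

print(f"N5 vs N7 SRAM density: +{bitcell_um2['N7'] / bitcell_um2['N5'] - 1:.1%}")
# N3E's cell is the same size as N5's; that flat step is the real
# "wall" the linked article is describing.
```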

Cooe said:
but would still make the chip about 2x more expensive!
Where are you getting this pricing data and how do you know it's not already out of date?

Cooe said:
6nm (aka refined 7nm w/ more EUV layers) might make sense strictly for power efficiency gains, but that's just about the limit of what you can manufacture a die of pure SRAM cache on without EXTREME levels of waste.
Well, that's good, because it's the process node AMD used for the MCD chiplets in RDNA 3, which launched 18 months ago.

Cooe said:
AMD's X3D dies are likely to stay 7nm or at best 6nm for basically the foreseeable future unless TSMC or Samsung have a MASSIVE breakthrough on pushing SRAM transistor scaling forward again that as of right now looks to be nothing but a wishful pipe dream... 🤷
Not sure if/when this is on TSMC's roadmap, but here's a 2017 article from IMEC about such a method.
 

bit_user
usertests said:
maybe ALL L3 cache should be moved off of core dies and onto cache chiplets.
Somewhere, I read that AMD could only stack cache on top of cache, not active logic. So, that would argue against the idea of putting all your cache on another chiplet.

To my untrained mind, the more credible of your ideas was that the cache die could perhaps have 2 layers of SRAM cells. If they could get that to work, then it's an easy path to increased density.

umeng2002_2 said:
As Intel benefited 15 years ago, cache performance is king, in games at least. Have it lightning fast, or compensate for lower speed with massive amounts of it.
It doesn't always work out like that. Whether larger cache capacity provides a benefit really depends on the size of your working set.

Phoronix tested various X3D CPUs on a range of workloads, and the benefit did vary quite a bit. The same is true of Intel's Xeon Max CPUs, which have on-package HBM: it didn't always benefit the same benchmarks that were helped by X3D, and vice versa.
 

usertests
bit_user said:
Somewhere, I read that AMD could only stack cache on top of cache, not active logic. So, that would argue against the idea of putting all your cache on another chiplet.

To my untrained mind, the more credible of your ideas was that the cache die could perhaps have 2 layers of SRAM cells. If they could get that to work, then it's an easy path to increased density.
Probably true for the current design, at least, so it wouldn't be applicable to Zen 5. But I think I heard something somewhere that suggested that cache could go under logic in the future. I can't find a link for this. Either way, Zen 6 should finally change the design from what we've seen since Zen 2.

As for 2+ layers, we've known it's possible for 3 years now:
https://www.anandtech.com/show/1672...acked-vcache-technology-2-tbsec-for-15-gaming
The V-Cache is a single 64 MB die, and is relatively denser than the normal L3 because it uses SRAM-optimized libraries of TSMC's 7nm process, AMD knows that TSMC can do multiple stacked dies, however AMD is only talking about a 1-High stack at this time which it will bring to market.
I think this would be a simple way to make a 16-core flagship more attractive to gamers, who would previously look at reviews and see that a 5600X/5800X performs virtually identically to a 5950X due to having the same L3 cache per CCD, or the 7800X3D matching or even beating the 7950X3D (no scheduling trouble with the lone 8-core CCD, even if driver updates cover it up over time).
 

usertests
bit_user said:
Oh, I thought you were talking about multiple layers of SRAM cells within a single die. There's been talk of 3D DRAM (similar to the 3D revolution in NAND), so I have to wonder whether it would not carry over to SRAM, also.
I guess it's a similar situation to HBM, which is now reaching 12-16 stacked dies using TSVs. The 5800X3D et al. use a single die with TSVs. The TSV connection areas have been present since Zen 2, in anticipation of the technology being ready.

Stacking another 64 MiB will add to the cost, but if it's for a $700+ flagship CPU and bolsters sales of that CPU, it could be a good idea.

Still waiting on that missing-in-action Samsung X-Cube, by the way, which was described as stacking SRAM on logic:
The X-Cube test chip built on 7nm uses TSV technology to stack SRAM on top of a logic die, freeing up space to pack more memory into a smaller footprint.
 

bit_user
usertests said:
I guess it's a similar situation to HBM, which is now reaching 12-16 stacked dies using TSVs. The 5800X3D et al. use a single die with TSVs.
Again, I'm not talking about multi-die stacks. More than a decade ago, Samsung pioneered what they called V-NAND, which had multiple layers of cells within a single die!

So, the idea would be that your SRAM die has multiple layers of SRAM cells within it. That way, you don't need to stack multiple of them or waste area on TSVs.
 