News AMD's Ryzen 9000 won't beat the previous-gen X3D models in gaming, but they'll be close — improved 3D V-Cache coming, too

Somewhere, I read that AMD could only stack cache on top of cache, not active logic. So, that would argue against the idea of putting all your cache on another chiplet.
Right now, the cost with TSMC would be way too high to, say, directly connect two CCDs with a chiplet of just SRAM. This is the sort of thing an EMIB-like technology would be perfect for, but I'm not sure TSMC's equivalent has been fully validated yet. I've been pretty surprised that Intel hasn't gone that route to compete with AMD's V-Cache SKUs, now that tiles are in play (not to mention GAA/BSPDN for 20A+). Every once in a while I go read Ian Cutress' 2020 Broadwell retrospective and just shake my head that Intel hasn't gone back there.
 
RDNA3 would like a word with you. They got like 950 GB/s per MCD, IIRC.
Bandwidth doesn't mean anything if the cost to implement and/or the added latency doesn't make sense. There were rumors about Zen 4 using one of TSMC's InFO/CoWoS packaging technologies, and we're getting Zen 5 still without it, which should say everything. I'm guessing it's the economics that don't make sense, seeing as Apple's Ultra SoCs don't appear to have a notable latency penalty.
 

bit_user

Apple's Ultra SoCs don't appear to have a notable latency penalty.
Well, I thought we were talking about a cache die. Do the M Ultra series use cache on the other die, or just their own? RDNA3 put infinity cache on the MCDs, so I would presume the latency isn't untenable. Then again, GPUs are more latency-tolerant than CPUs.
 

rluker5

This admission has got me suspecting something. Contributing thoughts are:
3D helps more in some games and IPC in others.
Zen 5's IPC doesn't look too different than RPL.
AMD used an apparently gimped setup on RPL for comparison.

What if there are a lot of the same games that Zen 5 would beat Zen 4 X3D on, but that a properly run RPL (pre "Intel default" settings) still beats Zen 5 on, so AMD doesn't want a comparison like that by independent reviewers or users at large, and would rather focus on the games helped by 3D cache, where the 7800X3D still beats everything? Game selection matters in comparisons.

Just conjecture at this point and reviews will obviously tell more.

I'm also curious how much Zen 4 users get from their crazy long motherboard ram tuning? I see 10ns left on the table from my 1-2 minute Intel mobo ram tuning vs XMP. About 65ns XMP to under 55ns tuned to be specific. Maybe a more exhaustive motherboard training session helps more with latency?
 
Well, I thought we were talking about a cache die.
Well, I'm referring to the original idea of moving to a cache chiplet that both CCDs could access. You'd need the latency to be as close to on-die as possible, so if you were using a high-speed, low-latency interconnect, you might as well remove the existing CCD-to-CCD latency as well.
Do the M Ultra series use cache on the other die, or just their own?
As far as I'm aware the M Ultras are like SPR/EMR and act like a monolithic chip.
RDNA3 put infinity cache on the MCDs, so I would presume the latency isn't untenable. Then again, GPUs are more latency-tolerant than CPUs.
It does add latency, but I'm pretty sure they're using one of the TSMC InFO packaging methods, so it's rather low (I looked last night and it seems like they never officially said what it uses). As you say, though, GPUs are more tolerant of the added latency.
What if there are a lot of the same games that Zen 5 would beat Zen 4 X3D on, but that a properly run RPL (pre "Intel default" settings) still beats Zen 5 on, so AMD doesn't want a comparison like that by independent reviewers or users at large, and would rather focus on the games helped by 3D cache, where the 7800X3D still beats everything? Game selection matters in comparisons.

Just conjecture at this point and reviews will obviously tell more.
Titles where Intel beats the regular Zen 4 chips don't do very well on either, so my guess is that it's likely an architectural design disadvantage. Zen 5 doesn't seem to be a significant deviation, so unless it's predominantly caused by L1/L2 cache, I'm not sure that will change. Some of the games AMD tested against Intel in their slides were games they already did better at, which reinforces that game selection matters.
 
I remember the earlier announcements of ZEN 5. AMD said that ZEN 5 would be the most revolutionary architecture since ZEN 1, that it would offer the biggest performance leap since ZEN 1, and that it was not an architecture update like previous versions of ZEN were, but a completely rebuilt new architecture :)

So 16% IPC plus higher efficiency of N4P transistors does not seem to be the biggest performance jump in the history of ZEN :)

An article from 2023 announced 22-30% higher IPC :)

https://t.ly/zhVwP (AMD Zen 5 architecture leak reveals 22-30% IPC gain as well as a much bigger L1, unified L2, and a possible shared L4 cache for APUs)
 

bit_user

I remember the earlier announcements of ZEN 5. AMD said that ZEN 5 would be the most revolutionary architecture since ZEN 1, that it would offer the biggest performance leap since ZEN 1, and that it was not an architecture update like previous versions of ZEN were, but a completely rebuilt new architecture :)
Maybe a close look at the details will show their efforts in a better light. Also, consider that it could be setting the stage for developments they've got in store for Zen 6 and beyond, which might not have been very feasible with the old architecture.

So 16% IPC plus higher efficiency of N4P transistors does not seem to be the biggest performance jump in the history of ZEN :)
We've become spoiled, I fear. For a decade, we thought a 5-7% IPC increase was normal. Now, we think 15-20% is normal.

Furthermore, with the generational improvements in process nodes tapering off, it seems like the situation really should be the other way around!

An article from 2023 announced 22-30% higher IPC :)
Perhaps that was comparing against Zen 3?

Anyway, don't pay too much heed to leakers. They have an incentive to sensationalize, which just leads to disappointment like you're feeling now.
 

Do you remember the news about 40% and 46% MT and 41% ST (not IPC but Core to Core ZEN 4 vs. ZEN 5) that circulated around the world a few weeks ago? :)

Look at the number of improvements in ZEN 5 (from an article also from a few weeks ago). Summing up the advantages of each of these improvements and changes, those increases should not be surprising; it's the 16% that should be surprising :).

But I'm calm, because AMD has always announced very mild, safe performance increases with each ZEN launch, and the reviews always turned out to show more :) Plus, performance increases from one AGESA version to the next. For example, today the 7950X with the latest AGESA has higher performance than the 7950X had at launch and in the first reviews :) The same will happen with ZEN 5.
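
As a rough sanity check on those numbers: if you assume performance scales as IPC times clock speed (a simplification that ignores memory, boost behavior, and workload mix), you can back out how much clock uplift it would take to get from a 16% IPC gain to a 41% ST gain. The function name below is made up for illustration.

```python
def required_clock_gain(total_gain: float, ipc_gain: float) -> float:
    """Clock uplift needed to reach total_gain, assuming perf = IPC * clock.

    Both arguments are fractional gains, e.g. 0.16 for +16%.
    This is a simplified model, not a statement about real Zen 5 clocks.
    """
    return (1.0 + total_gain) / (1.0 + ipc_gain) - 1.0


# 41% claimed ST gain vs. the 16% IPC figure quoted in the thread.
clock_gain = required_clock_gain(total_gain=0.41, ipc_gain=0.16)
print(f"Clock uplift needed: {clock_gain:.1%}")
```

Under that assumption it works out to roughly 21.6% more clock, which is a lot to expect from frequency alone.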


bit_user

Thanks for following up with details!

I think it's most likely (one of) the SPECfp benchmarks, as that would best exploit the wider and lower-latency AVX-512 implementation.

Note that SPECfp is an average of 10 or 12 different benchmarks. Also, there are different ways to run the SPEC2017 benchmarks, and we ought to know whether this was a single-core run or a full SPEC_rate run. Yet more reasons not to trust leaks - key details are often missing.
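
To illustrate why one subtest can look very different from the overall score: SPEC CPU2017 reports its overall metrics as a geometric mean of the per-benchmark ratios. Here's a minimal sketch of that; the ratios are invented for illustration and are not real Zen 4 or Zen 5 results.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive per-benchmark ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical new-vs-old speedups for ten subtests (made-up numbers).
# One subtest benefits heavily, e.g. from wider vector units.
ratios = [1.10, 1.12, 1.15, 1.08, 1.45, 1.11, 1.09, 1.13, 1.10, 1.14]

overall = geometric_mean(ratios)
best = max(ratios)
print(f"Overall (geomean): {overall:.3f}x, best single subtest: {best:.2f}x")
```

A leak that quotes only the best subtest would overstate the overall gain, which is why it matters which number is being reported.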