News AMD dishes more Zen 5 details — Compact core is 25% smaller than the normal core, new SoC and chip architecture with dual CCXs

edzieba

Distinguished
Jul 13, 2016
540
535
19,760
For comparison, Intel's 'E-cores' are 25% of the die area of the 'P-cores', or a 75% reduction in size.
Or put another way: 4x Zen 5C cores take up the same area as 3x Zen 5 cores, and 4x E-cores take up the same die area as 1x P-core.
 
For comparison, Intel's 'E-cores' are 25% of the die area of the 'P-cores', or a 75% reduction in size.
Or put another way: 4x Zen 5C cores take up the same area as 3x Zen 5 cores, and 4x E-cores take up the same die area as 1x P-core.
True. However, instructions per cycle for the E-cores is very low, and they aren't multithreaded. So, the number of instructions executed per die area on E-cores isn't good when compared with P-cores, while this is not true of Zen 4C vs Zen 4, or Zen 5C vs Zen 5.
 

edzieba

True. However, instructions per cycle for the E-cores is very low, and they aren't multithreaded. So, the number of instructions executed per die area on E-cores isn't good when compared with P-cores, while this is not true of Zen 4C vs Zen 4, or Zen 5C vs Zen 5.
TPU's testing of 8x E-cores vs. 8x P-cores (both limited to 3.9GHz) shows the IPC of the P-cores being about 150% of the E-cores. With an E-core at 25% of a P-core's area, that puts the E-cores at about 2.7x the IPC per unit die area of the P-cores at iso-frequency ((1/1.5) / 0.25 ≈ 2.67). Of course, a lot of area in the P-cores is taken by parallel-ganged transistors that allow the P-cores not to operate at iso-frequency but to boost to 6GHz.
AMD have claimed in the past that Zen 4 and Zen 4C have the exact same IPC, but I'm not aware of any benchmarking that independently verifies that. Assuming Zen 5 and Zen 5C also have the same IPC, that would mean a 1.3x increase in IPC per unit die area (1 / 0.75 ≈ 1.33).
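To make the per-area arithmetic above easy to check, here is a minimal sketch. It only uses the figures quoted in this thread (E-core at 25% of P-core area, P-core IPC at about 150% of E-core IPC, Zen 5c at 75% of Zen 5 area); the equal-IPC figure for Zen 5c is the same assumption made in the post, not a measurement, and the function name is just illustrative.

```python
def perf_per_area(relative_ipc: float, relative_area: float) -> float:
    """IPC per unit die area of the small core, relative to the big core (= 1.0)."""
    return relative_ipc / relative_area


# Intel: E-core is 25% of the P-core's area; P-core IPC is ~150% of E-core IPC,
# so E-core IPC is 1/1.5 of the P-core's at iso-frequency.
e_core = perf_per_area(relative_ipc=1 / 1.5, relative_area=0.25)

# AMD: Zen 5c is 25% smaller than Zen 5 (75% of the area).
# Equal IPC is an ASSUMPTION carried over from AMD's Zen 4/4c claim.
zen5c = perf_per_area(relative_ipc=1.0, relative_area=0.75)

print(f"E-core vs P-core IPC per area: {e_core:.2f}x")
print(f"Zen 5c vs Zen 5 IPC per area:  {zen5c:.2f}x")
```

Note this is strictly an iso-frequency comparison; it says nothing about what happens once the big cores boost.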
 

TheSecondPower

Distinguished
Nov 6, 2013
75
66
18,610
A lot of things will be different this generation with regard to e-cores and c-cores. Last-gen, Phoenix 2 had 2 standard cores and 4 c-cores on a shared L3 cache, but Strix Point now gives a different and smaller L3 cache to the c-cores, so the IPC will be lower for the c-cores. And Intel's new Skymont e-core is a great deal faster than their old Crestmont e-core which in turn is faster than Gracemont that Raptor Lake uses.

Assuming Arrow Lake is like Meteor Lake, Strix Point (4+8, two L3 blocks) will compete against a 2+6+8 design, where the 6+8 will share a cache. AMD will have more threads but Intel will have more big cores, more cores, and unified cache. I expect they'll be close, and this time users won't notice the difference if a task runs on a smaller core for a moment.
 

bit_user

Titan
Ambassador
The article said:
The die has two CCXs (Core Complexes — core clusters on the same die), much like we saw in older AMD Zen 2 chips. Both core types have their own private L1 and L2 caches, but the 24MB of L3 cache is split into an 8MB slice for the standard cores and a 16MB slice for the Zen 5c compact cores.
The slide shows it the other way around: 16 MB for the Zen 5 cores and 8 MB for the Zen 5C cores.

[Image: AMD slide showing the L3 cache split between the Zen 5 and Zen 5C core clusters]


The next paragraph said:
the four full-sized performance cores have 4MB of L3 apiece to satisfy low-latency and bursty workloads. In contrast, the eight compact cores have a mere 1MB of L3 apiece for the low-utilization high-residency workloads.
Okay, now you've got it!

Unlike prior generation APUs, this gives the Zen 5 cores the same L3 per core as their desktop cousins, but leaves the Zen 5C cores to really suffer.

The article said:
Unlike Intel's approach, both Zen 5 core types support SMT and the same instruction set (ISA), avoiding the scheduling concerns that Intel faces with its dissimilar core types — Intel's core types don't support the same ISA.
The whole reason Intel disabled AVX-512 in Alder Lake was specifically so that the different core types would have ISA symmetry!

The article said:
AMD's approach also differs from Intel's because it prioritizes keeping the performance of the Zen 5c cores as close to the standard cores as possible during multi-core workloads. This prevents situations where the larger cores are waiting on smaller cores to complete workloads, which is important for situations like multi-core workloads with thread dependencies. This sidesteps what Mike Clark, Zen's lead architect, calls a 'scheduling cliff,'
This is probably also why Intel has been focusing so much on closing the performance gap between E-cores and P-cores. I don't recall them saying how close they're getting, but with Skymont's massive improvements, it's going to be a lot narrower in Lunar Lake & Arrow Lake.
 

bit_user

For comparison, Intel's 'E-cores' are 25% of the die area of the 'P-cores', or a 75% reduction in size.
Or put another way: 4x Zen 5C cores take up the same area as 3x Zen 5 cores, and 4x E-cores take up the same die area as 1x P-core.
I think that's what they said, but die shot analysis of Alder Lake-S showed each Gracemont core is really more like 29% of a Golden Cove. That might seem like splitting hairs.

BTW, AMD had previously given a figure of Zen 4C occupying half the area of Zen 4, but that was only true when you factored in the size of their L3 cache slices. It'll be interesting to see what the ratio is between the full Zen 5 and Zen 5C CCDs for EPYC.

instructions per cycle for the E-cores is very low, and they aren't multithreaded. So, the number of instructions executed per die area on E-cores isn't good when compared with P-cores
PPA (performance per area) of E-cores is about twice that of Intel's P-cores! They're actually more area-efficient than they are energy-efficient, so long as you keep P-cores' clocks low. The main reason for Intel using E-cores on their desktop CPUs was to boost multithreaded performance per $ (as well as perf/W).
 

bit_user

Assuming Arrow Lake is like Meteor Lake, Strix Point (4+8, two L3 blocks) will compete against a 2+6+8 design,
I doubt it. I think Arrow Lake is just going to be used for the HX laptop line, where they basically take a desktop 8P + 16E die and put it in a BGA package.

Remember, Lunar Lake is coming first. That will feature 4P + 4E. I forget if there are any other die configurations, but I think the number of P-cores tops out at 4.

AMD will have more threads but Intel will have more big cores, more cores, and unified cache. I expect they'll be close, and this time users won't notice the difference if a task runs on a smaller core for a moment.
AMD will probably also repeat what they did with Zen 4, which was to repackage their chiplet-based desktop CPUs for the high-end laptop segment.
 

usertests

Distinguished
Mar 8, 2013
777
709
19,760
I look forward to seeing how well scheduling goes with Strix Point.

If a game or application needs 8 cores, you're not finding them all in the fast CCX.

The differing amounts of cache bring to mind 7950X3D/7900X3D, but it should be much easier to figure out since the faster cores also have more L3 cache. But I could still see some weirdness from having the 16/8 split.

I don't think it will be too bad, but it's undeniably more complex than Cezanne, Rembrandt, and Phoenix/Hawk.

Clark said the Zen 5 core could be compacted even further for compact-core-only (homogenous) designs with different performance targets (for reference, Bergamo only has compact cores), but this design meets the targets for this specific heterogenous design. So, it's possible we'll see even denser Zen 5c core designs emerge with other products.
Also, Strix Point and its Zen 5c cores are on N4P. For Turin Dense, those (up to 192) Zen 5c cores will be on an N3 node.

It will be interesting to see APUs with only Zen 5c cores. I suspect a Steam Deck-like handheld could get away with that (Zen 2 cores in Steam Deck don't clock high). "Sonoma Valley" is the Mendocino successor with only Zen 5c cores. I hope they make something like that at Samsung.
 

HideOut

Distinguished
Dec 24, 2005
588
95
19,070
I get the older graphics. They will release an updated system based upon the laptop architecture for folks wanting a more powerful desktop graphics setup. What I do not get is why it's still only DDR5-5600, though. That constrains this pretty badly, and will age even worse.
 

usertests

Strix Point now gives a different and smaller L3 cache to the c-cores, so the IPC will be lower for the c-cores.
Not when the whole cache isn't needed. :)

Does 7800X3D have more IPC than the 7700X? :O

And Intel's new Skymont e-core is a great deal faster than their old Crestmont e-core which in turn is faster than Gracemont that Raptor Lake uses.
Who else wants to see Alder Lake-N but with Skymont?

It'll be interesting to see what the ratio is between the full Zen 5 and Zen 5C CCDs for EPYC.
Will 128-core Zen 5 Turin use TSMC N4P (shared with desktop) while 192-core Zen 5 Turin Dense uses TSMC N3_? Not a fair comparison anymore. I edited the comment you liked above.

I get the older graphics. They will release an updated system based upon the laptop architecture for folks wanting a more powerful desktop graphics setup. What I do not get is why it's still only DDR5-5600, though. That constrains this pretty badly, and will age even worse.
It's the "honest" DDR5 speed. They can obviously run DDR5 much higher, but with less consistency. Per a discussion between Moore's Law is Dead (Tom) and Wendell from Level1Techs, the "honest" DDR5 speed for Raptor Lake is actually something like DDR5-4400. I am paraphrasing here, but yeah.

If this is a Strix Point specific concern, the higher DDR5 speeds would be reachable if they put it on the AM5 socket so it's using DIMMs, or maybe with CAMM.
 

bit_user

Not when the whole cache isn't needed. :)

Does 7800X3D have more IPC than the 7700X? :O
You're right - it depends on the workload. For some background tasks and even compute jobs, there's less sensitivity to L3 size.

However, you made a good point that games probably won't fare great on it. Now, if we consider a game that only needs 6 threads, 4 could run on the big core cluster and 2 could run on the C-cluster. Since L3 is shared between all cores in the cluster, that would mean all 6 threads get 4 MB each! That said, I'd bet games are a lot more "spikey" with their thread usage, so even if the average utilization is like 6 cores or less, there are probably points during frame rendering when even more are active.
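As a rough illustration of the "4 MB each" point, here is a small sketch that splits each cluster's shared L3 evenly among its active threads. The 16 MB / 8 MB split and the 4 + 8 core counts come from the slide discussed above; the even split, one thread per core (SMT ignored), and the function name are simplifying assumptions of mine, since real cache sharing depends on each thread's working set.

```python
# Per-cluster L3 size (MB) and core count for Strix Point, per the slide.
L3_MB = {"zen5": 16, "zen5c": 8}
CORES = {"zen5": 4, "zen5c": 8}


def l3_per_thread(active: dict) -> dict:
    """Naive even split of each cluster's shared L3 among its active threads.

    Assumes one thread per core; SMT siblings are ignored.
    """
    shares = {}
    for cluster, threads in active.items():
        if threads > CORES[cluster]:
            raise ValueError(f"{cluster} only has {CORES[cluster]} cores")
        if threads > 0:
            shares[cluster] = L3_MB[cluster] / threads
    return shares


# A game using 6 threads: 4 on the big cluster, 2 on the compact cluster.
six_threads = l3_per_thread({"zen5": 4, "zen5c": 2})
# A game using 8 threads: the compact-cluster threads now get less L3 each.
eight_threads = l3_per_thread({"zen5": 4, "zen5c": 4})

print(six_threads)
print(eight_threads)
```

The 8-thread case is where the imbalance starts to show: the four threads that spill onto the compact cluster get half the L3 of those on the big cluster.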

Who else wants to see Alder Lake-N but with Skymont?
100%. Sadly, we'll probably have to wait until early 2026 for that. I think they won't do it until the 18A node, since I think tiles are too expensive for such a product and the 20A library lacks all the cells you need for things like I/Os.

In the meantime, we might get a Crestmont-based version on Intel 3.

Will 128-core Zen 5 Turin use TSMC N4P (shared with desktop) while 192-core Zen 5 Turin Dense uses TSMC N3_? Not a fair comparison anymore. I edited the comment you liked above.
Good point.
 
I look forward to seeing how well scheduling goes with Strix Point.

If a game or application needs 8 cores, you're not finding them all in the fast CCX.
Keep in mind that on mobile parts the clocks drop a lot faster on heavier workloads, which should minimize the impact. At that point it would only be the cache capacity coming into play, which is hit and miss as to how important it is. AMD had said 2 Z4/4 Z4C in Hawk/Phoenix were no different than 6 Z4 performance-wise, though I'm pretty sure these parts all shared cache.

I think more in-depth CPU performance testing than normal should definitely be on the table for anything using these parts.
 

bit_user

Yes, at least in games. https://www.tomshardware.com/reviews/amd-ryzen-7-7800x3d-cpu-review/6
| CPU | Base | Boost | Tom's geomean fps @ 1080p |
|---|---|---|---|
| 7700X | 4.5GHz | 5.4GHz | 173 |
| 7800X3D | 4.2GHz | 5.0GHz | 224 |
Thanks for that, but the benefits of the extra L3 cache are very workload-dependent.

Phoronix did some testing on Genoa, Genoa-X, and Bergamo, which is a very interesting 3-way comparison. Compared to the 9654, the 9684X has both 3x the L3 cache and a little higher base clocks (but same boost freq), while the 9754 has more cores at lower base & boost clocks and much less L3 cache. All use the same platform and have the same memory bandwidth. You can therefore see which workloads favor more of which resource and by how much vs. the baseline Zen 4 CCD.
When reviewing Phoronix benchmarks, it's very important to look at the units of measure, just below the title. Sometimes, a graph will show perf/W or perf/$, instead of raw perf. I recommend looking at the 1P results, just to simplify matters and keep them more analogous to desktop performance (aside from core counts).

In spite of the 9684X seemingly having every benefit over the 9654, the latter still manages some wins. I guess, because maybe the X throttles quicker? One area near and dear to me is compilation, where it seems the 9654 does the best, overall. The 9754 (Bergamo) is always bringing up the rear, suggesting that compilation either scales too poorly to more cores or is hurt too much by Zen 4C's smaller L3 cache.
 

TheSecondPower

Thanks for that, but the benefits of the extra L3 cache are very workload-dependent.
Yes, and in the Tom's Hardware review of the 7800X3D I didn't see a single win for the 7800X3D outside of games; the 7700X almost always won by as much as its clock speed would suggest. But the 7800X3D was 29% faster in 1080p gaming, even while running at a lower clock speed. But the mobile chips don't have as much cache so I'm not sure how it'll play out.
| CPU Cores | Core Count | L3 | L3/Core |
|---|---|---|---|
| 7700X Zen 4 | 8 | 32MB | 4MB |
| 7800X3D Zen 4 | 8 | 96MB | 12MB |
| HX 370 Zen 5 | 4 | 16MB | 4MB |
| HX 370 Zen 5c | 8 | 8MB | 1MB |
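For anyone who wants to check the numbers, here is a quick sketch that recomputes the L3/Core column and the 1080p gaming gap. All inputs are the figures quoted in this thread (core counts and L3 sizes above, plus the 173 and 224 fps geomeans from the Tom's Hardware review); the variable names are just illustrative.

```python
# (name, core count, L3 in MB) as listed in the table above.
configs = [
    ("7700X Zen 4", 8, 32),
    ("7800X3D Zen 4", 8, 96),
    ("HX 370 Zen 5", 4, 16),
    ("HX 370 Zen 5c", 8, 8),
]

# L3 per core is just the cluster's L3 divided by the cores sharing it.
l3_per_core = {name: l3_mb / cores for name, cores, l3_mb in configs}

# Geomean fps at 1080p quoted earlier in the thread.
fps_7700x, fps_7800x3d = 173, 224
gaming_speedup = fps_7800x3d / fps_7700x - 1

for name, mb in l3_per_core.items():
    print(f"{name}: {mb:g} MB L3 per core")
print(f"7800X3D vs 7700X at 1080p: {gaming_speedup:+.1%}")
```

With the rounded fps values this comes out a bit above 29%, in line with the figure quoted above.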
 

TheHerald

Prominent
Feb 15, 2024
912
273
760
Does 7800X3D have more IPC than the 7700X? :O
Theoretically, absolutely not. Practically, yes in some cases. The bigger cache removes some bottlenecks the 7700X has on fetching data. Is that an increase in IPC? I'd say no, but some people disagree.

If you upgrade from a 4070 to a 4090, did you increase your CPU's IPC just because it performs better? Naaah
 

bit_user

Yes, and in the Tom's Hardware review of the 7800X3D I didn't see a single win for the 7800X3D outside of games; the 7700X almost always won by as much as its clock speed would suggest.
Phoronix did actually test the 7800X3D, which is probably a better point of comparison. Here's how it measured up against the 7700X, in non-gaming benchmarks. You can see quite a few cases where the 3D V-cache helps quite a lot, even though the overall average puts them within the low single-digit % of each other.

| Program | Benchmark | Units | 7700X | 7800X3D | Speedup |
|---|---|---|---|---|---|
| LeelaChessZero 0.28 | Backend: Eigen | Nodes per Second | 1539 | 1588 | 3.2% |
| CloverLeaf | Lagrangian-Eulerian Hydrodynamics | Seconds | 69.44 | 39.39 | 76.3% |
| NAMD 2.14 | ATPase Simulation - 327,506 Atoms | days/ns | 1.48744 | 1.61926 | -8.1% |
| Pennant 1.0.1 | Test: leblancbig | Seconds | 32.55 | 27.21 | 19.6% |
| Xcompact3d Incompact3d 2021-03-11 | Input: input.i3d 129 Cells Per Direction | Seconds | 18.58 | 14.17 | 31.1% |
| Xcompact3d Incompact3d 2021-03-11 | Input: input.i3d 193 Cells Per Direction | Seconds | 76.88 | 67.7 | 13.6% |
| OpenFOAM 10 | Input: drivaerFastback, Small Mesh Size - Execution Time | Seconds | 248.83 | 169.69 | 46.6% |
| OpenFOAM 10 | Input: drivaerFastback, Medium Mesh Size - Execution Time | Seconds | 2506.41 | 2115.66 | 18.5% |
| SPECFEM3D 4.0 | Model: Mount St. Helens | Seconds | 53.31 | 48.45 | 10.0% |
| SPECFEM3D 4.0 | Model: Layered Halfspace | Seconds | 124.95 | 129.77 | -3.7% |
| LULESH 2.0.3 | | z/s | 8017.06 | 8846.29 | 10.3% |
| John The Ripper 2023.03.14 | Test: bcrypt | Real C/S | 23864 | 21844 | -8.5% |
| John The Ripper 2023.03.14 | Test: WPA PSK | Real C/S | 91048 | 82925 | -8.9% |
| John The Ripper 2023.03.14 | Test: HMAC-SHA512 | Real C/S | 1.25E+08 | 1.13E+08 | -9.7% |
| dav1d 1.1 | Video Input: Summer Nature 4K | FPS | 343.08 | 351.94 | 2.6% |
| Embree 4.0.1 | Binary: Pathtracer ISPC - Model: Crown | FPS | 17.66 | 17.97 | 1.8% |
| Embree 4.0.1 | Binary: Pathtracer - Model: Asian Dragon | FPS | 17.86 | 17.99 | 0.7% |
| SVT-AV1 1.4 | Encoder Mode: Preset 4 - Input: Bosphorus 4K | FPS | 3.965 | 3.588 | -9.5% |
| SVT-AV1 1.4 | Encoder Mode: Preset 12 - Input: Bosphorus 4K | FPS | 157.56 | 153.49 | -2.6% |
| OpenVKL 1.3.1 | Benchmark: vklBenchmark ISPC | Items / Sec | 230 | 236 | 2.6% |
| OSPRay 2.10 | Benchmark: particle_volume/ao/real_time | Items / Sec | 4.12599 | 3.7365 | -9.4% |
| OSPRay 2.10 | Benchmark: particle_volume/scivis/real_time | Items / Sec | 4.10773 | 3.72127 | -9.4% |
| Timed Godot Game Engine Compilation 4.0 | Time To Compile | Seconds | 296.48 | 298.96 | -0.8% |
| Timed Linux Kernel Compilation 6.1 | Build: defconfig | Seconds | 77.08 | 80.49 | -4.2% |
| Timed Linux Kernel Compilation 6.1 | Build: allmodconfig | Seconds | 948.63 | 987.26 | -3.9% |
| oneDNN 3.0 | Harness: IP Shapes 3D - Data Type: bf16bf16bf16 | ms | 2.36119 | 1.20382 | 96.1% |
| oneDNN 3.0 | Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 | ms | 2062.84 | 2219.87 | -7.1% |
| ClickHouse 22.12.3.5 | 100M Rows Hits Dataset, Third Run | Queries Per Minute, Geo Mean | 252.6 | 272.52 | 7.9% |
| nginx 1.23.2 | Connections: 1000 | Requests Per Second | 98019.87 | 94007.47 | -4.1% |
| Apache HTTP Server 2.4.56 | Concurrent Requests: 500 | Requests Per Second | 161549.1 | 185054.3 | 14.5% |
| Apache HTTP Server 2.4.56 | Concurrent Requests: 1000 | Requests Per Second | 134983.4 | 185660.7 | 37.5% |
| ASKAP 1.0 | Test: tConvolve MPI - Gridding | Mpix/sec | 6787.11 | 10495.8 | 54.6% |
| GROMACS 2023 | Implementation: MPI CPU - Input: water_GMX50_bare | ns/day | 1.547 | 1.767 | 14.2% |
| GIMP 2.10.34 | Test: resize | Seconds | 9.505 | 10.412 | -8.7% |
| GIMP 2.10.34 | Test: rotate | Seconds | 8.874 | 9.809 | -9.5% |
| GIMP 2.10.34 | Test: auto-levels | Seconds | 9.098 | 10.023 | -9.2% |
| GPAW 22.1 | Input: Carbon Nanotube | Seconds | 191.06 | 184.23 | 3.7% |
| Selenium | Benchmark: Kraken - Browser: Firefox | ms | 464 | 516.1 | -10.1% |
| Selenium | Benchmark: Jetstream 2 - Browser: Firefox | Score | 207.59 | 192.01 | -7.5% |
| Selenium | Benchmark: Kraken - Browser: Google Chrome | ms | 344.6 | 389.9 | -11.6% |
| Selenium | Benchmark: PSPDFKit WASM - Browser: Firefox | Score | 2799 | 2857 | 2.1% |
| Selenium | Benchmark: Jetstream 2 - Browser: Google Chrome | Score | 366.44 | 329.55 | -10.1% |
| Selenium | Benchmark: PSPDFKit WASM - Browser: Google Chrome | Score | 3121 | 2501 | -19.9% |
| Selenium | Benchmark: WASM imageConvolute - Browser: Firefox | ms | 17.7 | 20.1 | -11.9% |
| Selenium | Benchmark: WASM collisionDetection - Browser: Firefox | ms | 246 | 269.2 | -8.6% |
| Geomean | Above tests | | | | 4.0% |
| Geomean | All (Phoronix) | | 80.42 | 79.05 | -1.7% |

Source: https://www.phoronix.com/review/amd-ryzen-7-7800x3d-linux
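For reference, here is a sketch of how the Speedup column above appears to be derived. The convention is inferred from the listed figures, not stated by Phoronix: for lower-is-better units (Seconds, ms, days/ns) the ratio is 7700X / 7800X3D, and for higher-is-better units it is 7800X3D / 7700X. The helper names and the small sample of rows are mine.

```python
from math import prod

# Units in the table where a smaller number means a faster result.
LOWER_IS_BETTER = {"Seconds", "ms", "days/ns"}


def speedup(units: str, base: float, new: float) -> float:
    """Fractional speedup of `new` over `base`; positive means `new` is faster."""
    ratio = base / new if units in LOWER_IS_BETTER else new / base
    return ratio - 1.0


def geomean_speedup(rows) -> float:
    """Geometric mean of the per-test performance ratios, minus one."""
    ratios = [1.0 + speedup(units, base, new) for _, units, base, new in rows]
    return prod(ratios) ** (1.0 / len(ratios)) - 1.0


# A few rows from the table: (program, units, 7700X, 7800X3D).
sample = [
    ("LeelaChessZero 0.28", "Nodes per Second", 1539, 1588),
    ("CloverLeaf", "Seconds", 69.44, 39.39),
    ("NAMD 2.14", "days/ns", 1.48744, 1.61926),
    ("Selenium Kraken (Firefox)", "ms", 464, 516.1),
]

for program, units, base, new in sample:
    print(f"{program}: {speedup(units, base, new):+.1%}")
print(f"Geomean of this sample: {geomean_speedup(sample):+.1%}")
```

The sample geomean only covers these four rows, so it will not match either Geomean line in the table.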
 

bit_user

Theoretically, absolutely not. Practically, yes in some cases. The bigger cache removes some bottlenecks the 7700x has on fetching data, is that an increase in IPC? I'd say no but some people disagree.
Since we're talking about performance of the entire CPU, then IPC should include differences in the cache subsystem. Therefore, yes.

However, one could use microbenchmarks specifically designed to measure core performance, irrespective of L3 or memory bandwidth & latency. Then, you could talk about IPC of the core, itself.

If you upgrade from a 4070 to a 4090, did you increase your CPU's IPC just because it performs better? Naaah
What you're encountering is how application performance is used as an indirect proxy for IPC. It only works if one is careful to isolate the test from external variables like that. If you fail to eliminate such variables, then the test is no longer giving you a valid measure of IPC.
 
I get the older graphics. They will release an updated system based upon the laptop architecture for folks wanting a more powerful desktop graphics setup. What I do not get is why it's still only DDR5-5600, though. That constrains this pretty badly, and will age even worse.
If I'm not mistaken, the older graphics take up less room. Also, they're here as a backup: you're unlikely to game on them, and as such, a proven and stable architecture is better for OEMs, with no need to re-spin system images if chipset and GPU drivers remain the same.
If it's for desktop chips, probably because anything faster than DDR5-5600 on "standard" DDR5 DIMM mounts can be rather unstable. To my knowledge, AMD doesn't (yet) support any other RAM mount than DIMM on desktop systems.
 
Since we're talking about performance of the entire CPU, then IPC should include differences in the cache subsystem. Therefore, yes.
I think 3D V-Cache messes with the concept of IPC a bit for X3D parts since when the extra cache doesn't come into play the IPC is identical to the non-X3D parts. I certainly wouldn't make a proclamation that one had higher or lower IPC since it entirely depends on whether or not the extra cache matters for the application at hand.
I get the older graphics. They will release an updated system based upon the laptop architecture for folks wanting a more powerful desktop graphics setup. What I do not get is why it's still only DDR5-5600, though. That constrains this pretty badly, and will age even worse.
FWIW, the article is wrong: Zen 4 is officially DDR5-5200, so technically they've moved forward to 5600. That might be a simple availability limitation, as I haven't seen any 6000 JEDEC modules, and 6400 requires CKD, so volume likely isn't there.
DDR5 speed for Raptor Lake is actually something like DDR5-4400.
If you're curious about the memory support specs (reinforces to me why 2DPC needs to die):
[Image: memory support specifications]
 

bit_user

I think 3D V-Cache messes with the concept of IPC a bit for X3D parts since when the extra cache doesn't come into play the IPC is identical to the non-X3D parts. I certainly wouldn't make a proclamation that one had higher or lower IPC since it entirely depends on whether or not the extra cache matters for the application at hand.
News flash: there's no single IPC number for any CPU! I mean... unless you're going all the way back to classic benchmarks like Dhrystone MIPS, there isn't.

What AMD and Intel mean by IPC is that they benchmark a range of different apps at iso-frequency. Then, they basically take the median speedup and tout that as the IPC increase. If the new CPU made improvements to the cache subsystem, as Zen 5 has done, then whichever of those apps are most affected by those improvements will tend to show a bigger improvement than the others. However, cache very much is in play, here! How much it helps is really just app-specific.
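A minimal sketch of that procedure, as described in this post: score a set of apps on both CPUs at the same clock, take the per-app speedups, and report the median. The app names and scores below are made up purely for illustration, and whether a vendor actually uses the median or some other average is not something this thread establishes.

```python
from statistics import median


def ipc_uplift(old_scores: dict, new_scores: dict) -> float:
    """Median per-app speedup of the new CPU over the old one.

    Both CPUs are ASSUMED to be locked to the same frequency, and scores
    are higher-is-better, so the ratio isolates per-clock performance.
    """
    speedups = [new_scores[app] / old_scores[app] for app in old_scores]
    return median(speedups) - 1.0


# Hypothetical iso-frequency scores (higher is better).
old = {"app_a": 100.0, "app_b": 50.0, "app_c": 200.0}
new = {"app_a": 110.0, "app_b": 65.0, "app_c": 204.0}

uplift = ipc_uplift(old, new)
print(f"Median IPC uplift: {uplift:.1%}")
```

In this made-up data, the cache-sensitive "app_b" gains far more than the others, yet the headline number is set by the middle app. That is the sense in which how much a cache change helps is app-specific.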
 

NinoPino

Respectable
May 26, 2022
409
247
2,060
I get the older graphics. They will release an updated system based upon the laptop architecture for folks wanting a more powerful desktop graphics setup. What I do not get is why it's still only DDR5-5600, though. That constrains this pretty badly, and will age even worse.
Agreed, but we need to wait for X3D parts to see what is really memory constrained.
Maybe the extra cache also solves this problem, particularly considering that few workloads saturate DDR5-5600 bandwidth.
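For a sense of scale, theoretical peak bandwidth is just the transfer rate times the bus width. The sketch below assumes a 128-bit total memory bus (a typical dual-channel DIMM setup); that width is my assumption, not something stated in the article or this thread, and real sustained bandwidth is always lower than the peak.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: int, bus_width_bits: int = 128) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_width_bits=128 is an ASSUMED dual-channel (2 x 64-bit) configuration.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Speeds mentioned in this thread.
for speed in (4400, 5200, 5600):
    print(f"DDR5-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s peak")
```

Under that assumption, going from DDR5-5200 to DDR5-5600 is a bit under an 8% increase in peak bandwidth.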