News AMD dishes more Zen 5 details — Compact core is 25% smaller than the normal core, new SoC and chip architecture with dual CCXs

News flash: there's no single IPC number for any CPU! I mean... unless you're going all the way back to classic benchmarks like Dhrystone MIPS, there isn't.

What AMD and Intel mean by IPC is that they benchmark a range of different apps at iso-frequency. Then, they basically take the median speedup and tout that as the IPC increase. If the new CPU made improvements to the cache subsystem, as Zen 5 has done, then whichever of those apps are most affected by those improvements will tend to show a bigger improvement than the others. However, cache very much is in play, here! How much it helps is really just app-specific.
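To make that concrete, the arithmetic behind a headline figure is roughly this. The app names and scores below are completely made up, just to show the method:

```python
from statistics import geometric_mean, median

# Hypothetical per-app scores, measured on both chips at the same fixed clock
# (iso-frequency). The app names and numbers are made up to show the method.
old_cpu = {"app_a": 100, "app_b": 80, "app_c": 150, "app_d": 60}
new_cpu = {"app_a": 112, "app_b": 96, "app_c": 160, "app_d": 81}

speedups = [new_cpu[app] / old_cpu[app] for app in old_cpu]

# The headline "IPC uplift" is one aggregate of these per-app gains; the
# cache-sensitive apps pull it up more than the others.
print(f"per-app uplifts: {[f'{s - 1:.0%}' for s in speedups]}")
print(f"median uplift:   {median(speedups) - 1:.1%}")
print(f"geomean uplift:  {geometric_mean(speedups) - 1:.1%}")
```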
I'm well aware of how IPC works, but these are an anomaly in that you have two different CPUs that are architecturally identical, but one has extra cache added on at the cost of clockspeed. That means the IPC is identical in every circumstance that doesn't leverage the extra cache capacity. You could certainly make an argument for saying X3D are higher IPC, but this is also a circumstance where IPC is a completely useless metric.
 
Not when the whole cache isn't needed. :)
You are right.

Does 7800X3D have more IPC than the 7700X? :O
The 7800X3D has IPC >= that of the 7700.
AMD said that Zen 4c has the same IPC as Zen 4, but the extra cache can increase the IPC in some cases, and considering that the other parts of the CPU are identical, the IPC can only be greater or equal in the worst case.

Theoretically, absolutely not. Practically, yes in some cases. The bigger cache removes some bottlenecks the 7700X has on fetching data; is that an increase in IPC? I'd say no, but some people disagree.
Absolutely yes.
The definition of IPC for the whole CPU says that. If you consider only the core without the extra cache, the IPC is the same (source: AMD).

If you upgrade from a 4070 to a 4090, did you increase your CPU's IPC just because it performs better? Naaah
We are talking about the CPU's IPC, not the performance of the whole system.
IPC only makes sense if we consider a single core.

I think 3D V-Cache messes with the concept of IPC a bit for X3D parts since when the extra cache doesn't come into play the IPC is identical to the non-X3D parts. I certainly wouldn't make a proclamation that one had higher or lower IPC since it entirely depends on whether or not the extra cache matters for the application at hand.
The IPC definition is very simple; it's people who mess with it.
It depends on the caching system and memory speed, not only on the pure core IPC.
In the case of X3D, we need to consider the whole L1+L2+L3+memory hierarchy.
 
but one has extra cache added on at the cost of clockspeed.
The clockspeed penalty is what makes IPC a little more complicated. The conceit of IPC is that you can just scale up by clockspeed to get final performance. However, if clockspeed is being affected, then you need to remember that even though the X3D is a higher-IPC part, that doesn't necessarily make it faster.

This isn't so strange as it might first seem. Consider that people would compare wildly different CPUs' IPC - such as Apple A-series or M-series SoCs vs. x86 - using Geekbench or SPEC2017. That's how they concluded Apple's M1 had soundly beaten x86 on IPC, although you had to keep in mind that it couldn't clock as high. That doesn't invalidate the concept of IPC - or clock-normalized performance, as it should more properly be called.

this is also a circumstance where IPC is a completely useless metric.
That's an intrinsic property of being workload-specific.
 
considering that the other parts of the CPU are identical, the IPC can only be greater or equal in the worst case.
In practice, I expect this will nearly always be true. However, caches always have a pitfall where you can encounter thrashing. A larger cache can potentially make thrashing even more painful.

Another scenario where larger cache could lead to lower clock-normalized performance is when the additional power consumed by the extra cache causes you to drop down the turbo curve further/sooner. I think this probably explains why the EPYC 9654 would occasionally outperform the EPYC 9684X, even though the latter had a higher base clock and they both had the same boost clock.

At this point, we're down in the footnotes. Overall, I agree with your statement.
 
... That means the IPC is identical in every circumstance that doesn't leverage the extra cache capacity. You could certainly make an argument for saying X3D are higher IPC, but this is also a circumstance where IPC is a completely useless metric.
It is not useless at all, because a lot of software leverages a larger L3.
 
This has been touched on here but I think a formula might be helpful. For a single core and a single thread:
IPC × frequency = performance

I've seen elsewhere people talking about IPC as though it were single-threaded performance, and even some tech reviewers have said that Intel's Meteor Lake has much lower IPC than Raptor Lake. But I've never seen anyone normalize frequency when showing data for this claim. From what we know, the Zen 5c cores are probably going to perform worse because they have a lower maximum frequency, and they may also perform worse in some scenarios because they have less cache. (At least for Strix Point they have less cache, that might not be true in all products Zen 5c appears in.)
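To illustrate, here's a minimal sketch of that normalization. The scores and clocks are placeholders I picked to mirror the Meteor Lake situation, not measured data:

```python
# IPC x frequency = performance, so clock-normalized performance ("IPC") is
# just score / frequency. The scores and clocks below are placeholders, not
# real benchmark results.
def clock_normalized(score: float, freq_ghz: float) -> float:
    return score / freq_ghz

old_score, old_clock = 2900, 5.0   # hypothetical older chip, 5.0 GHz max boost
new_score, new_clock = 2850, 4.8   # hypothetical newer chip, 200 MHz lower boost

raw_delta = new_score / old_score - 1
ipc_delta = clock_normalized(new_score, new_clock) / clock_normalized(old_score, old_clock) - 1

print(f"raw single-thread delta: {raw_delta:+.1%}")   # looks like a regression
print(f"clock-normalized delta:  {ipc_delta:+.1%}")   # actually a small IPC gain
```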
 
But I've never seen anyone normalize frequency when showing data for this claim.
I've definitely seen some reviewers and even Intel or AMD pick a common frequency, for measuring the IPC difference between CPUs.

The other way to do it is to pick fixed frequencies on each and divide by them, although this is less optimal since performance doesn't scale quite linearly with frequency.

From what we know, the Zen 5c cores are probably going to perform worse because they have a lower maximum frequency,
Let's say you have a 16-core CPU with a base clock of 3.2 GHz. That means the worst-case, multi-threaded load should run at an all-core clock of at least 3.2 GHz. If you then replace some of your cores with C-cores that can clock at least as high as 3.2 GHz, then your worst-case performance shouldn't be negatively impacted by some of them being C-cores (if we leave aside the issue of less L3 cache, anyway).

That's the basic idea behind using C-cores in a hybrid configuration.

At least for Strix Point they have less cache, that might not be true in all products Zen 5c appears in.
I'll bet it is, though. Not the 1 MiB per core of Strix Point, but probably the 2 MiB per core that we saw in Bergamo.
 
I've definitely seen some reviewers and even Intel or AMD pick a common frequency, for measuring the IPC difference between CPUs.
Yeah, I think AMD and Intel are usually pretty honest about IPC, and all the quality IPC tests I've seen reviewers do have confirmed AMD's and Intel's claims. The furthest off was when AMD claimed a 40% IPC increase for Zen 1 and reviewers found it to be over 50% in their tests.

But when Meteor Lake came out and reviews showed slightly lower single-thread performance for the 155H than the 1360P, some people said it was an IPC regression and very few seemed to notice that the 155H had a 200MHz lower maximum clock speed.
 
Let's say you have a 16-core CPU with a base clock of 3.2 GHz. That means the worst-case, multi-threaded load should run at an all-core clock of at least 3.2 GHz. If you then replace some of your cores with C-cores that can clock at least as high as 3.2 GHz, then your worst-case performance shouldn't be negatively impacted by some of them being C-cores (if we leave aside the issue of less L3 cache, anyway).

That's the basic idea behind using C-cores in a hybrid configuration.
I'd even go so far as to say that in the best case the Zen 5c cores will perform better than the Zen 5 cores, since 5c is more efficient than the standard core at lower frequencies, and in highly threaded workloads all the cores have to clock down since there's not enough power and thermal management to go around. Similarly with Skymont and Lion Cove, in highly threaded workloads the two will probably do better together than a pure Lion Cove CPU would, even with Skymont being a little slower.
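A toy version of that argument, treating throughput as cores × all-core clock at equal IPC. The clocks here are invented purely to illustrate the tradeoff, not real figures for any part:

```python
# Toy throughput comparison under a fixed power budget. AMD says the compact
# core has the same IPC as the standard core, so aggregate throughput scales
# roughly with cores x all-core clock. The clocks below are invented purely
# to illustrate the tradeoff, not measurements of real parts.
def throughput(cores: int, all_core_ghz: float) -> float:
    return cores * all_core_ghz

pure_big = throughput(16, 4.2)                        # 16 standard cores
hybrid   = throughput(8, 3.4) + throughput(16, 3.4)   # 8 standard + 16 compact,
                                                      # all clocked lower because
                                                      # more cores share the budget

print(f"hybrid vs. pure big-core: {hybrid / pure_big - 1:+.0%}")   # ~+21% in this toy case
```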
 
You're gonna need a bigger boat socket.

🦈
I want that too. Desktops are becoming more niche, overshadowed by laptops. Corporate mini PCs, all-in-ones, etc. could use mobile chips instead of desktop chips. So I think desktop sockets should grow to accommodate larger, more premium chips. Even something like Strix Halo could probably fit onto AM5 if the socket were 20% larger.

Threadripper has been crammed onto Micro-ATX; something a little larger than AM5 could probably fit on Mini-ITX. Some form of CAMM memory could replace DIMMs, but it's too early to tell, and I don't know if it takes up more horizontal space.
 
The clockspeed penalty is what makes IPC a little more complicated. The conceit of IPC is that you can just scale up by clockspeed to get final performance. However, if clockspeed is being affected, then you need to remember that even though the X3D is a higher-IPC part, that doesn't necessarily make it faster.

This isn't so strange as it might first seem. Consider that people would compare wildly different CPUs' IPC - such as Apple A-series or M-series SoCs vs. x86 - using Geekbench or SPEC2017. That's how they concluded Apple's M1 had soundly beaten x86 on IPC, although you had to keep in mind that it couldn't clock as high. That doesn't invalidate the concept of IPC - or clock-normalized performance, as it should more properly be called.
The clockspeed penalty is what makes it a useless metric for the X3D parts when comparing them to non-X3D. By the logic you're applying here a 13900K would have higher IPC than a 13600k (as it has 50% more L3 which can be leveraged for lower threaded workloads), but does saying that actually mean anything useful?

Let's take it from another angle: AMD and Intel love using IPC whether or not it's applicable to what they're referring to, and guess what's not in any of the X3D slide decks.

It's totally fine that you think it's useful; I just don't.
 
when Meteor Lake came out and reviews showed slightly lower single-thread performance for the 155H than the 1360P, some people said it was an IPC regression and very few seemed to notice that the 155H had a 200MHz lower maximum clock speed.
These are the type who will toss around jargon terms to make it sound like they know what they're talking about, but without putting in the effort to properly understand them, first.

Anyone who said that is doing you a favor: they are telling you they care more about appearances than actually knowing what they're talking about, and probably care more about followers than not leading people astray. So, heed their warning! Don't follow them, if you don't want to be led astray!
: )

I'd even go so far as to say that in the best case the Zen 5c cores will perform better than the Zen 5 cores, since 5c is more efficient than the standard core at lower frequencies, and in highly threaded workloads all the cores have to clock down since there's not enough power and thermal management to go around.
You sometimes hear them called "throughput cores", because their smaller size & greater efficiency mean a larger number of them can fit in the same area or power budget to better tackle large (but separable) problems.

Similarly with Skymont and Lion Cove, in highly threaded workloads the two will probably do better together than a pure Lion Cove CPU would, even with Skymont being a little slower.
Yes, it's the same idea.
 
Threadripper has been crammed onto Micro-ATX; something a little larger than AM5 could probably fit on Mini-ITX. Some form of CAMM memory could replace DIMMs, but it's too early to tell, and I don't know if it takes up more horizontal space.
AMD actually introduced a smaller server CPU socket (SP6), last year, for their EPYC 8000 series CPUs. The ThreadRipper 7000 series' sTR5 socket is derived from this.

Good move, but it was still 4844 pins and supported up to 8 DIMM channels and up to 128 PCIe lanes. I think that's still too much, because only their ThreadRipper Pro CPUs could actually use all of that. They'd have been better off leaving those on SP5 (EPYC Genoa socket) and making sTR5 even smaller, in order to lower platform costs and make the lower-end ThreadRipper models more accessible.

However, I guess by diverging from SP6, they might lose more than they gain, in terms of platform costs. The TRX50 boards already didn't wire up many of the pins in sTR5, so maybe the savings wouldn't have been worthwhile.
 
The clockspeed penalty is what makes it a useless metric for the X3D parts when comparing them to non-X3D. By the logic you're applying here a 13900K would have higher IPC than a 13600k (as it has 50% more L3 which can be leveraged for lower threaded workloads), but does saying that actually mean anything useful?
Yes, that is true. I think it's also true that the R9 7900X probably has higher IPC than R9 7950X, on some workloads, because AMD leaves all 32 MiB of L3 enabled on their CCDs, giving the 7600X and 7900X more cache per core than CPU models with fully-occupied CCDs. So, if you're measuring multi-threaded IPC, at a constant clockspeed that each CPU can hit, then I'd expect it to be a little higher for the 6-core and 12-core models.
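To spell out the cache-per-core point (core and CCD counts per AMD's specs, with each Zen 4 CCD keeping its full 32 MiB of L3):

```python
# L3 per core for Zen 4 desktop parts: every CCD keeps its full 32 MiB of L3
# regardless of how many cores are enabled on it.
L3_PER_CCD_MIB = 32

parts = {            # (active cores, CCDs)
    "7600X": (6, 1),
    "7700X": (8, 1),
    "7900X": (12, 2),
    "7950X": (16, 2),
}

for name, (cores, ccds) in parts.items():
    print(f"{name}: {ccds * L3_PER_CCD_MIB / cores:.2f} MiB of L3 per core")
```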

It's totally fine that you think it's useful; I just don't.
I think IPC is most useful when using it to characterize a microarchitecture. Minor variations within the model lineup aren't so interesting, particularly due to IPC being a rather coarse metric, not highly predictive, and not totally separable from clock speed (i.e. due to sub-linear frequency scaling).

However, I think this has been a worthwhile exercise to remind us what factors can influence it and therefore how much/little to read into it. Heck, graphs like this should be reminder enough of that:

[Chart: Zen 5 vs. Zen 4 IPC uplift by workload]


If you're thinking "hey, Zen 5 gives me 16% more IPC!", you'll be disappointed when you fire up Far Cry 6 and get only a 10% improvement. Or, maybe you'll underestimate the CPU's efficiency on workloads like the GeekBench one that delivered 35% IPC improvement!

So, it's a coarse metric of relatively limited value, but still interesting and worth looking at for tracking the development & sophistication of microarchitectures.
 
Good move, but it was still 4844 pins and supported up to 8 DIMM channels and up to 128 PCIe lanes. I think that's still too much, because only their ThreadRipper Pro CPUs could actually use all of that. They'd have been better off leaving those on SP5 (EPYC Genoa socket) and making sTR5 even smaller, in order to lower platform costs and make the lower-end ThreadRipper models more accessible.
Intel seems to be doing the same thing with GNR (and thus, I'd assume, into the future), which I'm a bit disappointed by. We're going to have 3 sockets again, but it's desktop, big, and giant instead of there being a middle-ground socket. I would love to see something in the 2000-3000 pin range again, where they could move up to quad-channel memory and more PCIe than desktop without blowing the entire budget.
 
I'd even go so far as to say that in the best case the Zen 5c cores will perform better than the Zen 5 cores, since 5c is more efficient than the standard core at lower frequencies, and in highly threaded workloads all the cores have to clock down since there's not enough power and thermal management to go around. Similarly with Skymont and Lion Cove, in highly threaded workloads the two will probably do better together than a pure Lion Cove CPU would, even with Skymont being a little slower.
I was under the impression that Zen 4c was only more efficient than Zen 4 in a narrow range of frequencies. Actually, it's from this article here:

https://www.tomshardware.com/news/amd-phoenix-2-review-evaluates-zen-4-zen-4c-performance
Zen 4c needs a higher core voltage to reach the same clock speeds as Zen 4. The VID (voltage identification definition) charts revealed that Zen 4 hits the Vmin (the minimal voltage that a processor requires for a workload at a particular frequency) at 2.3 GHz. In contrast, Zen 4c arrives at the Vmin below 1.5 GHz. The V/F (voltage-to-frequency) curve for both cores overlaps at 1.5 GHz. Zen 4c's power efficiency resides in between 1.5 GHz and 2 GHz. Zen 4c consumes less power despite the higher recorded voltage due to the more compact design.

Considering that Zen 4c and likely Zen 5c can get up to around 3.5 GHz, and that kind of clock can be reached in a multithreaded workload, the efficiency story doesn't seem that impressive.

I just checked the 8540U and the Zen 4c base clock isn't even in that 1.5 to 2 GHz range. It's 3.0 GHz, unless that's a mistake by AMD.

Zen 4c and Zen 5c have good performance-per-area, and they sacrifice higher clock speeds that they don't need to hit in mobile anyway, leaving those to the smaller number of full size cores. Their actual performance-per-watt is not that impressive, especially at their max frequencies. It's just using the die area more wisely. That could even be the case on a hypothetical hybrid desktop CPU, where 8 Zen 5 + 16 Zen 5c cores are almost always going to outperform 16 Zen 5 cores in a highly threaded workload.
 
By the logic you're applying here a 13900K would have higher IPC than a 13600k (as it has 50% more L3 which can be leveraged for lower threaded workloads), but does saying that actually mean anything useful?
I think that does mean something. The 13600K has a 12% lower turbo clock and 33% less cache but is only like 10% slower in games. It even matches the 7700X which has more L3 cache. Evidently Intel's Golden/Raptor Cove cores aren't as affected by L3 in games as Zen 4 is. Maybe because its L2 cache is twice as big or because of what's chosen for the cache.
 
I think that does mean something. The 13600K has a 12% lower turbo clock and 33% less cache but is only like 10% slower in games. It even matches the 7700X which has more L3 cache. Evidently Intel's Golden/Raptor Cove cores aren't as affected by L3 in games as Zen 4 is. Maybe because its L2 cache is twice as big or because of what's chosen for the cache.
It's largely the architecture itself. HUB did a video during 10th gen which showed clear gains that appeared to be related to cache in gaming, but their updated test for RPL didn't show anywhere near the difference: https://www.techspot.com/review/2845-intel-3d-vcache/

Of course, with cache there is also an inflection point to be had. When you get to the really big cache sizes, the advantage is that less of the workload hits system memory. I imagine if Intel was able to add on an extra 64MB there would be much larger gains than the 12MB difference between the 13600K and 13900K.
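One way to see that inflection point is a toy average-memory-access-time model. The latencies and miss rates below are illustrative, not measurements:

```python
# Toy AMAT model for accesses that reach L3:
#   AMAT = L3_hit_latency + L3_miss_rate * DRAM_penalty
# Latencies and miss rates are illustrative, not measured values.
def amat_ns(l3_hit_ns: float, l3_miss_rate: float, dram_penalty_ns: float = 80.0) -> float:
    return l3_hit_ns + l3_miss_rate * dram_penalty_ns

small_l3 = amat_ns(l3_hit_ns=10, l3_miss_rate=0.40)  # working set spills to DRAM often
big_l3   = amat_ns(l3_hit_ns=12, l3_miss_rate=0.10)  # most of the working set now fits

print(f"small L3: {small_l3:.0f} ns, big L3: {big_l3:.0f} ns")
# Once the working set mostly fits, making the cache even bigger barely moves
# the needle -- that's the inflection point.
```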
 
I was under the impression that Zen 4c was only more efficient than Zen 4 in a narrow range of frequencies. Actually, it's from this article here:

https://www.tomshardware.com/news/amd-phoenix-2-review-evaluates-zen-4-zen-4c-performance

Considering that Zen 4c and likely Zen 5c can get up to around 3.5 GHz, and that kind of clock can be reached in a multithreaded workload, the efficiency story doesn't seem that impressive.
I think you glossed over a key detail, at the end of that quote: "Zen 4c consumes less power despite the higher recorded voltage due to the more compact design." So, even if it needs higher voltage to hit a certain frequency than the non-C core, that doesn't mean it needs more power or is therefore less efficient!

Higher voltage only translates to higher power, when you're talking about the same core. You could run a different core at higher voltage and still find it uses less power (or, conversely, uses more power at a lower voltage).
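The textbook switching-power relation makes the point. The capacitance and voltage numbers below are invented to illustrate it, not real Zen 4 / Zen 4c figures:

```python
# Dynamic switching power: P ~ activity * C * V^2 * f. A physically smaller
# core switches less capacitance, so it can need a higher voltage at a given
# clock and still draw less power. C and V values here are invented.
def dynamic_power(capacitance: float, voltage: float, freq_ghz: float, activity: float = 1.0) -> float:
    return activity * capacitance * voltage**2 * freq_ghz

big_core     = dynamic_power(capacitance=1.00, voltage=0.90, freq_ghz=3.0)
compact_core = dynamic_power(capacitance=0.70, voltage=1.00, freq_ghz=3.0)

print(f"compact vs. big core power at the same clock: {compact_core / big_core:.2f}x")  # ~0.86x here
```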

I just checked the 8540U and the Zen 4c base clock isn't even in that 1.5 to 2 GHz range. It's 3.0 GHz, unless that's a mistake by AMD.
Uh, you know what baseclock actually means, right? It's the guaranteed minimum all-core clock speed for a CPU. It's therefore specific to the CPU type, not the core type, and it also means that the core type must go at least that high.

Zen 4c and Zen 5c have good performance-per-area, and they sacrifice higher clock speeds that they don't need to hit in mobile anyway, leaving those to the smaller number of full size cores. Their actual performance-per-watt is not that impressive, especially at their max frequencies.
First of all, servers also tend to run at lower clockspeeds. The peak boost clock of the flagship EPYC Genoa 9654 CPU is only 3.7 GHz!

Secondly, how do you know their perf/W is worse? That's not consistent with Phoronix' benchmarks of Bergamo and Siena. If you look at the Geomean and whole-test power stats, the 9754 (Bergamo) beats the 9654 (Genoa) at iso-power (configured) of 360W by 19.9%, while using only 86.4% as much power!
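Putting those two quoted figures together, the perf/W gap is even bigger than either number alone suggests:

```python
# From the Phoronix figures quoted above: 1.199x the geomean performance at
# 0.864x the energy over the test run.
perf_ratio, power_ratio = 1.199, 0.864
print(f"perf/W advantage: {perf_ratio / power_ratio:.2f}x")   # ~1.39x
```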

It's just using the die area more wisely.
Area matters, too. That's most directly coupled to perf/$.
 
I think you glossed over a key detail, at the end of that quote: "Zen 4c consumes less power despite the higher recorded voltage due to the more compact design." So, even if it needs higher voltage to hit a certain frequency than the non-C core, that doesn't mean it needs more power or is therefore less efficient!

Higher voltage only translates to higher power, when you're talking about the same core. You could run a different core at higher voltage and still find it uses less power (or, conversely, uses more power at a lower voltage).
I made no claim about voltage. According to that quote, outside of 1.5 GHz to 2 GHz, Zen 4c is not more efficient than Zen 4.

I mention the base clock because the CPU will often be running those cores above 2 GHz, unless it's idling and not using much power anyway.

https://www.anandtech.com/show/2111...s-with-zen-4c-smaller-cores-bigger-efficiency
[Chart: AMD slide comparing power efficiency of a hybrid Zen 4 + Zen 4c configuration vs. Zen 4 only]


Here's a graph AMD produced. The hybrid configuration is a little more efficient between 10W and 17.5W, but the advantage is already mostly gone by 15W.
 
I made no claim about voltage.
You quoted something that made it sound as if that's what you were latching onto. Anyway, thanks for clearing it up.

According to that quote, outside of 1.5 GHz to 2 GHz, Zen 4c is not more efficient than Zen 4.
That's only a problem if you run all cores at the same frequency. If you instead keep each core at a comparable point on its efficiency curve, a scalable workload will run more efficiently on more cores than on fewer, since every core type loses efficiency as you turn up the frequency. Having more cores means each core gets less power, which keeps it better inside its efficiency window.

Anyway, that might mean running the regular cores at 500 MHz faster than the C cores. So, the thread scheduling disparity doesn't completely disappear, but there's still more aggregate perf/W.

Here's a graph AMD produced. The hybrid configuration is a little more efficient between 10W and 17.5W, but the advantage is already mostly gone by 15W.
That's obvious. The 2x Zen 4 + 4x Zen 4c configuration has a lower performance ceiling than 6x Zen 4. So, as you push them both to higher performance levels, the former becomes less efficient sooner, because that's what happens to every CPU as it nears its performance ceiling.
 
This has been touched on here but I think a formula might be helpful. For a single core and a single thread:
IPC × frequency = performance

I've seen elsewhere people talking about IPC as though it were single-threaded performance, and even some tech reviewers have said that Intel's Meteor Lake has much lower IPC than Raptor Lake. But I've never seen anyone normalize frequency when showing data for this claim. From what we know, the Zen 5c cores are probably going to perform worse because they have a lower maximum frequency, and they may also perform worse in some scenarios because they have less cache. (At least for Strix Point they have less cache, that might not be true in all products Zen 5c appears in.)
Thanks for restating that in a simple way. I'll tack onto this and add that IPC is 0 whenever the CPU runs out of data to operate on. X3D chips tend to hit 0 IPC less often, and therefore usable average IPC is higher on some workloads even though theoretical max IPC is the same as non-X3D.
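A minimal sketch of that averaging effect; the instruction and cycle counts are made up just to show the mechanics:

```python
# Average IPC over a run is retired instructions / total cycles, where total
# cycles include the ones spent stalled on memory. Counts below are made up.
def average_ipc(instructions: int, busy_cycles: int, stall_cycles: int) -> float:
    return instructions / (busy_cycles + stall_cycles)

without_vcache = average_ipc(instructions=1_000_000, busy_cycles=250_000, stall_cycles=250_000)
with_vcache    = average_ipc(instructions=1_000_000, busy_cycles=250_000, stall_cycles=100_000)

# Same peak IPC while executing, but fewer stalled cycles lifts the average.
print(f"average IPC: {without_vcache:.1f} vs. {with_vcache:.1f}")   # 2.0 vs. ~2.9
```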