News AMD Allegedly Testing Hybrid Processor with Zen 4 and 4c Cores

ezst036

I like the big.little concept the more I think about it. I used to dislike it very strongly.

It makes sense for a majority of work to be done on the most low power cores they can make, and only brute force whatever needs it at the time it needs it such as when gaming or using Blender. This is, to some extent, how ARM achieves its goals.

It's also a win for the manufacturers because of the needs of die space in a fab.
 

DaveLTX

I like the big.little concept the more I think about it. I used to dislike it very strongly.

It makes sense for a majority of work to be done on the most low power cores they can make, and only brute force whatever needs it at the time it needs it such as when gaming or using Blender. This is, to some extent, how ARM achieves its goals.

It's also a win for the manufacturers because of the needs of die space in a fab.
In particular, it's great if the ISA and cores are truly heterogeneous. In the case of ARM, that failed to be the case with the X2, A710 and A510, with only the A710 supporting 32-bit, IIRC.
With AMD, all cores support AVX-512, and since the caches take up the majority of the space, Zen 4c cuts the size down by nearly half compared to Zen 4 while delivering almost identical IPC, unless the workload relies heavily on the L3 cache. It's almost half the size!
(L2, the main thing increased with Zen 4, stays the same per core for Zen 4c.)

Where I'm guessing AMD is going is an 8-core Zen 4 CCD plus a 16-core Zen 4c CCD for the big AM5 options. Considering the massive IPC advantage Zen 4 has over Gracemont cores (especially with vector workloads), this could be a serious fight. There's also the fact that it has an L3, which Gracemont doesn't have but kind of has, except that it has to go all the way around the CPU to get to it. This is a situation where Gracemont cores truly are low-performance cores... (way below peak IPC, and they suck at power efficiency to begin with).
 
In particular, it's great if the ISA and cores are truly heterogeneous. In the case of ARM, that failed to be the case with the X2, A710 and A510, with only the A710 supporting 32-bit, IIRC.
With AMD, all cores support AVX-512, and since the caches take up the majority of the space, Zen 4c cuts the size down by nearly half compared to Zen 4 while delivering almost identical IPC, unless the workload relies heavily on the L3 cache. It's almost half the size!
(L2, the main thing increased with Zen 4, stays the same per core for Zen 4c.)

Where I'm guessing AMD is going is an 8-core Zen 4 CCD plus a 16-core Zen 4c CCD for the big AM5 options. Considering the massive IPC advantage Zen 4 has over Gracemont cores (especially with vector workloads), this could be a serious fight. There's also the fact that it has an L3, which Gracemont doesn't have but kind of has, except that it has to go all the way around the CPU to get to it. This is a situation where Gracemont cores truly are low-performance cores... (way below peak IPC, and they suck at power efficiency to begin with).
I would love to see a chip like that. All the cores support 2 threads, so that's 48 threads of Zen 4 power in a desktop chip. Wild.
 

bit_user

I think this shows AMD is worried about Meteor Lake.

Where I'm guessing AMD is going to take is 8 zen 4 CCD and 16 core zen 4c CCD for the big am5 options.
I think this is primarily aimed at laptops, where they'll probably go with a monolithic die for energy-efficiency reasons.

Then again, maybe their recent advancements in chiplet communication efficiency, that we saw in RDNA3, will enable them to use chiplets for mainstream laptop CPUs.
 

usertests

We've seen leaks of Zen 5 + Zen 4C chiplets for Granite Ridge many months ago. This thing, "Little Phoenix", was also leaked months ago.

With 2x Zen 4 (low clocks?) + 4x Zen 4C + unknown CUs (4?), it wouldn't be clearly better than some discounted Barcelo/Rembrandt options. I'm seeing it as a successor to Mendocino at the premium price points that Mendocino was supposed to target ($400-700), while Mendocino is relegated to sub-$250 laptops. I guess it could target the same TDPs as Mendocino, especially with a node shrink and other tricks to reduce power. Slash the cache on some cores and lower the clock speeds, and you get the very efficient 'C' version. AMD has already tried mixed cache with 7000X3D.

I think this will be monolithic, no chiplets.
 
Considering the massive ipc advantage zen 4 has over gracemont cores
IIRC, with Bergamo (Zen 4c in server CPUs), each core is supposed to have performance around Zen 3. Gracemont has performance of about Skylake. Zen 2's IPC was 7-10% higher than Skylake, and Zen 3 is another 15% over Zen 2. Based on that, we could expect Zen 4c's IPC to easily be 25-30% higher than Gracemont. I assume that Zen 4c would be easier to clock higher as well. This could be an interesting setup, as you would still have full-fat cores, just lower-performing ones. It would be more like the newer Arm big.LITTLE designs with 4 high-performance cores, where 2 are clocked higher than the other 2.
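
For anyone who wants to check the compounding, here is a rough sketch in Python. It only chains the percentages quoted in this post and takes the post's two equivalences at face value (Zen 4c roughly equals Zen 3, Gracemont roughly equals Skylake). The function name `compound_uplift` is made up for illustration.

```python
def compound_uplift(*uplifts: float) -> float:
    """Chain generational IPC gains multiplicatively.

    Each argument is a fractional gain (0.15 means +15%).
    Returns the total fractional gain over the starting point.
    """
    total = 1.0
    for gain in uplifts:
        total *= 1.0 + gain
    return total - 1.0


# Figures quoted in the post: Zen 2 is 7-10% over Skylake, Zen 3 is 15% over Zen 2.
# Assumption (from the post): Gracemont ~ Skylake and Zen 4c ~ Zen 3.
low = compound_uplift(0.07, 0.15)
high = compound_uplift(0.10, 0.15)
print(f"Estimated Zen 4c IPC lead over Gracemont: {low:.1%} to {high:.1%}")
```

With those inputs the chain lands at roughly 23% to 26.5%, so the quoted figures support the lower half of the 25-30% estimate; anything above that would need Zen 4c to beat Zen 3 IPC.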
 

TJ Hooker

In particular, it's great if the ISA and cores are truly heterogeneous. In the case of ARM, that failed to be the case with the X2, A710 and A510, with only the A710 supporting 32-bit, IIRC.
With AMD, all cores support AVX-512, and since the caches take up the majority of the space, Zen 4c cuts the size down by nearly half compared to Zen 4 while delivering almost identical IPC, unless the workload relies heavily on the L3 cache. It's almost half the size!
It seems like you mean "homogeneous", not "heterogeneous".
 

scottscholzpdx

Most things done on a computer need only modern Atom-level performance, but some things need some serious grunt. 4-8 high-powered Zen 4 cores plus plenty (16-32) of efficient cores with Zen 1-2 level performance is what I'd like in a desktop.
 

usertests

Most things done on a computer need only modern Atom-level performance, but some things need some serious grunt. 4-8 high-powered Zen 4 cores plus plenty (16-32) of efficient cores with Zen 1-2 level performance is what I'd like in a desktop.

That could be inevitable if Intel doubles down and eventually goes to 8 P-cores, 32-64 E-cores. 8 small cores can be ignored, 16 small cores is harder to ignore, 32+ is impossible to ignore. Everyone will have to optimize for that.

I think AMD should retain 16 big cores in the flagship, since it has offered that for 3 generations in a row. Granite Ridge could launch with up to 16x Zen 5 cores (2 chiplets), 16x Zen 4C cores (1 chiplet). Closer to the bottom of the stack you could get 6x Zen 5 + 12-16x Zen 4C.

If Granite Ridge isn't heterogeneous, offer up to 24x Zen 5 cores, lower the price per core, and call it a day.
 
That could be inevitable if Intel doubles down and eventually goes to 8 P-cores, 32-64 E-cores. 8 small cores can be ignored, 16 small cores is harder to ignore, 32+ is impossible to ignore. Everyone will have to optimize for that.
There is nothing to optimize.
Some workloads can use "infinite" cores, so they would be scheduled on all available cores, and some workloads can only use a very limited number of cores, so they would be scheduled on the big cores. There is nothing more they can do.
Any attempt to make workloads that use fewer cores use more cores is already being done anyway.
I think AMD should retain 16 big cores in the flagship, since it has offered that for 3 generations in a row. Granite Ridge could launch with up to 16x Zen 5 cores (2 chiplets), 16x Zen 4C cores (1 chiplet). Closer to the bottom of the stack you could get 6x Zen 5 + 12-16x Zen 4C.

If Granite Ridge isn't heterogeneous, offer up to 24x Zen 5 cores, lower the price per core, and call it a day.
Both of these options are pretty bad for AMD, because they would increase its costs by a lot and decrease the money it gets from each CPU. And since AMD is tied to TSMC, it's not like it can just easily produce a larger number of CPUs to compensate for the lower per-unit margin.
 

msroadkill612

AMD's Phoenix 2 APU may pack two Zen 4 cores, four Zen 4c cores.

Intel's move to big.LITTLE was mostly motivated by core-count envy.

I hate that they e.g. deceptively call 6+8 "14 core".

If AMD likes big.LITTLE too, I suspect it is for sounder reasons, and will have a better outcome.

AMD may have a problem of surplus cores - an 8 core minimum chiplet is more silicon than required for entry level processors. 4+4+gpu & ddr5 RAM sounds a nice balance for a frugal but powerful apu/mobile.
 

bit_user

There is nothing to optimize.
Some workloads can use "infinite" cores, so they would be scheduled on all available cores, and some workloads can only use a very limited number of cores, so they would be scheduled on the big cores. There is nothing more they can do.
Any attempt to make workloads that use fewer cores use more cores is already being done anyway.
With better APIs, programming languages, and OS support, there's room for a fair bit more concurrency in most software. Intel knows this, hence their investment in things like TBB (Threading Building Blocks) and Sapphire Rapids' new UIRET instruction.


4+4+gpu & ddr5 RAM sounds a nice balance for a frugal but powerful apu/mobile.
There are two reasons for E-cores:
  1. Battery conservation.
  2. Better perf/$.
When we're talking about laptops, a 4+4 setup is certainly enough E-cores for demotion of low-priority tasks to extend battery life. However, it's not enough to really deliver much more perf/$. For that, you need a significant chunk of die area to be E-cores. So, at least a 2:1 ratio, if not more.
 
Intel's move to big.LITTLE was mostly motivated by core-count envy.
I would say yes and no to this. Intel knows that their P cores are VERY power hungry and therefore they cannot make a 16c/32t CPU using all P cores and have it clock well enough AND not consume 1000W while under operation. Intel's biggest problem is that the Core uArch has not been that efficient beyond 4 cores. With Zen AMD designed it to be efficient with 8 cores. We see this in the performance scaling of the CPUs. Zen 4 doesn't get much more performance after it is at 105W TDP whereas Core keeps going with an increase in power.
 

usertests

There is nothing to optimize.
Some workloads can use "infinite" cores, so they would be scheduled on all available cores, and some workloads can only use a very limited number of cores, so they would be scheduled on the big cores. There is nothing more they can do.
Any attempt to make workloads that use fewer cores use more cores is already being done anyway.

Both of these options are pretty bad for AMD, because they would increase its costs by a lot and decrease the money it gets from each CPU. And since AMD is tied to TSMC, it's not like it can just easily produce a larger number of CPUs to compensate for the lower per-unit margin.

If there was nothing to optimize in terms of operating systems, games, and applications, everybody would already be very enthusiastic about Intel's E-cores.

A Zen 4C chiplet should be cheaper than Zen 5 and on an older node. Whether or not it costs "a lot" (maybe $25 per CPU), AMD will do it if they think they need it to compete with Intel. Granite Ridge is as good of a time as any to take away Intel's multi-threading lead.
 
Granite Ridge is as good of a time as any to take away Intel's multi-threading lead.
Right now the 13900K and 7950X will, more often than not, trade places with each other when it comes to MT performance. In Tom's review of the 13900K, the geometric mean of their multithreaded application tests had the 13900K at 97% of the 7950X's speed; however, the 13900K was 10% faster on average in ST work.
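
As a side note on how a figure like "97% as fast" is typically derived, here is a minimal sketch of a geometric mean over per-benchmark ratios. The ratios below are hypothetical placeholders made up for illustration, not numbers from Tom's review, and `overall_ratio` is just a name chosen here.

```python
from statistics import geometric_mean  # available since Python 3.8


def overall_ratio(per_test_ratios: list[float]) -> float:
    """Combine per-benchmark speed ratios (chip A / chip B) into one figure.

    The geometric mean is used so that a 2x win in one test and a 2x loss
    in another cancel out, which an arithmetic mean would not do.
    """
    return geometric_mean(per_test_ratios)


# Hypothetical ratios for three multithreaded tests (NOT real review data).
hypothetical_ratios = [0.90, 1.00, 1.05]
overall = overall_ratio(hypothetical_ratios)
print(f"Chip A is {overall:.1%} as fast as chip B overall")
```

The point is only that one lopsided benchmark moves a geometric mean less than it would move a plain average, so near-parity results like this one are fairly robust to outliers.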
 
With better APIs, programming languages, and OS support, there's room for a fair bit more concurrency in most software. Intel knows this, hence their investment in things like TBB (Threading Building Blocks) and Sapphire Rapids' new UIRET instruction.

If there was nothing to optimize in terms of operating systems, games, and applications, everybody would already be very enthusiastic about Intel's E-cores.
Yeah, these would fall under "they are doing this already anyway."
They aren't doing it because Intel might release a CPU with 32+ E-cores sometime in the future.
 

bit_user

I would say yes and no to this. Intel knows that their P cores are VERY power hungry and therefore they cannot make a 16c/32t CPU using all P cores and have it clock well enough AND not consume 1000W while under operation.
Thermal runaway aside, you'd expect doubling the core count to (no more than) double the power. 1000 W is overshooting the mark by at least 2x.

But, if you really wanted to see what a 16 P-core Alder Lake would be like, you can get some idea by looking at the Xeon W5-2465X.
  • 4.7 GHz Turbo Boost Max
  • 3.1 GHz Base Frequency
  • 200 W TDP (Max Turbo Power: 1.2x base power)
  • MSRP: $1389
Source: https://www.tomshardware.com/news/intel-xeon-w-3400-w-2400-cpu-launch-hedt-overclock

Of course, there are aspects such as its extra memory channels & PCIe lanes that hurt efficiency and push up pricing, but it's another data point on Golden Cove.

Anyway, based on their respective performance, I think we can conclusively say the decision to bring E-cores to the desktop is primarily about balancing out Alder Lake's perf/mm^2 (and therefore perf/$). Individually, they're about 54% to 60% as fast as a single-threaded P-core, but use only about 25% of the area. That's a huge win for multithreaded workloads. That's at least as much of the reason that Intel doubled-down on E-cores, in Raptor Lake, as the power aspect.
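
The perf/mm^2 argument can be put into numbers with a back-of-the-envelope sketch. It uses only the figures from the paragraph above (an E-core at 54% to 60% of P-core speed in about 25% of the area) and assumes a perfectly parallel workload, ignoring shared L2, ring traffic and clock differences. The function name is hypothetical.

```python
def perf_per_area(rel_perf: float, rel_area: float) -> float:
    """Throughput per unit of die area, relative to one P-core (which is 1.0)."""
    return rel_perf / rel_area


E_CORE_AREA = 0.25  # E-core area as a fraction of a P-core, per the post

for rel_perf in (0.54, 0.60):
    ratio = perf_per_area(rel_perf, E_CORE_AREA)
    # Equivalent view: four E-cores fit in one P-core's area.
    print(f"E-core at {rel_perf:.0%} of P-core speed: {ratio:.2f}x perf per area")
```

Under those assumptions, the area of one P-core spent on E-cores instead yields a bit more than twice the multithreaded throughput, which is the perf/$ case being made here.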

Intel's biggest problem is that the Core uArch has not been that efficient beyond 4 cores. With Zen AMD designed it to be efficient with 8 cores.
Golden Cove has a much larger reorder buffer (512 entries vs. 320 in Zen 4). Seems like it should scale better, leaving aside the matter of clocks and power.
 
1000 W is overshooting the mark by at least 2x.
That was meant as hyperbole; sadly, there isn't a good way to express that in text.

But, if you really wanted to see what a 16 P-core Alder Lake would be like, you need look no further than Xeon W5-2465X.
I wouldn't say that. It gives us the base frequency and TDP, but not the maximum power draw. We both know Intel would likely try to have it boost as far as possible and throw power draw out the window (see Intel's demo system with the 5000W chiller), so the actual boost TDP would be 500-ish W.

Golden Cove has a much larger reorder buffer (512 entries vs. 320 in Zen 4). Seems like it should scale better, leaving aside the matter of clocks and power.
I'm not following what you are getting at here, as my original comment was dealing with power. Core does a good job of scaling performance with additional power draw, whereas Zen 4 levels off VERY quickly at 105W. That said, the Core CPU needs 2x the TDP for the same performance, which goes hand in hand with my statement that Core isn't efficient.
 

DaveLTX

IIRC, with Bergamo (Zen 4c in server CPUs), each core is supposed to have performance around Zen 3. Gracemont has performance of about Skylake. Zen 2's IPC was 7-10% higher than Skylake, and Zen 3 is another 15% over Zen 2. Based on that, we could expect Zen 4c's IPC to easily be 25-30% higher than Gracemont. I assume that Zen 4c would be easier to clock higher as well. This could be an interesting setup, as you would still have full-fat cores, just lower-performing ones. It would be more like the newer Arm big.LITTLE designs with 4 high-performance cores, where 2 are clocked higher than the other 2.
Possibly more than that. Zen 3 IPC might be understating it. If the L3 cache is very heavily used, as in HPC, gaming or similar, then yes, it is Zen 3 IPC, but there are many everyday situations where more cache does not improve performance.
It seems like you mean "homogeneous", not "heterogeneous".
Referring to ARM: heterogeneous means two different architectures that share the ISA. On a technical level, Zen 4c is "different," but it is more optimized for lower clocks.

That was meant as hyperbole; sadly, there isn't a good way to express that in text.


I wouldn't say that. It gives us the base frequency and TDP, but not the maximum power draw. We both know Intel would likely try to have it boost as far as possible and throw power draw out the window (see Intel's demo system with the 5000W chiller), so the actual boost TDP would be 500-ish W.


I'm not following what you are getting at here, as my original comment was dealing with power. Core does a good job of scaling performance with additional power draw, whereas Zen 4 levels off VERY quickly at 105W. That said, the Core CPU needs 2x the TDP for the same performance, which goes hand in hand with my statement that Core isn't efficient.
Golden Cove is a big fat core for HPC and doesn't scale down well at low power. Zen 4 is not a big fat core, but somehow manages to hang in there with Golden Cove.
I assume the base frequency is given for AVX-512 situations, given how much of the core AVX-512 lights up and how much power it draws; in server situations, TDP is heavily limited.

There is nothing to optimize.
Some workloads can use "infinite" cores, so they would be scheduled on all available cores, and some workloads can only use a very limited number of cores, so they would be scheduled on the big cores. There is nothing more they can do.
Any attempt to make workloads that use fewer cores use more cores is already being done anyway.

Both of these options are pretty bad for AMD, because they would increase its costs by a lot and decrease the money it gets from each CPU. And since AMD is tied to TSMC, it's not like it can just easily produce a larger number of CPUs to compensate for the lower per-unit margin.
Actually, heavy customization by Intel is also going to be bad for their IDF model. Intel is still having trouble because it used to depend on the fab to fix its issues rather than verifying that the design works (the IDF model means the design team is unable to do that anymore, at least on the same scale).
That said, 4x Zen 4c cores will deliver near-Zen 4 IPC unless the workload is VERY cache dependent (meaning it goes into L3), and they're more power optimized (the aim was Bergamo, which was designed not to go high on clocks, unlike Zen 4, whose aim is HPC) while being almost half the size of Zen 4 cores. The only key difference is the slashed L3 cache.
While Gracemont may be more area efficient (1 Golden Cove = 4 Gracemont), Zen 4 is actually smaller than Golden Cove anyway, and Zen 4 is more power/performance efficient than either Golden Cove or Gracemont. Gracemont very much isn't: if you have to keep the cores on longer because the IPC is poor and the power draw is still middling, it ends up using quite a lot more energy in joules than expected.
Not to forget that Intel bakes L3 into Golden Cove but not Gracemont, which means that any time the Gracemont cores are used, data has to exit the Gracemont cores, light up the whole ring, and be looked up in the Golden Cove L3, because it's not a true system-level cache. Uncore power draw is also significant.
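
The joules point above is just energy = average power x runtime, sketched below. The wattages and runtimes are hypothetical values picked purely to illustrate the shape of the argument; they are not measurements of Gracemont, Golden Cove or Zen 4, and `energy_joules` is an illustrative name.

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy used by a task: average power multiplied by time (1 W * 1 s = 1 J)."""
    return avg_power_watts * runtime_seconds


# Hypothetical cores running the same task (made-up numbers, for illustration only):
# a faster core drawing more power but finishing sooner,
# and a slower core drawing less power but staying on twice as long.
fast_core = energy_joules(avg_power_watts=8.0, runtime_seconds=10.0)
slow_core = energy_joules(avg_power_watts=5.0, runtime_seconds=20.0)

print(f"fast core: {fast_core:.0f} J, slow core: {slow_core:.0f} J")
```

In this made-up case the lower-power core still burns more energy per task, because its runtime grows faster than its power draw shrinks. Uncore power that stays on for the whole runtime would widen the gap further.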
 
Core does a good job of scaling performance with additional power draw, whereas Zen 4 levels off VERY quickly at 105W.
No it doesn't.
First off, 105W is 145W of actual power used.
Secondly, it doesn't level off there; it's just that heat gets out of control at that point, preventing the CPU from going any higher. Throw a bit of liquid nitrogen at the issue and you go to 337W with a decent amount of performance increase.

Just because Intel's node is more mature and can withstand the punishment of overclocking much more easily, meaning that it can reach the same ~330W as Ryzen on normal cooling instead of needing LN2, doesn't mean what you think it means.

Yes, Ryzen is much more efficient at CB, but heavy multithreading isn't the only thing there is.
 