News AMD Phoenix 2 CPU Die Shot Seemingly Shows Zen 4, Zen 4c Cores

Huh - I wasn't sure, but it does seem AMD managed to fit 2 Zen4c cores in about the same area as a single Zen4 core. With SMT, that means they fit as many threads in that footprint as Intel does, but without compromising instruction set support. AMD does what Intel don't.
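Back-of-the-envelope, the math works out like this. Both area ratios below are rough assumptions for illustration (roughly 2 Zen 4c per Zen 4 footprint, roughly 4 Gracemont E-cores per P-core footprint), not measured die figures:

```python
# Toy threads-per-area comparison. The area ratios are assumptions for illustration,
# not measured die figures.
ZEN4C_PER_ZEN4_FOOTPRINT = 2    # assumed: two compact cores per big-core footprint
ECORES_PER_PCORE_FOOTPRINT = 4  # assumed ratio for Intel's E-cores

zen4c_threads = ZEN4C_PER_ZEN4_FOOTPRINT * 2    # Zen 4c keeps SMT: 2 threads per core
ecore_threads = ECORES_PER_PCORE_FOOTPRINT * 1  # Gracemont E-cores have no SMT

print(f"Threads per big-core footprint - Zen 4c: {zen4c_threads}, E-cores: {ecore_threads}")
# Both land around 4 threads per big-core footprint, but Zen 4c keeps the full
# Zen 4 instruction set, which is the point here.
```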
 
Nice! I'm a sucker for die shots. : P
Is there any word on whether or not Zen4c will come to desktop? My biggest complaint about my current (and previous) AMD CPU is the idle and low-load power draw. If C-cores can improve that, while the regular cores keep their power efficiency with heavy loads, it would be an awesome hybrid.
 
Huh - I wasn't sure, but it does seem AMD managed to fit 2 Zen4c cores in about the same area as a single Zen4 core. With SMT, that means they fit as many threads in that footprint as Intel does, but without compromising instruction set support. AMD does what Intel don't.
Yes. Look at EPYC Bergamo: two Zen 4c cores in nearly the same area as one Zen 4.
 
Nice! I'm a sucker for die shots. : P
Is there any word on whether or not Zen4c will come to desktop? My biggest complaint about my current (and previous) AMD CPU is the idle and low-load power draw. If C-cores can improve that, while the regular cores keep their power efficiency with heavy loads, it would be an awesome hybrid.
Actually, a lot of that comes down to optimization. The CPU cores reach deep sleep and draw basically 0-0.1 W each, while the IOD is the part that supposedly draws over 10 W.
In practice it isn't really that much, and the same design is now in laptops with lower idle power - it wouldn't work in laptops if the IOD actually drew 10 W. And don't forget that IOD power climbs steeply with DRAM speed: try undervolting the IOD as far as you can. The newer BIOSes allow for either a higher 1:1 Infinity Fabric speed or a lower voltage at the same frequency.
 
Is there any word on whether or not Zen4c will come to desktop? My biggest complaint about my current (and previous) AMD CPU is the idle and low-load power draw.
Does anyone know if a Phoenix-based APU is slated for AM5? That might be noticeably more efficient than other AM5 options.

You can also go for one of the non-X (i.e. 65 W) AM5 CPUs, or just cap the TDP of one of the X-series CPUs in BIOS.
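For what it's worth, "capping the TDP" on AM5 really means dialing down the PPT (socket power) limit in BIOS; AMD's usual rule of thumb is PPT ≈ 1.35 × TDP. A quick sanity check, treating the exact watt figures as approximate:

```python
# Rough TDP -> PPT conversion for AM4/AM5 Ryzen parts.
# Rule of thumb: PPT (package power tracking limit) is about 1.35x the rated TDP.
def approx_ppt(tdp_watts: float) -> float:
    return tdp_watts * 1.35

for tdp in (65, 105, 170):
    print(f"{tdp:>3} W TDP  ->  ~{approx_ppt(tdp):.0f} W PPT")
# 65 W -> ~88 W, 105 W -> ~142 W, 170 W -> ~230 W.
# "Eco mode" on an X-series part effectively drops it to one of the lower PPT tiers.
```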
 
I heard that Phoenix 2 won't have the AI accelerator. That seems like a mistake. If it's trying to "rival entry-level Meteor Lake", I'm guessing the MTL 2+8 die will include the accelerator used in the bigger dies.

Don't @ me with "but nobody uses it" or something. These are in every smartphone and will probably be in almost every new x86 chip within 2-3 years, including desktop.

Does anyone know if a Phoenix-based APU is slated for AM5? That might be noticeably more efficient than other AM5 options.

You can also go for one of the non-X (i.e. 65 W) AM5 CPUs, or just cap the TDP of one of the X-series CPUs in BIOS.
If Agent wants low idle power, the hypothetical Phoenix APU is going to be loads better for that, since it's monolithic and optimized for battery power.

I don't think we have gotten great leaks on Phoenix desktop APUs, but I assume the matter will be resolved for anyone who can wait another 6 months.
 
I heard that Phoenix 2 won't have the AI accelerator. That seems like a mistake. If it's trying to "rival entry-level Meteor Lake", I'm guessing the MTL 2+8 die will include the accelerator used in the bigger dies.

Don't @ me with "but nobody uses it" or something. These are in every smartphone and will probably be in almost every new x86 chip within 2-3 years, including desktop.
I agree with you - Ryzen AI is a forward-looking feature, assuming the implementation is as good as they claim. It's not there for what it accelerates today, so much as what it can enable tomorrow.

3D graphics didn't use to be an essential feature, but it's now so ubiquitous that it's effectively mandatory. Video codec acceleration is much the same, especially in phones and laptops. AI accelerators will likely tread the same path.
 
Yes. Look at EPYC Bergamo: two Zen 4c cores in nearly the same area as one Zen 4.
Much of the area savings on Bergamo comes from not increasing the amount of L3 cache per CCD (so halving it per core), while using the HPC variant of TSMC N5, which allows for denser logic and cache (but limits the speed of each).
 
Much of the area savings on Bergamo comes from not increasing the amount of L3 cache per CCD (so halving it per core), while using the HPC variant of TSMC N5, which allows for denser logic and cache (but limits the speed of each).
No, the process node didn't change!

Semianalysis did a thorough breakdown of Zen 4c - they looked at the relative sizes between it and Zen 4 and what key changes enabled the area reductions.

"AMD created Zen 4c by taking the exact same Zen 4 Register-Transfer Level (RTL) description, which describes the logical design of the Zen 4 core IP, and implementing it with a far more compact physical design. The design rules are the same as both are on TSMC N5, yet the area difference is massive. We detail the three key techniques of device Physical Design that enables this."
  1. "lowering the clock target of a design can lead to reduced area when the core is synthesized."
  2. "flatter design hierarchy with fewer partitions."
  3. "The final method of area reduction is by using denser memory. Zen 4c has a reduction in SRAM area within the core itself, as AMD has switched to using a new type of SRAM bitcell."

Source: https://www.semianalysis.com/p/zen-4c-amds-response-to-hyperscale?utm_source=/search/zen%204c

They go into quite a bit more detail about each of those points, so do take a look, if you're so inclined.

(Image from the article: https://substack-post-media.s3.amazonaws.com/public/images/f7e48eab-af56-435b-9680-1ecfd901835b_1200x1502.png)
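To make their first point a bit more concrete: at synthesis, every timing path can be built from cell variants that trade delay for area, and a relaxed clock target lets the tool pick the smaller, slower ones. Here's a toy sketch with made-up numbers (nothing to do with AMD's or TSMC's actual libraries):

```python
# Toy model of "lower clock target -> smaller synthesized area".
# Each path below can be implemented by one of several cell variants that trade
# delay (ns) for area (arbitrary units). All numbers are invented for illustration.
PATH_VARIANTS = [
    [(0.18, 10), (0.25, 6), (0.33, 4)],   # path A: (delay_ns, area) options
    [(0.20, 12), (0.28, 7), (0.36, 5)],   # path B
    [(0.15,  8), (0.22, 5), (0.30, 3)],   # path C
]

def min_area_for_clock(clock_ghz: float) -> int:
    """Pick the smallest variant per path that still meets the cycle time."""
    cycle_ns = 1.0 / clock_ghz
    total = 0
    for variants in PATH_VARIANTS:
        feasible = [area for delay, area in variants if delay <= cycle_ns]
        total += min(feasible)  # assumes at least one variant always meets timing
    return total

print("Total area at a 5.0 GHz target:", min_area_for_clock(5.0))  # fast, big cells needed
print("Total area at a 3.5 GHz target:", min_area_for_clock(3.5))  # slower, denser cells suffice
```

Obviously real tools juggle far more than this, but the direction is the same: relax the frequency target and the same RTL synthesizes smaller.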

 
Why not put all the L3 cache on the IOD and use the extra space on the core chiplets for lots of big cores with more L1 and L2 cache memory?
(The larger L1 and especially L2 caches are to compensate for the higher latency of the L3 cache memory being on another chiplet.)
Why is AMD not taking such an approach?

Would love a processor with a single core-focused chiplet housing 16 big cores and an IO die chiplet with 64 or even 128 MB L3 planar/flat cache memory (and iGPU).
 
Nice! I'm a sucker for die shots. : P
Is there any word on whether or not Zen4c will come to desktop? My biggest complaint about my current (and previous) AMD CPU is the idle and low-load power draw. If C-cores can improve that, while the regular cores keep their power efficiency with heavy loads, it would be an awesome hybrid.
Other than maybe some APUs, I don't think you will see Zen 4c on the desktop; you will, however, see Zen 5c on desktop.
 
Why not put all the L3 cache on the IOD and use the extra space on the core chiplets for lots of big cores with more L1 and L2 cache memory?
I see a few reasons not to do that:
  1. Increases latency of a L3 cache hit.
  2. Significantly hampers core-to-core communication, within the same CCD.
  3. Could bottleneck L3, if all dies are now competing for a unified L3 on the IOD.
  4. Eliminates natural scalability of L3, that you get from having it in the CCDs.

The thing to keep in mind is that the L3 slice on a CCD is there primarily to benefit the cores on that chiplet. AFAIK, they're the only ones which can populate it, which means they're also the ones primarily interested in reading whatever it holds.

AMD's segmented approach to L3 also makes it a little unfair to compare their L3 quantity with Intel's, since Intel has a unified L3 (all slices can be populated by any core).

(The larger L1 and especially L2 caches are to compensate for the higher latency of the L3 cache memory being on another chiplet.)
It's not as if there's a simple ratio of additional L1/L2 that would work the same for all workloads. There are also other factors that feed into the sizing of L1 and L2 - most importantly, latency. Die area isn't the only price you pay for a larger L2 cache: a larger L2 also takes longer to search and fetch from.
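As a toy illustration of why it's workload-dependent, here's the standard average-memory-access-time (AMAT) arithmetic with made-up latencies and miss rates:

```python
# Toy AMAT (average memory access time) model, in cycles. A bigger L2 misses less
# often but is slower to hit; moving L3 off-die makes its hits much more expensive.
# All latencies and miss rates below are invented for illustration.

def amat(l1_hit, l1_miss, l2_hit, l2_miss, l3_hit, l3_miss, mem):
    l3_level = l3_hit + l3_miss * mem
    l2_level = l2_hit + l2_miss * l3_level
    return l1_hit + l1_miss * l2_level

# Baseline: modest L2, L3 on the CCD.
base = amat(l1_hit=4, l1_miss=0.10, l2_hit=14, l2_miss=0.40,
            l3_hit=50, l3_miss=0.30, mem=400)

# Proposal: bigger (slower) L2, L3 moved to the IOD (much higher hit latency).
prop = amat(l1_hit=4, l1_miss=0.10, l2_hit=18, l2_miss=0.30,
            l3_hit=110, l3_miss=0.30, mem=400)

print(f"L3 on CCD : {base:.1f} cycles")   # ~12.2
print(f"L3 on IOD : {prop:.1f} cycles")   # ~12.7
# Nudge the miss rates a little and either configuration can "win" -
# which is exactly why there's no one-size-fits-all L1/L2 ratio.
```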

Another key point is that L3 is used for communication between cores within the same CCD. Zen 2 had a weird partition where 4 cores shared a slice of L3, but to communicate with the other quad-core complex you had to bounce off the I/O die. One of the big wins touted for Zen 3 was a unified L3, shared by all the cores of a CCD. What you're proposing would regress all the way to each core having to bounce off the IO die to talk to any of its peers. I think that would measurably hurt multithreaded performance.

For further reading, I suggest looking at Chips & Cheese's coverage of AMD's CPUs, since the chiplet-related aspects are ones they frequently revisit.
 
Why not put all the L3 cache on the IOD and use the extra space on the core chiplets for lots of big cores with more L1 and L2 cache memory?
(The larger L1 and especially L2 caches are to compensate for the higher latency of the L3 cache memory being on another chiplet.)
Why is AMD not taking such an approach?

Would love a processor with a single core-focused chiplet housing 16 big cores and an IO die chiplet with 64 or even 128 MB L3 planar/flat cache memory (and iGPU).
Intel's basically done that with the L4 cache on Broadwell onward. It didn't go well: off-die latency is catastrophic, bandwidth is limited, and it defeats the whole point of splitting the IO out.
 