Why not put all the L3 cache on the IOD and use the extra space on the core chiplets for lots of big cores with more L1 and L2 cache memory?
I see a few reasons not to do that:
- Increases the latency of an L3 cache hit.
- Significantly hampers core-to-core communication within the same CCD.
- Could bottleneck L3, since all dies would now be competing for a unified L3 on the IOD.
- Eliminates the natural scalability of L3 that you get from having it on the CCDs.
The thing to keep in mind is that the L3 slice on a CCD is there primarily to benefit the cores on that chiplet. AFAIK, they're the only ones which can populate it, which means they're also the ones primarily interested in reading whatever it holds.
AMD's segmented approach to L3 also makes it a little unfair to compare their L3 quantity with Intel's, since Intel has a unified L3 (all slices can be populated by any core).
(The larger L1 and especially L2 caches are to compensate for the higher latency of the L3 cache memory being on another chiplet.)
It's not as if there's a simple ratio of additional L1/L2 that would work the same for all workloads. Other factors also feed into the sizing of L1 and L2 - most importantly, latency. Die area isn't the only price you pay for a larger L2: a larger cache also takes longer to search and fetch from.
Another key point is that L3 is used for communication between cores within the same CCD. Zen 2 had an awkward partition where each group of 4 cores shared its own slice of L3, so to communicate with the other quad-core complex, you had to bounce off the I/O die. One of the big wins touted in Zen 3 was a unified L3, shared by all the cores of a CCD. What you're proposing would regress all the way to the point of each core having to bounce off the I/O die to talk to any of its peers. I think that would measurably hurt multithreaded performance.
For further reading, I suggest Chips & Cheese's coverage of AMD's CPUs, since the chiplet-related aspects are something they frequently revisit.