That's not justifiable. If you look at the relative latency & bandwidth of layers in the cache hierarchy, there's a much bigger gap than what would exist between what you call L5 and L6. Capacity-wise, it's also hard to justify having both L4 and L5.
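To put rough numbers on that gap argument (these latencies are ballpark figures I'm assuming for a modern server part, not measurements), the ratio between adjacent levels collapses once you reach two DRAM-based tiers:

```python
# Ballpark load-to-use latencies in ns for a modern server CPU.
# These are illustrative assumptions, not measured values.
latency_ns = {
    "L1": 1.0,
    "L2": 3.0,
    "L3": 12.0,
    "HBM": 100.0,   # the hypothetical on-package "L5"
    "DDR": 130.0,   # the hypothetical external "L6"
}

levels = list(latency_ns)
for a, b in zip(levels, levels[1:]):
    print(f"{a} -> {b}: {latency_ns[b] / latency_ns[a]:.1f}x")
```

The L3 -> HBM step is roughly 8x, while HBM -> DDR is barely 1.3x, which is the point: two adjacent DRAM tiers don't differ enough to justify separate levels.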
We'll see what customers choose to do.
When you have 128+ GB of on-package HBM3, as every high-end server CPU is likely to have by the end of 2024, your application's active working set is unlikely to overflow to external memory often enough to cause major performance issues (unless you're plotting Chia in memory). The generally negligible performance impact won't be worth wasting ~80 pins per extra dedicated memory channel. Those pins would be better spent on an extra PCIe 5.0 x16 interface that you could plug 8X as much memory into at 4-6X the speed.
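A back-of-envelope comparison of the pin trade-off (spec-sheet numbers I'm assuming here, with protocol overhead ignored; the exact multiplier depends on DDR speed grade and whether you count both directions of the link):

```python
# Pin-efficiency sketch: one DDR5 channel vs one PCIe 5.0 x16 link.
# Assumed figures, not measurements.
ddr5_pins = 80
ddr5_gbps = 4.8 * 8           # DDR5-4800: 4800 MT/s * 8-byte bus = 38.4 GB/s

# PCIe 5.0: 32 GT/s per lane, 16 lanes, ~1 byte per 8 transfers
# (128b/130b encoding overhead ignored), per direction.
pcie5_x16_gbps = 32 * 16 / 8  # 64 GB/s each way

print(f"DDR5 channel:      {ddr5_gbps:.1f} GB/s over {ddr5_pins} pins")
print(f"PCIe 5.0 x16:      {pcie5_x16_gbps:.1f} GB/s per direction")
print(f"Duplex advantage:  {2 * pcie5_x16_gbps / ddr5_gbps:.1f}x")
```

And unlike a DDR channel, the PCIe/CXL link's capacity ceiling is set by the devices behind it, not by the channel's DIMM slots.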
I have a feeling on-package SRAM would be even faster with even lower latency and higher bandwidth.
Given TSMC's 3D V-Cache stacking technology and the die-area footprint of a typical Zen core, a dedicated core slot filled with many stacked layers of pure SRAM, just to hold your data set, would be the fastest memory possible: WAY faster and lower latency than anything DRAM-based.
The fundamental limits of DRAM are coming into play; if you want speed, SRAM > DRAM.
Full duplex vs. half duplex: SRAM arrays can service reads and writes simultaneously, while a DRAM bank can't.
An SRAM L4 cache would be expensive, but it would be the fastest memory possible: faster than DRAM, faster than HBM.
Hard to beat the fundamental principles of how each memory type operates.
Yes, DRAM has the capacity advantage. But here's what I estimate you can pack into a few CCDs' worth of Zen die real estate, given the limits of TSMC's stackable dies:
On 16 cores' worth of Zen die real estate at 5 nm, I estimate ~6,336 MiB using the upper stacking process limit of 12 layers.
On 32 cores' worth of Zen die real estate at 5 nm, I estimate ~12,672 MiB using the upper stacking process limit of 12 layers.
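The implied math behind those figures (my assumption: ~33 MiB of SRAM fits in one Zen-core footprint per layer at 5 nm, roughly in line with 3D V-Cache density):

```python
# Rough capacity model for stacked SRAM occupying CCD slots.
# ASSUMPTION: ~33 MiB of SRAM per Zen-core footprint per layer at 5 nm,
# capped at TSMC's stated 12-layer stacking limit.
MIB_PER_CORE_AREA_PER_LAYER = 33
MAX_LAYERS = 12

def stacked_sram_mib(core_areas: int, layers: int = MAX_LAYERS) -> int:
    """Total SRAM capacity for a given number of core-sized footprints."""
    return MIB_PER_CORE_AREA_PER_LAYER * core_areas * layers

print(stacked_sram_mib(16))  # 6336 MiB (~6.2 GiB)
print(stacked_sram_mib(32))  # 12672 MiB (~12.4 GiB)
```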
The $360-$600+ BoM cost per stacked SRAM cache die would be worth it for hyperscalers, HPC shops, and anyone else who needs the fastest memory alive.
We all know how fast and low-latency SRAM cache is; nothing in the DRAM world can touch it in pure speed or latency.
As for capacity, that's what DRAM has in spades. But given ~12 GiB of L4 scratch area, I'm sure most applications would sing, with MASSIVE bandwidth and latency close to that of crossing CCD/CCX domains to access L3 cache. That's still better than anything DRAM currently offers.
We also know that the Enterprise / HPC markets are willing to pay for such high-performance parts.
And we also know that AMD LOVES SRAM and isn't afraid to use more of it. Imagine sacrificing one or two CCD slots on a CPU for stacked L4$ SRAM dies.
That would be an INSANELY fast buffer for everything before you hit DRAM.