Because making 3D V-Cache work well involves a lot more than just mashing two dies atop each other! You have to adapt your cache architecture and the floorplan of your compute die properly, in order for it to deliver the kinds of benefits we observed. That's AMD's contribution.
Chips & Cheese did some detailed profiling of 3D V-Cache and was surprised at how little latency the enlargement added:
This is the deeper dive of AMD’s V-Cache that we teased with our short latency article and we will be covering a little more on the latency front along with the bandwidth behavior of V-Cache and the performance of V-Cache SKUs.
chipsandcheese.com
SemiAnalysis noted just how much 3D V-Cache seems to have influenced the layout of Zen 4, when comparing it against Zen 4c (which eliminated the TSVs used to attach 3D V-Cache):
Bergamo Volumes, ASP, Performance, Hyperscale Order Shift, Die Shot, Floorplan, Physical Design, and Future Use of Dense Core Variants Bergamo, AMD’s upcoming 128-core server part sets new heights …
www.semianalysis.com
Elsewhere, probably during the launch coverage of the 7950X3D, I read about how AMD engineered the N7 cache die to achieve higher SRAM density than they apparently managed even on their N5 CCD, as well as how carefully they engineered its placement and overhang relative to the cores. I'll post a link if I can find it.
In short, I consider it a neat "division of labor" story.