Could AMD use these connectors to actually have two layers of chiplets for processing there?
Instead of V-Cache, two full processing cores stacked one on top of the other?
I doubt it’s for anything other than cache expansion, but it would be cool to see processing-in-memory applied here. AMD could leverage FPGAs from its Xilinx acquisition to accelerate certain edge functions that speed up memory accesses (an FPGA accelerator) and/or to process AI algorithms directly in the stacked MCD. With the high-bandwidth interconnect, each MCD has 883GB/s to work with, and it’s mostly caching spatiotemporal frame data plus some ray traversal/intersection data; the rest needs to stay very close to the CUs in the larger L0, the doubled gfxL1s, the 1.5x register file, the various parameter caches, and L2. So if you process AI directly in the MCD against those cached assets, the GPU proper (the GCD) theoretically saves time by not having to burn CUs on matrix math ops.
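To make that "saves time" claim a bit more concrete, here's a toy back-of-envelope sketch. Everything about the in-MCD engine is hypothetical (including the ~256-byte command descriptor size I made up); the only real number is the 883GB/s per-MCD link figure above. It just compares how many bytes one FP16 GEMM drags across the GCD↔MCD link versus doing the math where the data already sits:

```cpp
// Toy model: link traffic for one FP16 GEMM (MxK * KxN) whose operands
// already live in the MCD's cache. Purely illustrative, not AMD's design.
#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t M = 4096, N = 4096, K = 4096;  // example matrix sizes
    const uint64_t elem = 2;                      // FP16 = 2 bytes

    // Path A: GCD pulls A and B across the link, CUs do the math,
    // and the result C is written back toward the MCD.
    uint64_t gcd_path_bytes = (M * K + K * N + M * N) * elem;

    // Path B (speculative): a near-memory engine in the MCD consumes A and B
    // locally; only a small command descriptor crosses the link.
    uint64_t mcd_path_bytes = 256;                // made-up descriptor size

    const double link_gbps = 883.0;               // per-MCD figure from the post
    printf("GCD path: %llu MB over the link (~%.3f ms at %.0f GB/s)\n",
           (unsigned long long)(gcd_path_bytes >> 20),
           gcd_path_bytes / (link_gbps * 1e9) * 1e3, link_gbps);
    printf("MCD path: %llu bytes over the link\n",
           (unsigned long long)mcd_path_bytes);
    return 0;
}
```

The absolute numbers don't matter; the point is that the near-memory path moves commands instead of operands, which is the whole appeal of processing-in-memory.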
Custom memory controllers could be used, and Xilinx APIs would be needed to expose FPGA customization in the MCDs (Infinity Link already handles communication to/from the GCD). This is probably a long way off, but it stands to reason that it may be in conceptual development. It's a logical way of offloading work into memory that already holds the necessary data and has direct access to the PHYs that reach VRAM. It may take a long while to reach gaming devices; something like this would probably land first in professional workloads to accelerate processing. Also: cost!
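For anyone curious what "Xilinx APIs" look like in practice: this is roughly how host code drives a discrete Xilinx FPGA kernel today via the XRT native C++ API. The xclbin file and kernel name here are hypothetical, and an FPGA fabric living inside an MCD would obviously need a different, GPU-driver-integrated path; this is only a shape-of-the-API illustration.

```cpp
// Minimal XRT native API flow: load a bitstream, bind a buffer, run a kernel.
#include <xrt/xrt_device.h>
#include <xrt/xrt_kernel.h>
#include <xrt/xrt_bo.h>
#include <cstdint>
#include <vector>

int main() {
    // Open the first Xilinx device and load a pre-built bitstream.
    xrt::device device(0);
    auto uuid = device.load_xclbin("mcd_accel.xclbin");      // hypothetical file

    // Get a handle to a kernel compiled into that bitstream.
    xrt::kernel krnl(device, uuid, "pattern_filter");         // hypothetical name

    // Allocate a device buffer bound to the kernel's first argument.
    std::vector<uint32_t> host(1024, 0);
    const size_t bytes = host.size() * sizeof(uint32_t);
    xrt::bo input(device, bytes, krnl.group_id(0));

    // Copy host data down and launch the kernel.
    input.write(host.data());
    input.sync(XCL_BO_SYNC_BO_TO_DEVICE);
    auto run = krnl(input, static_cast<int>(host.size()));
    run.wait();
    return 0;
}
```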
EDIT: But if it were implemented in a gaming GPU, improved on-the-fly upscaling and image AI processing via FSR, plus video encoding, are the logical targets. As a memory accelerator, an FPGA can learn common patterns in memory accesses and accelerate them, just like in networking. A chip like a GPU is really a network of discrete IP blocks linked by an interconnect. (See the sketch below for the simplest form of that "learn the access pattern" idea.)
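The simplest version of "learning a memory access pattern" is a stride detector: once the same stride repeats, you start prefetching ahead of the consumer. What AMD would actually put in an MCD is anyone's guess; this is just the software shape of the idea.

```cpp
// Minimal stride detector: after the same stride repeats, predict the next
// address - the kind of pattern an accelerator in the memory die could act on.
#include <cstdint>
#include <cstdio>

struct StrideDetector {
    uint64_t last_addr = 0;
    int64_t  last_stride = 0;
    int      confidence = 0;   // how many times the stride has repeated

    // Returns a predicted next address, or 0 if not confident yet.
    uint64_t observe(uint64_t addr) {
        int64_t stride = static_cast<int64_t>(addr - last_addr);
        confidence = (stride == last_stride && stride != 0) ? confidence + 1 : 0;
        last_stride = stride;
        last_addr = addr;
        return confidence >= 2 ? addr + stride : 0;
    }
};

int main() {
    StrideDetector det;
    // A linear walk over 64-byte cache lines, like streaming a frame buffer.
    for (uint64_t addr = 0x1000; addr < 0x1000 + 64 * 8; addr += 64) {
        uint64_t prefetch = det.observe(addr);
        if (prefetch)
            printf("access %#llx -> prefetch %#llx\n",
                   (unsigned long long)addr, (unsigned long long)prefetch);
    }
    return 0;
}
```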