I held out hope that AMD would crack the GPU chiplet problem... not just MCDs, but actual GPU compute chiplets. In my opinion, that breakthrough would lead to a mini renaissance for client GPUs.
That's reportedly what they tried for RX 8000, using die stacking, but it didn't pan out.
They've shown it's possible for compute, but not so much for realtime frame rendering.
Their RX 7000 slide deck explained why it's a much harder problem for GPUs than for CPUs: the data movement within a GPU is a couple of orders of magnitude greater.
Apple's M1 Ultra was the first multi-die GPU that truly presented itself to software as a single GPU. It had only two dies, with an interconnect bandwidth between them of 2.5 TB/s per direction. That puts the aggregate about equal to the RX 7900 XTX's MCD <-> GCD bandwidth. However, its rendering performance landed a couple of tiers below the RTX 3090 level Apple was aiming for, so I don't know that you could call it a terribly effective solution. It definitely scaled performance beyond that of an M1 Max; I just wish I knew more about exactly how well it scaled.
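To sanity-check that "about equal" claim with rough numbers (the 5.3 TB/s figure is the peak GCD <-> MCD fanout bandwidth I recall AMD quoting for the RX 7900 XTX, so treat it as approximate):
2 directions x 2.5 TB/s = 5 TB/s aggregate for the M1 Ultra's die-to-die link, vs. ~5.3 TB/s for the RX 7900 XTX.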
Anyway, let's say I've stopped holding my breath at this point.
Just because it wasn't feasible with yesterday's technology doesn't mean it can't or won't happen. Chiplet and substrate technology is continually improving. Nvidia finally made the jump to multi-die with Blackwell. AMD did it earlier, with the MI200 series, and doubled down on chiplets in the MI300 series.
The RX 7000 series was only AMD's first effort at using chiplets for client GPUs. Every time you try something new and difficult, you learn things that enable improvements in the next go-around. I think the tech just isn't quite there yet, but that doesn't mean it won't get there.
BTW, I think GPUs are more forgiving of defects than CPUs, which makes chiplets less of a win for them. In fact, the main reason AMD gave for using chiplets in the RX 7000 series was to take advantage of cheaper nodes for the I/O and SRAM, which don't scale down well to N5 and below. So, it was mainly about offering more performance per $, as opposed to way more performance in the absolute sense. Maybe if their GCD had been nearly as large as the RTX 4090's die, we'd be singing a different tune about the RX 7900 XTX.