Come on Anton, this title is clickbait, because it suggests this could somehow help current owners of VRAM-starved GPUs. The subtitle should dispel that, but by then people have already jumped into the article.
Let's not forget that consumer GPUs like the RTX 4090 already struggle to use more than 20% of their compute even with roughly 1TB/s of VRAM bandwidth for LLMs, because they need a full sequential pass over the weights for every token they generate. That's why HBM, at roughly 4TB/s, is so much less painful.
PCIe 4.0 x16 is 32GB/s, 5.0 twice that. Going through PCIe is as if you were trying to extend SSD capacity with floppies or tape.
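To put numbers on that comparison: when generation is memory-bandwidth-bound, the ceiling on tokens per second is roughly bandwidth divided by the size of the weights. A minimal sketch, assuming an illustrative ~7B-parameter model at FP16 (~14GB of weights, a hypothetical figure, not from the article):

```python
# Back-of-the-envelope token-rate ceilings when generation is
# memory-bandwidth-bound: each token requires streaming all weights
# once, so tokens/sec <= bandwidth / weight_bytes.

WEIGHT_BYTES = 14e9  # assumed: ~7B params at FP16, purely illustrative

links = {
    "GDDR6X VRAM (~1 TB/s)": 1.0e12,
    "HBM (~4 TB/s)":         4.0e12,
    "PCIe 4.0 x16 (32 GB/s)": 32e9,
    "PCIe 5.0 x16 (64 GB/s)": 64e9,
}

for name, bw in links.items():
    print(f"{name:24s} ceiling ~{bw / WEIGHT_BYTES:6.1f} tokens/s")
```

Even before any compute limits, the PCIe rows come out at only a few tokens per second, which is why treating the bus as a VRAM extension is so painful.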
Feed enough tape drives in parallel and you can achieve any bandwidth for sequential reads. So yes, you can design your workloads completely around something like that, if the scale of your problem overcomes all other considerations and you can build your own hardware, chips included.
But that doesn't describe your readership.
Articles like this are wonderful; they add color and expand our horizons. But they should be marked as "technology outfield" or similar, so they don't just create unrealistic expectations and then disappointment.