GPUs can now use PCIe-attached memory or SSDs to boost VRAM capacity — Panmnesia's CXL IP claims double-digit nanosecond latency

That is very close to the point where those GPUs could just access CPU or CXL RAM directly, which they can typically already do in newer CUDA versions. That would be too much effort for too little gain.
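For anyone curious, that direct access is basically what CUDA's managed (unified) memory already gives you. A minimal sketch, assuming a Linux box where oversubscription is supported; the 16 GiB allocation and the trivial kernel are just placeholders:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: touch every element so pages actually get pulled to the GPU.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    // Deliberately larger than many consumer cards' VRAM: 16 GiB of floats.
    size_t n = (16ULL << 30) / sizeof(float);
    float *data = nullptr;

    // Managed (unified) memory: one pointer valid on both CPU and GPU.
    // If it doesn't fit in VRAM, the driver pages it in and out over PCIe.
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(float));
    if (err != cudaSuccess) { printf("alloc failed: %s\n", cudaGetErrorString(err)); return 1; }

    // Optional hint: keep the backing store in host RAM and let the GPU
    // access it across the bus instead of migrating whole pages.
    cudaMemAdvise(data, n * sizeof(float), cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);

    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;   // initialize on the CPU

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);  // GPU reads/writes the same pointer
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```

The plumbing already exists; the bus is the limit.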

I'd just be happy if you could buy additional VRAM at linear pricing, but GPU vendors see that as a chance to segment the market between hobby and professional use and charge accordingly.

And I guess GPU vendors are under contractual obligations not to sell high VRAM capacity consumer GPUs, so we can only dream on.
The main issue there is that the PCIe bus becomes a major bottleneck, and the performance hit is often severe, especially for workloads that already have high PCIe bus usage when not using shared memory. Even if the extra memory sits on another PCIe slot instead of in system RAM, the data would still need to traverse the PCIe bus, and that bandwidth is far more limited than VRAM.
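To put a rough number on how limited: here's a sketch of the explicit staging a framework has to do when a buffer lives in host (or CXL-attached) memory, using a pinned buffer and an async copy. The 1 GiB size is arbitrary; the point is that this transfer shares the same x16 link with everything else the app is already pushing across it:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1ULL << 30;            // 1 GiB chunk, purely illustrative
    float *host_buf = nullptr, *dev_buf = nullptr;

    // Pinned host memory: required for full-speed async DMA over PCIe.
    cudaMallocHost(&host_buf, bytes);
    cudaMalloc(&dev_buf, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Stage the "spilled" data into VRAM. On PCIe 4.0 x16 this tops out around
    // 25-28 GB/s in practice, no matter how fast the memory on either end is.
    cudaEventRecord(start, stream);
    cudaMemcpyAsync(dev_buf, host_buf, bytes, cudaMemcpyHostToDevice, stream);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("1 GiB over PCIe: %.1f ms (~%.1f GB/s)\n", ms, (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFreeHost(host_buf);
    cudaFree(dev_buf);
    return 0;
}
```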

Stable Diffusion is a good example: many models have a few stages that push the card past its available VRAM. Many devs did not want to enable shared memory, as they felt the slowdown was too much, but Nvidia implemented a driver-level fallback that will use shared system memory even with older builds that refused to use it in the past. https://nvidia.custhelp.com/app/ans...~/system-memory-fallback-for-stable-diffusion

While the performance hit is large, it meant that higher-res generations and larger models could be used. Unlike in the past, when the response was that it was too slow, many users became fine with a render going from 40-60 seconds to 10-15 minutes if it meant they could output at 4K without it stopping with out-of-memory errors.

If the card had 2 DDR5 RAM slots, the PCIe bus would not need to pull double duty; instead, there would simply be spillover to a second pool that runs at over 90 GB/s.
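(That figure roughly lines up with dual-channel DDR5: two 64-bit DIMMs at 6000 MT/s would be 2 × 8 bytes × 6000 MT/s ≈ 96 GB/s, or about 90 GB/s at 5600 MT/s, assuming a card's memory controller could actually drive two DIMMs that fast.)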
Well, your GPU is currently just PCIe 4.0, so merely upgrading to a PCIe 5.0 GPU could probably cut that to 5-7.5 minutes. Compare that to adding local DDR5 DIMMs, which would only get you down to about 3.5-5 minutes. That's still massively slower than 40 to 60 seconds, because the card's own GDDR6X-class memory delivers on the order of 1 TB/s.
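Here's the back-of-the-envelope math behind those estimates, a sketch that assumes the 10-15 minute runs are almost entirely spent moving spilled data over the bus (an oversimplification, but it shows the scaling):

```cpp
#include <cstdio>

// Rough scaling estimate: if the 10-15 minute runs are dominated by shuffling
// spilled data, transfer time scales inversely with the spill path's bandwidth.
// The bandwidth figures are ballpark theoretical numbers, not benchmarks.
int main() {
    const double lo_min = 10.0, hi_min = 15.0;  // observed spillover render times (PCIe 4.0)
    const double pcie4  = 32.0;                 // GB/s, PCIe 4.0 x16
    const double paths[] = {64.0, 90.0};        // PCIe 5.0 x16, on-card DDR5 (hypothetical)
    const char  *names[] = {"PCIe 5.0 x16", "on-card DDR5 (hypothetical)"};

    for (int p = 0; p < 2; ++p) {
        printf("%-28s ~%.1f - %.1f minutes\n",
               names[p], lo_min * pcie4 / paths[p], hi_min * pcie4 / paths[p]);
    }
    // Output: ~5.0 - 7.5 min for PCIe 5.0, ~3.6 - 5.3 min for on-card DDR5.
    // Both are still far from the 40-60 s all-in-VRAM case, where the card's
    // own memory runs on the order of 1 TB/s.
    return 0;
}
```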

So, we see that upgradable DIMMs really aren't a great solution to this problem. The performance disparity is still simply too large.