razor512
Distinguished
The main issue there is that the PCIe bus becomes a large bottleneck, and the performance hit is often large, especially for workloads that already have high PCIe bus usage even when not using shared memory. Even if the extra memory sits on another PCIe card instead of in system RAM, the data would still need to traverse the PCIe bus, and that bandwidth is far more limited.

That is very close to the point where those GPUs could just access CPU or CXL RAM, which they typically already can in newer CUDA variants. That would be too much effort for too little gain.
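As a minimal sketch of what "just access CPU RAM" looks like today, here is how CUDA managed (unified) memory can oversubscribe VRAM and spill into system RAM, using CuPy as an example library (the array sizes are arbitrary and only chosen to exceed typical VRAM headroom):

```python
# Sketch: oversubscribing VRAM with CUDA managed (unified) memory via CuPy.
# Assumes CuPy is installed against a working CUDA toolkit; sizes are examples.
import cupy as cp

# Route all CuPy allocations through cudaMallocManaged, so buffers larger than
# free VRAM can spill into system RAM and migrate over PCIe on demand.
cp.cuda.set_allocator(cp.cuda.MemoryPool(cp.cuda.malloc_managed).malloc)

# A buffer that may exceed free VRAM; pages fault in across the PCIe bus as they
# are touched, which is exactly where the bandwidth bottleneck shows up.
big = cp.zeros((4096, 4096, 64), dtype=cp.float32)  # ~4 GiB
big += 1.0                                          # kernels run, pages migrate as needed
print(float(big.sum()))
```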
I'd just be happy if you could buy additional VRAM at linear pricing, but GPU vendors see that as a chance to segment the market between hobby and professional use and charge accordingly.
And I guess GPU vendors are under contractual obligations not to sell high VRAM capacity consumer GPUs, so we can only dream on.
For example, with Stable Diffusion and other frequent workloads, many models have a few stages where the card runs out of available VRAM. Many devs did not want to enable shared memory, as they felt the slowdown was too much, but Nvidia implemented a workaround that will use shared memory even on older builds that refused to use it in the past. https://nvidia.custhelp.com/app/ans...~/system-memory-fallback-for-stable-diffusion
While the performance hit is large, it meant that higher-res generations and larger models could be used. Unlike in the past, when the response was that it was too slow, many users became fine with a render going from 40-60 seconds to 10-15 minutes if it meant they could output at 4K without it stopping with out-of-memory errors.
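For a rough idea of what that "slow but it finishes" tradeoff looks like at the application level (separate from Nvidia's driver-side fallback), here is a hedged sketch using the diffusers library: run fully in VRAM if possible, and on an out-of-memory error retry with the model weights offloaded to system RAM. The model id and resolutions are just examples.

```python
# Sketch of a user-level analogue to the sysmem fallback: try to keep everything
# in VRAM, and on CUDA OOM retry with weights offloaded to system RAM.
# Assumes the `diffusers` and `accelerate` packages; model id is an example.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

prompt = "a mountain landscape at sunrise"
try:
    pipe.to("cuda")                                  # fastest path: everything in VRAM
    image = pipe(prompt, height=1024, width=1024).images[0]
except torch.cuda.OutOfMemoryError:
    torch.cuda.empty_cache()
    pipe.enable_model_cpu_offload()                  # spill model weights to system RAM
    image = pipe(prompt, height=1024, width=1024).images[0]  # much slower, but it completes

image.save("out.png")
```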
If the card has 2 DDR5 RAM slots, then the PCIe bus does not need to pull double duty; instead, there will simply be spillover to a 2nd pool that runs at over 90 GB/s.
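The back-of-the-envelope math behind that figure, assuming dual-channel DDR5-6000 on the card and a PCIe 4.0 x16 host link as example numbers:

```python
# Rough bandwidth comparison: on-card DDR5 pool vs. the PCIe link it would bypass.
# Assumes dual-channel DDR5-6000 and a PCIe 4.0 x16 link as examples.
ddr5_mt_s      = 6000   # mega-transfers per second per channel
bytes_per_xfer = 8      # 64-bit channel width
channels       = 2      # two DDR5 slots -> dual channel

ddr5_gbps = ddr5_mt_s * bytes_per_xfer * channels / 1000    # 96.0 GB/s

# PCIe 4.0: 16 GT/s per lane, 16 lanes, 128b/130b encoding overhead.
pcie4_x16_gbps = 16 * 16 / 8 * (128 / 130)                  # ~31.5 GB/s

print(f"On-card DDR5 pool : {ddr5_gbps:.1f} GB/s")
print(f"PCIe 4.0 x16 link : {pcie4_x16_gbps:.1f} GB/s")
```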