I carefully explained why this isn't nearly as big a deal as many might be led to believe, and you reject that by arguing it can't be true because you're doing it.
But you fail to realize that your use case isn't what a TH audience would expect, while I patiently try to explain that the use cases they are more likely to expect won't be supported the way they imagine, because it's impossible to do so.
In the case of the A40s each VM is a Windows desktop complete with full DX11 3D acceleration. This is used to provide acceleration to a host of applications that are designed to run on real GPUs.
DX11 should be a first hint, because that's not DX12 or CUDA.
NVIDIA vGPU Software (RTX vWS, vPC, vApps) | NVIDIA
Inside VMware we installed the NVIDIA vGPU drivers, then configured Horizon to spin up the desktops with the appropriate Windows vGPU driver.
Currently the only vGPU drivers that exist are for data-center GPUs. Installing a commodity 4080 or 9070 XT into a server won't matter, because there is neither a hypervisor vGPU driver nor a guest vGPU driver that could use it. AMD releasing theirs as open source means the FOSS community can find ways to incorporate commodity AMD GPUs, allowing for cheap desktop virtualization.
I had to look up our system RQ, and I believe we are using A40 cards. They are the headless server add-in GPUs. Here is the documentation for driver support.
https://docs.nvidia.com/vgpu/13.0/grid-vgpu-release-notes-vmware-vsphere/index.html
So I've looked up your references and they pretty much confirm my theories.
There are three basic ways to virtualize GPUs:
1. partition with pass-through of the slices
2. time-slice
3. new accelerated abstraction
Option #3 is historically perhaps the oldest variant: hypervisor vendors implement an abstract 3D-capable vGPU that uses host-GPU facilities to accelerate 3D primitives inside the VMs. I believe Nvidia called that vSGA, KVM has VirGL, VMware has its own 3D-capable VGA, Microsoft had RemoteFX, and there is the VirtualGL I already mentioned.
I'm sure there are more, but the main problem remains that those abstractions can't keep up with the evolution of graphics hardware or overcome the basic issues of context-switch overhead and VRAM pressure. They lag far behind in API level and performance, even though in theory they can evenly distribute the host GPU's resources among all VMs.
Options #1 and #2 are the classic alternatives for slicing and dicing any multi-core computing resource; for a single core, #2 is the only option, and it has been used on CPUs pretty much from the start. But as I tried to point out, the overhead of time-slicing is dictated by the context size. A few CPU registers are fine; the register files of thousands of SIMD GPU cores measure in megabytes, and context-switch overhead has grown with the power of GPUs. What seemed acceptable at first, and might still work for vector-type CAD work without complex shaders or GPGPU code, may no longer be practical for modern interactive (game) use. And then you'd still have to partition the VRAM, because even if paging VRAM is technically supported, the performance penalty would be prohibitive for anything except some batch-type HPC.
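To put a rough number on the context-size point: the figures below (register-file size per SM, SM count, memory bandwidths) are illustrative assumptions, not the specs of any particular card, but they show why a GPU context switch is orders of magnitude more expensive than a CPU one.

```python
# Back-of-envelope cost of a context switch, CPU vs. GPU.
# All figures are illustrative assumptions, not measured values.

def switch_time_us(context_bytes: int, bandwidth_gb_s: float) -> float:
    """Microseconds to save + restore a context over the given bandwidth."""
    return 2 * context_bytes / (bandwidth_gb_s * 1e9) * 1e6

# CPU: a few hundred bytes of architectural registers.
cpu_context = 512                   # bytes (assumed)

# GPU: say ~100 SMs with a 256 KiB register file each -> ~25 MiB,
# before even counting shared memory and scheduler state.
gpu_context = 100 * 256 * 1024      # bytes (assumed)

print(f"CPU: {switch_time_us(cpu_context, 50):.3f} us")   # ~50 GB/s DRAM
print(f"GPU: {switch_time_us(gpu_context, 500):.1f} us")  # ~500 GB/s VRAM
```

Even with generous VRAM bandwidth, the GPU switch lands around a hundred microseconds versus hundredths of a microsecond on the CPU, which is why time-slicing that looked fine for light workloads becomes painful at interactive frame rates.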
And that leads to option #1 as the most typical implementation in cases like cloud gaming. But that isn't "huge": each VM can only get a smaller slice of the physical GPU. And that won't be very popular when most people here actually yearn for a bigger GPU.
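To illustrate the "smaller slices" point: with static partitioning, each VM gets a fixed fraction of the board's VRAM. Taking the A40's 48 GB as an example (the divisions below are illustrative, not NVIDIA's exact vGPU profile table):

```python
# Static VRAM partitioning on a 48 GB board (e.g. an A40).
# The split counts are illustrative, not NVIDIA's actual vGPU profiles.
total_vram_gb = 48
shares = {vms: total_vram_gb // vms for vms in (2, 4, 8)}
for vms, gb in shares.items():
    print(f"{vms} VMs -> {gb} GB each")
# 2 VMs -> 24 GB each
# 4 VMs -> 12 GB each
# 8 VMs -> 6 GB each
```

The more tenants you pack on, the further each slice falls below what a single enthusiast would simply buy outright.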
Long story short: when a vendor like AMD gives something away for free, it's mostly because they can't really monetize it; it has low value. If there is a chance to make a buck, they are much more inclined to take it, even if sometimes they'll forgo a penny when it makes the competition lose a dime.
But here it's mostly marketing shenanigans, trying to raise misleading expectations that I don't want people to fall for.
If you didn't fall for it, that's great. But I had the impression that you were actually helping AMD push a fairy tale, even if not intentionally.