News AMD releases open-source GIM driver aimed at GPU virtualization, support for mainstream Radeon GPUs coming later

If this pans out for consumer cards, it could be pretty handy for any Linux users with AMD GPUs who have GPU-intensive apps (like games) that don't work well under Linux. AFAIK the functionality being discussed here would let you get the near-native performance of doing full GPU-passthrough into a Windows VM, without the usual drawback of having to give the VM exclusive access to the GPU (meaning you have no display from the host unless you have a 2nd GPU).
 
People need to comprehend how huge this is. Currently the only way to do GPU virtualization is to buy very expensive datacenter GPUs and then load the driver within the hypervisor. It's not that desktop GPUs can't do virtualization; it's that there is no driver that supports it, since both the hardware interface and the driver were a closed-source black box. Now that AMD is open-sourcing their driver, cheap virtualization becomes possible.
 
Awesome! Good call, AMD. The sooner they can support and publish the documentation going all the way down to the consumer GPUs, the better.
 
People need to comprehend how huge this is. Currently the only way to do GPU virtualization is to buy very expensive datacenter GPUs and then load the driver within the hypervisor. It's not that desktop GPUs can't do virtualization; it's that there is no driver that supports it, since both the hardware interface and the driver were a closed-source black box. Now that AMD is open-sourcing their driver, cheap virtualization becomes possible.
I'm not sure I follow.

First of all: GPU pass-through is generally not an issue any more, not even with Nvidia, which used to block driver installs on passed-through GPUs but eliminated those restrictions once they wanted their consumer GPUs to be usable for AI. I've run Windows VMs with games and CUDA having full access to the entire GPU inside a VM on Proxmox or oVirt for many years, even before Nvidia removed the restrictions, because KVM included "cheats" to circumvent them.
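
For anyone who hasn't done it: stripped of the Proxmox/oVirt tooling, pass-through boils down to handing the whole PCIe device to the guest via vfio-pci. A minimal sketch follows (Python just wrapping the QEMU invocation; the PCI address, disk path and CPU/RAM sizes are placeholders, and the kvm=off / hv_vendor_id flags are only the old workaround mentioned above):

```python
# Rough sketch of full GPU pass-through on a KVM host: the whole PCIe device is
# given to the guest via vfio-pci. The PCI address and disk image path are
# placeholders; the IOMMU must be enabled and the GPU already bound to vfio-pci.
import subprocess

GPU_BDF = "0000:01:00.0"                           # placeholder GPU address
DISK = "/var/lib/libvirt/images/win11.qcow2"       # placeholder guest disk

subprocess.run([
    "qemu-system-x86_64",
    "-enable-kvm",
    "-machine", "q35",
    # kvm=off / hv_vendor_id were the classic "cheats" that hid the hypervisor
    # from older NVIDIA drivers; no longer needed with current drivers.
    "-cpu", "host,kvm=off,hv_vendor_id=whatever",
    "-smp", "8",
    "-m", "16G",
    "-device", f"vfio-pci,host={GPU_BDF}",
    "-drive", f"file={DISK},if=virtio",
], check=True)
```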

What's meant with "GPU virtualization" is typically slicing the GPU into smaller bits so they can be allocated to distinct VMs.

That can be useful for some workloads, e.g. VDI, but the main issue is that it's really slicing and dicing the GPU into fixed-size partitions, not the ability to "multi-task" or overcommit GPU resources as if they were CPUs.

The trouble is that while CPUs are designed for multi-tasking and rapid context switches, both on a single core and across cores in a CPU die, that's not the case for GPUs: GPU workloads are designed to run on a fixed number of GPU cores in matching-sized wavefronts, with the collective size of the register files probably approaching megabytes, rather than, say, the 16 registers of a VAX. So a task switch would take ages, while e.g. gaming workloads on a GPU tend to be really sensitive to latency.
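
To put rough numbers on that (ballpark assumptions, not vendor specs), compare the register state a GPU would have to save and restore on a context switch with a single CPU thread context:

```python
# Ballpark only: per-SIMD register file size and CU count are assumptions in
# the spirit of a big current consumer GPU, not official figures.
CU_COUNT = 96                  # compute units (assumed)
SIMDS_PER_CU = 2               # SIMD32 units per CU (assumed)
VGPR_PER_SIMD = 192 * 1024     # bytes of vector registers per SIMD (assumed)

gpu_register_state = CU_COUNT * SIMDS_PER_CU * VGPR_PER_SIMD
cpu_thread_context = 3 * 1024  # ~3 KiB for an x86-64 thread incl. AVX-512 state (rough)

print(f"GPU register files: ~{gpu_register_state / 2**20:.0f} MiB")
print(f"CPU thread context: ~{cpu_thread_context / 2**10:.0f} KiB")
print(f"ratio: roughly {gpu_register_state // cpu_thread_context:,}x")
```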

But to do anything but partitioning would require the hypervisor and perhaps the guest OS to really understand the nature and limitations of GPUs... which they don't today. Most operating systems treat GPUs little better than any other I/O device.

And when it comes to partitioning GPUs between VMs, I can't see that the ability to split an RTX 5090 into, say, two RTX 5080 or four RTX 5060 equivalents would be extremely popular with PC owners.

It's nice to have a feature Nvidia charges extra for without the premium, and perhaps another way for AMD to kick the competition, but unless they start getting ROCm ready for business on current consumer GPUs, it's little more than a PR stunt.
 
I've always wondered when we'll see them sold on mobile-on-desktop boards for a penny: I guess it's sooner now rather than later.

Phoenix-based Mini-ITX boards with a nice 8-core 5.1 GHz Ryzen 7 are currently €300, with a similarly potent 780M iGPU (not that either is really great at gaming).

And for €500 you'll get a 16 core Dragon Range, just in case you'd prefer a beefier dGPU.

Those probably aren't making big bucks for AMD either, but at least they aren't selling them at a huge loss, unlike these Intel parts, which have cost a ton to make, but can't deliver any meaningful value advantage.

Still, scrapping them would be even more expensive, so I am keeping a lookout on AliExpress.
 
I'm not sure I follow.
PCI pass-through and GPU virtualization are entirely different things. PCI pass-through just hands a whole physical PCIe device to a VM. GPU virtualization is when you do GPU HW acceleration inside VMs by sharing a host GPU. Currently this is only possible with expensive data center GPUs, because there is no hypervisor driver for consumer GPUs that supports this.

We have hundreds of VDI desktops with virtualized GPUs, backed by a bunch of A100s in our APEX stack.
 
PCI pass-through and GPU virtualization are entirely different things. PCI pass-through just hands a whole physical PCIe device to a VM. GPU virtualization is when you do GPU HW acceleration inside VMs by sharing a host GPU. Currently this is only possible with expensive data center GPUs, because there is no hypervisor driver for consumer GPUs that supports this.

We have hundreds of VDI desktops with virtualized GPUs, backed by a bunch of A100s in our APEX stack.
Perhaps you can enlighten us on how this works in your environment and what exactly the VMs get to see in terms of virtual hardware?

The ability to partition a GPU into smaller chunks has been around for a long time and across vendors. Even Intel iGPUs on "desktop" Xeons supported this, at least according to press releases of the time. And those partitioned vGPUs would then be passed through as distinct (smaller) GPUs to the VMs.
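
On a Linux host that partition-and-pass-through model shows up as mediated devices (mdev): both Intel's GVT-g and NVIDIA's vGPU advertise their slice types in sysfs. A small sketch for listing them (the PCI address of the iGPU is just an example):

```python
# List the vGPU (mdev) types a GPU advertises, if its host driver supports any.
# The PCI address below is an example iGPU at 00:02.0; adjust for your host.
from pathlib import Path

types_dir = Path("/sys/bus/pci/devices/0000:00:02.0/mdev_supported_types")

if not types_dir.is_dir():
    print("no mdev-capable vGPU driver bound to this device")
else:
    for t in sorted(types_dir.iterdir()):
        name = (t / "name").read_text().strip() if (t / "name").exists() else t.name
        avail = (t / "available_instances").read_text().strip()
        print(f"{t.name}: {name} ({avail} instances available)")
```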

I actually still have a Haswell Xeon somewhere which should be able to do that, but with an iGPU so weak I never felt like sharing it: it got a GTX 980 Ti instead, which had all Quadro features fused off or disabled in the drivers.

That's the classic domain of Nvidia's Quadro series cards, and good enough for some security-sensitive CAD work, rather common in the defense sector from what I've heard.

To my understanding this partitioning is also what Nvidia does for their cloud gaming with GeForce Now.

Of course you could also add an abstraction layer which uses GPU resources on the host to accelerate GPU operations inside a VM for VDI purposes. As far as I know this was done at one time on Windows Server for RDP sessions under the name RemoteFX.

I remember playing around with that: it was complex to set up and yielded modest gains at a DirectX 10 abstraction level, targeting AutoCAD or similar, nothing involving modern facilities like programmable shaders.

I've also used that approach on Linux with VirtualGL and ran some graphics benchmarks (Unigine Superposition among them) on V100 GPUs sitting in a data center 600 km away from the notebook they rendered to via VNC: since the V100 is a compute-only device, it was actually the only way to get graphics out of it, and performance wasn't bad... with exclusive access for a single client.

When it comes to CUDA workloads or DirectX 12 or Vulkan games with programmable shaders and massive textures, to the best of my knowledge they can't deal with dynamic or overcommitted resources: they query the number of GPU cores and their share of VRAM at startup and expect to have those resources permanently available until they finish. They organize their wavefronts around those GPU cores, and if some of those cores were to go missing or get pre-empted for somebody else's application, that could bring entire wavefronts to a stall.
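
That "ask once at startup" behaviour is easy to see in the APIs themselves; a CUDA application typically does something like the following once and then sizes its grids and memory pools around the answers (sketch via ctypes against the CUDA driver API; needs an NVIDIA driver, error handling omitted):

```python
# Sketch: how a CUDA app discovers its resources once at startup
# (uses the CUDA driver API via ctypes; error checking omitted).
import ctypes

cuda = ctypes.CDLL("libcuda.so.1")
CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT = 16  # enum value from cuda.h

cuda.cuInit(0)
dev = ctypes.c_int()
cuda.cuDeviceGet(ctypes.byref(dev), 0)

sm_count = ctypes.c_int()
cuda.cuDeviceGetAttribute(ctypes.byref(sm_count),
                          CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT, dev)

vram = ctypes.c_size_t()
cuda.cuDeviceTotalMem_v2(ctypes.byref(vram), dev)

# These numbers are treated as constants for the lifetime of the process:
# grids, wavefront counts and memory pools are carved out of them up front.
print(f"SMs: {sm_count.value}, VRAM: {vram.value / 2**30:.1f} GiB")
```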

And the memory pressure on VRAM would be even worse: you can't just swap gigabytes of textures (or machine learning model data) over a PCIe bus and get acceptable performance levels.
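
A quick back-of-envelope illustrates why (working-set size and bandwidth are rough assumptions):

```python
# Rough numbers: why paging VRAM over PCIe is a non-starter for interactive use.
working_set_gib = 12        # textures / model weights to move (assumed)
pcie4_x16_gb_s = 32         # ~practical one-way PCIe 4.0 x16 bandwidth (approx.)

seconds = working_set_gib * 1.074 / pcie4_x16_gb_s   # GiB -> GB, then transfer time
print(f"~{seconds * 1000:.0f} ms to move {working_set_gib} GiB one way")
# At 60 fps the frame budget is ~16 ms, so one full swap costs dozens of frames.
```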

If I'm wrong I'm happy to learn about it, but I just can't see several thousand GPU cores task-switching dynamically, while the near-lockstep wavefront nature of GPGPU code simply can't deal with cores stepping out of line.
 
Perhaps you can enlighten us on how this works in your environment and what exactly the VMs get to see in terms of virtual hardware?

In the case of the A40s each VM is a Windows desktop complete with full DX11 3D acceleration. This is used to provide acceleration to a host of applications that are designed to run on real GPUs.


Inside VMware we installed the NVIDIA vGPU drivers, then configured Horizon to spin up the desktops with the appropriate Windows vGPU driver.

Currently the only vGPU drivers that exist are for data center GPUs. Installing a commodity 4080 or 9070 XT into a server won't help, because there is no hypervisor vGPU driver nor a guest vGPU driver that could use it. AMD releasing theirs as open source means the FOSS community can find ways to incorporate commodity AMD GPUs into it, allowing for cheap desktop virtualization.
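
For what it's worth, GIM appears to do its slicing via SR-IOV: the host driver exposes the physical GPU as several virtual functions, each of which is handed to a VM like an ordinary PCIe device. On a supported part that would roughly look like the sketch below (PCI address and VF count are placeholders; consumer cards don't expose this today):

```python
# Hedged sketch: enabling SR-IOV virtual functions on a GPU whose host driver
# supports it (the approach AMD's GIM takes on supported Instinct parts).
# The PCI address and VF count are placeholders.
from pathlib import Path

pf = Path("/sys/bus/pci/devices/0000:83:00.0")   # physical function (placeholder)

print("max VFs:", (pf / "sriov_totalvfs").read_text().strip())
(pf / "sriov_numvfs").write_text("4")            # carve the GPU into 4 VFs
# Each VF now appears as its own PCI device and can be handed to a VM
# with plain VFIO pass-through, guest vGPU driver included.
```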

I had to look up our system RQ and I believe we are using A40 cards. They are the headless server add-in GPUs. Here is the documentation for driver support.

https://docs.nvidia.com/vgpu/13.0/grid-vgpu-release-notes-vmware-vsphere/index.html
 
I carefully explained why this isn't nearly as huge a thing as many might be led to believe, and you reject that by arguing it can't be true because you're doing it.

But you fail to realize that your use case isn't what a TH audience would expect, while I patiently try to explain that the use cases they are more likely to expect won't be supported the way they imagine, because it's impossible to do so.
In the case of the A40s each VM is a Windows desktop complete with full DX11 3D acceleration. This is used to provide acceleration to a host of applications that are designed to run on real GPUs.
DX11 should be a first hint, because that's not DX12 or CUDA.


Inside VMware we installed the NVIDIA vGPU drivers, then configured Horizon to spin up the desktops with the appropriate Windows vGPU driver.

Currently the only vGPU drivers that exist are for data center GPUs. Installing a commodity 4080 or 9070 XT into a server won't help, because there is no hypervisor vGPU driver nor a guest vGPU driver that could use it. AMD releasing theirs as open source means the FOSS community can find ways to incorporate commodity AMD GPUs into it, allowing for cheap desktop virtualization.

I had to look up our system RQ and I believe we are using A40 cards. They are the headless server add-in GPUs. Here is the documentation for driver support.

https://docs.nvidia.com/vgpu/13.0/grid-vgpu-release-notes-vmware-vsphere/index.html
So I've looked up your references and they pretty much confirm my theories.

There are three basic ways to virtualize GPUs:
1. partition with pass-through of the slices
2. time-slice
3. new accelerated abstraction

Option #3 is historically perhaps the oldest variant, where hypervisor vendors implement a new abstract 3D-capable vGPU which uses host GPU facilities to accelerate 3D primitives inside the VMs. I believe VMware calls that vSGA, built on its own 3D-capable SVGA device, KVM has VirGL, Microsoft had RemoteFX, and there is the VirtualGL I already mentioned.

I'm sure there are more, but the main problem remains that those abstractions can't keep up with the evolution of graphics hardware or overcome the base issues of context-switch overhead and VRAM pressure. They lag far behind in API levels and performance, even though in theory they can evenly distribute the resources of the host GPU among all VMs.
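
Incidentally, it's easy to check from inside a guest which of these you actually got: ask what OpenGL renderer the VM sees (needs glxinfo installed in the guest).

```python
# Guest-side check: which GL renderer did the VM actually get?
# Requires glxinfo (mesa-utils) inside the guest.
import subprocess

out = subprocess.run(["glxinfo", "-B"], capture_output=True, text=True).stdout
for line in out.splitlines():
    if "renderer string" in line.lower():
        print(line.strip())
# "virgl" or "llvmpipe" means an abstraction/software path; a real GPU name
# means pass-through or a hardware-backed vGPU slice.
```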

Option #2 and option #1 are the classic alternatives for slicing and dicing any multi-core computing resource; for a single core, #2 is the only option, and it has been used on CPUs pretty much from the start. But as I tried to point out, the overhead of time-slicing is dictated by the context size: a few CPU registers are fine, but the register files of thousands of SIMD GPU cores measure in megabytes, and the context-switch overhead grew with the power of the GPUs. What appeared acceptable at first, and might even still work for vector-type CAD work that doesn't involve complex shaders or GPGPU code, may no longer be practical for interactive (game) use of modern APIs. And then you'd still have to partition the VRAM, because even if paging of VRAM is technically supported, the performance penalty would be prohibitive for anything except some batch-type HPC.

And that leads to option #1 as the most typical implementation in cases like cloud gaming.

But that isn't "huge": what each VM gets can only be a smaller slice. And that won't be very popular when most people here actually yearn for a bigger GPU.

Long story short: when a vendor like AMD is giving something away for free, it's mostly because they can't really monetize it, as it has low value. If there is a chance to make a buck they are much inclined to take it, even if sometimes they'll forgo a penny if they can make the competition lose a dime with it.

But here it's mostly marketing shenanigans and an attempt to raise misleading expectations that I don't want people to fall for.
If you didn't, that's great. But I had the impression that you were actually helping AMD push a fairy tale, even if not intentionally.
 
PCI pass-through and GPU virtualization are entirely different things. PCI pass-through just hands a whole physical PCIe device to a VM. GPU virtualization is when you do GPU HW acceleration inside VMs by sharing a host GPU. Currently this is only possible with expensive data center GPUs, because there is no hypervisor driver for consumer GPUs that supports this.
Currently the only vGPU drivers that exist are for data center GPUs. Installing a commodity 4080 or 9070 XT into a server won't help, because there is no hypervisor vGPU driver nor a guest vGPU driver that could use it. AMD releasing theirs as open source means the FOSS community can find ways to incorporate commodity AMD GPUs into it, allowing for cheap desktop virtualization.
PCIe passthrough gets you GPU HW acceleration within the VM, on regular hardware with regular drivers. E.g. for my RX 6600, I pass it through to the VM, then install the regular AMD desktop graphics drivers from within the guest OS. I'm running Ubuntu with the standard amdgpu driver on the host.
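
For anyone curious, the host-side half of that is just rebinding the card from amdgpu to vfio-pci before the VM starts, roughly like this (run as root; the PCI address is a placeholder for the RX 6600):

```python
# Rough sketch: rebind a GPU from its host driver (amdgpu) to vfio-pci so a VM
# can take it over. Run as root; the PCI address is a placeholder.
from pathlib import Path

bdf = "0000:03:00.0"
dev = Path(f"/sys/bus/pci/devices/{bdf}")

(dev / "driver_override").write_text("vfio-pci")
if (dev / "driver").exists():
    (dev / "driver" / "unbind").write_text(bdf)        # detach amdgpu
Path("/sys/bus/pci/drivers_probe").write_text(bdf)     # rebind using the override
```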
 
PCIe passthrough gets you GPU HW acceleration within the VM, on regular hardware with regular drivers. E.g. for my RX 6600, I pass it through to the VM, then install the regular AMD desktop graphics drivers from within the guest OS. I'm running Ubuntu with the standard amdgpu driver on the host.

vGPU is not PCI pass-through; it's quite literally virtualized GPUs inside guest OSes instead of the emulated SVGA video card you normally get. This means you can run multiple desktop OSes with GPUs running at full speed. Or you can run a regular desktop OS and have something like VMware or VirtualBox with a vGPU inside that runs on your host GPU at full speed.
 
vGPU is not PCI pass-through; it's quite literally virtualized GPUs inside guest OSes instead of the emulated SVGA video card you normally get. This means you can run multiple desktop OSes with GPUs running at full speed.
Since you seem to have the means and would like to prove me wrong, why don't you go ahead and use one of your machines during a maintenance window to run a nice full-depth synthetic benchmark (FurMark or LLMs) on the GPU split between 1/2/4/8...n VMs. Not something like SPEC ViewPerf, which is too light a load in terms of graphics pipeline state, primitive vectors and Phong-class shading models.

If the sum of all parts and the single part come together more or less equally, I'd be happy to buy you a beer and admit being a fool (I'd count partition reconfiguration as cheating, though).

But I just can't see how a modern GPU can still be "time-shared" between VMs, because the context switch becomes too expensive.
Or you can run a regular desktop OS and have something like VMware or VirtualBox with a vGPU inside that runs on your host GPU at full speed.
And on splitting the GPU between the host and the guests, I am even more sceptical.

Not only do you need the ability to partition the GPU, you also need the ability to assign those partitions to distinct IOMMU groups, which are the minimal granularity for pass-through... well, at least without shim drivers providing abstractions and costing overhead.
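
On a Linux host you can see that granularity directly: everything sharing an IOMMU group has to be passed through together (or not at all). A quick way to list it:

```python
# List IOMMU groups: each group is the smallest unit that can be passed through.
from pathlib import Path

groups = Path("/sys/kernel/iommu_groups")
for group in sorted(groups.iterdir(), key=lambda p: int(p.name)):
    devices = ", ".join(d.name for d in (group / "devices").iterdir())
    print(f"group {group.name}: {devices}")
```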

Since a mix of desktop and VM use for Quadro series GPUs (and their spiritual successors) makes very little sense economically and in terms of security (a big motivator for VDI), I cannot fathom Nvidia spending extra transistors and/or driver effort on supporting that. That sort of unequal workgroup setup is something people had in the late 1970s, when one person would have the S-100 box running MP/M under his desk while two others would run serial terminals... something still replicated on the earliest small Unix systems or an entry-level VAX.

But today the cost of two operational models on a single piece of hardware outweighs the cost savings at scale. And only scale use cases are typically supported.

I've been running pass-through GPUs for many years now, and most of them have actually been V100s used for compute and machine learning only. Orchestrating those resources in such a way that the various lab team members could use them with the least amount of manual coordination (or a workload manager like Slurm) would have been great there. But once you try using those GPUs for graphics or latency-sensitive stuff, it hardly ever makes sense to slice a bigger GPU into smaller pieces (unless you're into GeForce Now style cloud gaming). There is much more interest in combining them into something bigger.

These days things like live migration for VMs with GPUs start appearing, or at least being talked about, which was previously not supported because of the giant extra work (and small benefit). You could say migrations are the slowest type of load balancing, but they require similar facilities in the hardware and software stack.

But a) those migrations would happen very rarely and b) transparent moving between GPU architectures, and even between different-size slices of the same GPU, would still be next to impossible, because applications and frameworks simply can't deal with changes in the GPU hardware today. And the evolutionary pressure to make them adapt is quite a lot lower than, say, for implementing hot-swap for PCI, CPUs and RAM, which still isn't commonplace 30 years later, although much easier than GPU virtualization by comparison.

As much as I'd like, in theory, to have a setup where I can run one big GPU at full resolution and frame rate while only a single player is logged on (e.g. via Steam Remote Play), which then gracefully and gradually drops resolution or quality as additional users start their respective VMs on the same box, I just can't see anyone making that easy, preferably with an open-source stack, e.g. Bazzite instances on Proxmox.

And commercial vendors are so bent on squeezing blood from customers and consumers that they'd only make sure to sabotage anything in that direction.