News DirectX 12 Update Allows CPU and GPU to Access VRAM Simultaneously

Why infer when we have data? The proof of the pudding is in the eating.

I was talking about it down to the MB level. Unlike system memory, we can't know exactly how much VRAM is actively being used, but the allocated amount is often not far from what is in use, and the signs of a VRAM shortage are clear: once you dip into system memory, the performance hit is extreme. The GTX 970 has aged poorly compared to equivalent AMD cards, and it is entirely down to the VRAM. Newer games wanting 9+GB is a separate issue, but for a few years we were in a period where games wanted 4-7GB; the GTX 970 had the compute power to run them, yet the lack of VRAM meant those games performed poorly as soon as the shared memory usage started to climb. In those cases, the AMD cards remained very playable while the Nvidia card would drop into the unplayable range.

In the case of system memory, we get a little more detail.
[attached screenshot]
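For anyone who wants to pull the same numbers programmatically instead of from screenshots, DXGI reports per-adapter usage and budget for both the dedicated (VRAM) and shared (system memory) pools. A minimal sketch of my own (not taken from any particular tool; link against dxgi.lib):

```cpp
// Query dedicated vs. shared GPU memory usage and budget via DXGI.
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <cstdio>
using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<IDXGIFactory4> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory)))) return 1;

    ComPtr<IDXGIAdapter1> adapter;
    ComPtr<IDXGIAdapter3> adapter3;
    if (FAILED(factory->EnumAdapters1(0, &adapter)) || FAILED(adapter.As(&adapter3)))
        return 1;

    // "Local" = dedicated VRAM, "non-local" = shared system memory visible to the GPU.
    DXGI_QUERY_VIDEO_MEMORY_INFO local = {}, nonLocal = {};
    adapter3->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &local);
    adapter3->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_NON_LOCAL, &nonLocal);

    std::printf("Dedicated VRAM: %llu MB used of %llu MB budget\n",
                (unsigned long long)(local.CurrentUsage >> 20),
                (unsigned long long)(local.Budget >> 20));
    std::printf("Shared memory : %llu MB used of %llu MB budget\n",
                (unsigned long long)(nonLocal.CurrentUsage >> 20),
                (unsigned long long)(nonLocal.Budget >> 20));
    return 0;
}
```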


On a side note, the overarching issue with all of this when it comes to newer video cards is that we are seeing games where the GPU is capable of handling higher settings but the VRAM holds it back. As more PS5-level games get released on PC, we will see more cases of cards like the RTX 3070 Ti requiring the user to run lower settings than the 12GB RTX 3060, and cards like the RX 6800 and 6800 XT opening up increasingly large performance leads.
 
The Last of Us Part I also likes having 10-16GB of VRAM. Those "edge cases" will become increasingly common for people who insist on maxing everything out.

The difference between a GPU with more VRAM than it can currently make much use of and a faster GPU with barely enough VRAM for today's games is what happens when VRAM usage grows. On the GPU with VRAM to spare, performance trails off proportionally with the compute load per frame; on the "faster" GPU that has run out of VRAM, performance drops straight off a cliff from having to rely on system memory to cover the deficit, unless you dial down details enough to stay within VRAM capacity.

You end up with a 12GB GPU (ex.: RX 6700) that may average 61 fps with 55 fps lows vs a technically much faster 8GB GPU (ex.: RTX 3070) averaging only 55 fps with 18 fps lows. The 12GB GPU may not be setting world records but can still be considered quite playable without compromising visual quality, while the "faster" 8GB GPU can be outright nauseating to play on without turning details down, due to running out of VRAM. Those are TPU's actual Last of Us numbers.

Anything from the 3070 up is simply too powerful to have less than 12GB without an extremely high probability of being forced into early retirement due to running out of VRAM.

I didn't say it was meaningless, I said it was overrated.

And I do consider this an edge case - there are all sorts of problems people are having with this port. And a lot of the 16 GB cards were destroyed here as well. The 12 GB 4070 Ti gained relative to the 16 GB cards at the higher VRAM usage levels; this isn't an apples-to-apples VRAM test.

The people who obsessively must have everything maxed out at all times are already buying new GPUs all the time. They're not hanging onto their GTX 970s for a decade. The people who will be selling an RTX 3080 10 GB in the next few years because of VRAM almost certainly would have been selling an RTX 3080 16 GB or 192 GB or whatever.

We literally do this every generation. And there's never a bloodbath. The higher VRAM cards just stick around as entry-level GPUs on the secondary market longer.
 
I didn't say it was meaningless, I said it was overrated.

And I do consider this an edge case - there are all sorts of problems people are having with this port. And a lot of the 16 GB cards were destroyed here as well. The 12 GB 4070 Ti gained relative to the 16 GB cards at the higher VRAM usage levels; this isn't an apples-to-apples VRAM test.
There is nothing overrated about running into insufficient-VRAM issues on current-gen GPUs running current-gen games at the 8GB mark. Even the 12GB RTX 3060 either matches or beats the 8GB RTX 3070 in Part I; that is about as apples-to-apples as you can possibly get.

The only reason 12GB GPUs are still beating slower 16GB GPUs, as you would generally expect them to, is that 12GB isn't a problem yet, whereas 8GB clearly is one now.

8GB belongs on $180-300 GPUs, not $400+ ones.
 
Many game engines manage VRAM differently depending on how much is available. For example, ARK: Survival Evolved will allocate more VRAM on a card with 12GB than on one with only 6GB, e.g., moving from an RTX 2060 to an RTX 3060.
In a cluttered area with lots of dinos and structures, a 6GB card will hit moments where some textures fail to load in properly for a few seconds, and at a certain point, when it has to dip into system memory, you take a performance hit. Overall, game engines have ways of managing a limited pool of memory to preserve performance, up until they reach a point where the compromises are too great and shared memory starts being used.
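As a purely illustrative sketch of that idea (not ARK's actual logic; every name and number below is made up), an engine can degrade its lowest-priority textures until the working set fits whatever budget it detects, and only spill into shared memory if that still isn't enough:

```cpp
// Illustrative only: fit a set of streamed textures into a detected VRAM budget
// by dropping top mip levels on low-priority textures before spilling anywhere.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct StreamedTexture {
    const char* name;
    uint64_t fullSizeBytes;     // all mip levels resident
    uint64_t reducedSizeBytes;  // top mip level(s) dropped
    int      priority;          // lower value = safe to degrade first
    bool     degraded = false;
};

// Hypothetical helper: degrade textures until the set fits the budget.
uint64_t FitToBudget(std::vector<StreamedTexture>& textures, uint64_t budgetBytes)
{
    uint64_t total = 0;
    for (const auto& t : textures) total += t.fullSizeBytes;

    // Degrade the lowest-priority textures first.
    std::sort(textures.begin(), textures.end(),
              [](const StreamedTexture& a, const StreamedTexture& b) { return a.priority < b.priority; });

    for (auto& t : textures) {
        if (total <= budgetBytes) break;
        total -= t.fullSizeBytes - t.reducedSizeBytes;
        t.degraded = true;      // a real engine would evict the top mips here
    }
    return total;               // anything still over budget ends up in shared memory
}

int main()
{
    std::vector<StreamedTexture> textures = {
        {"terrain",    1'500'000'000, 700'000'000, 5},
        {"dino_A",       800'000'000, 350'000'000, 2},
        {"structures",   600'000'000, 250'000'000, 1},
    };
    // Pretend we detected ~2 GB of texture budget on a 6GB card.
    uint64_t resident = FitToBudget(textures, 2'000'000'000ull);
    std::printf("Resident texture bytes: %llu\n", (unsigned long long)resident);
    return 0;
}
```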

If anything, for this GPU generation, outside of budget cards, everything $250 and up should have a minimum of 16GB of VRAM.
Given the relative size of the gaming markets for major game releases, it is reasonable to assume that the supermajority of development time will be focused on utilizing the consoles' hardware resources as much as possible. This means that we are likely to see game designs where they minimize game engine memory usage and lean more on the direct storage capabilities to load things as needed, and dedicate 12-13GB of VRAM to GPU-related tasks. Many mainstream cards already meet or exceed the compute performance of the GPUs in the major consoles, and while many factors go into gaming performance, especially with the RTX 4000 and RX 7000 series cards if the rumored specs are to be believed, we will effectively have GPUs more than capable compared to the consoles, but VRAM amounts that will require the use of lower settings.
 
game designs where they minimize game engine memory usage and lean more on the direct storage capabilities to load things as needed
direct storage is just decompression (zip) on the GPU instead of the CPU
Unreal Engine has done texture streaming since Unreal Engine 1. Many games have it turned off by default, including Hogwarts Legacy. Once you enable it, the memory footprint drops, because only the texture data that is needed stays loaded and the higher mips get streamed in only when required, instead of putting them all in VRAM.
 
direct storage is just decompression (zip) on the GPU instead of the CPU
Unreal Engine has done texture streaming since Unreal Engine 1. Many games have it turned off by default, including Hogwarts Legacy. Once you enable it, the memory footprint drops, because only the texture data that is needed stays loaded and the higher mips get streamed in only when required, instead of putting them all in VRAM.
For direct storage, while decompression is one component, another is asset streaming, as well as being able to load in other resources just in time as they are needed. https://developer.nvidia.com/blog/gpudirect-storage/
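To make the streaming part concrete, here is a rough sketch of how a game might enqueue one just-in-time asset read with Microsoft's DirectStorage API, going from the NVMe file straight into a VRAM buffer with GPU decompression along the way. This is my own illustrative code, not reference code from the linked article; the file path and sizes are placeholders and the struct fields should be double-checked against dstorage.h:

```cpp
// Rough illustration of just-in-time asset streaming with DirectStorage.
// Assumes the DirectStorage SDK (dstorage.h / dstorage.lib), an existing D3D12
// device, and a destination buffer already created in VRAM.
#include <dstorage.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

HRESULT StreamAssetChunk(ID3D12Device* device, ID3D12Resource* vramBuffer,
                         UINT64 fileOffset, UINT32 compressedSize, UINT32 uncompressedSize)
{
    ComPtr<IDStorageFactory> factory;
    HRESULT hr = DStorageGetFactory(IID_PPV_ARGS(&factory));
    if (FAILED(hr)) return hr;

    // One queue per priority is typical; requests are batched and submitted together.
    DSTORAGE_QUEUE_DESC queueDesc = {};
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.Device     = device;
    ComPtr<IDStorageQueue> queue;
    if (FAILED(hr = factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue)))) return hr;

    ComPtr<IDStorageFile> file;
    if (FAILED(hr = factory->OpenFile(L"assets.pak", IID_PPV_ARGS(&file)))) return hr; // placeholder path

    // One read: compressed bytes on disk -> decompressed directly into a VRAM buffer.
    DSTORAGE_REQUEST request = {};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE; // GPU decompression
    request.Source.File.Source        = file.Get();
    request.Source.File.Offset        = fileOffset;
    request.Source.File.Size          = compressedSize;
    request.Destination.Buffer.Resource = vramBuffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;
    request.UncompressedSize            = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->Submit();  // a real engine would also enqueue a fence signal and wait on it
    return S_OK;
}
```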


With consoles, since all of the hardware is the same, that process can be heavily optimized to the point where developers can be very selective about what they load into RAM at any given time: resources for the GPU can be loaded quickly, and there are certain guarantees of storage performance from the SSD to the CPU for loading non-GPU-related content.

They can load a minimal set of data and be confident that you will not encounter performance or visual issues, since other needed data can be loaded seamlessly as you need it. Aside from the GPU, CPU-related tasks have always benefited as well, and with consoles having moved to fast internal SSDs, games can be developed around leveraging that performance.
On the PC, that side of things will be harder to implement, since not everyone is running an SSD that can saturate a PCIe 4.0 x4 link, and a game designed around leveraging that level of performance to minimize how much is loaded at any given time will run into massive issues the moment a user with a mid-range or slower SSD runs it.

You can see these design principles in action when you compare a PS5 game with its PC version: the PC version will use well above 10GB of RAM just for the non-GPU aspects of the game while also using 12+GB of VRAM, whereas on the console, 16GB of shared memory handles loading the game as well as all GPU-related data.
 
I didn't say it was meaningless, I said it was overrated...

The people who obsessively must have everything maxed out at all times are already buying new GPUs all the time.

I am not sure I agree here. I tend to buy every other generation. While I do like to crank everything to 11, I recognize that at some point during those four years I may have to drop some settings. That said, the one setting I don't want to drop is texture quality unless I have to. It typically adds the most to the visuals, and for most gamers I think it is the last setting they would choose to drop. Turning down shadows, AA, ray tracing, post-processing, etc. are the kinds of things I expect to dial back over time, not textures, because I still want to play games with textures on high (extreme, whatever) without running out of VRAM. The point being, with the way Nvidia has frequently crippled their memory capacities in one fashion or another, the odds of being forced to turn down texture settings in that four-year time frame, on a card that could otherwise handle them had it not been light on RAM, are pretty high these days. Especially if you get 60/70-class cards.

8GB simply isn't going to be enough in two years' time, and it surely won't last the four years of my every-other-generation upgrade cadence (which I believe is extremely common among gamers) if we are hitting edge cases now.

While I know these bus/memory configs don't fit... the RTX 4090 should have 32GB, the 4080 should have come with 24GB of VRAM, the 4070/Ti should have been 16GB, and the 4060/Ti should have been at least 12GB. Nvidia should have hit these numbers, or at least something closer to them, or at least offered models with these capacities.
 
Nobody needs 32 GB of VRAM for gaming. While usage will naturally continue to increase, people exaggerate just how much VRAM is actually needed because of the inability of many tools and analysts to differentiate between VRAM that is reserved and VRAM that is actually being utilized.

Well you can do other things than gaming with your GPU... like chatting with an AI.

I've just battled with the Llama machine learning models. I can fit the 30B model with 4-bit quantisation into the 24GB of my 3090, just barely, but I'd love to see how much better the 65B model is, and for that I need an 80GB A100, which is unfortunately 10x as expensive.
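Rough back-of-envelope math (my own, assuming ~0.5 bytes per parameter for 4-bit weights): 30B parameters × 0.5 byte ≈ 15 GB of weights, which leaves headroom within 24 GB for the KV cache and activations; 65B × 0.5 byte ≈ 32.5 GB of weights before any overhead, so it simply cannot fit on a single 24 GB card.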

So actually I'd like to be able to use my CPU's DRAM (128GB) from the GPU, which is technically possible AFAIK, but not supported by current APIs, and perhaps too slow to be really interesting.
 
I guess technically this is nothing new: CPUs have always been able to access GPU RAM; there is no other way to get data/code into it. ReBAR mostly means you no longer have to switch segments (like on old VGA cards with their 64K window at 0xA0000, and then again in the 32-bit address space) and can have GPU RAM fully mapped for CPU access.

The opposite should also be possible now: from what I understand, GPUs have for several generations been able to access system RAM (and PCIe-mapped devices) pretty much like any other SMP chip, with proper coherency support across whichever bus they happen to be connected to, NVLink or PCIe, and CXL is making that more normal than ever.

Now, several xPUs or even active PCIe peripherals doing DMA on the very same physical address space (but via potentially distinct page tables) had better be well coordinated, or instability and insecurity will fester very badly.

So the only thing really new here is that the DirectX API no longer enforces exclusivity and the need to flip pages between GPU-only and CPU-only use, but that only means the coordination may now be delegated to an application and its counterparts on the GPU.

I'd say it's 50% attention grabbing and 45% directing malware towards new vulnerabilities worth exploiting and 5% "why didn't they just always support that?"...
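For the curious, on the application side the newly exposed path looks roughly like this; a minimal sketch against the Agility SDK's GPU upload heaps (enum and struct names should be checked against current headers, and the initial resource state here is my assumption):

```cpp
// Minimal sketch of the new CPU-visible VRAM path (GPU upload heaps).
// Requires a ReBAR-enabled system, a recent driver, and Agility SDK headers
// that define D3D12_HEAP_TYPE_GPU_UPLOAD and D3D12_FEATURE_D3D12_OPTIONS16.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

bool CreateCpuWritableVramBuffer(ID3D12Device* device, UINT64 sizeBytes,
                                 ComPtr<ID3D12Resource>& outBuffer, void** outCpuPtr)
{
    // 1. Check whether the driver/OS expose GPU upload heaps at all.
    D3D12_FEATURE_DATA_D3D12_OPTIONS16 opts = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS16, &opts, sizeof(opts)))
        || !opts.GPUUploadHeapSupported)
        return false;                    // fall back to the classic upload-heap-and-copy path

    // 2. Place a buffer directly in CPU-visible VRAM.
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_GPU_UPLOAD;

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = sizeBytes;
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    if (FAILED(device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &desc,
               D3D12_RESOURCE_STATE_COMMON /* assumed acceptable for this heap type */,
               nullptr, IID_PPV_ARGS(&outBuffer))))
        return false;

    // 3. Map it persistently: the CPU writes through this pointer while the GPU reads
    //    the same VRAM, and the application is responsible for synchronizing the two.
    D3D12_RANGE noRead = {0, 0};
    return SUCCEEDED(outBuffer->Map(0, &noRead, outCpuPtr));
}
```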
 
There are other areas where consumer GPUs would be very good, but where a lack of VRAM becomes a huge issue. For example, AI optical flow utilities use about 8-10GB of VRAM just for 720p video. Some people have been able to get those utilities to process 4K video on cards with 24GB of VRAM, though it is hit or miss, as it is highly dependent on the content being interpolated. 32GB of VRAM would allow for more usable optical flow on 4K video, and that is all with half-precision calculations.

Newer and more complex AI processes have been increasing their VRAM needs, especially when it comes to orientation changes, such as a hand motion where there is movement combined with a change in orientation, and various other areas where older AI models cause artifacts or blending issues.
Sadly, many of these utilities use exclusively dedicated VRAM, even though they are not memory-bandwidth intensive: once the model is loaded, the memory-copy load is very low and the work is computationally expensive. In fact, on cards with slower RAM, e.g., a GTX 970, comparing a resolution that allocates less than 3.5GB of VRAM with one that allocates 4GB, there is no performance drop from using the slower ~30GB/s pool of VRAM; the rate of frame generation remains the same and scales linearly with resolution.
Ideally, these utilities would supplement a lack of dedicated VRAM with some system RAM, as people would rather a task run slowly than not at all, especially when real-time performance is not needed. For example, with AI art generation, if someone wanted to generate something high-res (which needs lots of VRAM and is also bandwidth-intensive), not many people would mind it taking 4 or 5 times longer because system RAM is slower.

These are all tasks that are more likely to be done on a user's home PC with a consumer GPU and not something done professionally.
 
...
While I know these bus/memory configs don't fit... the RTX 4090 should have 32GB, the 4080 should have come with 24GB of VRAM, the 4070/Ti should have been 16GB, and the 4060/Ti should have been at least 12GB. Nvidia should have hit these numbers, or at least something closer to them, or at least offered models with these capacities.

Especially considering the current retail prices of those cards, and the efforts behind the scenes to maintain them.
 