News Edward Snowden slams Nvidia's RTX 50-series 'F-tier value,' whistleblows on lackluster VRAM capacity

The goal is to leave memory management of the GPU to the OS and have game developers focus just on their game, without having to bother trying to manage platform resources.
No, DX12 opened the door to developers managing just about every aspect of the GPU. Games like MSFS 2020 didn't implement full DX12 support because they worked thru a WASM layer (and it showed if you enabled DX12). Any time you add a layer on top of direct access to a GPU, you slow things down; it's unavoidable. Why Microsoft decided to go thru a WASM layer for a "game" is beyond me ... maybe they believed content providers were going to secretly access a consumer's PC with their "add-ons", so they felt the need to isolate the game (which still really isn't fully isolated).
 
As far as efficiency goes, AMD leads the way. The nVidia 5090's 600-700 W load consumption is double that of the AMD 9070 XT in early leaked power-consumption reports, with the 9070 XT showing performance in both raster and RT around the same as the nVidia 5080.
In order to compare efficiency between hardware architectures, you need to look at not just one datapoint, but the whole perf/W curve. For Blackwell, the Ada (RTX 4000) data should mostly apply, since the process node and microarchitecture between the two are basically the same. Fortunately, Jarred did some good testing on that and found the RTX 4090 to be supremely efficient, especially if you just dial it back a little bit. However, note that at 100% power, it's not even using 400 W!

https://www.tomshardware.com/news/improving-nvidia-rtx-4090-efficiency-through-power-limiting
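To make that more concrete, here's a rough sketch of what I mean by comparing the whole perf/W curve rather than a single point. The power-limit and performance numbers below are hypothetical placeholders, not measurements; plug in real data from testing like Jarred's.

```python
# Sketch: compare efficiency across a power-limit sweep, not at one data point.
# The (power limit %, relative performance) pairs are made-up placeholders,
# not measurements - substitute real numbers from testing.
sweep = {
    50: 0.80,   # e.g. 50% power limit -> ~80% of stock performance (illustrative)
    70: 0.92,
    100: 1.00,  # stock power limit, stock performance
}

for limit_pct, rel_perf in sweep.items():
    rel_perf_per_watt = rel_perf / (limit_pct / 100)  # perf/W relative to stock
    print(f"{limit_pct:>3}% power limit: perf = {rel_perf:.2f}x, "
          f"perf/W = {rel_perf_per_watt:.2f}x of stock")
```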

@JarredWaltonGPU, did you do similar testing for the RX 7900 XTX, by any chance?

You think AMD couldn't just double the power requirements and come up with a GPU that performs the same as a 5090?
If you're talking about running the same RX 9070XT die at double the power, then no. Power scaling is very nonlinear. If you're starting at the max non-OC power limit and you double that, it might be good for only like 20% more performance.
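To spell out the arithmetic (the 2x power and +20% performance figures are just the rough estimate above, not measurements):

```python
# Doubling board power for only ~20% more performance craters efficiency.
power_scale = 2.0   # assumed: same die, double the power limit
perf_scale = 1.2    # assumed: ~20% performance gain, per the estimate above
print(f"perf/W relative to stock: {perf_scale / power_scale:.2f}x")  # -> 0.60x
```

That's the nonlinearity in a nutshell: 2x the watts buys nowhere near 2x the frames.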

nVidia's stock price is in free fall; this is clearly a BIG mistake for nVidia on several fronts.
Like Icarus, fly too close to the sun and you're gonna get burnt. That said, Nvidia's stock won't crash, it'll just return to some semblance of sanity. It has nothing to do with RTX 5000 and everything to do with their datacenter AI products, BTW. That's where they get most of their revenue these days, and where investors were anticipating the stratospheric growth to continue.

Micron makes 32 Gb GDDR7.
Oh really? I found this datasheet which mentions only a 16 Gb chip.

Their product page also lists only 16 Gb parts.

GDDR7 comes with a 70% increase in thermals ... these air-cooled cards just aren't going to cut it, which is also why those few that actually got a 5090 are reporting system lockups after 1-3 hours of continuous use. "right?" yeah ok whatever.
First, the "thermals" must be a function of power, which is a function of data rate. So, specifically which data rates are you comparing? Also, the RTX 4090 used GDDR6X - are you comparing against that or regular GDDR6?

Second, the above datasheet claims it's 44.4% more efficient (i.e. in terms of pJ/b). In the comparison, they seem to imply they're comparing 18 Gbps GDDR6 vs. 32 Gbps GDDR7. When I apply that to the data rates given, the GDDR7 should be using only 23.1% more power, not 70%!
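For reference, here's how the 23.1% works out, reading "44.4% more efficient" as bits-per-joule being 1.444x higher (my interpretation of the datasheet's wording):

```python
# GDDR6 @ 18 Gbps vs. GDDR7 @ 32 Gbps per pin, with GDDR7 claimed to be
# 44.4% more efficient in pJ/bit (i.e. energy per bit divided by 1.444).
data_rate_ratio = 32 / 18          # ~1.78x the bits per second
energy_per_bit_ratio = 1 / 1.444   # ~0.69x the energy per bit
power_ratio = data_rate_ratio * energy_per_bit_ratio
print(f"GDDR7 power vs. GDDR6: {power_ratio:.3f}x "
      f"(~{(power_ratio - 1) * 100:.1f}% more)")   # -> ~1.231x, i.e. ~23.1% more
```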

Speaking of data rates, the RTX 5090 runs its memory at a modest 28 Gbps, whereas the RTX 5080 uses 30 Gbps. FWIW, the RTX 4090 ran its GDDR6X at 21 Gbps, but because it uses PAM4 (unlike regular GDDR6), the efficiency spec for GDDR6 isn't directly applicable. The info I could find for Micron's GDDR6X doesn't give such an efficiency stat.

So, I'd love to know more about this 70% stat, if you'd kindly enlighten us.

Workstation cards are an entirely different market and you're paying for specific application support/drivers, not so much the hardware and memory. Not to mention the memory bandwidth is actually lower on the W7900, as it uses ECC GDDR6 whereas the 7900 XTX does not use ECC.
Workstation cards typically have lower power limits and correspondingly reduced clocks. Performance is therefore a little lower, but not much. In the case of memory bandwidth, we're talking 864 GB/s for the Radeon Pro W7900 vs. 960 for the RX 7900 XTX.

As for memory capacity, that's absolutely a differentiator and probably one of the reasons for the lower memory clock. I'm pretty sure they do in-band ECC, and its overhead probably isn't being counted in the above figures, because that discrepancy follows directly from the corresponding memory clocks of 18 Gbps vs. 20 Gbps for the gaming card.
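Those bandwidth figures fall straight out of the memory clocks, assuming the 384-bit bus both of these Navi 31 cards are listed with (an assumption not stated above):

```python
# Peak memory bandwidth = per-pin data rate * bus width / 8 bits per byte.
# Assumes the 384-bit bus reported for both cards.
bus_width_bits = 384

for card, gbps_per_pin in [("Radeon Pro W7900", 18), ("RX 7900 XTX", 20)]:
    bandwidth_gb_s = gbps_per_pin * bus_width_bits / 8
    print(f"{card}: {gbps_per_pin} Gbps x {bus_width_bits}-bit = {bandwidth_gb_s:.0f} GB/s")
# -> 864 GB/s and 960 GB/s, matching the spec-sheet numbers
```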

Anyway, the whole workstation thing is a mere footnote. Like I said, both AMD and Nvidia do it, so I think it's not worth dwelling on.
 
Games like MSFS 2020 didn't implement full DX12 support because they worked thru a WASM layer (and it showed if you enabled DX12). Any time you add a layer on top of direct access to a GPU, you slow things down; it's unavoidable. Why Microsoft decided to go thru a WASM layer for a "game" is beyond me ...
I'd guess it was probably to have a single executable image for both x86-64 and ARM. WASM (WebAssembly) relies on just-in-time compilation to translate an executable to native machine code. It's like a more modern version of Java bytecode. As the name implies, it was developed first and foremost for web applications.

maybe they believed content providers were going to secretly access a consumer's PC with their "add-ons", so they felt the need to isolate the game (which still really isn't fully isolated).
Isolation could be a reason. WASM typically runs in a sort of VM (again, smells mighty similar to the old Java VM model).
 
No, DX12 opened the door to developers managing just about every aspect of the GPU. Games like MSFS 2020 didn't implement full DX12 support because they worked thru a WASM layer (and it showed if you enabled DX12). Any time you add a layer on top of direct access to a GPU, you slow things down; it's unavoidable. Why Microsoft decided to go thru a WASM layer for a "game" is beyond me ... maybe they believed content providers were going to secretly access a consumer's PC with their "add-ons", so they felt the need to isolate the game (which still really isn't fully isolated).

No,

Seriously, no. Neither DX12 nor Vulkan allows you to directly manage GPU memory; you can merely ask the OS, but it's not obligated to listen. Contrary to what you think, neither DX12 nor Vulkan manages hardware; they are what's known as User Mode Drivers (UMDs).

https://learn.microsoft.com/en-us/w...a-and-later-display-driver-model-architecture

Every OS after Windows 98 creates a virtual display device, essentially a highly abstracted GPU that programs target for their rendering. Each program gets its own virtual device, and it's the responsibility of the display manager to keep them separate and manage their resources. The programs execute high-level commands that the pipeline and drivers then translate into hardware-specific commands; this translation consumes CPU cycles.

The whole "closer to the hardware / etc.." aspect of DX12 and Vulkan is that the commands available for that abstracted GPU are much closer to the native commands available on that hardware. This means the CPU has to spend less time translating and those (more) native commands allow for higher parallelism.

The CPU isn't in real mode running DOS, with you executing machine instructions and sending blocks of code directly into the GPU's memory. There is still an abstracted virtual GPU that you are rendering against, and the OS driver framework is still managing the resources on that card. You can see this by just launching two programs that use DX12/Vulkan simultaneously. Or just use DXVK like I do and have formerly DX9 and DX11 programs running on Vulkan.

Put another way, under no circumstances will a user-mode program have direct access to kernel-mode resources like a display device, its registers, its I/O ports, or its memory. Everything must go through the abstraction layer that forms the boundary between user mode and kernel mode. For Microsoft, this is the Windows Display Driver Model (WDDM); for Linux, it's the Direct Rendering Manager (DRM) and Graphics Execution Manager (GEM). Both of these models map GPU VRAM segments into regular memory space, meaning that from the program's point of view there is zero difference between system RAM and GPU VRAM; it's just another memory address.
 