News: VRAM-friendly neural texture compression inches closer to reality - enthusiast shows massive compression benefits with Nvidia and Intel demos

I hadn't heard of Intel's take on texture compression, but that gives me a lot of hope. Despite owning an Nvidia GPU (probably my last one, tbh, given how the company has acted over the last few years and how AMD is closing the gap), I thought this was a proprietary technique that relied on Nvidia hardware, based on Jensen's comments while speaking about it at GDC 2025. Knowing that it's something potentially every GPU company can do seems like a huge boon for the industry.

I have lamented texture issues since the days of Unreal Engine 3, with some games opting for low-fidelity textures to prevent the pop-in that comes from streaming large textures, and other games barely bothering to compress at all, which leads to ludicrous install sizes. It would be nice if we could get games under 80GB again without sacrificing quality or causing performance issues (frankly I'd like to see them get below 50GB again, but I'm not holding my breath).
 
Both of these demos raise some interesting questions. It's noted in the Intel T-Rex demo that the texture pass time (on an RTX 5090?) increases from 0.045 ms to 0.111 ms, but we don't know how much VRAM was being used. The Nvidia demo, meanwhile, notes a compressed texture size that goes from 272MB down to 98MB with BTC, and drops further to 11.37MB with NTC... but then we don't get a pass time.

So what happens if a game uses even 2GB of NTC-compressed textures? In terms of VRAM, that should run just fine on even 8GB cards like the 5060 Ti 8GB and 5060, and potentially AMD's 9060 XT 8GB as well. But the texture pass takes 0.111 ms for a workload that uses a paltry amount of textures (if T-Rex is anything like the Flight Helmet demo, we could be looking at less than 50MB of textures when compressed), and that's on an RTX 5090. So what happens when we shift to 2GB of textures on an RTX 5060?
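As an aside on the VRAM side of that question, here's a rough Python sketch using only the Nvidia demo figures quoted above; the big assumption (mine, not the demo's) is that those ratios would hold for a whole game's worth of textures, which is far from guaranteed:

```python
# Figures quoted above for the Nvidia demo asset (baseline / BTC / NTC sizes in MB).
raw_mb, btc_mb, ntc_mb = 272.0, 98.0, 11.37

ntc_budget_gb = 2.0  # the hypothetical 2GB of NTC-compressed textures

# If the demo's compression ratios held, 2GB of NTC would stand in for roughly:
equiv_btc_gb = ntc_budget_gb * (btc_mb / ntc_mb)   # ~17 GB of BTC textures
equiv_raw_gb = ntc_budget_gb * (raw_mb / ntc_mb)   # ~48 GB of uncompressed textures

print(f"2GB NTC ~ {equiv_btc_gb:.0f}GB BTC ~ {equiv_raw_gb:.0f}GB uncompressed")
```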

We can guess. The RTX 5090 offers 5.4X more AI compute than the RTX 5060. That means the same T-Rex demo that took 0.111 ms on the 5090 might require around 0.60 ms on the 5060. And if we guesstimate that a full game uses 40 times as much texture data as these simplistic demos, we're now talking about potentially spending 24 ms just on the texturing pass.

If you can pipeline things so that the whole engine doesn't stall while waiting for texture decompression, that would still mean 40-ish FPS at best. Drop the resolution to 1080p or even 1440p and we could potentially double that performance. But again, these are just rough estimates.
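For what it's worth, here's that back-of-the-envelope math written out as a Python sketch; the assumption that decode time scales linearly with AI compute and with the amount of texture data is mine, and it's a crude one:

```python
# Back-of-the-envelope estimate, not a measurement.
pass_time_5090_ms = 0.111  # Intel T-Rex texture pass time quoted above
ai_compute_ratio = 5.4     # RTX 5090 vs RTX 5060 AI compute
texture_scale = 40         # guess: a full game vs these simplistic demos

pass_time_5060_ms = pass_time_5090_ms * ai_compute_ratio   # ~0.60 ms
full_game_pass_ms = pass_time_5060_ms * texture_scale      # ~24 ms

# If that pass alone consumed the whole frame budget:
fps_ceiling = 1000.0 / full_game_pass_ms                   # ~42 FPS
print(f"{pass_time_5060_ms:.2f} ms -> {full_game_pass_ms:.0f} ms -> ~{fps_ceiling:.0f} FPS ceiling")
```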

I suspect there's a good reason we haven't seen any of this tech in a shipping game yet. It will take a lot of work to create the assets, both the uncompressed and NTC variants, and games will still need to work on GPUs without NTC support. In that sense, it's the same story as ray tracing yet again. Game publishers and developers are waiting for the proverbial chicken to arrive before they start building eggs into their games.
 
Pretty sure that's for market segmentation purposes. More VRAM starts to eat into the margin on cards for the AI market.

There's that. Also planned obsolescence and upselling. Thankfully people are catching on. That's probably why the 4070 has been readily available. The 4070 Ti will likely be viable for much longer than the 4070 because it has more VRAM.
 
The article was interesting. This tech does look amazing, but as a reader you're left with a lot of blanks to fill in. Jarred's reply was very thought-provoking on the subject. It really calls into question the viability of the tech in real-world games, particularly on lower-tier cards. It reminds me of the tech demos you got on disc with the OG GeForce 256 through 5-series cards. Those demos looked amazing but were far from a realistic portrayal of what the cards were actually capable of handling in games.

I understand we are hitting a wall in transistor density with the death of Moore's Law, which may be further complicated by parallel compute core counts also hitting a 'ceiling', since they are constrained by the serial parts of code (Amdahl's Law; a ceiling of roughly 20K GPU cores was also predicted by a dev, I believe it was Todd Howard, maybe, around ten years ago, but I couldn't find the article to link, sorry).

My point is, I get why Nvidia has to get creative to continue to increase image quality in games, but does there come a point, regardless of whether Nvidia or its competitors acknowledge it, where we simply cannot increase GPU core counts further? Are we already there, or close to it, at the high end, and is this why we are seeing the massive push to AI? I am very curious to see where things go over the next few generations of GPUs. Are we simply going to be forced to either swallow more latency or reduce image quality/frame rates? Can we squeeze more life out of these 'laws', do we need to rewrite them, or are we required to take entirely new approaches to manufacturing, production, and hardware/software rendering to realize further gains in picture fidelity/frame rates?

I remember thinking as a young man how far off these worries felt, but I knew they would likely come to a head in my lifetime. This was back when I had a PIII single core CPU on a 250nm node and your GPU was called a 3D accelerator; in my case, with 2 pixel pipelines, 2 ROPs, and 2 TMUs on my Nvidia Riva TNT2 card. We knew these laws were coming for us back then, particularly the oft-noted death of Moore's Law, but they still felt unreal at the time. Now that we are basically there, I am always curious how we'll sidestep them, as I always suspected we would. Is Nvidia's AI the answer or will it be something else... I vote (or is it hope) for something else, because more latency seems to be part of AI's answer.
 
This was back when I had a PIII single core CPU on a 130nm node and your GPU was called a 3D accelerator.

So this is overly pedantic, but are you sure you had a 130nm PIII? Those didn't come out until after the first P4s, and IIRC they weren't very common despite being quite good.
 
The question I'd posit as a user is: what would you prefer, more GPU power being used to save storage space and VRAM, or manufacturers just putting more VRAM on the cards? I know what I'd rather have.

Comparing Blackwell to Ada, it's very apparent that the only thing that really improved from one to the other was AI performance. A lot of the software tricks being enabled by the new hardware are good, but I'd rather just get more performance. Aside from upscaling, most of these added features also seem to add latency, which makes the experience worse.

It would never happen, because money, but I'd rather tensor cores were generationally fixed for consumer hardware rather than scaling with core counts. That would allow for a consistent experience across the entire generation, and in theory would allow for more raster/RT performance on higher SKUs.
 
I would caution against extrapolating too much from this video. For perspective, I downloaded and compiled both of these tools/demos out of my own interest, and even on an RTX 5070 (going from 680 Tensor Cores down to 192), the pass time I got in Nvidia's NTC renderer is essentially identical (0.18 ms on the RTX 5070 vs the 0.17 ms that Compusemble saw on the RTX 5090). The pass time for Intel's T-Rex demo is essentially the same as well. There may be a hardware wall that this technique hits at some point, but these tools/demos aren't hitting it even on a 5070.
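For anyone curious, here's a quick sanity check on those numbers as a Python sketch; the "if it scaled linearly with Tensor Core count" figure is purely hypothetical (and ignores clock differences), but it shows how far the measured result is from compute-bound behavior:

```python
# Measured pass times (ms) from the NTC renderer runs above, plus Tensor Core counts.
cores_5090, cores_5070 = 680, 192
measured_5090_ms, measured_5070_ms = 0.17, 0.18

# Hypothetical: if the pass were purely Tensor-Core-bound and scaled linearly.
expected_if_compute_bound_ms = measured_5090_ms * (cores_5090 / cores_5070)  # ~0.60 ms

print(f"expected if compute-bound: {expected_if_compute_bound_ms:.2f} ms, "
      f"actually measured: {measured_5070_ms:.2f} ms")
```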
 
Jarred>In that sense, it's the same story as ray tracing yet again. Game publishers and developers are waiting for the proverbial chicken to arrive before they start building eggs into their games.

I don't see that as an equal comparison. Yes, there's a cost-benefit calculation to be made for any feature/tech to be adopted. For RT's adoption, the cost in compute power was high (and is still high today) relative to a fairly peripheral increase in aesthetics.

For NTC, I don't know the particulars of NTC's requirements, but the cost can't be as high as RT's, and the benefit is clear: lower VRAM usage for any given level of texture quality. That matters much more than the aesthetic gains from RT, given that the bulk of GPUs still have 8GB today, and presumably will for the next gen as well. The 8GB VRAM limit is now arguably more of a bottleneck to RT than compute power is.

So, yes, assuming the tech has progressed beyond the lab demo stage, I see NTC having a faster adoption rate by game vendors than RT.


Jeff>I would caution against extrapolating too much from this video.

From your investigation, do you have any insight into how much progress NTC has made beyond this demo? Are game vendors talking about it in any capacity?


PS: Glad to see Jarred continuing his participation in the forums. Also, a welcome to Jeff for his first post here. Hopefully it will be the first of many.
 
The way I see it? It will be beneficial if the performance gains are worth it at higher resolutions.

Consider my use case: modded Skyrim VR with a PSVR2 and a 4070 Ti Super. If NTC gives better performance compared to running the game with high-resolution textures, that's a win for me. And the 16GB VRAM limit is real here in my use case.