Nvidia is always a bit cagey about the exact details.
Anandtech did a deep dive on this, back when Volta launched. Here's the key bit you need to know:
For each sub-core, the scheduler issues one warp instruction per clock to the local branch unit (BRU), the tensor core array, math dispatch unit, or shared MIO unit. For one, this precludes issuing a combination of tensor core operations and other math simultaneously. In utilizing the two tensor cores, the warp scheduler issues matrix multiply operations directly, and after receiving the input matrices from the register, perform 4 x 4 x 4 matrix multiplies. Once the full matrix multiply is completed, the tensor cores write the resulting matrix back into the register.
...
After the matrix multiply-accumulate operation, the result is spread out in fragments in the destination registers of each thread.
So, it breaks the SIMD model, harnessing the warp + registers that normally drive & feed the regular CUDA cores.
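You can see that "fragments spread across the warp" behavior directly in CUDA's warp-level WMMA API. Below is a minimal sketch, not the exact hardware path Anandtech describes: the hardware works in 4x4x4 steps internally, while the API exposes a 16x16x16 tile that one warp computes cooperatively, with no single thread holding the whole result.

```cuda
// Minimal WMMA sketch: one warp (32 threads) computes a 16x16x16 tile D = A*B + C.
// The fragments live in the registers of the warp's threads, spread across them.
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void wmma_tile(const half *a, const half *b, float *d)
{
    // Per-thread register-backed fragments of the input and accumulator tiles.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);

    // The whole warp issues these together: the scheduler hands the
    // matrix-multiply op to the tensor cores, fed from the register file.
    wmma::load_matrix_sync(a_frag, a, 16);
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);

    // Result fragments are written back from each thread's registers.
    wmma::store_matrix_sync(d, c_frag, 16, wmma::mem_row_major);
}
// Launch with at least one full warp, e.g. wmma_tile<<<1, 32>>>(a, b, d);
```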
AMD publishes details of its GPUs on GPUOpen.
The AMD RDNA™ 3 ISA reference guide is now available! The ISA guide is useful for anyone interested in the lowest level operation of the RDNA 3 shader core.
gpuopen.com
I haven't gone through it yet.
AFAIK, Intel doesn't publish such programming details of their GPUs, so you'd have to piece it together by analyzing the source code of their open source driver & deep learning framework code.
So with DLSS2 on Ampere, the initial frame rendering gets finished and the GPU moves on to rendering the next frame, while at the same time upscaling the previous frame. Or at least, I remember Nvidia talking about it being possible to do that.
The thing is, you could already do that in Turing. For most of the rendering process, shader occupancy isn't 100%. It's just that, if the tensor cores are truly independent of the CUDA cores, then you'd be able to use the two concurrently with less interference. The downside is that you'd have more dark silicon most of the time - it'd be less area-efficient, because it's rare that you'd be concurrently driving both near peak occupancy.
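In CUDA terms, that kind of overlap is just two independent workloads in separate streams. Here's a rough sketch with made-up kernels (render_frame / upscale_frame are stand-ins, not Nvidia's actual DLSS scheduling): whether they truly run concurrently depends on there being free SM resources, which is exactly the occupancy-headroom point above.

```cuda
// Sketch of overlapping "render next frame" with "upscale previous frame"
// using two CUDA streams. Kernel names and workloads are hypothetical.
#include <cuda_runtime.h>

__global__ void render_frame(float *frame, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) frame[i] = i * 0.5f;          // stand-in for shading work
}

__global__ void upscale_frame(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;        // stand-in for tensor-core upscaling
}

int main() {
    const int n = 1 << 20;
    float *frameA, *frameB, *upscaled;
    cudaMalloc(&frameA, n * sizeof(float));
    cudaMalloc(&frameB, n * sizeof(float));
    cudaMalloc(&upscaled, n * sizeof(float));

    cudaStream_t renderStream, upscaleStream;
    cudaStreamCreate(&renderStream);
    cudaStreamCreate(&upscaleStream);

    // Frame N already sits in frameA. Kick off rendering of frame N+1 and
    // the upscale of frame N in different streams so they can overlap if
    // the GPU has spare resources.
    render_frame<<<n / 256, 256, 0, renderStream>>>(frameB, n);
    upscale_frame<<<n / 256, 256, 0, upscaleStream>>>(frameA, upscaled, n);

    cudaDeviceSynchronize();

    cudaStreamDestroy(renderStream);
    cudaStreamDestroy(upscaleStream);
    cudaFree(frameA); cudaFree(frameB); cudaFree(upscaled);
    return 0;
}
```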
As for pipelining, I even heard about one game that overlaps rendering of 4 consecutive frames, in order to achieve good shader occupancy and the highest framerates. I assume it was probably something more like an RTS game, because 3 extra frames of latency sounds horrible for a twitchy FPS game.
I mentioned in this article on DLSS 3.5 that it was interesting to see how relatively poorly the RTX 2080 Ti did, and guessed it was the lack of concurrent RT and Tensor workloads.
Turing was also made on 12 nm, carried the first iteration of RT cores, and only a small revision of Volta's Tensor cores. So, I wouldn't expect it to perform very well.