News Jensen says DLSS 4 "predicts the future" to increase framerates without introducing latency

Except for the polished version of CP 2077, Wukong, and a few others, you really can't say game graphics peaked in 2024. But then again, those require beefy GPUs to run at max settings.

CP 2077 is an example of a "supposed generation-crushing game" that didn't really blow me away with graphics, especially for the hardware cost.

Remember the first Unreal game running on a Voodoo 1? The first time you stepped to the edge overlooking that waterfall: that, to me, is an example of being blown away by graphics that makes you say, "yeah, I'm glad I bought that card."

It may just be that advancements that big aren't possible right now.
 
Upscaling made some sense, frame generation did too, but predicting the future? No chance.

Maybe if you can single out only certain parts of the frame, like characters doing an animation, yes.

I do believe there's something fundamentally wrong with using AI to try to increase graphics quality. IMO it will never work perfectly.
 
I think it's purely image-based. So, it's only going to extrapolate based on what was visible in the previous frame.
No. DLSS, much like TAA, uses motion vectors, which you provide to the algorithm. That is how they can extrapolate a frame using a bit of motion data for the newly generated frame. If they're now doing 2- or 3-deep extrapolation, it means they're pretty darn confident in their algorithm.
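Purely as an illustration of what those engine-supplied motion vectors are (my own toy numpy sketch, with made-up function names, not Nvidia's code): the engine knows where a point was last frame and where it is this frame, so the screen-space difference is exact rather than guessed.

```python
# Toy sketch (not Nvidia's code): analytical motion vectors as an engine
# would supply them to TAA / DLSS. The same world-space point is projected
# with last frame's and this frame's view-projection matrices, and the
# screen-space difference is the per-point motion vector (Y-flip and
# sub-pixel jitter omitted for brevity).
import numpy as np

def project_to_pixels(point_world, view_proj, width, height):
    p = view_proj @ np.append(point_world, 1.0)        # clip space
    ndc = p[:2] / p[3]                                  # perspective divide
    return np.array([(ndc[0] * 0.5 + 0.5) * width,
                     (ndc[1] * 0.5 + 0.5) * height])

def motion_vector(point_world, view_proj_prev, view_proj_curr, width, height):
    """Exact screen-space motion in pixels per frame for one point."""
    return (project_to_pixels(point_world, view_proj_curr, width, height)
            - project_to_pixels(point_world, view_proj_prev, width, height))
```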

EDIT: Straight from the horse's mouth: https://www.nvidia.com/en-gb/geforce/news/dlss4-multi-frame-generation-ai-innovations/

Regards.
 
I finally had a chance to read through the nvidia DLSS 4 news page and noticed this:
We have also sped up the generation of the optical flow field by replacing hardware optical flow with a very efficient AI model. Together, the AI models significantly reduce the computational cost of generating additional frames.
That would indicate to me that they've shifted to doing optical flow in the same fashion as Intel. That should also mean they could use this to rework regular frame generation and have it work on anything with tensor cores. I get that they won't because they don't need to, but to me that's just another example of how nvidia is capitalizing on market and mind share.
 
That would indicate to me that they've shifted to doing optical flow in the same fashion as Intel. That should also mean they could use this to rework regular frame generation and have it work on anything with tensor cores. I get that they won't because they don't need to, but to me that's just another example of how nvidia is capitalizing on market and mind share.
Classical optical flow algorithms tend to be rather expensive and have weaknesses where they get the wrong answer. By using a neural optical flow implementation, not only can Nvidia potentially achieve better accuracy, but they can also tune it to pick up on precisely the details (e.g. lighting effects) they want and have it disregard others. I'd guess that's what really motivated the change, but perhaps they also wanted to reclaim some die area previously used for the hardware optical flow engine.

Being able to port it to other hardware is an interesting side-benefit, but I doubt it was the main reason.
 
perhaps they also wanted to reclaim some die area previously used for the hardware optical flow engine.
DLSS 3.0 (FG) runs on the 50 series so the hardware is either still there or nvidia is an even worse company than I imagine.
Being able to port it to other hardware is an interesting side-benefit, but I doubt it was the main reason.
Oh I'm certain it never entered into the equation at all.
 
DLSS 3.0 (FG) runs on the 50 series so the hardware is either still there or nvidia is an even worse company than I imagine.
Oh, that's a good point. They have a whole Optical Flow SDK, which they must support. So, either you're right that Blackwell still contains the hardware, or else they are now faithfully emulating the older functionality on their CUDA/Tensor cores, when such APIs are in use.

Oh I'm certain it never entered into the equation at all.
Eh, maybe for things like Nintendo Switch 2 or the rumored desktop APU they're likely working on with MediaTek. They might prefer not to burn die space on hardware optical flow engines in those chips.
 
Oh, that's a good point. They have a whole Optical Flow SDK, which they must support. So, either you're right that Blackwell still contains the hardware, or else they are now faithfully emulating the older functionality on their CUDA/Tensor cores, when such APIs are in use.
I think Blackwell still has the OFA, but it's fixed function just like on Ada. The new DLSS framegen models deliver higher perf and better quality, according to one of the videos/images from Jensen's keynote IIRC. Will it support Ampere and Turing? I would be shocked if they allow that. I think there are performance requirements and some other stuff that makes multi frame generation require Blackwell... but also I'm pretty sure that's again just locking a feature that could work on older architectures to a new architecture.

As for framegen / MFG, I was right in what I initially said. It's still using interpolation. So if anyone wants to argue that it's not... well, for now that's wrong. I definitely think extrapolation or projection or whatever you want to call it is being researched and will happen at some point. Because think about this:

You render frame 1 initially. Frame 2 is a special case; maybe you just skip generating anything for that one frame. But after rendering two frames, you have at least some semblance of a pattern from the motion vectors and such. So, take frames 1 and 2, project where that's going, and use AI to create frame 3. If rendered frame 4 continues the trend, all is well, things should look fine, and you haven't added latency.

But what if frame 4 has a major change in camera position or viewport compared to frame 2? Well, the jump from projected frame 3 to frame 4 would be no bigger or more noticeable than the jump from frame 2 to frame 4 would have been anyway. That resets the pattern, but it shouldn't really look any worse than the current interpolation approach does between two wildly divergent frames.

Basically, project every other frame based on the past trend (faking a trend if necessary) and then use a fast in-painting algorithm to make up the difference. Intel has said it's researching this as well. Like I said, I think this is very much a matter of "when" not "if."
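To make that concrete, here's a toy numpy sketch of the projection step (purely illustrative, my own naming, not any vendor's algorithm): assume each pixel keeps the velocity it showed between frames 1 and 2, forward-warp frame 2 by that motion, and flag the pixels nothing landed on as the holes the in-painting pass would have to fill.

```python
# Toy sketch of frame projection / extrapolation (illustrative only):
# assume constant per-pixel velocity between the last two rendered frames,
# scatter frame 2 forward by one frame of motion, and report the holes.
import numpy as np

def project_next_frame(frame2, motion_1_to_2):
    """frame2: (H, W, 3) image; motion_1_to_2: (H, W, 2) per-pixel motion
    in pixels between the two most recent rendered frames."""
    h, w, _ = frame2.shape
    projected = np.zeros_like(frame2)
    filled = np.zeros((h, w), dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    nx = np.clip(np.round(xs + motion_1_to_2[..., 0]).astype(int), 0, w - 1)
    ny = np.clip(np.round(ys + motion_1_to_2[..., 1]).astype(int), 0, h - 1)
    projected[ny, nx] = frame2[ys, xs]   # collisions resolved arbitrarily
    filled[ny, nx] = True
    return projected, ~filled            # ~filled = disoccluded holes
```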

And of course, multi frame projection would be much harder to pull off than a single frame projection. I don't think projecting more than one or maybe two frames is viable. Interpolating three frames, though... that's reasonably easy if the algorithm and hardware are fast enough.
 
Here's a thought experiment, showing where frame extrapolation breaks down. Imagine you're playing a first-person game of some sort. You're standing near a corner or some kind of large obstacle that someone could hide behind. If they step out from behind it, then the algorithm isn't going to know what to do in those trailing-edge pixels that are newly revealed in each successive, extrapolated frame.

You might be right that they try to do some sort of AI in-painting, but models like what Adobe uses for that are probably huge and complex, nowhere near realtime. More likely, they just smear the object or reuse previous frame's pixels at that location. Basically, you'd see this sort of ghosting effect, along that trailing edge. Worse yet, it'd probably flicker as each real frame corrects it, drawing even more attention to the artifact. At high native frame rates, the effect might be subtle enough that you wouldn't really notice, but when the native framerate is low, then it'd be very pronounced.
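As a toy illustration of that fallback (my own sketch, pure speculation about behavior, not a description of what DLSS actually does): reusing the previous frame's pixels in the newly revealed region is trivially cheap, and it's exactly what produces the trailing-edge ghost.

```python
# Toy illustration (speculative, not Nvidia's algorithm) of the cheap
# "reuse previous pixels" fallback for disoccluded areas: the newly revealed
# region shows stale content until the next real frame corrects it.
import numpy as np

def fill_disocclusions_naively(extrapolated, hole_mask, previous_frame):
    """extrapolated, previous_frame: (H, W, 3); hole_mask: (H, W) bool of
    pixels that no warped source pixel landed on."""
    out = extrapolated.copy()
    out[hole_mask] = previous_frame[hole_mask]   # ghosting along the edge
    return out
```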

Similar to what I said before, I think the proper solution to this problem is just to natively rasterize and shade these areas. Assuming Nvidia still uses tile-based rasterization, they could actually do this without a ton of overhead, though you would need to reprocess the geometry for that frame. With ray-tracing, it's even easier just to shoot some assorted rays where & when you need them, although there's again the problem of not only needing to do the geometry transforms, but also building/updating the BVH structure.
 
TAA doesn't use optical flow and neither did DLSS until Ampere GPUs added a hardware optical flow engine. The motion vectors used by TAA and DLSS2 were analytical. What makes it possible is that you know the screen space texture coordinates of each object, so you can compute the correct motion vector, whereas optical flow is merely a guess that's based on visual similarity and can easily get confused.
Optical flow has been present in GPUs since they started adding video encoder FFBs, as producing an optical flow field is a mandatory part of all modern video CODECs (since at least MPEG2). It's exactly this FFB that is used to generate the motion vector field for ASW in VR applications.
The reason they added optical flow to the mix was to deal with hard lighting boundaries, which usually don't follow what object surface textures are doing. The combination of both techniques gives you the best of both worlds.
Using the MVec field for ASW has nothing to do with 'lighting boundaries'. It is to allow for reprojection of all moving elements of the scene regardless of whether that motion is due to head motion or in-scene motion. In other words: if you rigid-mount an HMD and run an application using ASW, moving objects within the scene will still have smooth motion extrapolation.
Some of these VR tricks don't attempt to estimate the world state at a new time point, but merely compensate for head movement.
False. ASW accounts for in-scene object motion as a fundamental requirement. Image synthesis (to account for disocclusion from object parallax) is a fundamental requirement. The technique would not work without doing both. I linked the page explaining how it works already, but I'll do it again too.
 
To this day I find it surreal how rarely journalists mention how bad DLSS looks visually.

This isn't tech that anyone who actually plays games uses. (And just because the AMD version looks worse, that doesn't excuse NVIDIA for pushing this shovelware at us.)
I am for sure a gamer. I spend a lot of hours playing games. And I don't find DLSS to be visually degrading at all. I have a 4090 and don't really need to use it, but I do anyway because it makes games look better when I turn it on.

It's very difficult to keep making graphics more and more realistic. Look at the 4090. Do you think creating a GPU that performs twice as well as the 4090 is going to be more power efficient? Even if you shrank it down to two nanometers, the engineering feat of delivering twice the performance of a 4090 is pretty staggering, I think.

Therefore, you can't just keep going bigger. You have to bring in other technologies to mitigate some of the obstacles or challenges.

I'm not saying that I really care about frame generation or frame warping.

But DLSS in general is a pretty darn good AA solution.
 
Jensen used a lot of words to try to justify not putting at least 16GB of VRAM on everything above the 5060, and to argue that it's not a bad thing that there is very little difference on paper (and perhaps in practice), outside of the Titan-class 5090, between the 5000 series and the 4000 series.

Perhaps their 5% stock drop today is a result of that as well.
I don't think that's why the stock dropped. Most of what the 5000 series is good at also makes it good for servers and creators, the money makers. They just happen to be good for gaming too.
 
Would it not be possible to just.. go bigger? Physically I mean. If it's not that feasible to go smaller anymore.
Personally I'm not that much opposed to owning a bigger computer.

Unless we discover another way to compute/calculate the same things faster.

I'm still baffled they didn't go bigger with the VRAM.
 
So if this new compression technique doesn't require more VRAM, why does the 5090 have 8GB more than the 4090? People are going to buy it anyway, given all the other improvements. They don't need to compel the elitists to upgrade from last gen to this gen. It just seems like more marketing speak.
 
Optical flow has been present in GPUs since they started adding video encoder FFBs, as producing an optical flow field is a mandatory part of all modern video CODECs (since at least MPEG2).
No, video codecs don't use optical flow. They employ motion vectors, but those vectors are optimized to minimize the residuals from macroblock motion compensation, which is a different problem than optical flow is meant to solve.

Furthermore, it's precisely because video compression motion vectors are optimized for coding efficiency that they're nearly useless for computer vision applications. I actually did a research survey on this topic, at one point.
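A toy illustration of the difference (my own example, not taken from any codec spec): an encoder's motion search just returns whichever displacement minimizes the coding residual, e.g. the sum of absolute differences, and on repetitive or low-texture content that winner frequently isn't the block's true motion.

```python
# Toy macroblock motion search (illustrative only, not a real encoder):
# the selected vector minimizes the residual (SAD), which is what matters
# for coding efficiency, not whether it matches the content's true motion.
import numpy as np

def motion_search(ref, cur, bx, by, block=16, search=8):
    """ref, cur: 2-D (grayscale) frames; (bx, by): top-left of the block in
    cur. Returns the (dx, dy) within +/-search pixels with the lowest SAD."""
    cur_block = cur[by:by + block, bx:bx + block].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue
            sad = np.abs(cur_block - ref[y:y + block, x:x + block].astype(np.int32)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best   # cheapest-to-code displacement, not necessarily true motion
```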

It is to allow for reprojection of all moving elements of the scene regardless of whether that motion is due to head motion or in-scene motion.
That's a plus, but it's not necessary simply to avoid motion sickness. Optical flow is irrelevant for avoiding motion sickness, as the only thing which tells you how much the wearer's head has moved since the frame started being rendered is the HMD's tracking system, which is focused on the wearer's pose within the real environment and not at all on what's happening in the virtual environment.

ASW accounts for in-scene object motion as a fundamental requirement. Image synthesis (to account for disocclusion from object parallax) is a fundamental requirement.
Then you can't do that in the HMD (due to such compensation requiring depth information, which is absent from the video signal), which puts it at a disadvantage vs. other techniques. VR users tend to be relatively stationary, so the changes in orientation will be much more important than changes in position. Simple VR implementations only track orientation, not position, showing the relative lack of importance in matching position changes vs. orientation changes, for wearer comfort.

I linked the page explaining how it works already, but I'll do it again too.
I already tried this link before, but it simply took me to a product listing for Meta Quest. Even if they implemented it like you're saying, that doesn't make object motion compensation a fundamental requirement - it just shows they went further than others.
 
Would it not be possible to just.. go bigger? Physically I mean. If it's not that feasible to go smaller anymore.
Personally I'm not that much opposed to owning a bigger computer.
That's exactly what they did. Instead of manufacturing it on a smaller process node, they kept using virtually the same node as the RTX 4000 series but just made the dies bigger.

So far, multi-die GPUs haven't worked terribly well, and the industry seems to have moved away from multi-GPU rendering. After AMD killed its multi-die GPU for the RX 9000 generation, Apple remains the lone holdout for multi-die GPUs capable of gaming, via the "Ultra" tier of its M-series SoCs. Those scale up to two dies, but I think it's unlikely they'd scale well beyond that. Performance-wise, they still fall short of the fastest monolithic dGPUs, but they've also got CPU cores in there, so they're trying to do a lot of different things well, and that makes it hard to also be the best GPU.
 
So if this new compression technique doesn't require more VRAM, why does the 5090 have 8GB more than the 4090? People are going to buy it anyway, given all the other improvements. They don't need to compel the elitists to upgrade from last gen to this gen. It just seems like more marketing speak.
The 5090, like bit_user pointed out, is not just for gaming. As a matter of fact, more AI and content creators are using them than gamers.
 
Here's a thought experiment, showing where frame extrapolation breaks down. Imagine you're playing a first-person game of some sort. You're standing near a corner or some kind of large obstacle that someone could hide behind. If they step out from behind it, then the algorithm isn't going to know what to do in those trailing-edge pixels that are newly revealed in each successive, extrapolated frame.

You might be right that they try to do some sort of AI in-painting, but models like what Adobe uses for that are probably huge and complex, nowhere near realtime. More likely, they just smear the object or reuse previous frame's pixels at that location. Basically, you'd see this sort of ghosting effect, along that trailing edge. Worse yet, it'd probably flicker as each real frame corrects it, drawing even more attention to the artifact. At high native frame rates, the effect might be subtle enough that you wouldn't really notice, but when the native framerate is low, then it'd be very pronounced.

Similar to what I said before, I think the proper solution to this problem is just to natively rasterize and shade these areas. Assuming Nvidia still uses tile-based rasterization, they could actually do this without a ton of overhead, though you would need to reprocess the geometry for that frame. With ray-tracing, it's even easier just to shoot some assorted rays where & when you need them, although there's again the problem of not only needing to do the geometry transforms, but also building/updating the BVH structure.
This is a contrived example that doesn't really happen in practical use. Think about this: how many frames are required for a person to step out from behind an obstacle? Particularly if a game is rendering at 30+ FPS, it's not like they pop out for a single frame and then disappear one frame later. There will be dozens of frames where the other person is coming out and then returning.

If there's really fast camera motion, things will break down somewhat. But that's always the case, and it really doesn't matter as much, because when you spin the camera really fast, everything blurs together and looks ugly regardless. (And your monitor's pixel persistence will contribute to this as well.)

What will really happen is that things shift and, in most situations, you'll have edges of maybe a few pixels where the correct data is missing. Those would get filled in by a fast in-painting algorithm, and they'll be visible for maybe 10–20 ms at most. Then a fully rendered frame would come along and you get the correct pixels everywhere.

This is all for a hypothetical framegen projection future, of course, and only for projecting one frame, not multiple frames. But if you're doing multiple frames, all you need is faster hardware. Like if you're running at 100 FPS native, you have 10 ms between real frames. Now if you want to project one frame into that space, it needs to be done in 5 ms. If you want to do two projected frames, each has to be ready within 3.3 ms. Three frames? 2.5 ms per frame. And probably shave off a few tenths for each of those to be ready with some room to spare... and at higher FPS, you'd either need faster projection or fewer projected frames.
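That budget math generalizes to a one-liner; trivial, but it makes the scaling obvious (same assumptions as above, ignoring the few tenths of headroom):

```python
# Time budget per generated frame when N projected frames are slotted
# between real frames at a given base framerate (ignores overhead/headroom).
def projection_budget_ms(base_fps, projected_frames):
    return (1000.0 / base_fps) / (projected_frames + 1)

for n in (1, 2, 3):
    print(n, round(projection_budget_ms(100, n), 1))   # 5.0, 3.3, 2.5 ms
```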

But again, the more I look at that scenario, the more I'm sure it's precisely what Intel, Nvidia, and AMD are working on right now. And you have other more reasonable scenarios. What if the base framerate is only 60 FPS? Now you have ~8 ms to project a single frame, ~5 ms each for two frames, or ~4 ms each for three frames, and you could get frame-generated 120, 180, or 240 FPS. Is the hardware fast enough to do that, with in-painting, right now? Probably on a 5090 you could at least do one or two frames that way. On a future 5060, or with an RTX 4060, maybe only a single frame is possible.

And in-painting in some ways becomes easier if you're projecting multiple frames. Like suppose there's a projected shift in camera position of ~6 pixels to the right. With a single frame projection, the algorithm has to fill in a whole six pixel wide stripe along the left side of the screen. With three frames, it only has to fill a two pixel wide stripe at each step.
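Same arithmetic in code form (trivial, and just following the numbers above): the in-painting burden shrinks as the shift gets split across more projected steps.

```python
# Width of the stripe the in-painter must fill per displayed step, if the
# camera shift between real frames is split across N steps.
def stripe_width_px(camera_shift_px, steps):
    return camera_shift_px / steps

print(stripe_width_px(6, 1))   # 6.0 px with a single projected frame
print(stripe_width_px(6, 3))   # 2.0 px per step with three
```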

Given what we know about Jensen (i.e. he's always thinking of the next step, planning ahead, working on the future), I suspect what he told me wasn't exactly wrong... it's just not what DLSS 4 will be doing with RTX 50-series when they launch. Instead, he was probably talking about what the next generation DLSS 5 or whatever is going to do in a year, or maybe for the RTX 60-series. Because 50-series shipping right now means the hardware has been done for six months or more, and a lot of the key people are already working on the next, and next-next generation GPUs and software solutions!
 
What will really happen is that things shift and, in most situations, you'll have edges of maybe a few pixels where the correct data is missing. Those would get filled in by a fast in-painting algorithm, and they'll be visible for maybe 10–20 ms at most. Then a fully rendered frame would come along and you get the correct pixels everywhere.
Can you please try to confirm that with Nvidia? As I said, in-painting is a lot harder than run-of-the-mill DLSS and requires a much bigger model (think Stable Diffusion) that I doubt they could inference in the sort of time budgets we're talking about.
 
No, video codecs don't use optical flow. They employ motion vectors, but those vectors are optimized to minimize the residuals from macroblock motion compensation, which is a different problem than optical flow is meant to solve.
Yet it has been successfully used in practice for 7 years.
That's a plus, but it's not necessary simply to avoid motion sickness. Optical flow is irrelevant for avoiding motion sickness, as the only thing which tells you how much the wearer's head has moved since the frame started being rendered is the HMD's tracking system, which is focused on the wearer's pose within the real environment and not at all at what's happening in the virtual environment.
It is necessary, because both scene and head motion cause disocclusion, which requires 'inpainting' to fill in the disoccluded areas. But you can't identify the disoccluded areas from IMU data alone, because that would mean repositioning the camera and rerendering the scene for the new viewpoint, which is the entire exercise you are trying to avoid in the first place. Optical flow is used to ensure scene objects shift correctly based on their depth and prior motion, and this works identically whether that motion is from head motion (camera shift) or from object motion (scene shift). The technique is motion-origin agnostic by default: it would literally be more difficult (both conceptually and computationally) to try to compensate for head motion alone and not scene motion.
Then you can't do that in the HMD
It's been done in the HMD since 2019 (release of the Quest 1).
Simple VR implementations only track orientation, not position
Nobody has done so for the last half a decade, at least not without being laughed out of the room. Releasing a HMD today with orientation-only tracking would be seen about as favourably as releasing a monitor with only green subpixels.

I really do recommend actually following the earlier link; it explains how ASW functions and why it is implemented that way. This is well-known, old tech at this point, so it is good to see it implemented outside of VR (with or without NN assistance for the inpainting).
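To make the "motion-origin agnostic" point concrete, here's a toy sketch (my own illustration with made-up names, not the actual ASW code): the warp only ever sees one combined per-pixel motion field, so head-induced and object-induced motion go through exactly the same path, and both can disocclude pixels.

```python
# Toy sketch (not the real ASW implementation): reprojection consumes a
# single combined motion field, so it is agnostic to whether the motion
# came from head movement or from objects moving in the scene.
import numpy as np

def reproject(frame, head_flow, object_flow):
    """frame: (H, W, 3); head_flow, object_flow: (H, W, 2) in pixels."""
    flow = head_flow + object_flow        # origin of the motion is irrelevant
    h, w, _ = frame.shape
    out = np.zeros_like(frame)
    ys, xs = np.mgrid[0:h, 0:w]
    nx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    ny = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    out[ny, nx] = frame[ys, xs]
    return out   # untouched pixels are the disoccluded areas that need filling
```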
 
Can you please try to confirm that with Nvidia? As I said, in-painting is a lot harder than run-of-the-mill DLSS and requires a much bigger model (think Stable Diffusion) that I doubt they could inference in the sort of time budgets we're talking about.
Confirm that Reflex 2 uses in-painting? It absolutely does. More details to come soon, but if Nvidia can create a fast algorithm that works there, it could do it for other things as well.

And if you’re running at a low FPS and make a big camera shift, it will break down and look bad I’m sure. But it’s intended for games probably running well over 100 FPS. That’s my take anyway.
 