News: AMD plans for FSR4 to be fully AI-based — designed to improve quality and maximize power efficiency

edzieba

Distinguished
Jul 13, 2016
565
569
19,760
Ok so it’s AI based, will RDNA3’s AI-focused matrix extensions allow it to run there without discrete matrix accelerators?
Likely not without a performance impact comparable to the performance uplift from the reduced rendering resolution, making it pretty much moot for RDNA3 and below. Hence the focus on handheld gaming devices which contain AMD chips with dedicated matrix accelerator hardware. AMD have yet to announce consumer desktop cards with such hardware, so they will likely not promote FSR4 in that market until they have announced cards that could take advantage of it.
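 
To put numbers on that break-even argument, here is a rough Python sketch; every timing in it is hypothetical, made up purely to illustrate the trade-off, not a measured FSR4 or RDNA3 figure. It assumes render cost scales roughly with pixel count and compares running the upscaling network on shared shader ALUs versus on dedicated matrix hardware.

    # Back-of-envelope sketch of the break-even point. All numbers are
    # hypothetical, chosen only to illustrate the argument.
    native_frame_ms = 16.7               # assumed cost of a native 1080p frame
    render_ms = native_frame_ms * 0.25   # 540p has a quarter of the pixels

    # Hypothetical inference costs for the upscaling network:
    inference_shared_alus_ms = 13.0   # network competing with shading on the SIMDs
    inference_matrix_hw_ms = 1.5      # network on dedicated matrix units

    for label, infer_ms in [("shared ALUs", inference_shared_alus_ms),
                            ("matrix hardware", inference_matrix_hw_ms)]:
        total = render_ms + infer_ms
        verdict = "clear win" if total < native_frame_ms else "roughly moot"
        print(f"{label:15s}: {total:4.1f} ms vs {native_frame_ms} ms native -> {verdict}")

If inference on the shared ALUs costs nearly as much time as the lower rendering resolution saves, the upscale buys little, which is the "pretty much moot" case described above.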
 
Likely not without a performance impact comparable to the performance uplift from the reduced rendering resolution, making it pretty much moot for RDNA3 and below. Hence the focus on handheld gaming devices which contain AMD chips with dedicated matrix accelerator hardware. AMD have yet to announce consumer desktop cards with such hardware, so they will likely not promote FSR4 in that market until they have announced cards that could take advantage of it.
That hardware may very well be in RDNA 4, but we will have to wait and see.
 

mikeztm

Distinguished
Feb 15, 2012
10
5
18,515
Ok so it’s AI based, will RDNA3’s AI-focused matrix extensions allow it to run there without discrete matrix accelerators?
As you said, RDNA3 does not have any discrete matrix unit, so it will not be useful there.
RDNA3's WMMA extension is almost pointless since it does not have any dedicated execution unit support.
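 
For readers unfamiliar with the term, WMMA (Wave Matrix Multiply Accumulate) exposes small fixed-size tile multiply-accumulates, the building block that an AI upscaler's convolution and matmul layers decompose into. The toy numpy sketch below only shows the shape of that operation; it is not the actual RDNA3 intrinsic, and the 16x16x16 tile size and FP16-in/FP32-out types are illustrative assumptions.

    import numpy as np

    # Toy illustration of a WMMA-style tile operation: D = A @ B + C on a
    # small fixed-size tile, FP16 inputs accumulated into FP32. Plain numpy,
    # not the real intrinsic; it only shows the unit of work being discussed.
    M = N = K = 16
    A = np.random.rand(M, K).astype(np.float16)   # activation tile
    B = np.random.rand(K, N).astype(np.float16)   # weight tile
    C = np.zeros((M, N), dtype=np.float32)        # accumulator tile

    D = A.astype(np.float32) @ B.astype(np.float32) + C
    print(D.shape, D.dtype)   # (16, 16) float32

The point of contention is where that tile op executes: on RDNA3 it is issued to the same SIMD ALUs that run ordinary shader math, whereas dedicated matrix units would give it separate throughput.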
 
While Microsoft has made it simple to implement all of the upscalers, I still hope AMD has an open implementation similar to Intel's. I absolutely think it's necessary for AMD to move beyond what they have been doing, as there's a definite ceiling on how high the quality can get.
 

DS426

Upstanding
May 15, 2024
196
169
260
What is with everyone's obsession with thinking NPUs are needed for AI work? NPUs just make AI workloads more efficient on low-power, battery-life-sensitive devices like laptops.

Additionally, FSR4 has the potential for more real-world impact on a handheld, even as the desktop gaming world demands an FSR that's more competitive with DLSS. As stated in the article, let it prefer AMD hardware first, and if it's a different vendor, fall back to DP4a and FP16; I also agree that remaining hardware-vendor agnostic and mostly open source is important now and going forward.
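 
To make the fallback concrete: DP4a is a packed instruction that multiplies four int8 pairs and accumulates the sum into an int32, and a quantized upscaling network can run on it on most modern GPUs, which is how a vendor-agnostic path (XeSS's DP4a mode is the existing example) stays portable. The snippet below just emulates that arithmetic in numpy; it is an illustration, not any vendor's actual code path.

    import numpy as np

    def dp4a(a4, b4, acc):
        """Emulate a DP4a step: acc += dot(a4, b4) over four signed int8 lanes."""
        return np.int32(acc + np.dot(a4.astype(np.int32), b4.astype(np.int32)))

    a = np.array([12, -7, 3, 120], dtype=np.int8)   # quantized activations
    w = np.array([-5, 9, 33, -2], dtype=np.int8)    # quantized weights
    print(dp4a(a, w, np.int32(0)))                  # -264, one int32 accumulation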
 

usertests

Distinguished
Mar 8, 2013
855
774
19,760
What is with everyone's obsession with thinking NPUs are needed for AI work? NPUs just make AI workloads more efficient on low-power, battery-life-sensitive devices like laptops.

Additionally, FSR4 has the potential for more real-world impact on a handheld, even as the desktop gaming world demands an FSR that's more competitive with DLSS. As stated in the article, let it prefer AMD hardware first, and if it's a different vendor, fall back to DP4a and FP16; I also agree that remaining hardware-vendor agnostic and mostly open source is important now and going forward.
If their focus is on laptops and handhelds, the NPU is an untapped resource that would let them leave GPU resources to games, especially the XDNA2 NPU, which is relatively powerful. But we don't know how this is going to shake out. The only details we actually know are that FSR4 will use AI, and that they are focusing on mobile, efficiency, and battery life.

DLSS does a lot better than FSR at low input resolutions, so an AI-based FSR4 focusing on mobile makes a lot of sense. Crappy 540p input + AI = 1080p magic. Rendering at 540p lowers power consumption of the APU, leading to longer battery life.
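 
The arithmetic behind the "540p input + AI = 1080p" idea is straightforward; the sketch below is illustrative and assumes shading cost scales roughly with pixel count, which is only approximately true in real games.

    lo = 960 * 540      # 518,400 pixels rendered
    hi = 1920 * 1080    # 2,073,600 pixels presented
    print(hi / lo)      # 4.0 -> the GPU shades a quarter of the pixels

    # If shading dominates the frame, the render portion of APU power drops by
    # a similar factor; the open question is how much the AI upscale pass (on
    # the NPU or on the GPU) claws back.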
 

deksman

Distinguished
Aug 29, 2011
234
19
18,685
What a waste of hardware and dev-time, all just to slap a crummy "AI" label on it, with negligible real-world benefits.

I wouldn't be so sure it's a waste of hw and dev-time.
For one thing, NV has been using AI for DLSS for a while now. AMD is simply following that route.
To be fair, they've done a remarkable job with what's available, but going one step further would likely help clean up the small quality gap that some people are complaining about (personally, I agree this is mostly negligible and not very noticeable when gaming, especially when the implementation is well made).

But this will allow existing AI hw to actually be used.
 

usertests

Distinguished
Mar 8, 2013
855
774
19,760
>Will that be enough, and will FSR4 usher in a truly competitive alternative to DLSS and XeSS that finally leverages AI?

I don't see framegen's relative performance between GPU vendors as an issue. It's an iterative process; they will all get better. The more important question is if or when frame-gen will be functional enough to be entirely subsumed into GPUs proper. My gauge is it'll be "when," not "if."

Currently, framegen's main drawback is lag, which is a showstopper for fast-twitch gaming. With AI, one can conceivably mitigate lag, as instead of waiting for the "next" frame before interpolating the extra frame, AI can obviate the wait with some sort of predictive "look-ahead."

I have no knowledge or insight into framegen tech, but that would be the logical goal. Just as upscaling is now considered (by present company) to be an acceptable means to get higher fidelity for normal gaming, albeit not for benchmarking purposes, framegen will not be too far behind. The yardstick will be lag (or the lack of it).
You keep on saying framegen, but FSR4 will probably be for upscaling, separate from frame gen.

FSR 3.0 was framegen, otherwise referred to as Fluid Motion Frames (AFMF).
 

baboma

Notable
Nov 3, 2022
282
337
1,070
>You keep on saying framegen, but FSR4 will probably be for upscaling, separate from frame gen.

"So now we're going AI-based frame generation, frame interpolation, and the idea is increased efficiency to maximize battery life."

If there's a distinction, it's semantics. Per Huynh's above statement, AMD will use AI for framegen--as DLSS/XeSS already do--regardless of whether framegen is inside FSR4 or outside. I'm commenting on framegen as a general tech, not just AMD's implementation. Whatever label it has is irrelevant.

Jarred's evaluation is that framegen is currently good as a "smoothing" feature. I doubt that's the end goal. Carried to its logical conclusion, framegen will be an integral part of the GPU's performance. That's what the iterations are for: not just to improve efficiency, but to improve functionality. At a certain lag threshold, framegen will be usable for all games, including fast-twitch. Upscaling is already there.
 
>You keep on saying framegen, but FSR4 will probably be for upscaling, separate from frame gen.

"So now we're going AI-based frame generation, frame interpolation, and the idea is increased efficiency to maximize battery life."

If there's a distinction, it's semantics. Per Huynh's above statement, AMD will use AI for framegen--as DLSS/XeSS already do--regardless of whether framegen is inside FSR4 or outside. I'm commenting on framegen as a general tech, not just AMD's implementation. Whatever label it has is irrelevant.

Jarred's evaluation is that framegen is currently good as a "smoothing" feature. I doubt that's the end goal. Carried to its logical conclusion, framegen will be an integral part of the GPU's performance. That's what the iterations are for: not just to improve efficiency, but to improve functionality. At a certain lag threshold, framegen will be usable for all games, including fast-twitch. Upscaling is already there.
The thing is, framegen and frame interpolation are, to me, the same thing. You're interpolating a frame between two rendered frames. Intel has hinted at investigations into frame prediction or whatever you want to call it — generating the "next frame" based on the current frame plus AI. That would conceivably allow for increased smoothness without the additional lag.

Ultimately, as long as frame generation/prediction stuff isn't using any additional user input, it will still just end up being frame smoothing in my book. I'd like to see something where you take the last rendered frame, sample user input, and predictively generate a next frame from that. Or at the very least something like asynchronous time warp coupled to AI to get user input right before the generation of a new frame. Basically, we need something that responds to user input somehow rather than just interpolating between two rendered frames for this to feel better and not just look better.
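 
A hedged way to see why interpolation adds lag while prediction would not: interpolation cannot present frame N until frame N+1 has rendered, so presentation is held back by roughly one render interval, whereas an extrapolated frame is built from frame N, motion data, and the latest sampled input and waits on nothing. The timings below are hypothetical, assuming a steady 30 fps render rate.

    render_interval_ms = 1000 / 30   # ~33.3 ms between rendered frames

    # Interpolation needs both frame N and frame N+1 before the in-between
    # frame (and frame N itself) can be shown.
    interpolation_added_latency_ms = render_interval_ms

    # Prediction/extrapolation (the asynchronous-timewarp-style idea above)
    # builds the generated frame from frame N plus fresh input, so it adds
    # no wait, but image quality now depends entirely on the prediction.
    extrapolation_added_latency_ms = 0.0

    print(f"interpolation adds ~{interpolation_added_latency_ms:.1f} ms")
    print(f"extrapolation adds ~{extrapolation_added_latency_ms:.1f} ms")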
 
Intel has hinted at investigations into frame prediction or whatever you want to call it — generating the "next frame" based on the current frame plus AI. That would conceivably allow for increased smoothness without the additional lag.
It sounds like there are a lot of hurdles to getting this sort of technology going. I'm not sure how viable it will end up being, but it's good the investigation is happening.

I do think frame generation is a very valuable technology that has been marketed all wrong. With 4K/240 displays appearing, along with high-refresh 1440p and ultrawide 1440p, it's starting to become necessary. Even with 4090 levels of performance, those frame rates aren't realistic in most games outside of e-sports titles. Instead of marketing pushing the very real advantages, we keep getting nonsense about high fps on underpowered cards.
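 
For context on the frame-time budgets involved (simple arithmetic; the base render rate below is a hypothetical number, not a benchmark):

    for hz in (60, 120, 144, 240):
        print(f"{hz:3d} Hz -> {1000 / hz:5.2f} ms per frame")

    base_fps = 110                 # hypothetical native 4K render rate
    print(f"{base_fps} fps rendered -> ~{base_fps * 2} fps presented with 2x framegen")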
 

edzieba

Distinguished
Jul 13, 2016
565
569
19,760
Intel has hinted at investigations into frame prediction or whatever you want to call it — generating the "next frame" based on the current frame plus AI. That would conceivably allow for increased smoothness without the additional lag.
That's been in active use for many years in the VR space: a combination of depth-buffer-based occlusion prediction and optical-flow-based motion prediction. Oculus pioneered this as 'Asynchronous Spacewarp', but others have replicated it in their own implementations (e.g. Valve).
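 
A toy sketch of the optical-flow half of that idea: warp the last rendered frame along per-pixel motion vectors to guess the next one. Real spacewarp-style implementations also use the depth buffer, fill disocclusion holes, and fold in the freshest pose/input; the numpy toy below skips all of that, and the backward-warp formulation and 8x8 example are purely illustrative.

    import numpy as np

    def predict_next(frame, flow):
        """Backward-warp: each output pixel samples the previous frame at the
        spot it is predicted to have moved from.
        frame: (H, W) grayscale; flow: (H, W, 2) per-pixel (dy, dx) motion."""
        h, w = frame.shape
        ys, xs = np.mgrid[0:h, 0:w]
        src_y = np.clip(ys - np.round(flow[..., 0]).astype(int), 0, h - 1)
        src_x = np.clip(xs - np.round(flow[..., 1]).astype(int), 0, w - 1)
        return frame[src_y, src_x]

    frame = np.zeros((8, 8)); frame[3, 2] = 1.0    # one bright pixel
    flow = np.zeros((8, 8, 2)); flow[..., 1] = 2   # whole scene pans 2 px right
    print(np.argwhere(predict_next(frame, flow) == 1.0))   # -> [[3 4]]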
 

Pierce2623

Prominent
Dec 3, 2023
403
291
560
As you said, RDNA3 does not have any discrete matrix unit, so it will not be useful there.
RDNA3's WMMA extension is almost pointless since it does not have any dedicated execution unit support.
Well, the WMMA extensions seemingly work well enough in standard generative AI workloads. What do you mean they have no execution unit support? They allow the standard shaders to run matrix math quickly enough that they work well in LLMs. It's not exactly a fair comparison, but I can now get my 7800 XT to outperform one of the 3060s in my little "AI experimentation box" with two 3060s. It's only in a couple of cases using models optimized to perform on AMD, but I think it shows the matrix throughput is there or thereabouts.
 

Pierce2623

Prominent
Dec 3, 2023
403
291
560
What is with everyone's obsession with thinking NPUs are needed for AI work? NPUs just make AI workloads more efficient on low-power, battery-life-sensitive devices like laptops.

Additionally, FSR4 has the potential for more real-world impact on a handheld, even as the desktop gaming world demands an FSR that's more competitive with DLSS. As stated in the article, let it prefer AMD hardware first, and if it's a different vendor, fall back to DP4a and FP16; I also agree that remaining hardware-vendor agnostic and mostly open source is important now and going forward.
Nobody thinks AI REQUIRES matrix acceleration. It just makes it vastly more performant. My question was specifically whether AMD would vendor-lock the “AI-based” upscaling to RDNA4, when RDNA3’s matrix extensions mean its matrix throughput is actually pretty decent.
 

tstager

Distinguished
Aug 27, 2013
10
10
18,515
Ok so it’s AI based, will RDNA3’s AI-focused matrix extensions allow it to run there without discrete matrix accelerators?
I think it will, because RDNA3's AI hardware hasn't been used for anything yet. I think they were planning for FSR4 when they added these accelerators to RDNA3.
 

tstager

Distinguished
Aug 27, 2013
10
10
18,515
Well, the WMMA extensions seemingly work well enough in standard generative AI workloads. What do you mean they have no execution unit support? They allow the standard shaders to run matrix math quickly enough that they work well in LLMs. It's not exactly a fair comparison, but I can now get my 7800 XT to outperform one of the 3060s in my little "AI experimentation box" with two 3060s. It's only in a couple of cases using models optimized to perform on AMD, but I think it shows the matrix throughput is there or thereabouts.
The 7900 XTX has 192 AI accelerators: two per CU across its 96 CUs (96 × 2 = 192). 248 TOPS!
 
