Video Encoding Tested: AMD GPUs Still Lag Behind Nvidia, Intel


PlaneInTheSky

Commendable
BANNED
Oct 3, 2022
This has been a really useful tool for me.
It allows even my older PCs to encode at a decent speed with Handbrake.

Yep, NVENC kicks ass on old systems. It works for OBS and Handbrake as long as you turn it on in settings.

(It works blazing fast for Adobe Rush too; I think that uses CUDA.)

The CPU doesn't matter. I even used OBS at 1080p on an old AMD X3. That's a CPU from 2009, so it's 14 years old.

As long as a GPU with NVENC is used, it works fine.

The only thing the PC needs in order to use NVENC is a PCIe slot. The first motherboards with PCIe arrived in 2004, so you should be able to turn 18-year-old systems into streaming/encoding monsters with an NVENC GPU.
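(For reference, a minimal sketch of what offloading to NVENC looks like in ffmpeg, assuming a build with NVENC support; the filenames and 8M bitrate are just placeholders:)

# H.264 encode runs on the GPU's NVENC block; the CPU mostly just feeds frames
ffmpeg -i input.mp4 -c:v h264_nvenc -b:v 8M -c:a copy output.mp4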
 
Last edited:

bit_user

Polypheme
Ambassador
It's done now. I wrote lots of VBA stuff for these charts already; I just never explicitly set the line colors. I've cleaned things up as well, so hopefully the various lines (where they're not fully obscured) are a bit easier to pick out. There's lots of AMD overlapping AMD and Nvidia overlapping Nvidia, though. I could potentially drop some of those lines, but the data's all there now so...
Looks better!

For the cases where several lines completely overlap, one thing that would help is call-outs. Just a suggestion for future reference.
 

Deleted member 431422

Guest
Can you please make the graphs readable? You used 5 shades of blue!!!
I'd switch to box graphs for the next one. Line graphs are hard to read and not very informative in this case.
I don't understand the article either. What is it comparing, old GPUs to the latest? I see RDNA 3 competing with the RTX 40-series. The old Vega and GTX 1080 are only a reference; I wouldn't be considering them even if I did encode video.
 

Deleted member 431422

Guest
I can see two reasons for comparing old with new:
  1. Help someone decide whether to upgrade.
  2. To show progress (or lack thereof) that a given vendor has made on quality or performance.

Regarding point #2, there are a couple interesting and rather surprising regressions.
You mean the 1080 vs. the 2080? I was surprised to see the old 1080 compete with the later cards.
 
You mean the 1080 vs. the 2080? I was surprised to see the old 1080 compete with the later cards.
How about these two observations, which I did hint at in the article:

-------------------------
1: AMD's H.264 quality basically didn't improve from at least Polaris (RX 400-series) through RDNA, and RDNA 2/3 only provide a minor bump in quality at the lowest bitrates. Performance did improve, however, from ~100 fps with the RX 590 to ~425 fps with RDNA 3. Also interesting was that performance was faster with the RX 5700 XT than with the 6900 XT. Nvidia's H.264 quality has likewise remained virtually unchanged since at least Pascal at 1080p, but 4K encoding quality showed a clear bump going from Pascal to Turing (Turing through Ada have been pretty static, though).

Performance has also only improved from about 305 fps at 1080p with Pascal to 490 fps with Ada, or from ~83fps at 4K with Pascal to ~134fps with Ada. However, both Nvidia's base performance and quality are significantly better than AMD's to start with. There's literally no good reason why, in over six years, AMD hasn't been able to close the quality gap in H.264 encoding at all. It's rather sad that the most popular video codec of the past 15 years or so (it was first published in 2003) has received so little attention from AMD.

And it's not like Nvidia is alone in doing better with hardware H.264 encoding. Look at Intel's QuickSync. Sure, the UHD 770 is behind Nvidia by a few points, but it's not the 10~20 point delta that AMD sees. I'm pretty sure the UHD 770's QuickSync isn't significantly changed compared to even Skylake's Gen9 HD 530, or maybe Kaby Lake's Gen9.5 HD 630. Actually, I think HEVC got most of the updates after the Ivy Bridge (3rd Gen Intel) generation, and H.264 support was mostly static, other than the quality bump with Arc.

-------------------------
2: HEVC quality and speed of encoding sort of peaked in 2016~2017 and then received a bit less emphasis going forward. AMD's Polaris and Vega generation GPUs had slightly better HEVC quality compared to RDNA 1/2, though RDNA 3 has mostly recovered the lost quality. Speed got a big boost from Vega to RDNA, though, and then RDNA 2 actually got slower. Somewhat similarly, the GTX 1650 (Pascal encoder with Turing non-RTX GPU cores) had higher performance than the GTX 1080 Ti as well as the RTX 2080 Ti and RTX 3090. The RTX 4090 did reclaim the performance crown, however.

Again, this suggests that after the hype built for HEVC (originally released to the public in 2013), there was excitement over the next 3~4 years that it would replace H.264 as the codec of choice, but the royalty fees ended up killing that. So now companies are putting more effort into AV1 and hoping it can truly put H.264 in the rearview mirror.

-------------------------
The reason for using line charts on quality is that increasing bitrate correlates with increasing quality. Lines give you a clear slope; bar charts don't. But if you really want the raw numbers, it's all in the bottom tables. I pointed out in the text that there was a lot of overlap in quality on the AMD vs. AMD and Nvidia vs. Nvidia comparisons, so sometimes lines are obscured (completely, in the case of AMD's RDNA 2 vs. RDNA 3 with H.264, as they have exactly the same quality results). Nvidia's 1080 Ti and 1650 are also identical, for both H.264 and HEVC, while the 2080 Ti and 3090 are only perfectly identical on their HEVC results.

I did post a video of this yesterday, where I skipped including the GTX 1080 Ti quality lines and the RTX 2080 Ti quality lines for precisely this reason. I figured our readers would be able to grok the results, or refer to the numerical table for the full explanation, since many of them prefer more data rather than less.

View: https://www.youtube.com/watch?v=elZH8iXGTPk
 
Looks better!

For the cases where several lines completely overlap, one thing that would help is call-outs. Just a suggestion for future reference.
I added an edit back when I redid all the VMAF calculations, so if you read the original it may have been missed:

"There's plenty of overlap in the VMAF scores, so if you can't see one of the lines, check the tables lower down that show the raw numbers — or just remember that GPUs from adjacent generations often have nearly the same quality. If you don't want to try to parse the full table below, AMD's RDNA 2 and RDNA 3 generations have identical results for H.264 encoding quality, though performance differs quite a bit. Nvidia's GTX 1080 Ti and GTX 1650 also feature identical quality on both H.264 and HEVC, while the RTX 2080 Ti and RTX 3090 have identical quality for HEVC."
 
  • Like
Reactions: bit_user

bit_user

Polypheme
Ambassador
"There's plenty of overlap in the VMAF scores, so if you can't see one of the lines, check the tables lower down that show the raw numbers
I get that, and I realize that automatic placement of the call-outs might not be great. Still, it's worth keeping in mind for the future, which is the sense in which I suggested it.

Thanks, again, for being so diligent and responsive.
 
  • Like
Reactions: JarredWaltonGPU
There's literally no good reason why, in over six years, AMD hasn't been able to close the quality gap in H.264 encoding at all.
I'll hand pick that one and ask: is that really the case?

I thought neither AMD nor nVidia actually "handcraft" their encoding/decoding hardware, and instead source it from 3rd parties which, in turn, have their own IP and the usual (patented) building blocks for them.

I'm not claiming AMD can't do it because of patents (at the end of it), but I'm curious whether that could be the reason why. Or maybe developing a better encoding engine is just not among their priorities, and "good enough" is OK for the time being while they try to catch up (and compete) in all the other relevant areas.

I mean: software encoding is much, much better when you want quality and a small file size, and time is not that much of a deal. Most professionals use software encoding for their final products (think Pixar), so pro cards having dedicated engines means squat to them, and above a certain "score" (no idea how it's calculated) users just won't notice a difference, I'd imagine? Especially at higher bitrates.

I would even go as far as saying most people on the receiving end of a YouTube/Twitch/etc stream will have their internet crap the quality before the encoding quality becomes a factor.

I know you approached this from the "backup" perspective, which is valid, but it's not really one or the other, is it? More like a happy medium?

Regards.
 

bit_user

Polypheme
Ambassador
I thought neither AMD nor nVidia actually "handcraft" their encoding/decoding hardware, and instead source it from 3rd parties which, in turn, have their own IP and the usual (patented) building blocks for them.
I'll bet Nvidia does it in-house. I know AMD used Tensilica DSP cores for audio, at one point, but I didn't happen to hear any such claims about video.

Funny enough, ATI used to be one of the leading companies at video playback, in the late 90's and early 2000's. If you wanted full-screen video playback, they were one of the go-to brands. And their All-In-Wonder products had integrated tuning and recording.

Even if their codec IP is licensed, the onus is on AMD to either switch vendors or also license the source code and improve it. I'll bet they just kept putting their resources into supporting & optimizing newer codecs, figuring that legacy codecs mean less & less to people. Probably, they predicted use of H.264 would've mostly fallen off, by now. Too bad about the licensing woes of H.265.

Most professionals use software encoding for their final products (think Pixar),
Don't confuse realtime streaming use cases with content authoring. The latter also uses 2-pass, for additional quality.

I would even go as far as saying most people on the receiving end of a YouTube/Twitch/etc stream will have their internet crap the quality before the encoding quality becomes a factor.
The screen caps Jarred included are pretty illuminating for me. I can easily spot the differences.

Plus, for people doing livestreaming, their uploads tend to be capped much more heavily than downloads. So, that means they need to focus on encoding the best quality for the allowed bitrate.
 
  • Like
Reactions: JarredWaltonGPU
Don't confuse realtime streaming use cases with content authoring. The latter also uses 2-pass, for additional quality.
Actually, ffmpeg says that doing CRF generally provides the same quality as 2-pass encodes now and is the "preferred method," with the only real benefit of 2-pass being if you're hyper-focused on a specific file size and bitrate. I actually verified this with some testing, but the catch of course is that with CRF you're never sure of the exact bitrate of the output file that you'll get until you do the encode. But for people that do 2-pass encoding for quality purposes, setting CRF to 17 or 18 will generally yield a VMAF score in the 96+ range (virtually indistinguishable from the source). It will probably also end up being something like 25Mbps for 4K H.264, whereas with a 2-pass encode you might be able to get close to the same quality score while using a 10-20% lower bitrate.
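(In ffmpeg/libx264 terms, the two approaches look roughly like this; a generic sketch with placeholder filenames and a placeholder 25M target, not the exact commands used for the article's testing:)

# CRF: single pass, quality-targeted; the output bitrate lands wherever it lands
ffmpeg -i input.mp4 -c:v libx264 -preset slow -crf 18 -c:a copy out_crf.mp4

# 2-pass: bitrate-targeted; two runs over the same source, first pass discarded
ffmpeg -y -i input.mp4 -c:v libx264 -preset slow -b:v 25M -pass 1 -an -f null -
ffmpeg -i input.mp4 -c:v libx264 -preset slow -b:v 25M -pass 2 -c:a copy out_2pass.mp4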

Quick aside: Targeting a specific bitrate doesn't always do exactly what you'd expect. For example, with the same 12Mbps bitrate target and H.264 on the Borderlands 3 4K video, I got actual video bitrates ranging from 11485kbps (CPU libx264) to 12461kbps (RTX 4090). AMD and Intel GPU encoding was much closer to the target at 11949kbps~11998kbps. It does vary a bit by video source, but Nvidia consistently "cheats" and uses 3~8% more bitrate than the target. They're not alone in missing the target, though. Intel's 4K 8Mbps H.264 Arc results were 9~15% higher than the 8000kbps target. AMD's 4K 8Mbps H.264 results were only 1~5% above the target. Interestingly, Arc with HEVC actually came in slightly below the target on 4K 8Mbps, so it's not always just a case of more bitrate being the explanation for the higher scores.
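(If you want to check what bitrate an encoder actually produced, a quick ffprobe query does it; the filename is a placeholder:)

# per-stream video bitrate; if it reports N/A for a given container, query format=bit_rate instead
ffprobe -v error -select_streams v:0 -show_entries stream=bit_rate -of default=noprint_wrappers=1:nokey=1 encode.mp4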

I would even go as far as saying most people on the receiving end of a YouTube/Twitch/etc stream will have their internet crap the quality before the encoding quality becomes a factor.
I would wager heavily this isn't the case for 99% of users. Download speeds are normally 10X or more higher than upload speeds. I have 1000Mbps down and only 20Mbps up, for example (50X higher download), and I used to have 450Mbps down and 12Mbps up when I was forced to use Xfinity at my previous home (37.5X higher download). Even a pretty weak and inexpensive internet connection these days will be something like 50Mbps down and 7Mbps up (7X higher downstream).

Unless you're sharing the connection with a lot of other people, 50Mbps is plenty for all but the most demanding of video streams (4K H.264 still would typically be in the <40Mbps range, and YouTube would convert a popular 4K video to HEVC or AV1 and thus only need in the ~25Mbps range for great quality). So it's basically always the upstream bandwidth for a livestream that limits the bitrate. Now, there's talk in my town of doing a municipal ISP with fiber and 1Gbps up/down, and I will be all over that stuff if it's in the $100 or less range (which it should be, judging by neighboring towns).
 
  • Like
Reactions: bit_user
I'll bet Nvidia does it in-house. I know AMD used Tensilica DSP cores for audio, at one point, but I didn't happen to hear any such claims about video.
That's why I brought it up... I know AMD outsourced, but I don't know to whom and what IP blocks they use compared to nVidia.

Don't confuse realtime streaming use cases with content authoring. The latter also uses 2-pass, for additional quality.
I am not. I'm very well versed in encoding, which is also why I mentioned that this is, strictly speaking, neither use case. It's obviously needed for the comparison, but there are things outside the scope of this testing that would still be interesting to investigate: encoding latency, real-time bandwidth, and the performance drops/dips of each solution.

Not easy to capture, but it would give a more complete picture of the whole "end to end" (being generous) live streaming journey and a better idea of how each card/solution behaves.

The screen caps Jarred included are pretty illuminating for me. I can easily spot the differences.

Plus, for people doing livestreaming, their uploads tend to be capped much more heavily than downloads. So, that means they need to focus on encoding the best quality for the allowed bitrate.
Yes on the image quality, with one nuance: this is motion. AV1's algorithm, overall, handles motion better than HEVC/H.265, but picture-by-picture you can see they're not far off in quality. That's what I've noticed myself, and I have the videos to prove it.

And yes, I would agree on the upload speed thing. That's why I always target 6 Mbps overall. I don't know how Discord manages the encoding part, but I always see it crap out on the slightest network blip on either side. I also notice that on Twitch and YT, though YT is a fair bit more consistent thanks to their buffering algorithm for offline content. I rarely see things "live" on YT, so I can't remember how it behaves there. An interesting tangent, for sure.

Regards.
 
Actually, ffmpeg says that doing CRF generally provides the same quality as 2-pass encodes now and is the "preferred method," with the only real benefit of 2-pass being if you're hyper-focused on a specific file size and bitrate. I actually verified this with some testing, but the catch of course is that with CRF you're never sure of the exact bitrate of the output file that you'll get until you do the encode. But for people that do 2-pass encoding for quality purposes, setting CRF to 17 or 18 will generally yield a VMAF score in the 96+ range (virtually indistinguishable from the source). It will probably also end up being something like 25Mbps for 4K H.264, whereas with a 2-pass encode you might be able to get close to the same quality score while using a 10-20% lower bitrate.
That is interesting to know. For my "quality" encodes I always run tests on the content I want to encode and decide based on my own subjective analysis of the results vs. the source, but I always do 2-pass. That's the way I've always done it and what has given me the best results, so it's interesting to know there's not much difference nowadays with quality-based compression. Have you tried quantized (constant QP) as well, by any chance?
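(By "quantized" I mean constant-QP encoding, which in ffmpeg/libx264 would look something like the line below; QP 18 is just an illustrative value, not a recommendation:)

# constant quantizer: every frame gets a fixed QP, so quality stays flat but bitrate swings with content
ffmpeg -i input.mp4 -c:v libx264 -preset slow -qp 18 -c:a copy out_cqp.mp4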

Quick aside: Targeting a specific bitrate doesn't always do exactly what you'd expect. For example, with the same 12Mbps bitrate target and H.264 on the Borderlands 3 4K video, I got actual video bitrates ranging from 11485kbps (CPU libx264) to 12461kbps (RTX 4090). AMD and Intel GPU encoding was much closer to the target at 11949kbps~11998kbps. It does vary a bit by video source, but Nvidia consistently "cheats" and uses 3~8% more bitrate than the target. They're not alone in missing the target, though. Intel's 4K 8Mbps H.264 Arc results were 9~15% higher than the 8000kbps target. AMD's 4K 8Mbps H.264 results were only 1~5% above the target. Interestingly, Arc with HEVC actually came in slightly below the target on 4K 8Mbps, so it's not always just a case of more bitrate being the explanation for the higher scores.
Haha, careful there, or you'll get the angry mob behind you in no time xD

But I didn't know that. In fact, now that you mention it, that's probably why it's never been a "thing" for me, since I've never noticed any deviations in my own tests with regard to the video or audio bandwidth with AMD or CPU-based encoding. It's an interesting point, so now I wonder how things are at a strictly 1:1 bitrate.

I would wager heavily this isn't the case for 99% of users. Download speeds are normally 10X or more higher than upload speeds. I have 1000Mbps down and only 20Mbps up, for example (50X higher download), and I used to have 450Mbps down and 12Mbps up when I was forced to use Xfinity at my previous home (37.5X higher download). Even a pretty weak and inexpensive internet connection these days will be something like 50Mbps down and 7Mbps up (7X higher downstream).

Unless you're sharing the connection with a lot of other people, 50Mbps is plenty for all but the most demanding of video streams (4K H.264 still would typically be in the <40Mbps range, and YouTube would convert a popular 4K video to HEVC or AV1 and thus only need in the ~25Mbps range for great quality). So it's basically always the upstream bandwidth for a livestream that limits the bitrate. Now, there's talk in my town of doing a municipal ISP with fiber and 1Gbps up/down, and I will be all over that stuff if it's in the $100 or less range (which it should be, judging by neighboring towns).
Yeah, I guess I should've clarified that no matter how good your internet is, if you're using a crappy router with a Wi-Fi connection on "poor" (more common than we all think), you're bound to see the quality of any stream drop to the ground. Even plain voice chat drops: calls on Teams/Discord/Slack/etc.

I find it surprising that a lot of "tech"-oriented individuals still pay for such expensive connectivity outside the door, but don't put enough quality into their internal network. I've found a lot of people buy the most expensive routers they can get, but never configure them correctly.

Also, "3rd world countries internet" is still a thing... Ugh.

Regards.
 

Deleted member 431422

Guest
How about these two observations, which I did hint at in the article:

-------------------------
1: AMD's H.264 quality basically didn't improve from at least Polaris (RX 400-series) through RDNA, and RDNA 2/3 only provide a minor bump in quality at the lowest bitrates. Performance did improve, however, from ~100 fps with the RX 590 to ~425 fps with RDNA 3. Also interesting was that performance was faster with the RX 5700 XT than with the 6900 XT. Nvidia's H.264 quality has likewise remained virtually unchanged since at least Pascal at 1080p, but 4K encoding quality showed a clear bump going from Pascal to Turing (Turing through Ada have been pretty static, though).

Performance has also only improved from about 305 fps at 1080p with Pascal to 490 fps with Ada, or from ~83fps at 4K with Pascal to ~134fps with Ada. However, both Nvidia's base performance and quality are significantly better than AMD's to start with. There's literally no good reason why, in over six years, AMD hasn't been able to close the quality gap in H.264 encoding at all. It's rather sad that the most popular video codec of the past 15 years or so (it was first published in 2003) has received so little attention from AMD.

And it's not like Nvidia is alone in doing better with hardware H.264 encoding. Look at Intel's QuickSync. Sure, the UHD 770 is behind Nvidia by a few points, but it's not the 10~20 point delta that AMD sees. I'm pretty sure the UHD 770's QuickSync isn't significantly changed compared to even Skylake's Gen9 HD 530, or maybe Kaby Lake's Gen9.5 HD 630. Actually, I think HEVC got most of the updates after the Ivy Bridge (3rd Gen Intel) generation, and H.264 support was mostly static, other than the quality bump with Arc.

-------------------------
2: HEVC quality and speed of encoding sort of peaked in 2016~2017 and then received a bit less emphasis going forward. AMD's Polaris and Vega generation GPUs had slightly better HEVC quality compared to RDNA 1/2, though RDNA 3 has mostly recovered the lost quality. Speed got a big boost from Vega to RDNA, though, and then RDNA 2 actually got slower. Somewhat similarly, the GTX 1650 (Pascal encoder with Turing non-RTX GPU cores) had higher performance than the GTX 1080 Ti as well as the RTX 2080 Ti and RTX 3090. The RTX 4090 did reclaim the performance crown, however.

Again, this suggests that after the hype built for HEVC (originally released to the public in 2013), there was excitement over the next 3~4 years that it would replace H.264 as the codec of choice, but the royalty fees ended up killing that. So now companies are putting more effort into AV1 and hoping it can truly put H.264 in the rearview mirror.

-------------------------
The reason for using line charts on quality is that increasing bitrate correlates with increasing quality. Lines give you a clear slope; bar charts don't. But if you really want the raw numbers, it's all in the bottom tables. I pointed out in the text that there was a lot of overlap in quality on the AMD vs. AMD and Nvidia vs. Nvidia comparisons, so sometimes lines are obscured (completely, in the case of AMD's RDNA 2 vs. RDNA 3 with H.264, as they have exactly the same quality results). Nvidia's 1080 Ti and 1650 are also identical, for both H.264 and HEVC, while the 2080 Ti and 3090 are only perfectly identical on their HEVC results.

I did post a video of this yesterday, where I skipped including the GTX 1080 Ti quality lines and the RTX 2080 Ti quality lines for precisely this reason. I figured our readers would be able to grok the results, or refer to the numerical table for the full explanation, since many of them prefer more data rather than less.

View: https://www.youtube.com/watch?v=elZH8iXGTPk
I revisited all the charts and see where you're coming from. It's sad to see so little attention to H.264 from AMD, especially when you support them to keep Nvidia in check so we don't end up with a "tick-tock Intel" situation in GPUs. Correct me if I'm wrong, but even Google, one of the main creators of AV1, hasn't fully implemented it on YouTube, so AMD's gamble on the codec seems rather misplaced at the moment.
 

bit_user

Polypheme
Ambassador
It's sad to see so little attention to H.264 from AMD
For them, it would probably mean taking resources away from somewhere else. So... where should they have cut back, in order to redo their H.264 encoder? Plus, there's perhaps a theoretical possibility that they release upgraded firmware/drivers that improve quality.

AV1, ... so AMD's gamble on the codec seems rather misplaced at the moment.
Hardware needs to anticipate what people are going to want to do, over the two years or so following its launch. And they have to make those decisions about what to emphasize, probably at least 2 years before launch.

For AMD, there's the added aspect of their iGPUs and consoles, both of which have a longer lifespan than the average mid/upper-end dGPU. So, you'd expect that to incentivize them to invest in new codecs rather early, since they'll want to have codec implementations tested, debugged, and ready to go for the next round of consoles and APUs.
 
Last edited:

armirol

Distinguished
Feb 22, 2012
Looking at the charts, I assumed the "CPU" was still using its Quicksync Video hardware encoder, but I see Jarred says it's using a pure software path. Very impressive!

Would've been cool to use a Ryzen 7950X and remove all doubt about whether any hardware assist was at play, but I trust Jarred to know what he's doing.
The RX 7900 XTX works a lot better with AMD CPUs and can use SAM (about +10-15% performance vs. Intel). I would agree that a CPU comparison in the article would give a better overall picture... We also see a slight reduction in performance with AMD CPUs and RTX 4000 GPUs. It seems the optimum combinations are Ryzen 7000/RX 7000 or Intel/RTX 4000; other mixes don't perform as well.
 
I thought SAM and ReBAR were basically equivalent and worked with any modern GPU. You're saying not?
I haven't done extensive testing, but AMD has indicated in the past that SAM (which is just AMD's platform/GPU specific implementation of ReBAR) works better with AMD CPU + AMD GPU than other options. So, according to AMD:

AMD CPU + AMD GPU + SAM = best
Intel CPU + AMD GPU + ReBAR ~= AMD CPU + Nvidia GPU + SAM = "okay"
Intel CPU + Nvidia GPU = worst

And by "best/okay/worst" it's referring to the potential performance gains from SAM/ReBAR. But gains from SAM/ReBAR are also pretty game specific, so one or two games might get 20% faster, a lot are going to be 5~10% faster, and many others will be in the 0~5% range — and a few select games may even see negative scaling.
 
  • Like
Reactions: klavs and armirol
Jan 3, 2024
Hi, I was doing some research about video encoding quality with HW acceleration and saw your article. Firstly, nice job. One interesting thing I found is that, before Arc and Meteor Lake, Intel implemented two kinds of video hardware encode engine: 1. Es = hardware encode (PAK) + shader (media kernel + VME); 2. E = hardware encode via the low-power VDEnc.
Es is actually Media Engine + GPU shader. E is the full hardware encoder via Media Engine only.
This is documented in the following link:
https://www.intel.com/content/www/u...n-guide-gpu/2023-0/media-engine-hardware.html
Based on my testing, VDEnc is a few times faster than Es, which may make the encoding performance of the UHD 770 match or come close to the Arc A770. But the trade-off is a minor VMAF drop (less than 1% according to my tests).
VDEnc encoding is enabled by using the "-low_power 1" switch in ffmpeg when using QSV. I downloaded your test file, and it appears that you were using Es in your testing. Maybe you can give VDEnc a shot, if you are interested.
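(For context, the flag goes on the QSV encoder line in ffmpeg; this is just a generic sketch with placeholder filenames and bitrate, assuming a QSV-enabled ffmpeg build:)

# default path uses the PAK + shader (Es) engine; adding -low_power 1 selects the VDEnc (E) path
ffmpeg -i input.mp4 -c:v h264_qsv -low_power 1 -b:v 8M -c:a copy out_qsv.mp4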
 
Last edited:
  • Like
Reactions: bit_user

bit_user

Polypheme
Ambassador
Hi, I was doing some research about video encoding quality with HW acceleration and saw your article. Firstly, nice job. One interesting thing I found is that, before Arc and Meteor Lake, Intel implemented two kinds of video hardware encode engine:

1. Es = hardware encode (PAK) + shader (media kernel + VME);
2. E = hardware encode via the low-power VDEnc.

Es is actually Media Engine + GPU shader. E is the full hardware encoder via Media Engine only.

This is documented in the following link:

Based on my testing, E is a few times faster than Es, which may make the encoding performance of the UHD 770 match or come close to the Arc A770. But the trade-off is a minor VMAF drop (less than 1% according to my tests).

E (VDEnc) encoding is enabled by using the "-low_power 1" switch in ffmpeg when using QSV.

I downloaded your test file, and it appears that you were using Es in your testing. Maybe you can give E a shot, if you are interested.
Tagging @JarredWaltonGPU , so he sees this.
 
Hi, I was doing some research about video encoding quality with HW acceleration and saw your article. Firstly, nice job. One interesting thing I found is that, before Arc and Meteor Lake, Intel implemented two kinds of video hardware encode engine: 1. Es = hardware encode (PAK) + shader (media kernel + VME); 2. E = hardware encode via the low-power VDEnc.

Es is actually Media Engine + GPU shader. E is the full hardware encoder via Media Engine only.
This is documented in the following link:
https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-0/media-engine-hardware.html

Based on my testing, VDEnc is a few times faster than Es, which may make the encoding performance of the UHD 770 match or come close to the Arc A770. But the trade-off is a minor VMAF drop (less than 1% according to my tests).
VDEnc encoding is enabled by using the "-low_power 1" switch in ffmpeg when using QSV. I downloaded your test file, and it appears that you were using Es in your testing. Maybe you can give VDEnc a shot, if you are interested.
"A few times faster" than whatever the default is? That seems hard to believe. The numbers I got already show the Arc A770 running at reasonably competitive performance levels — not the fastest, but not that much slower than the RTX 40-series stuff. Even a 50% boost in performance (1.5X) would put it into the lead.

Anyway, thanks for the heads up. I don't know when I'll get around to retesting this, because there's a lot of stuff coming down the pipeline in the next month, and I'll need to relearn how to do all of this testing to update the charts if I retest things. Last time I did anything with the video encoding stuff was in March 2023.
 
  • Like
Reactions: bit_user
Jan 3, 2024
"A few times faster" than whatever the default is? That seems hard to believe. The numbers I got already show the Arc A770 running at reasonably competitive performance levels — not the fastest, but not that much slower than the RTX 40-series stuff. Even a 50% boost in performance (1.5X) would put it into the lead.

Anyway, thanks for the heads up. I don't know when I'll get around to retesting this, because there's a lot of stuff coming down the pipeline in the next month, and I'll need to relearn how to do all of this testing to update the charts if I retest things. Last time I did anything with the video encoding stuff was in March 2023.
Yes. On my Dell laptop with an i5-1345U, doing an H.264 -> H.265 transcode of 1080p video using the QSV HEVC veryslow preset with global_quality = 24 (~4 Mbps bitrate), I got about 50 fps using PAK and 320 fps using VDEnc. You can also see in Windows Task Manager that the Video Decode unit is maxed out when using VDEnc, but when using PAK it's the 3D unit that maxes out. This indicates they are using different circuits.
Anyway, just to let you know this option exists. It looks like very few articles on the internet mention it, and it's not a default option in ffmpeg, so most people may not know about it at all. Some video software (like Handbrake) enables it by default, but others don't.
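(For anyone who wants to try the comparison, the two ffmpeg invocations would look roughly like this; filenames are placeholders, and results will obviously vary with hardware and drivers:)

# default (PAK + shader) path
ffmpeg -i input_1080p.mp4 -c:v hevc_qsv -preset veryslow -global_quality 24 out_pak.mp4
# VDEnc path via the low_power switch
ffmpeg -i input_1080p.mp4 -c:v hevc_qsv -low_power 1 -preset veryslow -global_quality 24 out_vdenc.mp4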
 
Yes. On my Dell laptop with an i5-1345U, doing an H.264 -> H.265 transcode of 1080p video using the QSV HEVC veryslow preset with global_quality = 24 (~4 Mbps bitrate), I got about 50 fps using PAK and 320 fps using VDEnc. You can also see in Windows Task Manager that the Video Decode unit is maxed out when using VDEnc, but when using PAK it's the 3D unit that maxes out. This indicates they are using different circuits.
Anyway, just to let you know this option exists. It looks like very few articles on the internet mention it, and it's not a default option in ffmpeg, so most people may not know about it at all. Some video software (like Handbrake) enables it by default, but others don't.
Oh, yeah... slower laptop CPUs could behave very differently than the desktop parts. I tested everything on a 13900K, so CPU bottlenecks are going to be much reduced compared to an i5-1345U. I also wonder if there's anything extra I need to pass to ffmpeg for the AMD and Nvidia GPUs. I swear, figuring out the "correct" ffmpeg options is black magic.