Video Encoding Tested: AMD GPUs Still Lag Behind Nvidia, Intel

AMD expected people to buy an expensive Threadripper CPU for streaming. That was literally one of their promotional slides.

But x86 CPUs are bad at that kind of parallelisation.

So everyone went with Nvidia's NVENC and CUDA instead. Rightfully so.

To this day, AMD sucks at GPU encoding and decoding. You're lucky if your whole encode isn't a corrupted file.
 
Thanks for the data!

It would've been nice to add a CPU-based encoded video for reference, to see how the fixed-function pipelines of the GPUs compare against it. EDIT: I'm dumb, as always, missing something in the graphs; there's the 13900K, so nice one!

I've been using software AV1 with the CPU on "fast" and it beats anything on the GPU. It's kind of hilarious. And I "only" have a 5800X3D. Newer CPUs should perform even better, especially CPUs with even more cores.

Code:
Video
ID                             : 1
Format                         : AV1
Format/Info                    : AOMedia Video 1
Format profile                 : Main@L4.1
Codec ID                       : V_AV1
Duration                       : 4 min 2 s
Width                          : 1 920 pixels
Height                         : 1 080 pixels
Display aspect ratio           : 16:9
Frame rate mode                : Constant
Frame rate                     : 60.000 FPS
Color space                    : YUV
Chroma subsampling             : 4:2:0
Bit depth                      : 8 bits
Default                        : Yes
Forced                         : No
Color range                    : Limited
Color primaries                : BT.709
Transfer characteristics       : BT.709
Matrix coefficients            : BT.709

That's just the video's general settings as reported by the encoder. Also, I only use OBS, as it's the practical and realistic software to use.
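For anyone curious, roughly the same thing done standalone with ffmpeg's SVT-AV1 (outside OBS; the preset number and filenames here are just illustrative guesses, not what OBS uses internally) would look like this:

Code:
rem illustrative standalone software AV1 encode; preset 8 is on the "fast" end of SVT-AV1's range
ffmpeg -i gameplay-1080p60.mkv -c:v libsvtav1 -preset 8 -crf 35 -g 120 -c:a copy gameplay-av1.mkv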

Regards.
 
Great article! Looks like you put a ton of work into making a fair comparison. I'm impressed the CPU encoding was able to hold its own against dedicated silicon, even if it uses much more power. Which CPU encoder did you use?

I'm happy recording NVENC_HEVC 720p 20Mbps and re-encoding 20-second clips at 3Mbps for sharing on Discord. Looking at these results, it seems like I can decrease that further to 10Mbps. Cheers!
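For reference, the sort of re-encode I mean looks roughly like this with ffmpeg (timestamp and filenames are made up for illustration):

Code:
rem cut a 20-second clip and re-encode it to ~3Mbps for Discord (illustrative values only)
ffmpeg -ss 00:12:30 -i capture-720p-20M.mp4 -t 20 -c:v libx264 -preset medium -b:v 3M -c:a aac -b:a 96k clip-for-discord.mp4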
 
We tested multiple generations of AMD, Intel, and Nvidia GPUs to look at both encoding performance and quality. Here's how the various cards stack up.

Would you also be willing to run an x264 encode at the "veryslow" preset with tune film (or grain), to see how the quality compares? For those of us doing encodes not for streaming but for archival purposes, it's always good to stay abreast of how all the encoders are performing from a quality-first perspective.
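Something along these lines is what I have in mind (the CRF value and filenames are just placeholders):

Code:
rem quality-first archival encode: veryslow preset, film tuning, CRF instead of a bitrate target (placeholder values)
ffmpeg -i source.mkv -c:v libx264 -preset veryslow -tune film -crf 16 -c:a copy archive-x264.mkv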
 
It would've been nice to add a CPU-based encoded video for reference, to see how the fixed-function pipelines of the GPUs compare against it. EDIT: I'm dumb, as always, missing something in the graphs; there's the 13900K, so nice one!
Looking at the charts, I assumed the "CPU" was still using its Quicksync Video hardware encoder, but I see Jarred says it's using a pure software path. Very impressive!

Would've been cool to use a Ryzen 7950X and remove all doubt about whether any hardware assist was at play, but I trust Jarred to know what he's doing.
 
Thank you for the effort! I don't stream, but I am doing a video series of 3D open-world games on YT for fun, recorded at 1440p. One concern was keeping the file size from getting too huge while still having the full-screen result look as crisp as the source screengrabs, or at least as crisp as the screengrab below (from the video), without the easily visible blur you can see in the grass of the other screengrabs, which can also show up on NPCs' faces and clothing, for example.

In that regard it is nice to see that I likely wouldn't be able to push the bitrate down much with another GPU (instead of the RX 6700 XT here) for the same end quality, as the video encoding quality seems to level out beyond a certain point for all the (newer) GPUs. But it's also good to know that hardware can make a visible difference when going with a lower bitrate, such as for streaming.
(screengrab from the video)
 
Looking at the charts, I assumed the "CPU" was still using its Quicksync Video hardware encoder, but I see Jarred says it's using a pure software path. Very impressive!

Would've been cool to use a Ryzen 7950X and remove all doubt about whether any hardware assist was at play, but I trust Jarred to know what he's doing.
The 7950X does have an iGPU, but I wonder if it has the fixed-function hardware encoder in it?

The APUs all have the full-fledged VCE built in at least, but I'm not 100% sure the iGPU in the Ryzen 7000 parts does. The Wiki does say they have VCN 3.1, so RDNA 1 and 2 encoding engines.

Regards.
 
The 7950X does have an iGPU, but I wonder if it has the fixed-function hardware encoder in it?
Not sure, but I'm assuming it's low-performance enough that we'd notice if it were being used.

Perhaps the best option would be to benchmark both the iGPU and pure software path. As long as they perform differently, we can be reasonably certain the software path really isn't using the iGPU.
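Something like this would do it (the source filename is hypothetical; h264_amf is ffmpeg's AMD hardware encoder, libx264 is the pure software path, and -benchmark prints timing stats at the end):

Code:
rem hypothetical check: hardware (AMD VCN via AMF) vs. pure software encode of the same clip, then compare the reported times
ffmpeg -benchmark -i test-clip-1080p.mp4 -c:v h264_amf -b:v 8M -an -f null NUL
ffmpeg -benchmark -i test-clip-1080p.mp4 -c:v libx264 -preset medium -b:v 8M -an -f null NUL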
 
Thank you for the effort! I don't stream, but I am doing a video series of 3D open-world games on YT for fun, recorded at 1440p. One concern was keeping the file size from getting too huge while still having the full-screen result look as crisp as the source screengrabs, or at least as crisp as the screengrab below (from the video), without the easily visible blur you can see in the grass of the other screengrabs, which can also show up on NPCs' faces and clothing, for example.

In that regard it is nice to see that I likely wouldn't be able to push the bitrate down much with another GPU (instead of the RX 6700 XT here) for the same end quality, as the video encoding quality seems to level out beyond a certain point for all the (newer) GPUs. But it's also good to know that hardware can make a visible difference when going with a lower bitrate, such as for streaming.
When you upload to YouTube, the video is re-encoded. Given the tiny file size that results, the encoder YouTube uses is about as good as you can get. It's great for minimising file sizes for portability, like presentations: you just upload to YouTube, wait for the re-encoding, then download it again without publishing the video.

But since it will always be re-encoded, you're better off uploading the uncompressed video (if you are able) for maximum quality.
 
Great article! Looks like you put a ton of work into making a fair comparison. I'm impressed the CPU encoding was able to hold its own against dedicated silicon, even if it uses much more power. Which CPU encoder did you use?

I'm happy recording NVENC_HEVC 720p 20Mbps and re-encoding 20-second clips at 3Mbps for sharing on Discord. Looking at these results, it seems like I can decrease that further to 10Mbps. Cheers!
libx264, libx265, and libsvtav1, using the "medium" presets on the first two and I think the default on AV1. I had some odd behavior while testing where some of the presets caused issues on my 13900K — as in, the CPU hit 100C and the PC crashed. I can't recall exactly, but I think the VMAF scores weren't increasing much with the slower encodes. It's probably the use of a target bitrate, which maybe overrides the preset or something.
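Roughly, the software encodes boil down to something like this with ffmpeg, as a sketch rather than the exact script (output names are placeholders):

Code:
rem sketch of the software encodes: medium preset for x264/x265, default preset for SVT-AV1, fixed bitrate target
ffmpeg -i BL3-Seq1-1080p.mp4 -c:v libx264 -preset medium -b:v 3M -an out-x264-3M.mp4
ffmpeg -i BL3-Seq1-1080p.mp4 -c:v libx265 -preset medium -b:v 3M -an out-x265-3M.mp4
ffmpeg -i BL3-Seq1-1080p.mp4 -c:v libsvtav1 -b:v 3M -an out-av1-3M.mp4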
Looking at the charts, I assumed the "CPU" was still using its Quicksync Video hardware encoder, but I see Jarred says it's using a pure software path. Very impressive!

Would've been cool to use a Ryzen 7950X and remove all doubt about whether any hardware assist was at play, but I trust Jarred to know what he's doing.
I guess I didn't make that entirely clear, and the charts on this one are a bit of a kludge so I left off the UHD 770 results. I didn't test the 7950X mostly because it's not really a CPU showdown, though I suspect performance is going to be relatively close to the 13900K. Here's the full table of data for reference. (It will probably look a lot better in here than it would on our main site! Too bad it didn't import the bold and borders and centering.) Anyway, 13900K is faster at CPU-based encoding than the UHD 770 QuickSync, which isn't too surprising given the 8 P-core and 16 E-core configuration.

Update! No, our forums did not like my table. Let's try pasting the image instead...

View attachment 211 (full results table image)
 
I had some odd behavior while testing where some of the presets caused issues on my 13900K — as in, the CPU hit 100C and the PC crashed.
🤯

Was it running with stock settings?

Anyway, 13900K is faster at CPU-based encoding than the UHD 770 QuickSync,
Nice! I bet UHD is still a lot more power-efficient. Less likely to send your CPU into meltdown!
🫠
 
🤯 Nice! I bet UHD is still a lot more power-efficient. Less likely to send your CPU into meltdown! 🫠
Yeah, for sure. I think the problem might be with the motherboard firmware, but I honestly can’t say for certain. I mean, bad code can crash a PC as well. Still, the fact that it hard locked was weird. But then, Hogwarts Legacy also started hard locking the PC after last week’s update!
 
I would be very interested to see how QuickSync stacks up in a test like this as well.
The name of the i9-13900K's iGPU is "UHD 770". Look for that column in the image Jarred attached to post #13.

The main piece of information we're missing (for all devices) is the average watts consumed. Better yet would be the total joules used during the entire encode.
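(To illustrate with made-up numbers: a 150 W average draw over a 4-minute encode would be 150 W × 240 s = 36,000 J, i.e. 36 kJ, which would make efficiency comparisons between the CPU and the fixed-function encoders straightforward.)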
 
libx264, libx265, and libsvtav1, using the "medium" presets on the first two and I think the default on AV1. I had some odd behavior while testing where some of the presets caused issues on my 13900K — as in, the CPU hit 100C and the PC crashed. I can't recall exactly, but I think the VMAF scores weren't increasing much with the slower encodes. It's probably the use of a target bitrate, which maybe overrides the preset or something.

I guess I didn't make that entirely clear, and the charts on this one are a bit of a kludge so I left off the UHD 770 results. I didn't test the 7950X mostly because it's not really a CPU showdown, though I suspect performance is going to be relatively close to the 13900K. Here's the full table of data for reference. (It will probably look a lot better in here than it would on our main site! Too bad it didn't import the bold and borders and centering.) Anyway, 13900K is faster at CPU-based encoding than the UHD 770 QuickSync, which isn't too surprising given the 8 P-core and 16 E-core configuration.

Update! No, our forums did not like my table. Let's try pasting the image instead...

View attachment 211
Jarred, do you have the final file sizes with each encode?
Is there any way you can list the file sizes for each specific encode as well?

Because file-size efficiency relative to the VMAF score is kinda important to me.

Also, what were the encode times, and what was your subjective image quality analysis as well as the VMAF?
 
Are we using quicksync or the normal software encoder for those tests, though? I don't see any mention of quicksync in the article.
The data in the article is software-only. The chart Jarred included in post #13 contains both, with the CPU column being software-only and the UHD 770 column being QuickSync.

@JarredWaltonGPU , I think he wants you to confirm.

Jarred, do you have the final file sizes with each encode?
As long as they're within a couple % of the specified bitrates, I don't think they need to be listed. However, it would be good to know if any significantly overshot or undershot the bitrate target.

Edit: I just noticed the article says:

"Even then, there are slight differences in encoded file sizes (about a +/-5% spread)."

Also, what were the encode times
Each image pane contains both the VMAF score as the first image, and the encoding speed (i.e. in terms of fps) as the second.

your subjective image quality analysis
The last image pane includes screenshots from each, in case you want to assess for yourself.
 
As long as they're within a couple % of the specified bitrates, I don't think they need to be listed. However, it would be good to know if any significantly overshot or undershot the bitrate target.
I'd like to know if they undershot, overshot, or were exact on the file size.

Each image pane contains both the VMAF score as the first image, and the encoding speed (i.e. in terms of fps) as the second.
It's cute that you list FPS, but I'd still like to know the time, since we can compare the min and max across all the various encode tests.

The last image pane includes screenshots from each, in case you want to assess for yourself.
A screenshot isn't the same as watching the entire video clip and comparing it with the original source.
 
I'd like to know if they undershot, overshot, or were exact on the file size.
The article says:
"Even then, there are slight differences in encoded file sizes (about a +/-5% spread)."​

It also says:
"if you want best quality you'd generally need to opt for CPU-based encoding with a high CRF (Constant Rate Factor)"​
So, what the article measured isn't terribly relevant to someone encoding with a focus on quality. Also, you'd want to use 2-pass.
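For the bitrate-targeted case, a two-pass software encode with ffmpeg looks roughly like this (filenames and bitrate are placeholders; NUL is the Windows null device, use /dev/null elsewhere):

Code:
rem two-pass, bitrate-targeted software encode (placeholder filenames and bitrate)
ffmpeg -y -i input.mp4 -c:v libx264 -preset slow -b:v 3M -pass 1 -an -f null NUL
ffmpeg -i input.mp4 -c:v libx264 -preset slow -b:v 3M -pass 2 -an output-2pass-3M.mp4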

It's cute that you list FPS, but I'd still like to know the time, since we can compare the min and max across all the various encode tests.
I assume FPS was computed by dividing the total number of frames by the encoding time. Perhaps Jarred can confirm.
 
I think your VMAF scores are wrong.

From 1Encodes CPU-13900K-Medium.txt, I got this sample VMAF command line:

Code:
ffmpeg.exe -i BL3-Seq1-1080p-CPU265-13900K-Medium-3M.mp4 -i BL3-Seq1-1080p.mp4 -lavfi [0:v]setpts=PTS-STARTPTS[reference];[1:v]setpts=PTS-STARTPTS[distorted];[distorted][reference]libvmaf=n_threads=20 -f null -

But you mixed [0:v]...[reference] with [1:v]...[distorted]

Based on the order given in ffmpeg -i ... -i ...:
0:v is BL3-Seq1-1080p-CPU265-13900K-Medium-3M.mp4 (and is the distorted)
1:v is BL3-Seq1-1080p.mp4 (and is the reference)

The above command gives VMAF 79.897865

The correct VMAF command line should be:

Code:
ffmpeg.exe -i BL3-Seq1-1080p-CPU265-13900K-Medium-3M.mp4 -i BL3-Seq1-1080p.mp4 -lavfi [1:v]setpts=PTS-STARTPTS[reference];[0:v]setpts=PTS-STARTPTS[distorted];[distorted][reference]libvmaf=n_threads=20 -f null -

and gives a VMAF of 70.303203

The order in VMAF is VERY important. As you can see, it gives entirely different results.
See also here: https://stackoverflow.com/questions/67598772/right-way-to-use-vmaf-with-ffmpeg
 