News: Stable Diffusion Benchmarked: Which GPU Runs AI Fastest

If it's possible to generate two images at the same time, then two 2080s could match the performance of a 4080. Are you able to test the 20xx series with dual GPUs, as well as the 30xx and 40xx series? Updating this article with the 4060 might be an option, too. Thanks.
I don't think I have duplicate GPUs for most of the cards, and only the RTX 3090/3090 Ti and some RTX 20-series cards support NVLink. I would think there's a way to have a project like Stable Diffusion use multiple GPUs without SLI, NVLink, or other connectors, but it would have to be programmed into the repository. By default I think it just selects the first GPU? To be honest, I haven't even tried running two GPUs in recent history, not since Nvidia basically declared SLI dead with the RTX 30-series.
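For what it's worth, here's a minimal sketch (assuming PyTorch and the Hugging Face diffusers library; the model ID is illustrative, not from the article) of the no-NVLink approach: each process simply claims a different GPU, so two cards generate two images in parallel.

```python
# Minimal sketch, assuming PyTorch + Hugging Face diffusers.
import torch
from diffusers import StableDiffusionPipeline

def generate_on(gpu_index: int, prompt: str):
    # cuda:0 is what most repos grab by default; pass 1 for the second card
    device = torch.device(f"cuda:{gpu_index}")
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # illustrative model ID
        torch_dtype=torch.float16,
    ).to(device)
    return pipe(prompt, num_inference_steps=50).images[0]

# Run two copies of this script (or pin each one with CUDA_VISIBLE_DEVICES=0
# and CUDA_VISIBLE_DEVICES=1) and throughput on independent images is simply
# additive -- which is all the dual-2080 idea amounts to.
```

No SLI bridge is involved; the two pipelines never talk to each other.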
 
Hi @JarredWaltonGPU, thanks a lot for the article! I just tried on my rig here (6800 XT) and got two distinct results:
  • using the Nod.ai Shark build, which seems to be Vulkan based, I got around 9.2 it/s, as per the screenshot here.
  • using Automatic1111, which is ROCm based (running on Ubuntu), I got a score similar to yours, around 3.5-4 it/s.
So I'm wondering why the Vulkan backend is faster than the ROCm one 🤔
 
Likely the ROCm path is missing optimizations. I haven't retested lately, but AMD sent out a guide with the RX 7600 launch in which it suggested using the Nod.ai version. Which is funny, because I started using Nod.ai's release months before AMD did. 🙃
 
Hello.

I think the boost promised by AMD is here.
7.734 it/s on the Radeon 6950 XT (AMD ROCm / Linux).
I'm running a bunch of updated numbers. The latest Nod.ai stable release now does quite a bit better (the daily automatic builds are still slower, based on one I checked yesterday). I just need to retest a bunch of GPUs. I'm about halfway there (whoa, livin' on a prayer).
 
I need to run more tests, but I think A1111 + AMD GPUs under Linux is much faster than Nod.ai/Windows.
Entirely possible, though these days I suspect the difference has shrunk quite a bit. That's partly because I'm also quite sure that Nod.ai is pretty heavily invested in making AMD GPUs look as good as possible. I say that because when I last tried testing Nod.ai with Nvidia GPUs, the results were universally poor, and when I tried the latest A1111 instructions for running on AMD under Windows, performance was also worse than with Nod.ai.

Anyway, I'm also sure there are more optimized variants of SD than A1111 for Nvidia GPUs, especially if we started looking at tuned versions that use the FP8 mode on the tensor cores, which could potentially double performance with little effort. But the good thing is that Nod.ai, OpenVINO, and the base Automatic1111 instructions can all get up and running with a minimum of hassle under Windows.
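As a rough illustration of how much precision alone matters, here's a sketch (assuming diffusers and a CUDA build of PyTorch; the model ID is an assumption) that times the same pipeline at FP32 and FP16. FP8 would additionally need specialized kernels, which is why it isn't shown.

```python
# Hedged sketch: time one SD pipeline at two precisions to show the
# headroom precision alone buys. Model ID is illustrative.
import time
import torch
from diffusers import StableDiffusionPipeline

def rough_its(dtype, steps=50):
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=dtype
    ).to("cuda")
    pipe("warmup", num_inference_steps=5)         # exclude one-time setup cost
    start = time.perf_counter()
    pipe("an astronaut riding a horse", num_inference_steps=steps)
    return steps / (time.perf_counter() - start)  # approximate it/s

for dtype in (torch.float32, torch.float16):
    print(dtype, f"~{rough_its(dtype):.1f} it/s")
```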
 
A1111/Windows/AMD is not using ROCm but DirectML. It works, but it's roughly 1/10 the speed of ROCm.
Nod.ai/Windows/AMD seems to work better, but with no SDXL model support you miss out on a lot of new models.
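For context, a tiny sketch of what the DirectML path looks like on the PyTorch side (assuming the torch-directml package, which is what A1111's Windows/AMD route builds on):

```python
# Hedged sketch: tensors routed through a DirectML adapter instead of a
# ROCm/CUDA device -- the layer the A1111 Windows/AMD path sits on.
import torch
import torch_directml

dml = torch_directml.device()                # default DirectML adapter (the GPU)
latents = torch.randn(1, 4, 64, 64).to(dml)  # an SD-sized latent tensor
print(latents.device)                        # DirectML devices show as privateuseone:0
```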
 
For anyone still following this thread, I have (finally) updated all the testing for all the pertinent GPUs and published a new article, redirecting the old one (because that's what our SEO team tells us is best). So, here's the new article:

 
AMD results are totally flawed; Windows was used instead of Linux :-/
DirectML is much slower than ROCm.
ROCm doesn't work with all AMD GPUs. Also, images per minute is not directly comparable to iterations per second, as the latter omits a lot of the per-image time spent outside the sampling loop. I tried to get ROCm working with the 7900 XTX recently under Linux and eventually gave up in frustration… again.
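To make that concrete, here's a toy calculation (all numbers hypothetical) showing why a GPU's iterations per second overstate its images per minute:

```python
# Toy numbers, purely illustrative: it/s covers only the denoising loop,
# while images/min also pays per-image overhead (VAE decode, setup, etc.).
steps = 50            # sampling steps per image
its = 10.0            # reported sampler speed, it/s
overhead = 2.0        # assumed seconds per image outside the sampler

denoise = steps / its                 # 5.0 s of pure sampling
naive_ipm = 60 / denoise              # 12.0 images/min if it/s told the whole story
real_ipm = 60 / (denoise + overhead)  # ~8.6 images/min once overhead is counted
print(f"{naive_ipm:.1f} vs {real_ipm:.1f} images/min")
```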
 
According to this, their ROCm support for RDNA3 is still ongoing, with a suggestion that ROCm 6.0 will be the one to watch for:
I have asked AMD about this directly, though I haven't really received a straight answer yet. DirectML seems to be the "preferred" way to do SD on AMD GPUs, at least for now, though performance on RDNA 2 doesn't look great. I think ROCm under Linux is supposed to do much better than DirectML on RDNA 2.

But again, I note that getting things running under Linux is more difficult in general. At one point I had Automatic1111 working with ROCm on RDNA 2 (this was early this year). Something changed along the way, and when I recently tried running Ubuntu I could not get things working. Linux kernels had changed, ROCm didn't install properly, whatever. I lost a day of work trying to make it function.
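For anyone attempting the same, a quick sanity check (assuming a ROCm wheel of PyTorch) helps separate "ROCm didn't install" from "the GPU isn't being picked up":

```python
# Hedged sketch: on ROCm builds of PyTorch the CUDA API maps onto HIP,
# so these calls reveal which layer of the stack is broken.
import torch

print(torch.__version__)           # ROCm wheels report e.g. "2.x.x+rocm5.x"
print(torch.version.hip)           # None on CUDA/CPU-only builds
print(torch.cuda.is_available())   # False usually means a driver/kernel mismatch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. "AMD Radeon RX 7900 XTX"
```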
 

Firedrops
For anyone still following this thread, I have (finally) updated all the testing ... and published a new article ...

I love you, Jarred.

I tried to get ROCm working ... and eventually gave up in frustration… again.
A tale as old as time. I'm certain that millions of dev hours over the past decade have been lost chasing this mirage. AMD has probably made more self-congratulatory and blatantly false ROCm press announcements than the total number of systems/users that have ever gotten it working.

Somehow Intel got their OpenVINO unconditionally working within a year of their dGPU release.

AMD results are totally flawed, Windows was used instead of Linux :-/
DirectML is much slower than ROCm.
Theoretically this makes sense, but practically I agree with using Windows for benchmarking. It is (currently) what the overwhelming majority of people use, and where all GPU companies are focusing their driver support, which recently includes some SD/LLM optimizations.
 

bit_user

Polypheme
Ambassador
Somehow Intel got their OpenVINO unconditionally working within a year of their dGPU release.
OpenVINO existed before that. I first used it on a Skylake iGPU, back in early 2021. The main reason they could get it working so quickly is that Intel had an open source GPU software stack for at least as long as AMD, IIRC. Intel has also done a better job maintaining OpenCL support, which OpenVINO benefited from.
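For what it's worth, enumerating devices in OpenVINO has stayed simple across iGPUs and Arc dGPUs alike; a minimal sketch (assuming the openvino Python package):

```python
# Hedged sketch: list whatever OpenVINO can target on this machine.
from openvino.runtime import Core

core = Core()
for dev in core.available_devices:   # e.g. ['CPU', 'GPU']
    print(dev, core.get_property(dev, "FULL_DEVICE_NAME"))
```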

Theoretically this makes sense, but practically I agree with using Windows for benchmarking. It is (currently) what the overwhelming majority of people use, and where all GPU companies are focusing their driver support,
Not if you're talking about AI. For that, the OS of choice is and always has been Linux. Windows only enjoys better driver support if you're gaming.
 