> If it is possible to generate 2x images at the same time, then two 2080s could match the performance of a 4080. Are you able to test the 20xx series with dual GPUs, as well as the 30xx and 40xx series? Updating this article with the 4060 might be an option, too. Thanks.

I don't have duplicates of most of the cards, and only the RTX 3090/3090 Ti and some RTX 20-series cards support NVLink. There should be a way to have a project like Stable Diffusion use multiple GPUs without SLI, NVLink, or other connectors, but it would have to be programmed into the repository. By default, I believe it just selects the first GPU. To be honest, I haven't tried running two GPUs in recent history, not since Nvidia effectively declared SLI dead with the RTX 30-series.
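The "without SLI or NVLink" part is simpler than it sounds: diffusion batches are independent, so the usual trick is to run one generation process per card and hide the other cards from each process. A minimal sketch of that idea, assuming a hypothetical `generate.py` worker script (the helper names here are my own, not from any particular repository):

```python
import os
import subprocess

def worker_env(gpu_index: int) -> dict:
    """Build the environment for one generation process, pinned to a
    single GPU. CUDA_VISIBLE_DEVICES hides every other device, so each
    process sees its assigned card as device 0."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env

def launch_parallel(script: str, gpu_count: int) -> list:
    """Start one copy of `script` per GPU; each copy renders its own
    batch independently, so no SLI/NVLink bridge is needed."""
    return [
        subprocess.Popen(["python", script], env=worker_env(i))
        for i in range(gpu_count)
    ]

# Two 2080s would then each produce images at their own rate:
# launch_parallel("generate.py", 2)
```

This doubles throughput (images per minute) rather than halving per-image latency, which is exactly the "2x images at the same time" scenario described above.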
> Hi @JarredWaltonGPU, thanks a lot for the article! I just tried on my rig here (6800 XT) and got two distinct results:

Likely optimizations are lacking. I haven't retested lately, but AMD sent out a guide with the RX 7600 launch that suggested using the Nod.ai version. Which is funny, because I started using Nod.ai's release months before AMD did. 🙃
> Hello.
> I think the boost promised by AMD is here.
> 7.734 it/s on a Radeon 6950 XT (AMD ROCm / Linux)

I'm running a bunch of updated numbers. The latest Nod.ai (stable release, anyway; the daily automatic builds are still slower, going by one I checked just yesterday) now does quite a bit better. I just need to retest a bunch of GPUs. I'm about half-way there (whoa, livin' on a prayer).
> I have to make more tests, but I think A1111 + AMD GPUs under Linux is much faster than Nod.ai/Windows.

Entirely possible, though these days I suspect the difference has shrunk quite a bit. That's partly because I'm quite sure Nod.ai is heavily invested in making AMD GPUs look as good as possible. I say that because when I last tried Nod.ai with Nvidia GPUs, the results were universally poor, and when I tried the latest A1111 instructions for running on AMD under Windows, performance was also worse than with Nod.ai.
> AMD results are totally flawed; Windows was used instead of Linux :-/
> DirectML is much slower than ROCm.

ROCm doesn't work with all AMD GPUs. Also, images per minute is not directly comparable to iterations per second, as the latter omits a lot of the time taken. I tried to get ROCm working with the 7900 XTX recently under Linux and eventually gave up in frustration… again.
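The distinction between the two metrics is easy to make concrete: iterations per second only counts the denoising steps, while wall-clock images per minute also pays for VAE decode, saving, and setup on every image. A quick back-of-the-envelope conversion, using the 7.734 it/s figure quoted earlier and a hypothetical per-image overhead (the 2-second value is an illustration, not a measurement):

```python
def images_per_minute(iters_per_sec: float, steps: int, overhead_sec: float) -> float:
    """Convert raw sampling speed (iterations/second) into effective
    throughput, charging each image a fixed overhead for everything
    the it/s number omits (VAE decode, saving, setup)."""
    seconds_per_image = steps / iters_per_sec + overhead_sec
    return 60.0 / seconds_per_image

# At the quoted 7.734 it/s with 50 steps and zero overhead:
ideal = images_per_minute(7.734, 50, 0.0)  # ~9.3 images/minute
# With a hypothetical 2 s of per-image overhead, the gap appears:
real = images_per_minute(7.734, 50, 2.0)   # ~7.1 images/minute
```

So two cards with identical it/s numbers can still land at noticeably different images-per-minute results, which is why the two metrics shouldn't be compared directly.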
> ROCm doesn't work with all AMD GPUs.

According to this, AMD's ROCm support for RDNA 3 is still a work in progress, with a suggestion that ROCm 6.0 will be the one to watch for:
> According to this, AMD's ROCm support for RDNA 3 is still a work in progress, with a suggestion that ROCm 6.0 will be the one to watch for:

I have asked AMD about this directly, though I haven't received a straight answer yet. DirectML seems to be the "preferred" way to do Stable Diffusion on AMD GPUs, at least for now, though performance on RDNA 2 doesn't look great. I think ROCm under Linux is supposed to do much better than DirectML on RDNA 2.
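In code, the DirectML-versus-ROCm split shows up as a backend-selection problem: DirectML on Windows goes through the `torch_directml` package, while ROCm builds of PyTorch report their devices through the regular `torch.cuda` interface. A minimal probing sketch (the function name and fallback order are my own illustration):

```python
def pick_backend() -> str:
    """Probe for an available accelerator backend, in rough order of
    preference: DirectML (AMD on Windows), then CUDA/ROCm (ROCm builds
    of PyTorch also answer via torch.cuda), then CPU as a last resort."""
    try:
        import torch_directml  # only importable when torch-directml is installed
        return "directml"
    except ImportError:
        pass
    try:
        import torch
        if torch.cuda.is_available():  # True for both CUDA and ROCm builds
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```

The awkward part is that the two paths need different install instructions and different device handles, which is a big piece of why AMD-on-Windows setup guides keep changing.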
> For anyone still following this thread, I have (finally) updated all the testing for all the pertinent GPUs and published a new article, redirecting the old one (because that's what our SEO team tells us is best). So, here's the new article:

I love you, Jarred.
> I tried to get ROCm working ... and eventually gave up in frustration… again.

A tale as old as time. I'm certain that millions of dev-hours over the past decade have been lost chasing this mirage. AMD has probably made more self-congratulatory and blatantly false ROCm press announcements than the total number of systems/users that have ever gotten it working.
> AMD results are totally flawed; Windows was used instead of Linux :-/
> DirectML is much slower than ROCm.

Theoretically this makes sense, but practically I agree with using Windows for benchmarking. It is (currently) what the overwhelming majority of people use, and where all the GPU companies are focusing their driver support, which recently includes some SD/LLM optimizations.
> Somehow Intel got their OpenVINO unconditionally working within a year of their dGPU release.

OpenVINO existed before that; I first used it on a Skylake iGPU back in early 2021. The main reason they could get it working so quickly is that Intel has had an open-source GPU software stack for at least as long as AMD, IIRC. Intel has also done a better job maintaining OpenCL support, which OpenVINO benefited from.
> Theoretically this makes sense, but practically I agree with using Windows for benchmarking. It is (currently) what the overwhelming majority of people use, and where all the GPU companies are focusing their driver support.

Not if you're talking about AI. For that, the OS of choice is, and always has been, Linux. Windows only enjoys better driver support if you're gaming.