News: Nvidia GeForce Driver Promises Doubled Stable Diffusion Performance

The 2x performance claim is certainly interesting, given that we can already get roughly a 50% uplift with SD using hardware-specific optimized models.

Though it looks like this optimized version has to be run through an ONNX pipeline, which the SD webui doesn't support out of the box as of now. I'm also interested in the relative memory requirements of the optimized pipeline versus the current SD pipeline, since they haven't mentioned that.
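For anyone who wants to poke at an ONNX path outside the webui in the meantime, here's a rough sketch using Hugging Face's Optimum wrapper around ONNX Runtime. To be clear, this is just a generic ONNX Runtime pipeline, not the Olive-optimized model Nvidia is describing, and the model ID and execution provider below are assumptions.

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

# Export the PyTorch weights to ONNX on the fly and run them through ONNX Runtime.
# The provider depends on which onnxruntime build you have (CUDA vs. DirectML).
pipe = ORTStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint; swap in whatever you use
    export=True,
    provider="CUDAExecutionProvider",
)

image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=50,
).images[0]
image.save("astronaut.png")
```

Running something like this side by side with the regular diffusers pipeline would also give a rough answer to the memory question above.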

Somewhat off topic, and not directly related to this news, but for anyone interested in doing some preliminary testing of generative AI models, an Olive-optimized version of the Dolly 2.0 large language model is now available on Hugging Face.
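If you just want to kick the tires on Dolly 2.0, the standard (non-Olive) checkpoint loads with the regular transformers pipeline, roughly as below; the 12B model ID follows the Databricks model card, and you'll need a GPU with enough VRAM for it in bf16.

```python
import torch
from transformers import pipeline

# Standard Dolly 2.0 checkpoint from Hugging Face (not the Olive-optimized build
# mentioned above). trust_remote_code pulls in Databricks' instruction pipeline.
generate = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

print(generate("Explain what an ONNX execution provider is.")[0]["generated_text"])
```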

NVIDIA's NeMo LLM for conversational AI is also coming soon, although there is no ETA.


It'd be interesting to see whether these older-architecture cards that don't have any Tensor cores see a similar 2x performance improvement in ML acceleration.

I don't think the older GTX cards can get a 2x performance uplift the way the RTX GPUs can, since they lack the dedicated hardware for it.
 
Looks like I have something else to try and figure out. Gotta talk to Nvidia and see what exactly is needed to get this speedup with Automatic1111's WebUI.

As for the older GTX cards, I'm also curious if there's any benefit from these drivers. Not even 2X, but maybe 10-25% faster? But the last time I tried Automatic1111 on a GTX 1660 Super, it was horribly slow — far slower than it ought to be!



Considering the RX 6600 manages over two images per minute, I'd expect a GTX 1660 Super to at least be able to do maybe one per minute. Last time I tried, I think I got about one 512x512 image every two and a half minutes!
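If anyone wants to get a rough images-per-minute number on their own card for comparison, a quick-and-dirty timing loop with the stock diffusers pipeline looks something like the sketch below. The model, step count, and prompt are assumptions, not necessarily the settings behind the numbers above.

```python
import time
import torch
from diffusers import StableDiffusionPipeline

# Quick throughput check with the stock fp16 SD 1.5 pipeline.
# On GTX 16-series cards you may need torch_dtype=torch.float32 to avoid black images.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a city skyline at sunset, highly detailed"
pipe(prompt, height=512, width=512, num_inference_steps=50)  # warm-up pass

n = 5
start = time.time()
for _ in range(n):
    pipe(prompt, height=512, width=512, num_inference_steps=50)
elapsed = time.time() - start

print(f"{n * 60 / elapsed:.2f} images per minute")
```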
 
If you figure it out, I for one would love a "how to" article so I can follow along at home.
 
The training speed will actually depend on many other factors as well. But I will let Jarred confirm this.
 
I tested it on my GTX 1080: installed the new drivers and optimized the ONNX model, but the generation time didn't change. Testing at 512x512, 50 steps, 1 image, it was 24 seconds before and 24 seconds after.

I created a question on GitHub; hopefully someone there can explain why there's no change, or whether the model just doesn't improve without RT cores.
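One thing that might be worth checking alongside the GitHub question is whether ONNX Runtime actually bound the model to a GPU execution provider, or silently fell back to something slower. A minimal check, with a placeholder path for wherever Olive wrote the optimized UNet, could look like this:

```python
import onnxruntime as ort

# Which providers this onnxruntime build supports at all (CUDA, DirectML, CPU, ...).
available = ort.get_available_providers()
print("Available providers:", available)

# Load one of the exported models and see which provider it actually bound to.
# "optimized/unet/model.onnx" is just a placeholder path.
preferred = ["DmlExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
session = ort.InferenceSession(
    "optimized/unet/model.onnx",
    providers=[p for p in preferred if p in available],
)
print("Session providers:", session.get_providers())
```

If the session only reports the CPU provider, that alone would point at where the time is going.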

 