News Nvidia Boosts AI Performance With TensorRT

Admin · Oct 17, 2023

Nvidia has released TensorRT support for generative AI workloads, including large language models and Stable Diffusion, boosting performance by up to 70% in our testing. In other workloads, Nvidia touts up to a 4X improvement in throughput.

derekullo · Oct 18, 2023

Would it be possible to do an upscale test to see how long it takes to upscale a single image with SwinIR or another upscaler?

Using a 1024x1024 image from Bing AI, I am able to upscale it to 8192x8192 using an 8x scale in SwinIR with Stable Diffusion in about 4 minutes on a GeForce RTX 3080 Ti.

I just stumbled upon SwinIR the other day ... had no idea you can add detail so easily!

JarredWaltonGPU · Oct 18, 2023

I haven't played around much with upscaling, as it tends to be very memory intensive and runs into out-of-memory errors a lot. I'm not sure what variant of SwinIR you're using, but the "SwinIR_4x" that's part of the base Automatic1111 install isn't doing so hot trying to go from 768x768 to 3072x3072 (4x upscale) with an RTX 2080 Ti. A 2x upscale went relatively quickly, which makes me think the 4x run is barely fitting in system+GPU memory and will take forever. I don't even have a progress bar yet! LOL. Poking around a bit more, I seem able to do 768x768 to 1920x1920 via a 2.5x upscale without going beyond 8~10 GB of VRAM use.

I had the 2080 Ti in my test PC from yesterday, so maybe it would do better with a 30-series card. But I'm not sure what settings exactly you're using with SwinIR, or if you're using it in a different fashion than the standard A1111 integration. (You probably are, since I can only go to 4x with the built-in options.)

Details on what precisely you're doing would be helpful, but I also suspect that while I might be able to get this working to varying degrees with Nvidia GPUs, it could be more problematic with the Intel and AMD GPUs.

derekullo · Oct 19, 2023

SwinIR_4x was the setting I used ... sorry for the confusion!
I also use the base Automatic1111 install.
Not sure why your resize slider doesn't go to 8x.
I didn't change many other settings besides the upscaler, the image scale multiplier and the image itself.
Upscaler 2 is set to None.
The GFPGAN visibility, CodeFormer visibility and CodeFormer weight sliders at the bottom are set to 0.

I was able to upscale a 450x303 picture to 3600x2424 (8x) in 41.79 seconds.
Postprocess upscale by: 8, Postprocess upscaler: SwinIR_4x
Time taken: 41.79s
Torch active/reserved: 2982/3676 MiB, Sys VRAM: 5914/12288 MiB (48.13%)

I was able to upscale a 1024x1024 picture to 8192x8192 (8x) in 5 minutes 20 seconds.
Postprocess upscale by: 8, Postprocess upscaler: SwinIR_4x
Time taken: 5m20s
Torch active/reserved: 6943/9058 MiB, Sys VRAM: 11424/12288 MiB (92.97%)

Looking at those results, a 12 GB card may be needed for an 8x upscale of a 1024x1024 image.

More pixels take more VRAM, but the relationship is a bit unclear.
Multiplying it all out, 1024x1024 is 7.69 times as many pixels as 450x303, but it only used about twice as much VRAM.
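
For anyone who wants to check that arithmetic, here is a quick Python sketch. It only uses the numbers from the two runs reported above (resolution, time taken, Torch active memory and system VRAM); the dictionary layout and variable names are made up for illustration, and nothing here is independently measured.

```python
# Figures copied from the two 8x SwinIR_4x runs reported in this post.
# The structure and key names are illustrative assumptions, not A1111 output.
runs = {
    "small": {"width": 450, "height": 303, "seconds": 41.79,
              "torch_active_mib": 2982, "sys_vram_mib": 5914},
    "large": {"width": 1024, "height": 1024, "seconds": 5 * 60 + 20,
              "torch_active_mib": 6943, "sys_vram_mib": 11424},
}


def pixels(run):
    """Number of input pixels for one run."""
    return run["width"] * run["height"]


def ratio(key):
    """How many times larger the value is in the large run than in the small run."""
    return runs["large"][key] / runs["small"][key]


pixel_ratio = pixels(runs["large"]) / pixels(runs["small"])
time_ratio = ratio("seconds")
torch_ratio = ratio("torch_active_mib")
sys_vram_ratio = ratio("sys_vram_mib")

print(f"input pixels:      {pixel_ratio:.2f}x")
print(f"time taken:        {time_ratio:.2f}x")
print(f"Torch active VRAM: {torch_ratio:.2f}x")
print(f"system VRAM used:  {sys_vram_ratio:.2f}x")
```

With these two data points, the time taken grows almost in step with the pixel count (roughly 7.7x for both), while VRAM use grows only about 1.9x to 2.3x depending on which figure you compare. Two runs are not enough to say what the real scaling curve looks like.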

JarredWaltonGPU · Oct 19, 2023

Are you doing this in one pass, or is this a separate upscale, after you've already generated an image? Any screenshots would be useful, as well as information on whether you're using xformers or not.

I am testing other GPUs right now and have a 3080 12GB in my test PC, so I'm trying to figure out how to test upscaling. One thing I'm relatively sure of is that less tuning and optimization has been put into the upscalers as opposed to the base Stable Diffusion stuff.

Here's what I've tried (which does not seem to be going too well as it has shown zero progress on the actual resizing process). I don't know if you're using the Hires.fix or something else. Best I can do, with a 12GB 3080, is about a 3X upscale from 768x768.

Are you running on Windows, or Linux? Did you have to do any special patching or whatever first?

derekullo · Oct 19, 2023

I am running xformers ... must have toggled that on when I first configured it.

@Echo off
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--xformers --listen
call webui.bat
That is the entire .bat I use to launch it.

I am upscaling the image as a separate step after an image has been created ... no point upscaling an image I don't like!

At the top click Extras and that should take you to the section I am at.

https://flic.kr/p/2pakvAw

I am pretty sure the checkpoint/model isn't used for this ... I tried an anime checkpoint and the image was identical.
I am using Windows 11 Pro.

JarredWaltonGPU · Oct 19, 2023

Ha! I never even looked in the "Extras" tab... because why would upscaling of an existing image be listed under "Extras" rather than "Upscaling"? I'll have to see if the upscaling works with non-Nvidia cards. I did confirm it works on Nvidia, going up to very high resolutions, even with cards that only have 10GB. I suppose it probably does a sort of tiled upscaling, whereas "Hires Fix" does some weird stuff: it doesn't just upscale, it generates the image in a fashion that is not at all the same as the upscaling.
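
To illustrate what "tiled upscaling" would mean in practice, here is a minimal Python sketch that splits an image into overlapping tiles. It is not taken from the Automatic1111 code: the function name `tile_boxes`, the 192-pixel tile size and the 8-pixel overlap are all assumptions chosen for illustration. The idea is that the upscaler would only ever see one tile at a time, so the per-tile working set stays roughly the same regardless of image size, while the number of tiles (and so the time) grows with the pixel count.

```python
def tile_starts(length, tile, step):
    """Start offsets along one axis so that tiles of size `tile` cover `length` pixels."""
    starts = [0]
    while starts[-1] + tile < length:
        starts.append(starts[-1] + step)
    return starts


def tile_boxes(width, height, tile=192, overlap=8):
    """Return (left, top, right, bottom) boxes that cover the image with overlapping tiles.

    `tile` and `overlap` are hypothetical defaults, not confirmed A1111 settings.
    Boxes at the right and bottom edges are clipped to the image bounds.
    """
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    step = tile - overlap
    boxes = []
    for top in tile_starts(height, tile, step):
        for left in tile_starts(width, tile, step):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


# Tile counts for the two input sizes mentioned earlier in the thread.
small_tiles = tile_boxes(450, 303)
large_tiles = tile_boxes(1024, 1024)
print(len(small_tiles), "tiles for 450x303")
print(len(large_tiles), "tiles for 1024x1024")
```

Each tile would be upscaled on its own and pasted into the output image, with the overlap used to hide seams. The full-size output still has to live somewhere, so total memory use would not be completely flat as the image grows.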

derekullo · Oct 19, 2023

Here are the actual 2 pictures from the test!

https://flic.kr/p/2paezih

https://flic.kr/p/2pakBqN

KarlKnecht · Oct 20, 2023

The future for Stable Diffusion is SDXL with a resolution of 1024x1024. Can you give us an idea of the performance boost we can expect for that scenario? Would it be similar?

JarredWaltonGPU · Oct 21, 2023

I would expect relatively similar scaling, though my past experience suggests Nvidia GPUs do better at higher resolutions than AMD GPUs. I'm not sure what the exact cause might be, possibly just better memory management in the drivers, or CUDA, or whatever.

One thing to also consider is whether you even want to bother with SDXL. You could potentially generate at 512x512, upscale with SwinIR_4x, and get similar results, faster. But that's a different can of worms to open up. 🙂

Which brings up the final point: I think the base models from Hugging Face (SD1.5, SD2.1, SDXL) can all work as is with the TensorRT stuff from Nvidia. But other things like upscaling may need tweaking and work to convert them to TensorRT. I'm not even sure how to go about doing that, which means you might end up back at base CUDA performance in many cases until/unless TensorRT uptake for AI stuff really takes off.
