What could be causing this behavior? Is my GPU failing, or is it likely something else?

maquis

Apr 27, 2022
Hi!

I have a home-built PC that I set up about 2 years ago for 3D art in Blender. It has a Ryzen 9 5950X CPU and an RTX 3080 Ti GPU. This week I was rendering an animation and noticed that each frame was taking longer and longer to render, even though the frames are pretty similar. The first few took around 2-3 minutes each, and by frame 27 I killed the render after it had been running for 2.5 hours. I considered the possibility of a memory leak (even though I had previously done much longer renders without issues), so I tried closing Blender, and even logging out of my account on the computer entirely, then restarting the render, but I still couldn't make any progress.

So, I rebooted. After shutting down, it took a long time to even start booting up again, and then I got the BIOS message saying I had a new CPU and needed to set up the BIOS again. (I don't remember the exact text, but I think it's the standard message you get when you change your CPU.) All I had done was reboot, but I went in, reset my BIOS settings (pretty much defaults, except that XMP is on), and booted up. This time, restarting the render at frame 27 went great: it was back to ~2.5-minute renders for a couple of frames, and then it started slowing down again. Around frame 35 I killed it, because it had been running for over an hour and was only 10% through the frame.

Rebooted again. This time there was no weirdness during the reboot, but when it came back, the renders were still nowhere near a reasonable speed. A couple of times, a render quit within a few minutes of starting, claiming to be out of GPU memory. When it didn't hit GPU memory issues, it was clear once it started on the samples that we were looking at a multi-hour frame, so I cancelled the render. Cancelling hung pretty badly and usually required killing Blender completely from Task Manager.

So, I switched Blender to render on the CPU instead of the GPU. That works fine: 12-15 minutes per frame, which sucks compared to 2.5 minutes, but is pretty reasonable for CPU rendering. It can do plenty of frames in a row without any significant change in per-frame render time, so CPU rendering doesn't seem to have the problem.

I also moved the Blender file to a similarly spec'd computer and verified that it could render 30+ frames in a row in GPU mode without a significant change in per-frame render time, so the slowdown seems to be specific to my machine.

So, it *appears* to me that this is a GPU failure, or maybe a GPU memory failure? Does that seem like a reasonable conclusion? Is there anything else I'm missing that could cause this type of failure? Or is there anything else I should try or test before going through the 3+ week RMA process? (The GPU is still under warranty. As much as I'd love the excuse to upgrade, I'd rather get a free replacement.)

I honestly just replaced the CPU in this machine about 2 months ago due to a CPU failure, so I'm wondering whether something else in the machine is b0rk3d and causing repeated hardware failures, although I admit that with 3D art I push the computer pretty hard and frequently do long renders and animations. I do have pretty good cooling, and whenever I check, the CPU and GPU temps are generally within normal ranges. (I have an iCUE water cooler on the CPU, and the case has 10 fans in total.) I use a Corsair HX1200 PSU, and it seems to be fine. That said, I'm not particularly good with hardware, and this is the first machine I've ever built, so if there's something I did that would cause two components to fail so close together, 2 years after building, I'd like to fix it!

Thank you so much for the help!
 
When posting a thread of troubleshooting nature, it's customary to include your full system's specs. Please list the specs to your build like so:
CPU:
CPU cooler:
Motherboard:
Ram:
SSD/HDD:
GPU:
PSU:
Chassis:
OS:
Monitor:
Include the age of the PSU in addition to its make and model, and the BIOS version your motherboard is currently running.

Did the time in BIOS reset to 00:00/12:00? If so, you might want to replace the CMOS battery with a fresh cell.
 
CPU: Ryzen 9 5950X
CPU cooler: iCUE H100i RGB Pro X
Motherboard: Gigabyte X570 Aorus Master
Ram: Corsair 128 GB (4 x 32 GB sticks), pretty sure it's Vengeance RGB Pro. It all passed memtest a month ago.
SSD/HDD: Samsung M.2 drives, a 970 Pro and a 980 Pro
GPU: Gigabyte RTX 3080 Ti Gaming OC 12 GB
PSU: Corsair HX1200
Chassis: Corsair 5000D Airflow
OS: Windows 11
Monitor: Don't remember, but this shouldn't be relevant here.
 
Added a reply providing the listed details.

I didn't pay attention to whether the BIOS time reset. I can check if it happens again, but later reboots have not reset the BIOS. A CMOS battery wouldn't cause issues while the machine is running, though, right?
 
Well, looks like it might not be the video card. I swapped video cards with my spouse's computer (similar specs), and his computer continues to be able to render rapidly while mine is still *painfully* slow.

So, what else could be causing this? It isn't slow with CPU rendering, which relies more on the CPU and system RAM. The slowness happens while it's running the samples; loading everything into GPU memory and saving out the finished render take about the usual time, so it doesn't appear to be a disk issue. Would that make it a motherboard issue??? Or something else???

After the problem started, I updated to the latest NVIDIA Studio driver, but that didn't resolve anything.
 
Definitely an odd problem. How much trouble would it be to uninstall/reinstall Blender?
 
Actually headed that direction today.

Interesting updates from last night: I did a clean install of the NVIDIA drivers, which changed nothing.

I tried rendering a simpler scene (one with fewer volumes and no VDBs), and it rendered at the same speed on both machines that I have access to. This could simply be due to the simplicity of the scene.

Much more interestingly, I tried switching the GPU compute backend from CUDA to OptiX, and my computer rendered at normal speed. So it is b0rk3d on CUDA but not on OptiX, which makes me think this is more likely a software issue than a hardware issue. (The other machine is still using CUDA, so it's not that the scene can't render with CUDA.)

I'm going to try clearing out all of Blender's settings and add-ons completely to see if that helps. My husband's machine doesn't have any Blender add-ons installed, since he doesn't use Blender and I only use his machine occasionally for extra rendering. So maybe one of my add-ons is breaking things, or there's a corrupted cache file somewhere. (I'm particularly suspicious of the add-on I use to manage and work with VDBs.)

Hopefully I can get this figured out soon. Worst case, I switch permanently to OptiX rendering, at least for these more complicated scenes, but I get weird errors more often with OptiX than with CUDA, so I'd like to keep CUDA at least as an option. :)

Thanks!
 
It could be a number of things causing the behavior you're seeing. Before jumping to conclusions about your GPU, try checking for software issues like outdated drivers or a corrupted installation. If those seem fine, monitor your GPU temperatures to see if it's overheating. Also, consider running a stress test to identify any potential hardware failures. Sometimes, even a loose connection or power supply issue can be the culprit. Start with these steps, and you might pinpoint the problem without needing a new GPU!
Yeah, I'm really confused about what's causing this. Sometimes OptiX completes things pretty fast, and other times OptiX is slow too. It doesn't seem to be the GPU itself, and I'm not seeing particularly high temps (three sensors read 66, 70, and 76 C while rendering), yet I sometimes see the slowness even after the computer has been off for a while. But maybe the fans aren't doing their job well enough? Temps drop quickly (to 50, 60, and 60 C) once the render is killed or completes.
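One way to test the fan question is to log GPU temperature and core clock during a render and check whether the clock sags while the card is busy, which would point toward thermal or power throttling rather than software. The following is a minimal, hypothetical Python sketch, not something anyone in this thread ran. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output. The helper names and the thresholds (90% utilization counts as "busy", a drop below 0.8x the peak busy clock counts as a sag) are arbitrary illustrative choices.

```python
import subprocess
import time

# nvidia-smi query fields; with --format=csv,noheader,nounits each GPU
# prints one line such as "66, 1905, 99, 10240".
FIELDS = "temperature.gpu,clocks.sm,utilization.gpu,memory.used"


def parse_sample(line):
    """Turn one nvidia-smi CSV line into a dict of integers.

    A field reported as "[N/A]" would raise ValueError here.
    """
    temp, sm_clock, util, mem = (int(v.strip()) for v in line.split(","))
    return {"temp_c": temp, "sm_mhz": sm_clock, "util_pct": util, "mem_mib": mem}


def looks_throttled(samples, busy_util=90, drop_ratio=0.8):
    """Hypothetical heuristic: among samples where the GPU is busy, flag
    any whose SM clock falls below drop_ratio times the highest busy clock."""
    busy = [s for s in samples if s["util_pct"] >= busy_util]
    if not busy:
        return False
    peak = max(s["sm_mhz"] for s in busy)
    return any(s["sm_mhz"] < drop_ratio * peak for s in busy)


def poll(duration_s=600, interval_s=5, gpu_index=0):
    """Sample one GPU every interval_s seconds for about duration_s seconds."""
    samples = []
    for _ in range(duration_s // interval_s):
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={FIELDS}",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        sample = parse_sample(out.splitlines()[gpu_index])
        print(sample)
        samples.append(sample)
        time.sleep(interval_s)
    return samples
```

Usage: call `poll()` from a Python session, start the render in Blender, then pass the returned list to `looks_throttled()`. Clocks that sag as temperatures climb would suggest a cooling problem; steady clocks during a slow CUDA render would point back toward software.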

I have completely removed all Blender settings and add-ons, uninstalled Blender (and removed all the temp files I could find), then installed it fresh. I'm just so confused by the whole thing, because it *feels* like it should just work, but it doesn't!