Question: GPU spikes to 100% and programs hang?

May 6, 2023
Hello,
I've done my best to search for and follow other threads that discuss this, both here and on other forums, but I haven't been able to figure out why this issue is happening. I'm desperate for any help, and I apologize in advance if this is in the wrong section.

A few days ago, I started running into issues when running software (like Stable Diffusion), games (like Prey), or benchmark software (like 3DMark). I've never had any issue with these programs before this started happening, and I've had this computer for a few years now. While running them, the GPU usage shown in Task Manager or GPU-Z sits at a pretty normal level of activity and temperature until it suddenly spikes to 100%, immediately followed by the program hanging and usage dropping to about 0%. The computer stays on, the monitors don't turn off, and I can still hear the ambient sound in games, but the image either freezes or goes black entirely, forcing me to close the program with Task Manager. In especially demanding workloads, like 3DMark, the problem happens extremely quickly (meaning I can't even finish the benchmark), while in games I can sometimes go a few minutes before the hang. Some older games don't seem to hang at all, but I haven't had a chance to test this extensively, so they could still be spiking without hanging.
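As a side note for anyone trying to catch this outside of Task Manager: here is a minimal Python sketch (assuming the nvidia-ml-py / pynvml package is installed, which is an assumption, not something used in this thread) that polls the GPU once a second and writes a timestamped CSV, so the exact moment of the spike and hang shows up in a log.

```python
# Rough sketch (not from the thread): poll the GPU once a second via NVML
# and write a timestamped CSV, so the moment of the 100% spike and the hang
# shows up in a log. Assumes "pip install nvidia-ml-py" (imported as pynvml).
import csv
import time

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU; adjust the index if needed

with open("gpu_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "gpu_util_percent", "power_w", "temp_c"])
    try:
        while True:
            util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
            power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # reported in mW
            temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            writer.writerow([time.strftime("%H:%M:%S"), util, f"{power_w:.1f}", temp_c])
            f.flush()  # keep the file usable even if the logger gets killed
            time.sleep(1)
    except KeyboardInterrupt:
        pass

pynvml.nvmlShutdown()
```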

In Event Viewer, I'm seeing errors from the source "nvlddmkm" with the message "\Device\Video5 Error occurred on GPUID:10-0", which seems to be common in some of the other threads. The only other errors I'm seeing are a "non-critical" one when installing the NVIDIA driver (possibly because I'm not connected to the internet while installing) and a "PerfDiag Logger" failed-to-start error, which seems to happen at start-up and not during the spikes.
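For anyone who wants to pull just those nvlddmkm entries without digging through Event Viewer, something along these lines should work; it's a rough sketch that shells out to the built-in wevtutil tool (the Python wrapper and the exact XPath filter are assumptions, not something used in this thread).

```python
# Sketch: dump the most recent nvlddmkm entries from the System log using the
# built-in wevtutil tool. The XPath filter and count are assumptions; run from
# an elevated prompt if access is denied.
import subprocess

query = "*[System[Provider[@Name='nvlddmkm']]]"
result = subprocess.run(
    ["wevtutil", "qe", "System", f"/q:{query}", "/f:text", "/c:20", "/rd:true"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # newest 20 events, including the GPUID error text
```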

Things I've tried to fix this so far:
  • Performing a clean installation using just the NVIDIA driver installer.
  • Updating my BIOS.
  • Performing a clean installation with DDU.
  • Unplugging the computer, removing the graphics card, and putting it back in.
  • Performing another clean installation with DDU.
  • Praying, begging, and cursing.

The only thing that seems to have an effect is following the advice I saw in another post: using EVGA Precision X1 to lower the GPU's power target to about 75 or 80%. When I do this, I'm able to get through the entirety of 3DMark's Time Spy benchmark, for example, instead of it hanging after about 20 seconds. I don't know if this means something is going wrong with the driver software (I remember updating it to 531.79 about a week ago), with the GPU itself, or with the PSU. I've tried watching Task Manager to see if anything other than the 3D engine spikes, to maybe narrow it down, and noticed a spike on CUDA, but I'm not savvy enough to know if that means anything.
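For reference, what the Precision X1 power slider does can also be read and, with admin rights, set through NVML. Below is a rough Python sketch, again assuming the pynvml package; the 75% figure is just the value mentioned above, and `nvidia-smi -q -d POWER` should show the same limits from the command line.

```python
# Sketch of roughly what the Precision X1 power slider is doing: read the
# board power limits via NVML and lower the limit to ~75% of default.
# Assumes pynvml and an elevated (admin) prompt for the set call.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

default_mw = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(handle)
min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
print(f"default {default_mw / 1000:.0f} W, allowed {min_mw / 1000:.0f}-{max_mw / 1000:.0f} W")

target_mw = int(default_mw * 0.75)               # the ~75% power target from the post
target_mw = max(min_mw, min(target_mw, max_mw))  # clamp to the card's allowed range
pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)
print(f"power limit set to {target_mw / 1000:.0f} W (does not persist across reboots)")

pynvml.nvmlShutdown()
```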

Here are the specs I can think of:
Windows 10 Home
NVIDIA GeForce RTX 2080 Ti
i9-9900K 3.60 GHz
EVGA Supernova 850 G3 PSU
4x Corsair Vengeance 8GB DDR4 DRAM

Like I said, I'm desperate for any help in figuring out why this is happening or how to fix it. While setting the power to 75% seems to work for the most part, I'm worried it's only a matter of time until it starts doing this again, and I'd really prefer to get all the performance I can out of this hardware. I really have no idea why this started happening so suddenly, and short of finding an old driver from February or March and trying that, I'm pretty much completely out of ideas.
 
May 6, 2023
Try using CPUID HWMonitor and check your voltages (at the top of the graph) to see if there are any obvious problems. Monitor them while gaming, and don't use the undervolting/power limit, so you're testing under the strongest draw.
Sorry, I don't know a lot when it comes to power and such, so what might an obvious problem with the voltages look like? I ran through some tests after resetting everything to default values in EVGA Precision. The board's and CPU's values seem fine; they don't change. The GPU's goes from a minimum of 0.706V up to 1.050V at high load (when it spikes). HWMonitor is also saying the GPU is hitting 264.83W at those points, with the Core Power Supply at 211.37W. Are those outside the norm? I don't know if that means it's trying to pull too much power and that's causing the problem, or if it's a symptom of something else making it try to draw too much.
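To put those wattage numbers in context, the limit the driver is actually enforcing can be read directly and compared against the instantaneous draw. A rough sketch, again assuming pynvml; whether the reported 264.83W keeps bumping into the enforced limit is exactly what this would show.

```python
# Sketch: compare instantaneous board power draw against the limit the driver
# is actually enforcing. If the draw keeps hitting the limit right before the
# hang, that points at a power cap/power delivery issue rather than heat.
import time

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000.0

for _ in range(60):  # watch for about a minute while the benchmark runs
    draw_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
    note = "  <-- at/over the enforced limit" if draw_w >= limit_w else ""
    print(f"draw {draw_w:6.1f} W / limit {limit_w:.1f} W{note}")
    time.sleep(1)

pynvml.nvmlShutdown()
```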
 

scout_03

Get HWiNFO to check temperatures and voltages along with fan speeds. Also, did you try running the system on the generic OS driver instead of the NVIDIA one? For the other part, this CPU has a video chipset, so remove the GPU and try using it for some tests with HWiNFO running in the background. That software can produce a log that can be read.
 
May 6, 2023
Anything in particular I should be looking for in HWiNFO? I used DDU again to uninstall the driver and unplugged my ethernet cable, so it was just running on the integrated graphics, but I couldn't really do much testing: 3DMark didn't list any benchmarks to run, and Prey was just a 1 FPS slideshow. Is there a particular test I can do?
I plugged the ethernet back in and restarted, so it downloaded whatever default driver it wanted (31.0.15.1694), and I tried running 3DMark again, but it seemed to have the same crashes as on the most recent driver.

I used HWiNFO and saved the sensor log from when I started the Time Spy test, though I don't know the best way of sharing it here. Looking at the Performance Limit values for the GPU, it's all "No" for Thermal but "Yes" for Power when it hits those spikes. The Total GPU Power column shows it going over 100% during those spikes. Like I said, though, I'm pretty uninformed about all of this, so I don't know what that specifically means, just that it's bad. Bad PSU? Bad GPU?
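HWiNFO's "Performance Limit" flags map roughly onto NVML's clock throttle-reason bitmask, which can be read directly. A sketch, assuming pynvml still exposes the older ClocksThrottleReason constant names:

```python
# Sketch: read NVML's clock throttle-reason bitmask, which is roughly what
# HWiNFO surfaces as the "Performance Limit - Power/Thermal/..." flags.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mask = pynvml.nvmlDeviceGetCurrentClocksThrottleReasons(handle)

reasons = {
    "power cap (software)": pynvml.nvmlClocksThrottleReasonSwPowerCap,
    "thermal slowdown (software)": pynvml.nvmlClocksThrottleReasonSwThermalSlowdown,
    "thermal slowdown (hardware)": pynvml.nvmlClocksThrottleReasonHwThermalSlowdown,
    "hardware slowdown (incl. power brake)": pynvml.nvmlClocksThrottleReasonHwSlowdown,
}
active = [name for name, bit in reasons.items() if mask & bit]
print("currently limited by:", ", ".join(active) if active else "nothing right now")

pynvml.nvmlShutdown()
```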
 

jahu00

Not sure if this will be of any help, but I had problems with the GPU crashing under load and (most of the time) restarting the system, caused by the MOBO. The card had at least a 180W TDP and was powered using 2 PCIe cables from the PSU, but that must not have been enough during spikes, and it tried to pull the remaining power through the MOBO. The problem went away after replacing the MOBO (using a much lower TDP GPU also worked). And I only replaced the MOBO after replacing the PSU and trying different GPUs.

While trying to solve my problem, I found the following things:
- If your GPU uses more than one PCIe power connector, use 2 different cables just to be safe (some PCIe cables have double connectors at the end, but they likely can still only deliver 150W total; see the rough power-budget math after this post).
- What gave me temporary relief was using another connector on the modular PSU. The PSU had 2 pairs of PCIe connectors (probably for powering 2 GPUs), and I used 1 connector from each pair.
- Some connectors on the PSU can be dedicated to the GPU, while others may struggle to deliver enough power (at least that's how it was on my modular PSU). The PSU manual listed which connectors were meant for GPUs and which were not.

EDIT: The HWiNFO logs had no obvious clue as to what was failing in my case.
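For the cable point above, the rough power budget (assuming the usual spec figures of 75W from the PCIe slot and 150W per 8-pin cable, against the roughly 265W peak reported earlier in the thread) works out like this:

```python
# Rough power-budget math behind the cable advice above, using the usual spec
# figures (assumptions): 75 W from the PCIe slot and 150 W per 8-pin cable,
# against the ~265 W peak HWMonitor reported earlier in the thread.
SLOT_W = 75
EIGHT_PIN_W = 150
CARD_PEAK_W = 265

single_pigtail = SLOT_W + EIGHT_PIN_W      # one cable feeding both 8-pin plugs
two_cables = SLOT_W + 2 * EIGHT_PIN_W      # a separate cable per 8-pin plug

for label, budget in [("single pigtail", single_pigtail), ("two cables", two_cables)]:
    verdict = "tight" if CARD_PEAK_W > budget else "headroom"
    print(f"{label:15s}: {budget} W budget vs ~{CARD_PEAK_W} W peak -> {verdict}")
```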
 
May 6, 2023
Yeah, my current setup has an 8-pin coming from the PSU, and then that connector has a small loop going from it into the 2nd 8-pin connector (if that makes sense). I probably have the old box of cables somewhere around here, and it might have the manual. If I can find it, I might be able to take a shot at swapping the cables out later today.
 

jahu00

So, you're using a single cable for both connectors on the GPU? Try using 2 different ones. Maybe that PSU is perfectly fine doing everything over a single cable, but using 2 cables might be worth a shot.

EDIT: While trying to figure out if my PSU could handle an XTX, I found a review of my PSU here on TH, and it recommended using 2 cables for a GPU with 2 connectors (in relation to the 3000 series, but it should apply to the 2000 series as well).

If EVGA's page is to be trusted, your PSU has 4 connectors for the GPU with VGA written over/under them. There are CPU connectors right next to them, and those CPU connectors possibly don't have what it takes to power a GPU. Make sure you're using the VGA connectors. Also, PSU manufacturers sometimes recommend using shorter cables for the GPU (at least that's what the manual for a Silentium PSU said a few years ago).

EDIT2: Make sure not to use cables from another PSU. Most of the time, they're not interchangeable even if they look like they might be. There is absolutely no standard for what comes out of the PSU; even the same manufacturer can change the pinout from revision to revision of the same PSU. Maybe the EU should have a look into this.
 
May 6, 2023
Haven't posted in a bit, since I haven't had much time to mess around with it. I went and replaced the old cable (the "pigtail" one, PCI-E 8(6+2) x2) with two separate cables (the "normal" PCI-E 8(6+2)) going from the PSU to the two ports on the GPU. I made sure I had the right end in the PSU; the cables are all from the same box as the original one and came with the PSU when I bought it.

Unfortunately, it doesn't seem to have made any difference. Benchmarks still hang within seconds of starting. Considering the computer ran these same things fine from when I first built it until recently, and I haven't made any serious adjustments, I really can't come up with any explanation other than the GPU or the PSU starting to break down.

If anyone has other explanations or things I can do to try to fix it, please let me know. At this point, I can't really think of much else to try other than seeing if I can find a shop around here that might test them out for me to see which might be the issue, and then trying to get a replacement.

Edit: The two separate cables seem to make things even more unstable, so I swapped them out for the second pigtail cable that came with the PSU. It still crashes, but things are a little more stable with the power limited to 75%, as before.
 