Hello,
I've done my best to try to search for and follow along with other threads that have discussed this, both here and on other forums, but I haven't been able to figure out why this issue is happening. I'm desperate for any help. I apologize in advance if this is in the wrong section.
A few days ago, I started running into issues when running software (like Stable Diffusion), games (like Prey), or benchmarks (like 3DMark). I've never had any issue with these programs before this started happening, and I've had this computer for a few years now. While running these, the GPU usage shown in tools like Task Manager or GPU-Z will show a pretty normal level of activity and temperature, until suddenly the usage spikes to 100%, followed immediately by the program hanging and usage dropping to about 0%. The computer remains on, the monitors don't turn off, and I can hear the ambient sound in the games, but the image will either freeze or go black entirely, forcing me to close the program with Task Manager. In especially demanding workloads, like 3DMark, the problem happens extremely quickly (meaning I can't even finish the benchmark), while in games, I can sometimes go for a few minutes before the hang. In some older games, the hang doesn't seem to happen, but I haven't had a chance to test this extensively, so the usage could still be spiking without hanging.
In Event Viewer, I'm seeing errors from the source "nvlddmkm" with the message "\Device\Video5 Error occurred on GPUID:10-0", which seems to be common among some of the other threads. The only other errors I'm seeing are a "non-critical" one when installing the NVIDIA driver (possibly because I'm not connected to the internet while installing) and a "PerfDiag Logger" that failed to start, which seems to happen on start-up and not during the spikes.
Things I've tried to fix this so far:
- Performing a clean installation using just the NVIDIA driver installer.
- Updating my BIOS.
- Performing a clean installation with DDU.
- Unplugging the computer, removing the graphics card, and putting it back in.
- Performing another clean installation with DDU.
- Praying, begging, and cursing.
The only thing that seems to have an effect is following the advice I saw in another post, where I use EVGA Precision X1 to lower the GPU's power limit to about 75 or 80%. When I do this, I'm able to get through the entirety of 3DMark's Time Spy benchmark, for example, instead of it hanging after about 20 seconds. I don't know if this means that there's something wrong with the driver software (since I remember updating it to 531.79 about a week ago), something wrong with the GPU itself, or something wrong with the PSU. I've tried watching Task Manager to see if anything other than the 3D graph spikes, maybe to narrow it down, and noticed a spike on CUDA, but I'm not savvy enough to know if that means anything.
Here are the specs I can think of:
Windows 10 Home
NVIDIA GeForce RTX 2080 Ti
i9-9900K 3.60 GHz
EVGA SuperNOVA 850 G3 PSU
4x Corsair Vengeance 8GB DDR4 DRAM
Like I said, I'm desperate for any help in figuring out why this is happening or how to fix it. While setting the power limit to 75% seems to work for the most part, I'm worried it's only a matter of time until the hangs start again, and I'd really prefer being able to get all the power I can out of this hardware. I really have no idea why this started happening so suddenly, and short of finding an old driver from February or March and trying that, I'm pretty much completely out of ideas.