[SOLVED] GPU/Drivers Crashing with artifacts under any load (weird fan speed fix?) (GTX 1080)

Oct 25, 2020
So I recently got a second-hand Gainward GTX 1080 Phoenix in very good condition: thoroughly cleaned, with recently replaced thermal paste (Thermal Grizzly), and with something done about replacing the old factory cooler screws (the guy I got it from does something related to GPU maintenance). The only issue is that ever since I got it, I've been getting crashes even under a medium load (like playing CSGO on medium settings).

A short description of what happens: whenever I run any game or benchmarking software (CSGO, Furmark, Space Engineers, etc), after a while (anywhere from 2-40 minutes depending on load, although even then it's quite random) the screen, including the mouse cursor, completely freezes with batches of red/blue/purple artifacts scattered across it (https://prnt.sc/v6tr1v https://prnt.sc/v6tr5l). The screen stays frozen for about 10-20 seconds, then I get thrown to the desktop with everything relying on the GPU having crashed, and programs like Discord/Slack having turned into a black box that can only be fixed by restarting them.
During the display crash, if there's a YouTube video or any other audio running in the background, the sound becomes distorted/robotic, partially slowed down, and stutters (but does not completely stop).

Here are Event Viewer and Reliability Monitor errors at the moment of the crash:
Level: Error
Source: nvlddmkm
Message: The description for Event ID 14 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\Video3
0a97(2ae4) 00000000 00000000

The message resource is present but the message was not found in the message table
Error 1:
Description
A problem with your hardware caused Windows to stop working correctly.

Problem signature
Problem Event Name: LiveKernelEvent
Code: 141
Parameter 1: ffff9d8534da4460
Parameter 2: fffff8008f94c188
Parameter 3: 0
Parameter 4: 0
OS version: 10_0_19041
Service Pack: 0_0
Product: 768_1
OS Version: 10.0.19041.2.0.0.768.101
Locale ID: 1033


Error 2:
Description
A problem with your hardware caused Windows to stop working correctly.

Problem signature
Problem Event Name: LiveKernelEvent
Code: 141
Parameter 1: ffff9d8537b48050
Parameter 2: fffff8008f94c188
Parameter 3: 0
Parameter 4: 1478
OS version: 10_0_19041
Service Pack: 0_0
Product: 768_1
OS Version: 10.0.19041.2.0.0.768.101
Locale ID: 1033


Error 3:
Description
A problem with your hardware caused Windows to stop working correctly.

Problem signature
Problem Event Name: LiveKernelEvent
Code: 117
Parameter 1: ffff9d853791d010
Parameter 2: fffff8008f94c188
Parameter 3: 0
Parameter 4: 0
OS version: 10_0_19041
Service Pack: 0_0
Product: 768_1
OS Version: 10.0.19041.2.0.0.768.101
Locale ID: 1033
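
For anyone reading the codes above: as far as I understand it, the LiveKernelEvent "Code" value is a hexadecimal bug check code shown without the 0x prefix. Here is a minimal sketch that maps the two codes from my reports to their names in Microsoft's bug check code reference. The table and the function name are just my own illustration, and it only covers these two codes.

```python
# Names taken from Microsoft's bug check code reference.
# Assumption: the "Code" field in Reliability Monitor is hexadecimal
# without the 0x prefix (so "141" means 0x141).
LIVE_KERNEL_EVENTS = {
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
    0x141: "VIDEO_ENGINE_TIMEOUT_DETECTED",
}


def describe_live_kernel_event(code_text: str) -> str:
    """Return a readable label for a LiveKernelEvent code string."""
    code = int(code_text, 16)
    name = LIVE_KERNEL_EVENTS.get(code, "unknown (not in this small table)")
    return f"0x{code:X}: {name}"


for code_text in ("141", "117"):
    print(describe_live_kernel_event(code_text))
```

Both names point at the display driver failing to respond in time, which matches the freeze-then-recover behaviour, but they don't say whether the root cause is the GPU, the driver, or power.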

The weirdest part is that I have actually found a temporary fix for this: setting a high custom fan curve in MSI Afterburner. Currently, I have it set to 30C = 25%, 50C = 50%, 60C = 75%, 70C = ~95% https://prnt.sc/v6tsy6
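
To make the curve easier to reason about, here is a minimal sketch of what fan speed those four points give at a given temperature. It assumes straight-line interpolation between points and a flat speed below the first and above the last point. That is my assumption about how the curve behaves, not something I've confirmed in Afterburner, and I've treated "~95%" as exactly 95.

```python
# (temperature in C, fan speed in %) points from my custom curve.
FAN_CURVE = [(30, 25), (50, 50), (60, 75), (70, 95)]


def fan_speed_percent(temp_c, curve=FAN_CURVE):
    """Fan speed for temp_c, assuming linear interpolation between points."""
    # Assumption: speed is held flat outside the defined range.
    if temp_c <= curve[0][0]:
        return float(curve[0][1])
    if temp_c >= curve[-1][0]:
        return float(curve[-1][1])
    # Find the segment containing temp_c and interpolate along it.
    for (t0, s0), (t1, s1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)


for temp in (40, 68, 75):
    print(temp, fan_speed_percent(temp))
```

Under those assumptions the fan is already at about 91% by 68C, which is the low end of the temperature range where the crashes happen without the custom curve.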

I've already tried using DDU and reinstalling drivers. Didn't change anything.
I completely reinstalled Windows, which also didn't fix it.
I've tried using an old GPU driver, that also didn't help at all.
Downclocking Memory/Core does not reduce or fix the crashes.

Initially, I thought it was just bad luck and a dying GPU. However, if that were the case, why would the custom fan curve completely fix the problem? Also, should a GPU crash be distorting background audio?
In terms of overclocking, it goes up to around +220 core and +500 or so memory before instability.
For testing purposes, I've also temporarily pushed it to 92 degrees C with no crashing or instability, even though without the custom fan curve the crashes usually happen when it's at 68-75 degrees.
Due to the GPU barely fitting in the case (and I don't want to damage the PSU cables/connectors by squeezing them in there) I leave the side panel completely off, so airflow/heat buildup shouldn't be an issue.
I also have the GeForce Experience Instant Replay running in the background constantly, but I've gotten just as many crashes without it.



I'm not an expert on this stuff, but could it be that some GPU component isn't reading its temperatures properly, therefore not using the appropriate fan speed and leading to a crash during longer loads where it has enough time to overheat?
The other option I can imagine (other than saying it's a faulty GPU) would be a power supply issue, since all other components including the PSU are over 3 years old now. A PSU issue could maybe explain the audio starting to slow down/stutter during crashes?
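
One way to check both guesses would be to log temperature, fan speed, and power draw right up to a crash. Below is a minimal sketch that does this with `nvidia-smi`'s `--query-gpu` CSV output. The function names, the field selection, and the `gpu_log.csv` path are just my own choices, and I'm assuming `nvidia-smi` is on the PATH and that the card reports all of these fields (anything it doesn't report is stored as `None`).

```python
import subprocess

# Fields requested from nvidia-smi, in output order.
FIELDS = ["timestamp", "temperature.gpu", "fan.speed",
          "power.draw", "clocks.sm", "clocks.mem"]


def parse_sample(line):
    """Turn one 'csv,noheader,nounits' line into a dict of readings."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    sample = {"timestamp": values[0]}
    for name, raw in zip(FIELDS[1:], values[1:]):
        try:
            sample[name] = float(raw)
        except ValueError:
            # Unsupported fields come back as text such as "[N/A]".
            sample[name] = None
    return sample


def log_samples(path="gpu_log.csv", interval_s=1):
    """Append one reading per interval to path until interrupted."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits", "-l", str(interval_s)]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(path, "a") as out:
        for line in proc.stdout:
            out.write(line)
            out.flush()  # keep the last readings if the system hangs
```

Calling `log_samples()` before starting a game and then running `parse_sample` over the last few lines after a crash would show whether the reported temperature, fan speed, or power draw did anything unusual just before the freeze.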

If anyone has any ideas or has experienced something similar before, any suggestions or potential fixes would be much appreciated. (Note: I am aware that 1080s/1080 Tis have a history of weird crashes/issues like this.)

The system specs are:
Windows 10
CPU: Intel Core i5-6500 3.20GHz (Stock cooler)
GPU: Gainward GTX 1080 Phoenix
RAM: 24GB (2x GSkill Aegis 8GB 3200MHz + 2x Ballistix 4GB 2400MHz) capped at 2133MHz by motherboard
Motherboard: MSI B150M Bazooka
PSU: Integra M 550W
Storage: Samsung SSD 850 Evo 250GB and WD My Passport Ultra 500GB
Monitor: Philips 243V5LHAB 1920x1080 60Hz
Solution
I too would suspect the PSU, not just its age but the fact that its build quality is not high enough to reliably power a gaming rig, in my opinion.
I would ask the PSU experts here what they recommend to replace it.