Question: Displays Shut Off, GPU Fans Speed Up

Deleted member 2955755

Guest

The Problem

Yesterday, while I was playing a game, both of my displays suddenly shut off and my GPU's fans spun up to max RPM. I have done a lot of research and debugging to try to solve this problem, but unfortunately I have not had any success.
I should clarify that the GPU is a couple of years old by now, so it is well past its warranty. Basically, RMA is not an option.

Specs

  • OS: Microsoft Windows 11 Home
  • CPU: AMD Ryzen 5 3600 6-Core Processor
  • MB: B550I AORUS PRO AX
  • RAM: TEAMGROUP T-Force Delta RGB DDR4 32GB (2x16GB) 3600MHz
  • GPU: GIGABYTE RTX 2080 SUPER WindForce OC
  • PSU: Cooler Master V650 SFX Gold
  • Disk 1: Samsung SSD 980 1TB

Tested Methods

Software Related:
  • Uninstalled -> reinstalled NVIDIA drivers
  • Scanned for corrupted files with sfc /scannow
  • Ensured all drivers were up to date
  • Updated BIOS with multiple restarts
  • Verified memory was running at correct speeds
  • Tested different games while monitoring GPU sensor metrics
  • Clean installed Windows 11 and removed bloatware
Hardware Related:
  • Thoroughly dusted the interior with an electric air duster
  • Ensured RAM was firmly seated
  • Let the PC cool down for ~1.5 hours
  • Unplugged and reinserted display cables
  • Cleaned off and reapplied thermal paste on the GPU
  • Moved GPU cable to a different connector on PSU
  • Replaced thermal paste on CPU (might as well since I am taking everything apart)
  • Undervolted GPU (only tried this once)

What I Learned

The GPU appears to be "overheating", or at least the computer thinks so, at "low" temps. When I tried different games, GPU-Z reported that everything looked normal: temps, load, wattage, etc. Whenever temps averaged around 68-70 °C while playing a game, the problem described above would occur within a couple of minutes. However, when temps were around 60 °C, the game kept running without issue.
After trying various "fixes", I ultimately resorted to a clean install of Windows 11. When I ran PC Benchmark, there were noticeable performance improvements across the board: +16% CPU, +2% GPU, and a +60-80% disk speed boost. Despite this significant performance increase, the problem persisted.
What I thought would be the most surefire solution was dusting off my GPU and reapplying the thermal paste. A lot of threads listed this as the fix, so I carefully and thoroughly did it. Well, it didn't work. F*%k!
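Because the crash seems tied to a temperature band rather than a single spike, it can help to log sensor readings to disk so the last values before the screens go black survive. The sketch below is one way to do that, not what was used in this thread (GPU-Z was). It assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on PATH; the file name `gpu_log.csv` and the helper names are hypothetical.

```python
"""Append GPU sensor readings from nvidia-smi to a CSV log (sketch).

Assumes nvidia-smi (installed with the NVIDIA driver) is on PATH.
The file name gpu_log.csv and the helper names are hypothetical.
"""
import csv
import os
import subprocess
import time
from datetime import datetime

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = ["temperature.gpu", "power.draw", "utilization.gpu", "fan.speed", "clocks.gr"]


def parse_reading(line: str) -> dict:
    """Turn one 'csv,noheader,nounits' line into {field: float or None}."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}: {line!r}")
    parsed = {}
    for name, raw in zip(FIELDS, values):
        try:
            parsed[name] = float(raw)
        except ValueError:
            parsed[name] = None  # e.g. "[N/A]" when a sensor is not reported
    return parsed


def read_gpu() -> dict:
    """Query GPU 0 once and return the parsed readings."""
    out = subprocess.run(
        ["nvidia-smi", "-i", "0",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_reading(out.strip().splitlines()[0])


def log_forever(path: str = "gpu_log.csv", interval_s: float = 1.0) -> None:
    """Poll the GPU every interval_s seconds and append a timestamped row."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            reading = read_gpu()
            stamp = datetime.now().isoformat(timespec="seconds")
            writer.writerow([stamp] + [reading[k] for k in FIELDS])
            f.flush()
            os.fsync(f.fileno())  # push each row to disk in case the system hangs
            time.sleep(interval_s)
```

Calling `log_forever()` in a terminal before launching a game and then reproducing the black screen leaves the final temperature and power readings as the last rows of the log, which makes it easier to compare the 60 °C and 68-70 °C cases.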

Help

At this point, I do not know what to do. Some people reported that MSI Afterburner can be a culprit, but I do not even use that software. Some others said it just "magically" fixed itself over time, definitely not worth betting on. The worst case scenario would be that I have to replace some hardware, so that will only be a last resort.

Update 8/24/23

I took the PC into a repair shop to be diagnosed and gave them a list of everything I did. I just got it back and they told me the issue was the GPU, as I feared. They were confident it was due to a degraded chip just from wear and tear over time so I have to buy a replacement.
Deleted member 2955755

Guest
One cause I found was an unstable overclock. I assume the CPU is running at stock?

Do you get any errors in Reliability History about GPU drivers stopping?

Maybe it's the PSU?

Cleaning the PCI connector has been known to fix it.

here are some other ideas - https://www.nvidia.com/en-us/geforc...rs/13/433015/black-screen-and-gpu-fans-maxed/
So my CPU was not, to my knowledge, overclocked before this problem started. When I updated the BIOS after all this happened, I experimented with overclocking the CPU to 4.2 GHz and undervolting it to 1.376 V. It slightly increased CPU performance, but it had no effect on my issue.

This is the first time I've heard of Reliability History, so I went ahead and checked. Here is what it shows. The "Hardware error" specifically reads as follows. I ran three or four tests before the clean install of Windows and only one after it, which is why there are not more "Critical events".
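The same Reliability History entries can also be pulled from a script, which is handy for counting how many LiveKernelEvent reports occurred and when. This is a sketch under assumptions: it runs on Windows with PowerShell, reads the `Win32_ReliabilityRecords` CIM class through `Get-CimInstance`, and assumes the report text uses the usual Windows Error Reporting layout with `Event Name:` and `P1:` lines (P1 holding the event code, such as 141). The function names are hypothetical.

```python
"""List LiveKernelEvent reports from Windows Reliability Monitor (sketch).

Assumes Windows + PowerShell, and that each report message contains a
'P1: <code>' line as in Windows Error Reporting text. Names are hypothetical.
"""
import re
import subprocess

# PowerShell pipeline: keep only LiveKernelEvent records and print, per record,
# an ISO-style timestamp, the message text, and a separator line.
PS_COMMAND = (
    "Get-CimInstance -ClassName Win32_ReliabilityRecords | "
    "Where-Object { $_.Message -like '*LiveKernelEvent*' } | "
    "ForEach-Object { $_.TimeGenerated.ToString('s'); $_.Message; '-----' }"
)


def parse_events(text: str) -> list[tuple[str, str]]:
    """Return (timestamp, code) pairs; code is 'unknown' if no P1 line is found."""
    events = []
    for block in text.split("-----"):
        lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
        if not lines:
            continue  # trailing separator or empty output
        body = "\n".join(lines[1:])
        match = re.search(r"^P1:\s*(\S+)", body, flags=re.MULTILINE)
        events.append((lines[0], match.group(1) if match else "unknown"))
    return events


def live_kernel_events() -> list[tuple[str, str]]:
    """Run the PowerShell query and parse its output."""
    out = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_events(out)
```

Printing `live_kernel_events()` after each test session would show whether every black screen produced a new entry and with which code.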

I could try cleaning the PCI connector, but I kind of already did that when I dusted out the case and parts. I think I forgot to mention it, but I moved the GPU's cable to a different connector and it did not change anything.

Assuming it is a broken piece of hardware, it is either the GPU, PSU, or cables. MB is a possibility too. My parts have been working together fine for a long time so I fear this might be the case. My only choice will be to take it into a repair shop that can swap parts out because I do not have spare parts lying around to test.

Colif

Win 11 Master
Moderator
Windows sees unexpected restarts as hardware errors; I should have mentioned that.
LiveKernelEvent 141 can be caused by drivers, the GPU, memory, or the drive.
My only choice will be to take it into a repair shop that can swap parts out because I do not have spare parts lying around to test.
That seems a logical first step. It beats guessing what the cause might be. I have seen people replace almost entire systems blindly trying to fix a problem, and I see that as a waste of money.
Cables would be a different cause. That seems unlikely if the PC worked fine until it suddenly didn't.
PSU seems okay from the reviews I can find. 10 year warranty makes me think it should have good parts in it.

Deleted member 2955755

Guest
Got it. Yeah, I think I will just take it in then. I ran a memory diagnostic test and both passes returned no errors so I will have to just let a professional look at the system.

Colif

Win 11 Master
Moderator
I have a 2070 Super, but I retired it at the beginning of the year because I wanted to keep it as a working spare. I believe the model is from 2019, so it's not exactly new anymore; I didn't get mine until 2020. The fact that they stopped making them 6 months before the 30 series was announced sure makes it hard to find replacements.

If it were memory, you would see other errors besides this one. Bad memory causes all sorts of things to stop working and is more likely to cause freezes and random BSODs. In other words, this doesn't look like a memory problem.
The GPU seems to be evidently "overheating", or at least the computer thinks so, at "low" temps. When trying different games, GPU-Z reported that everything appeared normal--temps, load, wattage, etc. Whenever the temps averaged around 68-70 C while playing a game, the problem described above would occur within a couple minutes. However, when the temps were around 60 C, the game would continue running without issue.
It's not overheating, but rather a power draw problem. So there are basically 3 likely culprits here: bad power supply, bad video card, or potentially the thermal pads for the video card power delivery have degraded.

If you have another system (or PSU) you can test with then you can rule the PSU out that way.

If you don't have another system to test with (or have already ruled out the PSU), I'd suggest taking the card apart to see how the pads and paste look. If everything looks okay there, you're looking at the video card or the power supply (unless you've already ruled out the PSU).