Question Graphics card hangs on heavy load, screen goes black and fans go to 100%

Jul 9, 2024
My GPU (3070) has been in my system for 4 years without problems, but for the past week or so, when the GPU is under load, the screen sometimes goes black after a variable amount of time and the fans keep spinning at max(?) speed. Audio keeps playing for a while but cuts out after a few minutes. Games that don't put much load on the GPU run without issue.

Time between game start and crash varies, but running Furmark fairly reliably crashes the system within 2-5 minutes. While it runs, GPU usage is at 100% and temps stay stable at 70 degrees right up to the moment the screen goes blank.

Also tried running games in Linux and the same problem occurs which makes me fear that maybe it's a hardware problem, but it would be the first time in 2+ decades a graphics card has just randomly died on me for no real reason.

I checked the Windows event log but nothing seems to show up. I do see some old error messages from nvlddmkm, but the most recent ones are from 2 days ago, so they don't seem to be related.

Things I read in other threads that were already tried:

* Cleared the drivers using DDU in safe mode and installed the latest ones.
* Replaced the PSU with a more powerful one. My previous one was probably a bit underpowered (550W), even though it ran fine at that wattage for the past few years.
* Took out the GPU and RAM, reseated everything and reconnected the power leads to the mobo and GPU.
* Reset the BIOS, even cleared CMOS

Any ideas what else I can try? Time to start shopping for a new card?

CPU: Intel 13700
CPU cooler: BeQuiet Silent Loop 2
Motherboard: ROG STRIX B760-I GAMING
Ram: 2x16G Corsair Vengeance
SSD/HDD: Ehh under the heatsink, I don't remember offhand >_>
GPU: Gigabyte GeForce RTX 3070, 8GB, Eagle OC
PSU: MSI Mag A750GL (new today, previously Corsair TX550M)
Chassis: Asus Prime AP201
OS: Win10 - Linux dual boot
 
Lutfij

Titan
Moderator
Welcome to the forums, newcomer!

When posting a thread of a troubleshooting nature, it's customary to include your full system's specs. Please list the specs of your build like so:
CPU:
CPU cooler:
Motherboard:
Ram:
SSD/HDD:
GPU:
PSU:
Chassis:
OS:
Monitor:
Include the age of the PSU as well as its make and model, and the BIOS version your motherboard is currently running.

Please also list any parts you swapped into the original build while troubleshooting.
 
Jul 9, 2024
Right, so it turns out I just don't know how to read temps. After a bunch more experimenting I realized that the GPU, at the time of failure, was actually running at half its max clock speed. The main GPU temp read 70 degrees but the hotspot was running at 100+, which was causing thermal throttling and eventually a system shutdown.
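For anyone who wants to catch this kind of throttling from the clock drop rather than from the main temperature reading, here is a rough Python sketch. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` fields `temperature.gpu`, `utilization.gpu`, `clocks.sm` and `clocks.max.sm`. The function names and the thresholds (90% load, 60% of max clock) are my own assumptions, not vendor values, and the script does not read the hotspot sensor at all; it only flags "busy but running slow".

```python
import subprocess

# Standard nvidia-smi --query-gpu field names, in the order we parse them.
QUERY_FIELDS = "temperature.gpu,utilization.gpu,clocks.sm,clocks.max.sm"


def parse_sample(line):
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    temp, util, clock, max_clock = (int(v.strip()) for v in line.split(","))
    return {
        "temp_c": temp,           # main GPU temperature, not the hotspot
        "util_pct": util,         # GPU utilization in percent
        "sm_mhz": clock,          # current SM clock
        "max_sm_mhz": max_clock,  # maximum SM clock reported by the driver
    }


def looks_throttled(sample, min_util=90, clock_ratio=0.6):
    """Return True when the GPU is under heavy load but clocked far below max.

    min_util and clock_ratio are arbitrary illustration thresholds.
    """
    busy = sample["util_pct"] >= min_util
    slow = sample["sm_mhz"] < clock_ratio * sample["max_sm_mhz"]
    return busy and slow


def read_gpu():
    """Query the first GPU via nvidia-smi (requires the NVIDIA driver tools)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.strip().splitlines()[0])
```

Calling `looks_throttled(read_gpu())` once a second while a stress test runs should flag the moment the clocks sag under full load, even if the main temperature still looks harmless.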

I applied some new thermal paste yesterday (the old paste it turns out was completely solidified) and now everything seems fine. Running at full clocks it just barely touches 65 deg. main temp, 80-ish degrees on the hotspot. No thermal throttling, no crashing so far.

While I had everything open I also refreshed the CPU's thermal paste. It still runs hot but no longer hits its limit immediately, and seems to stabilize around 95. According to Intel this insane heat is actually normal for a 13th gen i7, so I'll leave it at that.

So problem solved. Thanks for the suggestions.
 