Question PC crashes only when benchmarking, then refuses to POST for 2 hours before spontaneously recovering

Nov 19, 2022
I built a new PC. It POSTed and installed Windows 11 without issues, and I used it for a few hours with no problems. I then ran the Heaven benchmark and the PC crashed. When I tried to reboot, the motherboard's VGA LED stayed on constantly and the PC would not POST. I can get around this by swapping in a different GPU (a GTX 1070). However, after a few hours, the PC will boot and run perfectly again with the GPU that crashed (RTX 3080). I can restart as many times as I want and it restarts fine each time, until I try to run a benchmark again. I can't identify anything I did during those few hours that would consistently fix the problem.

My spec:
CPU: Ryzen 9 5900X
Motherboard: MSI B550-A PRO
GPU: RTX 3080
PSU: Gigabyte 750W
RAM: 16 GB x 2 (32 GB total)

Any ideas would be appreciated.
 
Sounds like heat, which takes time to lower again.

EDIT: Would you happen to have a thermal/IR camera? Watch it during the test to see if heat goes up where it does not normally go.

I managed to take a screenshot right before the PC crashed while running the Heaven benchmark. The reported CPU temperature is 83 °C and the GPU temperature is 73 °C.
Heaven_benchmark.png
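
A screenshot only captures one instant, so a log written to disk may be a more reliable way to see the last readings before a crash. Below is a minimal sketch, assuming an NVIDIA card with `nvidia-smi` available on the PATH. The file name `gpu_log.csv`, the one-second interval, and the helper names are arbitrary choices for illustration. It records the same sensor temperature a monitoring tool would show, so it cannot reveal a localized hot spot on the die.

```python
import csv
import os
import subprocess
import time

# Fields requested from nvidia-smi: core temperature (C), power draw (W), SM clock (MHz).
QUERY = "temperature.gpu,power.draw,clocks.sm"


def parse_reading(line):
    """Turn one nvidia-smi CSV line such as '73, 320.50, 1905' into a dict.

    Assumes all three fields are numeric; a card that reports '[N/A]'
    for power draw would need extra handling.
    """
    temp, power, clock = (field.strip() for field in line.split(","))
    return {"temp_c": int(temp), "power_w": float(power), "sm_mhz": int(clock)}


def read_gpu():
    """Query nvidia-smi once and return the reading for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_reading(out.splitlines()[0])


def log_forever(path="gpu_log.csv", interval_s=1.0):
    """Append one timestamped row per interval until interrupted (or the PC crashes)."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            r = read_gpu()
            writer.writerow([time.strftime("%H:%M:%S"),
                             r["temp_c"], r["power_w"], r["sm_mhz"]])
            f.flush()
            os.fsync(f.fileno())  # force the row to disk so it survives a hard crash
            time.sleep(interval_s)


if __name__ == "__main__":
    log_forever()
```

Start it in a terminal before launching the benchmark; after the next crash, the last rows of the CSV show what the card reported just before it went down.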
 
Take the side panel off the PC and point a desk fan at the motherboard. Does the problem still happen?

SOLVED:

THANK YOU! This solved the issue. I ended up taking the motherboard out of the case and adding an external fan (larger than 300 mm). I then undervolted my GPU. After that I was able to run the Heaven benchmark several times in a row without issue.
 
Just some trivia for you: the temperatures reported were well within the normal range. However, each one is an average at the point of measurement. In reality, temperature varies across the die. For example, if the heat sink's thermal compound misses one tiny spot, the sensor can still show a reasonable "average" temperature while the die overheats at that bare spot (a literal air bubble in the compound is one of the things that cause thermal failure). When a die finally fails from heat, it has probably been degrading for quite some time: atom-sized breaks or holes start at a tiny hot spot, these in turn make the thermal problem worse, and the damage accelerates until outright thermal failure. (This is one reason it isn't wise to buy a used crypto-mining GPU that was run at high temperature and clock rate, even if it currently works.) Quite possibly the CPU just needs new thermal paste (or the GPU does, but it is less common for a user to need to open up a GPU and apply thermal compound or pads).

Also, many thermal compounds dry out over time (not all; I stick to compounds designed not to dry out). As a compound starts to dry, it transfers heat less well, and once it dries enough to crack, thermal problems can escalate quickly. Perhaps your thermal compound needs to be reapplied (look for one designed for long life rather than one that dries out and cracks). If this is the case, the extra fan would still help, but eventually it wouldn't be enough.
 
I actually thought it was the CPU overheating because I had a low-profile air cooler. I upgraded to a be quiet! Shadow Rock 3 today, but it is still overheating. So maybe it's the GPU? Also, is there any chance it is a PSU issue? I can't figure out why it takes so long to be able to POST again. The GPU feels cold long before the PC is able to POST.
 
I think the situation is solved. I ended up disassembling my RTX 3080 (it's a Dell version that comes with Alienware PCs). There was very little thermal paste on the GPU die. On other parts of the cooler there was a lot of thermal paste, but it had been applied to stickers that were stuck on the cooler's metal plate. I removed the old paste, removed the stickers, and reapplied a generous amount of paste to the GPU die and the surrounding cooler. Once reassembled, my PC now boots and I can run benchmarks without needing an external fan. No crashes so far (5 Heaven benchmark runs in a row).