Jun 23, 2019

So let me give you the whole context of what happened:

I was in the middle of playing Metro Exodus. That game seems to be highly unstable if I activate RTX and put it on Ultra settings, particularly in the Caspian region of the game (which is apparently a well-known issue on more powerful machines).

I've had many crashes playing it, and every time it is the hardest of hard crashes. It is impossible to tab out and impossible to kill the process, so I need to hold down the power button, let the machine turn off, and then restart it. Except this time, it wouldn't start back up. At all. It was as if everything in the rig was 100% dead. I opened it up and found that only two things would show any sign of life when the PSU was plugged in and ready: the CMOS button on the back of my rig (for clearing CMOS to fix boot issues) and the power button located directly on my Gigabyte Aorus X570 Master motherboard.

So I pressed the motherboard power button. It would flicker like a candle in a breeze, struggling to stay on, and then fade out completely in about 4 seconds. After that, unplugging the PSU or flipping its switch would not make the light on the mobo power button come back on. But if I disconnected the mobo pins from the PSU and reconnected them, I could repeat this whole process.

Since the PSU fan would not even spin at this point, I figured it was clearly a PSU issue. My reasoning was that the mobo would still ground the PSU wires the same way the "paper clip method" does, so the PSU fan should still turn on (I thought). To test that, I took the PSU out of this rig and plugged it into my home server build. Note that the home server uses a 550W PSU; my gaming rig, which is the one with the issue, uses a 1kW PSU.

So when I plugged my 1kW PSU, which I thought had died, into the server machine, it immediately powered the machine up normally. Then I plugged the 550W supply into the gaming rig. It should be able to power just the mobo at least, but I had the same issue: the fan doesn't spin, and the mobo power button lights up, then flickers and dies.

So it seems like my motherboard just suddenly died, right? Is there any way to confirm that the motherboard is definitely the issue?

I also don't want this to seem like an afterthought, since I'm sure it's worth mentioning: I spent the previous few hours of the day updating the BIOS to F21. This was a successful update, and I restarted the computer at least a dozen times after it. I also overclocked my memory (4400MHz Patriot Viper) from 1533:1533 mem/fabric to 1600:1600 mem/fabric. Finally, I slightly overclocked my Gigabyte Aorus RTX 2080 Ti. I don't remember the resulting clock values themselves, but the offsets were +80 core, +1000 mem.

Then I ran a ton of benchmarks. I had previously bought Superposition and 3DMark when I built this machine back in July 2019. I did several runs of those and got great results with great stability. I also ran Cinebench and UserBenchmark, and again got great results and seemingly perfect stability. It was about half an hour after these overclocks that I started playing Metro Exodus. I played that for about 2.5 hours before the crash that the machine didn't recover from.

Also, I very closely monitored my CPU temperature throughout this overclocking process. My idle was 32 C. My peak during benchmarks was 78 C. Ryzen Stress Test peak was 88 C. My RTX card was reporting 76 C via Unigine Superposition. My machine is also plugged into a pretty solid CyberPower surge protector.

My rig is:
Ryzen 9 3900X
Gigabyte Aorus X570 Master
2x8 GB Patriot Viper (4400MHz)
Gigabyte RTX 2080 Ti
2 TB 970 evo pro NVMe
1TB 950 evo NVMe