Diagnosing the cause of random black-screening

Jul 30, 2018
5
0
10
I built a new ITX rig back in late June. Every now and then the screens go black: no audio/video output, but the power stays on and the case lights run as normal. No input of any kind gets through, not even the power button; I have to flip the switch on the PSU to reboot. I have googled this problem to hell and back with no results. Below are the specs, a better description of the issue, and a list of everything I've tried to locate the cause. Any advice would be really appreciated; I'm pulling my hair out at this point. I use this system for both university and work, so RMA-ing one of the parts would be a nightmare.

SPECS:
Ryzen 7 1800X, 3.6 GHz (stock), water-cooled
GTX 1060 3GB
2x8GB Kingston HyperX Fury 2933 MHz
ASUS ROG STRIX X470-I mobo
Corsair SF600 80+ Gold

The system first black-screened maybe an hour or two after Win10 was installed and configured. I didn't know what the issue was then (still don't), so I double-checked all the cable connections and said 'fuck it, it's built, it works fine', which was a dumb move. Since then it has kept black-screening at random times (3-5 times a week, I'd say) with absolutely no correlation to system load or temperature. Eight months of intermittent troubleshooting have narrowed it down to either some kind of random PSU failure or a component on the mobo shorting out. Here's everything I've tried:

-Reset BIOS settings
-Cleared CMOS
-Scrubbed through Event Viewer: no info AT ALL other than the post-reboot kernel message saying the system shut down incorrectly
-Updated ALL drivers: GPU, audio, Bluetooth, etc.
-Updated Windows to the most recent version
-Updated BIOS (I think; I will try to do this again when I have time)
-Tried replacing the critical power cables (24-pin ATX, CPU, GPU power), no effect on crashes
-Tried leaving the system shut off as much as possible, which only left me 1-2 hours of use per day. It happened less, but mostly because it was powered on less
-Logged & monitored temps at both idle & load, first with the side panels on and then with them off. The system black-screened 3 or 4 times while the panels were off, and temps were nearly the same either way, so no correlation there
-Ran memcheck, no issues
-Completely dusted & cleaned the system, spotless (no effect, obviously, but it looked nice)
-Redid the rear-panel cable management in case it was causing overheating (no effect again, but it looked better)
-Ran a diskmgmt check, no issues
-Ran *dozens* of malware scans with different programs, no issues from any of them:
  -Kaspersky
  -Malwarebytes
  -Heimdal
  -Avast
-Tried ASUS ROG's 'EZ Tuner', which suggested a slight (~11%) increase in clock speed & RAM speed; the PC would not boot with those settings
-Reset clock settings to normal (CPU ratio was set to 39? Changed it back to 36)
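To double-check the "no correlation with temps" conclusion, one option is to compare the temperatures logged in the few minutes right before each black-screen against the average over the whole log. Below is a minimal Python sketch, assuming the monitoring tool can export readings as (timestamp, °C) pairs. The function name `temps_before_crashes`, the 5-minute window, and the sample data are all hypothetical, not output from any particular tool.

```python
from datetime import datetime, timedelta
from statistics import mean


def temps_before_crashes(readings, crashes, window=timedelta(minutes=5)):
    """Compare temps logged just before crashes with the overall average.

    readings: list of (datetime, temp_celsius) pairs from a monitoring log.
    crashes:  list of datetimes when the screen went black.
    Returns (mean temp inside any pre-crash window, or None if no readings
    fall in one; mean temp over the whole log).
    """
    pre_crash = [
        temp
        for ts, temp in readings
        # Keep a reading if it falls in the `window` before any crash.
        if any(crash - window <= ts < crash for crash in crashes)
    ]
    overall = mean(temp for _, temp in readings)
    return (mean(pre_crash) if pre_crash else None), overall


if __name__ == "__main__":
    # Hypothetical sample data: one reading per minute for an hour.
    start = datetime(2018, 7, 30, 12, 0)
    log = [(start + timedelta(minutes=i), 45.0 + (i % 3)) for i in range(60)]
    pre, overall = temps_before_crashes(log, [start + timedelta(minutes=30)])
    print(f"pre-crash mean: {pre:.1f} °C, overall mean: {overall:.1f} °C")
```

If the pre-crash average stays close to the overall average across several crashes, that supports ruling out heat; a clearly higher pre-crash average would point back at cooling.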


Again, any advice would be massively appreciated.

UPDATE: Went back and checked the BIOS version this morning; an update had been released while I was home for winter break (the PC stays at uni). After reflashing, I reset all the settings/timings and realized the CPU had been pulling 1.4-1.48 V constantly at the stock 3.6 GHz, even just idling in the BIOS. AMD says the 1800X shouldn't go above 1.35 V even when overclocked... clearly not good. I tried manually setting the voltage to 1.35 V, but the system wouldn't POST with those settings. No idea why. I turned off Performance Bias and Core Performance Boost, and it's now sitting comfortably at 1.34 V, which will hopefully be more stable. I'll update again if this doesn't end up being the fix, but fingers crossed until then.
 


I appreciate you trying to help, but I don't think you read the whole post. It's not a GPU issue: I have swapped the 1060 with friends' cards in the past, and my system still black-screens while theirs don't. As of today my best guess is either the mobo shorting out or the voltage settings on the 1800X going wonky.
 


Updated the BIOS yesterday; it's all at the bottom of the OP. Voltage doesn't seem to have been the issue, since it crashed again after the adjustments I made.

I think I may have identified the problem, though. Last night my roommate's SSD died (it turned out to be a different issue), so I pulled one of my unallocated drives out of the case to use in the meantime. I had to jumble some stuff around to get it out because it's a damn small case, and I discovered that if I jiggled the 24-pin ATX cable ever so slightly, it would cut power to the mobo. I immediately ordered a replacement cable. I really hope this is the fix, but if it is I'll be miffed I didn't think of it earlier, considering I'd already replaced the cable extensions.
 
I'm back. Just over a week ago I swapped out the mobo (24-pin) PSU cable. It was a real bitch because it's an ITX case and there's more packed in there than the case is meant to handle, but for about a week it seemed fine, and the problem I had with jiggling the mobo cable disappeared. THEN, last week, while I was playing the new Yakuza game, it happened again. I'm back at square one.

Theories:
-Maybe the PSU itself? Corsair SF600 reviews mention a lot of QC issues
-Voltage on the 1800X is capped at 1.35 V at stock speeds, which seems high. Maybe the chip is damaged in some way?
-Some impossible-to-diagnose problem somewhere in the mobo circuitry that's causing power cuts
-Possibly (and this is a stretch) a GPU problem: the computer also froze while I was playing Yakuza, but that time the picture stayed frozen on the screen and holding the power button actually worked. I think, though, that because this freeze behaved differently from the usual black-screens, the black-screens themselves aren't GPU related

As usual, I'd be really grateful for any advice! This is becoming a major source of stress, lol