Question Monitors keep shutting off at very unpredictable intervals.

katsushige724

Prominent
Jan 19, 2018
11
0
510
Specs:

CPU: Ryzen 5 1600
Motherboard: Asus Strix B-350 Gaming
GPU: Nvidia GTX 1060
RAM: Corsair DDR4 2x8 GB 3000 MHz, model number CMK16GX4M2B3000C15
PSU: EVGA 850W Bronze
SSD: Samsung 850 EVO 250 GB
OS: Windows 10 Home (Version 1809)

This problem suddenly started up about two weeks ago, notably after Windows installed an update when I shut down my PC.

I started up Final Fantasy XIV as usual, and about 30 seconds after I logged in and the GPU was put under load, I was hit with a fatal DirectX error that closed the game, shortly followed by my monitors blanking out and no longer receiving a signal. The system's lights and fans were still running, but the power button and keyboard were unresponsive (the keyboard lights stayed on but Num Lock/Caps Lock did nothing), which forced me to cut the power to turn it off. The fatal DirectX error was nothing new and is a fairly common issue for players of the game due to its dodgy DX11 client, but the GPU seemingly shutting itself off and not turning back on was definitely not a normal occurrence.

The GPU temps were within safe ranges when the crash happened, as were the voltages (within the recommended ±5% for non-12V lines and ±10% for the 12V line) and other parameters, and the card itself is just short of two years old at this point.
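As a rough illustration of the tolerance check described above, here is a minimal Python sketch. The function names and the sample readings are hypothetical, and the tolerance figures are simply the ones quoted in this post, so they should be verified against the PSU's own documentation before being relied on.

```python
# Hypothetical helper for sanity-checking voltage readings copied by hand
# from a monitoring tool. Tolerances are the ones quoted in the post above
# (an assumption, not taken from a PSU datasheet).

def rail_within_tolerance(nominal, measured, tolerance):
    """Return True if `measured` is within nominal +/- (nominal * tolerance)."""
    return abs(measured - nominal) <= abs(nominal) * tolerance


def check_rails(readings, tolerances):
    """Check several rails at once.

    readings:   {rail_name: (nominal_volts, measured_volts)}
    tolerances: {rail_name: fractional_tolerance}, e.g. 0.05 for 5%
    Returns {rail_name: bool}.
    """
    return {
        name: rail_within_tolerance(nominal, measured, tolerances[name])
        for name, (nominal, measured) in readings.items()
    }


# Example readings (made up for illustration only).
sample_readings = {"12V": (12.0, 11.4), "5V": (5.0, 5.3), "3.3V": (3.3, 3.2)}
sample_tolerances = {"12V": 0.10, "5V": 0.05, "3.3V": 0.05}
print(check_rails(sample_readings, sample_tolerances))
```

A rail that shows up as False here would be worth rechecking with a multimeter, since software readings from motherboard sensors are not always accurate.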

Assuming this may have been tied to the Windows update, I did a system restore and attempted to log in again, only to be hit with the same error once more. As before, I cut the power, except this time the Windows file system somehow got corrupted and I was forced to do a clean install just to be able to log in again.



At this point I thought I had surely resolved the problem, so after updating my drivers to the latest versions I logged back in, and this time the game managed to run for four hours before I was yet again hit with the same crash.

This time I had the sense to check the Windows event log, and there were "Display driver stopped working and has recovered" errors, coupled with DWM errors. After doing some research, I learned these errors were tied to unstable Nvidia drivers, so I tested a number of different versions (while continuing to crash almost immediately). I was able to eliminate the DWM errors, but was still crashing. I then installed the latest chipset drivers for the motherboard, which seemed to do something significant: I still crashed very quickly, but I instead got a Kernel 32 error that only crashed the game, not the GPU, and that was quickly resolved by disabling Windows Game Mode.

I managed to keep the game running for 12+ hours that day with no errors for my entire session, and was feeling confident. This persisted for exactly a week, after which I was abruptly hit with the no-signal crash again after the PC had been powered on for about 17 hours that particular day. This time there were absolutely no errors in the Windows event log apart from the one caused by forcing the power off. Trying to log into XIV after a restart resulted in the same error in less than a minute. More concerning, on subsequent troubleshooting attempts the error occurred without me even starting the game (shortly after running a chkdsk to see if that could somehow be related, which found no errors), and most recently it happened while I was in the BIOS tweaking settings immediately after the previous crash, which had me especially concerned.



At this point, the PC seems to be working fine (I downclocked my RAM, as the BIOS had at some point set it to 1.4 V instead of the recommended 1.35 V, and also installed a firmware update for my SSD), but there's no telling how long this will continue. I did learn that this motherboard can be very touchy about hardware monitoring programs and have decided to stop using HWInfo for the time being (though I don't know if that's the issue, as I had no monitoring software running when the crash happened for the very first time).

The fact that the errors kept recurring rapidly after the recent crash, but subsided after leaving the PC unplugged and with no power overnight, makes me feel like this is likely a hardware issue of some sort. I had also previously suspected GPU issues, as prior to this error coming up I was getting random white LED hangs during the boot sequence that seemed to resolve themselves after reseating the GPU/RAM and replugging cables, or even just attempting another boot. I can't help but have doubts, though, as the GPU continues to run smoothly during gaming and exhibits none of the typical signs of failure such as slowdown, artifacting, etc. I have no overclocks on any of my hardware either, so that rules out another potential cause.
 

McKeu

Proper
Mar 27, 2019
240
28
140
I am not 100% convinced it is a hardware issue, but it might be that your PCIe socket or the PCIe pins on the graphics card are damaged, or the graphics card is "moving" in the slot.
It doesn't hurt to take it out and check the gold contacts for damage (note that one or two will by default only be half length, so don't worry about that) and/or try a different PCIe slot.
 

katsushige724

Well, the PC has now been on for 10+ hours with no crashes, but I still can't help but be concerned about how it was behaving after the recent string of crashes, mostly with regard to the monitors losing signal while still in the BIOS and the error swiftly repeating itself after the first crash.

The only adjustment I made to the hardware in the case since the latest crash was changing which PCI-E plug the GPU was using.

The way the PC needed time to "cool off" before the error stopped repeating constantly made me think it might've been a heat issue of some sort, so perhaps making the GPU/CPU fans work a bit harder wouldn't hurt even if the temperature readings are normal.
 

katsushige724

Well, after working fine for the entirety of another day, the PC just up and decided to freeze shortly after hitting the desktop when I powered it on today, basically replicating the previous error minus the loss of display.

Windows attempted a start-up repair (which failed), but it was able to boot to the desktop regardless. I'm really at a loss as to what the issue could be, as again, there were no errors in the event log nor any error LEDs, and now it seems to not be having any issues, unlike the previous instances where one crash tended to be followed by more.

I'm thinking of running a memtest in case there could be something wrong on that end.

Now that I think about it, I didn't actually start getting these errors until after I had installed the RAM kit mentioned in my specs, which is supposedly compatible with my motherboard (my previous single stick of RAM was incompatible and was seemingly causing freezes, some indicated by an orange LED, others not, but I had never gotten the no-signal issues with it). The other major change was that I updated the BIOS to the latest version.
 

katsushige724

After no issues all day yesterday, I recently had a recurrence of the "Display driver has stopped and recovered" and DWM errors while I was playing XIV again, except this time my sound and keyboard remained functional despite losing the display.

This really does seem like a GPU issue due to how erratic and inconsistent the errors are, so I guess I'll just order a replacement GPU and pray for the best.

I'm thinking I probably damaged it by neglecting to dust it off since the day I got it (it has no backplate, so the circuitry had built up a layer of dust), which may have been causing issues despite it registering safe temps in HW monitor, and that the worst of the crashes starting right after the Windows update was just a coincidence. I cleaned it off anyway, then set more aggressive fan speeds via MSI Afterburner, so hopefully it can at least hold out until the new GPU arrives.
 

katsushige724

I think I found my solution.

I noticed that even though the temps were in supposedly safe ranges (topping out at 77 degrees C), my game was lagging in certain specific places that were very heavy on texture rendering, which it hadn't done previously.

After adjusting fan speeds to keep the temps at 65 degrees or lower, said FPS drops have now vanished, which very strongly suggests the GPU was throttling and becoming unstable because of lack of cooling. Hopefully the thing hasn't already been damaged badly because of this, though.
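For anyone wanting to try the same thing, the idea of a more aggressive fan curve can be sketched roughly in Python as below. This is only an illustration of how a temperature-to-fan-speed curve works; the curve points and function name are made up for the example and are not the actual values used here, and a real curve would be set in a tool such as MSI Afterburner rather than in a script.

```python
# Hypothetical fan curve: (temperature in degrees C, fan speed in percent).
# The points are illustrative assumptions, chosen so the fan reaches 100%
# at the 65 degree target mentioned above.
AGGRESSIVE_CURVE = [(30, 30), (50, 50), (65, 100)]


def fan_speed_for_temp(temp_c, curve):
    """Return a fan speed (percent) by linear interpolation between points.

    `curve` must be sorted by temperature. Temperatures below the first
    point or above the last point are clamped to the end values.
    """
    if temp_c <= curve[0][0]:
        return curve[0][1]
    if temp_c >= curve[-1][0]:
        return curve[-1][1]
    # Find the segment containing temp_c and interpolate within it.
    for (t0, s0), (t1, s1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)


for temp in (20, 40, 57.5, 65, 77):
    print(temp, fan_speed_for_temp(temp, AGGRESSIVE_CURVE))
```

The steep final segment is the point of the sketch: the fan ramps up hard before the card reaches the temperature where the FPS drops were showing up, instead of waiting until it is already hot.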