Video driver crashing but PC still running? (i.e., video dropping randomly while the PC stays running, with no crash logs)

BobCharlie

Sep 2, 2011
Ran into an odd issue with an MSI GTX 970 Gaming 4G setup I've had for almost 2 years now, and figured I'd share it on the off chance it helps others.

Over time, I noticed temps creeping a touch higher, which usually means it's time for a cleaning. About every 3-6 months I yank the 970 out of the case, carefully remove its fan cover and the fans, put the vacuum tube on the outlet of my shop vac, and blow the dust out of the massive air-cooled heat sink while agitating it with a toothbrush. I also hand-wipe each fan blade (dust on the blades creates turbulence that reduces efficiency, and the fans can wobble if it gets bad enough, so don't ignore it), and then everything is good to go. At the last cleaning, temps had been running higher than I'd ever seen, and I was a bit troubled to find only a small amount of dust build-up, as that can mean something else is wrong.

I usually run the fans at max for anything gaming (which, come to think of it, might actually mask some issues). The most recent game, Fractured BWhole at 4K on the medium preset, was doing OK, but I noticed temps at one point creeping into the low 70s. Before playing FBW, I had updated to the most current NVIDIA driver, 387.92. It wasn't long after that I started getting random video drops: the monitor's "no input" message would sweep across the screen, meaning the PC wasn't sending a video signal, yet the PC was still running. Event Viewer wasn't showing anything other than my forced power resets (kernel warnings).

Thinking the driver was to blame, I reinstalled it. A little time went by without issue, but when I started the game (which I'd already played a few hours here and there), the video dropped again. At one point I actually thought the game install had broken something, since the problem didn't start until after it was installed. Then the video dropped while opening a web browser. After triple-checking all connections AGAIN, I decided to try Win 7 Safe Mode to bypass the NVIDIA drivers, and it booted into it without issue. Back in a normal boot, it eventually crashed again, and the next time I went into Safe Mode, it crashed there as well. So the troubling notion from earlier led me to believe the problem was hardware-related rather than software.

I remembered how the Xbox 360 had a serious heat-related flaw (the "Red Ring of Death") tied to excessive heat and a poor cooling design, and figured the thermal paste was worth checking here given the higher-than-normal temps I'd observed from time to time. To be clear, temps were often lower when the crashes happened. I have a ton of fans, there's nothing near or under the 970 for almost 6" with open space above it, and I leave the side panel off (about as open as you can get).

Anyhow, I pulled the 970, carefully removed its twin fans, set them aside, then carefully removed the heat sink. The paste was hardened and brittle and came off without effort. I carefully cleaned the surface of the chip to a mirror shine with a DRY cloth on my fingertip, just rubbing (no solvents), while avoiding the microscopic components around it on the board (pretty sure they are resistors). I thoroughly rinsed the massive air-cooled heat sink for 5 minutes to make sure any dingle-berry dust particles were gone, then set it on a floor vent and turned the furnace up for 25 minutes to make sure everything evaporated (it'll get HOT). My heat sink had NOTHING electrical on it, and I was careful to avoid the silicone thermal pads (it had a couple).

I applied new thermal paste (the WHITE stuff, since you don't want silver paste making a circuit by accidentally seeping onto the board and destroying everything) and carefully reassembled everything in reverse order.

Fingers crossed, I put it back in the case and booted the PC up, giving it time to warm up and settle the paste, with my eyes glued to the temps in EVGA P16, which looked normal. With no crashes, I booted up the game and watched the temp overlay while fiddling with the resolution and in-game graphics settings to put some stress on it. Temps were nearly 25°C cooler than the last time I'd noted them in-game (when it actually made it in).

So far, temps are back where they should be, there are no more video drops, and this 970 gets to live a little longer instead of becoming a paperweight. If you know how to redo the thermal paste on a CPU cooler, a graphics card should be similar, if not the same. In this throwaway generation where everything just gets replaced, sometimes things can be saved!

Sorry for the walls of text, people, just trying to be thorough. As always, IF you're working on anything in your PC, use caution. Unplug the unit. (I unplug the AC from the wall, then press the power button so it drains the caps in the PSU; the unit will come to life for a brief second.) Don't touch anything you aren't willing to outright replace. Repair/replace at your own risk!
 




I bet you feel smart???

By the way, almost all "silver" thermal pastes are non-conductive.