Question: Friend's PC is going haywire after the GPU died, and I have no idea what could be wrong.

Cyber_Akuma

Distinguished
Oct 5, 2002
274
3
18,785
0
For reference, this is my friend's build: https://i.imgur.com/w8lbuhL.png


We built it in 2019 for gaming and video editing. The original plan was to use an Intel CPU with an iGPU and buy a better GPU later, but since AMD was significantly overtaking Intel at the time and the extra CPU power would help with his video editing work, we decided to go AMD and just get a "temporary" cheap used GPU for the time being.

Anyway, it was working fine until about two weeks ago, when he told me he was having issues with games crashing, weird visual glitches, and YouTube videos messing up. This sounded like a video card issue, and since we weren't able to meet, I walked him through updating his video drivers, which were rather out of date. He said this fixed it.

A week later he contacted me again saying it was happening again. We arranged a time for me to come over to take a look.

When I got there, the first thing that happened was the PC displaying a checkerboard pattern all over the screen in Windows, which he says had never happened before. There were also shimmering pixels and vertex explosions in games. It definitely seemed to be the video card to me, but just to be safe I ran a virus scan, reflashed the card's BIOS, updated the motherboard's rather old BIOS, took out the card and cleaned it and the case with an electric air blower, and ran the latest beta of memtest86+. Other than what seemed very likely to be the card failing, nothing else seemed out of the ordinary.

We decided to hunt for a used GTX 1060 on eBay, since those have become fairly affordable now, and I would get back to him once I found a good one for a good price.

However, he contacted me again later saying it was now giving BSODs on boot and bootlooping, eventually ending up at Automatic Repair. I figured his card was dying further and preventing Windows from booting at all, but he needed his computer for work and finding the 1060 would take a while, so I suggested getting a very cheap card to install temporarily in the meantime. I needed a cheap low-power GPU for debugging anyway, so I got a GT 720 for about $17 on eBay. When it arrived, I tested the 720 in my 11700K system with FurMark, OCCT, 3DMark, and a few hours of different games. The card seemed to run just fine (well, if you can call the performance of a GT 720 "fine").

I arrived at his house and removed the 770 to install the 720. That's when I noticed that the clip that retains the card in the PCIe slot had snapped off (likely my fault from last time; that thing is VERY hard to reach). However, since it wasn't otherwise broken, I plugged it back in. I also scanned the area around the slot with a flashlight and didn't notice any damage to the board. I also managed to find the case's USB-C cable, something we had not been able to find when we originally built the system in 2019, and plugged it into the motherboard's USB-C header. He also wanted me to install a Blu-ray burner he had just gotten. I plugged an additional SATA power cable into the PSU, but he could not find his motherboard box for a SATA data cable, nor any drive screws, so we put that off, leaving the SATA power cables plugged in. That's when I noticed that the PSU was not secured to the case. I have no idea if we forgot to screw it in when we initially built it (I doubt it) or if we accidentally unscrewed its thumb screws one day when opening the case. I screwed it back in, and I really hope that did not somehow cause any of the problems.

Once all was said and done, I turned it on and it worked fine, so I shut it off again and put it completely back together... then... still BSOD-bootlooping, and every time it did so the error message was different. They ranged from common ones like IRQL_NOT_LESS_OR_EQUAL and PAGE_FAULT_IN_NONPAGED_AREA to ones I had never seen before like DPC_WATCHDOG_VIOLATION and PFN_LIST_CORRUPT. Googling these wasn't too helpful, as the results basically said it could be nearly any of the critical hardware components... or the Windows install or drivers. The motherboard's BIOS seemed to have everything set correctly, so I figured maybe the Windows install had gotten damaged from all the reboots caused by the failing video card. I ran another memtest86+ pass while preparing a Linux Mint LiveUSB, and it passed again.

So I booted into the Linux Mint USB to run stress tests and make sure the hardware was working fine. However, I could barely get it to boot. It USUALLY got to the desktop, but after that it would randomly lock up completely, not even responding to Ctrl+Alt+Del. Other times it gave a message that seemed to be the Linux equivalent of booting into safe mode, and apps would randomly crash. I couldn't even TRY to run any tests.

Sometimes the system would work fine... usually when I had all the covers off for some reason, and usually once fully reassembled it would start having problems. But this wasn't always the case. I tried unplugging that USB-C header again, using a different PCIe slot, reseating the RAM, and rechecking all the PSU connections; nothing I tried helped.

This was worrying: if even a LiveUSB was crashing, this seemed like a pretty bad hardware issue that could be anything. My friend suggested just trying the old GTX 770 again. I reconnected it just to test and... Windows booted just fine, other than the graphical glitches of course.

Thinking that maybe the system for some reason didn't play nice with the GT 720, we decided to just reassemble it with the broken 770 until the 1060 arrives... but when I reassembled it, it BSODed on boot with random errors again (sometimes not even displaying an error on the BSOD screen).

I am completely at a loss at this point. I have no idea where to even begin figuring out what could be wrong, or how to test it. If even the live disks I boot into are crashing, I don't know what to do. We don't have any spare parts on hand to try swapping the PSU, CPU, motherboard, etc., not even a decent second computer to test with. In fact, I had to bring all of my own tools and even USB drives, as he didn't have ANYTHING spare on hand.

I really hate that I am failing to help one of my closest friends with this. We decided to meet again in a few days, once the 1060 and some of his other parts come in, to try again. Does anyone have any ideas what this could be, or how to even begin troubleshooting it without basically just buying a new motherboard, CPU, and PSU?
 

Ralston18

Titan
Moderator
The first suspect may be that 3-year-old Seasonic 750 W PSU, especially if it has a history of heavy gaming use.

Boot up his build.

Look in Reliability History and Event Viewer for error codes, warnings, and even informational events captured just before or at the time the PC experienced some problem.

Do you have a multi-meter and know how to use it or know someone who does?

FYI:

https://www.lifewire.com/how-to-manually-test-a-power-supply-with-a-multimeter-2626158

Not a complete test, because the PSU is not under load. However, any voltages out of tolerance make the PSU suspect.
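To make "out of tolerance" concrete: the ATX12V design guide allows roughly ±5% on the +12V, +5V, +3.3V and +5VSB rails and ±10% on -12V. The sketch below, a minimal example assuming those tolerance figures, checks multimeter readings against those limits. The rail readings in the example are hypothetical, not measurements from this PC, and passing this check still says nothing about behavior under load.

```python
# Nominal ATX rail voltages and their regulation tolerances
# (assumed from the ATX12V design guide: +/-5% on +12V, +5V, +3.3V
# and +5VSB; +/-10% on -12V).
ATX_RAILS = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
    "+5VSB": (5.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def check_rails(readings):
    """Map each measured rail to (low, high, ok) using the table above."""
    results = {}
    for rail, measured in readings.items():
        nominal, tol = ATX_RAILS[rail]
        a = nominal * (1 - tol)
        b = nominal * (1 + tol)
        # min/max so the window is ordered correctly for the negative rail too
        low, high = min(a, b), max(a, b)
        results[rail] = (low, high, low <= measured <= high)
    return results


if __name__ == "__main__":
    # Hypothetical example readings, not taken from this system.
    sample = {"+12V": 12.08, "+5V": 4.71, "+3.3V": 3.35}
    for rail, (low, high, ok) in check_rails(sample).items():
        status = "OK" if ok else "OUT OF TOLERANCE"
        print(f"{rail}: {sample[rail]:.2f} V (allowed {low:.2f}-{high:.2f} V) {status}")
```

With these sample values the +5V reading (4.71 V) falls below the 4.75 V floor and gets flagged, while +12V and +3.3V pass.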
 

Cyber_Akuma

I have a multimeter, though I've mostly used it to test components and continuity on electronics projects; I've never used it to test a PSU. I do have a PSU tester though (... somewhere). I know it can't test the PSU under load, but it can check the voltages and the PG (Power Good) signal.

And is there any decent way to get at the event logs if I can't boot the OS? Another issue with those logs is that booting into a Linux LiveCD/USB seems to change the system time, which throws off the logs' timestamps.
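One common cause of that clock shift (an assumption here, not confirmed on this PC) is that Windows keeps the hardware clock (RTC) in local time while Linux Mint treats it as UTC by default, so after the live session writes UTC back, Windows reads a clock that is off by the local UTC offset. If that is what happened, timestamps logged after the Linux boot can be corrected by adding the offset back. A minimal sketch; `correct_shift` is a hypothetical helper, and the offset and timestamps are made-up examples.

```python
from datetime import datetime, timedelta


def correct_shift(shown: datetime, utc_offset_hours: float) -> datetime:
    """Undo a clock shift caused by the RTC being rewritten as UTC.

    Assumes Windows read a UTC RTC as if it were local time, so each
    timestamp it logged is off by exactly -utc_offset_hours. Adding the
    local UTC offset (e.g. -5 for UTC-5) recovers the real local time.
    """
    return shown + timedelta(hours=utc_offset_hours)


if __name__ == "__main__":
    # Hypothetical: a PC in UTC-5 whose event log shows 15:00.
    shown = datetime(2022, 5, 1, 15, 0)
    print(correct_shift(shown, -5))  # 2022-05-01 10:00:00
```

This only helps for events logged while the clock was actually shifted; events from before the Linux boot should already be correct.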
 

Cyber_Akuma

Well, this just got even more confusing.

I have the PC at my house now, and the first thing I did was disconnect the SATA drive and check the BIOS settings. Everything seemed normal, so I ran the latest beta of Memtest86+... errors.

I tried swapping the RAM.... nonstop instant errors

I tried single sticks, nonstop errors.

I went through the BIOS again and noticed an "Auto" CPU overclocking setting that seems to have been enabled by default. I did mention that I updated the motherboard BIOS, and I know a BIOS update can change overclocking settings, even though we had never manually overclocked the system. I tried disabling both that and XMP... no more errors.

I tried enabling XMP again, no errors. Tried enabling that auto overclock..... errors. Disabled it, no more errors, even with two sticks.

I let it run like this for over 15 hours and multiple passes of memtest86+... no errors.

Just to be thorough, I also ran MemTest86 Free... no errors.

For the hell of it, I decided to also run the latest beta of memtest86..... nonstop errors. Rebooted just to double-check..... nonstop errors.

After I was done pulling all of my hair out, I tried to see if the previous memtest86+ would give errors, but the system seemed not to be POSTing. It wasn't even bootlooping; it was more like POST-looping? Not even a display on the screen. I actually have a PC speaker attached, and it wasn't making the usual beep. After powering it off and on again, though, it booted, and I noticed that the BIOS had been reset to defaults. I enabled XMP again, and when I went to check the CPU overclock setting to disable it, it instead gave me a warning that I needed to accept that it could cause instability before changing it... I just chose no.

NOW the memtest86 beta is passing... so far. But after nearly 24 hours of various RAM tests working perfectly, having the system suddenly start throwing nonstop errors and then reset its BIOS has me even more confused than before about what's going on.
 

Ralston18


Cyber_Akuma

Someone suggested to me that the board might be switching BIOSes and their settings on me, especially during the times it seems to take long to post/reboot.

I checked, and yeah, it had booted from the other BIOS despite there being a physical switch for that. It was on version F4, the first-release version; I had updated it to the latest version, F36f.

So I used the other switch to force it into single BIOS mode and it's showing version F36f again.

I chose "Load Optimized Defaults" and checked what it sets the CPU overclock settings to. Thankfully it seemed to have them disabled now... I think. There was still some stuff set to "Auto" that sounded like it still overclocks the CPU a bit, so I set that to "Disabled".

The problem is that there are a few dozen settings that all sound like they could be overclock related that all default to Auto.

Annoyingly, there is now also a grinding noise at boot that goes away after a few seconds. I checked every fan I could and none of them seem to be making it; the only other fan I can think of that might be the cause is the chipset fan, but it's far too small to make a noise that loud.

It was also making two different beeps when POSTing now... at first. The first one was lower-pitched and kind of shrill, almost like a negative/error beep... except then it does the usual positive beep and POSTs/boots like normal. It looked like it was working fine... except during one reboot, when memtest86 started instantly failing across the board all over again. Rebooted, turned off XMP, and it was working... rebooted, turned XMP back on again... still working.

It feels like the RAM will just randomly throw thousands of errors instantly sometimes... and other times it can literally run tests for over 15 hours straight without a single issue. Nothing about this is consistent, and I have no idea where to even begin looking now.
 

Cyber_Akuma

Neither BIOS seems to work 100% of the time; both can seem to work for hours... and then suddenly the RAM has errors everywhere. I set it to single BIOS mode just to stop the BIOS from being switched on me and adding to the confusion. Also, F4 is the very first release version of the BIOS and lacks a lot of the fixes and support added later, such as the fix for that USB issue AMD CPUs had.
 
