[SOLVED] Help! My Gaming Rig Is Crashing

Sep 22, 2018
Hello. I am hoping someone here can help me diagnose (or better yet, resolve) what's wrong with my computer. A few weeks ago I started experiencing instability which I believe is hardware related. The symptoms are rather peculiar and I've had a hard time tracking down the cause.

I have an MSI PG67A-GD65 motherboard with an Intel i7-2600K CPU. My GPU is an EVGA Nvidia GTX Titan X. I assembled the machine myself and have been using it for several years, having swapped in the Titan somewhere along the line.

The pattern I have experienced is the following:
- If I let the computer rest for a few hours or more, it will boot up and work for a few minutes.
- After running for about 15-20 minutes, the mouse cursor will become laggy, then the computer will freeze.
- Computer will have difficulty booting after this. Most of the time (not always) it manages to get to the point where it loads Windows, but it will invariably freeze or reboot at this point.
- If I turn off the computer and let it "rest" for an hour or more I can usually boot it up again and run it for another 15-20 minutes.

I have used Furmark to run some stress tests on the machine while monitoring with MSI Afterburner. The GPU happily crunches Furmark without issue, and temps top out at 86°C, which is on the high side, but not insane. CPU temps remain solid at around 45°C or less at max load.

Curiously, I have noticed that often as soon as I stop the Furmark test, and the GPU temps start falling back down, the machine will lock up and freeze as described above. This may just be a coincidence, however.

I have also attempted to run memtest86 from a USB drive to check RAM, but it will not run. There is an "SMP boot error" when memtest86 tries to start up, and then it finally loads the main memtest screen, but it just hangs there forever and won't respond to any keyboard commands.

I originally began experiencing this problem while my machine was running Windows 7. While I was debugging this I reformatted the hard drive and installed Windows 10, but the issue persisted, with the exact same symptoms. For this reason I am further inclined to believe it is a hardware issue.

Any ideas what might be causing this? Given the fact that letting the machine sit for a bit allows it to start up, I thought it might be a temperature issue, but temps never really get super high, and the crashes tend to happen even when the GPU and CPU are cool, and not while under max load.

Any assistance in troubleshooting this issue would be greatly appreciated!

 
You can test the power supply yourself to see if it is the problem. Start up the PC and operate it as normal until it shuts down again. Then switch the power supply off, and press and hold the power button for 15 seconds or so. Then unplug the PC and wait for a minute or two.

Then take the side panel off the PC and place your hand near or on the power supply to see if it is hot (don't burn yourself). Under normal conditions the power supply should not be hot. If it is, the power supply is faulty; it may have a fan issue or an electrical issue. In any case, I would replace the power supply if it is hot.
 
Thanks for your replies! I did the testing you suggested, and didn't notice any issues with the PSU being hot. On Sunday I did some more investigating and found that leaving the machine to sit for a few hours doesn't necessarily ensure that it will start up, so it's possible that there is some general instability and that the heat theory is a red herring.

Perhaps I have some bad RAM? Would it be possible that some bad memory sectors could cause this kind of instability? I find it weird that attempting to boot memtest86 would be so problematic, unless there is something fundamentally wrong with either the CPU, the motherboard, or the RAM itself.
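
One way to check whether the run time before a freeze really tracks how long the machine rested is to log a heartbeat to disk and read back the first and last entries after each crash. This is only a minimal sketch: the file name `heartbeat.log`, the 5-second interval, and all function names are arbitrary assumptions, not part of any tool mentioned in this thread.

```python
import os
import time
from datetime import datetime

LOG_PATH = "heartbeat.log"  # ASSUMPTION: hypothetical file name
INTERVAL_S = 5              # ASSUMPTION: arbitrary heartbeat interval


def write_heartbeat(path=LOG_PATH, now=None):
    """Append one ISO timestamp and force it to disk so it survives a freeze."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as f:
        f.write(stamp + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the drive
    return stamp


def session_length_s(path=LOG_PATH):
    """Seconds between the first and last heartbeat recorded in the file."""
    with open(path, encoding="utf-8") as f:
        stamps = [datetime.fromisoformat(line.strip()) for line in f if line.strip()]
    if len(stamps) < 2:
        return 0.0
    return (stamps[-1] - stamps[0]).total_seconds()


def run(path=LOG_PATH, interval=INTERVAL_S, beats=None):
    """Write heartbeats until the machine dies, or for `beats` iterations."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path)
        time.sleep(interval)
        count += 1
```

Start `run()` right after boot, then after the next freeze call `session_length_s()` and note it next to how long the machine sat powered off. Use a fresh log file for each session, since this sketch treats the whole file as one run.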
 


That's an old unit, I would think about replacing it.

And you don't need more than a 650W or 750W PSU.
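
For a rough sense of why 650-750 W is plenty here, a back-of-the-envelope power budget can be sketched as below. The 95 W (i7-2600K) and 250 W (GTX Titan X) figures are the published TDPs for those parts; the 75 W allowance for the rest of the system and the 40% headroom are assumptions for illustration, not measurements.

```python
# Rough PSU sizing sketch. TDP is a thermal design figure, not a measured
# peak draw, so treat the result as an estimate only.
CPU_TDP_W = 95    # Intel i7-2600K published TDP
GPU_TDP_W = 250   # Nvidia GTX Titan X published TDP
OTHER_W = 75      # ASSUMPTION: motherboard, RAM, drives, fans
HEADROOM = 0.40   # ASSUMPTION: 40% margin for transients and aging


def estimated_load_w(cpu=CPU_TDP_W, gpu=GPU_TDP_W, other=OTHER_W):
    """Sum of the component figures, in watts."""
    return cpu + gpu + other


def recommended_psu_w(load_w, headroom=HEADROOM):
    """Estimated load plus a safety margin, in watts."""
    return load_w * (1 + headroom)


load = estimated_load_w()
print(f"Estimated load: {load} W")
print(f"Suggested PSU rating: {recommended_psu_w(load):.0f} W")
```

With these assumptions the estimate lands a little under 600 W, so a healthy 650 W unit would cover a single-GPU setup like this one.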
 


I got the PSU when I was running an SLI setup, which required a lot more power, which is why it's so beefy. But in any event, my understanding is that the main thing that happens to aging power supplies is that they generate fewer watts than the box advertises. But as you point out, losing a few watts isn't going to impact me with my current setup. So I'd be hesitant to replace the PSU until we can rule out the other factors... bad memory or a bad mobo seems a lot more likely to me.
 


Depends on the actual PSU, they don't get better with age...

And that one has terrible cooling so it's obviously been running hot the whole time you have had it. It's more for servers that have a high speed fan mounted right in front of it blowing cool air into it.

 
The other possibility with the power supply is that it is detecting a fault. The power supply will shut down the PC in that event too.

The fault can be within the power supply itself, the cables, or the rest of the PC.

If you have another power supply (with the necessary watts), then you can switch it out to test it. Otherwise, you may end up just replacing it.
 


That's a good idea, I think I can borrow one from my buddy and swap it in.

However, I think another knock against it being the power supply is that the failure causes the machine to reboot, not shut down completely. And sometimes before it reboots the screen will either freeze or go black for some period of time while the machine is still on (fans are humming, LEDs lit up, etc.). It's not a hard power-off, which is what I'd expect with a PSU fault, although perhaps that's a naive assumption. Originally, I had thought maybe the GPU was at fault, but that seems unlikely if it's ripping through a Furmark test just fine.
 


Good call, I'll check when I get home from work.
 
You could be having more than one problem. Freezing is usually software related. An incompatible or corrupted device driver, a corrupted or missing system file, or even a corrupted motherboard BIOS are all possibilities. They are difficult to troubleshoot, so it is often easier to just update all of the device drivers. I would start with the graphics driver and then the motherboard drivers.

If that doesn't fix the issue, do a fresh install of the operating system. Lastly, update the motherboard BIOS. I put that last because flashing the BIOS isn't a great idea while the system is unstable.
 
Yeah, as mentioned I have already reformatted the drive and reinstalled Windows, but it didn't resolve the problem... and it was difficult to get an install run to actually make it all the way to the end, since the machine rebooted a couple of times during the process and I had to start over. I can try updating the BIOS... I already cleared CMOS, though, which should have reset the BIOS to its factory settings, and that did not help. So I'm pretty confident it's a hardware issue at this point.
 


Updating the motherboard BIOS is completely different from resetting the BIOS to the defaults. The BIOS is the firmware that initializes the hardware and hands it over to the operating system. Resetting the BIOS to defaults just clears any settings that you made. Updating the BIOS replaces that firmware altogether.
 
If you are confident that it is a hardware issue, then I would recommend bread-boarding the motherboard. It will help isolate the problem component.

Here are three sets of bread-boarding instructions. Read through them and pick one to follow.

http://www.tomshardware.com/faq/id-2176482/breadboarding-stripping-basics-troubleshooting.html

http://www.tomshardware.com/forum/262730-31-breadboarding

http://www.tomshardware.com/faq/id-1753671/bench-troubleshooting.html



If you don't have a case speaker, here is an example (also available on Amazon).

http://www.newegg.com/Product/Product.aspx?Item=N82E16812201032&cm_re=case_speaker-_-12-201-032-_-Product