System crash while gaming, requires hard reset, only once a day


furkandeger

Honorable
Nov 29, 2012
Hi guys,

I have been experiencing a weird issue recently. While playing certain games after the first boot of the day, I get a black screen crash. I start playing, and a couple of minutes later the screen suddenly goes black: no response from the system, no sound, and the Num Lock/Caps Lock lights no longer toggle. In short, the system goes dead with the power still on. The fans are all spinning and everything seems to be running, yet the screen is black and nothing responds. At this point, pushing the reset button only puts the PC back into the same state: everything is powered up, but there is no boot and no logon, just fans spinning. The only way to restore the system is to power it off by holding the power button and then turn it back on. After that the system boots normally. You don't have to press reset first; holding the power button on the first attempt works just as well.

Here is what makes it weird: this only happens once a day. For example, I finish playing, sleep through the night, wake up, turn on the PC and play a game. A couple of minutes into the game the black screen occurs. I hard reset, continue the same game from where I left off, and voila, no problems until the next day. I can play for hours and nothing happens (yes, continuous gameplay for at least 8 to 9 hours with no problems).

The games in which I have experienced this are:
The Witcher 2
DC Universe Online
Metro Last Light
Titanfall

As strange as it is, I don't experience this with other games such as Battlefield 4, Guild Wars 2, Crysis 2 or 3.

Why did I post this in the Graphics forum? Because when I press reset after the black screen, a red LED on the mobo that indicates the GPU lights up.

Here's my specs:

Asus M5A99X EVO R2.0
AMD FX8350 Stock
Noctua NH U14S
Kingston 4x2GB ddr3 1600 cl9 ram sticks
SAPPHIRE R9 290X BF4 EDITION (REFERENCE) also no oc
CORSAIR CX750M
CORSAIR FORCE GS 128GB ssd
Two other hdds

Edit: I forgot to mention that there is no log in Event Viewer about this, just a Kernel-Power entry that gives no useful information at all.
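For anyone who wants to check the once-a-day pattern against the logs, here is a minimal Python sketch. It assumes the System log has been exported from Event Viewer to CSV; Kernel-Power Event ID 41 is the entry Windows records after an unclean shutdown. The column names (`Date and Time`, `Source`, `Event ID`), the timestamp format, and the sample rows are assumptions for illustration and may differ with your locale and export.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Assumed timestamp format of the CSV export; adjust for your locale.
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def unclean_shutdowns_per_day(csv_text):
    """Count Kernel-Power Event ID 41 entries per calendar day."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if "Kernel-Power" in row["Source"] and row["Event ID"].strip() == "41":
            when = datetime.strptime(row["Date and Time"], TIME_FORMAT)
            counts[when.date().isoformat()] += 1
    return dict(counts)


# Made-up sample rows, only to show the assumed shape of the input.
SAMPLE = """Level,Date and Time,Source,Event ID,Task Category
Critical,5/1/2014 10:12:03 AM,Microsoft-Windows-Kernel-Power,41,(63)
Information,5/1/2014 10:13:40 AM,Microsoft-Windows-Kernel-General,12,None
Critical,5/2/2014 9:47:55 AM,Microsoft-Windows-Kernel-Power,41,(63)
"""

if __name__ == "__main__":
    print(unclean_shutdowns_per_day(SAMPLE))
```

A count of exactly one per day would match the behaviour described above, although the entry itself still says nothing about the cause.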

Accidentally selected a solution, can someone fix it?
 
CMOS battery? After you turn it on the first time, before you start playing, tell it to shutdown / restart and see if it stays up, just as it would after a crash.
Thermal cycling? Remove the graphics card, blow out its slot, then reseat it.
 


Did I try pulling out cmos battery and resetting? Frankly, no I did not. However, I reset the bios settings to factory default before and it didn't help.

I also tried my GPU on another PCI-E slot (which is 8x) on my mobo and got the same error again.

As for restarting before playing, I will try it ASAP.
 


It runs fine after the initial restart. As I stated in the original post, after the restart forced by the black screen, I can play exactly the same parts of the same game and even more for hours without any problem.

First thing tomorrow, I will replace the CMOS battery with a new one; however, I'm not sure I understand the connection between the battery and my crash. Can you enlighten me a bit? Thanks.
 
If the machine is "losing its mind" while off, due to the weak battery, replacing it may prevent that. After the initial crash, when you restart it, the system has been powered on for a while and settings should be consistent with hardware that has actually been detected. When you first turned it on though, there may be some anomaly because the weak battery wasn't enough for the CMOS to properly retain settings.
Edit: This may manifest as an inaccurate time in the BIOS before Windows loads the first time and hits a time server somewhere.
 
I just tried pulling the power cable from the wall, waiting a while, and then booting the PC into the BIOS. The system time was correct and all settings were retained. I will still change the battery tomorrow, but it seems the current one is working properly.
 
I haven't checked the RAM yet, although I was thinking of it. However, I get a red LED for the GPU when I do a normal reset after the black screen, which makes me think the issue is caused by the graphics card.

Btw, I accidentally selected your answer as a solution, can a mod fix it?
 


I have done some tests since last night, but that just caused everything to be much more complicated.

I've run memtest86 on each module for hours. After a couple of passes on each module, I installed all modules and did one test again. Not a single error.

After that I went back to Windows, ran 9 hours of OCCT GPU Test. Not a single error or failure.

I've run a couple of hours of Prime95 on CPU... No errors here as well.

This is getting ridiculous. I am pretty sure that if I opened Metro Last Light and played that last mission again, it would crash. Or if I ran the Witcher 2 Arena, it would crash. However, the stress tests don't turn up anything.

So what now? If it were not for that red LED for the GPU, I wouldn't even be able to suspect the GPU.

I would suspect the mobo, maybe, but I remember getting this same problem with another mobo as well, back when I had a Giga 870A-USB3 (I didn't care then). I also tried putting the GPU in two different PCI-E slots, and that didn't help either.

I would RMA the card, but it doesn't give any errors in 9 hours of stress testing, so they would probably send it back saying "this one is fine".

I am running out of ideas. I was going to consider that maybe, just maybe, one of the HDDs causes this. However, I've experienced the same error with two different games on two different HDDs, and I wouldn't expect both of them to be corrupt at the same time. Neither of them gives any errors in HDD tests either.

 
I'm running out of ideas, as well. Given that you've encountered this primarily when you've overclocked, it could indicate a power supply or motherboard with failing power delivery, but you've also encountered it when not overclocking...which could still indicate the same, but I would expect consistencies in some form if that were the case.

MemTest86+ is pretty quick to sort out if you have a bad stick, so you shouldn't have to run for hours, but it is the most complete when you run it that long. Usually it will find a bad stick within a few minutes. Did you MemTest on all your DIMM slots on the motherboard, as well? I've seen motherboards that have bad DIMM slots but the RAM sticks were all good.
 
You could even go as far as to say that power delivery BEFORE it gets to the power supply could be an issue. If the area he lives in is prone to brownouts, or he has wiring issues at his residence, that could look like PC-related power delivery problems, but you would also see similar power issues with other appliances, etc.
 
PSUs with APFC shouldn't be quite as prone to line fluctuation issues, but MrFace's question is important because the Corsair CX is made with some inferior capacitors that cannot take heat and degrade; this would increase ripple and noise on the outputs, possibly leading to instability. This wouldn't explain why the machine would be stable after an initial crash though. When the machine is first turned on, are all fans running normally?
Here's a scenario...
1. Freshly turned-on PC is cool. Some fans may not even be running.
2. PC heats up. Fans turn on or accelerate. Sudden power draw introduces a fluctuation causing a crash.
3. Machine is restarted. Already warm, fans are running and there is no sudden change.
...if caps in the PSU are marginal, this could be why it crashes. If you can put a decent 550W-600W PSU in there for a test, it might confirm this or rule it out.
 
Thanks for your comments, guys.

I first ran memtest86 on each stick, using the first slot, then I put all sticks (have four) in place and tested again so all sticks and slots have been tested.

As for the PSU, this was actually the first thing that came to mind, yet I am pretty sure it is solid. It's one year old, and I already took the liberty of buying a brand new Xigmatek Vector S 750W, which can provide 744 watts from its 12V rail. I got the issue with both PSUs.
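As a rough sanity check on that 12V figure, here is a small Python sketch. The 744 W rail capacity comes from the post; the per-component draws are assumed ballpark numbers (roughly the rated TDP of the CPU and a commonly quoted board power for the GPU), not measurements from this system.

```python
RAIL_VOLTAGE = 12.0
RAIL_WATTS = 744.0  # 12 V rail capacity quoted in the post

# Assumed ballpark draws in watts; NOT measured on this system.
ASSUMED_DRAW = {"cpu": 125.0, "gpu": 290.0, "drives_fans_misc": 60.0}


def rail_amps(watts, volts=RAIL_VOLTAGE):
    """Convert a power figure on the rail to current (I = P / V)."""
    return watts / volts


def headroom_watts(capacity, draws):
    """Capacity left on the rail after subtracting the assumed draws."""
    return capacity - sum(draws.values())


if __name__ == "__main__":
    print(rail_amps(RAIL_WATTS))                     # rail current in amps
    print(headroom_watts(RAIL_WATTS, ASSUMED_DRAW))  # spare watts under the assumptions
```

Under those assumptions the rail has plenty of margin, which fits with the crash reproducing on both PSUs.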

I'm trying some alternative solutions.

Someone suggested putting a 10 MHz OC on the PCI-E slot and giving the chipset a bit more voltage. I did that. Now PCI-E is at 110 MHz and the NB voltage is 1.2 V instead of auto, which had set it to 1.113 V.

Also, after suspecting the RAM and finding out it is not faulty, I thought the issue might be caused by the pagefile instead. While the hard drives that have the games installed didn't have any pagefiles, I noticed there was a pagefile on the system drive, the SSD. I have now disabled the pagefile on the SSD and enabled it on the other two drives instead.

I have done the following:

Shut down pc, turn it on
Played the last episode of metro last light (where the crash happens usually)
Shut down again, repeated twice.

This time no crash occurred.

Right now it has been a couple of hours since I started playing Witcher 3 and still nothing.

Maybe something I did fixed it, or maybe I am about to see the crash. I don't know; I should test more.
 


Guys, I cannot thank you enough for your support during this period.

I think I can finally confirm that I solved my issue.

Of everything I tried, I rolled back each setting I had changed and tested after every change. It turns out that what fixed this for me was the PCI-E frequency. I set it back to 100 and got the crash again; I set it to 110 once more and there were no crashes. I reverted the NB voltage to auto with no problems. I had also connected two separate PCI-E power cables to the GPU; I have now connected it normally again and still have no problems. However, whenever I set the PCI-E frequency to 100 MHz in the BIOS, I can get the crash. When it is 110 MHz, there isn't one.
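The one-change-at-a-time rollback described here can be sketched in Python as follows. This is only an illustration of the procedure: `crashes_with` is a hypothetical stand-in for the manual cold-boot test (a script cannot run that for you), the setting names are made up, and `reported_outcome` simply encodes the result reported in this post.

```python
def find_required_changes(changes, crashes_with):
    """Revert each change on its own; keep those whose removal brings the crash back.

    changes: set of setting names that are crash-free when all are applied.
    crashes_with: hypothetical callback that takes the set of applied changes
    and returns True if the crash reproduces. In practice this is a manual
    cold-boot test, not an automated check.
    """
    required = []
    for change in sorted(changes):
        remaining = changes - {change}
        if crashes_with(remaining):
            required.append(change)
    return required


# Hypothetical labels for the changes mentioned in the post.
CHANGES = {"pcie_110mhz", "nb_voltage_1.2v", "two_pcie_power_cables"}


def reported_outcome(applied):
    """Stand-in for the manual tests: the crash returned only without 110 MHz."""
    return "pcie_110mhz" not in applied


if __name__ == "__main__":
    print(find_required_changes(CHANGES, reported_outcome))
```

Reverting one change per test run is slower than reverting everything at once, but it is what lets a single setting be singled out.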

I will be testing in the upcoming days to be completely sure, however I think I can say the problem is gone.

YET, there is something I wonder about. In my particular case, what causes this? Is the GPU faulty or defective in some way, so that it needs a PCI-E frequency of 110 MHz, or is it the mobo's fault for not being able to meet my graphics card's needs at 100 MHz?

I will RMA my mobo in any case, since its integrated sound chip has apparently fried. However, I am not sure whether I should RMA my GPU or not. I mean, yes, it won't hurt to RMA it anyway, but I just wonder. What do you think?
 
Today I am switching to Intel, so we will see how that goes :)
 


Thank you for all your support!

Just switched to Intel i5 4690K along with an MSI z97 Gaming 5 mobo. No issues, no nothing!

Seems like the mobo was the culprit :)