System always crashes about 1 minute after boot (reportedly a hardware problem)

Dec 9, 2018
Introduction

Greetings everyone.

One week ago I was on my computer in the afternoon, playing online games with some friends. We had been playing for a while and were taking a break, just talking in Discord. Suddenly my computer restarted out of nowhere; this had never happened before. Ever since then, every time I boot up my computer it stays on for about a minute and then reboots again, without even showing a BSOD.

I found that if I boot Windows in Safe Mode with Networking, this does not happen and the computer runs as normally as it always has. That's how I am writing this post right now.

I will now list my system components and provide a comprehensive list of all the troubleshooting I have done so far.

System Specs

OS: Windows 7
Motherboard: Asus Z170-DELUXE
CPU: Intel i7-6700K
GPU: Asus STRIX GEFORCE GTX 1080 OC Edition
RAM: Gskill Ripjaws V DDR4-2666MHz 2x8GB (F4-2666C15D-16GVB)
PSU: Corsair CS850M
Water Cooler: Corsair H115i
SSD: Samsung 950 PRO NVME M.2 (512 GB)
Disk: Western Digital Black 3TB SATA III 64MB (WD3003FZEX)

Notes:

  • I have had this system for a little over 2 years now, and the only problem I ever had was nearly a year ago, when I had to replace the water cooler because its pump broke. There was no liquid spill; the pump just stopped working, so I replaced the cooler with a new one.
  • Although this system is perfectly capable of running overclocked, I have never overclocked it because I never really felt the need to.

Troubleshooting

Dump files: I started by checking the Windows dump files (I have a lot of them by now). The bug check code is always 0x00000124 (WHEA_UNCORRECTABLE_ERROR, i.e. a hardware error), attributed to ntoskrnl.exe, except on two occasions where it was attributed to dump_dumpfve.sys (the second crash I ever experienced) and dumpstorport.sys (a crash that happened much later). That's about all the information I can read from these dump files.

GPU: Right off the bat I noticed that the GPU fans were unusually loud. In the Asus GPU Tweak application, I saw that as soon as Windows started, the GPU's memory usage would max out without my launching any application, and the fans would spin at full speed even though the temperatures were perfectly fine (after all, the system had just booted). Killing a certain process made the GPU's memory usage and fan speed drop sharply, but it did not stop the system from crashing as usual.

Right now I don't remember which process I killed to lower the GPU usage, as this happened a week ago and I have been troubleshooting with the MOBO's onboard graphics ever since. Removing the GPU did not solve the issue. Since the process name seems to be relevant information, I will reconnect the GPU, test the system with it again and update this post.

Drivers: My first suspect was the drivers. I didn't know exactly which one could be causing the problem, so I updated every single one I could think of (MOBO, CPU, GPU), but to no avail. I also ended up uninstalling every Nvidia-related driver and program I could find, as I read somewhere that they could cause this issue, but the problem still persists.

RAM: I ran the system with multiple RAM configurations: I swapped the slots, ran with only one stick, then with only the other, and finally borrowed two sticks from a friend's computer and booted the system with only those, but the problem persisted. Conversely, I installed my sticks in my friend's computer and it worked just fine with them. I also ran MEMTEST64 for over 30 minutes and found no errors. In the BIOS they are configured to run at 2133 MHz, which has always been their configuration.

Water Cooler and Fans: I checked the Corsair LINK 4 application, which lets me monitor the water cooler, and everything seems to be in order. I ran the system without the water cooler, and then without the fans, just to see whether any of them could be causing some instability in the MOBO, but the problem continued. The fans had some dust in them, but nothing remotely close to what could cause system instability. I usually clean my computer once a month to keep it from gathering too much dust, and my case has some very useful removable filters that help with this, so the inside was not too dusty. Still, I took the opportunity to give it a more thorough cleaning, though I found nothing out of the ordinary.

I should also note that I have been monitoring the component temperatures this entire time, and they have always been well below anything that could be considered problematic.

PSU: Since I do not have any equipment to test the PSU, I removed it and installed it in my friend's computer in place of his own PSU. We played a couple of games and watched some videos online for a while, and the system always ran smoothly, so I don't really think the PSU is the problem.

Disk: The system still crashes with the disk removed, and my friend's computer ran just fine with my disk connected, so no problem there.

OS: While the disk was in my friend's computer, I created a 50 GB partition and did a fresh installation of Windows 10. I have been meaning to upgrade my OS for a while, but since there was no real need to go through the trouble, it sat on my TODO list for the longest time. I was able to boot my friend's computer into this new installation just fine, so I brought the disk back to my system and attempted to boot Windows 10. It's even worse than Windows 7: the system can't even finish booting, because as soon as the 'Please wait' phase ends it immediately reboots, creating an endless loop. At least in Windows 7 I have a minute or so to try to do something before the system reboots, so no help there either.

SSD: Naturally, the SSD is where my original Windows 7 partition is. Now that I had a second boot option, I removed the SSD and booted the system, which resulted in the reboot loop described above, so I put the SSD back in.

Motherboard: Having already reinstalled all the drivers listed on the manufacturer's page, and removed and tested nearly all of the computer's components, I don't think there is much more I can do to troubleshoot the MOBO. I inspected it for signs of physical damage, such as bloated or damaged capacitors, connectors or sockets, but found nothing of the sort. The only thing I found was a USB connector with a couple of bent pins (my fault) that isn't even being used anymore.

I also reset the BIOS and CMOS by removing the battery from the MOBO for a minute. Both were reset and I was prompted to enter the date and time again, but the problem persisted.

CPU: Its temperature has always been well within acceptable values. When I first started troubleshooting, I checked the CPU (without removing it from the socket), removed the old thermal paste and applied a fresh coat, although that seems to be unrelated to the problem.

Last night I ran the Intel Processor Diagnostic Tool and found that the test fails at the IMC (Integrated Memory Controller) step. I am unsure how seriously I should take this result, as the diagnostic tool's page does not list my model (i7-6700K) among the supported products.

BIOS: I felt a bit stupid that only after all of this did I remember to update my BIOS. I had never done so on this computer, and its version was from March 2016. Last night I updated it to the latest version, from April 2018. Although the problem persists, its behavior changed: before, the computer would reboot itself; now the system freezes and I am forced to reboot it myself. I am not sure what to make of this, or whether it is even relevant.


Conclusion

It's been a week now and I have been trying to diagnose the problem whenever I have some spare time. I have always been able to sort these things out by myself, but this time I am lost. I have never created a post online before, because chances are there is already a post that solves your problem, but I have gone through dozens of similar reported issues and haven't been able to find a solution.

It still seems like it could be a corrupted driver, but shouldn't the fresh Windows 10 installation rule that out? The CPU does not overheat, and although the diagnostic tool's test fails, this particular model is not listed among the supported products, so I can't fully trust that result. Maybe it's the motherboard, but I don't think I have solid enough evidence pointing in that direction.

I have the budget to go out and replace any malfunctioning component, but I need to be sure that I am taking the right step, as I don't want to throw money down the drain only to find out that I bought a replacement for a component that was working fine all along.

Please let me know your opinion: what have I failed to identify so far? I am happy to provide any missing information that seems relevant or to run any other tests that could help identify the issue.


Thanks in advance!
 
Dec 9, 2018
After painstakingly testing every component in isolated systems, I found out that the problem was in one of the components I least expected initially: the CPU.

It's very rare for a CPU to become damaged, especially considering I had been using this one for over 2 years without any issues. It had never overheated, the system was behind surge protection, it had no visible signs of physical damage, and I had not taken the CPU out of the socket since I first built the computer, so I am at a loss as to why it became defective.

The CPU was still covered by Intel's warranty, so I contacted them, and they quickly scheduled a pickup of my defective unit and sent me a brand new one. I installed it and everything is working as smoothly as before.