Stuck and blaming the Motherboard

JClark02

Dec 15, 2016
Hello,
I apologize in advance for this being an essay. First time posting here but I have been lurking for years and finding solutions to problems I have hit from time to time. Now I am completely stuck and unsure of what to do. Any help or ideas would be appreciated.

I ran into WHEA_UNCORRECTABLE_ERROR out of nowhere a couple of days ago, and it has been happening almost non-stop (I was lucky to get 3-4 minutes to troubleshoot and look into it before it occurred again). The stop code (0x00000124) pointed to the processor having problems. After a bit more digging, other possible causes looked to be driver related, Windows related, or hardware related (potentially something other than the processor, even though that was singled out in the logs?). The error continued in safe mode as well (not sure if this helps at all?).

I unplugged everything aside from the keyboard, mouse, and screen, then managed to update my BIOS, chipset drivers, graphics card drivers, and Windows between the WHEA_UNCORRECTABLE_ERRORs, without any change. With the drivers updated I decided it was time to move on to hardware.

I downloaded the Intel Processor Diagnostic Tool and ran the test 3 times. The first time, the WHEA_UNCORRECTABLE_ERROR hit at the tail end of it; the next 2 times I was able to complete the test successfully.

At this point I moved on to the memory. I ran the Windows Memory Diagnostic tool a few times and it consistently reported no problems. I took the RAM out and cleared as much dust as I possibly could (there was really none to start with), put it back in, and re-ran the test. All passed again.

I have never had any issues with my graphics card and wasn't even sure how to run a diagnostic on it, so I think this remains one area that is uncharted unless I'm missing something.

I then reset the motherboard by unplugging the power, pulling the battery out, and letting it sit while I went to work. No change to the error messages and restarts when I got home and booted it up.

At this point I really wanted to rule Windows out, so between all the restarts I managed to get all my important files backed up to an external hard drive. I started a Windows 10 re-install on my SSD and got hung up on error messages saying the SSD was locked. I then found some bad advice on the web about editing the registry from a command prompt opened with Shift+F10, and the problem became unfixable: an endless loop of restarting with no way back into the command prompt (if anyone knows how to stop this, please help!?). Either way, when I get this thing back up and running I am planning to reformat that drive. After that error I thought this might all be caused by the SSD, so I unplugged it and used another hard drive (a 3-year-old SATA hard drive) to try a clean install of Windows... before the install could even get going, the WHEA_UNCORRECTABLE_ERROR message showed itself again!

Right now I think I've managed to rule out the hard drive without question, and Windows 10 too, as this was happening during a fresh install as well. What I'm not sure of is:
-Have I ruled out the processor with the Intel diagnostic tool?
-Have I ruled out the RAM with the memory diagnostic tool?
-Can I rule out the graphics card?
-Can the CD/DVD drive even cause something like this?
-Can the power supply cause anything like this?

My hunch is the motherboard being the problem and ASUS said they are willing to RMA it after explaining all of this to them... (I actually think they went cross-eyed and gave up on trying to help me).

Any help would be great! At this point my next plan is to get hold of Intel customer support and discuss how reliable their diagnostic tool is, and then RMA the motherboard if the processor can be ruled out... what do you guys think? Is this a good next step or is there something else you guys would try first? What do you guys think the problem could be? Any advice is helpful.

ASUS Maximus VIII Hero
Intel i7-6700K
2x Corsair Vengeance LPX 16GB
EVGA GeForce GTX 980 Ti
Corsair RM850
Patriot Ignite 480GB SSD (OS and Prepar3D flight simulator)
WD 2TB (all the other software/storage)

The plan was to overclock someday, but that was 9 months ago when I built this and the overclocking hasn't happened yet... So at least everything is still covered under warranty.


 
Solution
Windows Hardware Error Architecture (WHEA) errors are normally CPU errors, so I would look at the CPU before any other part in the PC. They can sometimes be solved with BIOS updates and possibly chipset drivers, so make sure you have the latest of both. I would normally run the Intel Processor Diagnostic Tool first and see what it says, but since you already have, and it BSOD'd during one of the three runs, I would perhaps run Prime95 next.

How old is the CPU? Might need to RMA it.



The WHEA_UNCORRECTABLE_ERROR (0x00000124) is usually a fatal hardware error, and a systematic approach is required to resolve it. The RAM, HDD, PSU, GPU and MB will all have to be checked.
In some cases it can be driver related. Not so in your case.
As you have already attempted to re-install the OS, concentrate on your hardware.

First, check whether the system created a dump file (.dmp) in the Windows system root directory, which can help identify the culprit.
If you can, also look at Event Viewer in Windows for clues, to see if it lists the fatal error.
Have you tried booting in Safe Mode? Probably not if the error happens immediately.
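
If the machine stays up long enough, a small script can save some clicking around when hunting for dump files. This is only a rough sketch: the two folders in the example (C:\Windows and C:\Windows\Minidump) are the usual defaults but are an assumption about your install, and the function name list_dump_files is just made up for illustration. It only lists the files; you would still need a debugger or dump viewer to read them.

```python
from datetime import datetime
from pathlib import Path


def list_dump_files(*folders):
    """Return (path, modified_time) pairs for .dmp files, newest first."""
    found = []
    for folder in folders:
        folder = Path(folder)
        if not folder.is_dir():
            continue  # folder does not exist on this machine
        try:
            entries = list(folder.iterdir())
        except OSError:
            continue  # e.g. no permission; try again from an admin prompt
        for path in entries:
            if path.is_file() and path.suffix.lower() == ".dmp":
                modified = datetime.fromtimestamp(path.stat().st_mtime)
                found.append((path, modified))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed default dump locations; change them if Windows lives elsewhere.
    for path, modified in list_dump_files(r"C:\Windows", r"C:\Windows\Minidump"):
        print(modified.isoformat(timespec="seconds"), path)
```

If nothing is listed, the crash may be happening too early or too abruptly for Windows to write a dump, which is itself a hint that points toward hardware.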

You may have to swap out hardware until the culprit reveals itself.
Start with the RAM. Run Memtest86 by booting from a USB stick. That's better than the Windows test.

Do you have HWMonitor? It can report temperatures and rail voltages, including how they behave when the system is under load.
Aida64 is a good stress tester, if you can download and use it, that is.
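
Once you have rail readings from HWMonitor, a quick sanity check looks something like the sketch below. It assumes the commonly quoted ATX tolerance of plus or minus 5% on the +12V, +5V and +3.3V rails, and the readings in the example are invented, not taken from your system. Software sensor readings can be inaccurate, so treat a failed check as a reason to measure with a multimeter or try another PSU rather than as proof.

```python
# Nominal voltages for the main positive rails.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}

# Assumption: +/-5% tolerance on these rails.
TOLERANCE = 0.05


def out_of_spec(readings, tolerance=TOLERANCE):
    """Return {rail: deviation in percent} for rails outside the tolerance."""
    bad = {}
    for rail, measured in readings.items():
        nominal = NOMINAL_RAILS[rail]
        deviation = (measured - nominal) / nominal
        if abs(deviation) > tolerance:
            bad[rail] = round(deviation * 100, 1)
    return bad


if __name__ == "__main__":
    # Made-up readings, noted down by hand while a stress test is running.
    under_load = {"+12V": 11.2, "+5V": 5.05, "+3.3V": 3.31}
    print(out_of_spec(under_load))
```

Compare idle readings against readings under load; a rail that sags only under load is the more interesting result.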

See how you go and if no help then I suggest you bench test the system outside your case.
Start in a minimalist state then add hardware progressively till the culprit is found.
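
The minimal-then-add procedure above can be written down as a short sketch, mostly so the order of steps is clear. Everything here is hypothetical: find_suspects is a made-up name, and is_stable stands in for whatever real test you run at each step (for example, booting and running a stress test for a while, then answering yes or no).

```python
def find_suspects(minimal, extras, is_stable):
    """Add parts one at a time and report what broke stability.

    minimal   -- parts needed just to boot (board, CPU, one RAM stick, PSU)
    extras    -- remaining parts, in the order you want to add them back
    is_stable -- callback: takes the list of installed parts, returns True
                 if the system survived your test with those parts fitted
    """
    installed = list(minimal)
    if not is_stable(installed):
        # Fault is somewhere in the bare-bones set; swap those parts in turn.
        return installed
    for part in extras:
        installed.append(part)
        if not is_stable(installed):
            return [part]  # the part just added is the prime suspect
    return []  # everything passed; the fault may be intermittent


if __name__ == "__main__":
    minimal = ["motherboard", "CPU", "RAM stick 1", "PSU"]
    extras = ["RAM stick 2", "GPU", "SSD", "HDD"]
    # Simulated fault for demonstration only: pretend the GPU is bad.
    print(find_suspects(minimal, extras, lambda parts: "GPU" not in parts))
```

With an intermittent error like this one, each stability test needs to run long enough to be convincing, otherwise a bad part can slip through as "stable".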

 
Carefully remove the CPU and look for a bent pin in the motherboard socket (on this Intel LGA platform the pins are in the socket, not on the CPU; a jeweler's loupe may be helpful). After nine months, thermal cycling might have flexed something that wasn't perfectly aligned in the first place, so that contact is no longer good. A dressmaker's pin (long, with a ball on the end of it) is a good tool to prod any bent pin back into position.
 
I ended up starting with the processor. Intel support is top notch. On the first call, instead of blaming something else, they said it was either the motherboard or the CPU, asked for a bit of CPU information, and sent me a new one. Problem solved.