Assistance troubleshooting Bug Check 0x124: WHEA_UNCORRECTABLE_ERROR

Traemandir

Honorable
Oct 17, 2013
23
0
10,520
Hi Everyone,

Just looking for help troubleshooting a recurring BSOD on my Ryzen build. Every now and then I'll go to wake my computer from its idle state, only to find it quietly rebooting and starting over from the UEFI splash screen. When checking Event Viewer, I've found multiple logs for unexpected shutdowns followed by bugcheck 0x124, WHEA_UNCORRECTABLE_ERROR.
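For anyone who wants to pull the same entries without clicking through Event Viewer, here is a rough Python sketch that wraps the built-in `wevtutil` tool. The helper names (`build_whea_query`, `read_whea_events`) are made up for illustration, and it assumes the events sit in the System log under the provider name Microsoft-Windows-WHEA-Logger, which is how they show up on my machine.

```python
import subprocess

# Assumption: WHEA events are logged to the System log under this provider name.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_whea_query(max_events=5):
    """Return the wevtutil argument list for the newest WHEA-Logger events."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",  # query-events against the System log
        f"/q:{xpath}",               # XPath filter on the provider name
        f"/c:{max_events}",          # cap the number of events returned
        "/rd:true",                  # reverse direction: newest first
        "/f:text",                   # human-readable text instead of XML
    ]


def read_whea_events(max_events=5):
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(
        build_whea_query(max_events),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage on Windows, from an elevated prompt if access is denied:
#     print(read_whea_events(3))
```

Building the argument list separately from running it makes it easy to check the exact command before anything touches the event log.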

My troubleshooting process has been fairly messy, as I've had multiple problems and hardware changes lately. I'm not sure if any of them are directly related to this BSOD or not, but I figured I'll just lay out all my recent hardware changes, as well as troubleshooting info:

1) I have a brand new motherboard installed. Long story short, I used some sketchy installation practices while adding my first custom water loop, and the computer was hard crashing within minutes of booting. The new motherboard did fix this, thankfully...
2) I upgraded my graphics card while doing the motherboard swap .... I know I should have waited to verify that everything was stable before introducing a new hardware change, but I didn't want to wait and drain my loop AGAIN later to add the new card.
3) Performed a clean installation of Windows 10 Pro, most recent Fall Update ISO. Installed the latest drivers for everything.
4) My Corsair Dominator memory was giving errors in Memtest86. Rather than starting an RMA, I replaced it with a more Ryzen-friendly G.Skill TridentZ kit, for good measure. This new kit is on the motherboard's compatibility list, while the Dominator kit was not.
5) While I have been playing with overclocking since I made this build back in May, I reset UEFI settings to factory defaults for troubleshooting.
6) A few weeks ago I was getting some sporadic error logs for Kernel-Processor-Power in Event Viewer. This seems to have been resolved by nerfing my memory from its factory-recommended 3200 settings down to the UEFI default of 2133.
7) CPU temps are fine; the latest version of Ryzen Master shows 32C at idle.

A couple of thoughts... I kinda doubt it's a corrupt driver, as I'm on a clean install with the latest drivers. If it were a bad driver, wouldn't a lot of other people be bluescreening too? The BSOD tells me it's probably a hardware failure, but the information on the stop code is very vague. It says it could be anything from the processor to the system disk. I'm suspicious of my M.2 SSD because the Windows installer wouldn't identify it as drive 0, and instead prioritized my SATA drive. Not sure if this is normal, but in my experience the Windows installer has always treated the fastest drive as drive 0. I'm also a little suspicious of my CPU. I have been overclocking it, but more importantly it was manhandled a bit and clamped too tightly to the socket on my old motherboard. On the old board, I didn't have the original AM4 backplate I needed for my water block. Since I'm impatient, I hacked together the installation.... it worked well for 2 weeks, but then the system started hard crashing and freezing too frequently to use (R.I.P.). I'm also a little suspicious of my GPU, just because it's new and has been installed for as long as I've had this BSOD error.

So what do you guys think my next troubleshooting steps should be? Not sure how to reliably diagnose the suspected components without haphazardly RMA'ing them all.

Thanks for the help everyone, hardware specs are the following:
- Asus ROG Crosshair VI Hero
- Ryzen 7 1700X
- G.Skill F4-3200C14D-16GTZSK dual channel memory kit
- Asus ROG Poseidon 1080 TI
- EVGA SuperNova 750 P2
- Samsung 960 250GB EVO M.2 (System drive)
- HyperX Savage 240GB SATA (Normally I have two in a RAID 0, but can't fit them both in my new case until I get a mounting adapter)

Other irrelevant(?) specs:
- Phanteks Enthoo Pro M tempered glass
- A pair of Thermaltake Riing Plus kits
- Fancy Cablemod cables for the PSU
- ... A ton of HKWB loop parts
 

Traemandir

Yes, I'm on BIOS 1701. Sorry I forgot to mention that!

Not overclocked right now ... I thought it was probably just blue screening because of my overclock, so I had already restored UEFI settings to their optimized defaults. Unfortunately this didn't solve the issue.

Here's a link for the latest BSOD's .DMP:
https://drive.google.com/file/d/0B-nGHtWxCOG_UW85NDhxUVgxSHM/view?usp=sharing
 

Traemandir

Thanks for the link. I gave that thread a read through, but it looks like their problem was insufficient PSU power. My PSU has more than enough wattage for my config, and it's a fairly high-end one as far as reliability goes. Are you thinking there is a problem with the PSU? Or should I just camp out and wait for a BIOS update? :p When I get home I'll check my UEFI settings for any defaults that look like overclock settings ... such as "Multi Core Boost" on the new Coffee Lake boards. As far as I know though, my board doesn't have any defaults that overclock.

Not sure if this helps, but I have another event log from WHEA-Logger that was recorded just after the bugcheck:

"A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 2

The details view of this entry contains further information."
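In case it helps with comparing several of these entries side by side, here is a minimal Python sketch that splits a WHEA-Logger message like the one above into its labelled fields. The function name `parse_whea_message` is just something I made up, and it assumes every field sits on its own line as "Label: value", which is how the entry appears here. It only reorganizes the text and does not decode anything beyond what the log already says.

```python
def parse_whea_message(message):
    """Split a WHEA-Logger message into a dict of its 'Label: value' lines."""
    fields = {}
    for line in message.splitlines():
        key, sep, value = line.partition(":")
        # Keep only lines that really are 'Label: value' pairs.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# The entry quoted above, pasted in as-is.
sample = """A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 2

The details view of this entry contains further information."""

info = parse_whea_message(sample)
print(info["Error Type"])  # prints: Cache Hierarchy Error
```

Running this over every WHEA entry would show whether the same component, error type, and APIC ID keep coming up, or whether they move around between crashes.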