Question: Frequent "WHEA Uncorrectable Error" BSODs, except in specific scenarios?

Mar 20, 2024
5
1
15
Hello, I am having constant WHEA Uncorrectable Error BSODs (and, rarely, a Machine Check Exception BSOD when I'm stuck in a WHEA BSOD loop). At first they were infrequent; later they happened regularly, and sometimes the PC got stuck in a loop. After a completely clean Windows 11 install (upgrading from Windows 10), I was fine for 5 hours. Then the BSODs came back and grew more frequent until they were happening every 5-10 minutes, with some exceptions. (Nothing major changed when the BSODs started, apart from possibly a Windows 10 and/or NVIDIA driver update.)

Examples in a clean Windows 11 install:
  • Idling in Microsoft Edge for 2h+: nothing happens. Then I open Control Panel and a BSOD happens in less than 10 minutes, sometimes almost immediately, while I'm doing things.
  • With Discord, Steam, Malwarebytes, etc. installed, idling while online leads to a BSOD (maybe with only one of those apps installed it won't BSOD, I don't know).
  • (Possibly, not sure) When the PC hasn't been turned on for several hours, there's no BSOD until 30 minutes, 1 hour, or later.
  • After the regular 5-10 minute crashes while doing things, with everything uninstalled and the PC idling, a 3+ hour Prime95 stress test did NOT produce a BSOD.
  • It never BSODs when simply idle on the Start screen, or with Edge open and idling.

Attempted diagnostics and fixes:
Windows Memory Diagnostic; 5 passes of MemTest86 (not MemTest86+); Intel Processor Diagnostic Tool; Prime95 for about 4 hours; CrystalDiskInfo for the SSD's SMART data before and after the Windows reinstall; checking temperatures, voltages, etc. (I did not check the PSU). I could only test one RAM stick individually; the other is behind the CPU cooler, so I could only check that it is seated properly (the cooler isn't touching the RAM stick).
Clean Windows 11 install; loading default BIOS settings from UEFI; clearing CMOS with the CLRTC pin following the motherboard manufacturer's instructions; XMP on/off; properly reinstalling the previous NVIDIA drivers; unplugging the headphone jack and Ethernet cable; checking that everything inside the case is properly connected (cables, RAM, etc.); plugging the PC into different outlets.

Specs:
Intel i5-10600KF (never OC'd), Deepcool Neptwin White cooler;
ASUS PRIME Z490-V-SI (latest BIOS; the BSODs started more than 2 years after the latest BIOS)
Crucial Ballistix White, 16GB, DDR4, 3200 MHz, CL16, kit of 2 (8GB x2) (not OC'd above 3200)
MSI RTX 3060 Gaming X 12G (not OC'd)
Xilence 550W, Performance X, 80+ Gold
Everything is barely 3 years old now.
Windows 11 Home (not activated), version 23H2, build 22631.3296

Dumps and event log
 

ubuysa

Distinguished
Hello, and welcome to the forum!

You were wise to run all those hardware tests, because a 0x124 (WHEA_UNCORRECTABLE_ERROR) machine check BSOD is most usually a hardware problem. However, there are some rare occasions when a bad driver can cause a 0x124 machine check, and I think that may be the case here.

All four dumps are similar. At first sight they appear identical, but after closely examining the raw call stack they're not quite identical. All fail whilst both networking and storage operations were in progress, possibly buffering an audio or video stream.

Whilst the actual sequence of function calls leading up to the bugcheck does vary slightly in each dump, they all call the same third-party driver: mwac.sys. This is the Malwarebytes Web Access Control driver. The version you have seems recent, dating from December 2023...
Code:
3: kd> lmvm mwac
start             end                 module name
fffff802`6a7c0000 fffff802`6a7eb000   mwac     T (no symbols)       
    Loaded symbol image file: mwac.sys
    Image path: mwac.sys
    Image name: mwac.sys
    Timestamp:        Thu Dec 14 12:49:35 2023 (657ADDBF)
    CheckSum:         0003DF43
    ImageSize:        0002B000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
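The "December 2023" date comes from the Timestamp line above: the hex value in parentheses is the PE header's TimeDateStamp, normally the driver's link (build) time as seconds since 1 January 1970 UTC, not the time it was installed. Here's a minimal Python sketch of that conversion. The UTC result is two hours earlier than the 12:49:35 WinDbg printed; we assume that difference is only the debugger converting to a local time zone.

```python
from datetime import datetime, timezone

def pe_timestamp_to_utc(hex_stamp: str) -> datetime:
    """Convert a PE TimeDateStamp as printed by WinDbg (e.g. '657ADDBF')
    into a timezone-aware UTC datetime."""
    seconds = int(hex_stamp, 16)  # seconds since 1970-01-01 00:00:00 UTC
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Value copied from the mwac.sys lmvm output; WinDbg may show it in a local
# time zone (assumption), so the hour can differ from this UTC value.
stamp = pe_timestamp_to_utc("657ADDBF")
print(stamp.isoformat())  # 2023-12-14T10:49:35+00:00
```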
Malwarebytes is generally a very robust product, as long as it's kept updated. I would first suggest that you look for an update to Malwarebytes. If it still BSODs after updating, I would fully uninstall Malwarebytes (this can be done from the normal Programs & Features).

If the BSODs continue after fully uninstalling Malwarebytes (and rebooting), then please download and run the SysnativeBSODCollectionApp and upload the resulting zip file. Also, please do not reinstall Malwarebytes or any other non-Microsoft product until your BSOD issue has been resolved.

This will give us all the troubleshooting data that we might need. The SysnativeBSODCollectionApp does not collect any personally identifying data, it's perfectly safe, and it's used by many well-respected help forums. You can look at the files in the zip file before uploading if you wish, but please don't change or delete anything. Details on what these files contain can be found here.
 
Mar 20, 2024
The BSODs persist. When updating and then uninstalling Malwarebytes, I checked that the driver's date changed and that the driver was gone from the folder.
Since BSODs don't happen when the PC is idle, I had to actually use it. So, in case it could be helpful, I added a txt file describing what I was doing (and some changes in variables) for all 5 BSOD dumps that happened after I followed your instructions. I also added, in an extra folder, 2 dumps from before I did what you asked.

SysnativeFileCollection
 

ubuysa

It's not Malwarebytes then, but please leave it uninstalled until you're sorted.

There are a couple of curiosities regarding your disk layout...
  • The System Reserved partition has been assigned a drive letter (D), and that's not normal. This partition isn't normally accessed by the user; all it contains are the boot files, BitLocker files, and some system recovery tools. Is there a reason why this partition has been assigned a drive letter? This is a minor issue, but it's the first thing I noticed.

  • You seem to have two drives with a Windows system partition structure on them. Your system drive (C) is the Samsung 870 SATA SSD, but the Seagate 500GB HDD also appears to have a Windows partition structure on it. The larger (Windows) partition on there is your F drive. Having two Windows systems accessible at the same time is known to cause all manner of problems.
The other obvious thing is that all of these dumps, and the earlier ones, fail as a processor comes out of the idle state. Here's a call stack from one dump; you read these from the bottom up (they're push-down stacks)...
Code:
3: kd> knL
 # Child-SP          RetAddr               Call Site
00 ffff8001`850d1908 fffff806`358ff80b     nt!KeBugCheckEx
01 ffff8001`850d1910 fffff806`317110c0     nt!HalBugCheckSystem+0xeb
02 ffff8001`850d1950 fffff806`35a0ed3f     PSHED!PshedBugCheckSystem+0x10
03 ffff8001`850d1980 fffff806`3590123a     nt!WheaReportHwError+0x38f
04 ffff8001`850d1a50 fffff806`35901690     nt!HalpMcaReportError+0xb2
05 ffff8001`850d1bc0 fffff806`35901524     nt!HalpMceHandlerCore+0x138
06 ffff8001`850d1c20 fffff806`359017c9     nt!HalpMceHandler+0xe0
07 ffff8001`850d1c60 fffff806`35900982     nt!HalpMceHandlerWithRendezvous+0xc9
08 ffff8001`850d1c90 fffff806`3590314b     nt!HalpHandleMachineCheck+0x62
09 ffff8001`850d1cc0 fffff806`35969769     nt!HalHandleMcheck+0x3b
0a ffff8001`850d1cf0 fffff806`3582997e     nt!KiHandleMcheck+0x9
0b ffff8001`850d1d20 fffff806`35829593     nt!KxMcheckAbort+0x7e
0c ffff8001`850d1e60 fffff806`6b1e156f     nt!KiMcheckAbort+0x2d3
0d ffffac88`c88774d8 fffff806`6b1e15c4     intelppm!MWaitIdle+0x1f
0e ffffac88`c88774e0 fffff806`35670c1f     intelppm!AcpiCStateIdleExecute+0x24
0f ffffac88`c8877510 fffff806`356705d1     nt!PpmIdleExecuteTransition+0x41f
10 ffffac88`c8877950 fffff806`3581c794     nt!PoIdle+0x361
11 ffffac88`c8877b40 00000000`00000000     nt!KiIdleLoop+0x54
You can see at the bottom that we start with this processor (#3) in the idle loop. When running the idle loop, the processor's C-State is lowered so that it enters a low-power state; this is for power saving and heat reduction. You can see that the machine check (nt!KiMcheckAbort+0x2d3) occurs immediately after calls to the intelppm.sys driver. This driver is the Intel CPU power management driver (AMD processor systems have a similar driver, amdppm.sys), and it is partly responsible for bringing the processor out of the low-power state and into the high-power running state. That we get a machine check whilst intelppm.sys is running power transitions (note the preceding intelppm!AcpiCStateIdleExecute call) suggests that the CPU might not be handling power transitions well.
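To illustrate the bottom-up reading, here's a minimal Python sketch that parses the last seven frames of the knL output above, lists them in call order (oldest first), and picks out the frame printed directly below nt!KiMcheckAbort, which is the code that was running when the machine check arrived. The frame layout it expects (frame number, Child-SP, RetAddr, module!function+offset) is assumed from this one listing; it's not a general WinDbg parser.

```python
import re

# Trimmed copy of the 'knL' frames 0b-11 shown above.
STACK = """\
0b ffff8001`850d1d20 fffff806`35829593     nt!KxMcheckAbort+0x7e
0c ffff8001`850d1e60 fffff806`6b1e156f     nt!KiMcheckAbort+0x2d3
0d ffffac88`c88774d8 fffff806`6b1e15c4     intelppm!MWaitIdle+0x1f
0e ffffac88`c88774e0 fffff806`35670c1f     intelppm!AcpiCStateIdleExecute+0x24
0f ffffac88`c8877510 fffff806`356705d1     nt!PpmIdleExecuteTransition+0x41f
10 ffffac88`c8877950 fffff806`3581c794     nt!PoIdle+0x361
11 ffffac88`c8877b40 00000000`00000000     nt!KiIdleLoop+0x54
"""

# Frame number, Child-SP, RetAddr, then module!function with an optional +0x offset.
FRAME_RE = re.compile(r"^[0-9a-f]{2}\s+\S+\s+\S+\s+(\w+)!(\w+)(?:\+0x([0-9a-f]+))?$")

def parse_frames(text):
    """Return (module, function, offset) tuples, newest frame first (as printed)."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            module, function, offset = m.groups()
            frames.append((module, function, int(offset or "0", 16)))
    return frames

def interrupted_frame(frames, handler="KiMcheckAbort"):
    """The frame printed just below the machine check handler is the code
    that was executing when the machine check interrupted it."""
    for i, (_, function, _) in enumerate(frames):
        if function == handler and i + 1 < len(frames):
            return frames[i + 1]
    return None

frames = parse_frames(STACK)
call_order = [f"{mod}!{fn}" for mod, fn, _ in reversed(frames)]  # read bottom-up
print(" -> ".join(call_order))
print(interrupted_frame(frames))  # ('intelppm', 'MWaitIdle', 31)
```

Reversing the printed list is all "reading from the bottom up" means: KiIdleLoop called PoIdle, which called the power manager, which called into intelppm.sys, and the machine check handler sits on top of that.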

I've seen this problem before on AMD CPUs and once or twice on an Intel CPU. If your BIOS allows you to modify processor C-States then please disable C-States for all processors. This will stop the processors from entering low power states and that will mitigate this issue.

If you can't disable C-States, then in the active Windows power plan change the Processor power management section so that both the Maximum AND Minimum processor state are set to 99%. This too will stop the processors from entering lower power states.
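The same power plan change can also be scripted with Windows' built-in powercfg tool. The Python sketch below only builds the command lines (it doesn't run them), and it assumes the standard powercfg aliases SCHEME_CURRENT, SUB_PROCESSOR, PROCTHROTTLEMIN and PROCTHROTTLEMAX, which you can confirm on your own machine with `powercfg /aliases`. /setacvalueindex covers the plugged-in (AC) value, /setdcvalueindex the battery (DC) value, and `/setactive SCHEME_CURRENT` re-applies the plan so the change takes effect.

```python
def processor_state_commands(minimum: int, maximum: int) -> list[list[str]]:
    """Build powercfg command lines that set the minimum and maximum
    processor state (percent) of the active power plan, for AC and DC."""
    if not (0 <= minimum <= maximum <= 100):
        raise ValueError("need 0 <= minimum <= maximum <= 100")
    commands = []
    for switch in ("/setacvalueindex", "/setdcvalueindex"):  # AC = plugged in, DC = battery
        commands.append(["powercfg", switch, "SCHEME_CURRENT", "SUB_PROCESSOR",
                         "PROCTHROTTLEMIN", str(minimum)])
        commands.append(["powercfg", switch, "SCHEME_CURRENT", "SUB_PROCESSOR",
                         "PROCTHROTTLEMAX", str(maximum)])
    commands.append(["powercfg", "/setactive", "SCHEME_CURRENT"])  # re-apply the plan
    return commands

for cmd in processor_state_commands(99, 99):
    print(" ".join(cmd))
```

On Windows, each list could be passed to `subprocess.run(cmd, check=True)`; on a desktop PC only the AC values actually matter.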

Let me know whether that stops the BSODs. Also, please let me know whether there are two Windows installations active on different drives.
 
Mar 20, 2024
The Windows on the F: drive is not accessible. It's an old drive from a previous PC. There's stuff there that my family sometimes needs (documents, pictures, sometimes trying to remember a program they've used), so I left the drive untouched. (Yes, I've copied the pictures and documents elsewhere, but it's always: "Maybe you didn't copy it, check your PC." :D )
This PC has never recognized it as a bootable Windows drive since I first got it; I don't remember why, or whether I did something to it in the past.
But only the SSD, the C: drive, was in the PC when I reinstalled Windows and when many of the BSODs happened. I put the E: and F: drives back much later.
I don't know about the D: drive (System Reserved). It was there before I reinstalled Windows. I can't recall messing with my SSD in Disk Management or anywhere else. With my current access I can see 2 files there: "bcdbackup" and a "recovery" txt file, both last modified in 2016.
I've disabled C-States in the UEFI. Now to monitor.
Thank you a lot for the help so far!

Edit: After disabling C-States I had a BSOD, though I don't know whether C-States were actually turned off. I've now set the 99% power state, with C-States off in the BIOS. Stable at 4100 MHz, 35-40°C.
This is almost the longest I've gone without a BSOD, so I'll need at least a couple of days to be sure.
Edit: It was good while it lasted; it BSODed :(
 
Mar 20, 2024
Hello. After I turned off C-States and set the minimum processor state to 8-10% and the maximum to 99% (I didn't want to waste power with a 99% minimum all the time), I had rare BSODs here and there, but the PC was perfectly usable. Sadly, now even with a 99% minimum, it's back to BSODing within about 3 minutes at most; after the restart it BSODs in less than 1 minute, or I'm in a BSOD loop, or I'm put into System Recovery. Windows once rolled back about one day, but the rest of the time it just fixes itself, or I turn off the PC and can get back into Windows.
There haven't been any changes to the PC, not that I know of, that would make the BSODs happen all the time again. Still, if I quickly launch a semi-demanding game, BSODs never happen. However, if the game is in the background, I get a BSOD. Maybe with a very demanding game it wouldn't happen, but I obviously don't want that to be the fix for the BSODs.
I have 2 dumps. The one with the much smaller file size was from a BSOD loop with some Windows shenanigans around recovery and diagnostics, so maybe that dump isn't proper. Both dumps are with a 99% minimum, 99% maximum, and C-States off.
I know that for AMD, in a similar issue, voltage increases were a fix. I've been wondering if I'm going to need to play with the UEFI. I don't really trust what's happening there, with ASUS and its auto settings.
Dumps

Edit: This is also the first time I've had Windows doing diagnostic repairs regularly after BSODs. One time I had no audio, and going into power settings showed an error (File system error (-144...) (...-4686-A.../pagePlanSettings)), among other things like that. I had to go through a couple of BSODs before Windows diagnosed and fixed it again.
 

ubuysa

This really does look like a CPU problem, I'm afraid. Those two dumps are almost identical, and I'll explain what's going on in some detail for you...

The call stack is a push-down stack used to hold return addresses and parameters for kernel function calls. Because it's a push-down stack, you read it from the bottom up...
Code:
6: kd> k
 # Child-SP          RetAddr               Call Site
00 ffffac00`62b49908 fffff804`446ff1eb     nt!KeBugCheckEx
01 ffffac00`62b49910 fffff804`470710c0     nt!HalBugCheckSystem+0xeb
02 ffffac00`62b49950 fffff804`4480eb2f     PSHED!PshedBugCheckSystem+0x10
03 ffffac00`62b49980 fffff804`44700c1a     nt!WheaReportHwError+0x38f
04 ffffac00`62b49a50 fffff804`44701070     nt!HalpMcaReportError+0xb2
05 ffffac00`62b49bc0 fffff804`44700f04     nt!HalpMceHandlerCore+0x138
06 ffffac00`62b49c20 fffff804`447011a9     nt!HalpMceHandler+0xe0
07 ffffac00`62b49c60 fffff804`44700362     nt!HalpMceHandlerWithRendezvous+0xc9
08 ffffac00`62b49c90 fffff804`44702b1b     nt!HalpHandleMachineCheck+0x62
09 ffffac00`62b49cc0 fffff804`44769539     nt!HalHandleMcheck+0x3b
0a ffffac00`62b49cf0 fffff804`4462a43e     nt!KiHandleMcheck+0x9
0b ffffac00`62b49d20 fffff804`4462a053     nt!KxMcheckAbort+0x7e
0c ffffac00`62b49e60 fffff804`6a4e41b2     nt!KiMcheckAbort+0x2d3
0d fffffe0d`29ebf498 fffff804`6a4e5870     intelppm!C1Halt+0x2
0e fffffe0d`29ebf4a0 fffff804`6a4e15c4     intelppm!C1Idle+0x30
0f fffffe0d`29ebf4d0 fffff804`444e1456     intelppm!AcpiCStateIdleExecute+0x24
10 fffffe0d`29ebf500 fffff804`444e0fcb     nt!PpmIdleExecuteTransition+0x426
11 fffffe0d`29ebf950 fffff804`4461cf44     nt!PoIdle+0x68b
12 fffffe0d`29ebfb40 00000000`00000000     nt!KiIdleLoop+0x54
You can see that the first function call is nt!KiIdleLoop+0x54. This is the 'idle loop', and its job is to find useful work for the processor to do. It runs when the ready queue of threads waiting to execute is empty. Typically it looks for DPCs to run, timers to update, and similar background work.

The next call is to nt!PoIdle+0x68b, which happens because the idle loop found no useful work to do. The nt!PoIdle function will put this processor into an idle state (effectively stopped).

Then we see the nt!PpmIdleExecuteTransition+0x426 call. This is the Processor Power Manager, and it starts the process of placing the now-idle processor in a lower-power C-State.

Then we see two calls to the intelppm.sys driver (intelppm!AcpiCStateIdleExecute+0x24 and intelppm!C1Idle+0x30); these are Intel-specific calls that place the processor in the C1 low-power state.

Then we see a third intelppm.sys function call, intelppm!C1Halt+0x2. This function actually halts the processor now that it's in the C1 low-power state. This is the function that failed, because the nt!KiMcheckAbort+0x2d3 machine check call appears immediately above it in the stack. That's the bugcheck happening in the intelppm!C1Halt+0x2 function call.

The intelppm!C1Halt function is incredibly simple; here is a disassembly of it...
Code:
6: kd> uf intelppm!C1Halt
intelppm!C1Halt:
fffff804`6a4e41b0 fb              sti
fffff804`6a4e41b1 f4              hlt
fffff804`6a4e41b2 c3              ret
The STI instruction sets the interrupt-enable flag, and the HLT instruction halts (stops) the processor, which is now in the C1 low-power state. The processor stays halted until an interrupt occurs. The machine check failure you're getting is in this small function, and is probably the processor struggling to exit the halt state and execute the RET instruction whilst still in the C1 low-power state. I think that is almost certainly a CPU failure.
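To connect the frame offset with the disassembly, here's a small Python sketch that labels the three bytes from the uf listing (fb, f4, c3) using a hand-written table covering only these one-byte x86 opcodes (an assumption for illustration, not a real disassembler), then checks which instruction intelppm!C1Halt+0x2 points at. Offset 0x2 lands on the RET, the instruction right after HLT. An interrupt that wakes a halted x86 processor saves the address of the instruction following HLT, so this fits the processor having been halted when the machine check arrived.

```python
# Hand-written table for just the opcodes in this listing (not a general decoder).
ONE_BYTE_OPCODES = {0xFB: "sti", 0xF4: "hlt", 0xC3: "ret"}

C1HALT_START = 0xFFFFF8046A4E41B0  # address of intelppm!C1Halt from the 'uf' output
C1HALT_BYTES = bytes([0xFB, 0xF4, 0xC3])

def decode(start, code):
    """Map each address to its mnemonic; every opcode here is exactly one byte long."""
    return {start + i: ONE_BYTE_OPCODES[b] for i, b in enumerate(code)}

listing = decode(C1HALT_START, C1HALT_BYTES)
for addr, mnemonic in listing.items():
    print(f"{addr:016x} +0x{addr - C1HALT_START:x} {mnemonic}")

# The stack frame reads intelppm!C1Halt+0x2, i.e. the saved address is start + 2.
saved = C1HALT_START + 0x2
print(listing[saved])  # prints "ret", the instruction right after hlt
```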

I suggest you now do two things...
  • Download and run the Intel Processor Diagnostic Tool and see what that says. It's not good at finding these low-power C-State problems, though.
  • Start Windows in Safe Mode (with networking) and leave it in that state for 24 hours. Use it as much as you can (which won't be very much) and see whether it will BSOD in Safe Mode. If it does, and I expect it will, then that's a very clear indication of a hardware cause.
The hardware experts on here might be able to advise some CPU voltage changes you could try in the BIOS. That might help with this issue. How to do that is beyond my expertise, however.
 
Mar 20, 2024
Yeah, thanks for your help. I've tried voltage changes with some help, but they don't seem to affect things. Maybe if I went much higher with the voltage, but, eh. I'll be upgrading my PC to a 7800X3D soon, so I'll give up.
It still crashed in Safe Mode, and I'm having a hard time running the Intel diagnostics again. The BSODs started getting much worse, but that was before I tried the voltage changes. I had to start a Prime95 stress test right after turning on the PC so I wouldn't get a BSOD within the first 10 seconds of reaching Windows. Then I could do some stuff on the PC: I'd launch a game, then stop the stress test to play.
Curiously, when the BSODs got even worse (before I tried the voltage changes), I also started getting watchdog BSODs sometimes, for the very first time.
But yeah, time for an upgrade. Thanks for your help. I managed to squeeze a couple more months out of the PC thanks to you.
 