Question: How to fix this "WHEA UNCORRECTABLE ERROR"?

Jul 11, 2024
Hello

Laptop model: ASUS X515JA - BQ1575
Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz 1.20 GHz

For some time now I have been getting an intermittent "WHEA UNCORRECTABLE ERROR" BSOD.
In most cases it occurs when the laptop runs on battery, but it occasionally crashes when plugged in too.

- I started with MyASUS Diagnostics. The result was a driver suggestion with no further details, so I don't know which driver it means. How can I verify which driver should be updated?
- I ran CMD (as administrator) and tried SFC /SCANNOW and chkdsk C: /f
- also DISM /ONLINE /CLEANUP-IMAGE /RESTOREHEALTH
- checked Display Adapters - drivers seem to be up to date
- I removed Realtek from Network Adapters
- I ran the Windows Memory Diagnostic tool - no errors
- in Power Options I changed the maximum processor state to 90%
- I removed the laptop cover, cleaned the fan and checked that all cables were properly seated.

Nothing helped.
Next, I checked the drive with CrystalDiskInfo: no errors.
I checked the display drivers again and changed the driver.
As a last resort I reinstalled Windows, but the BSOD crash returned after several days.

- - - - - - - - - - - - - - - - - - - - - -

So, I analyzed the dump file:

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
nt!_WHEA_ERROR_RECORD structure that describes the error condition. Try !errrec Address of the nt!_WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: ffffde0cf39e2028, Address of the nt!_WHEA_ERROR_RECORD structure.
Arg3: 00000000b2000000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000000030019, Low order 32-bits of the MCi_STATUS value.

Debugging Details:
KEY_VALUES_STRING: 1

Key : Analysis.CPU.mSec
Value: 3343

Key : Analysis.Elapsed.mSec
Value: 4474

Key : Analysis.IO.Other.Mb
Value: 0

Key : Analysis.IO.Read.Mb
Value: 0

Key : Analysis.IO.Write.Mb
Value: 0

Key : Analysis.Init.CPU.mSec
Value: 453

Key : Analysis.Init.Elapsed.mSec
Value: 3810

Key : Analysis.Memory.CommitPeak.Mb
Value: 103

Key : Bugcheck.Code.LegacyAPI
Value: 0x124

Key : Bugcheck.Code.TargetModel
Value: 0x124

Key : Dump.Attributes.AsUlong
Value: 1008

Key : Dump.Attributes.DiagDataWrittenToHeader
Value: 1

Key : Dump.Attributes.ErrorCode
Value: 0

Key : Dump.Attributes.KernelGeneratedTriageDump
Value: 1

Key : Dump.Attributes.LastLine
Value: Dump completed successfully.

Key : Dump.Attributes.ProgressPercentage
Value: 0

Key : Failure.Bucket
Value: 0x124_0_GenuineIntel_PROCESSOR__UNKNOWN_IMAGE_GenuineIntel.sys

Key : Failure.Hash
Value: {5371cb52-c3d9-558e-47d4-d31c09567ca2}


BUGCHECK_CODE: 124

BUGCHECK_P1: 0

BUGCHECK_P2: ffffde0cf39e2028

BUGCHECK_P3: b2000000

BUGCHECK_P4: 30019

FILE_IN_CAB: 071024-8140-01.dmp

DUMP_FILE_ATTRIBUTES: 0x1008
Kernel Generated Triage Dump

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

CUSTOMER_CRASH_COUNT: 1

PROCESS_NAME: firefox.exe

STACK_TEXT:
fffff804`25e43908 fffff804`1faffb5b : 00000000`00000124 00000000`00000000 ffffde0c`f39e2028 00000000`b2000000 : nt!KeBugCheckEx
fffff804`25e43910 fffff804`1edb10c0 : 00000000`00000000 fffff804`25e439e9 ffffde0c`f39e2028 ffffde0c`f1346d10 : nt!HalBugCheckSystem+0xeb
fffff804`25e43950 fffff804`1fc0e8df : 00000000`00000000 fffff804`25e439e9 ffffde0c`f39e2028 00000000`00000000 : PSHED!PshedBugCheckSystem+0x10
fffff804`25e43980 fffff804`1fb0158a : ffffde0c`f38ae680 ffffde0c`f38ae680 ffffde0c`f1346d60 fffff804`1f9813de : nt!WheaReportHwError+0x38f
fffff804`25e43a50 fffff804`1fb019e0 : 00000000`00000000 ffffde0c`00000000 00000000`00000000 00000000`00000000 : nt!HalpMcaReportError+0xb2
fffff804`25e43bc0 fffff804`1fb01874 : ffffde0c`f1343810 00000000`00000001 00000000`00000000 00000000`00000000 : nt!HalpMceHandlerCore+0x138
fffff804`25e43c20 fffff804`1fb01b19 : 00000000`00000008 00000000`00000001 00000000`00000000 00000000`00000000 : nt!HalpMceHandler+0xe0
fffff804`25e43c60 fffff804`1fb00cd2 : 00000000`00000000 00000000`00000000 fffff804`25e43ef0 00000000`00000000 : nt!HalpMceHandlerWithRendezvous+0xc9
fffff804`25e43c90 fffff804`1fb0348b : ffffde0c`f1343810 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalpHandleMachineCheck+0x62
fffff804`25e43cc0 fffff804`1fb69da9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalHandleMcheck+0x3b
fffff804`25e43cf0 fffff804`1fa2843e : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiHandleMcheck+0x9
fffff804`25e43d20 fffff804`1fa28053 : 00000000`00000000 00000000`00000000 000001dd`e4dc4de0 fffff804`25e43ef0 : nt!KxMcheckAbort+0x7e
fffff804`25e43e60 00007ffb`28ff5324 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiMcheckAbort+0x2d3
0000000e`6af3ea70 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffb`28ff5324


MODULE_NAME: GenuineIntel

IMAGE_NAME: GenuineIntel.sys

STACK_COMMAND: .cxr; .ecxr ; kb

FAILURE_BUCKET_ID: 0x124_0_GenuineIntel_PROCESSOR__UNKNOWN_IMAGE_GenuineIntel.sys

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {5371cb52-c3d9-558e-47d4-d31c09567ca2}

Followup: MachineOwner
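Arg3 and Arg4 are the two halves of the 64-bit IA32_MCi_STATUS register, so they can be recombined and decoded by hand. Below is a minimal Python sketch, assuming the architectural bit layout described in the Intel SDM, Vol. 3B (VAL at bit 63 down to PCC at bit 57, model-specific code in bits 31:16, MCA error code in bits 15:0). The helper name `decode_mci_status` is hypothetical, and bits 56:32 are deliberately left undecoded.

```python
# Sketch: rebuild and decode IA32_MCi_STATUS from bugcheck 0x124 Arg3/Arg4.
# Bit positions follow the architectural layout in the Intel SDM (Vol. 3B);
# model-specific fields are reported raw, not interpreted.

FLAG_BITS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled for this error
    "MISCV": 59,  # MCi_MISC holds extra information
    "ADDRV": 58,  # MCi_ADDR holds the faulting address
    "PCC": 57,    # processor context is corrupt; execution cannot safely continue
}


def decode_mci_status(high32: int, low32: int) -> dict:
    """Combine the two 32-bit halves and split out the architectural fields."""
    status = (high32 << 32) | low32
    flags = [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1]
    return {
        "status": status,
        "flags": flags,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    info = decode_mci_status(0xB2000000, 0x00030019)  # Arg3, Arg4 from the dump
    print(f"MCi_STATUS = {info['status']:#018x}")
    print("flags:", ", ".join(info["flags"]))
    print(f"MCA error code = {info['mca_error_code']:#06x}")
```

For this dump the recombined value is 0xb200000000030019 with VAL, UC, EN and PCC set: an uncorrected error that left the processor context corrupt, which is why Windows has to stop. ADDRV is clear, so the CPU did not log a faulting address.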

- - - - - - - - - - - - - - - - - - - - - -


So it looks like it might be a memory issue. Is that correct?
 
The Windows Memory Diagnostic Tool is far from being a thorough test of your RAM, and I think RAM is the most likely cause here. The dumps are not much help; they often aren't for 0x124 bugchecks when the WHEA error record is not contained in the dump (as is the case here). Your System log, however, does show a WHEA error for each of the BSODs, and they are all similar...
Code:
Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          19/07/2024 21:18:05
Event ID:      18
Task Category: None
Level:         Error
Keywords:  
User:          LOCAL SERVICE
Computer:      DESKTOP-J49E3IU
Description:
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Translation Lookaside Buffer Error
Processor APIC ID: 1

The details view of this entry contains further information.
This is telling you that the processor (logical processor #1) detected a machine check exception when reading the TLB. The TLB is the CPU's cache of virtual-to-physical address translations, filled from page tables that are kept in RAM; any error in it is so catastrophic that a WHEA BSOD always results. In addition, your Application log contains more application errors from memory-related exceptions than is normal, so everything seems to be pointing at RAM.
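The WHEA-Logger classification can be cross-checked against the dump: the low 16 bits of MCi_STATUS (Arg4 = 0x00030019) hold the MCA error code, 0x0019. The sketch below matches it against a few of the compound error-code patterns from the Intel SDM, Vol. 3B (for example, TLB errors are `000F 0000 0001 TTLL`, where F is a filtering bit). It is not a complete decoder, and `classify_mca_error_code` is a hypothetical helper.

```python
# Sketch: classify the 16-bit MCA error code (MCi_STATUS bits 15:0) against a
# few compound error-code patterns from the Intel SDM, Vol. 3B.
# Not a complete decoder: unmatched codes are reported as "unrecognised".

TT = {0b00: "instruction", 0b01: "data", 0b10: "generic"}  # transaction type
LL = {0b00: "level 0", 0b01: "level 1", 0b10: "level 2", 0b11: "generic level"}

# (mask, expected, name); each mask leaves out bit 12, the F (filtering) bit
PATTERNS = [
    (0xEFFC, 0x000C, "generic cache hierarchy error"),  # 000F 0000 0000 11LL
    (0xEFF0, 0x0010, "TLB error"),                      # 000F 0000 0001 TTLL
    (0xEF80, 0x0080, "memory controller error"),        # 000F 0000 1MMM CCCC
    (0xEF00, 0x0100, "cache hierarchy error"),          # 000F 0001 RRRR TTLL
]


def classify_mca_error_code(code: int) -> str:
    """Describe an MCA error code using the patterns above."""
    for mask, expected, name in PATTERNS:
        if (code & mask) != expected:
            continue
        if name in ("TLB error", "cache hierarchy error"):
            kind = TT.get((code >> 2) & 0b11, "reserved")
            return f"{name}, {kind} type, {LL[code & 0b11]}"
        if name == "generic cache hierarchy error":
            return f"{name}, {LL[code & 0b11]}"
        return name  # memory controller: MMM/CCCC sub-fields not decoded here
    return "unrecognised"


if __name__ == "__main__":
    arg4 = 0x00030019  # low half of MCi_STATUS from the dump
    print(classify_mca_error_code(arg4 & 0xFFFF))
```

Here 0x0019 comes out as a TLB error of generic type at level 1, consistent with the "Translation Lookaside Buffer Error" in the event above. The MCA code only says where the CPU noticed the error, not what caused it.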

You should test your one RAM stick with Memtest86, a proper memory tester:
  1. Download Memtest86 (free) and use the imageUSB.exe tool extracted from the download to make a bootable USB drive containing Memtest86 (1 GB is plenty). Do this on a different PC if you can, because you can't fully trust yours at the moment.
  2. Then boot that USB drive on your PC; Memtest86 will start running as soon as it boots.
  3. If no errors have been found after the four iterations of the 13 different tests that the free version does, then restart Memtest86 and do another four iterations. Even a single bit error is a failure.
Let us know how that goes.
 
Normally, my first fix attempt is to update the BIOS and delete pagefile.sys.
I would tell Windows to delete pagefile.sys on system shutdown
(i.e. search for "tell windows to delete the pagefile.sys on reboot").

Deleting pagefile.sys is a workaround that will delay the bugcheck until you figure out the real cause
(bad drivers, SATA firmware, bad CPU microcode settings in the BIOS, malware, ...).
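One documented switch for this is the `ClearPageFileAtShutdown` registry value (REG_DWORD) under `HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management`. Strictly speaking, it makes Windows overwrite the pagefile at shutdown rather than delete the file, but either way the stale paged-out data is discarded. A minimal sketch using Python's standard `winreg` module, assuming it runs on Windows from an elevated prompt; the helper name and the `--apply` flag are hypothetical, and the script only prints the target unless `--apply` is given.

```python
import sys

# Registry location of the pagefile-clearing switch (HKEY_LOCAL_MACHINE hive)
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
VALUE_NAME = "ClearPageFileAtShutdown"


def set_clear_pagefile_at_shutdown(enabled: bool = True) -> int:
    """Write ClearPageFileAtShutdown as REG_DWORD 1 or 0. Needs admin rights."""
    if sys.platform != "win32":
        raise OSError("ClearPageFileAtShutdown only exists on Windows")
    import winreg  # Windows-only standard-library module

    data = 1 if enabled else 0
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, data)
    return data


if __name__ == "__main__":
    # Dry run by default; pass --apply (from an elevated prompt) to write it.
    print(f"HKLM\\{KEY_PATH}\\{VALUE_NAME} -> 1")
    if "--apply" in sys.argv[1:]:
        set_clear_pagefile_at_shutdown(True)
```

The setting takes effect at the next shutdown or restart; setting it back to 0 restores normal behaviour.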
 
There is another issue related to memory.
I just wanted to play FIFA 98 😀 I like to play it for a while occasionally.
No issues with that; until now everything worked fine... but today...

View: https://imgur.com/wrjDV1f
The error message could be caused by corruption of the virtual memory on the machine. You should delete pagefile.sys and try to find the cause of the corruption. Delete pagefile.sys because the corruption will be saved (paged out) from RAM to the disk and then reloaded as corrupted pages later. For example, you could have a bad spot in RAM or in the CPU cache, or a buggy or malicious driver corrupting the page tables. Deleting the pagefile can delay the problem, but you still have to find its source.
So: update the BIOS, update the chipset driver, remove any overclocking drivers, check for heat problems, check for malware, and update any SSD firmware. Generally, the various causes produce slightly different error messages and kernel dumps.

For bugcheck 0x124, I think parameter 1 = 0 means that the CPU called the bugcheck, most often because of an error in the cache RAM inside the CPU (most often caused by a wrong BIOS setting for a voltage that is used to set a frequency in the circuits that move data from one cache level to another inside the CPU).

On computers from the last 15 years or so you can get bugcheck 0x124 with various parameter 1 values. The value should indicate the subsystem that called the bugcheck (it can be anything connected to the PCIe bus: video card, CPU, USB cards, SATA devices, ...).

https://learn.microsoft.com/en-us/w...er/bug-check-0x124---whea-uncorrectable-error
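Arg1 is a WHEA_ERROR_SOURCE_TYPE value that identifies which error source reported the problem. A minimal lookup sketch covering only the first six values of that enumeration (the full table is in Microsoft's bug check 0x124 reference); `describe_arg1` is a hypothetical helper.

```python
# Sketch: name the error source reported in bugcheck 0x124 Arg1.
# Only the first six WHEA_ERROR_SOURCE_TYPE values are listed; see
# Microsoft's bug check 0x124 reference for the full table.

WHEA_ERROR_SOURCE_TYPE = {
    0x0: ("WheaErrSrcTypeMCE", "machine check exception"),
    0x1: ("WheaErrSrcTypeCMC", "corrected machine check"),
    0x2: ("WheaErrSrcTypeCPE", "corrected platform error"),
    0x3: ("WheaErrSrcTypeNMI", "non-maskable interrupt"),
    0x4: ("WheaErrSrcTypePCIe", "PCI Express error"),
    0x5: ("WheaErrSrcTypeGeneric", "generic hardware error source"),
}


def describe_arg1(arg1: int) -> str:
    """Format Arg1 as 'value: enum name (meaning)'."""
    name, meaning = WHEA_ERROR_SOURCE_TYPE.get(arg1, ("unknown", "not in this table"))
    return f"{arg1:#x}: {name} ({meaning})"


if __name__ == "__main__":
    print(describe_arg1(0x0))  # Arg1 from the dump in this thread
```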
 
The 0x124 bugcheck can take several values for argument 1; the most common are 0 and 1, although we do see other values. Both 0 and 1 indicate a machine check exception. These are always detected by the CPU, but that doesn't mean the CPU is at fault, since a machine check can happen for many reasons. The difference between 0 and 1 is that 0 indicates a fatal machine check and 1 a corrected machine check (these dumps are seen in the live kernel reports folder). The link provided above shows the other values for argument 1, and the other arguments as well.

The 0x124 BSOD with argument 1 set to 0 or 1 can be really tricky to debug, because the WHEA_ERROR_RECORD (address in argument 2) is generally not included in a minidump and so cannot be examined. These are generally hardware-caused BSODs, although it is possible for a really flaky driver to cause a 0x124 BSOD.

That's why I like to use the Sysnative file collection tool on 0x124 bugchecks in particular, because the additional troubleshooting data usually helps localise the problem.
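Because the WHEA_ERROR_RECORD is usually missing from the minidump, the WHEA-Logger entries in the System event log (Event ID 18, as in the log quoted earlier in the thread) are the next best source. A minimal sketch that builds and, on Windows, runs a query with the built-in `wevtutil` tool (`qe` = query events, `/rd:true` = newest first, `/f:text` = plain-text output); the helper names are hypothetical.

```python
import subprocess
import sys

# XPath filter: WHEA-Logger events with ID 18 (fatal hardware error)
WHEA_QUERY = ("*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']"
              " and (EventID=18)]]")


def build_wevtutil_args(count: int = 5) -> list:
    """Build a wevtutil command printing the newest `count` matching events."""
    return ["wevtutil", "qe", "System", f"/q:{WHEA_QUERY}",
            f"/c:{count}", "/rd:true", "/f:text"]


def fetch_whea_events(count: int = 5) -> str:
    """Run the query (Windows only) and return the text output."""
    if sys.platform != "win32":
        raise OSError("wevtutil is only available on Windows")
    result = subprocess.run(build_wevtutil_args(count),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    if sys.platform == "win32":
        print(fetch_whea_events())
    else:
        print(" ".join(build_wevtutil_args()))  # show the command instead
```

The "details view" of each event also carries the raw error record, which is the same kind of data that Sysnative-style collection tools gather.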
 
For a CPU translation lookaside buffer error, I guess I would look at the system uptime in the minidump, just to make sure the motherboard did not reboot and restart without waiting for the power OK signal to be properly set.
If that happened, the system uptime in the minidump will be a very short value, indicating that the motherboard protection circuit reset the CPU but the power supply did not hold the power OK signal down, so the CPU restarted before the power was stable again (resulting in a bugcheck 0x124 from the CPU cache controller).
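When a minidump is opened in WinDbg, the uptime is shown on a line such as `System Uptime: 0 days 0:00:41.234` (the `.time` command prints it again). A minimal sketch that parses such a line and flags a suspiciously short uptime; the 60-second threshold and the sample value are hypothetical, chosen only for illustration, and `parse_system_uptime` is not part of any tool.

```python
import re
from datetime import timedelta
from typing import Optional

# Matches e.g. "System Uptime: 0 days 0:00:41.234" as printed by WinDbg
UPTIME_RE = re.compile(
    r"System Uptime:\s*(\d+) days? (\d+):(\d{2}):(\d{2})(?:\.(\d+))?")


def parse_system_uptime(text: str) -> Optional[timedelta]:
    """Return the uptime from the first 'System Uptime:' line, or None."""
    match = UPTIME_RE.search(text)
    if match is None:
        return None
    days, hours, minutes, seconds = (int(g) for g in match.groups()[:4])
    frac = match.group(5) or "0"
    millis = int(frac.ljust(3, "0")[:3])  # keep millisecond precision only
    return timedelta(days=days, hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=millis)


def looks_like_instant_crash(uptime: timedelta, threshold_s: float = 60.0) -> bool:
    # Hypothetical threshold: a crash within the first minute after boot
    return uptime.total_seconds() < threshold_s


if __name__ == "__main__":
    sample = "System Uptime: 0 days 0:00:41.234"  # hypothetical sample line
    up = parse_system_uptime(sample)
    print(up, looks_like_instant_crash(up))
```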
 
Hi, I have already tried updating the BIOS and chipset drivers, keeping drivers up to date, and running RAM diagnostics.
If deleting pagefile.sys doesn't fix anything and just delays the crash, I'll leave it for now.
 
Failing to delete pagefile.sys will obscure your testing to find the failure. For example, you could fix the failure by updating a driver, but the pagefile will still hold the corrupted data. The system will not bugcheck until the corruption is loaded and executed. Not deleting the pagefile just prolongs the hunt for the failure and leads to incorrect conclusions about the actual fix.
 
Deleting the pagefile temporarily is a good idea; this could possibly be a system drive issue.

BTW, thanks for the activation window. I asked because your System log contained several activation errors, so software licensing is struggling with something. Do you have MS Office installed, and is it licensed?
 
Now I see why that's important. Thanks for explaining this.
OK, I unticked 'automatically manage.....' and set 'No paging file':
View: https://imgur.com/a/qkEUlUD


Then I changed ClearPageFileAtShutdown:
View: https://imgur.com/YrHFEo0

I restarted the PC.
 
Well, I use a very old key for MS Office 2007.
 
Hmm, so if I understand correctly, this could be the cause of the whole BSOD problem?
I've been using it for so many years and such problems never happened.
If this is the cause, why did it start now? After a Windows update maybe?

Edit:
I uninstalled Office, restarted Windows, and after 1 hour got a BSOD again...

I noticed something new 3 days ago. When I had many tabs open in Firefox, some tabs tended to crash (a new tab, an old one when reloading, or just when opening a new website). It hasn't happened again since.

I've been thinking that maybe it's worth trying to reinstall Windows and erase all the files? I already reinstalled Windows some time ago as one possible fix, but I kept all my files...

Edit:
I reinstalled Windows again, this time erasing everything, and updated Windows, the chipset and other drivers. After 1 day I got a BSOD again. Should I redo some diagnostics?
 
I've just noticed another strange thing... I can't turn on my laptop if the charger is not plugged in.
On the other hand, after turning the laptop on I can unplug the charger and work on battery...