Question: Regular "Event 47, WHEA-Logger" and "Event 46, WHEA-Logger" errors causing system reboots

sj_woolf

Prominent
Mar 12, 2022
4
0
510
I've been recently experiencing system reboots that I'm convinced are connected with recurring Event 47, WHEA-Logger errors. The details of the error state the following:

A corrected hardware error has occurred.
Component: Memory
Error Source: Corrected Machine Check

When my computer does crash, the Event 46, WHEA-Logger details are as follows:

A fatal hardware error has occurred.
Component: Memory
Error Source: Machine Check Exception

The details page shows the following:

Event 46 Details:
- <Event xmlns=" ">
- <System>
<Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{c26c4f3c-3f66-4e99-8f8a-39405cfed220}" />
<EventID>46</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2023-07-23T05:11:46.1302526Z" />
<EventRecordID>1876</EventRecordID>
<Correlation ActivityID="{c3a98e0e-f8ba-4c15-9b36-93961bdc4320}" />
<Execution ProcessID="3744" ThreadID="3972" />
<Channel>System</Channel>
<Computer>Shauns_PC</Computer>
<Security UserID="S-1-5-19" />
</System>
- <EventData>
<Data Name="ErrorSource">3</Data>
<Data Name="FRUId">{00000000-0000-0000-0000-000000000000}</Data>
<Data Name="FRUText" />
<Data Name="ValidBits">0x2</Data>
<Data Name="ErrorStatus">0x0</Data>
<Data Name="PhysicalAddress">0x1000003118ebe18</Data>
<Data Name="PhysicalAddressMask">0x0</Data>
<Data Name="Node">0x0</Data>
<Data Name="Card">0x0</Data>
<Data Name="Module">0x0</Data>
<Data Name="Bank">0x0</Data>
<Data Name="Device">0x0</Data>
<Data Name="Row">0x0</Data>
<Data Name="Column">0x0</Data>
<Data Name="BitPosition">0x0</Data>
<Data Name="RequesterId">0x0</Data>
<Data Name="ResponderId">0x0</Data>
<Data Name="TargetId">0x0</Data>
<Data Name="ErrorType">0</Data>
<Data Name="Extended">0</Data>
<Data Name="RankNumber">0</Data>
<Data Name="CardHandle">0</Data>
<Data Name="ModuleHandle">0</Data>
<Data Name="Length">1019</Data>
<Data Name="RawData">...</Data>
</EventData>
</Event>


This issue plagued me for a long time about a year ago, and I replaced the memory after memory tests came up inconclusive. This did not correct the issue. I eventually replaced the power supply after concluding that it was failing. After replacing the power supply, the issue mostly went away. While I still received the Event 47 error, it was not nearly as frequent, and I did not experience any system reboots because of it.

That is, until recently. My system has unexpectedly rebooted 5 times in the last couple of weeks, and Event 47 now occurs far more often. At this point, in addition to clean OS installs, I have replaced everything related to the memory except for the CPU.

Aside from replacing the CPU (which I'd really rather not do), I am unsure how to proceed, and I'm hoping someone out there has something else for me to try. Please see my computer specifications and a copy of the error log below.

[Image: screenshot of recent Event Viewer events]


Event 47 Details:

- <Event xmlns=" ">
- <System>
<Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{c26c4f3c-3f66-4e99-8f8a-39405cfed220}" />
<EventID>47</EventID>
<Version>0</Version>
<Level>3</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2023-07-23T03:12:46.9287349Z" />
<EventRecordID>1011</EventRecordID>
<Correlation ActivityID="{b9559660-87b1-47ea-acf4-d98504998072}" />
<Execution ProcessID="3612" ThreadID="16280" />
<Channel>System</Channel>
<Computer>Shauns_PC</Computer>
<Security UserID="S-1-5-19" />
</System>
- <EventData>
<Data Name="ErrorSource">1</Data>
<Data Name="FRUId">{00000000-0000-0000-0000-000000000000}</Data>
<Data Name="FRUText" />
<Data Name="ValidBits">0x2</Data>
<Data Name="ErrorStatus">0x0</Data>
<Data Name="PhysicalAddress">0x100000537ac303c</Data>
<Data Name="PhysicalAddressMask">0x0</Data>
<Data Name="Node">0x0</Data>
<Data Name="Card">0x0</Data>
<Data Name="Module">0x0</Data>
<Data Name="Bank">0x0</Data>
<Data Name="Device">0x0</Data>
<Data Name="Row">0x0</Data>
<Data Name="Column">0x0</Data>
<Data Name="BitPosition">0x0</Data>
<Data Name="RequesterId">0x0</Data>
<Data Name="ResponderId">0x0</Data>
<Data Name="TargetId">0x0</Data>
<Data Name="ErrorType">0</Data>
<Data Name="Extended">0</Data>
<Data Name="RankNumber">0</Data>
<Data Name="CardHandle">0</Data>
<Data Name="ModuleHandle">0</Data>
<Data Name="Length">1019</Data>
<Data Name="RawData">...</Data>
</EventData>
</Event>
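
For anyone who wants to read dumps like the two above programmatically, here is a minimal Python sketch. It pulls the EventID and the named Data fields out of a single event and decodes ValidBits. The bit layout in VALID_BIT_NAMES is an assumption: it follows the memory error section of the UEFI common platform error record (bit 0 = ErrorStatus, bit 1 = PhysicalAddress, and so on), which WHEA appears to mirror. The function names and the trimmed SAMPLE event are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: ValidBits follows the UEFI CPER memory error section layout,
# where bit 0 marks ErrorStatus as valid, bit 1 PhysicalAddress, and so on.
VALID_BIT_NAMES = [
    "ErrorStatus", "PhysicalAddress", "PhysicalAddressMask", "Node",
    "Card", "Module", "Bank", "Device", "Row", "Column", "BitPosition",
    "RequesterId", "ResponderId", "TargetId", "ErrorType",
]


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parse_whea_event(xml_text):
    """Return the event ID, the named Data fields, and the fields flagged valid."""
    root = ET.fromstring(xml_text)
    event_id = None
    data = {}
    for el in root.iter():
        name = local(el.tag)
        if name == "EventID":
            event_id = int(el.text)
        elif name == "Data" and el.get("Name"):
            data[el.get("Name")] = (el.text or "").strip()
    # ValidBits is logged as a hex string such as "0x2".
    valid = int(data.get("ValidBits", "0"), 16)
    valid_fields = [n for i, n in enumerate(VALID_BIT_NAMES) if (valid >> i) & 1]
    return {"event_id": event_id, "data": data, "valid_fields": valid_fields}


# Trimmed, hypothetical copy of the Event 47 dump (values taken from the post).
SAMPLE = """<Event>
  <System>
    <EventID>47</EventID>
  </System>
  <EventData>
    <Data Name="ErrorSource">1</Data>
    <Data Name="ValidBits">0x2</Data>
    <Data Name="PhysicalAddress">0x100000537ac303c</Data>
    <Data Name="Bank">0x0</Data>
  </EventData>
</Event>"""

result = parse_whea_event(SAMPLE)
print(result["event_id"], result["valid_fields"])
```

If the assumed layout holds, ValidBits of 0x2 means only PhysicalAddress was reported, so the zeros in Node, Bank, Row and the rest would mean "not populated" rather than pointing at location zero.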


Computer Specs:
CPU: AMD Ryzen 5 3600
MB: ASUS TUF Gaming B550-PLUS AMD AM4 Zen 3 Ryzen 5000 & 3rd Gen Ryzen ATX Gaming Motherboard
RAM: 32GB Corsair DDR4-3200 (2 x 16GB sticks)
Graphics: NVIDIA GeForce RTX 3060
PS: Corsair RM750 750 Watt
 

ubuysa

Distinguished
I've been recently experiencing system reboots that I'm convinced are connected with recurring Event 47, WHEA-Logger errors. The details of the error state the following:

A corrected hardware error has occurred.
Component: Memory
Error Source: Corrected Machine Check
Well that's a big clue! I would download Memtest86 (free), use the imageUSB.exe tool extracted from the download to make a bootable USB drive containing Memtest86 (1GB is plenty big enough), and then boot that USB drive. Memtest86 will start running as soon as it boots. If no errors have been found after the four iterations that the free version does, then restart Memtest86 and do another four iterations.
 

sj_woolf

Well that's a big clue! I would download Memtest86 (free), use the imageUSB.exe tool extracted from the download to make a bootable USB drive containing Memtest86 (1GB is plenty big enough), and then boot that USB drive. Memtest86 will start running as soon as it boots. If no errors have been found after the four iterations that the free version does, then restart Memtest86 and do another four iterations.
Thanks for your reply. I will run Memtest again just to be thorough, but I ran it a while ago when this issue first appeared with my initial pair of memory sticks, and they passed. Even though they passed, I figured they were the likely culprit and the cheapest part to replace. The new sticks have the same issue. I have not run more than 1 pass of Memtest on them, but seeing as the issue affects two completely different sets of sticks (from different manufacturers as well), I concluded that the issue was not the sticks themselves. I could be wrong, though, so I will run the full test just to be sure.
 
Is the RAM you're using compatible with the motherboard? Check the QVL of the motherboard to be sure.
Not being on the QVL doesn't mean the RAM is incompatible; the QVL is just the RAM they tested and verified.

I haven't used RAM from any QVL on my motherboards for as long as I can remember, and only one kit had an issue, which was fixed simply by entering the settings manually.
 

sj_woolf

Is the RAM you're using compatible with the motherboard? Check the QVL of the motherboard to be sure.

Not being on the QVL doesn't mean the RAM is incompatible; the QVL is just the RAM they tested and verified.

I haven't used RAM from any QVL on my motherboards for as long as I can remember, and only one kit had an issue, which was fixed simply by entering the settings manually.

@ubuysa Thanks for helping me troubleshoot. Yeah, that thought crossed my mind, but as @hotaru.hino said, that list just means those sticks have been tested. I actually could not find ANY popular RAM sold (and marketed as compatible) by Newegg that appeared on the QVL. Combined with the fact that I had the same issue with the previous pair of sticks, the chances were low that the issue was with the RAM itself.

Ultimately, after troubleshooting several things, I ended up replacing my CPU which (so far) appears to have resolved the issue. Prior to replacement, I was getting the WHEA-Logger errors as soon as the system was booted up. After several hours with the new CPU, which included heavy gaming, there have been 0 recorded WHEA-Logger errors.

Seems like the old CPU was faulty in some way (my guess is low voltage) which led to the error. Hopefully that is the last I see of this issue.
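
For anyone wanting to confirm a "0 errors since the swap" result without scrolling through Event Viewer, here is a rough Python sketch that tallies WHEA-Logger events 46 and 47 per day. It assumes Windows' built-in wevtutil is on the PATH and is called with its /q, /f:xml and /e options. The "Events" root element name, the function names, and SAMPLE_EXPORT are hypothetical choices for illustration. Only the export step needs Windows; the tally step works on any saved XML export.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter

# XPath filter for WHEA-Logger events 46 (fatal) and 47 (corrected).
QUERY = ("*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']"
         " and (EventID=46 or EventID=47)]]")


def export_whea_events():
    """Windows only: dump matching System-log events as one XML document."""
    # /e:Events wraps the events in a root element so the output parses as XML.
    cmd = ["wevtutil", "qe", "System", "/q:" + QUERY, "/f:xml", "/e:Events"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def tally_by_day(xml_text):
    """Count events per (UTC date, event ID) pair."""
    counts = Counter()
    for event in ET.fromstring(xml_text):
        event_id, day = None, None
        for el in event.iter():
            name = local(el.tag)
            if name == "EventID":
                event_id = int(el.text)
            elif name == "TimeCreated":
                # SystemTime looks like 2023-07-23T05:11:46.1302526Z (UTC).
                day = el.get("SystemTime", "")[:10]
        counts[(day, event_id)] += 1
    return counts


# Hypothetical two-event export using the timestamps from the original post.
SAMPLE_EXPORT = """<Events>
<Event><System><EventID>46</EventID>
<TimeCreated SystemTime="2023-07-23T05:11:46.1302526Z" /></System></Event>
<Event><System><EventID>47</EventID>
<TimeCreated SystemTime="2023-07-23T03:12:46.9287349Z" /></System></Event>
</Events>"""

for (day, event_id), n in sorted(tally_by_day(SAMPLE_EXPORT).items()):
    print(day, event_id, n)
```

On the affected machine, calling tally_by_day(export_whea_events()) before and after a hardware change would show whether the daily count of Event 47 really dropped. Dates are in UTC, so a late-evening error may land on the next day.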
 

ubuysa

Distinguished
I know perfectly well what the QVL means, but you were complaining of WHEA errors that were memory related. When troubleshooting, you start with the obvious and work down.
 
Ultimately, after troubleshooting several things, I ended up replacing my CPU which (so far) appears to have resolved the issue. Prior to replacement, I was getting the WHEA-Logger errors as soon as the system was booted up. After several hours with the new CPU, which included heavy gaming, there have been 0 recorded WHEA-Logger errors.

Seems like the old CPU was faulty in some way (my guess is low voltage) which led to the error. Hopefully that is the last I see of this issue.
Assuming you didn't muck with the voltage settings, Ryzen CPUs tend to be very generous with their voltages. People have been complaining about them peaking at nearly 1.5V for years, though for all-core boosting it usually hovers around 1.3-1.35V.

If anything, it was probably getting old enough that the higher end of the boost speed wasn't working anymore. From what little I know about degrading CPUs, that's how they usually go when they're approaching the end of their "usable" life. If it gets to that point, the only way to remedy this is to either dial down the clock speed or bump up the voltages. The latter is likely not viable, since Zen 2 doesn't have a V-F curve feature, so any bump of the voltage may kill the processor if you hit it with a low-core workload.

Though if you still have the old processor, it may be worthwhile to check whether setting a static clock speed (maybe at the base speed) makes the errors go away.

I know perfectly well what the QVL means, but you were complaining of WHEA errors that were memory related. When troubleshooting, you start with the obvious and work down.
If you knew what a QVL was and the constraints surrounding it, then why suggest it in the first place?

Also, I've seen a lot of issues where people thought something was "obvious" but it turned out that the solution wouldn't have actually worked. For example, a leaky driver consumes a lot of RAM and people say "go buy more RAM" because they believe the person simply doesn't have enough in their system.
 

ubuysa

Distinguished
Don't try and put words in my mouth.

If you check back you'll see I phrased it as a question. When you're getting WHEA errors with RAM indicated it's a perfectly reasonable question.