Question WHEA UNCORRECTABLE ERROR (AuthenticAMD)

Diablosrouge

Prominent
Mar 12, 2021
9
0
510
Hello,

How have you been? :)

I'm currently having a problem with my computer and I have nowhere else to go now, as I've run out of solutions.

I've been getting this BSOD for some months now, since around August last year. I tried updating all the drivers, the BIOS and everything else, disabled the XMP memory profile (dropping the RAM from 3600 MHz to 2133 MHz, even though it's rated for 3600 MHz), and did a clean GPU driver reinstall with DDU. I basically did all I could think of in terms of software.
I also tested the memory sticks one by one, and their slots, and tested both SSDs: no errors whatsoever. However, the BSOD persisted.
I decided to reinstall Windows from scratch in January, and things ran smoothly again for about a month, until the BSODs started once again. I'm an IT graduate, so I'm not entirely a newbie in the field, but I'm focusing my studies more on visual art and animation, so I'm not the ultimate expert either.
Coincidentally, the BSOD reappeared around the time I installed the Norton 360 package (paid license) again, but I can't say for sure that Norton is the cause, as I've had it on this same computer for months in the past and it didn't give me a single problem.

I've always had problems with my GPU drivers: as a then-new AMD product, its drivers shipped with problems, and with them came BSODs such as TDR errors. But as the drivers got updated, these errors disappeared and only one new error came up: WHEA Uncorrectable Error. This BSOD is totally random; it doesn't matter whether I'm coding in Visual Studio or browsing the web in Chrome, and it can even happen when the computer goes idle and the monitor switches off. Since it happens both under load and at idle, I don't think it's temperature-related. Most of the time, when this BSOD occurs and the computer restarts, the CPU fan stops by the time I get to the Windows logon screen, and I can only get it spinning again by shutting down and starting Windows again. It's a Noctua NF-A12x25 fan on a Scythe Big Shuriken 3 cooler.

One thing I also noticed is that once the BSOD happens, it restarts the computer immediately, so I'm not sure the minidump ever gets fully created, because I never see the percentage go from 0% to 100%. I'm not sure whether that's because my M.2 SSD is fast. Also, the BSOD is becoming more and more frequent each time it happens.

MB: AORUS X570 Pro Wifi Mini ITX
BIOS: F6b
CPU: AMD Ryzen 7 3700X
RAM: 2x16GB G.SKILL Trident Z DDR4 3600MHz CL17 (F4-3600C17D-32GTZKW)
GPU: Sapphire Pulse RX 5700XT 8GB
Current GPU Drivers: 20.12.1
PSU: Seasonic Focus 550W Gold


I ran the only .dmp file through WinDbg and got this:


Loading Dump File [D:\Users\jppbs\Desktop\030921-9203-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Mini Kernel Dump does not have process information
Symbol search path is: srv*
Executable search path is:
Unable to load image Unknown_Module_0000000000000000, Win32 error 0n2
*** WARNING: Unable to verify timestamp for Unknown_Module_0000000000000000
*** ERROR: Module load completed but symbols could not be loaded for Unknown_Module_0000000000000000
Unable to add module at 0000000000000000
WARNING: .reload failed, module list may be incomplete
Debugger can not determine kernel base address
Windows 10 Kernel Version 19041 MP (16 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Machine Name:
Kernel base = 0xfffff80271a00000 PsLoadedModuleList = 0xfffff8027262a510
Debug session time: Tue Mar 9 10:17:17.379 2021 (UTC + 0:00)
System Uptime: 0 days 0:50:25.996
Unable to load image Unknown_Module_0000000000000000, Win32 error 0n2
*** WARNING: Unable to verify timestamp for Unknown_Module_0000000000000000
Unable to add module at 0000000000000000
WARNING: .reload failed, module list may be incomplete
Debugger can not determine kernel base address
Loading Kernel Symbols
.
Unable to load image Unknown_Module_0000000000000000, Win32 error 0n2
*** WARNING: Unable to verify timestamp for Unknown_Module_0000000000000000
Unable to add module at 0000000000000000

Loading User Symbols
Missing image name, possible paged-out or corrupt data.
Loading unloaded module list
.Missing image name, possible paged-out or corrupt data.
.Missing image name, possible paged-out or corrupt data.
.
For analysis of this file, run !analyze -v
0: kd> !analyze -v
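*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************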

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000010, Error Source Type
Arg2: ffffda8bdf70b028
Arg3: ffffda8bc59f292c
Arg4: ffffda8bc871a1a0

Debugging Details:
------------------

* Debugger could not find nt in module list, module list might be corrupt, error 0x80070057.


KEY_VALUES_STRING: 1

Key : Analysis.CPU.Sec
Value: 0

Key : Analysis.DebugAnalysisProvider.CPP
Value: Create: 8007007e on S2K

Key : Analysis.DebugData
Value: CreateObject

Key : Analysis.DebugModel
Value: CreateObject

Key : Analysis.Elapsed.Sec
Value: 0

Key : Analysis.Memory.CommitPeak.Mb
Value: 42

Key : Analysis.System
Value: CreateObject


BUGCHECK_CODE: 124

BUGCHECK_P1: 10

BUGCHECK_P2: ffffda8bdf70b028

BUGCHECK_P3: ffffda8bc59f292c

BUGCHECK_P4: ffffda8bc871a1a0

CUSTOMER_CRASH_COUNT: 1

MODULE_NAME: Unknown_Module

IMAGE_NAME: Unknown_Image

STACK_COMMAND: .thread ; .cxr ; kb

FAILURE_BUCKET_ID: CORRUPT_MODULELIST_0x124_AuthenticAMD

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {12a698bc-58f9-85fa-efc6-5c42d213b271}

Followup: MachineOwner
---------
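For anyone who wants to pull the bug check parameters out of a saved !analyze -v log, here is a minimal Python sketch. It assumes the log is plain text containing the BUGCHECK_CODE and BUGCHECK_P1 to P4 lines shown above (WinDbg prints them in hex without a 0x prefix). The source-type table is deliberately partial, covering only the first few WHEA_ERROR_SOURCE_TYPE values, so Arg1 = 0x10 from this dump is reported as "not covered" rather than guessed.

```python
import re

# Partial map of WHEA_ERROR_SOURCE_TYPE values (only the first few entries);
# look up any other value in Microsoft's WHEA documentation.
WHEA_SOURCE_TYPES = {
    0x0: "Machine check exception (MCE)",
    0x1: "Corrected machine check (CMC)",
    0x2: "Corrected platform error (CPE)",
    0x3: "Non-maskable interrupt (NMI)",
    0x4: "PCI Express error",
    0x5: "Generic error source",
}

# Matches lines such as "BUGCHECK_CODE: 124" or "BUGCHECK_P1: 10".
FIELD_RE = re.compile(
    r"^(BUGCHECK_CODE|BUGCHECK_P[1-4]):[ \t]*([0-9a-fA-F]+)[ \t\r]*$", re.MULTILINE
)


def parse_bugcheck(text):
    """Return {"BUGCHECK_CODE": int, "BUGCHECK_P1": int, ...} from !analyze -v output."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(text)}


def describe_source_type(p1):
    """Translate parameter 1 of bug check 0x124 using the partial table above."""
    if p1 in WHEA_SOURCE_TYPES:
        return WHEA_SOURCE_TYPES[p1]
    return f"0x{p1:x}: not in this partial table, check WHEA_ERROR_SOURCE_TYPE"
```

Pasting the whole WinDbg output into parse_bugcheck gives the five values as integers; the Arg lines and everything else are ignored.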

It seems to be something AMD-related, but I'm not sure what, as the minidump isn't very specific. The only AMD parts I have are the Ryzen 7 3700X CPU and the Sapphire 5700 XT.
Is there any way to tell whether this is failing hardware (in which case I'm still within the warranty period) or a software problem/missing update? I have the F6b BIOS version. I noticed there are a few newer updates now, but they're mainly for AMD 5000 series support. Should I update?

PS: As I was writing this post, my system crashed again, 30 minutes into use.
After restarting, I got this notification in AMD Radeon Software: https://imgur.com/a/EFxQDSE


There's no new minidump for this crash, so I don't know how we'll trace this back. The BSOD happens and the percentage never completes; it restarts the computer at 0% almost instantly.

Event Viewer events from today:

There are plenty of errors of the same kind from SppExtComObj.exe, but they only come up at Windows startup; the BSODs happen much later.
There are no real traces of this BSOD right before it happens. It simply happens. Afterwards I get the error "Dump file creation failed due to error during dump creation." There's also a HAL information notice: "The iommu fault reporting has been initialized."
Either way, I uploaded an events file in both .txt and .evtx format, with the metadata folder, which you can download and view here: https://drive.google.com/drive/folders/1bAYjEODW133UnJXxgBbFuPbcoNhypIi1?usp=sharing
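The reasoning above (startup errors early, BSODs much later, nothing logged just before a crash) amounts to checking which events fall in a short window before each crash. A minimal sketch of that check, assuming you have already copied event timestamps and sources out of Event Viewer into a list by hand; the 10-minute default window is an arbitrary choice, and the example data in the test is made up.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def events_before(crash: datetime,
                  events: List[Tuple[datetime, str]],
                  window: timedelta = timedelta(minutes=10)) -> List[str]:
    """Return the sources of events logged within `window` before `crash`.

    Events at or after the crash time are excluded, since they cannot
    have caused it.
    """
    return [source for timestamp, source in events
            if crash - window <= timestamp < crash]
```

An empty result for every crash supports the point made above: the startup errors are too far away in time to be related.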

I hope one of you can help me figure out what's going on. Especially @Colif or @gardenman :) as I've seen you responding to these WHEA errors quite often, and you often seem to get to the source.
I have no worries about replacing hardware since it's under warranty, but I want to be sure it's the hardware first, because I need the computer for work and it will take a couple of weeks to get it back.


Best Regards,
Diablosrouge

Colif

Win 11 Master
Moderator
Can you follow option one on the following link - here - and then do the step below: Small memory dumps - Have Windows Create a Small Memory Dump (Minidump) on BSOD - that creates a file in C:\Windows\Minidump after the next BSOD.

Open Windows File Explorer
Navigate to C:\Windows\Minidump
Copy the mini-dump files out onto your Desktop
Do not use WinZip; use the built-in facility in Windows
Select those files on your Desktop, right click them and choose 'Send to' - Compressed (zipped) folder
Upload the zip file to the Cloud (OneDrive, DropBox . . . etc.)
Then post a link here to the zip file, so we can take a look for you . . .
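If you would rather script the copy-and-zip steps, here is a minimal Python sketch using only the standard library (zipfile produces a normal zip, the same kind the built-in "Send to" option makes). The function name is hypothetical, and reading C:\Windows\Minidump may need an elevated prompt.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_minidumps(dump_dir: Path, zip_path: Path) -> List[str]:
    """Copy every *.dmp file in dump_dir into one zip archive and return their names."""
    dumps = sorted(p for p in dump_dir.glob("*.dmp") if p.is_file())
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for dump in dumps:
            # Store just the file name, not the full C:\Windows\Minidump path.
            archive.write(dump, arcname=dump.name)
    return [dump.name for dump in dumps]
```

On Windows you would call it as zip_minidumps(Path(r"C:\Windows\Minidump"), Path.home() / "Desktop" / "minidumps.zip") and then upload the resulting zip as described above.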

Gardenman's debugger may know what the unknown module is. WHEA errors hardly ever tell me the cause, but we shall see.

It would be nice to blame Norton for WHEA errors, but it would also be a first.
WHEA errors are caused by either hardware or hardware drivers, and Norton doesn't really fall into that category. The only thing I can think of is that it could cause LAN drivers to crash.
They can be caused by overclocking.
They can be caused by overclocking software, such as Ryzen Master or MSI Afterburner.

Most of the times, when this BSOD occurs and the computer restarts, when I get to Windows logon, the CPU fan stops, and I can only get it spinning again by shutting down and starting Windows again. It's a Noctua NF-A12x25 fan with a Big Scythe Shuriken 3 cooler.
That is odd. I can't point at the Seasonic PSU and say it's the cause... after all, it's a Seasonic. Sure, one day I might see one that is the cause.

Do you have the latest BIOS? https://www.gigabyte.com/au/Motherboard/X570-I-AORUS-PRO-WIFI-rev-10/support#support-dl-bios
Chipset drivers: https://www.amd.com/en/support/chipsets/amd-socket-am4/x570
Try updating the LAN/Wi-Fi drivers: https://www.intel.com.au/content/www/au/en/support/intel-driver-support-assistant.html
I guessed that your Wi-Fi would be the same as mine (I have an X570 Aorus Elite Wi-Fi).
 

Colif

What SSD/NVMe drive do you have? No dump files could mean the C: drive is the cause, as Windows can't record a dump if the failing device is the one it records them to.

https://www.amd.com/en/technologies/radeon-wattman (this is more for me)

Until I see dumps, I don't know what drivers are installed.
 

Diablosrouge

Hello @Colif!
Thank you for the fast reply.

I just did option one and switched to small memory dump. :) Let's see what happens after the next BSOD.
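The "small memory dump" choice is stored in the registry as the CrashDumpEnabled value under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl, so it can be checked after a BIOS update or reinstall. A minimal Python sketch: the value-to-name table follows Microsoft's memory dump documentation as I understand it, the function names are hypothetical, and the registry read only runs on Windows (winreg is Windows-only).

```python
import sys

# CrashDumpEnabled values under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl
DUMP_TYPES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump (minidump)",
    7: "Automatic memory dump",
}


def describe_dump_setting(value):
    """Translate a CrashDumpEnabled value into a readable name."""
    return DUMP_TYPES.get(value, f"Unrecognised value {value}")


def read_dump_setting():
    """Read CrashDumpEnabled on Windows; return None on any other platform."""
    if sys.platform != "win32":
        return None
    import winreg  # standard library, but only present on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value
```

On the affected PC, describe_dump_setting(read_dump_setting()) should report the small memory dump setting once option one has been applied.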

I don't have the latest BIOS; I have version F6b, which surprisingly is no longer on that BIOS list. It jumps from F5 to F10. I can try updating it to the latest.
The chipset drivers are all up to date. I checked them one by one in Device Manager under System Devices.
As for LAN/Wi-Fi, when I click that link, Intel says my drivers are also up to date (I had to install their DSA).
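A side note on the F5/F6b/F10 naming: comparing these names as plain text gets the order wrong, because "F6b" sorts after "F10" character by character. A minimal Python sketch of a numeric sort key, assuming the pattern is "F" + revision number + an optional lowercase letter; treating the letter only as a tie-breaker is my assumption, not Gigabyte's documented scheme.

```python
import re
from typing import Tuple


def bios_key(version: str) -> Tuple[int, str]:
    """Sort key for BIOS names like "F5", "F6b" or "F10".

    Assumes "F" + revision number + optional lowercase letter; the letter
    only breaks ties between releases with the same number.
    """
    match = re.fullmatch(r"[Ff](\d+)([a-z]?)", version.strip())
    if match is None:
        raise ValueError(f"unexpected BIOS version format: {version!r}")
    return int(match.group(1)), match.group(2)
```

With this key, sorted(["F10", "F5", "F6b"], key=bios_key) puts F6b between F5 and F10, which matches the list on the support page.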

It seems only the BIOS is out of date, so I guess I'll go tackle that now and see what happens.

Update:
Should I get rid of WattMan? I think it was installed automatically, or it's built into AMD Radeon Software. I don't really need it.

These are my drives: https://imgur.com/a/UCyCc0K

C: is the NVMe drive, D: is the 2.5" one.
 

Diablosrouge

I ran an extended self-test on the C: drive, the P1 500GB, and it passed. I'm testing D: now, which will take a while.
About the chipset drivers: when I checked them one by one, I compared each driver version and it matched the ones in the AMD Chipset installation package. The only thing I installed (just now) was the AMD Power Plan.

Should I proceed with the NVMe firmware update and then the BIOS?
 

Diablosrouge

I haven't touched the page file; it's at the default setting. The system hasn't crashed again yet, and the extended self-test on the D: drive is at 60% at the moment.
One thing I'm noticing is that the scan's progress bar keeps jumping back and forth between the actual percentage and 10%. Is this normal? It keeps scanning and the actual percentage keeps increasing, but it keeps dropping back to 10% before jumping back to the correct value. Now it's at 70%.
 

Colif

It's bad enough that it's a WHEA error; it not even supplying dumps just adds another layer of fun...
Can you run this? It collects information about your system and creates a zip file. If you can upload the zip to a file-sharing website, it might show me a clue: https://www.sysnative.com/forums/pages/bsodcollectionapp/
It might not, as if Windows isn't reporting errors fully, it might not know either.

I have to go soon, as I am tired today.
 

Diablosrouge

A BSOD just happened again, once again with no minidump, even with the small memory dump option set.
I was running the BSOD collection app; it stalled at "Gathering Network Statistics" and then the system crashed to a BSOD. The D: drive self-test was at 80% as well.

https://imgur.com/a/k48eQZp
 

Colif

Hmmm, your system doesn't want to tell us anything, which makes it hard to guess where to start.

* Debugger could not find nt in module list, module list might be corrupt, error 0x80070057.
This didn't help, as it's the part of the dump that shows what was happening at the time.

Let's see how good Device Manager is.
Can you download and run DriverView - http://www.nirsoft.net/utils/driverview.html

All it does is look at the installed drivers; it won't install any (this is intentional, as third-party driver updaters often get it wrong).

When you run it, go into the View tab and set it to hide all Microsoft drivers; that will make the list shorter.

You can look through the drivers and try to find old ones. All I do is look at driver versions (or dates, if you're lucky enough to have any) to see what might have newer versions.

You're probably pretty up to date, but it might reveal a few drivers that don't show in Device Manager. I know RGB drivers don't show there.
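If DriverView's list is saved to a CSV file, the "hide Microsoft drivers, then look for old ones" pass can be sketched in Python. This is only a sketch under assumptions: the column names (Driver Name, Version, Company, Modified Date) and the day/month/year date format are guesses and may need adjusting to match the actual export; rows without a date are kept but moved to the end.

```python
import csv
import io
from datetime import datetime
from typing import List, Tuple


def non_microsoft_oldest_first(csv_text: str,
                               date_format: str = "%d/%m/%Y %H:%M:%S") -> List[Tuple[str, str, str]]:
    """Return (driver name, version, modified date) for non-Microsoft drivers, oldest first."""
    picked = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        company = (row.get("Company") or "").lower()
        if "microsoft" in company:
            continue  # same idea as hiding all Microsoft drivers in the GUI
        raw_date = (row.get("Modified Date") or "").strip()
        when = datetime.strptime(raw_date, date_format) if raw_date else None
        picked.append((when, row.get("Driver Name") or "", row.get("Version") or ""))
    # Oldest first; drivers without a date are moved to the end.
    picked.sort(key=lambda item: (item[0] is None, item[0] or datetime.min))
    return [(name, version, when.date().isoformat() if when else "")
            for when, name, version in picked]
```

The oldest entries at the top of the result are the ones worth checking for newer versions on the vendor's site.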

Also, the BSOD is becoming more and more frequent each time it happens.
It sounds like a hardware fault, then, and whichever part it is may be approaching breaking point.
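"More and more frequent" can be checked by measuring the gaps between crashes. A minimal Python sketch, assuming you note down each crash time (for example from the Event Viewer entries); the "every gap shorter than the one before" rule is a crude, strict criterion chosen for illustration, and the dates in the test are made up.

```python
from datetime import datetime
from typing import List


def hours_between(crashes: List[datetime]) -> List[float]:
    """Gaps in hours between consecutive crashes, in chronological order."""
    ordered = sorted(crashes)
    return [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(ordered, ordered[1:])]


def getting_more_frequent(crashes: List[datetime]) -> bool:
    """True when there are at least two gaps and each is shorter than the previous one."""
    gaps = hours_between(crashes)
    return len(gaps) >= 2 and all(nxt < prev for prev, nxt in zip(gaps, gaps[1:]))
```

Shrinking gaps point toward a part that is degrading rather than a one-off software fault, which fits the hardware theory.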

We should test it, then:
CPU
Prime95 - https://www.mersenne.org/download/
Prime95 How-To Guide: http://www.playtool.com/pages/prime95/prime95.html

RAM - Try running MemTest86 on each of your RAM sticks, one stick at a time, for up to 4 passes. The only error count you want is 0; anything higher could be the cause of the BSOD. Remove/replace RAM sticks with errors. MemTest86 runs from a bootable USB, so you don't need Windows to run it.

GPU - mainly benchmarks, to see if it crashes during them (I don't need the actual results)
https://geeks3d.com/furmark/

https://benchmark.unigine.com/heaven

Storage - you're checking that, or have already
PSU - I'd need evidence before blaming a Seasonic, and it doesn't feel like a power problem
MB - There are no real tests for the motherboard, so use process of elimination: if everything else works, it's time to get a third opinion and take the PC in for repairs. I don't want to guess it's the motherboard just to find it's not.
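The process of elimination described in the list above can be written down as a small decision rule. A minimal Python sketch, with hypothetical names: a failed test is the strongest lead, untested parts should be tested next, and only when everything testable passes does the untestable part (here the motherboard) become the suspect. A MemTest86 run counts as passed only with 0 errors.

```python
from typing import Dict, List, Optional


def remaining_suspects(results: Dict[str, Optional[bool]], untestable: List[str]) -> List[str]:
    """Pick what to look at next.

    results maps a component to True (test passed), False (test failed)
    or None (not tested yet). untestable lists parts with no real test.
    """
    failed = [name for name, passed in results.items() if passed is False]
    if failed:
        return failed  # a failing test is the strongest lead
    untested = [name for name, passed in results.items() if passed is None]
    if untested:
        return untested  # finish testing before blaming anything else
    return list(untestable)  # everything testable passed: what's left is the suspect
```

This mirrors the reluctance to guess the motherboard early: it only comes up once the other results are all in and clean.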

Anything else in the PC I missed?

I'm going to sleep soon.
I'll look at the Sysnative report in the morning.
 

Diablosrouge

Hello @Colif :)

After I updated the BIOS to the latest version, the computer stayed stable over the weekend, except for an AMD crash-report window that came up once. It appeared along with the AMD Radeon Software window; I submitted the report describing my problem, and then I couldn't open the AMD Adrenalin software until I rebooted. After that, everything was fine. This leads me to think these BSODs are related to the GPU, even though the GPU stress tests are solid.
However, today the BSODs came back with the same error code and no dumps. I remembered that my PC case, being mini-ITX (it's a Silverstone FTZ-01), uses a PCI-E riser to connect the GPU to the motherboard. My motherboard, being PCI-E 4.0 ready, automatically runs this slot at PCI-E 4.0, but the case is older than that and the riser is only PCI-E 3.0 capable, which could very well be the source of the BSODs.
I had already set the PCIe x16 slot to run at Gen3 in the BIOS, but for some reason it went back to Auto, so I went back into the BIOS, confirmed it was indeed on Auto, and changed it back to Gen3. I also re-enabled the XMP memory profile (3600 MHz), as the BSOD was not related to it.
I also disabled hardware acceleration in Discord and Chrome, as it's known to cause problems with AMD GPU drivers.
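The concern here is whether the riser can carry a Gen4 signal reliably; for a sense of what forcing Gen3 gives up, here is the raw link arithmetic as a Python sketch. It uses the standard per-lane rates (8 GT/s for Gen3, 16 GT/s for Gen4, both with 128b/130b encoding) and ignores protocol overhead, so real throughput is lower.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENS = {
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}


def link_bandwidth_gbytes(gen: int, lanes: int = 16) -> float:
    """Approximate one-direction bandwidth in GB/s, ignoring protocol overhead."""
    rate_gt, efficiency = PCIE_GENS[gen]
    return rate_gt * lanes * efficiency / 8  # 8 bits per byte
```

An x16 link works out to roughly 15.75 GB/s at Gen3 and 31.5 GB/s at Gen4, so the Gen3 setting halves the raw link rate.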

I'm hoping I can now find some stability, as I've set everything to run at Gen3. That said, do you have any idea where I can get a PCI-E 4.0 riser? My riser is not a flexible cable; it's a rigid, cartridge-like piece of hardware that fits into the GPU and plugs into the motherboard. I did some research and I can't find a similar one for Gen4, and Silverstone hasn't released an updated one yet. All I can find are some massive 4.0 extension cables, which are pretty expensive and way too big for what I need.

This is what I have: http://images.bit-tech.net/content_...one-raven-rvz01-review/rvz01-10-1280x1024.jpg

Update: Well, not even the PCI-E Gen3 setting in the BIOS made a difference. The BSOD happened again. I'm going to call it a day and just take the computer to the store to check for hardware problems and let the warranty do the rest.

Colif

My last two cases were both Silverstone: a TJ09 and an FT02. I still have the FT02; I don't want to let it go. It's based on the same frame as the RV02, which is still considered the best air-cooling case on the Gamers Nexus forums 10 years after release. They make good cases; I would have got another if it had been a normal year.

It could still be the GPU riser. Hopefully the shop will figure it out for you.

Re your question about PCIe 4.0 risers, I would start another thread and ask about that, as I really don't know, but someone else may.