[SOLVED] Over the last month or so, I've been getting random BSODs

Jan 15, 2022
18
3
15
Hello

I have the following system:

AMD Ryzen 5 4650G
Asus B550m-Plus
32 GB (4 x 8 GB) RAM Crucial Ballistix 3200MT/s (XMP)
512 GB M.2 NVMe Adata XPG
550W Super Flower 80 Plus Gold power supply

For the last month or so, I've been having random BSODs whose cause I can't find.

At first I thought it might be something memory-related. I built this rig with 2 x 8 GB of RAM and bought another 2 sticks a while later, but I tried running the system with only the old sticks and still got the BSODs. Running it with only the new sticks did the same. I also ran memtest86 and it didn't detect any problems (4 passes only, which is all the free version allows). Running the memory with the XMP profile disabled produced the same crashes.

I also thought it might be heat-related. It's summer here, and it's impossibly hot. I also bought a second display, so I thought that maybe the extra strain on the chipset was pushing it over the limit. But I bought a fan kit to add some airflow, and in my testing both the CPU and chipset are running at a relatively cool 40 °C.

After having the presence of mind to uncheck the "Restart automatically" checkbox, I was able to see the error messages.

Last night, the BSOD was a KMODE_EXCEPTION_NOT_HANDLED, which, from my quick research, is a catch-all term for when the system has no idea what went wrong, so it wasn't very helpful.

Today, just a little while ago, I got a KERNEL_AUTO_BOOST_INVALID_LOCK_RELEASE. While looking up this error, I found this old thread: https://forums.tomshardware.com/threads/bsod-kernel-auto-boost-invalid-lock-release-help.3119950/
Unfortunately, the person with the problem stopped answering, and the question remains unsolved. But I saw someone mention that it might be a CPU error, and that worried me a bit.

Detecting hardware errors at this level is a bit beyond my current capabilities, and I don't even have the tools to test for them. My CPU is still under warranty with the store, but I'd need some way to prove to them that these completely random crashes, whose cause I can't identify, are actually the CPU's fault.

I have a minidump folder, which I guess holds the last 5 crashes. I put everything in a zip and hosted it here: https://drive.google.com/file/d/189CxMx4eKZ1jLh1TLk6MpWveSj6FPgkH/view?usp=sharing
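For anyone who wants to bundle their dumps the same way, here is a minimal Python sketch. It assumes the dumps are in the default Windows location, C:\Windows\Minidump (reading that folder usually needs administrator rights); the function name `zip_minidumps` and the archive name `minidumps.zip` are made up for this example.

```python
import zipfile
from pathlib import Path


def zip_minidumps(dump_dir, archive_path):
    """Zip every .dmp file in dump_dir, newest first, and return their names."""
    dumps = sorted(
        Path(dump_dir).glob("*.dmp"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for dump in dumps:
            # Store only the file name so the archive has no folder structure.
            archive.write(dump, arcname=dump.name)
    return [dump.name for dump in dumps]


if __name__ == "__main__":
    # Assumption: default minidump folder on a standard Windows install.
    default_dir = Path(r"C:\Windows\Minidump")
    if default_dir.is_dir():
        print(zip_minidumps(default_dir, "minidumps.zip"))
    else:
        print("Minidump folder not found; pass your own path to zip_minidumps().")
```

The resulting zip can then be uploaded to any file host and linked in the thread.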


Can someone help me here?
 

gardenman

Splendid
Moderator
I ran the dump file through the debugger and got the following information: https://jsfiddle.net/ebnx0rks/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.

File information: 012222-4968-01.dmp (Jan 22 2022 - 13:34:58)
Bugcheck: CLOCK_WATCHDOG_TIMEOUT (101)
Probably caused by: Unknown_Image (Process running at time of crash: System)
Uptime: 0 Day(s), 0 Hour(s), 06 Min(s), and 29 Sec(s)

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.

Edit:
Results for 3 additional dumps posted below: https://jsfiddle.net/8ykm9wsg/show
File information: 012222-5421-01.dmp (Jan 22 2022 - 16:00:43)
Bugcheck: KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by: memory_corruption (Process running at time of crash: System)
Uptime: 0 Day(s), 1 Hour(s), 47 Min(s), and 01 Sec(s)

File information: 012222-5031-01.dmp (Jan 22 2022 - 16:36:49)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: ntkrnlmp.exe (Process running at time of crash: prime95.exe)
Uptime: 0 Day(s), 0 Hour(s), 35 Min(s), and 22 Sec(s)

File information: 012222-4703-01.dmp (Jan 22 2022 - 16:45:50)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Driver warnings: *** WARNING: Unable to verify timestamp for amdkmdag.sys
Probably caused by: memory_corruption (Process running at time of crash: dwm.exe)
Uptime: 0 Day(s), 0 Hour(s), 05 Min(s), and 07 Sec(s)
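
For anyone comparing several of these summaries side by side, here is a minimal Python sketch that tallies them. It assumes the field labels used in this post (File information, Bugcheck, Probably caused by, Uptime) and hard-codes the four results above; the names `parse_summary` and `uptime_seconds` are made up for illustration and are not part of any debugger tooling.

```python
import re
from collections import Counter

SUMMARY = """\
File information: 012222-4968-01.dmp (Jan 22 2022 - 13:34:58)
Bugcheck: CLOCK_WATCHDOG_TIMEOUT (101)
Probably caused by: Unknown_Image (Process running at time of crash: System)
Uptime: 0 Day(s), 0 Hour(s), 06 Min(s), and 29 Sec(s)

File information: 012222-5421-01.dmp (Jan 22 2022 - 16:00:43)
Bugcheck: KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by: memory_corruption (Process running at time of crash: System)
Uptime: 0 Day(s), 1 Hour(s), 47 Min(s), and 01 Sec(s)

File information: 012222-5031-01.dmp (Jan 22 2022 - 16:36:49)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: ntkrnlmp.exe (Process running at time of crash: prime95.exe)
Uptime: 0 Day(s), 0 Hour(s), 35 Min(s), and 22 Sec(s)

File information: 012222-4703-01.dmp (Jan 22 2022 - 16:45:50)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Driver warnings: *** WARNING: Unable to verify timestamp for amdkmdag.sys
Probably caused by: memory_corruption (Process running at time of crash: dwm.exe)
Uptime: 0 Day(s), 0 Hour(s), 05 Min(s), and 07 Sec(s)
"""

# Only these four labels are parsed; other lines (e.g. driver warnings) are skipped.
FIELD = re.compile(r"^(File information|Bugcheck|Probably caused by|Uptime):\s*(.*)$")
UPTIME = re.compile(r"(\d+) Day\(s\), (\d+) Hour\(s\), (\d+) Min\(s\), and (\d+) Sec\(s\)")


def uptime_seconds(text):
    """Convert an 'N Day(s), N Hour(s), N Min(s), and N Sec(s)' string to seconds."""
    days, hours, minutes, seconds = map(int, UPTIME.search(text).groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_summary(text):
    """Split the summary into one dict per dump; a new 'File information' line starts a record."""
    records, current = [], {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if not match:
            continue
        key, value = match.groups()
        if key == "File information" and current:
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


records = parse_summary(SUMMARY)
bugchecks = Counter(r["Bugcheck"].split(" (")[0] for r in records)
causes = Counter(r["Probably caused by"].split(" (")[0] for r in records)

print("Bugchecks:", dict(bugchecks))
print("Suspected causes:", dict(causes))
print("Uptimes (s):", [uptime_seconds(r["Uptime"]) for r in records])
```

Four different bugchecks across four dumps, with short and uneven uptimes, is the kind of spread the tally makes easy to see at a glance.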
 
  • Like
Reactions: Sou Suzumi
New dump

BSOD while watching a YouTube video with almost no programs open.

KMODE_EXCEPTION_NOT_HANDLED

Maybe I'm overwhelming you with a bunch of minidumps, but whatever I can get that shows the store that "yep, the CPU is the problem" will help a lot.

https://drive.google.com/file/d/1AIKt1ODFno5RIKx2Ofw6qYv17VRSJuHI/view?usp=sharing

EDIT: So I've tried running Prime95 with fewer cores.
Running it with 1 core, it doesn't seem to throw any errors, but I didn't leave it running for long.
Running it with 2 cores, the system shows an error on Core #2, as in the picture below:
2-cores.png

After a while (one minute or so) the system crashes and BSODs

With 3 cores, the program shows errors on cores #2 and #3, and also BSODs after not too long
3-cores.png


With 4 cores, the program crashes and closes as soon as I start the test. I tried to take a quick picture, but it shows nothing; the program doesn't even show the errors on cores #2 and #3 before closing.

With 5 and 6 cores, the system reboots without a BSOD and without generating a minidump. I guess cores #4 and #5 also have problems and it becomes too much for the system to handle.
A quick pic before the reboot with 5 cores shows the same errors on cores #2 and #3
5-cores.png


I also have the new minidumps I've managed to get while running Prime 95 with only 2 and 3 cores, here: https://drive.google.com/file/d/10HFYR48_H69AuyodjzL665tcckKy359a/view?usp=sharing
 

Colif

Win 11 Master
Moderator
Rounding errors will cause BSODs. There's no point looking at dumps until we stop the rounding errors. If you still BSOD after that, we can look at new dumps.
@gardenman, don't convert the dumps above.

Perhaps I was too quick to say it's not RAM.
It can be "fixed" by upping the voltage on the RAM in the BIOS.

It's a power thing, but I don't know if it's the PSU or just not enough power going to the RAM.
 
  • Like
Reactions: Sou Suzumi
Solution
Colif said:
Perhaps I was too quick to say it's not RAM.
It can be "fixed" by upping the voltage on the RAM in the BIOS.

It's a power thing, but I don't know if it's the PSU or just not enough power going to the RAM.

OK, so after you said that, I decided to do something that, in retrospect, I probably should have done first: resetting my BIOS to default settings.
It hadn't occurred to me to do that because I hadn't really changed anything; the only thing I did was turn XMP off and on when I was testing whether it was the culprit behind the crashes.

However, when I tried to save and exit, the BIOS warned me that a bunch of settings had changed, including the RAM voltage, which had been set to 1.35 V and is now on "auto". So maybe when I changed the XMP setting it set some values manually?
Either way, HWiNFO is showing me the RAM is working at 3200 MHz, which means XMP is enabled, so the default BIOS settings have it active. So I can't understand why turning it on by itself set those settings to fixed values (the frequency and timings were also set to specific numbers and are now all on "auto").

The good news is that putting everything on "auto" solved the Prime95 stress test problem. It's been running for about 10 minutes and has neither shown errors nor crashed the system, which is at least 10 minutes longer than it managed before.

So I'm assuming I was having some weird errors that were driver-related and when testing to see if the RAM was at fault I somehow made everything worse. Cool.

Now we test again.
 
  • Like
Reactions: Colif
OK, so as an update, I've been testing the system for almost a week and it seems it's stable now.

As an addendum, I had misread the HWiNFO information and had to turn XMP on again, which changed the frequency, timings and voltage back to what they were before.
However, even with the RAM settings back on the specific values (1.35 V, etc.), the system hasn't crashed again, and I ran Prime95 overnight and found no errors (I noticed XMP wasn't enabled and turned it back on in the BIOS on Jan 23).

So, I assume there was some other non-default option in the BIOS that was causing problems.

In the end, the solution was:
- updating drivers
- resetting the BIOS to default settings.

I'll flag the question as "solved", thanks for the help everyone.
 
  • Like
Reactions: gardenman and Colif