[SOLVED] Over the last month or so, I've been getting random BSODs ?


gardenman

Distinguished
Moderator
I ran the dump file through the debugger and got the following information: https://jsfiddle.net/ebnx0rks/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.

File information: 012222-4968-01.dmp (Jan 22 2022 - 13:34:58)
Bugcheck: CLOCK_WATCHDOG_TIMEOUT (101)
Probably caused by: Unknown_Image (Process running at time of crash: System)
Uptime: 0 Day(s), 0 Hour(s), 06 Min(s), and 29 Sec(s)

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.

Edit:
Results for 3 additional dumps posted below: https://jsfiddle.net/8ykm9wsg/show
File information: 012222-5421-01.dmp (Jan 22 2022 - 16:00:43)
Bugcheck: KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by: memory_corruption (Process running at time of crash: System)
Uptime: 0 Day(s), 1 Hour(s), 47 Min(s), and 01 Sec(s)

File information: 012222-5031-01.dmp (Jan 22 2022 - 16:36:49)
Bugcheck: IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by: ntkrnlmp.exe (Process running at time of crash: prime95.exe)
Uptime: 0 Day(s), 0 Hour(s), 35 Min(s), and 22 Sec(s)

File information: 012222-4703-01.dmp (Jan 22 2022 - 16:45:50)
Bugcheck: SYSTEM_SERVICE_EXCEPTION (3B)
Driver warnings: *** WARNING: Unable to verify timestamp for amdkmdag.sys
Probably caused by: memory_corruption (Process running at time of crash: dwm.exe)
Uptime: 0 Day(s), 0 Hour(s), 05 Min(s), and 07 Sec(s)
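For anyone who wants to compare the four summaries side by side, here is a rough Python sketch that parses the text exactly as it is posted in this thread and tallies the bugchecks, suspected causes, and uptimes. The function name `parse_summaries` and the record keys are made up for this illustration; this is not the tool that produced the summaries, and it assumes the bugcheck numbers in parentheses are hexadecimal.

```python
import re
from collections import Counter

# Summaries copied from the posts above (a space after the colon is optional).
SUMMARIES = """\
File information:012222-4968-01.dmp (Jan 22 2022 - 13:34:58)
Bugcheck:CLOCK_WATCHDOG_TIMEOUT (101)
Probably caused by:Unknown_Image (Process running at time of crash: System)
Uptime:0 Day(s), 0 Hour(s), 06 Min(s), and 29 Sec(s)

File information:012222-5421-01.dmp (Jan 22 2022 - 16:00:43)
Bugcheck:KMODE_EXCEPTION_NOT_HANDLED (1E)
Probably caused by:memory_corruption (Process running at time of crash: System)
Uptime:0 Day(s), 1 Hour(s), 47 Min(s), and 01 Sec(s)

File information:012222-5031-01.dmp (Jan 22 2022 - 16:36:49)
Bugcheck:IRQL_NOT_LESS_OR_EQUAL (A)
Probably caused by:ntkrnlmp.exe (Process running at time of crash: prime95.exe)
Uptime:0 Day(s), 0 Hour(s), 35 Min(s), and 22 Sec(s)

File information:012222-4703-01.dmp (Jan 22 2022 - 16:45:50)
Bugcheck:SYSTEM_SERVICE_EXCEPTION (3B)
Driver warnings:*** WARNING: Unable to verify timestamp for amdkmdag.sys
Probably caused by:memory_corruption (Process running at time of crash: dwm.exe)
Uptime:0 Day(s), 0 Hour(s), 05 Min(s), and 07 Sec(s)
"""

# Only these four labels are parsed; other lines (e.g. driver warnings) are skipped.
FIELD = re.compile(r"^(File information|Bugcheck|Probably caused by|Uptime):\s*(.*)$")
UPTIME = re.compile(r"(\d+) Day\(s\), (\d+) Hour\(s\), (\d+) Min\(s\), and (\d+) Sec\(s\)")


def parse_summaries(text):
    """Return one dict per dump: file, bugcheck, code, cause, uptime_s."""
    records, current = [], {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if not match:
            continue
        key, value = match.groups()
        if key == "File information" and current:
            # A new "File information" line starts the next dump record.
            records.append(current)
            current = {}
        if key == "Bugcheck":
            name, code = re.match(r"(\w+) \((\w+)\)", value).groups()
            current["bugcheck"] = name
            current["code"] = int(code, 16)  # assumption: codes are hex
        elif key == "Probably caused by":
            current["cause"] = value.split(" (")[0]
        elif key == "Uptime":
            days, hours, mins, secs = map(int, UPTIME.match(value).groups())
            current["uptime_s"] = ((days * 24 + hours) * 60 + mins) * 60 + secs
        else:
            current["file"] = value.split(" ")[0]
    if current:
        records.append(current)
    return records


records = parse_summaries(SUMMARIES)
cause_counts = Counter(record["cause"] for record in records)
shortest_uptime = min(record["uptime_s"] for record in records)

if __name__ == "__main__":
    for record in records:
        print(record)
    print(cause_counts.most_common())
    print("shortest uptime (s):", shortest_uptime)
```

Four different bugchecks with different suspected causes in one afternoon is the kind of spread this makes easy to see at a glance.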
 
Reactions: Sou Suzumi
Jan 15, 2022
17
3
15
0
New dump

BSOD while watching a YouTube video with almost no programs open.

KMODE_EXCEPTION_NOT_HANDLED

Maybe I'm overwhelming you with a bunch of minidumps, but anything I can get that shows the store that "yep, CPU is the problem" will help a lot.

https://drive.google.com/file/d/1AIKt1ODFno5RIKx2Ofw6qYv17VRSJuHI/view?usp=sharing

EDIT: So I've tried running Prime95 with fewer cores.
Running it with 1 core, it doesn't seem to throw any errors, but I didn't leave it running for long.
Running it with 2 cores, the program shows an error on Core #2.

After a while (one minute or so), the system crashes with a BSOD.

With 3 cores, the program shows errors on cores #2 and #3, and the system also BSODs before long.

With 4 cores, the program crashes and closes as soon as I start the test. I tried to take a quick picture, but it shows nothing; the program doesn't even show the errors on cores #2 and #3 before closing.

With 5 and 6 cores, the system reboots without a BSOD and without generating a minidump. I guess cores #4 and #5 also have problems and it becomes too much for the system to handle.
A quick pic before the reboot with 5 cores shows the same errors on cores #2 and #3.


I also have the new minidumps I've managed to get while running Prime 95 with only 2 and 3 cores, here: https://drive.google.com/file/d/10HFYR48_H69AuyodjzL665tcckKy359a/view?usp=sharing
 

Colif

Win 11 Master
Moderator
Rounding errors will cause BSODs. There's no point looking at dumps until we stop the rounding errors. If you still BSOD after that, we can look at new dumps.
@gardenman don't convert the above dumps.
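For anyone wondering what a "rounding error" check looks like in practice, here is a toy Python sketch. It multiplies two integers with a floating-point FFT and checks how far each result lands from a whole number, which is the general kind of sanity check FFT-based multiplication allows. This is only an illustration under assumptions: it is not Prime95's code, the function names are made up, and the 0.4 limit is an assumed threshold chosen for the example. On healthy hardware the deviation should be tiny; an unstable CPU or RAM can push results far enough off to trip a check like this.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def multiply_with_roundoff_check(a_digits, b_digits, limit=0.4):
    """Multiply two numbers given as digit lists (least significant first).

    Returns (coefficients, worst), where worst is the largest distance of
    any raw FFT result from the nearest integer. Raises ArithmeticError
    if worst exceeds limit (hypothetical threshold for this sketch).
    """
    size = 1
    while size < len(a_digits) + len(b_digits):
        size *= 2
    fa = fft([complex(d) for d in a_digits] + [0j] * (size - len(a_digits)))
    fb = fft([complex(d) for d in b_digits] + [0j] * (size - len(b_digits)))
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)

    coefficients, worst = [], 0.0
    for value in product:
        real = value.real / size  # the inverse transform above is unnormalized
        nearest = round(real)
        worst = max(worst, abs(real - nearest))
        coefficients.append(nearest)
    if worst > limit:
        raise ArithmeticError(f"round-off {worst:.3f} exceeds limit {limit}")
    return coefficients, worst


def digits_to_int(coefficients, base=10):
    """Fold convolution coefficients back into one integer (handles carries)."""
    return sum(c * base**i for i, c in enumerate(coefficients))


if __name__ == "__main__":
    coeffs, worst = multiply_with_roundoff_check([4, 3, 2, 1], [8, 7, 6, 5])
    print(digits_to_int(coeffs), "worst round-off:", worst)
```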

Perhaps I was too quick saying it's not RAM.
It can be "fixed" by upping the RAM voltage in the BIOS.

It's a power thing, but I don't know if it's the PSU or just not enough power going to the RAM.
 
Reactions: Sou Suzumi
OK, so after you said that, I decided to do something that, in retrospect, I probably should have done first: resetting my BIOS to default settings.
It hadn't occurred to me to do that because I didn't really change anything; the only thing I did was turn XMP off and on when I was testing whether it was the culprit behind the crashes.

When I tried to save and exit, the BIOS warned me that a bunch of settings had changed, including the RAM voltage, which had been set to 1.35V and is now on "auto". So maybe when I changed the XMP setting, it set some values manually?
Either way, HWiNFO is showing me the RAM is working at 3200MHz, which means XMP is enabled, so the default BIOS settings have it active. So I can't understand why turning it on by itself set those settings to fixed values (the frequency and timings were also set to specific numbers and are now all on "auto").

The good news is that putting everything on "auto" solved the Prime95 stress test problem. It's been running for about 10 minutes and has neither shown errors nor crashed the system, and that's, like, at least 10 minutes longer than it was working before.

So I'm assuming I was having some weird errors that were driver-related and when testing to see if the RAM was at fault I somehow made everything worse. Cool.

Now we test again.
 
Reactions: Colif
OK, so as an update, I've been testing the system for almost a week and it seems it's stable now.

As an addendum, I misread the HWiNFO information and had to turn XMP on again, which changed the frequency, timings and voltage back to what they were before.
However, even with the RAM settings back on the specific values (1.35V, etc.), the system hasn't crashed again, and I ran Prime95 overnight and found no errors (I noticed XMP wasn't enabled and re-enabled it in the BIOS back on Jan 23).

So, I assume there was some other non-default option in the BIOS that was causing problems.

In the end, the solution was:
- updating drivers
- resetting the BIOS to default settings.

I'll flag the question as "solved". Thanks for the help, everyone.
 
Reactions: gardenman and Colif
