Question: Windows 11 - random BSODs with different stop codes?

DarkThroat

Distinguished
Feb 15, 2016
45
1
18,535
Okay, where do I even begin with this? About a month ago, I had upgraded my GPU to an RTX 4080 and everything was fine for a number of weeks.

Then I started encountering this issue where my monitors would go black and all the fans in my system would ramp up to full speed. I feel I was able to narrow this down to the computer being physically bumped, as it would consistently happen whenever my knee bumped into it.

This persisted randomly for a couple weeks until suddenly I started encountering blue screen errors, each time with a different stop code such as UNEXPECTED_KERNEL_MODE_TRAP and SYSTEM_SERVICE_EXCEPTION.

I opted to completely reset my installation of Windows 11 to factory defaults to see if this fixed it, and it seemingly did for a full day. I was even able to play graphically intensive games for several hours without encountering blue screens, and it seemed that the strange bug where my monitors went black and fans ramped up had been curbed too.

That is, until the next day, when I started getting blue screens again. I only encountered three over a 12-hour period, but my system was running abnormally sluggishly after the third one. I tried a system restore back to when I had reinstalled Windows, and that has not helped.

I also tried running these two commands in PowerShell, and they turned up no errors or concerns:

Dism /online /cleanup-image /restorehealth
SFC /scannow

These blue screen errors are both frustrating and concerning. It feels impossible to pinpoint the exact cause of them because they'll occur seemingly at random, whether I'm watching a video on Twitter or idling on my desktop, occurring anytime between minutes after logging into Windows to several hours of being online.

tl;dr - Random blue screens with different stop codes are occurring

What I've done to try and fix this:
  • Reseating my GPU
  • Reseating my RAM
  • Updating my BIOS
  • Resetting my BIOS settings to default
  • Swapping back to my RTX 2080
  • Reinstalling Windows 11 (twice)

Here's my system configuration if it helps at all:
Motherboard: Asus Prime Z390-A
GPU: RTX 4080 (vertically mounted with a CoolerMaster GPU mount/riser card kit if that matters)
CPU: i9-9900KF
CPU Cooler: CoolerMaster ML240L
RAM: 32GB G.Skill Trident Z Neo
PSU: CORSAIR RMx 850W ATX12V
Storage: Multiple SSDs and HDDs, but Windows is installed on a 256GB Samsung SSD (I think it's an 840 Evo?)

Thanks for any input that may serve to fix these issues
 

Ralston18

Titan
Moderator
How old is the PSU? Condition (original to build, new, refurbished, used)?

History of heavy use for gaming or video editing?

Look in Reliability History/Monitor and Event Viewer.

Either one or both tools may be capturing some error codes, warnings, or even informational events just before or at the time of the BSODs.
 

DarkThroat

PSU is just a few years old, got it brand new in June of 2020, but yes the PC's primary functions are high fidelity gaming, video/music production and 3D modeling/rendering.

Checking Reliability History shows three critical events each time Windows blue screens. When viewing the technical details for 'Windows stopped working,' it displays a description that reads as such:

"The computer has rebooted from a bugcheck. The bugcheck was: 0x0000003b (0x00000000c0000005, 0xfffff8074f032185, 0xffffa60860e66fc0, 0x0000000000000000). A dump was saved in: C:\WINDOWS\MEMORY.DMP. Report Id: abc8e740-4aa0-4a96-8520-3a721d3f89d9."

Each time the system stops working, it produces the same description, but the bugcheck code and parameters differ each time. I could try to upload the most recent dump file somewhere, but it's over 1GB in size. Strangely enough, it generated this error and dump file at a time I booted the computer into safe mode, not when it blue screened.

Event Viewer shows a Kernel-Power error with an Event ID of 41 and a Task Category of 63 each time the system blue screened, showing a description that reads as such:

The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

Hope this information helps.
 

Ralston18

Forgo the mini-dump for now unless someone specifically asks to see it.

= = = =

Overall, I think the PSU is a likely suspect.

The new GPU may have required/demanded more wattage (normal and/or peak) than the PSU was able to provide.

Any means to swap in another known working PSU of 850 watts or higher? Remember to use only the cables (as appropriate) that come with the substitute PSU.

If you have a multimeter and know how to use it (or know someone who does) the PSU can be tested to some extent.

Not a full test because the PSU is not under load. However, any voltages out of tolerance would make the PSU even more suspect.
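
For anyone taking those multimeter readings, here is a rough sketch in Python of what "out of tolerance" could look like in practice. It assumes the commonly cited ATX tolerance of plus or minus 5% on the +3.3V, +5V and +12V rails; the function names and the sample readings are made up for illustration and do not come from this system.

```python
# Nominal ATX rail voltages with an assumed +/-5% tolerance on each.
ATX_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def rail_in_tolerance(rail: str, measured: float) -> bool:
    """Return True if a measured voltage is within tolerance of nominal."""
    nominal = ATX_RAILS[rail]
    return abs(measured - nominal) <= nominal * TOLERANCE


def check_readings(readings: dict[str, float]) -> dict[str, bool]:
    """Map each rail name to whether its reading is in tolerance."""
    return {rail: rail_in_tolerance(rail, volts) for rail, volts in readings.items()}


if __name__ == "__main__":
    # Hypothetical multimeter readings, not measured on this PC.
    sample = {"+3.3V": 3.31, "+5V": 5.08, "+12V": 11.2}
    for rail, ok in check_readings(sample).items():
        print(rail, "OK" if ok else "OUT OF TOLERANCE")
```

As noted above, an unloaded reading that passes does not clear the PSU; a reading that fails just makes it more suspect.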

FYI:

https://www.lifewire.com/how-to-manually-test-a-power-supply-with-a-multimeter-2626158

= = = =

As an interim effort:

Power down, unplug, open the case.

Clean out dust and debris.

Verify by sight & feel that all connectors, cards, RAM, jumpers, and case connections are fully and firmly in place.

Use a bright flashlight to inspect for signs of damage.
 

DarkThroat

Are the error codes I provided leading you to believe it's a PSU problem? At this time, I don't have a spare that I could use for testing purposes, but I'm going to be getting an Amazon gift card very soon that I could use to purchase a newer one, probably a 1000w just to give myself extra headroom.

I had dusted my computer out prior to installing the new GPU, so it should be as clean as can be. I had also done my research prior to making my purchase, and it seemed my 850W would be enough to handle it, but maybe the GPU is overworking the PSU at its age?

There was also a period a couple years ago where I had my CPU overclocked to 5.0GHz, but that proved to be a bit unstable after getting a new motherboard, so I dialed it down to 4.8GHz. I was encountering boot loops when I first installed the 4080, so I replaced the CMOS battery on the motherboard, which I believe reset all my settings and undid my overclock (though strangely Task Manager still shows the CPU running at 4.6GHz even though it lists its base clock speed as 3.6GHz). Think the overclock could've reduced the lifespan of the PSU and the 4080 is the straw that broke its back?

I have a friend that works as a contractor, so I could ask if he has a multimeter. If he does, I'll see if he can swing around my place to run some tests and I'll report back.
 

Ralston18

Take a deeper look into the error codes, etc. captured by Reliability History and Event Viewer.

Reliability History is much more user friendly and the timeline format can be very revealing.

Event Viewer requires more time and effort to navigate and understand.

To help with Event Viewer:

How To - How to use Windows 10 Event Viewer | Tom's Hardware Forum (tomshardware.com)

How to use Event Viewer on Windows 10 | Windows Central

An increasing number of errors, and errors that vary over time, are to me an indicator of a failing or faltering PSU.

Another thing you can do:

https://www.tomshardware.com/reviews/best-psus,4229.html

Not with the immediate intent to buy a new PSU. Simply use the suggested calculators to estimate the load that is, or may be, placed on the PSU.

And do your own manual component listing and total up the wattage. Then add 25% more.

If any given component provides a range of wattage requirements, then use the high-end value.
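
The manual tally described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (sum the high-end values, then add 25%); the function name and the wattage figures in the example are placeholder assumptions, not measured or vendor-confirmed values for any specific build.

```python
import math

HEADROOM = 0.25  # the 25% margin suggested above


def recommended_psu_watts(components: dict[str, float], headroom: float = HEADROOM) -> int:
    """Sum the high-end wattage of every component, then add headroom."""
    total = sum(components.values())
    return math.ceil(total * (1 + headroom))


if __name__ == "__main__":
    # Illustrative figures only; look up the real high-end value for each part.
    build = {
        "GPU": 320,
        "CPU": 200,
        "motherboard, RAM, drives, fans": 100,
    }
    print("Suggested minimum PSU wattage:", recommended_psu_watts(build))
```

Compare the result against the PSU's rated wattage; if the two are close, there is little margin left for transient spikes.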
 

DarkThroat

Taking a look at the errors presented in Reliability Monitor, these are the bugchecks produced over the last few days at the time of each blue screen:

6/18/23
8:23am - 0x0000003b (0x00000000c0000005, 0xfffff8074f032185, 0xffffa60860e66fc0, 0x0000000000000000)
9:40am - 0x0000000a (0xffff9181c65910b0, 0x0000000000000002, 0x0000000000000000, 0xfffff801238320c3)
5:56pm - 0x000000d1 (0x000000000001006e, 0x0000000000000002, 0x0000000000000008, 0x000000000001006e)

6/19/23
9:18pm - 0x0000007f (0x0000000000000008, 0xffffc401c12c1e50, 0xffff9b0224090f80, 0xfffff8062cc2de10)

6/20/23 (This one did not occur during a blue screen, it occurred when booting to safe mode)
8:42am - 0x0000003b (0x00000000c0000005, 0xfffff8066887a7c4, 0xffffe305acf56d20, 0x0000000000000000)
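
For reference, the first number in each entry is the stop code and the four values in parentheses are its parameters. A small Python sketch, under the assumption that the text is copied in exactly the format shown above, pulls those apart and maps the code to its symbolic name. The lookup table covers only the four codes seen in this thread, and the function name is made up for illustration.

```python
import re

# Symbolic names for the stop codes that appear in this thread only.
STOP_CODE_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}

# Matches "0x0000003b (p1, p2, p3, p4)" as printed by Reliability Monitor.
BUGCHECK_RE = re.compile(r"(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")


def parse_bugcheck(text: str) -> tuple[str, list[int]]:
    """Return (stop code name, list of parameters) from one bugcheck line."""
    match = BUGCHECK_RE.search(text)
    if match is None:
        raise ValueError("no bugcheck found in text")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    name = STOP_CODE_NAMES.get(code, f"UNKNOWN_0x{code:X}")
    return name, params


if __name__ == "__main__":
    line = "9:18pm - 0x0000007f (0x0000000000000008, 0xffffc401c12c1e50, 0xffff9b0224090f80, 0xfffff8062cc2de10)"
    name, params = parse_bugcheck(line)
    print(name, [hex(p) for p in params])
```

This only labels the codes; it says nothing about the root cause, which still needs the dump files or hardware testing.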

Following this most recent error, Reliability Monitor also produced two errors at 8:43am, showing "Windows failed to start because of missing system files" with technical details reading as "Windows was unable to determine the problem. Error code: 0x3b"

Another thing of note is that among the trio of errors produced at each blue screen, there's one that reads "The previous system shutdown at XX:XX:XX AM/PM on 6/XX/2023 was unexpected." However, the times listed are not times when the PC had been shut down at all. For example, the error at 8:23am on 6/18 shows a time of 8:17:12 AM the same day, but the computer was up and running at that time.

Also, it seems my contractor friend doesn't have a multimeter, so I feel my options would either be to buy/rent a tool that I don't really know how to use or commit to purchasing a new PSU which at this point is something I'm more comfortable doing.
 

Ralston18

Unfortunately, I am not at all familiar (full disclosure) with what those bugchecks may be indicating. Hopefully the developers can understand it all...

"Missing system files". For the most part "sfc /scannow" and "dism" would fix such things - and maybe did so at some point but the failures have again corrupted the fixed files or perhaps other files.

The PC does not have to be explicitly shut down for Windows to believe a shutdown has occurred. Just the briefest glitch in power, a voltage dip perhaps, could trigger the shutdown sequences and/or garble the process, making things worse.

On face value I am still thinking PSU.

Poke around in Event Viewer some. No need to rush through it all.

And, late thought, check Update History for any failed or problem updates.
 

DarkThroat

I appreciate your help regardless, cuz you've at least helped narrow what the cause of the errors could be.

The only Critical Event errors that Event Viewer is listing are the same repeated "Event 41, Kernel-Power" error at the time of each blue screen. Each one has a description that reads as:

The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

The details tab for each error also lists the same bugcheck parameters that I mentioned earlier.

Over the last few days since resetting Windows, Event Viewer has logged 624 errors in the Error Event tab. In no particular order, some of the errors are listed as:

  • Kernel-EventTracing
  • AppModel-State
  • AppReadiness
  • SecurityCenter (this one's interesting cuz I've also been having problems with Windows Defender since resetting Windows)
  • CertificateServicesClient-CertEnroll
  • BonjourService
  • volmgr
  • ESENT
  • AppModel-Runtime
  • DeviceSetupManager
  • Client-Licensing
  • DeviceManagement-Enterprise-Diagnostics-Provider
  • Application Error
  • BugCheck


Many of those errors are repeats throughout the three days and these are not all the different errors that are listed. As such, there are too many errors to provide individual details on.
 

DarkThroat

UPDATE: I looked into some of the power draw calculators in the forum thread you linked earlier and it's looking like the estimated draw for my build is between 750-800w, which isn't leaving me a lot of headroom on my current 850w power supply. Guess I should've done a bit more research before I bought the 4080.

I'm no expert, but I guess what's happening is that the PSU has lost efficiency over time, since I did have my CPU overclocked for a couple years, and the extra draw from this absolute monolith of a GPU is resulting in occasional power spikes that take it over its max power threshold, which is triggering the blue screens? That's just my guess.

Thanks again for your input, I really appreciate it. I'll update this thread once I get a new PSU and report back with the results after some tests.
 

DarkThroat

UPDATE: New 1000W PSU installed. Too early to tell if the problem is fixed, but I haven't encountered any blue screens in the brief time my computer has been on. There are other software issues I'm working through, but those will warrant threads of their own.

Will update again after a week or so to see how the system fares.
 

DarkThroat

At this point I'm gonna call it good. Not a single blue screen since installing the new PSU. Thanks again for your helpful insight. Closing the thread.