Question: Intermittent crashes - Kernel-PnP at fault?

Lumberjack88

Prominent
Dec 21, 2022
I built my Windows 11 PC a while ago and never had any intermittent crashes until a few days ago.

When the crash happens, my two monitors go black and the fans start spinning at maximum RPM. If I don't do anything, this state can persist for hours. I have to force a shutdown manually with the power (I/O) button, and only then do I get a proper restart. The PC keeps doing its tasks in the background, i.e. I can still hear the audio of the YouTube video I was watching before the crash, but the two displays no longer respond.

The latest crash was today. The CPU and GPU were nowhere near full load (I was just browsing YouTube), and the temperatures couldn't have been that high, since the PC had been running for maybe 10-15 minutes and I have a lot of fans plus a big AIO cooler for the CPU. When I check Windows Event Viewer, these are the critical or warning events logged right before the crash:

Error (Event ID (6008), Task Category (None)) Event Log: The previous system shutdown at 10:53:11 PM on 9/9/2024 was unexpected.

Critical (Event ID (41), Task Category (63)) Kernel Power: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

Warning (Event ID (219), Task Category(212)) Kernel-PnP: The driver \Driver\WUDFRd failed to load for the device ROOT\WINDOWSHELLOFACESOFTWAREDRIVER\0000.

I also get this error from time to time, but I don't think it's causing this crash as it appears a couple of times over a few hours:

Error (Event ID (1796), Task Category (None)) TPM-WMI: The Secure Boot update failed to update a Secure Boot variable with error Secure Boot is not enabled on this machine.. For more information, please see https://go.microsoft.com/fwlink/?linkid=2169931
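
One way to sort through events like the ones quoted above is to separate the entries Windows writes on the next boot after an unclean shutdown (Event ID 41 and 6008, which record that the shutdown was unexpected, not why it happened) from everything else. The following is only a minimal sketch: the regular expression assumes the one-line format used in this post, and `parse_event`, `triage`, `AFTERMATH_IDS` and `SAMPLE` are hypothetical names, not part of any Windows tool.

```python
import re

# Matches the one-line summaries as pasted in this post.
# Assumption: real Event Viewer exports (XML/CSV) use a different layout.
EVENT_RE = re.compile(
    r"^(?P<level>\w+) \(Event ID \((?P<id>\d+)\), "
    r"Task Category\s*\((?P<task>[^)]*)\)\) "
    r"(?P<source>[^:]+): (?P<message>.*)$"
)

# Logged at the next startup after a hard power-off; they are a
# consequence of the forced shutdown rather than a clue to its cause.
AFTERMATH_IDS = {41, 6008}


def parse_event(line):
    """Return a dict for one summary line, or None if it does not match."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    return {
        "level": match["level"],
        "id": int(match["id"]),
        "source": match["source"],
        "message": match["message"],
    }


def triage(lines):
    """Split events into (aftermath of the shutdown, everything else)."""
    aftermath, other = [], []
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        (aftermath if event["id"] in AFTERMATH_IDS else other).append(event)
    return aftermath, other


SAMPLE = [
    r"Error (Event ID (6008), Task Category (None)) Event Log: The previous system shutdown at 10:53:11 PM on 9/9/2024 was unexpected.",
    r"Critical (Event ID (41), Task Category (63)) Kernel Power: The system has rebooted without cleanly shutting down first.",
    r"Warning (Event ID (219), Task Category(212)) Kernel-PnP: The driver \Driver\WUDFRd failed to load for the device ROOT\WINDOWSHELLOFACESOFTWAREDRIVER\0000.",
    r"Error (Event ID (1796), Task Category (None)) TPM-WMI: The Secure Boot update failed to update a Secure Boot variable.",
]

if __name__ == "__main__":
    aftermath, other = triage(SAMPLE)
    for event in other:
        print("worth a closer look:", event["id"], event["source"])
```

Anything left in the second list still has to be judged by hand; the split only filters out entries that are known to follow a forced power-off.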

I have never used Secure Boot. Do I have to update it manually now so that this message stops appearing?

Also, is there any other file that I could provide that would give me more insight into what happened before the crash?

Do you guys happen to know what's causing these crashes? Is it maybe buggy fan control software, or a damaged GPU PSU cable?

Here are my system specifications:

Motherboard: Gigabyte B550 Vision D Firmware 17 (1 year old firmware, current firmware has AGESA updates for the CPU)

CPU: AMD 5950X

RAM: G.Skill Trident Z Neo 3600MHz 4x16GB

GPU: MSI 4090 Suprim X (latest NVIDIA driver update)

PSU: Seasonic Prime TX-1000

Case: Corsair 5000T (13 x Corsair SP120 fans)

OS: Latest Windows 11

Fan Control Software: Corsair iCue (latest update)
 
You get the Secure Boot warning because you have TPM enabled. Disable TPM in the BIOS and the warning will go away (or enable Secure Boot, your choice).
It's irrelevant to the crashes.

That PnP error shouldn't cause a black screen; it's from Windows Hello (face recognition/fingerprint/PIN for login).

A black screen usually comes from the GPU. Check that the power cables are connected properly.

BTW, doesn't a normal reboot with the case button work?
 
AFAIK, I have to enable TPM to register Windows 11. Or is there actually no reason to leave TPM enabled?

I'm using a straight 12VHPWR cable from CableMod that is compatible with my TX-1000 PSU, and so far it has served me well. I unplugged the cable and saw no melted spots or anything similar, so I can only assume that power delivery is intact. I know that checking it with a multimeter would probably be the better way to confirm that it works properly...

I noticed something strange when I was running some PyTorch machine learning scripts. The training stopped after about 30 minutes with a notification that there was a CUDA error, although the screens didn't black out this time. Could this just be a damaged GPU that is starting to act erratically? Or maybe it's just overburdened by the heat it had to put up with during this rather hot end of summer?

Is there any way for me to check if all the CUDA cores are doing their jobs correctly? Can CUDA cores cause black screens as well?

That CUDA error appeared before I unplugged the power cable and plugged it back into the GPU. Maybe this simple procedure helped; the weather also got quite a bit colder, so maybe that's helping as well.

I do have access to both a functional reset and shutdown button on my case, but I usually just press the shutdown button, leave the PC be for a minute or two and start it up from a clean slate.
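
On the question above about checking whether the GPU still computes correctly: since PyTorch is already installed, one rough check is to repeat a matrix multiplication on the GPU and compare each result against a CPU reference. This is only a sketch under assumptions: `gpu_matmul_matches_cpu` is a hypothetical helper, the matrix size, round count and tolerances are guesses, and a pass does not prove every CUDA core is healthy (it is not a stress test and will not reproduce a power or thermal fault on its own).

```python
try:
    import torch
except ImportError:  # PyTorch not installed in this environment
    torch = None


def gpu_matmul_matches_cpu(size=2048, rounds=20, rtol=1e-3, atol=1e-3):
    """Compare repeated GPU matmuls against one CPU reference.

    Returns True if every round matches within tolerance, False on the
    first mismatch, and None if PyTorch or a CUDA device is unavailable.
    """
    if torch is None or not torch.cuda.is_available():
        return None

    torch.manual_seed(0)            # same inputs on every run
    a = torch.randn(size, size)     # float32 tensors on the CPU
    b = torch.randn(size, size)
    reference = a @ b               # CPU result used as ground truth

    a_gpu, b_gpu = a.cuda(), b.cuda()
    for _ in range(rounds):
        result = (a_gpu @ b_gpu).cpu()
        # Tolerances are an assumption; small float32 rounding
        # differences between CPU and GPU are expected and harmless.
        if not torch.allclose(result, reference, rtol=rtol, atol=atol):
            return False
    return True


if __name__ == "__main__":
    outcome = gpu_matmul_matches_cpu()
    if outcome is None:
        print("No CUDA device (or no PyTorch) found; nothing was tested.")
    elif outcome:
        print("All GPU results matched the CPU reference.")
    else:
        print("Mismatch found: the GPU produced a different result.")
```

A mismatch, or a CUDA error raised partway through, would point toward the GPU, its power delivery or its driver. A clean pass only means this particular workload ran correctly this time.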