Question Windows BSOD errors are following my system across multiple reinstalls and part replacements?

Apr 3, 2024
Hey everybody, hoping to get some assistance with this issue as I'm running out of stuff to try here. I've been getting DPC_Watchdog errors, IRQL_NOT_LESS_OR_EQUAL BSODs, and MEMORY_MANAGEMENT BSODs in my Windows installs, and I feel like I've tried everything to get rid of them. They occur kinda randomly, but seem to get more likely the longer the computer runs. Gaming also seems to trigger them, and some of my Steam games won't even launch or will crash after running for a bit. Here's how the system started:
5800x3D
Hyper212 cooler
MSI Tomahawk board
64GB DDR4 RAM Gskill
1TB NVME SSD
3070
bequiet 850W PSU

I was upgrading the cooler to a Noctua DH15S, and **** up the pins on the 5800x3D. My fault, but I figured I might as well go for AM5, so the build became:

7800x3D
Noctua DH-15s
Gigabyte Aorus B650
64GB Corsair Vengeance DDR5 5200mhz
1TB NVME SSD (New hard drive)
3070
bequiet 850W PSU

Well, the blue screen issues followed me over to the new CPU as well. I spent a long time trying to chase down what was happening in Event Viewer, but it was something new every time. I had read about some stability issues regarding AM5, so I figured I'd play it safe and just pivot to Intel. So the build is currently this:

13700KF
Noctua DH-15S
Asus TUF Gaming Pro Wifi Board
64GB DDR5 Vengeance 5200mhz
1TB NVME SSD (new)
3070
Corsair 750W Power Supply

And I've come to discover the issues have even followed me here. I ran DDU to get rid of any AMD and Nvidia drivers, then did a full reinstall, but it didn't seem to make a difference. I can grab any logs anyone needs to look at, just let me know what you need and I'll post them. From what I can tell in Event Viewer, I'm seeing a lot of "Kernel-Power" errors. Any advice would be very appreciated, thanks everyone!



Here is a catbox.moe upload of the device's recent minidump files: https://files.catbox.moe/phz8hm.zip

And here is one for the System logs: https://files.catbox.moe/buxf4l.evtx
 
Can you please upload the full kernel dump? It's the file C:\Windows\Memory.dmp, and it will be large.

Three of the BSODs happened because several interrupt service routines (ISRs) collectively ran for longer than allowed. The ISR is the front-end of device interrupt handling; the ISR code is in the device driver and it's executed whenever the device presents a hardware interrupt. When it's a collective group of ISRs (or DPCs) that run for too long, we need a kernel dump.

One BSOD happened because a single deferred procedure call (DPC) ran for longer than allowed. The DPC is the back-end of device interrupt processing. DPCs are placed on a queue (by the ISR) and run when a processor is otherwise idle (the DPC code is in the device driver also). Because this one dump is for an isolated DPC, we can debug it with a minidump. In the dump we see a networking operation in progress; the Windows networking drivers are called often. We also see the Windows Wdf01000.sys driver called. This is the Windows Driver Foundation root driver; it manages any third-party driver written using WDF libraries, but sadly we don't get to see what those drivers were. It's probably one of these third-party drivers whose DPC ran for too long.

The other thing we do see in the dump is Windows USB3 drivers being called, so this may well be a USB-attached network adapter? If so, then it's probably the driver for that USB network adapter that's at fault here.

If the full kernel dump shows that the collectively long-running ISRs were mostly caused by Wdf01000.sys, and the USB drivers are referenced, then these will be caused by the same network adapter driver.
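
To make the difference between the two watchdog cases a bit more concrete, here is a minimal Python sketch. It is only a toy model and not how Windows implements the watchdog: the `Routine` class, the `classify` function, the driver names and both millisecond limits are made up for illustration and do not come from the dumps.

```python
from dataclasses import dataclass

# Hypothetical limits in milliseconds, chosen only for illustration.
# The real thresholds are set by Windows and are not given in this thread.
SINGLE_LIMIT_MS = 100
CUMULATIVE_LIMIT_MS = 500


@dataclass
class Routine:
    driver: str    # driver that owns the ISR or DPC (hypothetical names)
    kind: str      # "ISR" or "DPC"
    ran_ms: float  # how long this routine ran


def classify(routines, single_limit=SINGLE_LIMIT_MS,
             cumulative_limit=CUMULATIVE_LIMIT_MS):
    """Return ("single", routine), ("cumulative", total_ms) or ("ok", total_ms)."""
    # Case 1: one routine on its own ran too long. The culprit is known,
    # which is why a minidump can be enough to debug it.
    for r in routines:
        if r.ran_ms > single_limit:
            return ("single", r)
    # Case 2: no single routine is over the limit, but together they are.
    # Finding the worst offenders needs the per-routine history, which is
    # why the full kernel dump is requested.
    total = sum(r.ran_ms for r in routines)
    if total > cumulative_limit:
        return ("cumulative", total)
    return ("ok", total)
```

For example, `classify([Routine("netdrv.sys", "DPC", 150)])` reports the single-routine case, while six routines of 90 ms each stay under the per-routine limit but trip the cumulative one.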
 
Thanks so much for taking the time to check into this with me, I really appreciate it! I went ahead and grabbed a full kernel dump, here is the link: https://www.mediafire.com/file/azboj5ip0hvkobp/MEMORY.DMP/file

If there are any other documents or logs that will be helpful, let me know and I can grab them!

As for the USB network adapter, I don't think I have one? But, if I wanted to be sure those drivers were 100% gone, how would I go about doing that?
 
Hmmm, for some reason that I don't fully understand, the ISR/DPC trace entries were not saved in either of your kernel dumps. However, I now think the problem may actually be RAM. I've looked through all 24 of your processors and only 5 were running work; the rest were idle. Of those, three suffered a page fault due to an invalid memory reference. This could be a rogue driver, but we must suspect RAM first.

Testing 64GB of RAM with Memtest86 is going to take an age, during which you won't be able to use the PC at all. Since you have 2 x 32GB RAM sticks, by far the best way to test your RAM is to remove one stick. Ensure that the one remaining stick is in the A2 slot (the correct slot for a single stick according to your motherboard manual).

Leave that one stick out and try installing again. If it BSODs then swap sticks and try installing again. You'll soon discover whether one stick is bad.
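
The one-stick-at-a-time test boils down to a small decision table. Here is a rough Python sketch of it; the `diagnose` function and its wording are purely illustrative, and "stable" here just means no BSOD while running on that stick alone.

```python
def diagnose(stick_a_stable: bool, stick_b_stable: bool) -> str:
    """Interpret the result of running the PC on each RAM stick alone."""
    if stick_a_stable and not stick_b_stable:
        return "suspect stick B"
    if stick_b_stable and not stick_a_stable:
        return "suspect stick A"
    if not stick_a_stable and not stick_b_stable:
        # Both failing alone does not single out one stick.
        return "both fail alone: both sticks may be bad, or the cause is not RAM"
    # Neither stick fails on its own, so the swap test is inconclusive.
    return "neither fails alone: no single stick is implicated"
```

So if the PC is stable on the first stick but BSODs on the second, `diagnose(True, False)` points at stick B.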
 
Understood, thanks very much! I'll give that a try tonight. Are there any other logs you'd like me to try and capture?
 
Given that you have changed all the hardware, I think you might be having an issue with your Windows installation source. Try downloading a new ISO image directly from Microsoft and installing from that to see if the problems go away.
 
Let's see how removing a RAM stick goes first.
Hey folks, apologies for the late response, wanted to do a lot of stress testing to make sure the issue is really resolved, and it looks like it was! After removing one RAM stick the BSODs stopped completely. I have since changed to a new set of DDR5 RAM (2x 32GB Gskill) and again, BSODs have not returned (knock on wood). Looks like I was getting pretty bad tunnel vision on the processor and the SSDs, learned a valuable lesson! Thanks again for the assistance everyone, really appreciate you all taking the time!