Question BSOD and random restarting/freezing after hard reboot?

Twistfaria

Distinguished
Feb 3, 2016
180
8
18,715
So I awoke yesterday to my PC having all my RGB fans lit and all fans running hard. I turned on the monitor and it was stuck on the Windows screen that says something like "waiting for Windows, don't turn off the computer", but the rotating balls were frozen. I figured I would leave it a while, so I went about my morning. When I came back a few hours later it was still stuck, so I went ahead and held down the power button and turned it back on.

Everything seemed fine, but a while later while browsing the internet my cursor froze and all the lights and fans came back on again. I had to do a second hard reboot. Then my computer would just randomly start rebooting for no reason. Then at one point I came back to a BSOD that said DPC watchdog violation.

So I tried updating my graphics driver. That didn't do anything, and both the freezing and the rebooting have continued. I ran sfc /scannow and it came back saying it had repaired something. I rebooted, was on the internet, and about 5 minutes later I got another freeze and then another reboot. At this point I'm at a loss, so any suggestions would be greatly appreciated. Looking back at Windows updates, the only one that happened on the 28th was this one, and it says successfully installed:

October 26, 2023-KB5031904 Cumulative Update Preview for .NET Framework 3.5, 4.8 and 4.8.1 for Windows 10 Version 22H2

I'm thinking about uninstalling that update but figured I'd ask around for help first.
 
Hey there,

So is the system in your sig the one you are talking about?

Are all your system drivers up to date? Did you 'DDU' your GPU driver?

Test your RAM outside of Windows with memtest86+. Let it run for 4 passes. You don't want any errors in the result.

It's a good idea to take out all components and put them back in securely.

What BIOS are you running? An update might help here.
 

Twistfaria

I'm also seeing a HUGE number of this error in Event Viewer
Metadata staging failed, result=0x80004005 for container '{5EE0C0F0-1D2A-531C-B790-0BD85F9F617B}'

I didn’t use DDU, so I downloaded it. Then when I tried to boot into safe mode, as it recommended, I’m unsure what happened, but either it didn’t even POST or it’s unable to use my TV as a monitor in safe mode.

Since I can’t get into my system at the moment, I’ll have to let you know about the BIOS later, though I don’t believe I’ve updated it since I built this system in October of 2021. I’ll see about grabbing an actual monitor soon. But the code on the mobo is AA, which should mean that it did POST. Sorry, I always seem to forget half the things I learn when I build a system in a short amount of time!

No it's not the "first build mixed use".
My current system is this:
 

Twistfaria

This could be critical. If it has been that long since the last BIOS update, then you could have instability. Worth doing first to rule it out as the issue.
Ok I have now uninstalled the graphics driver with DDU and reinstalled it and updated the BIOS to the current version. I used the minidumps with BlueScreenView and it is saying that the "cause" is ntoskrnl.exe at address ntoskrnl.exe+3fd640. I’m in the middle of running memtest86+ now. 64GB of RAM seems to take about an hour a pass. So far one pass, zero errors.
What should my next step be if all 4 passes pass?
 
Make sure to clear CMOS after the BIOS update. This will wipe any leftover settings from the old version, so we can be sure whether or not it's the problem.
 

Twistfaria

Ah yes I should have done that before redoing my settings oh well. Thankfully I have a clear CMOS button on my mobo. Makes it a lot easier than having to uncover and remove that damn battery! Did that and this time I just left it at the default for now. Removed and reseated my RAM sticks. I think I also exchanged them but I honestly don't know because I got them confused while they were in my hand lol.

I hope I did this file sharing right I don't normally share files. https://www.dropbox.com/scl/fi/eyh6...9-01.dmp?rlkey=iikygx6pzkbnwigyo3gf7fooj&dl=0
 

ubuysa

Distinguished
Yes you did upload the dump successfully. Sadly, now that I can see the minidump, we need to see the full kernel dump. This particular bugcheck can only be diagnosed with the kernel dump.

Please upload the file C:\Windows\Memory.dmp. It will be large (around 1.5GB).
 

Twistfaria

Ok this is that file zipped since it was 2.5GB. I'm really hoping we can fix this. It's getting real old only being able to be on my system for sometimes only 5 minutes at a time. Makes it so hard to troubleshoot when it keeps freezing.
 

ubuysa

Thanks for the kernel dump, I'll explain a little about what's going on here and why I needed the kernel dump. This is useful knowledge because the results here are not clear-cut.

A DPC (Deferred Procedure Call) is the back-end of device interrupt processing. DPCs are scheduled on a queue when the device interrupt occurs and this queue is executed when a processor is otherwise idle. Even so, because DPCs run at a high priority they are not allowed to run for too long, otherwise the processor would be unable to process real-time work. There are two watchdogs that monitor DPCs; one pops if a single DPC runs for too long, the other pops when a group of DPCs collectively run for too long. Your BSOD is one of the latter and we need the kernel dump to be able to access the Windows Management Instrumentation (WMI) trace records, which are only found in a full kernel dump.

We dump the WMI trace records for all running DPCs and then use the Windows Performance Analyzer (WPA) to view these trace records. By sorting the DPCs on their total run time we can see which ones ran for the longest and thus contributed the most to the cumulative watchdog timeout.

The WPA display for your BSOD is below...

[Image: 432dKgM.jpg - WPA table of DPCs sorted by total run time]


Microsoft recommend that no DPC run for longer than 100 microseconds (0.1ms) yet in the display above you can see that the top 7 DPCs run for longer than that. Note: ntoskrnl.exe is not a DPC it's the Windows kernel and so we can ignore it here.

It's unusual to have so many long running DPCs, but I think we can safely ignore storport.sys, it's the high-level storage driver (for disk accesses). It's also likely that most of the others may be related. We can see the following long running DPCs...
  • ndis.sys - this is the high-level network stack driver
  • ACPI.sys - this is the PnP power management driver
  • tcpip.sys - this is the high-level TCP/IP driver
  • dxgkrnl.sys - this is the DirectX kernel driver
  • Wdf01000.sys - this is the Windows Driver Foundation (WDF) high-level driver (many third-party drivers are written using the WDF libraries, this driver manages all of those third-party drivers)
Of those, ndis.sys, ACPI.sys, tcpip.sys, and possibly Wdf01000.sys, are related to a networking adapter. The outlier is dxgkrnl.sys which is graphics related, but if you're streaming for example then a problem in the networking area could also cause the dxgkrnl.sys DPC to run for longer than usual as well.
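
For anyone curious how the ranking described above works, here is a minimal Python sketch of the idea: add up the run time of every DPC per driver, sort by the total, and flag any driver whose longest single DPC exceeds the 100 microsecond guideline. The function name and the sample durations are made up purely for illustration; they are not the values from this dump, and this is not how WPA itself is implemented.

```python
from collections import defaultdict

# Guideline quoted in the post: no single DPC should run longer than 100 microseconds.
DPC_GUIDELINE_US = 100.0


def summarize_dpcs(records):
    """Group (module, duration_us) records by module and rank by total run time."""
    totals = defaultdict(float)
    longest = defaultdict(float)
    for module, duration_us in records:
        totals[module] += duration_us
        longest[module] = max(longest[module], duration_us)
    rows = [
        {
            "module": module,
            "total_us": totals[module],
            "longest_us": longest[module],
            "over_guideline": longest[module] > DPC_GUIDELINE_US,
        }
        for module in totals
    ]
    # Largest cumulative run time first: these contributed most to the timeout.
    return sorted(rows, key=lambda row: row["total_us"], reverse=True)


# Made-up sample durations for illustration only, NOT taken from the dump in this thread.
sample_records = [
    ("ndis.sys", 450.0),
    ("ndis.sys", 300.0),
    ("tcpip.sys", 220.0),
    ("dxgkrnl.sys", 130.0),
    ("storport.sys", 40.0),
]

for row in summarize_dpcs(sample_records):
    flag = "over guideline" if row["over_guideline"] else "ok"
    print(f'{row["module"]:<14} total={row["total_us"]:>7.1f} us  '
          f'longest={row["longest_us"]:>6.1f} us  {flag}')
```

Sorting on the total rather than the longest single run matches the cumulative watchdog described earlier, where it is the combined run time of a group of DPCs that trips the timeout.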

Do please bear in mind that this is just one dump and it's hard making a diagnosis based on only one dump. If you have more DPC_WATCHDOG_VIOLATION BSODs please copy the file C:\Windows\Memory.dmp to a temporary location - it's overwritten every time a new BSOD happens. Then you can upload multiple kernel dumps which will give a much more reliable picture of what's going on.
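
If copying the dump by hand after every crash gets tedious, something like the following Python sketch could do it. It copies the file to a folder under a name built from the dump's own timestamp, so each crash keeps its own copy. The function name and the C:\DumpArchive folder are assumptions for illustration; reading C:\Windows\Memory.dmp will most likely need an elevated prompt.

```python
import shutil
import time
from pathlib import Path


def preserve_dump(source, dest_dir):
    """Copy a dump file into dest_dir under a timestamped name and return the new path."""
    source = Path(source)
    dest_dir = Path(dest_dir)
    if not source.is_file():
        return None  # nothing to preserve (no dump was written)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the copy after the dump's modification time so each crash gets its own file.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(source.stat().st_mtime))
    target = dest_dir / f"{source.stem}-{stamp}{source.suffix}"
    if not target.exists():
        shutil.copy2(source, target)
    return target


if __name__ == "__main__":
    # C:\Windows\Memory.dmp is the path from the post; C:\DumpArchive is a hypothetical folder.
    saved = preserve_dump(r"C:\Windows\Memory.dmp", r"C:\DumpArchive")
    print(saved if saved else "No Memory.dmp found")
```

Running it once after each BSOD (before the next one overwrites the file) should leave one dated copy per crash in the archive folder.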

For now however, I think it would be wise to focus on the networking area. If you're using WiFi then (temporarily) connect via a cable, or vice-versa. That will quickly eliminate the network adapter as the cause (or highlight it as the cause).

I would also suggest resetting the winsock and TCP/IP stack...
  1. Open an elevated command prompt
  2. Enter the command: netsh winsock reset
  3. Enter the command: netsh int ip reset
  4. Reboot
See whether that helps at all. Do please try and upload more kernel dumps just in case this one is an outlier.
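
The two reset commands above can also be scripted. This is only a rough Python sketch that runs them in order and reports each exit code; the function name and the injectable `run` parameter are my own choices for illustration, and the script would still need to be started from an elevated prompt and followed by a reboot.

```python
import subprocess

# The two reset commands from the steps above, in the order given.
RESET_COMMANDS = [
    ["netsh", "winsock", "reset"],
    ["netsh", "int", "ip", "reset"],
]


def reset_network_stack(run=subprocess.run):
    """Run each reset command and return (command text, exit code) pairs.

    `run` is injectable so the function can be exercised without touching a real system.
    """
    results = []
    for command in RESET_COMMANDS:
        completed = run(command, capture_output=True, text=True)
        results.append((" ".join(command), completed.returncode))
    return results
```

Calling `reset_network_stack()` on the affected PC returns one pair per command; a non-zero exit code would suggest that command did not complete and is worth rerunning by hand.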
 

Twistfaria

Wow, thanks so much for all the help and the detailed description. Doing that hasn't seemed to do anything. It's still crashing every few minutes, although when I work on it in Safe Mode I don't believe it has crashed once. I normally use a wired connection for internet, but even disabling the ethernet adapter and switching to WiFi didn't help. The last crash didn't produce a Memory.dmp. The second to last crash created a 67GB dump file!! I'm NOT going to attempt uploading that one! But I've included one other one I have, and I also went ahead and put in the minidump from the crash that had the huge full dump file as well, just in case. I'll keep experimenting and upload other dumps as they come.
 

ubuysa

Make sure the dump type is set to 'Automatic memory dump' and ensure that the 'Overwrite existing dump' box IS checked.

That latest minidump is the same bugcheck as before, which requires a kernel dump to analyse. There is one other test you could usefully do however, and that's to run Driver Verifier...

Driver Verifier subjects selected drivers (typically all third-party drivers) to extra tests and checks every time they are called. These extra checks are designed to uncover drivers that are misbehaving. If any selected driver fails any of the Driver Verifier tests/checks then Driver Verifier will BSOD. The resulting minidump should contain enough information for us to identify the flaky driver. It's thus essential to keep all minidumps created whilst Driver Verifier is enabled.

To enable Driver Verifier do the following:

1. Take a System Restore point and/or take a disk image of your system drive (with Acronis, Macrium Reflect, or similar). It is possible that Driver Verifier may BSOD a driver during the boot process (some drivers are loaded during boot). If that happens you'll be stuck in a boot-BSOD loop.

If you should end up in a boot-BSOD loop, boot the Windows installation media and use that to run system restore and restore to the restore point you took to remove Driver Verifier and get you booting again. Alternatively you can use the Acronis, Macrium Reflect, or similar, boot media to restore the disk image you took.

Please don't skip this step. It's the only way out of a Driver Verifier boot-BSOD loop.

2. Start the Driver Verifier setup dialog by entering the command verifier in either the Run command box or in a command prompt.

3. On that initial dialog, click the radio button for 'Create custom settings (for code developers)' - the second option - and click the Next button.

4. On the second dialog check (click) the checkboxes for the following tests...
  • Special Pool
  • Force IRQL checking
  • Pool Tracking
  • Deadlock Detection
  • Security Checks
  • Miscellaneous Checks
  • Power framework delay fuzzing
  • DDI compliance checking
Then click the Next button.

5. On the next dialog click the radio button for 'Select driver names from a list' - the last option - and click the Next button.

6. On the next dialog click on the 'Provider' heading, this will sort the drivers on this column (it makes it easier to isolate Microsoft drivers).

7. Now check (click) ALL drivers that DO NOT have Microsoft as the provider (ie. check all third-party drivers).

8. Then, on the same dialog, check the following Microsoft drivers (and ONLY these Microsoft drivers)...
  • Wdf01000.sys
  • ndis.sys
  • fltMgr.sys
  • Storport.sys
These are high-level Microsoft drivers that manage lower-level third-party drivers that we otherwise wouldn't be able to trap. That's why they're included.

9. Now click Finish and then reboot. Driver Verifier will be enabled.

Be aware that Driver Verifier will remain enabled across all reboots and shutdowns. It can only be disabled manually.

Also be aware that we expect BSODs. Indeed, we want BSODs, to be able to identify the flaky driver(s). You MUST keep all minidumps created whilst Driver Verifier is running, so disable any disk cleanup tools you may have.

10. Leave Driver Verifier running until you have between 5 and 10 BSODs/dumps, or for 48 hours. Use your PC as normal during this time, but do try and make it BSOD. Use every game or app that you normally use, and especially those where you have seen it BSOD in the past. If Windows doesn't automatically reboot after each BSOD then just reboot as normal and continue testing.

11. To turn Driver Verifier off, enter the command verifier /reset in either the Run command box or a command prompt and reboot.

Should you wish to check whether Driver Verifier is enabled or not, open a command prompt and enter the command verifier /query. If drivers are listed then it's enabled, if no drivers are listed then it's not.

12. When Driver Verifier has been disabled, navigate to the folder C:\Windows\Minidump and locate all .dmp files in there that are related to the period when Driver Verifier was running (check the timestamps). Zip these files up if you like, or not as you choose. Upload the file(s) to the cloud with a link to it/them here (be sure to make it public).
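
Step 12 (picking out the dumps by timestamp and zipping them) can be sketched in Python as below. The function name and the start/end window are assumptions for illustration; you would substitute the times when you actually enabled and disabled Driver Verifier.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def zip_dumps_between(dump_dir, start, end, zip_path):
    """Zip every .dmp file in dump_dir whose modification time falls within [start, end]."""
    selected = []
    for dump in sorted(Path(dump_dir).glob("*.dmp")):
        modified = datetime.fromtimestamp(dump.stat().st_mtime)
        if start <= modified <= end:
            selected.append(dump)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for dump in selected:
            # Store only the file name so the archive has no folder structure.
            archive.write(dump, arcname=dump.name)
    return [dump.name for dump in selected]
```

For example, calling it with r"C:\Windows\Minidump" as the folder, two datetime values bracketing the Driver Verifier run, and an output path of your choosing returns the names of the dumps that went into the zip, which makes it easy to confirm nothing from an earlier crash was swept in.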
 

Twistfaria

Ok, I have now made sure it said automatic memory dump, created a restore point, created USB Windows installation media, turned on Driver Verifier and am now waiting. Interestingly enough I have not had a BSOD so far, but the system does pause every so often and things are running a bit slower.
I have noticed one thing that seems to happen: if I’ve left the system off for a while and turn it on again, it seems to take at least a little while to crash, but as soon as it has done it once it then starts doing it every few minutes.
I’m unsure if I have any disk cleanup tools running. Is there something specific I should look for?
 

ubuysa

If you don't know of any disk cleanup tools you probably aren't running any. The Windows disk cleanup tool, CCleaner, and a host of other tools can clean up junk files. All these tools typically view dumps as junk files!

You will see system pauses now and again as Driver Verifier checks drivers.

That it takes a while to crash and then crashes often sounds as though it may be a hardware device that's ok whilst cool but fails once it's warmed up?
 

Twistfaria

Yeah I had thought of that too, but if that was the case I would think that most of my crashes would happen when I'm doing something resource intense like gaming. But I believe I've only had one or two crashes when gaming, and all other crashes when I've either been watching a YouTube video, surfing the net or troubleshooting these issues.
Here is something SUPER WEIRD though: since having Driver Verifier ON I have had ZERO CRASHES!! I left my system on overnight and it did not crash. I've played games and watched videos and still no crash! WTF?! I have checked and Driver Verifier is on and I am still getting the pauses while it does its thing.
 

ubuysa

That's not the first time I've come across this curiosity. It tends to suggest a hardware problem, because the drivers are effectively running slower due to Driver Verifier testing. Given how your system crashes, the graphics card has to be high on the list of suspects.

I would like you to leave Driver Verifier enabled for a couple more days if you can stand it. We need to be 100% certain that every driver is loaded at some point so that Driver Verifier can check it, so be sure to use every app, feature, and hardware device that you have.
 

Twistfaria

Distinguished
Feb 3, 2016
180
8
18,715
That's not the first time I've come across this curiosity, it tends to suggest a hardware problem because the drivers are effectively running slower due to Driver Verfier testing. Given how your system crashes the graphics card has to be high on the list of suspects.

I would like you to leave Driver Verifier enabled for a couple more days if you can stand it, we need to be 100% certain that every driver is loaded at some point so that Driver Verifier can check it, so be sure to use every app, feature, and hardware device that you have..
I will leave it running. It has been nice just being able to use my system for more than a few minutes at a time even with the pauses. There are 2 other things that I have remembered noticing. During the boot process at some point the screen will flash purple just once then continue. During the crashes the BSOD will sometimes look weird and have a lot of wavy colored lines on the bottom half of the screen. Those two things make me think that it may indeed be graphics related. I did see that another graphics driver was released just a few days ago. I have yet to attempt to install it. Should I go ahead and do that and should I leave Driver Verifier on while installing it??
 

ubuysa

Once you've established via Driver Verifier that it's not a third-party driver, then pop the graphics card out and back in again - firmly. Also ensure that any additional power connector is fully home - at both ends.

From what you describe this does sound graphics card related.
 

Twistfaria

I have some NEW information. I wanted to be able to check the dump files myself so I downloaded something called WhoCrashed. It read all the dump files it could find and said that it didn't point to anything conclusive but "A full memory dump will likely provide more useful information on the cause of this particular bugcheck." Then I remembered that huge 67GB dump file that thankfully I had moved so it didn't get overwritten. I had it check that dump and it pointed to the amdppm.sys driver. I looked that up and it is a Microsoft driver for the CPU that manages the power states. I was following this guide of things to try: https://windowsreport.com/amdppm-sys/
I did the first regedit step. I'm pretty sure I don't have the CPU overclocked so I skipped that step. I already did the BIOS reset when I updated it several days ago so I skipped that too.
Then when I got to the step where you are supposed to "reregister amdppm.sys" in an elevated command prompt I got this error:
"The module "AmdPPM.sys" failed to load. Make sure the binary is stored at the specified path or debug it to check for problems with the binary or dependent .DLL files. The specified module could not be found."

At this point I left the system on while I went to feed my cats. When I came back it had frozen, but it was a very different kind of freeze. The screen was still on the desktop with my CPU temp reading visible at 38C. Holding the power button down did NOTHING; I had to turn off the PSU to get the PC to turn off! No dump file was created either. Upon reboot I still left it as is because I want to check if I get more crashes. So far it hasn't crashed again yet.

This leads me to believe that it is NOT the graphics card causing the issue. What I don't know is how to proceed now! Why wasn't I able to reregister amdppm.sys? Could it be corrupted?

I temporarily turned Driver Verifier off while testing this! I'm assuming that since the driver that is (probably) causing this is not one of the Microsoft drivers that you specified, that is why the system isn't crashing while DV is running...?

What should my next step be?

Edit: it’s been about 6 hours now and it still has not crashed again. That’s by far the longest it’s stayed on without DV enabled in over a week now!

Edit2: now been another 10+ hours or so and still no crashes.

Edit3: The system has been on now since that first reboot with no crashes over 24 hours. Because WhoCrashed said it "could" be a thermal issue I ran a benchmark in the game Shadow of the Tomb Raider to check temps. The CPU got maybe to 54C and the GPU averaged about 65C. Both temps well within operating range.
 