Question BSOD - DPC Watchdog Violation - Unable to create dump

Apr 9, 2024
10
0
10
I built this system around Jan 2022, and these BSODs have been happening at random since day 1. They strike at completely random times: under heavy load, or sitting more or less idle at the desktop. The BSOD is always a DPC Watchdog Violation. It always hangs at 0% and waits for me to reboot manually. The system creates no dump files, no matter which dump settings I choose.

SPEC
- Gigabyte X570 Aorus Elite Wifi
- Ryzen 9 5900x
- Noctua NH-D15 Cooler
- 970 EVO Plus NVMe M.2 SSD, 1TB (all other drives removed during troubleshooting - made no difference)
- 2x Crucial Ballistix BL16G36C16U4R 16GB 3600MHz DDR4 CL16
- Sapphire Nitro 7900 XTX (installed just 2 days ago; same BSOD since the install as on the previous GTX 1080)
- Corsair HX 1000 PSU

This machine is a mix of old and new parts, since that is the only way I can absorb the cost of upgrades.
- Motherboard and CPU came as a bundle (purchased new for this build)
- Memory (purchased new for this build)
- CPU cooler (purchased new for this build)
- M.2 drive (purchased new for this build)
- GTX 1080 (old system)
- Many other drives (old system)

Hardware replaced during troubleshooting:
- Old Corsair 650 PSU was recently replaced with the Corsair HX 1000
- GTX 1080 was just replaced with the 7900 XTX

Steps taken:

- Set BIOS to factory settings with no overclock of any kind
- Windows fully updated
- Ran System File Checker (sfc /scannow) from cmd. It found and repaired corrupted files, but the BSODs continued.
- Checked the BIOS and all hardware for updates and updated where applicable
- Swapped the memory sticks with each other
- Removed all drives except the boot drive
- Tested memory with the free version of Memtest86, running it multiple times to stress it as much as I could; no issues found
- Tested the boot drive; no issues found
- Tested the CPU with Cinebench and several other tools (FurMark and others)
- Tested the GPU extensively; no issues found
- Completely reinstalled Windows
- Used Driver Easy to identify out-of-date drivers system-wide and update them to current versions
- Updated system drivers with the most current versions Gigabyte had for me to download

I have a gut feeling this is Realtek related, but have obviously not been able to fix this.

For years I have seen countless people come here looking for help and often walk away with their problems solved. I figured that, among the many people with seemingly the same problem, at least one of their solutions MUST be the one to fix mine.

I now come before you all, hat in hand, begging for your help. Whatever you need me to do, I will try to do; whatever questions you ask, I will answer to the best of my ability.

Smart people of Tom's Hardware, you are my only hope.
 

Ralston18

Titan
Moderator
Look for error codes, warnings, and even informational events that may be being captured just before or at the time of the BSODs.

Two tools to start with: Reliability History/Monitor and Event Viewer.

Reliability History/Monitor is end-user friendly and presents a timeline that may reveal a pattern.

Event Viewer requires more time and effort to navigate and understand.

To help with Event Viewer:

How To - How to use Windows 10 Event Viewer (Tom's Hardware Forum)

Just take your time to explore and get a sense of both tools. No need to rush or jump to any immediate conclusions.

Focus on the errors being captured and note the error codes.
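With hundreds of errors a day, it can help to see which sources recur rather than reading events one by one. A minimal sketch, assuming the events have already been exported from Event Viewer and loaded into simple records; `EventRecord` and `tally_problems` are hypothetical helpers, not part of Windows or Event Viewer, and the sample records are illustrative only.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class EventRecord(NamedTuple):
    """One exported Event Viewer entry (hypothetical minimal shape)."""
    source: str    # provider name, e.g. "volmgr"
    event_id: int  # e.g. 161
    level: str     # "Critical", "Error", "Warning" or "Information"


def tally_problems(events: Iterable[EventRecord],
                   levels: tuple[str, ...] = ("Critical", "Error")
                   ) -> list[tuple[tuple[str, int], int]]:
    """Count (source, event ID) pairs at the given levels, most frequent first."""
    counts = Counter((e.source, e.event_id) for e in events if e.level in levels)
    return counts.most_common()


if __name__ == "__main__":
    # Illustrative sample, not real log data
    sample = [
        EventRecord("volmgr", 161, "Error"),
        EventRecord("secnvme", 11, "Information"),
        EventRecord("volmgr", 161, "Error"),
    ]
    for (source, event_id), n in tally_problems(sample):
        print(f"{source} {event_id}: {n}")
```

Filtering on level first keeps informational noise out of the count; pass `levels=("Warning",)` to look at warnings separately.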

One immediate question: when the PSU was replaced were only the cables that came with the replacement Corsair HX 1000 used?
 
Apr 9, 2024

All cables in use with the HX 1000 came with the HX 1000. I don't ever mix my PSU cables.

I have some time off on Thursday, so I will give Reliability Monitor a go and see what language it translates my errors into :) It might get messy, as Event Viewer shows over 400 errors in the last 24 hours alone.
 

ubuysa

Distinguished
Firstly, we need memory dumps to be able to pin this down with any certainty. To write dumps all of the following MUST be true...
  • The page file must be on the same drive as your operating system
  • Set page file to "system managed"
  • Set system crash/recovery options to "Automatic memory dump"
  • Windows Error Reporting (WER) system service should be set to MANUAL
  • User Account Control (UAC) must be enabled
In addition, the following can also prevent you writing dumps...
  • Some SSDs running older firmware do not create dumps (update the drive firmware)
  • Cleaner applications like CCleaner delete dump files, so don't run them until you are fixed
  • Bad RAM may prevent the data from being saved and written to a file on reboot, so if all else fails test your RAM
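The five MUST items can be checked mechanically. A minimal sketch, assuming you have already read the values by hand (the registry locations are noted in the comments); `DumpSettings` and `dump_problems` are hypothetical names, and the check does not cover the firmware, cleaner, or RAM items.

```python
from dataclasses import dataclass

# CrashControl\CrashDumpEnabled values (Windows 8 and later)
DUMP_TYPES = {0: "None", 1: "Complete", 2: "Kernel", 3: "Small", 7: "Automatic"}
# Service "Start" values
START_TYPES = {2: "Automatic", 3: "Manual", 4: "Disabled"}


@dataclass
class DumpSettings:
    system_drive: str              # drive Windows is installed on, e.g. "C:"
    pagefile_drive: str            # drive holding pagefile.sys
    pagefile_system_managed: bool  # "System managed size" selected
    crash_dump_enabled: int        # HKLM\SYSTEM\CurrentControlSet\Control\CrashControl -> CrashDumpEnabled
    wer_start: int                 # HKLM\SYSTEM\CurrentControlSet\Services\WerSvc -> Start
    enable_lua: int                # HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System -> EnableLUA


def dump_problems(s: DumpSettings) -> list[str]:
    """Return the checklist items that are NOT satisfied (empty list = all good)."""
    problems = []
    if s.pagefile_drive.upper() != s.system_drive.upper():
        problems.append("page file is not on the system drive")
    if not s.pagefile_system_managed:
        problems.append("page file is not system managed")
    if s.crash_dump_enabled != 7:
        name = DUMP_TYPES.get(s.crash_dump_enabled, "unknown")
        problems.append(f"dump type is {name}, not Automatic")
    if s.wer_start != 3:
        name = START_TYPES.get(s.wer_start, "unknown")
        problems.append(f"WER service start type is {name}, not Manual")
    if s.enable_lua != 1:
        problems.append("User Account Control is disabled")
    return problems
```

For example, `reg query "HKLM\SYSTEM\CurrentControlSet\Control\CrashControl" /v CrashDumpEnabled` shows the dump type; a value of 7 means Automatic memory dump.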

Secondly, the 0x133 bugcheck (DPC_WATCHDOG_VIOLATION) is driver or device related. It happens when either the ISR or DPC code (both of which are in the device driver) runs for longer than allowed. This is either due to a bad driver - so check that all device drivers are up to date - or because the device itself is bad - pop all PCIe cards out and re-seat them properly. Also disconnect all external devices (except mouse, keyboard, and monitor) and see whether it still BSODs.
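Per Microsoft's bug check reference, parameter 1 of 0x133 tells the two cases apart: a single DPC or ISR overrunning its time allotment, or too much cumulative time spent at DISPATCH_LEVEL or above. A sketch of that interpretation; it only helps once the parameters are actually recovered (from a dump, for instance), and `describe_0x133` is a hypothetical helper.

```python
def describe_0x133(p1: int, p2: int = 0, p3: int = 0) -> str:
    """Interpret DPC_WATCHDOG_VIOLATION (0x133) parameters.

    p1 == 0: a single DPC or ISR exceeded its time allotment;
             p2 = time it ran (ticks), p3 = its allotment (ticks).
    p1 == 1: the system spent too long, cumulatively, at IRQL
             DISPATCH_LEVEL or above; p2 = the watchdog period.
    """
    if p1 == 0:
        return (f"single DPC/ISR ran {p2} ticks (allotment {p3} ticks); "
                "a stack trace usually identifies the offending driver")
    if p1 == 1:
        return ("cumulative time at DISPATCH_LEVEL or above exceeded "
                f"the watchdog period ({p2})")
    return f"unexpected parameter 1 value: {p1:#x}"
```

Either way the time is being consumed by driver code, which is why a dump (and its stack trace) matters so much here.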
 
Apr 9, 2024
- Discovered the Windows Error Reporting service was set to Manual but was not running. It now is.
- User Account Control is set to its default (one notch below "Always notify"). Is this set correctly?
 
Apr 9, 2024
Seems I have lost the ability to just add this post as an edit to my last.

Waiting for my next BSOD at this time so we can move forward. Can take as little as a few minutes, or up to a month for it to happen, so let the waiting game begin.

Sorry if double posting is a violation of the rules here, I just didn't want anyone to think I had just vanished after getting the help I received thus far.
 
Apr 9, 2024
Good to know.

Update: just got a DPC Watchdog BSOD...

Sadly, it seems turning on the error reporting service did not let Windows create a dump. It just sat on the blue screen at 0%.

So... where do I go from here?
 

ubuysa

Distinguished
Is there a file called C:\Windows\Memory.dmp with a recent timestamp? If so, upload it to a cloud service and post a link to it here.

If that file doesn't exist then you really aren't writing dumps. If you checked all of the items in my post #4 then I would start to suspect RAM and/or your system drive...
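Checking for dumps can be scripted as well. A sketch, assuming the default dump locations (%SystemRoot%\MEMORY.DMP for kernel/automatic dumps and %SystemRoot%\Minidump for small dumps) and an arbitrary 30-day window; `recent_dumps` is a hypothetical helper.

```python
import time
from pathlib import Path

# Default locations: kernel/automatic dump file and the minidump folder
DEFAULT_LOCATIONS = (r"C:\Windows\MEMORY.DMP", r"C:\Windows\Minidump")


def recent_dumps(locations=DEFAULT_LOCATIONS, max_age_days=30, now=None):
    """Return dump files under `locations` modified within `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    for loc in map(Path, locations):
        if loc.is_file():
            candidates = [loc]
        elif loc.is_dir():
            candidates = sorted(loc.glob("*.dmp"))
        else:
            candidates = []  # location doesn't exist: nothing was written there
        found.extend(f for f in candidates if f.stat().st_mtime >= cutoff)
    return found


if __name__ == "__main__":
    dumps = recent_dumps()
    print("\n".join(map(str, dumps)) if dumps else "No recent dump files found")
```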

Use Samsung Magician to test your system drive.

Test your RAM by doing the following....
  1. Download Memtest86 (free), use the imageUSB.exe tool extracted from the download to make a bootable USB drive containing Memtest86 (1GB is plenty big enough). Do this on a different PC if you can, because you can't fully trust yours at the moment.
  2. Then boot that USB drive on your PC, Memtest86 will start running as soon as it boots.
  3. If no errors have been found after the four iterations of the 13 different tests that the free version does, then restart Memtest86 and do another four iterations. Even a single bit error is a failure.
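The pass/fail rule above is simple enough to state as code. A sketch, assuming you note down the error count Memtest86 reports for each pass; `memtest_verdict` is hypothetical and not part of Memtest86.

```python
def memtest_verdict(errors_per_pass: list[int], passes_required: int = 8) -> str:
    """Any error fails; otherwise require two runs of four passes each."""
    if any(n > 0 for n in errors_per_pass):
        return "FAIL"        # even a single bit error is a failure
    if len(errors_per_pass) < passes_required:
        return "INCOMPLETE"  # restart Memtest86 for another four passes
    return "PASS"
```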
 
Apr 9, 2024
Windows has not created any Memory.dmp files ever, so sadly there is nothing to upload.
Samsung Magician - Diagnostic Scan - Full Scan = No Errors

Will test the memory when I get home from work tonight. Unfortunately there is no other PC for me to test this memory on.
 
Apr 9, 2024
Could a UPS be causing all this? I am running an APC Back-UPS ES 750.

Just found the datasheet for it

- Main input voltage: 120 V
- Main output voltage: 120 V
- Power rating: 450 W
- Rated power: 750 VA
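The datasheet gives two limits, 450 W of real power and 750 VA of apparent power (a ratio of 0.6), and a load must stay under both. A sketch of that arithmetic, assuming you know the PC's actual draw (from a wall meter, say); the 0.95 power factor for a PSU with active PFC is an assumption, and `ups_load` is a hypothetical helper.

```python
def ups_load(load_watts: float, rated_watts: float = 450.0,
             rated_va: float = 750.0, power_factor: float = 0.95) -> dict:
    """Return utilisation of both UPS limits for a given real-power load."""
    load_va = load_watts / power_factor  # apparent power drawn by the load
    watt_use = load_watts / rated_watts
    va_use = load_va / rated_va
    return {
        "watt_utilisation": watt_use,
        "va_utilisation": va_use,
        "overloaded": max(watt_use, va_use) > 1.0,
    }


if __name__ == "__main__":
    # Placeholder load figure, not a measurement of this system
    print(ups_load(400))
```

With any power factor above 0.6, the 450 W limit is the one reached first.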
 

ubuysa

Distinguished
Well, something is very wrong if you're not writing dumps, but let's look at some other troubleshooting data. Can you please download and run the SysnativeBSODCollectionApp, upload the resulting zip file to a cloud service, and post a link to it here? The SysnativeBSODCollectionApp collects all the troubleshooting data we're likely to need. It DOES NOT collect any personally identifying data. It's used by several highly respected Windows help forums (including this one). I'm a senior BSOD analyst on the Sysnative forum where this tool came from, so I know it to be safe.

You can of course look at what's in the zip file before you upload it, most of the files are txt files. Please don't change or delete anything though. If you want a description of what each file contains you'll find that here.
 

ubuysa

Distinguished
Sadly, the logs in that upload only cover two days. It's never wise to clear logs; they contain a lot of useful history. However, reading between the lines, I now think you should look more closely at your system drive, the Samsung 970 EVO Plus NVMe drive. A problem with this drive would also explain why no dumps are being written.

We can see only one crash in these logs but the 'dump not written' error identifies HarddiskVolume2 as the problem drive. Dumps are written to the paging file initially and so this must be your system drive (I've already checked that your paging file is on your system drive)...
Code:
Log Name:      System
Source:        volmgr
Date:          26/04/2024 08:08:00
Event ID:      161
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      DESKTOP-911USV4
Description:
Dump file creation failed due to error during dump creation.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="volmgr" />
    <EventID Qualifiers="49156">161</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2024-04-26T05:08:00.3311831Z" />
    <EventRecordID>85398</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="240" />
    <Channel>System</Channel>
    <Computer>DESKTOP-911USV4</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\HarddiskVolume2</Data>
    <Binary>000000000100000000000000A10004C042000400010000C000000000000000000000000000000000</Binary>
  </EventData>
</Event>
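When there are many of these events, pulling the key fields out of the XML is quicker than reading each one. A sketch using Python's standard `xml.etree.ElementTree`, assuming the event XML has been copied out as text; `parse_event` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}
LEVELS = {1: "Critical", 2: "Error", 3: "Warning", 4: "Information"}


def parse_event(xml_text: str) -> dict:
    """Extract provider, ID, level, UTC time and data fields from one event."""
    root = ET.fromstring(xml_text.strip())
    system = root.find("e:System", NS)
    level = int(system.find("e:Level", NS).text)
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "level": LEVELS.get(level, str(level)),
        "time_utc": system.find("e:TimeCreated", NS).get("SystemTime"),
        "data": [d.text for d in root.findall("e:EventData/e:Data", NS)],
    }
```

Fed the volmgr event above, it yields provider volmgr, event ID 161, level Error, and the data \Device\HarddiskVolume2.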
In addition, just prior to the crash, there is an informational message indicating a potential, though not serious, issue with the system drive...
Code:
Log Name:      System
Source:        secnvme
Date:          26/04/2024 08:07:58
Event ID:      11
Task Category: None
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      DESKTOP-911USV4
Description:
The description for Event ID 11 from source secnvme cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\RaidPort2
144d
Samsung SSD 970 EVO Plus 1TB

The message resource is present but the message was not found in the message table

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="secnvme" />
    <EventID Qualifiers="16385">11</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2024-04-26T05:07:58.4341807Z" />
    <EventRecordID>85394</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="312" />
    <Channel>System</Channel>
    <Computer>DESKTOP-911USV4</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\RaidPort2</Data>
    <Data>144d</Data>
    <Data>Samsung SSD 970 EVO Plus 1TB</Data>
    <Binary>0F00200003004800000000000B00014000000000000000000000000000000000000000000000000001000000200000000B0001400000000000000000000000000000000000000000</Binary>
  </EventData>
</Event>
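The two SystemTime stamps put the secnvme event roughly two seconds before the dump failure. A sketch of that arithmetic using only the timestamps quoted above; note that SystemTime is UTC, which is why it reads three hours behind the local Date lines. `parse_systemtime` is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_systemtime(ts: str) -> datetime:
    """Parse an event SystemTime such as '2024-04-26T05:08:00.3311831Z' (UTC).

    The log uses 7 fractional digits; datetime holds only 6 (microseconds),
    so the extra digit is dropped.
    """
    base, _, frac = ts.rstrip("Z").partition(".")
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    micros = int(frac[:6].ljust(6, "0")) if frac else 0
    return dt.replace(microsecond=micros, tzinfo=timezone.utc)


nvme_event = parse_systemtime("2024-04-26T05:07:58.4341807Z")  # secnvme, ID 11
dump_fail = parse_systemtime("2024-04-26T05:08:00.3311831Z")   # volmgr, ID 161
gap = (dump_fail - nvme_event).total_seconds()
print(f"secnvme event preceded the dump failure by {gap:.2f} s")  # 1.90 s
```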
The 0x144d mentioned in there is the PCI vendor (VEN) identifier for Samsung, and the provider name secnvme is Samsung's secnvme.sys NVMe driver, so this is definitely a system drive issue, because that's your only NVMe drive. I would download Samsung Magician and display the SMART data for that drive. Also run the most thorough diagnostic you can, and check for firmware updates for the drive.
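A short lookup shows how the "144d" in the event data maps to a vendor. A sketch, assuming only a handful of well-known PCI vendor IDs (not an exhaustive list); Realtek's ID is included because the original post suspected a Realtek device. `vendor_name` and `vendor_from_hardware_id` are hypothetical helpers.

```python
import re
from typing import Union

# A few well-known PCI vendor IDs (not exhaustive)
PCI_VENDORS = {
    0x1002: "AMD/ATI",
    0x10DE: "NVIDIA",
    0x10EC: "Realtek",
    0x144D: "Samsung",
    0x8086: "Intel",
}


def vendor_name(vid: Union[str, int]) -> str:
    """Map a vendor ID such as '144d', '0x144D' or 0x144d to a name."""
    code = int(vid, 16) if isinstance(vid, str) else vid
    return PCI_VENDORS.get(code, f"unknown (0x{code:04x})")


def vendor_from_hardware_id(hwid: str) -> str:
    """Extract the VEN_xxxx field from a Windows hardware ID string."""
    m = re.search(r"VEN_([0-9A-Fa-f]{4})", hwid)
    return vendor_name(m.group(1)) if m else "no VEN_ field"
```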

In addition to the above, a scan of your drivers shows that you have Avast! installed. Without a dump there's no way to know whether Avast! is involved, but these BSODs could very well be caused by Avast!. Third-party anti-malware products are a common cause of BSODs across a wide range of PCs. As a test, even if only temporarily, please uninstall Avast! using the specialised tool at
https://www.avast.com/uninstall-utility#pc.
 
Apr 9, 2024
Steps continue as per your suggestions.

- Avast! - Uninstalled using the uninstall app at the link you provided
- Re-ran all tests including S.M.A.R.T. from within Samsung Magician - all came back clean - S.M.A.R.T. logs provided at the link below

https://www.dropbox.com/scl/fi/fr2h...ey=7gliqhdqsy5ojsfjhbti636el&st=twkuhpwl&dl=0

Had a BSOD last night while I slept and my PC was doing nothing. Still stuck at 0% when I awoke to find it.

- Re-ran SysnativeBSODCollectionApp; it had not completed after 30 minutes.
- Rebooted and ran SysnativeBSODCollectionApp again; it completed this time. Link to its latest results below.

https://www.dropbox.com/scl/fi/t14h...ey=xrw9s2f7fiyca6lf46mkdu8or&st=fdwd1a98&dl=0