Question: DPC_WATCHDOG_VIOLATION BSOD & Event ID 153, 14, 0 nvlddmkm.sys crashes?

Jul 31, 2025
Hello! I have pretty much been going through this hell of an issue for almost a month now. I got a new build around a month ago. For the first 2-3 weeks, everything was working flawlessly: gaming at ultra settings, no stutter, no lag, and most importantly, no crashes. Then one day I got a DPC_WATCHDOG_VIOLATION BSOD around 5 minutes into a game. Fast forward to today, and I can't run any game stably for more than 10 minutes. The same thing happens when I try to export a video. In Event Viewer I get errors from nvlddmkm, especially Event IDs 0, 14 and 153, which I will go into in more detail below.

The system specs are as listed below:

CPU: Intel Core i9-14900K
GPU: ZOTAC RTX 4090 AMP Extreme Airo
RAM: 32 GB DDR5-4800, Kingston KF552C40-32 + 32 GB DDR5-4800, Kingston KF552C40-32 (64 GB RAM in total)
SSD: CT2000P5PSSD8, 1863.02 GB
All-in-One Cooler for the CPU
PSU: be quiet! Pure Power 12M 1000W
Motherboard: MSI PRO B760M-P
OS: Windows 11, 24H2
Monitor: Dell S2419HGF, 1920x1080, 144Hz

I have noticed the same issue being reported many times, especially for 4090. Here is what I have done so far:

  1. Enabled & disabled XMP in BIOS (issue persists);
  2. Disabled & enabled Hyperthreading and the CPU Turbo settings (issue persists, no difference);
  3. Used DDU to uninstall the driver in safe mode and installed an older version [537.58] (this made a few games more stable, but other games I tested would still crash with Event ID 0);
  4. Changed permissions on the nvlddmkm.sys file to Full Control for Users (issue remains);
  5. Turned on Debug Mode in Nvidia Control Panel;
  6. Switched to "Prefer Maximum Performance" in Nvidia Control Panel (no difference);
  7. Disabled Hardware Accelerated GPU Scheduling in Windows settings (this caused no BSODs but the crashes remain);
  8. Used MSI Afterburner to apply a clock offset of around -52 MHz to the GPU core & memory (no difference);
  9. Changed PCIe Gen Mode in BIOS to both 4.0 and 3.0 (no difference);
  10. Uninstalled programs like G-Hub and Wallpaper Engine, switched HAGS off for the other programs that supported it (no difference);
  11. Disabled Integrated GPU in Device Manager (issue still persists);
  12. Uninstalled NVIDIA HD Audio in Device Manager (yet again no difference);
  13. Disabled High Precision Event Timer (others said it was the only workaround; no difference whatsoever);
  14. Ran OCCT tests for every single component, even at extreme. The test that crashed into a BSOD was the VRAM test. Could this indicate a hardware issue?
I also tried the latest drivers after a DDU clean; it did not change anything. I've also seen reports of 566.36 being the most stable driver for the 4090, but that did not change anything either. As for the errors in Event Viewer, I get these 3-4 specific errors from source nvlddmkm:



The description for Event ID 153 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\Video3
Error occurred on GPUID: 100

The message resource is present but the message was not found in the message table.



The description for Event ID 14 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\Video3
badfbadf(badfbadf) 00000000 00000000

The message resource is present but the message was not found in the message table



The description for Event ID 0 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\Video3
Error occurred on GPUID: 100



As for the BSODs, the bugcheck error in Event Viewer is:

The computer has rebooted from a bugcheck. The bugcheck was: 0x00000133 (0x0000000000000001, 0x0000000000001e00, 0xfffff80784dc43b0, 0x0000000000000000).



Something to notice is that I would get Event ID 153 error on the latest drivers only, but either way, I have been pulling my hair out trying to find any solution available. If anyone has any idea or has been having the same problem, I'd really appreciate any help! Thank you so much for your time!
 
RAM: 32 GB DDR5-4800, Kingston KF552C40-32 + 32 GB DDR5-4800, Kingston KF552C40-32 (64 GB RAM in total)
You mixed ram? Remove the stick that isn't part of the kit.
If you desire 64GBs, buy a kit of 64GBs and consider selling off the old kits to recoup some of the cost.

CPU: Intel Core i9-14900k
Motherboard: MSI PRO B760M-P
That's a pretty bare-looking board to pair with a 14900K, and that issues happen within 10 minutes makes me believe heat is one of the issues.
The i9 should be paired with a motherboard that at least has heatsinks over the VRM area, or else the cpu won't run at its best all the time.

Hello! I have pretty much been going through this hell of an issue for almost a month now. I got a new build around a month ago. For the first 2-3 weeks, everything was working flawlessly...
Ambient temperatures have gone up since then, right?
The AC is running?
 
Both of the RAM sticks are the exact same model, and they both came with a 64 GB kit (2x32 GB). I am gonna try to remove one stick and test the issue, then report back.

As for the CPU and Motherboard, MSI claims that the PRO B760M-P is made to support 14th, 13th and 12th generation Intel CPUs, but I am not sure if it's the perfect match for an i9. What motherboard would be a balanced match for the i9? I do have an All-in-One Liquid Cooler for the CPU and the temps never usually go extreme (around 55-70 degrees under load).

And as for the ambient temperatures, I keep my room pretty cooled so I think we can keep that ruled out.

I tried to run heavy benchmark tests using OCCT for every single component. CPU was perfectly fine. RAM, perfectly fine. PSU, no issues. GPU had also no issues. What made the system go into a BSOD (DPC_WATCHDOG_VIOLATION specifically) was VRAM. In less than 40 seconds, the system crashed when I ran a benchmark test for VRAM.

I am starting to believe that this is rather a GPU hardware problem, a Motherboard problem or a problem related to the connection between GPU and PSU but I think that is pretty unlikely in this situation.

Either way, thank you for your reply!
 
Both of the RAM sticks are the exact same model, and they both came with a 64 GB kit (2x32 GB). I am gonna try to remove one stick and test the issue, then report back.
AH. With the way the specs were typed out, I thought the sticks were packaged separately. My bad!

As for the CPU and Motherboard, MSI claims that the PRO B760M-P is made to support 14th, 13th and 12th generation Intel CPUs, but I am not sure if it's the perfect match for an i9. What motherboard would be a balanced match for the i9? I do have an All-in-One Liquid Cooler for the CPU and the temps never usually go extreme (around 55-70 degrees under load).
Empty words from MSI; they, and the other vendors, are not going to flat out tell the truth. The associate probably just looked at the cpu support list and said, "Yep, it's there!". You have to find out the hard way when it's in the build and running:
Old article, but it's relevant here.

[Image: 12900-stock-s.png]

[Image: 12900-nolimit-s.png]


-Some boards throttle according to manufacturer spec.
-Some boards have their limits preset higher than they can reasonably handle.
-Some boards have their limits preset lower than default, because they know the thing can't handle more than an i3/i5.


You can monitor functions, such as:
-core clocks. By default, the maximum frequency is supposed to scale up/down depending on how many cores are active. Fewer cores, higher maximum. Many/all cores, lower maximum.
-power limits. 125 W base, 253 W turbo boost for the 14900K.

If the motherboard is thermal/power throttling, it can look like something's wrong with the cpu, even if there isn't.
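One way to act on the monitoring advice above is to log package power under load and check whether it plateaus below the turbo limit. The following is a minimal Python sketch under stated assumptions: the sample values are hypothetical (not taken from this system), and the helper names `classify_power` and `looks_capped` are made up for illustration. Only the 125 W and 253 W figures come from the post above.

```python
BASE_POWER_W = 125   # 14900K base power, as quoted in the thread
TURBO_POWER_W = 253  # 14900K turbo power, as quoted in the thread


def classify_power(package_w):
    """Place one package-power reading relative to the two limits."""
    if package_w <= BASE_POWER_W:
        return "within base power"
    if package_w <= TURBO_POWER_W:
        return "turbo range"
    return "above turbo limit"


def looks_capped(samples_w, tolerance_w=2.0):
    """Return the apparent cap if readings plateau below the turbo limit, else None.

    A plateau here means at least half of the samples sit within
    tolerance_w of the peak, and the peak is clearly below TURBO_POWER_W.
    """
    peak = max(samples_w)
    near_peak = sum(1 for s in samples_w if peak - s <= tolerance_w)
    if peak < TURBO_POWER_W - tolerance_w and near_peak >= len(samples_w) // 2:
        return peak
    return None


# Hypothetical log of package power (watts) during an all-core load.
samples = [88.0, 131.5, 152.0, 153.0, 152.6, 153.0, 151.9, 153.0]
print(classify_power(max(samples)), looks_capped(samples))
```

A plateau well under 253 W would suggest the board (or its BIOS defaults) is enforcing a lower limit, which is a performance concern rather than proof of a fault.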
 

Thank you for the reply!

Yeah I agree with you; MSI and all the other vendors don't ever wanna tell the truth. Honestly, seeing these issues and how much bloatware MSI installs on its own makes me want to swap this motherboard (I reinstalled Windows, and the MSI "driver utility", which is enabled by default in BIOS, updates the drivers but also gives you a bunch of bloatware).

As for the CPU, last time I tried the default Intel settings in BIOS for the cores, I noticed that the power limit was capped at 153W. I tried to set the limit to 253W and that only made Windows crash every time it loaded the desktop. Forgive me for my incompetence, I am not very good when it comes to tweaking these kinds of things. I tried to set a 53x limit for the P-Cores in BIOS (default was 57x) but there were still crashes. Last time I ran a physics benchmark test that somebody recommended to me, I found that the CPU performance climbed very fast instead of steadily.

These are default BIOS settings: https://drive.google.com/drive/folders/1mGRUuOGx7K3TaQFDNm0ngzGq6VCRAhZf?usp=sharing
Do you perhaps have any sort of guide on how to properly tweak the CPU in BIOS? I'd honestly really appreciate it!
 
How high do gpu core, hot spot, and memory junction temperatures get?
Have you tried completely uninstalling/reinstalling the GPU?

As for the CPU, last time I tried the default Intel settings in BIOS for the cores, I noticed that the power limit was capped at 153W. I tried to set the limit to 253W and that only made Windows crash every time it loaded the desktop. Forgive me for my incompetence, I am not very good when it comes to tweaking these kinds of things. I tried to set a 53x limit for the P-Cores in BIOS (default was 57x) but there were still crashes. Last time I ran a physics benchmark test that somebody recommended to me, I found that the CPU performance climbed very fast instead of steadily.

These are default BIOS settings: https://drive.google.com/drive/folders/1mGRUuOGx7K3TaQFDNm0ngzGq6VCRAhZf?usp=sharing
Do you perhaps have any sort of guide on how to properly tweak the CPU in BIOS? I'd honestly really appreciate it!
With a turbo power limit that low, it's very likely that the board throttles and forces the cpu to drop in frequency too... shouldn't crash though; might just get a little choppy, or seem sluggish at times.

Unfortunately, I don't have a guide, but instead of changing core ratio, try a negative core voltage offset (you can see it near the bottom of your 3rd snapshot), or do a search on undervolt guides. Set core voltage offset mode to (-), and core voltage offset to 0.05 V.

Disabled Hardware Accelerated GPU Scheduling in Windows settings (This caused no BSODs but the crashes remain)
What kind of crashes, exactly? Crash to desktop? Screen just goes black, or screen goes black, but can still hear sounds? Screen just freezes?
 
Thank you once more for the reply!

I ran an extreme stability test on the GPU to see its temperatures, and I found out that the GPU core temperature sits from 70°C to 75°C, the memory junction is around 69°C to 72°C, and the hotspot varies from 80°C to 87°C.

I can't be sure whether this is a motherboard and CPU mismatch issue, but I do just hope that it's not an issue related to the GPU (even though the chances of it being a GPU problem are the highest here 😭).

Basically, before I reformatted Windows 11, when I tried to run games (e.g., Cyberpunk 2077, Forza Horizon 5, etc.), they would run for less than 5 minutes at best, and with Hardware Accelerated GPU Scheduling enabled they would end in a DPC_WATCHDOG_VIOLATION Blue Screen of Death. Disabling HAGS did not remove the BSODs completely, but it did reduce their number. Without HAGS, I would again last less than 5 minutes in games: the system would freeze while audio could still be heard in the background, the screen would go black after 10 seconds of freezing, and I would be dropped back to my desktop. Checking Event Viewer for the cause, I found errors regarding nvlddmkm.sys, specifically Event ID 153 & 14 (with the newer drivers) and Event ID 0 (with the older drivers).

When I ran into BSODs, Event Viewer would give this description:



The computer has rebooted from a bugcheck. The bugcheck was: 0x00000133 (0x0000000000000001, 0x0000000000001e00, 0xfffff804bbfc43b0, 0x0000000000000000).



From the research I've done, the code 0x133 means DPC_WATCHDOG_VIOLATION, which indicates a problem with the GPU & its drivers (at least in this case).
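As a rough aid for reading that bugcheck line, here is a minimal Python sketch that pulls out the code and its four parameters. The parameter meanings in `PARAM1_MEANING` follow Microsoft's bug check 0x133 reference as I understand it; the function name `parse_bugcheck` is made up for illustration.

```python
import re

# Meaning of parameter 1 for bug check 0x133 (DPC_WATCHDOG_VIOLATION).
PARAM1_MEANING = {
    0: "a single DPC or ISR exceeded its time allotment",
    1: "the system cumulatively spent an extended period at "
       "IRQL DISPATCH_LEVEL or above",
}


def parse_bugcheck(line):
    """Return (code, [p1, p2, p3, p4]) from an Event Viewer bugcheck line."""
    values = [int(h, 16) for h in re.findall(r"0x[0-9a-fA-F]+", line)]
    if len(values) != 5:
        raise ValueError("expected a code followed by four parameters")
    return values[0], values[1:]


line = ("The computer has rebooted from a bugcheck. The bugcheck was: "
        "0x00000133 (0x0000000000000001, 0x0000000000001e00, "
        "0xfffff804bbfc43b0, 0x0000000000000000).")

code, params = parse_bugcheck(line)
print(hex(code))                                 # bugcheck code
print(PARAM1_MEANING.get(params[0], "unknown"))  # which watchdog case fired
print(params[1])                                 # second parameter, in ticks
```

With parameter 1 equal to 1, the second parameter is the watchdog period in ticks rather than the run time of one specific DPC, so the line by itself does not name a driver; that is what a dump analysis is for.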

While the errors about Event ID 153 & 14 regarding nvlddmkm.sys would give this description:



The following information was included with the event:

\Device\Video3
Reset TDR occurred on GPUID:100



The following information was included with the event:

\Device\Video3
0228349c 00000000 00000000 2028f6d8 2028f4e2 2026f9ce 202900cc 2026bafa



I tried powering the PC off completely and reseating the GPU in its PCIe slot. The issues still persist. I tried reinstalling drivers of all versions to see which would be more stable; all would crash, but the most stable one was 537.58. I am honestly so worn out because it had been less than a month since I got this PC brand new, and seeing it run into such errors in that little time makes you question your choices. 🥲

Thank you for the time, though; I really appreciate all your effort!
 
Yeah, sorry I've not been of much help.

I ran an extreme stability test on the GPU to see its temperatures, and I found out that the GPU core temperature sits from 70°C to 75°C, the memory junction is around 69°C to 72°C, and the hotspot varies from 80°C to 87°C.
Those are all very good.

I can't be sure whether this is a motherboard and CPU mismatch issue, but I do just hope that it's not an issue related to the GPU (even though the chances of it being a GPU problem are the highest here 😭).
Well, there might be multiple things happening, that would be a headache to try and resolve.

When I ran into BSODs, Event Viewer would give this description:
[things]
First one is memory. For all the ones after, even after checking other threads, I can't tell what the hell's going on, what with the different individual solutions (or never-resolved ones).
 
I really appreciate you for taking your time; don't stress it!

I plan to take the PC to the technician team from whom I bought everything and see what's actually going on. As you said, it's a complete headache to try and resolve everything; just when you think you’ve identified the problem, it turns out to be something entirely different. 😭

Before handing it in, I’ll try a few last things. During my previous inspection inside the case, I noticed that the LED controller cables were tightly packed near the PSU, possibly interfering with the 12VHPWR connection. I now suspect there might be a defect or bad fit in the 12-pin connector between the PSU and GPU, but who knows...
 
The issue may just be CPU pins out of alignment with too much pressure on a corner or side of the CPU in the socket. Seems to be a common issue with LGA 1700 systems when RAM issues arise.
Sorry for my incompetence; when you mention too much pressure, do you mean pressure directly from the socket or pressure from the all-in-one liquid cooler block?
 
Whether the stock CPU latching system or a contact frame is used (a contact frame replaces the LGA latch holding the CPU in the socket), the pressure applied is sometimes not even across the CPU when it is installed in the socket. This can cause some pins in the LGA socket to become misaligned, which can cause RAM issues that people might diagnose as faulty RAM modules, when the real cause could be pins for a memory channel not making proper contact with the pads on the bottom of the CPU. You can also get this issue with CPU coolers applying too much pressure on one side or corner of the CPU.
 
UPDATE: I don't know why, but increasing the virtual memory paging file size, then disabling C-States for the CPU and choosing CPU Lite Load Mode 15 in BIOS seems to make the BSODs disappear; however, the nvlddmkm.sys crashes remain but take a bit longer to appear.
 
Windows is using part of your SSD as virtual RAM when you use the page file. You're not fixing the problem, you're only delaying it. At this point, I would suggest you stop using the PC before you corrupt your Windows installation, because you clearly have a system RAM issue.

You could try fixing it yourself by removing each RAM module one at a time and seeing if you still have the crashing. Whichever module does not cause crashing, swap it to the slot of the one that does and see if it still crashes. That will help diagnose if it's a faulty module issue or possibly a CPU pin issue, but if it is a CPU pin issue, it may require fully removing the CPU cooler to fix it. If you're not familiar with that process, just have the people that built the system do it for you.
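The swap procedure above can be written down as a small decision helper, which may make it easier to keep track of test results. This is only a sketch of the reasoning in the post, not a diagnostic tool; the function name `diagnose` and the module labels "A" and "B" are hypothetical.

```python
def diagnose(crashes_alone, good_module_crashes_in_suspect_slot=None):
    """Suggest the next step or a likely culprit from single-module tests.

    crashes_alone: dict mapping a module label to True if the system
        crashed with only that module installed in its original slot.
    good_module_crashes_in_suspect_slot: result of moving a non-crashing
        module into the slot of the crashing one (None if not yet tested).
    """
    bad = [m for m, crashed in crashes_alone.items() if crashed]
    good = [m for m, crashed in crashes_alone.items() if not crashed]

    if not bad:
        return "no module crashes alone; this test does not implicate a module"
    if not good:
        return "every module crashes alone; look beyond a single module"
    if good_module_crashes_in_suspect_slot is None:
        return f"move {good[0]} into the slot used by {bad[0]} and retest"
    if good_module_crashes_in_suspect_slot:
        return "known-good module crashes in that slot; suspect slot or CPU pins"
    return f"known-good module is stable in that slot; suspect module {bad[0]}"


# Hypothetical results: module A crashes alone, module B does not.
print(diagnose({"A": True, "B": False}))
print(diagnose({"A": True, "B": False}, good_module_crashes_in_suspect_slot=True))
```

The helper deliberately stops at "suspect": as the post notes, a slot-side failure can come from socket contact, which needs a physical inspection to confirm.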
 
you should provide the mini dump.
you could also try this windbg command
!dpcwatchdog

I do not remember which classes of devices have 1e00 timeout values (not the cpu).

must be a video timeout.
provide a minidump if you can
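For scale, the 1e00 value can be converted to wall-clock time. This sketch assumes the common default Windows clock tick of 15.625 ms (64 ticks per second); the actual tick length on a given system may differ, and the helper name `ticks_to_seconds` is made up for illustration.

```python
TICK_SECONDS = 1 / 64  # assumption: default 15.625 ms clock tick


def ticks_to_seconds(ticks, tick_seconds=TICK_SECONDS):
    """Convert a watchdog value in clock ticks to seconds."""
    return ticks * tick_seconds


watchdog_ticks = 0x1E00  # second bugcheck parameter from the thread
print(watchdog_ticks, ticks_to_seconds(watchdog_ticks))
```

Under that assumption the value works out to about two minutes, which is a long time for the system to sit at elevated IRQL.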
 
Alright, I am reverting everything I applied to normal settings. I will be handing the PC over to a hardware service shop to see what's actually going on. Thank you for your help; I really appreciate it!
 
I may actually be wrong about it being the RAM specifically; it could actually be the SSD. It's still very possible there is a CPU pin issue, though, because that could cause similar issues with storage and corrupt the data on the SSD.

Whatever the cause of the issue is, I just know nearly every time I've had to deal with this kind of error, it was because something corrupted the driver files and often the Windows files. I've found it usually came down to being a ram issue for one of various reasons, usually unstable ram settings and less often being a CPU pin issue.
 
overall your build looks pretty good. no really modified windows files. my guess is that gna.sys is messing with your graphics sound driver and crashing your GPU driver.
see link below for update to gna.sys

I have never looked into what the AMD Workload Profiling Scheduling Driver actually does. Just did not expect to see it.
(not sure if it has any effect on your bugcheck)

-----------------
gna.sys Fri Aug 11 03:25:02 2023
(intel gaussian and neural accelerator)
(looks like it can mess with sound processing, i would apply the update to match your bios version)
https://www.msi.com/Motherboard/PRO-B760M-P-DDR4/support#driver

take a look at the other drivers section for updated driver.
i guess it would be called Intel GNA Scoring Accelerator.
you might also install intel management engine so you can get the intel ME cpu patches.

rtcx21x64.sys Sun Oct 3 23:54:20 2021
(realtek network driver for motherboard chip, update)
you should apply the lan drivers update from the msi link listed above.

rtwlanu.sys Wed Oct 28 21:40:51 2015

the old rtwlanu.sys is unexpected, way too old
Realtek usb network driver. (if the usb device was removed, you might go to windows control panel, device manager, find the menu option to show hidden devices, then delete the greyed out entries). You might also use microsoft autoruns64.exe to delete the driver if you can not find an update.


KslD.sys Sun Aug 31 21:17:24 1986
WdFilter.sys Wed Nov 18 09:24:15 2020
(both drivers look wrong in the debugger, checksums were ok) (windows anti malware drivers)
you might run cmd.exe as an admin then run
DISM.exe /Online /Cleanup-Image /RestoreHealth
(just did not know if old binaries were on your system for some reason)

amdwps.sys (unexpected to see AMD as part of a driver name on an intel cpu system)
(AMD Workload Profiling Scheduling Driver)

note: you should change the memory dump type to kernel and provide a kernel dump. a minidump only shows the info on the active cpu at the time of the bugcheck. your system has 32 logical processors, but I can only see what was running on the one that called the bugcheck.
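The current dump type can also be checked from a script before reproducing the crash. A minimal Python sketch, assuming the standard `CrashDumpEnabled` value under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl`; `read_crash_dump_setting` only works on Windows and is defined but not called here, and both function names are made up for illustration.

```python
# Documented meanings of the CrashDumpEnabled registry value.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}


def describe_dump_type(value):
    """Translate a CrashDumpEnabled value into a readable name."""
    return DUMP_TYPES.get(value, f"unknown value {value}")


def read_crash_dump_setting():
    """Read CrashDumpEnabled from the registry (Windows only)."""
    import winreg  # imported here so the module still loads elsewhere

    path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


print(describe_dump_type(2))
```

On a Windows machine, `describe_dump_type(read_crash_dump_setting())` would report what the next bugcheck will write; the setting itself is normally changed under System Properties, Startup and Recovery.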
 
Wow, thanks for this very detailed analysis!

I updated the Intel GNA Scoring Accelerator driver and the Realtek LAN driver. When I tried to disable the Realtek USB network driver, I couldn't find anything related to it, even though I enabled "Show hidden devices". I ran the DISM command with the RestoreHealth option, and it did not find any issues.

I was surprised when you mentioned that amdwps.sys file. I built this PC completely from scratch and have only ever used the i9-14900k, no other processor. Unfortunately, even though I did update the drivers you mentioned, the BSODs still remain. I was able to replicate one just now through a VRAM benchmark test.

I managed to get a kernel memory dump. You can find it here: https://drive.google.com/file/d/1uc3suYk7oqc4WkN63CAIe0K5ra5N_iAc/view?usp=sharing

Although, it's 3 GB :/

Thanks for the help man! Much appreciated!
 
Looks like you have some hardware disabled.
Re-enable this:
ServiceName is "nvvad_WaveExtensible"

Looks like you have some thread that does not complete.
There were 1.3 million attempts by other threads to access the lock
(in 12 minutes).

I will poke around in the dump and see if I can find a cause, but
re-enable "nvvad_WaveExtensible".

Looks like a lot of threads are waiting on a lock.
Most look like they are actually GPU related, but this one looks strange:
SearchIndexer.exe
I do not know why it would be using a graphics driver.
I would stop the service and maybe reset its database.
ie something like this:
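
A rough sketch, assuming the indexer runs as the default Windows Search service (service name WSearch); run it from an elevated command prompt:

```bat
rem Stop the indexer and keep it from starting again while testing.
sc stop WSearch
sc config WSearch start= disabled

rem To reset the database afterwards, restore the service and use
rem Control Panel > Indexing Options > Advanced > Rebuild.
sc config WSearch start= delayed-auto
sc start WSearch
```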


system32\drivers\fxvad.sys Wed Mar 17 10:48:36 2021
Another third-party sound driver:
https://forum.fxsound.com/t/bsod-related-to-fxvad-sys/5804

You might want to remove it for testing, or update to the 6/2024 version (on the same link).

looks like you were also running:
GpuMemtest.exe 20760640 Kb
(using a lot of memory)

you might consider removing
ProcessLasso.exe

Looks like you have a Windows Insider build also. It might just be some bug.

One of the threads that was waiting was trying to change the power state of some GPU device.
You might want to run in high performance mode to avoid this.
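
One way to do that from a command prompt, assuming the built-in High performance plan is still present on the system (SCHEME_MIN is powercfg's alias for it):

```bat
rem List the available power plans; the active one is marked with *.
powercfg /list

rem Switch to High performance, then confirm the active plan.
powercfg /setactive SCHEME_MIN
powercfg /getactivescheme
```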
 
1.3 million attempts????!!! That's insane.

As for nvvad_WaveExtensible, I disabled it because I saw many other reports that disabling Nvidia HD Audio in Device Manager fixed the issue, but it did not for me.

The GpuMemtest.exe was a benchmark I ran to replicate the crash and BSOD (it used 80% of the GPU VRAM = 20760640 Kb).

I suspected Windows 11 itself might be at fault, so I did a fresh install of Windows 10 instead, but the issues remain. This has to be a hardware issue rather than a software one.

But hey, thanks for the help man!
 
There were a bunch of things running that I normally do not see.
New stuff and old drivers always get the blame.

I would use a clean build plus updates, turn on verifier testing, and then run my stress tests.
Verifier testing will slow down the machine, but it will also bugcheck early if there is a problem.
I would install only the basic NVIDIA drivers and not all of the extras (NVIDIA Container), just to limit what can go wrong. I.e., if the system locks up or bugchecks with just the basics, then it is more likely to be hardware. Even then, I would set the base clocks to standard reference values.
Get them to pass the stress tests before increasing any overclocks or adding more software. Kind of tedious.
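
A rough sketch of the verifier step from an elevated command prompt. Picking nvlddmkm.sys as the driver to verify is an assumption based on the driver named in this thread; other third-party drivers can be listed the same way. The settings apply after a reboot.

```bat
rem Enable the standard checks for the chosen driver, then reboot.
verifier /standard /driver nvlddmkm.sys

rem After the reboot, confirm what is being verified.
verifier /querysettings

rem When testing is done (or from Safe Mode if the system will not boot), turn it off.
verifier /reset
```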
 