DPC WATCHDOG VIOLATION BSOD with Win10 - ndis.sys and ntoskrnl.exe apparent causes


tompbsmith

Jun 23, 2015
Hello all,

I did a clean install of Windows 10 a few days ago. I had zero errors until today, when I got a system thread exception; the computer went to a black screen and restarted, and straight after the reboot I got a DPC_WATCHDOG_VIOLATION.

When I open the .dmp in BlueScreenView, it seems as though ndis.sys and ntoskrnl.exe are the problems (ndis.sys is the main one, I think; I don't really know how to read BlueScreenView).

After some googling, I found that ndis.sys is part of the Windows networking stack and that most people hitting this problem had wireless network adapters. However, my machine is a desktop and only has wired LAN. I tried updating my LAN drivers, but Windows says they are up to date, so now I've run out of ideas.

Here is the minidump file: here

And here is a screenshot of the bluescreenview properties window on the .dmp if you can't get to the .dmp file: here

Thank you for your time.

EDIT: A quick edit after some more reading/googling. Others seem to have fixed this BSOD by updating their disk drivers. I have a Samsung 840 EVO SSD, but Samsung Magician says its drivers are up to date. Then I thought it could be my external HDDs (because some other threads said USB drivers cause this problem), but they are up to date too, going by Device Manager and Seagate's website.

EDIT 2: I just booted the machine and got a different BSOD this time: PFN_LIST_CORRUPT. From some googling, this appears to be caused by a driver corrupting memory, so everything points to a bad driver that doesn't work with Windows 10.

EDIT 3: I have been looking at the non-Microsoft drivers using DriverView and saw that my network driver is from Intel, so I thought it could be causing the ndis.sys problem. There were apparently new drivers released last month, so I tried updating, but the Intel utility says I do not have any Intel network adapters in my system, which is a bit strange.

Also, I used CPU-Z to look up my system specs to reply to @darkbreeze and then opened "PC Info". Apparently Windows is only seeing 8GB, but CPU-Z sees the whole 12GB. What is going on here? Isn't CPU-Z getting its information from the Windows kernel somewhere along the line?

Here is the .dmp file for the PFN_LIST_CORRUPT bsod: here.
 


Hello, my specs are:

CPU: Intel Core i7 Extreme 980X @ 3.33GHz
Mobo: Asus Rampage III Extreme
RAM: Corsair 6x2GB DDR3 (12GB) - PC3-10700H 667MHz
GPU: EVGA Geforce GTX 970
OS: Windows 10 Pro x64

I've also just added an edit to the original post because after booting I've just had a different BSOD (PFN_LIST_CORRUPT) which points again towards this being a driver error.
 


The PSU is OCZ ZX Series 1000W.

I've looked in C:\ for "windows.old" but there isn't one, and I have "show hidden items" turned on.
 
Reviews for that unit are pretty good, considering it's OCZ which doesn't have the greatest track record when it comes to power supplies.

http://www.jonnyguru.com/modules.php?name=NDReviews&op=Story5&reid=238


So while it's still possible that it's the problem, it's highly doubtful. I'd still take a look at the system voltages in the BIOS and probably run HWinfo (NOT HWmonitor or Open Hardware Monitor) to see what the 3.3, 5 and 12 V readings are doing.
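If you copy the rail readings out of HWinfo, one quick way to judge them is against the ±5% regulation window the ATX12V spec allows on the +3.3 V, +5 V and +12 V rails. The sketch below assumes readings typed in by hand (the sample numbers are made up for illustration). Keep in mind that software sensor readings are only approximate; a multimeter on the connectors is more trustworthy.

```python
# Nominal ATX rail voltages and the +/-5% regulation window the ATX12V
# specification allows on each of them.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def check_rails(readings):
    """Return {rail: (reading, within_spec)} for each reported rail."""
    results = {}
    for rail, reading in readings.items():
        nominal = NOMINAL_RAILS[rail]
        low = nominal * (1 - TOLERANCE)
        high = nominal * (1 + TOLERANCE)
        results[rail] = (reading, low <= reading <= high)
    return results


if __name__ == "__main__":
    # Hypothetical readings copied by hand from HWinfo's sensors window.
    example = {"+3.3V": 3.28, "+5V": 5.02, "+12V": 11.30}
    for rail, (value, ok) in check_rails(example).items():
        print(f"{rail}: {value:.2f} V {'OK' if ok else 'OUT OF SPEC'}")
```

A reading that drifts out of the window only under load is more telling than a single idle snapshot, so it is worth checking the values while the system is busy as well.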

Considering the issue with the reported RAM, I suspect you have some incompatible memory modules installed that are not playing nice with each other, or a bad module. Plus, that's a lot of modules: the stress on the memory controller from six modules is substantial, since it needs far more voltage than two or even four modules would. I'd suggest taking two or even four of those modules out of the system, leaving the remaining modules in the slots designated by your motherboard manual, and seeing whether the problem continues or is gone.

It could also be a stability issue with the memory, due to voltage. You might try bumping the memory voltage very slightly in the BIOS.
 


Thanks for all the info. I don't really know much about PC hardware so I'm not sure about how to go about testing voltages etc. Is there a guide you can link to?

Also, about the RAM: all the sticks are the same, so the voltages etc. should be the same, right? It still seems strange to me that CPU-Z can see all 12GB but Win10 can only see 8GB. I ran memtest for a few hours a couple of weeks ago and it found no problems with my RAM.
 
Memtest often doesn't catch problems with memory. There could be various reasons for this, including intermittent faults or faults that only show up under specific loads or in specific regions of memory. Memtest is usually pretty thorough, but it won't detect Windows-related issues or problems with the memory controller itself.

It's actually NOT uncommon to see the full amount of RAM not being recognized by Windows. Where are you seeing only 8GB? On the main page of the "System" applet in Control Panel, or somewhere else? Double-check your BIOS settings to ensure that the integrated graphics are disabled. Perhaps the system is reserving some memory for on-chip GPU operations, even though it's not being used.

As for the voltages, that information should be listed somewhere in the BIOS, or you can do the following to see it and post the results here.

Run HWinfo and look at system voltages and other sensor readings.

Monitoring temperatures, core speeds, voltages, clock ratios and other reported sensor data can often help to pick out an issue right off the bat. HWinfo is a good way to get that data and in my experience tends to be more accurate than some of the other utilities available. CPU-Z, GPU-Z and Core Temp all have their uses but HWinfo tends to have it all laid out in a more convenient fashion so you can usually see what one sensor is reporting while looking at another instead of having to flip through various tabs that have specific groupings.

After installation, run the utility and when asked, choose "sensors only". The other window options have some use but in most cases everything you need will be located in the sensors window. If you're taking screenshots to post for troubleshooting, it will most likely require taking three screenshots and scrolling down the sensors window between screenshots in order to capture them all.

*Download HWinfo


And then post the data here using this method:

*How to post images in Tom's hardware forums



You can also test the PSU manually, which is more accurate in any case, like this:

https://www.youtube.com/watch?v=ac7YMUcMjbw
 


Thanks @darkbreeze for taking your time to work me through all this information. I will do what you say and report back soon-ish (probably tomorrow as I am away from the machine right now).
 


Hey, here is an album of the HWInfo sensors screens: here.

I ran memtest last night and it seems there is a bad stick. I took it out and memtest didn't report any errors for a few hours, so I think I found the stick, but I'm still getting BSODs.

As for where I see 8GB of RAM, it's in "PC Info" and in the Performance tab of Task Manager. What's a bit strange about the Performance tab is that it shows only 4.7GB of RAM "Available" (here); it fluctuates between 4.7 and 4.6. After removing that stick I now have 10GB in the machine, and the "Committed" section of the Performance tab says 4.5/9.6GB. Does that mean Windows can see all 10GB but some isn't "available" (maybe for the integrated graphics reason you mentioned)?

Thanks again for the help so far.
 
Voltages look fine. Other sensors look good as well. I'm not sure what your virtual memory settings are, but I'd go into Control Panel > System > Advanced system settings and make sure virtual memory is set to "Automatically manage paging file size for all drives", as your virtual memory settings look unusual unless you've set them that way intentionally.

As to the memory, I think this is your core issue. Running Memtest is fine and great, but you can't test the memory accurately with it all installed and I'd be skeptical as to whether the stick you removed is actually the bad stick or not. You need to test each module individually for 5-7 full passes. If there is more than one module installed, memtest will often throw up false errors. Plus, you can't know which module is bad, if any, with more than one module installed.
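The one-module-at-a-time approach comes down to a simple bookkeeping rule: a module only counts as good after it has completed enough full passes with nothing else installed, and any error during those passes marks it bad. A minimal sketch of that rule, assuming you write down the error count memtest shows after each pass; the module labels and results are hypothetical, and the 5-pass minimum is taken from the lower end of the 5-7 passes suggested above.

```python
MIN_PASSES = 5  # lower end of the 5-7 full passes suggested above


def classify_modules(results):
    """Classify each module from its per-pass memtest error counts.

    `results` maps a module label (hypothetical, e.g. a sticker you put on
    the stick) to a list of error counts, one per completed pass, recorded
    with ONLY that module installed.
    """
    verdicts = {}
    for module, passes in results.items():
        if len(passes) < MIN_PASSES:
            verdicts[module] = "incomplete"  # not enough passes to trust it
        elif any(errors > 0 for errors in passes):
            verdicts[module] = "bad"  # any error at all fails the module
        else:
            verdicts[module] = "passed"
    return verdicts


if __name__ == "__main__":
    # Hypothetical results for three of the sticks.
    example = {
        "stick A": [0, 0, 0, 0, 0, 0],
        "stick B": [0, 0, 3, 0, 0, 0],
        "stick C": [0, 0],
    }
    for module, verdict in classify_modules(example).items():
        print(f"{module}: {verdict}")
```

Note that a module that passes alone can still misbehave alongside the others, which is the matched-set point made below.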

Check your motherboard manual for the population rules. There are specific slots that need to be used when installing one, two, three or four modules and they need to be followed. Also, just because all the modules are the same part number, does NOT mean they will play nice together. When using dual, triple or quad memory configurations, all modules should be purchased in a matched set, meaning they have been tested by the manufacturer to work together in dual, triple or quad channel operation. Even modules from the same manufacturing batch can be incompatible, which is why they are pretested for compatibility and shipped in matched sets.
 



I haven't made any changes to the virtual memory myself, so I'll make sure it is set to automatic. Okay, I'll memtest each module alone and also check the manual for configurations. The strange thing is, I bought the mobo/RAM/CPU/PSU from my friend last year; I've had no problems with the RAM, and he ran the setup for about 2 years with no problems. Could a stick have just died now? That's why I was wondering whether it was a driver/Win10 problem rather than hardware.
 


Haha, "durr", I guess I'm tired. I think I was just making excuses to get out of the laborious task of checking each stick one by one.

Without taking too much more of your time, and thank you very much for all the help so far, could you tell me why the BSOD seems to be DPC_WATCHDOG_VIOLATION most of the time but is sometimes PFN_LIST_CORRUPT (and at least one to do with a 'trace' of something but for some reason that .dmp file wasn't in the minidump folder)?
 
Honestly, I can't tell you with any certainty. What I can say is that if two different processes, drivers or applications access the same faulty sector/cell/area of a memory module, it can corrupt the data being used or trigger different errors that are based on the application/process/driver rather than an error specific to the memory itself. The system sometimes only knows that something went wrong with what was going on and triggers an error related to that, rather than an error that says "hey, the damn memory is acting up".
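To illustrate the point, here is a toy model (not how Windows actually manages memory) in which a single memory cell has one bit stuck at 1. Two hypothetical consumers read the same byte and interpret it differently, so each reports its own symptom and neither one blames the RAM. The consumer names and error messages are made up for the illustration and are not real bugcheck codes.

```python
# Toy model: one faulty memory cell whose bit 2 is stuck at 1.
memory = bytearray(16)  # pretend physical memory, all zeros
FAULTY_ADDR = 7
STUCK_BIT = 0b100


def read(addr):
    """Read a byte; the faulty cell always comes back with STUCK_BIT set."""
    value = memory[addr]
    if addr == FAULTY_ADDR:
        value |= STUCK_BIT
    return value


def network_driver():
    # Hypothetical consumer: stores a packet count here and expects 0.
    count = read(FAULTY_ADDR)
    if count != 0:
        return f"network driver: unexpected packet count {count}"
    return "ok"


def memory_manager():
    # Hypothetical consumer: treats the same byte as page flags. The page
    # sits on the free list, so bit 2 ("in use") should be clear.
    flags = read(FAULTY_ADDR)
    if flags & STUCK_BIT:
        return "memory manager: free-page list inconsistent"
    return "ok"


if __name__ == "__main__":
    print(network_driver())  # network driver: unexpected packet count 4
    print(memory_manager())  # memory manager: free-page list inconsistent
```

Same bad cell, two different complaints, which is roughly why one faulty module can show up as both DPC_WATCHDOG_VIOLATION and PFN_LIST_CORRUPT.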

The same thing can happen with a power supply. If a hardware component fails to do its job because of a lack of power or some other PSU-related problem, the code that's triggered may make you think that hardware component is bad, rather than the fact that it just can't function properly without clean, stable power.

None of this necessarily means that there IS a problem with the memory either. It could be something else entirely, like the motherboard or just a driver issue.

When you did the "clean install", did you install to a blank drive, delete all the existing partitions on the current drive before installing, or just install to the partition or drive that previously had another OS version installed?
 
To debug the PFN_LIST_CORRUPT (page frame number list corrupt) error, you would also need the kernel memory dump.
You can guess at a fix and update the SATA drivers, unless you have a second driver corrupting memory,
or a bad filter driver. You can't tell without looking at a memory dump, and even then you might have to set some debug flags.
A minidump can still be helpful just for its driver list, since it can reveal drivers that are known-bad versions.
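Checking a minidump's driver list against known-bad versions is essentially a lookup. A minimal sketch, assuming you have copied the driver names and versions by hand out of BlueScreenView or a similar tool; the "fixed in" table and the driver name exampleusb.sys are purely hypothetical placeholders, since a real list of problem drivers has to come from the vendors' release notes.

```python
# Hypothetical table: driver file name -> first version known to be fixed.
KNOWN_FIXED_IN = {
    "exampleusb.sys": (2, 1, 0),
}


def parse_version(text):
    """Turn a dotted numeric version like '2.0.5' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def flag_outdated(loaded):
    """Return the names of loaded drivers older than their first fixed version.

    `loaded` is a list of (file name, version string) pairs.
    """
    flagged = []
    for name, version in loaded:
        fixed = KNOWN_FIXED_IN.get(name.lower())
        if fixed is not None and parse_version(version) < fixed:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical driver list copied from a dump viewer.
    drivers = [("ExampleUSB.sys", "2.0.5"), ("other.sys", "1.0")]
    print(flag_outdated(drivers))
```

Drivers not in the table are simply ignored, so an empty result only means none of the listed drivers matched a known problem, not that every driver is fine.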
----------
I could not access your memory dump file.
Also, watchdog timeout bugchecks require a kernel memory dump to debug them.

Change your memory dump type to kernel and post a new memory dump file with public access enabled.

How to change the memory dump type:
https://www.sophos.com/en-us/support/knowledgebase/111474.aspx
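For reference, the dump type chosen in Startup and Recovery is stored in the CrashDumpEnabled value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl, where 2 means a kernel memory dump. A small read-only sketch, assuming Python is installed on the affected machine, that reports the current setting; it does not change anything, so switching to kernel dumps is still done through the GUI steps in the link.

```python
import sys

# Common meanings of the CrashDumpEnabled registry value.
DUMP_TYPES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump (minidump)",
    7: "Automatic memory dump",
}

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def describe(value):
    """Translate a CrashDumpEnabled value into a readable name."""
    return DUMP_TYPES.get(value, f"Unknown ({value})")


def read_crash_dump_setting():
    """Return the CrashDumpEnabled value on Windows, or None on other systems."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


if __name__ == "__main__":
    current = read_crash_dump_setting()
    if current is None:
        print("Not running on Windows")
    else:
        print(f"CrashDumpEnabled = {current}: {describe(current)}")
```

If this reports a small memory dump (3), the next crash will only produce a minidump, which is not enough to debug a watchdog timeout.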


Most often, a watchdog violation in NDIS is caused by problems in USB support while a USB wireless adapter is being installed, i.e. the USB wireless adapter tries to install on one core while the system attempts to use the device on a second core. It should work if Plug and Play can install the driver within 300 seconds; if not, the CPU reports that the core is hung and bugchecks the system on the assumption that something is wrong with the CPU. The CPU is actually OK, but Plug and Play keeps trying to install a broken driver over and over until the timeout limit on the second core expires and the system bugchecks.

This can be verified only with a kernel memory dump. The fix is to update the BIOS to get USB fixes, update the CPU chipset drivers to get USB 2.x fixes, update the USB 3.0 external chipset drivers to get USB 3.0 fixes, and then update the driver for the wireless device to get its fixes (not a fun process).

There are other causes, but I have seen this one over and over.
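The timeout mechanism described above can be pictured as a toy simulation: Plug and Play keeps retrying a driver install while a watchdog counts elapsed time, and the system bugchecks once the limit is exceeded. This is a simplified model, not Windows' actual watchdog logic; the 300-second limit is the figure quoted in the post, and the seconds per attempt are an arbitrary assumption.

```python
WATCHDOG_TIMEOUT = 300  # seconds, the figure quoted in the post above


def simulate(install_succeeds_on, seconds_per_attempt, timeout=WATCHDOG_TIMEOUT):
    """Toy model of Plug and Play retrying a driver install under a watchdog.

    install_succeeds_on: attempt number on which the install works, or None
    for a broken driver that never installs.
    Returns (outcome, attempts, elapsed_seconds).
    """
    elapsed = 0
    attempt = 0
    while True:
        attempt += 1
        elapsed += seconds_per_attempt
        if elapsed > timeout:
            # The watchdog assumes the core is hung and bugchecks the system.
            return ("bugcheck", attempt, elapsed)
        if install_succeeds_on is not None and attempt >= install_succeeds_on:
            return ("installed", attempt, elapsed)


if __name__ == "__main__":
    print(simulate(3, 10))     # a working driver installs well within the limit
    print(simulate(None, 10))  # a broken driver retries until the watchdog fires
```

The point of the model is that the bugcheck blames the stalled core, even though the real culprit is the driver being retried.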







 