Question Daily BSOD - - - ntoskrnl.exe (+416980) ?

Dec 4, 2023
Hello all,

I have been getting BSODs for a few months now, caused mostly by ntoskrnl.exe (+416980).
It doesn't matter whether the PC is idle or not. Sometimes it happens during a shutdown or a restart. Maybe once per day or more.

minidumps -> https://ufile.io/f/kvoic

The first BSODs were related to MEMORY_MANAGEMENT. After that, a variety of causes.
So I ran memtest86 on my G.Skill RAM, and it basically stopped after a few minutes due to excess errors. (I tried with and without XMP, etc.)
I thought the RAM was the issue, so I RMA'ed the sticks and they sent me new ones.
In the meantime, I bought some other RAM so I could keep running my build: 2x Patriot 8GB DDR4 2666MHz CL19. (I ran memtest86 on those without issues.)
When the new RAM arrived from G.Skill and I installed it, I immediately ran memtest again and everything was clear (after a few tests).

But that didn't fix anything; I was still getting a BSOD every day.
I ran memtest again on the G.Skill sticks and got errors again.
I re-installed the Patriots and ran memtest for around 12 hours without a single error. As the days passed and I kept having issues, I kept running memtest, still without a single error.
I started to think this might be a mobo memory-related issue, and that because the G.Skill RAM runs at a higher speed, maybe that's why I see errors with it?

The next step was to reinstall Windows on a new disk. I was running Windows 10 when the problem started, and I installed Windows 11 on the new disk. I kept the previous disk as storage.
And the problems continued.

At this point I have:
changed the RAM,
installed Windows 11 on a new disk,
updated every driver (even used third-party apps to do this),
run SFC,
run Driver Verifier, but didn't see anything more specific in the minidumps.

I have tried to fix it myself, but up to this point it doesn't make sense.
My latest step was to replace my GPU with an older one (Nvidia again).

I can't figure out the minidumps and I would really appreciate any help.

Thank you in advance for any tip and let me know if you need any other information.

CPU: Intel Core i7-10700K
Motherboard: ASUS ROG STRIX Z490-E GAMING
Cooler: ARCTIC Liquid Freezer II 360
Ram: 2x Patriot 8GB DDR4 2666MHz CL19
Previous ram: F4-3600C16D-32GTZNC (G.Skill)
SSD/HDD:
Apacer AS2290Q4U 1TB (NVMe), where Windows is installed.
Samsung SSD 970 EVO (nvme)
WDC WD10EZEX 1 TB (sata)
Samsung SSD 840 EVO (sata)

GPU: MSI RTX 2070 SUPER Ventus OC
PSU: be quiet! Straight Power 11 Platinum 850W
OS: Microsoft Windows 11 Professional (x64) Build 22631.2715

minidumps -> https://ufile.io/f/kvoic
 
Solution
Have you checked that each set of RAM sticks you're using are on the QVL for your motherboard? If they're not they will probably still work, but it's always wise to buy RAM that has been QVL tested to avoid any concerns about whether it's compatible or not.

The five dumps you uploaded could be interpreted as RAM problems; two fail with a 0xC0000005 exception code (invalid memory access) and two fail with a MEMORY_MANAGEMENT bugcheck during page table management operations. The fifth is a DPC_WATCHDOG_VIOLATION, but this particular bugcheck (with an argument 1 value of 1) can only be debugged with the full kernel dump; sadly, that's already been overwritten by later BSODs.

If we put RAM aside for the moment, since it does seem unlikely to be the problem (but do check the RAM QVL), the next logical place to look is the system drive, which appears from the dumps to be an NVMe drive, and since I can see you've been running Samsung Magician I guess it's a Samsung NVMe drive(?).

The hypothesis that this may be a system drive issue is also supported by three of the BSODs having happened during storage drive operations - the three that are not MEMORY_MANAGEMENT bugchecks. All three dumps have ntfs.sys on the call stack leading up to the bugcheck - this is the Windows filesystem driver. The other two dumps fail during page table operations, which will also involve updating slots in the pagefile - which is on your system drive.

If we discount RAM then, there is good evidence here to suspect the system drive. You don't mention which drive the active Windows 11 system is on, but it's never wise to have two Windows systems installed on active drives; it can cause all manner of problems. I would strongly recommend running the original Windows 10 system whilst you troubleshoot this. Windows 11 is an unknown quantity for you, and by running it you are potentially introducing more issues. If you can, reinstall Windows 10 on a drive other than the NVMe drive you're using now. Even though Samsung Magician may have declared it good, it's still suspect based on what we see in the dumps. Ideally I'd like whatever drive was the system drive in these five dumps removed from the system whilst we test with Windows 10.

If you get further BSODs with Windows 10 on another drive (and this NVMe drive removed) please download the V2 log collector from here and upload the zip file it produces. That provides us with all the troubleshooting data we are likely to need.
 
First of all, thank you for your reply.

Regarding the drives, 970 was the old drive with win10 installed. Now it's storage for steam games.
Apacer is the new one, with win11. Never had both OS running at the same time on two different active drives.

Both the SATA drives were always storage.

RAMs:
Rookie mistake: I just checked both vendors, and neither kit is on my motherboard's QVL.

I will remove the old drive (970 EVO) from the system, but will keep Win11 for the time being to see what happens. If it continues, I will reinstall Win10 on the new drive (Apacer) and check again. I will try QVL RAM last, to be sure.

Thank you again. I will keep updating this thread; if this resolves the issue, maybe someone else will find it helpful.
 
So, after I removed the Samsung 970 EVO it kept crashing, but this time only with MEMORY_MANAGEMENT.
Then I started removing the other two drives on SATA connections, one at a time.
First I removed the WD, but it crashed once more; then I removed the other Samsung. It hasn't crashed for the last 18 hours, so I'll keep monitoring. Currently I'm running with only one drive, the Apacer.

Basically, the 3 latest dumps were generated after I started removing drives.

dumps: https://ufile.io/f/fd4nb
 
Yes sorry forgot to mention that:
be quiet! Straight Power 11 Platinum 850W
was bought on 10/10/2020 and it was brand new, original to build.
occasional gaming but not something demanding, no video editing or mining
 
The link appears to require a download....

= = = =

Look in Reliability History/Monitor and Event Viewer for any error codes, warnings, or even informational events that occur just before or at the time of the BSODs.
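If the events are exported to a file, the "just before or at the time of the BSOD" check can also be done with a small script. This is only a minimal sketch: the event fields (`time`, `level`, `source`), the sample data and the 10-minute window are all assumptions made for illustration, not anything taken from this system.

```python
from datetime import datetime, timedelta

def events_before_crash(events, crash_time, window_minutes=10):
    """Return events logged in the window leading up to (and including) crash_time."""
    start = crash_time - timedelta(minutes=window_minutes)
    hits = [e for e in events if start <= e["time"] <= crash_time]
    return sorted(hits, key=lambda e: e["time"])

# Hypothetical sample data, not taken from this thread.
sample_events = [
    {"time": datetime(2023, 12, 5, 9, 50), "level": "Information", "source": "ExampleService"},
    {"time": datetime(2023, 12, 5, 10, 27), "level": "Warning", "source": "ExampleDisk"},
    {"time": datetime(2023, 12, 5, 10, 29), "level": "Error", "source": "ExampleDriver"},
    {"time": datetime(2023, 12, 5, 11, 5), "level": "Information", "source": "ExampleService"},
]
crash = datetime(2023, 12, 5, 10, 30)

for event in events_before_crash(sample_events, crash):
    print(event["time"], event["level"], event["source"])
```

Running the same filter for each crash time and comparing the sources that keep showing up is one way to spot a pattern.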

Take some screenshots and post the screenshots herein using imgur (www.imgur.com).

Start with Reliability History/Monitor. Much easier to use and understand. The tool uses a timeline format that may reveal patterns.

Event Viewer requires more time and effort to navigate and understand.

To help:

How To - How to use Windows 10 Event Viewer | Tom's Hardware Forum (tomshardware.com)

Overall my thought is that the PSU may be nearing its EOL (End of Life) and starting to falter and fail.

It may no longer be fully able to meet peaks in power demands.

Look for varying errors and increasing numbers of errors.

May have decreased a bit as you removed the drives.

If crashes again increase as drives are added back then the PSU is a likely suspect.
 
Using ufile to share those small files. Yes they need to be downloaded but it's free.

In Reliability History/Monitor I could not find anything in common.
I chose three random days, and it seems I was using the PC differently on each.

images: https://imgur.com/a/18pNOLN


this is the latest dump https://ufile.io/7fa8eo7i

Let me know if you need any other info.

PS:
Still using only one drive. Today I just turned the PC on, opened Steam to download a game and checked Reliability History. It crashed a few minutes later.

Edit1: crashed again when the PC started to restart (I issued the restart) with a MEMORY_MANAGEMENT error. New dump: https://ufile.io/tve7atqa
 
The five dumps you recently uploaded (dated 5th Dec) are all very strongly pointing at memory (RAM). Four of the dumps are MEMORY_MANAGEMENT bugchecks, but with two different exceptions; one is a page table entry corruption and the other is an I/O memory space error. The fifth dump is a STORE_DATA_STRUCTURE_CORRUPTION bugcheck, with an exception indicating that a heap buffer has been corrupted (a heap is a memory allocation unit).

There are no third-party drivers on the call stack leading up to any of these bugchecks, which makes a third-party driver cause very unlikely. If bad RAM is not the problem then we should next look at the CPU and since you're running an Intel CPU the best tool to test that is the Intel Processor Diagnostic Tool. Download that and see what that has to say.

The other test that would be useful is to run Prime95. This is a stress testing tool that hammers your CPU and RAM (in some tests). It will highlight any instability in your CPU and RAM. However, it WILL make your CPU run very hot, so you need a CPU temperature monitor (like Core Temp) running to keep an eye on temps.

Run all three Prime95 tests (Small FFTs, Large FFTs, and Blend) one after the other for at least 1 hour each test, longer if you can. If Prime95 generates error messages, if the system crashes or BSODs, or if your CPU gets too hot (Tmax is 100C for your CPU), then stop testing and let us know what happened.
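The test plan and stop conditions above can be written down as a small checklist. This is a rough sketch only: the function names are made up, and treating "too hot" as "at or above Tmax" is an assumption about how to read the advice.

```python
TMAX_C = 100       # Tmax quoted above for this CPU
MIN_MINUTES = 60   # "at least 1 hour each test"
TESTS = ("Small FFTs", "Large FFTs", "Blend")

def should_stop(temp_c, prime95_errors, crashed):
    """True if any of the stop conditions is met: crash/BSOD, Prime95 errors, or overheating."""
    return crashed or prime95_errors > 0 or temp_c >= TMAX_C

def plan_complete(minutes_run):
    """minutes_run maps a test name to the minutes it ran without hitting a stop condition."""
    return all(minutes_run.get(name, 0) >= MIN_MINUTES for name in TESTS)

# Hypothetical readings, for illustration only.
print(should_stop(temp_c=85, prime95_errors=0, crashed=False))
print(plan_complete({"Small FFTs": 60, "Large FFTs": 75, "Blend": 90}))
```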
 
Hey thank you for the reply. Just came home from vacations, so I will test what you suggested and will let you all know. Thank you again.
 
This might sound silly, but have you updated your bios since this has started? I am asking since you've only mentioned drivers specifically.
The reason I'm saying this is not that I've had exactly the same issue; it's mainly to rule out options and to be sure you're running everything up to date. The main reason I mention it is that I had similar BSODs (though less frequent) after a massive Windows 11 update released last year; memtests were all clean, but Windows Memory Diagnostic came out with errors every time and the logs kept pointing towards the RAM. Surprisingly, updating the BIOS resolved it for me. I'm not saying it will resolve this, but I would make sure you're not running some year-old BIOS version that might cause issues with Microsoft's updates.
 
I thought I had the latest BIOS, but I have 2701 and the latest is 2801, which was released 10 days ago. I will try it out just to be sure.

It was the first thing I checked when the issue started; I was indeed one BIOS version behind back then, and thought "Aha! This will fix it." 😛
 
So,

First I ran the Intel test. The first and second runs passed. On the third run, I opened FF and it crashed.

After that, I updated the BIOS to the latest version (which was released just 11 days ago) and ran the Intel test 3 more times; all passed.

Then I ran Prime95: Small FFTs for about 1 hour and 30 minutes, then Large FFTs for almost 2 hours, and lastly Blend for almost 2 hours. No problems at all. Temps and test durations are below.

https://imgur.com/a/dwhVoiP
 
Those three dumps still strongly suggest a RAM problem - or of course, a motherboard RAM slot problem.

One dump is a PAGE_FAULT_IN_NONPAGED_AREA bugcheck, which means a page fault (invalid page) was encountered whilst addressing a page in the non-paged pool. In the raw call stack we can see a memory error in private memory (nt!MiResolvePrivateZeroFault+0x1af) and whilst the kernel is recovering from that we get a page fault in the kernel (nt!KiPageFault+0x369) - that's the one that caused the BSOD.

The second dump is a MEMORY_MANAGEMENT bugcheck with an exception code indicating an invalid page table entry (PTE). In the call stack we can see that an address space had ended normally and cleanup was in progress for the memory it had used. The BSOD happened when the nt!MiDeleteVa+0x2316e2 function was called. Notice that very large offset into the function (0x2316e2). That's outside the range of the module, which is only 0x104700 bytes long - that's why we got the BSOD. Quite where that offset came from we can't tell but all the functions called are in Windows modules, so it's not a software error.

The third dump is a KERNEL_LOCK_ENTRY_LEAKED_ON_THREAD_TERMINATION bugcheck indicating that a thread was terminated before all of its locks had been released. In the raw call stack we can see the process ending normally and memory cleanup starting. The BSOD happens on the nt!KeCleanupThreadState+0x246428 function call. Notice again that large offset (0x246428), that's also beyond the end of the nt!KeCleanupThreadState module, which also has a length of only 0x1047000 bytes. This call stack also contains only functions in Windows modules, so it's not a software error.

That Prime95 ran well does suggest that it's not a CPU problem, but that said, Prime95 also stresses RAM on a couple of the tests and I would have expected that to fail.

It is just possible that these BSODs could have a bad third-party driver causing them. It might just be that a third-party driver trashed a critical system structure or screwed-up a queue or list and that problem wasn't encountered until some time later when we got a BSOD in a different thread. We wouldn't see the faulty driver in any of the dumps because it was long gone.

Because of that, and before you spend any money, it's well worth enabling Driver Verifier...

Driver Verifier subjects selected drivers (typically all third-party drivers) to extra tests and checks every time they are called. These extra checks are designed to uncover drivers that are misbehaving. If any selected driver fails any of the Driver Verifier tests/checks then Driver Verifier will BSOD. The resulting minidump should contain enough information for us to identify the flaky driver. It's thus essential to keep all minidumps created whilst Driver Verifier is enabled.
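Since every minidump from the test period matters, one way to keep them out of reach of cleanup tools is to copy them somewhere else as you go. A minimal sketch, assuming the default dump folder `C:\Windows\Minidump`; the destination `D:\dv-dumps` is a hypothetical path, and reading the dump folder may need an elevated prompt.

```python
import shutil
from pathlib import Path

def backup_minidumps(src, dest):
    """Copy every .dmp file from src into dest, skipping files already copied."""
    src, dest = Path(src), Path(dest)
    if not src.is_dir():
        return []  # nothing to do if the dump folder does not exist
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for dump in sorted(src.glob("*.dmp")):
        target = dest / dump.name
        if not target.exists():
            shutil.copy2(dump, target)  # copy2 also preserves the file timestamps
            copied.append(target)
    return copied

if __name__ == "__main__":
    # Source is the default Windows minidump folder; destination is a made-up example.
    for path in backup_minidumps(r"C:\Windows\Minidump", r"D:\dv-dumps"):
        print("copied", path)
```

Because the timestamps are preserved, the copies can still be matched to the period when Driver Verifier was running.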

To enable Driver Verifier do the following:

1. Take a System Restore point and/or take a disk image of your system drive (with Acronis, Macrium Reflect, or similar). It is possible that Driver Verifier may BSOD a driver during the boot process (some drivers are loaded during boot). If that happens you'll be stuck in a boot-BSOD loop.

If you should end up in a boot-BSOD loop, boot the Windows installation media and use that to run system restore and restore to the restore point you took, to remove Driver Verifier and get you booting again. Alternatively you can use the Acronis, Macrium Reflect, or similar, boot media to restore the disk image you took.

Please don't skip this step. It's the only way out of a Driver Verifier boot-BSOD loop.

2. Start the Driver Verifier setup dialog by entering the command verifier in either the Run command box or in a command prompt.

3. On that initial dialog, click the radio button for 'Create custom settings (for code developers)' - the second option - and click the Next button.

4. On the second dialog check (click) the checkboxes for the following tests...
  • Special Pool
  • Force IRQL checking
  • Pool Tracking
  • Deadlock Detection
  • Security Checks
  • Miscellaneous Checks
  • Power framework delay fuzzing
  • DDI compliance checking
Then click the Next button.

5. On the next dialog click the radio button for 'Select driver names from a list' - the last option - and click the Next button.

6. On the next dialog click on the 'Provider' heading, this will sort the drivers on this column (it makes it easier to isolate Microsoft drivers).

7. Now check (click) ALL drivers that DO NOT have Microsoft as the provider (ie. check all third-party drivers).

8. Then, on the same dialog, check the following Microsoft drivers (and ONLY these Microsoft drivers)...
  • Wdf01000.sys
  • ndis.sys
  • fltMgr.sys
  • Storport.sys
These are high-level Microsoft drivers that manage lower-level third-party drivers that we otherwise wouldn't be able to trap. That's why they're included.

9. Now click Finish and then reboot. Driver Verifier will be enabled.

Be aware that Driver Verifier will remain enabled across all reboots and shutdowns. It can only be disabled manually.

Also be aware that we expect BSODs. Indeed, we want BSODs, to be able to identify the flaky driver(s). You MUST keep all minidumps created whilst Driver Verifier is running, so disable any disk cleanup tools you may have.

10. Leave Driver Verifier running for 48 hours, use your PC as normal during this time, but do try and make it BSOD. Use every game or app that you normally use, and especially those where you have seen it BSOD in the past. If Windows doesn't automatically reboot after each BSOD then just reboot as normal and continue testing.

Note: Because Driver Verifier is doing extra work each time a third-party driver is loaded you will notice some performance degradation with Driver Verifier enabled. This is a price you'll have to pay in order to locate any flaky drivers. And remember, Driver Verifier can only test drivers when they are loaded, so you need to ensure that every third-party driver gets loaded by using all apps, features and devices.

11. To turn Driver Verifier off, enter the command verifier /reset in either the Run command box or a command prompt and reboot.

Should you wish to check whether Driver Verifier is enabled or not, open a command prompt and enter the command verifier /query. If drivers are listed then it's enabled, if no drivers are listed then it's not.

12. When Driver Verifier has been disabled, navigate to the folder C:\Windows\Minidump and locate all .dmp files in there that are related to the period when Driver Verifier was running (check the timestamps). Zip these files up if you like, or not as you choose. Upload the file(s) to the cloud with a link to it/them here (be sure to make it public).
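For reference, the checks ticked in step 4 correspond to bit flags that the verifier command line also accepts. The sketch below combines them into one mask. The bit values are as I understand them from Microsoft's `verifier /flags` documentation, so treat them as an assumption and check the current documentation before relying on them.

```python
# Bit values as understood from Microsoft's Driver Verifier documentation (assumption).
VERIFIER_FLAGS = {
    "Special Pool": 0x00000001,
    "Force IRQL checking": 0x00000002,
    "Pool Tracking": 0x00000008,
    "Deadlock Detection": 0x00000020,
    "Security Checks": 0x00000100,
    "Miscellaneous Checks": 0x00000800,
    "Power framework delay fuzzing": 0x00008000,
    "DDI compliance checking": 0x00020000,
}

def flags_mask(selected):
    """OR together the bit values of the selected checks."""
    mask = 0
    for name in selected:
        mask |= VERIFIER_FLAGS[name]
    return mask

mask = flags_mask(VERIFIER_FLAGS)  # all eight checks from step 4
print(hex(mask))  # 0x2892b
```

With that value, the command-line form would look something like `verifier /flags 0x2892b /driver <driver names>`, but the dialog steps above are what this post actually describes.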
 
Thank you for your reply. I just installed new RAMs,
Patriot Viper Steel 2x8GB, PC4-25600, CL16-18-18-36, voltage: 1.35V Product Number: PVS416G320C6K
Unfortunately, I couldn't find anything from the QVL where I live. I bought the above because the CL and timings were the same: the memory support page lists the PV416G320C6K, with the same CL and timings.

I was thinking that maybe the RAM slots in motherboard, could be the issue. If it comes to motherboard, I will have a hard time (price wise and availability) to find one but it can be done.

I keep a full disk image everyday so it will not be a problem in case of loops. I will try Driver Verifier and will let you know in the next few days.

Thank you again for your insight. Very detailed.
 
So, a small update regarding the BSOD in the last 3 days.
On 13/12 -> Installed the new RAM sticks around 12:00. No issues the whole day, not even a browser crash. Only light use.

On 14/12 -> Enabled Driver Verifier around 11:00 and had no issues the whole day. Usual use and some light gaming (the game was not too demanding).

On 15/12 -> DV still enabled. Usual use and some gaming with a demanding title (BG3).

So, I will start reconnecting the drives over the next few days (one drive per day, to monitor for any issues).

I hope the RAM was the issue. I will post another update in two or three days and hopefully wrap this up.
 
That sounds encouraging. If DV has been running for over 24 hours AND you've tried to ensure that every third-party driver gets loaded, then you can deactivate it now. Your problem is unlikely to be a third-party driver.

Fingers crossed that the new RAM is the solution.
 
I just disabled DV; it was running for a couple of days, still with no issues.
Yes, it seems very encouraging to be honest. From 1-2 crashes per day to 0 in the last 4 days.

My next update will be in 3 days, so hopefully no issues until then.
 
So, up until now, everything looks good. I have reinstalled all the drives and everything is running without issues.
So I assume we can close this as solved.

To be honest, I have built a few PCs and never had issues with RAM that was not on the vendor's QVL. Lesson learnt, I guess.

If you have time, please share any sources you have, so I can read up a bit on how to read minidumps etc.

Thank you.