Question First CLOCK_WATCHDOG_TIMEOUT BSOD, unsure how to proceed

Apr 24, 2022
14
0
10
0
Motherboard: ROG STRIX Z690-A GAMING WIFI D4 (latest BIOS revision 1404)
CPU: Intel Core i7-12700K (no OC, stock settings)
GPU: EVGA GeForce RTX 3070 Ti FTW3 ULTRA GAMING, 08G-P5-3797-KL (latest Nvidia drivers 512.15)
PSU: Corsair CMPSU-750TX
RAM: G.SKILL Ripjaws V Series 16GB (2 x 8GB) 288-Pin DDR4 SDRAM DDR4 3200 (PC4 25600) Desktop Memory Model F4-3200C16D-16GVKB (XMP II profile)

Recently built another PC with some spare parts + some new ones and ran into a BSOD today when launching a game. I'm pretty sure it's related to my heavy GPU overclocking, but this BSOD code is related to CPU failure. Ran WhoCrashed and BlueScreenView but they weren't of much help. Hoping someone here can help me troubleshoot.

Here's a link to the dump file: https://www.dropbox.com/s/wk2e9cn31s95ya0/042422-10812-01.dmp?dl=0

WhoCrashed claims the culprit is 0n2.sys, but this driver does not exist. Google search also yields 0 results. BlueScreenView claims the BSOD was caused by ntoskrnl.exe, which from my understanding is a very nonspecific crash address and doesn't really help pinpoint the actual problem.

I'm assuming my GPU overclock caused the CPU to hang on something, which caused a watchdog BSOD, but I haven't been able to find anything about a GPU overclock causing this type of BSOD. I've also run prime95 small ffts on the CPU for well over 3 hours with 0 crashes or errors, and temps averaged 79c, so I don't think heat or instability from the CPU is the problem. So right now I'm just waiting for the BSOD to happen again. I've already removed the GPU overclock and restored everything to stock settings, but I'm posting this in case it happens again.
 
Last edited:
If you run out of things to try, test with XMP off.
Got the proper mobo drivers?
 

gardenman

Distinguished
Moderator
Hi, I ran the dump file through the debugger and got the following information: https://jsfiddle.net/19mt57er/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.

File information: 042422-10812-01.dmp (Apr 24 2022 - 14:47:36)
Bugcheck: CLOCK_WATCHDOG_TIMEOUT (101)
Probably caused by: memory_corruption (Process running at time of crash: MsMpEng.exe)
Uptime: 0 Day(s), 2 Hour(s), 28 Min(s), and 10 Sec(s)
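
For anyone matching the debugger summary against other tools: the number in parentheses after the bugcheck name is hexadecimal, so (101) means 0x101. A minimal lookup sketch covering the three stop codes that come up in this thread; the function name is made up for illustration and the table is deliberately not a complete list.

```python
# Stop codes mentioned in this thread, keyed by their hexadecimal bugcheck value.
BUGCHECKS = {
    0x101: "CLOCK_WATCHDOG_TIMEOUT",
    0x139: "KERNEL_SECURITY_CHECK_FAILURE",
    0x119: "VIDEO_SCHEDULER_INTERNAL_ERROR",
}


def bugcheck_name(code_text: str) -> str:
    """Translate a bugcheck value as printed by the debugger (hex, with or without 0x)."""
    code = int(code_text, 16)
    return BUGCHECKS.get(code, f"unknown bugcheck 0x{code:X}")


print(bugcheck_name("101"))  # prints CLOCK_WATCHDOG_TIMEOUT
```

Anything not in the table comes back as "unknown bugcheck 0x...", which is a cue to look it up rather than a diagnosis.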

Comment: The overclocking driver "RTCore64.sys" was found on your system. (MSI Afterburner)

Possible Motherboard page: https://rog.asus.com/motherboards/rog-strix/rog-strix-z690-a-gaming-wifi-d4-model/
There is a BIOS update available for your system. You are using version 1304 and the latest is version 1404. Wait for additional information before deciding to update or not. Important: Verify that I have linked to the correct motherboard. Updating your BIOS can be risky. Never try it when you might lose power (lightning storms, recent power outages, etc).

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.
 

Colif

Win 11 Master
Moderator
Jun 12, 2015
55,439
4,333
160,590
10,080
The process running at the time of the crash (MsMpEng.exe) is part of Windows Defender.

Do you use Wi-Fi or Ethernet?
Jun 15 2021  e2f68.sys     Intel(R) Ethernet Adapter NDIS driver
Mar 10 2022  Netwtw10.sys  Intel Wi-Fi driver
I expect that is the newest Wi-Fi driver, but I'm not so sure about the Ethernet one.
It depends on what you use.

A BIOS update could help.
Update the Intel Management Engine Interface at the same time as the BIOS - https://rog.asus.com/motherboards/rog-strix/rog-strix-z690-a-gaming-wifi-d4-model/helpdesk_download (it's under Chipset)

I wonder if Win 11 will ever have the right build number - BUILD_VERSION: 10.0.22000.613
Maybe the next version will be closer.
 
Apr 24, 2022
Thanks for the info guys. A bit confused though. Does this mean my RAM is faulty or was Windows Defender just running at the time of the crash? I ran the Windows Memory diagnostic test and it found 0 errors with my RAM. If RTCore64.sys was found at the time of crash too, isn’t it more likely to be the real problem? After the BSOD I updated my BIOS from 1304 to 1404 and made sure all drivers were up to date. Also I use an Ethernet connection on my desktop.

If you run out of things to try, test with XMP off.
Got the proper mobo drivers?
I’ll add this to the list of things to test. After the BIOS update I noticed that all voltages to the CPU were slightly increased though, so maybe I won’t crash again?
NTOSKRNL = Windows kernel. It handles all driver requests, power management, and memory management. It sits between hardware and applications. It got blamed but it's not the cause.

I will get a friend to check the dumps.

try running this on CPU (I wonder if it knows CPU yet) - https://www.intel.com/content/www/us/en/download/15951/19792/intel-processor-diagnostic-tool.html?
I get a brand string error when running Intel Processor Diagnostic Tool.

Also unsure if this is related or not but I was using a very VERY large pagefile (nearly 40gb) for some unrelated work in 3DS Max. Could that have caused memory corruption?
 
Last edited:

Colif

Probably caused by:memory_corruption (Process running at time of crash: MsMpEng.exe)
Memory corruption is something I see on almost every BSOD, so often that I don't mention it unless people notice it and ask about it. It can be drivers, it can be the page file, it can be RAM, and there are probably a few other reasons. I don't believe it means anything specific.
I get a brand string error when running Intel Processor Diagnostic Tool.
That is the test not knowing what your CPU model is. I was expecting that - Intel needs to update their software to recognise 12000-series CPUs.
This does not mean that your processor is defective. In order to complete the test on the IPDT go Tools > Stop Testing On Fail > Off.
https://community.intel.com/t5/Processors/brand-string-fail-test/td-p/560045

So rerun the test after changing that setting and you will likely get another fail, this time about the expected frequency. Since it doesn't know what the CPU is, it doesn't really know what frequency to expect.

What make/model is your C drive?
The page file is on C, so you might want to check its health.
 
Last edited:
Apr 24, 2022
My C drive is a Samsung 870 Evo 1TB. Drive health looks okay.




Also I'm still waiting for the BSOD to happen again.
 

Colif

Did you try running the diagnostic tests in Magician while you were there?
Two of them are SMART scans, short and long.
The other two scans write data to cells to confirm their health. The short scan is probably the best place to start since it only takes 2 minutes. The long scan can take longer, depending on the size of the drive. On my 970 Evo Plus 1TB it would take 40 minutes.
 
Apr 24, 2022
I'm able to pass the short test, but the extended test fails after about 70%. I get the error "Defects have been detected from the device. Please check help." however help doesn't provide any useful information and now I see "Failing LBA" in the top right corner of Samsung Magician.

Kind of surprised that it seems to be failing. Haven't had any issues with this drive for over a year. SMART status is still in good condition too...
 
Apr 24, 2022
It might be a false positive - link
Although Samsung suggests a replacement - https://eu.community.samsung.com/t5/computers-it/bsods-and-failing-lba/td-p/4306004
Checking SATA cables was suggested in the above too.
I see. I went ahead and reseated most things in my system since the last post. Also replaced the CMOS battery because the BIOS time was constantly off and that didn't seem right. Noticed one SATA cable was not fully plugged in (not the C drive), despite the drive being visible in Windows.

I've been running into a few weird problems since my initial post though, but haven't encountered another BSOD yet. I had an issue where I'd freeze at the POST screen after a reboot. The only way to fix the problem was by shutting my PSU off entirely and then turning it back on. It seems that problem was caused by my USB headset (HyperX Cloud Revolver S) because removing the device has seemingly stopped the POST freezes. Plugging it into one of the auxiliary USB ports on my keyboard also fixes the issue. There were also times where the headset simply wouldn't be detected by the system at all after a reboot, and would need to be replugged to work. I'd also get a message in Windows about a malfunctioning USB device until unplugging it and plugging it back in. Did not have that problem with my old system, so I'm not sure what is really at fault here. Tried all motherboard USB ports, same problem across all of them. My other USB devices do not have this problem. After some googling, it looks like there are some serious USB issues with the Z690 chipset, so again, not sure what's really causing that or if it's even related to my original post.

I also ran another Prime95 test to check CPU stability. It always completes with 0 warnings or errors, but I noticed that occasionally the CPU usage on some of the cores would drop to 50 or 60%, and then return to 100% over the next minute. Is that normal? None of the workers are stopped and every single test is passed.

There's also the issue of my idle temps being lower than my ambient temps. The lowest recorded temps on some cores are as low as 13c, but that shouldn't be possible as my ambient temps have not dropped below 22c.
 
Last edited:

Colif

It seems that problem was caused by my USB headset (hyper x cloud revolver s) because removing the device has seemingly stopped the POST freezes.
I have seen others having problems with HyperX headsets.

There's also the issue of my idle temps being lower than my ambient temps. The lowest recorded temps on some cores are as low as 13c, but that shouldn't be possible as my ambient temps have not dropped below 22c.
That's impressive cooling you have there, but yes... it's unlikely it's below ambient.
What program do you use to track temps?
Perhaps the BIOS is showing wrong temps or the sensor is broken. Updating the BIOS could fix it if it's just the BIOS software showing wrong values.

I have my PC set to hibernate and it often shows temps below normal at startup. The lowest it hit today was 18c, but it wasn't much warmer than that when I started it today. CPU average temp for the day is 38C... that's been its average temp for several months though.
 
Apr 24, 2022
So I got two more BSODs yesterday. KERNEL_SECURITY_CHECK_FAILURE. Both were either immediately at boot or right after getting to the desktop, and they started after I installed Armory Crate, so that might've been an ASUS-specific problem... or more signs that my hard drive was dying. The POST freezes also came back, so I don't think it was my headset causing it. When I finally got into Windows, the SMART results on my Samsung Evo changed too.

So my drive appears to be dying... but the funny thing is, I installed Windows on a different drive and then used the ASUS Secure Erase feature on my Samsung drive, and now the POST freezes are gone. I can also run the extended SMART test and pass it with 0 errors. I haven't gotten a BSOD on my new Windows install either.



I probably should've put this in the OP but I have a Noctua NH-D15 cooler and I've been using Armory Crate, Core Temp, and HWInfo64 to check temperatures. They all report similar values, and in the BIOS the CPU temp is usually between 25-30c. Also the low values aren't from when I boot up Windows, they're from when I leave my PC idle overnight. It might've been around 15c in my house last night, but shouldn't my idle temps still be much higher than ambient temps? I don't think the sensors are broken because they actually work...the numbers just seem like they're 10-15c lower than they should be. Is there a way to recalibrate the sensors or something? I hit 86c on one core during my prime95 testing so now I wonder if that was accurate....might've really been somewhere in the 100s.
 
Last edited:

Colif

It's strange, my SMART report for my 970 Evo doesn't even show the Uncorrectable Error Count, so I can't tell what its default value should be. I wouldn't trust that SSD with anything valuable, but it might be okay as storage. It could have been the cause of all the memory errors. The page file is on C and is seen as memory by Windows.

Motherboard: ROG STRIX Z690-A GAMING WIFI D4 (latest BIOS revision 1404)
Was this you? link
If not, it happened at night there as well. I know cores can sleep, but that would just get it to maybe ambient.

It's impossible to be below ambient. I am normally about 8c over, and although I haven't tried to run the fans on my AIO in extreme mode, the lowest I can get my CPU is 32C on really cold nights. I have an AMD though, so it will never get to 13c like my last Intel CPU could.

A BIOS update might fix the values too.
 
Apr 24, 2022
Yup, that's me. I wasn't really satisfied with the answer I was given there. My temps shouldn't ever drop below ambient with an air cooler...
In my BIOS there's actually an option to change the temperature at which my CPU will thermal throttle. That must be the TJ Max setting, right? It's currently set to Auto. I wonder if changing it to 100 will "fix" the sensors.
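
Some background that may help with the TJ Max question: on Intel CPUs, monitoring tools generally do not read an absolute core temperature. The on-die digital thermal sensor reports how many degrees the core is below TjMax, and the tool subtracts that from the TjMax value it has. The sketch below only illustrates that arithmetic. The function names and the sample readout of 87 are made up, and the TjMax of 100 is an assumption taken from the value discussed here, not something read from this system. Readings that far below TjMax are also commonly said to be less accurate, which may be part of why idle values can look impossible.

```python
def core_temp_c(tjmax_c: int, dts_readout: int) -> int:
    """Core temperature as tools typically compute it: TjMax minus the sensor's distance-to-TjMax."""
    return tjmax_c - dts_readout


def plausible(temp_c: float, ambient_c: float) -> bool:
    """An air-cooled CPU cannot sit below room temperature, so flag readings that do."""
    return temp_c >= ambient_c


# Hypothetical idle readout of 87 degrees below an assumed TjMax of 100
idle = core_temp_c(tjmax_c=100, dts_readout=87)
print(idle, plausible(idle, ambient_c=22))  # prints 13 False
```

If the reported number depends on which TjMax the tool assumes, then a wrong assumption shifts every reading by the same amount, so it is worth checking what TjMax the monitoring program shows before trusting the load temperatures.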
 

Colif

TJ Max is when it will thermal throttle. I doubt changing its max to 100 would fix sensors as they wouldn't be using its value.

I'll see if anyone has any smart ideas.

I wonder which of those cores with low temps are the power-efficient ones, and whether they are the ones reading lower than the other 8. I wouldn't have expected core 1 to be one, but maybe? 5, 6, 7 could be.

The P-cores run at 3.6 GHz base, 4.9 GHz turbo.
The E-cores run at 2.7 GHz base, 3.8 GHz turbo.
https://au.pcmag.com/processors/91666/intel-core-i7-12700k

So it's unlikely you get below ambient, but if it's just running off the 4 E-cores it could get pretty low.
 
Last edited:

Colif

It could be that the last 4 temps are the E-cores and the rest get to go lower as they can sleep at night. That might explain why their max temp is lower than the other 8 cores.

The easiest way to tell would be to look at their max speeds in HWiNFO, as it should have one for each thread (all 20 of them).
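
To spell out where the 20 comes from and how the max-speed check would work, here is a rough sketch. It assumes the i7-12700K layout of 8 P-cores with Hyper-Threading and 4 single-threaded E-cores, and uses the 3.8 GHz E-core turbo quoted above as the cut-off. The function name and the 100 MHz margin are made up for illustration.

```python
P_CORES, E_CORES = 8, 4               # i7-12700K core counts
THREADS_PER_P, THREADS_PER_E = 2, 1   # only the P-cores have Hyper-Threading

total_threads = P_CORES * THREADS_PER_P + E_CORES * THREADS_PER_E
print(total_threads)  # prints 20

E_TURBO_MHZ = 3800    # E-core turbo quoted above (3.8 GHz)


def core_type(max_mhz: float, margin_mhz: float = 100) -> str:
    """Guess the core type from the highest clock seen: E-cores should never pass their turbo."""
    return "E" if max_mhz <= E_TURBO_MHZ + margin_mhz else "P"


print(core_type(4900), core_type(3790))  # prints P E
```

This only works after the system has been under load for a while, since a P-core that has idled all session would show a low maximum too.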
 
Apr 24, 2022
Yeah, the last 4 are the E cores. Also I went into my BIOS and set the max throttle temp to 100c just to test. Also realized that I forgot to enable Resizable BAR, so I did that. When I got into Windows I got another BSOD. VIDEO_SCHEDULER_INTERNAL_ERROR. Unfortunately there's no dump file because it somehow failed during creation. GPU is not overclocked and I was pretty much idle at my desktop when it happened.
 

Colif

Can your PC stop throwing problems at me?

Did you install the latest Nvidia drivers? Because that's what the problem could be. Hope it's not the GPU overclocking coming to bite you.

This might help - https://www.nvidia.com/en-us/geforce/forums/geforce-graphics-cards/5/445190/rtx-3060-video-scheduler-internal-error-bsod/?topicPage=52

Recently built another PC with some spare parts + some new ones
What was old?
How recently?

I had thought the SSD would fix the errors, now you get new ones.
 
Apr 24, 2022
Yeah, when I reinstalled Windows I installed 512.59. At this point the only old parts are my PSU and case. They are both about 7 years old now. I assumed the PSU is fine because the voltages are fine at idle and load.

Lowest I've ever seen the 12V droop to was 11.904 V.

Also I hope it's not my GPU overclocking biting me in the ass. Honestly the max overclock I've tried was +100 on the core and +1000 on the memory. I increased the power limit to 105 (max possible setting in EVGA Precision) and left everything else.
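
For reference on the 12V figure: the ATX tolerance commonly cited for the +12V rail is ±5%, i.e. 11.40 V to 12.60 V. A quick sketch of that check; the function names are made up, and software sensor readings are only approximate, so treat this as a sanity check rather than a PSU test.

```python
def rail_deviation_pct(measured_v: float, nominal_v: float = 12.0) -> float:
    """Deviation of a measured rail voltage from nominal, in percent."""
    return (measured_v - nominal_v) / nominal_v * 100


def within_tolerance(measured_v: float, nominal_v: float = 12.0,
                     tolerance_pct: float = 5.0) -> bool:
    """True if the reading sits inside the +/- tolerance band around nominal."""
    return abs(rail_deviation_pct(measured_v, nominal_v)) <= tolerance_pct


reading = 11.904  # lowest 12V value reported above
print(round(rail_deviation_pct(reading), 2), within_tolerance(reading))  # prints -0.8 True
```

A reading this close to nominal does not rule the PSU out, since slow-polling sensors will not catch brief dips under sudden load.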
 
Last edited:

Colif

So the 870 was fairly new? Samsung SSDs are pretty reliable.
At least they have a 5 year warranty on them.

A 7 year old PSU would probably need retiring soon. I know a PSU can kill an HDD but I'm not sure about an SSD.
What make/model is the PSU?

I have to go as it's late, but I will look in here later.
The Nvidia link might help as it should mention Resizable BAR in it; that was how I found it.
 
Apr 24, 2022
The Samsung drive is only 1 year old. The PSU is a Corsair TX750. https://www.corsair.com/us/en/Categories/Products/Power-Supply-Units/Enthusiast-Series™-TX750-—-80-PLUS®-Certified-Power-Supply/p/CMPSU-750TX
 
Apr 24, 2022
Decided to try HCI MemTest. My system slows to a crawl and BSODs less than a minute into the test. Tried to run it three times and got KERNEL_SECURITY_CHECK_FAILURE every single time. Running MemTest86 now to see if it gives me any errors.

EDIT: So I ended up canceling MemTest86. Turns out I configured HCI MemTest wrong. It was constantly writing to my disk, which led to 100% disk usage... which means the crashes were caused by the SSD I installed Windows on. I installed Windows to a third drive and didn’t crash. But seriously. Two of my drives are suddenly dying? This seems really strange. Is it possible that there’s a short somewhere on my motherboard that’s destroying everything connected to it? Or maybe the SATA controller is buggy? Or I guess it could be a driver issue... but if that was the case SMART wouldn’t show me critical errors. I honestly have no idea what’s going on anymore.
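
On the HCI MemTest setup: the exact misconfiguration isn't stated, but a common cause of constant disk activity is asking the instances to test more RAM than is actually free, which pushes Windows into the page file. The usual approach is one instance per CPU thread with the free RAM split between them. A rough sketch of that arithmetic, assuming a hypothetical 12000 MB free and a 1024 MB safety reserve (both numbers are made up, as is the function name):

```python
def per_instance_mb(free_ram_mb: int, instances: int, reserve_mb: int = 1024) -> int:
    """MB of RAM to give each test instance so the total stays below free RAM."""
    if instances < 1:
        raise ValueError("need at least one instance")
    usable = max(free_ram_mb - reserve_mb, 0)  # leave headroom for Windows itself
    return usable // instances


# Hypothetical: 12000 MB free, one instance per thread on a 20-thread CPU
print(per_instance_mb(12000, 20))  # prints 548
```

If the disk stays quiet with the smaller allocation and the BSODs stop, that would suggest the crashes came from heavy paging on that drive rather than from the RAM.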
 
Last edited:
