[SOLVED] Windows 10 Clock Watchdog Timeout / WHEA Uncorrectable error

Jan 4, 2019
Hi all,

I've recently been getting random BSOD errors on my build, and I also seem to be able to force them by running the 3DMark benchmark.

My system is made up of:
Gigabyte B250-HD3P MB
Intel i7 7700K CPU
16GB (2x8) Corsair Vengeance 2400 DDR4 RAM
Zotac GTX 1080 Ti Mini GPU
Crucial MX500 1TB SSD
8TB HDD
Blu Ray Drive
Corsair TX750M PSU

I have installed all of the drivers provided by Gigabyte, Nvidia etc. and have also tried a fresh install of Windows 10.

The Intel CPU diagnostic tool came up with no issues, and Prime95 has been run for a few hours, also with no issues.

4 passes of MemTest86 came up with no RAM issues.

While benchmarking I've been checking Temps on MB, CPU, GPU and they are all ok under load.

Also, my BIOS is the latest build and is at stock settings and I haven't overclocked anything.

As mentioned, occasionally I get clock watchdog BSOD errors and occasionally WHEA Uncorrectable ones.

Here is the output from the dump files:

System Information (local)
--------------------------------------------------------------------------------

Computer name: DESKTOP-CNKFICM
Windows version: Windows 10 , 10.0, build: 17763
Windows dir: C:\Windows
Hardware: B250-HD3P, Gigabyte Technology Co., Ltd., B250-HD3P-CF
CPU: GenuineIntel Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz Intel586, level: 6
8 logical processors, active mask: 255
RAM: 17132072960 bytes (16.0GB)




--------------------------------------------------------------------------------
Crash Dump Analysis
--------------------------------------------------------------------------------

Crash dumps are enabled on your computer.

Crash dump directories:
C:\Windows
C:\Windows\Minidump

On Fri 04/01/2019 00:08:55 your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\010419-6203-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x1B1B40)
Bugcheck code: 0x101 (0x18, 0x0, 0xFFFFDD80AB800180, 0x4)
Error: CLOCK_WATCHDOG_TIMEOUT
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval. This can be caused by non-responding hardware or by a overheated CPU (thermal issue).
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.



On Fri 04/01/2019 00:08:55 your computer crashed or a problem was reported
crash dump file: C:\Windows\MEMORY.DMP
This was probably caused by the following module: intelppm.sys (intelppm+0x136f)
Bugcheck code: 0x101 (0x18, 0x0, 0xFFFFDD80AB800180, 0x4)
Error: CLOCK_WATCHDOG_TIMEOUT
file path: C:\Windows\system32\drivers\intelppm.sys
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Processor Device Driver
Bug check description: This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval. This can be caused by non-responding hardware or by a overheated CPU (thermal issue).
The crash took place in a Microsoft module. The description of the module may give a hint about a non responding device in the system.



On Thu 03/01/2019 23:31:33 your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\010319-6125-01.dmp
This was probably caused by the following module: hal.dll (hal+0x42667)
Bugcheck code: 0x124 (0x0, 0xFFFFC605A12E8028, 0xBE000000, 0x800400)
Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL
Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
This is likely to be caused by a hardware problem.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.



On Thu 03/01/2019 23:27:53 your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\010319-5875-01.dmp
This was probably caused by the following module: hal.dll (hal+0x42667)
Bugcheck code: 0x124 (0x0, 0xFFFF8B894DA0A028, 0xBE000000, 0x800400)
Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL
Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
This is likely to be caused by a hardware problem.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.



On Thu 03/01/2019 23:16:57 your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\010319-6453-01.dmp
This was probably caused by the following module: hal.dll (hal+0x42667)
Bugcheck code: 0x124 (0x0, 0xFFFF8F899F8E6028, 0xBE000000, 0x800400)
Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL
Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
This is likely to be caused by a hardware problem.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.



On Thu 03/01/2019 23:02:41 your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\010319-5843-01.dmp
This was probably caused by the following module: hal.dll (hal+0x42667)
Bugcheck code: 0x124 (0x0, 0xFFFFD90565B08028, 0xBE000000, 0x800400)
Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL
Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
This is likely to be caused by a hardware problem.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.

Any help in identifying the culprit would be fantastic.
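
For anyone skimming a long WhoCrashed-style report like the one above, a small script can tally which bug checks dominate. This is only a rough sketch: the regular expressions assume the "Bugcheck code:" and "Error:" line layout shown in this post, and `tally_bugchecks` is a hypothetical helper name, not part of any tool.

```python
import re
from collections import Counter

# Assumed layout: "Bugcheck code: 0x124 (...)" followed later by "Error: NAME",
# as in the report pasted in this thread.
BUGCHECK_RE = re.compile(r"^Bugcheck code:\s*(0x[0-9A-Fa-f]+)", re.MULTILINE)
ERROR_RE = re.compile(r"^Error:\s*(\S+)", re.MULTILINE)


def tally_bugchecks(report: str) -> Counter:
    """Count how often each (code, name) pair appears in the report text."""
    codes = BUGCHECK_RE.findall(report)
    names = ERROR_RE.findall(report)
    # Pair the n-th code with the n-th error name; this relies on every
    # crash entry in the report carrying both lines.
    return Counter(zip(codes, names))


# Three entries copied from the report above.
sample = """\
Bugcheck code: 0x101 (0x18, 0x0, 0xFFFFDD80AB800180, 0x4)
Error: CLOCK_WATCHDOG_TIMEOUT
Bugcheck code: 0x124 (0x0, 0xFFFFC605A12E8028, 0xBE000000, 0x800400)
Error: WHEA_UNCORRECTABLE_ERROR
Bugcheck code: 0x124 (0x0, 0xFFFF8B894DA0A028, 0xBE000000, 0x800400)
Error: WHEA_UNCORRECTABLE_ERROR
"""

for (code, name), count in tally_bugchecks(sample).most_common():
    print(code, name, count)
```

In the full report the same MEMORY.DMP and minidump can describe one crash, so the counts are per dump file rather than per crash.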
 
Can you go to C:\Windows\Minidump,
copy the minidump files to Documents,
then upload the copies from Documents to a file sharing web site and share the link here? I will get someone to convert the files into a format I can read.

We can get more out of the dump files than what WhoCrashed provides.
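
If copying the dumps by hand is awkward, the copy step can be sketched in Python. This is a minimal sketch, not a required procedure: `copy_minidumps` and the `minidumps` subfolder name are made up for illustration, and reading C:\Windows\Minidump will most likely need an elevated (administrator) prompt.

```python
import shutil
from pathlib import Path


def copy_minidumps(src: Path, dst: Path) -> list[Path]:
    """Copy every *.dmp file from src into dst and return the new paths."""
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for dump in sorted(src.glob("*.dmp")):
        target = dst / dump.name
        shutil.copy2(dump, target)  # copy2 also keeps the file timestamps
        copied.append(target)
    return copied


if __name__ == "__main__":
    source = Path(r"C:\Windows\Minidump")
    # Hypothetical destination folder; any folder you can upload from works.
    destination = Path.home() / "Documents" / "minidumps"
    if source.is_dir():
        for path in copy_minidumps(source, destination):
            print("copied", path)
    else:
        print("no minidump folder found at", source)
```

Uploading copies rather than the originals leaves the files in C:\Windows\Minidump untouched.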
 
Okay, someone called gardenman will reply to this, but it may not be for a few hours. He will post a conversion of the dumps that I can use to look at the PC. There are others who can read them, so perhaps they will beat him to answering your question.

Do you have any overclocking software on the PC, anything like Intel Extreme Tuning Utility? Even if you are not running it, it may still be the cause.
Both errors are called by the CPU but not necessarily caused by it. Your tests seem to show it's not the CPU.
WHEA errors can be caused by any hardware, which is what makes them a pain to work out...
They can be caused by heat, so clean the PC fans and heatsinks.
 


Ok great thank you,

I have MSI Afterburner, which I was using to keep an eye on temperatures, usage etc. (I've yet to find a simple app that displays temp and usage of everything in one place). I will remove it now and run 3DMark, although I only started using Afterburner after the issues began, so I suspect the cause is elsewhere.

After the clean install I have re installed very little to avoid any additional variables.

The graphics card is new; I was using integrated graphics beforehand. I did have some bugcheck issues in the past, which disappeared after a clean install, so I don't think it's definite that the GPU is the culprit.

The Noctua case fans, Noctua NH9 CPU fan and GPU fans are all running well, the rackmount case is actually slightly too small as the new GPU sticks out too much so until my new case arrives the lid is off and the rack itself is in a very open area.

Under 100% load the GPU settles at around 72 degrees and the CPU at around 60 degrees. At rest the GPU is at 23 degrees and the CPU at 28 degrees.

From what I've read these temperatures are ok but I'm open to ideas.
 


No problem, I've removed it, restarted etc. and the same things are still happening but one less culprit anyway.

Thanks
 
you have a hung cpu core. you need to change the memory dump type to kernel and provide a kernel dump: c:\windows\memory.dmp

remove overclock driver or it will be blamed for the problem.
C:\Program Files (x86)\MSI Afterburner\RTCore64.sys Fri Sep 30 05:03:17 2016
your cpu came out in Q1'17; the overclock driver dates from 2016, so it does not know about your cpu. (remove it)

could not read the bios info in the memory dump. check to see if the bios version needs to be updated.
 


Thanks for the heads up.

I changed it to Kernel, crashed and have included the new dump file below:

https://drive.google.com/open?id=1AXYErScyRCWBr3W61k2l6g6AUI6uQG-S

MSI Afterburner is completely removed and the BIOS is vF10d, which is the latest one.

I have also now juggled the RAM modules & slots and no combination stopped the crashing, so I'm fairly confident they are not at fault.

Also, I removed the graphics card and ran the same test a few times using integrated graphics, and it didn't crash once. To go some way towards eliminating the motherboard, I then tried the same graphics card in a different PCI slot, which also caused crashing.

Unfortunately I don't have a spare graphics card to test in the same MB, but I'm thinking that there is a pretty high chance that the graphics card is the culprit.

let me know what you think.

Thanks
 
The plot thickens ...

I picked up a new replacement GTX 1080 Ti, did a clean install of the Nvidia drivers etc., and the crashing still continues.

I also thought it would be worth overwriting the BIOS firmware in case it was corrupted, but again same thing.

Does the fact that it doesn't happen when I remove the card and just use inbuilt graphics rule out the CPU and RAM?

I'm tempted to try a different motherboard.

Thanks
 
dump not shared for public access. asking for google password



 


Apologies,

This one should work:

https://drive.google.com/file/d/1AXYErScyRCWBr3W61k2l6g6AUI6uQG-S/view?usp=sharing

Thanks
 
I don't see any overclock drivers that would cause this.
my best guess is an overheated cpu but I cannot tell. (guessing since the error type was Micro-Architectural Error and the system uptime was 11 minutes)
debugger commands are not working correctly
1: kd> !tzinfo fffff80538221180
Could not read THERMAL_INFO at fffff80538221180


-------------
cpu called the error after 11 minutes

Error Type : Micro-Architectural Error
Severity : Fatal

Error : Internal timer (Proc 1 Bank 4)
Status : 0xbe00000000800400
Address : 0x0000000140055c5d
Misc. : 0x0000000140055c5d

machine info:
BIOS Version F10d
BIOS Release Date 03/09/2018
Manufacturer Gigabyte Technology Co., Ltd.
Product B250-HD3P-CF
Processor Version Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Processor Voltage 8ch - 1.2V
External Clock 100MHz
Max Speed 8300MHz
Current Speed 4400MHz

1: kd> !sysinfo cpumicrocode
Initial Microcode Version: 00000000:00000000
Cached Microcode Version: 00000084:00000000
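
For readers wondering how the Status value above relates to the bug check parameters in the first post: for bug check 0x124, parameters 3 and 4 carry the high and low 32 bits of the machine-check status register, so 0xBE000000 and 0x800400 combine into 0xbe00000000800400. The sketch below decodes the architectural flag bits using the bit positions from Intel's machine-check documentation as I understand them; `combine_status` and `decode_status` are hypothetical helper names, and this is an illustration rather than what the debugger actually runs.

```python
# Architectural flag bits of an IA32_MCi_STATUS register (assumed positions).
FLAG_BITS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MISC register is valid
    "ADDRV": 58,  # ADDR register is valid
    "PCC": 57,    # processor context may be corrupt
}


def combine_status(high32: int, low32: int) -> int:
    """Join bug check 0x124 parameters 3 and 4 into one 64-bit status."""
    return (high32 << 32) | low32


def decode_status(status: int) -> dict:
    """Split a 64-bit status value into flags and error-code fields."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


status = combine_status(0xBE000000, 0x00800400)
decoded = decode_status(status)
print(hex(status))
print({name: on for name, on in decoded["flags"].items() if on})
print(hex(decoded["mca_error_code"]))
```

The low 16 bits come out as 0x400, the code the debugger reports above as "Internal timer", and the UC and PCC flags being set is consistent with the "Fatal" severity.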
 
Thanks for the feedback.

Strange thing is Prime 95 is running for hours without crashing and cpu temperature is settling at around 65 deg at 100% load.

It only seems to be graphical stuff that’s triggering it (3D Mark, VR Mark etc.)

It’s fine when I remove the graphics card and use internal graphics but I know the card is ok.

I’m thinking about trying a better Z270 motherboard as I was planning on upgrading that anyway.

If it’s still happening after that I guess it has to be CPU.
 
these errors can be caused by graphics cards that pull too much power from the motherboard PCIe slot.
it can cause power fluctuations to the cpu. you might underclock the GPU to see if it helps.
(also check the power supply and make sure you are getting proper power to the GPU)

if the gpu pulls too much power then generally the motherboard logic resets the cpu and you get a different problem.
there is a power window where it can pull just enough to cause the power to the cpu to dip but not enough for the motherboard logic to trigger a reset of the cpu. I mostly notice this on factory overclocked gpu cards. if you have one, try running reference speeds for the card. if you already have stock speeds then try to underclock.



 
Solution


Thanks for the feedback, in the end I sussed it late last night and it is along the same lines as that.

Although the BIOS was set to 42 x 100 MHz (stock clock speed for the 7700K CPU), I noticed that in Windows it was running at 4500 MHz with no overclocking software installed. I even turned off Intel Turbo Boost but it was still running at 4.5 GHz.

I lowered the BIOS to 40 x 100, which made it run at 4200 MHz in Windows, and that did the trick. I guess it was running too quick and didn't have enough voltage.
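
To make the numbers above easier to compare, here is a tiny sketch of the usual relationship, core clock = multiplier x base clock (BCLK). The function names are made up for illustration, and the sketch only restates the figures from this post; it does not explain why Windows reported a higher speed than the BIOS setting implies.

```python
def core_clock_mhz(multiplier: int, bclk_mhz: float = 100.0) -> float:
    """Nominal core clock implied by a BIOS multiplier and base clock."""
    return multiplier * bclk_mhz


def excess_mhz(observed_mhz: float, multiplier: int, bclk_mhz: float = 100.0) -> float:
    """How far the speed seen in Windows sits above the BIOS setting."""
    return observed_mhz - core_clock_mhz(multiplier, bclk_mhz)


# Figures reported in this post: BIOS setting versus speed seen in Windows.
print(core_clock_mhz(42))    # BIOS set to 42 x 100
print(excess_mhz(4500, 42))  # Windows showed 4500 MHz
print(excess_mhz(4200, 40))  # after lowering the BIOS to 40 x 100
```

In both cases the observed speed is above what the multiplier alone would give, which is the mismatch described above.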

Not 100% sure on the cause, maybe a small fault in the MB or a dodgy chipset driver but I'm replacing the MB now anyway for something more fitting, I'm just glad my CPU seems to be ok.

Thanks to everyone for all the help it's much appreciated.
 
I thought I would post an update to help anyone with a similar issue in the future.

I rebuilt the machine with a new motherboard (Asus Z270-WS) and, interestingly, the same thing was happening: the CPU was running at 4500 MHz at idle, not 4200 (with no overclocking software). The voltage was also at 1.088 V, and I read that standard should be 1.2 - 1.25 V.

I was surprised this was happening as the MB was at stock settings but after manually overriding the clock speed to 4200 and voltage to 1.25 it is working faultlessly.