Question Thousands of "Event ID 17, WHEA-Logger" warnings in Event Viewer ?

ashala2202

Distinguished
Dec 20, 2016
27
2
18,535
Hello,

I've been having huge issues with Service Host: Windows Event Log taking up all my resources (CPU and disk usage) and paralyzing my PC every half a minute or so. After checking the Event Viewer, I found thousands of warnings (yellow exclamation marks) of "Event ID 17, WHEA-Logger". They seem to appear every second and the logger logs them all the time.

I have an Asrock X58 Extreme motherboard, an Intel i7-920 and 6GB of Corsair RAM. I recently swapped my faulty old Ati Radeon 5870 for a GTX 960, and that's when this started happening. I ran all the GPU tests and tested the card in multiple games and everything seems normal, but in Windows the PC is suffocated by the event logger, which cranks up my CPU and disk usage and outright paralyzes the system all the time.

Any ideas?

A corrected hardware error has occurred.

Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)

Primary Bus: Device:Function: 0x0:0x0:0x0
Secondary Bus: Device:Function: 0x0:0x0:0x0
Primary Device Name: PCI\VEN_8086&DEV_3405&SUBSYS_34051849&REV_13
Secondary Device Name:


EDIT:
After acquiring the new GPU, I ran the OCCT standard 3D test for more than 30 minutes and no errors were detected. Now when I run it, it starts returning WHEA errors, millions of them.

NOTE: Windows ran multiple updates yesterday, maybe they have something to do with it?
 
You can try these troubleshooting steps. First, check Windows Update to make sure all the updates were installed. Second:
1. Press Ctrl+Alt+Del, select Task Manager.

2. From the File menu, select Run new task.

3. Put a check mark next to Create this task with administrative privileges and tap or click OK.

4. Type cmd and hit Enter.

5. Type the following commands in the same order in the Command Prompt window:

a. Type sfc /scannow and press Enter.

(This command scans and verifies the integrity of all protected system files and replaces incorrect versions with correct versions).

b. Type Dism /Online /Cleanup-Image /ScanHealth and press Enter.

c. Type Dism /Online /Cleanup-Image /CheckHealth and press Enter.

d. Type Dism /Online /Cleanup-Image /RestoreHealth and press Enter.

(These commands fix any files that System File Checker can't repair and Windows corruption errors).

e. Close the command window.

f. Check to see if the issue is resolved. If not, restart the system.
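
The command sequence above could also be scripted. What follows is only a minimal Python sketch, not part of the original advice: the names `REPAIR_COMMANDS` and `run_repairs` are made up for illustration, and actually running the commands assumes an elevated (administrator) prompt on Windows. The commands themselves are the four listed in the steps.

```python
import subprocess

# The four commands from the steps above, in the same order.
REPAIR_COMMANDS = [
    ["sfc", "/scannow"],
    ["Dism", "/Online", "/Cleanup-Image", "/ScanHealth"],
    ["Dism", "/Online", "/Cleanup-Image", "/CheckHealth"],
    ["Dism", "/Online", "/Cleanup-Image", "/RestoreHealth"],
]


def run_repairs(commands=REPAIR_COMMANDS, runner=subprocess.run):
    """Run each command in order and return (command line, exit code) pairs.

    `runner` is injectable so the sequence can be exercised without
    touching the system; by default it is subprocess.run.
    """
    results = []
    for argv in commands:
        completed = runner(argv)
        results.append((" ".join(argv), completed.returncode))
    return results


# Dry run: only show what would be executed, in order.
for argv in REPAIR_COMMANDS:
    print(" ".join(argv))
```

Calling `run_repairs()` with no arguments from an elevated prompt would run the real commands one after another. A restart afterwards is still worth doing, as in step f.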
 

ubuysa

Distinguished
In addition to the above, can you please also download and run the SysnativeBSODCollectionApp and upload the resulting zip file to a cloud service with a link to it here. The SysnativeBSODCollectionApp collects all the troubleshooting data we're likely to need. It DOES NOT collect any personally identifying data. It's used by several highly respected Windows help forums (including this one). I'm a senior BSOD analyst on the Sysnative forum where this tool came from, so I know it to be safe.

You can of course look at what's in the zip file before you upload it, most of the files are txt files. Please don't change or delete anything though. If you want a description of what each file contains you'll find that here.
 

ashala2202

I ran all the commands and it did find some corrupted files and replaced them. However, the issue remained and the warnings spammed just the same until I rebooted the system. For now, it seems to have worked! I'm not getting any Event ID 17, WHEA-Loggers in the Event Viewer and ran OCCT again which returned no errors whatsoever.

Thank you for your help!

As for SysnativeBSODCollectionApp, I ran it but it took ages, so I decided to abort for now since I don't have enough time to wait for it. Maybe I'll run it again later.

I'll keep an eye on the system and get back to you if the problem reemerges.

 

ashala2202

Update
I've just found some WHEA-Loggers in the Event Viewer again, but not too many. In fact, only 2 warnings in a span of 2 hours. Have no idea what it could mean now. Before, there were millions of them. The system is working fine. I'll rerun the Sysnative Collection App when I get the chance these days.
 

ubuysa

Thanks for the upload. There are two dumps in there and they are identical. Both happened because your graphics driver is spinning in a continuous loop.

In your case this appears to be because you have completely the wrong graphics driver installed! Your system is Intel based with an Nvidia GTX 960 graphics card, yet the graphics driver that is failing is atikmdag.sys, which is the driver for an AMD Radeon graphics card. You can see this clearly in the function calls leading up to the bugcheck (read this from the bottom up)...
Code:
2: kd> k
 # Child-SP          RetAddr               Call Site
00 ffffaa85`c1d47f58 fffff800`2883540d     nt!KeBugCheckEx
01 ffffaa85`c1d47f60 fffff800`288354ee     dxgkrnl!TdrTimedOperationBugcheckOnTimeout+0x45
02 ffffaa85`c1d47fd0 fffff800`2efe2043     dxgkrnl!TdrTimedOperationDelay+0xce
03 ffffaa85`c1d48010 ffff9a0f`a4a74000     atikmdag+0x42043
04 ffffaa85`c1d48018 ffffaa85`c1d48131     0xffff9a0f`a4a74000
05 ffffaa85`c1d48020 ffffaa85`c1d48100     0xffffaa85`c1d48131
06 ffffaa85`c1d48028 fffff800`2f030514     0xffffaa85`c1d48100
07 ffffaa85`c1d48030 00000000`013105f0     atikmdag+0x90514
08 ffffaa85`c1d48038 00000000`00000028     0x13105f0
09 ffffaa85`c1d48040 fffff800`2efe2017     0x28
0a ffffaa85`c1d48048 00000000`00000101     atikmdag+0x42017
0b ffffaa85`c1d48050 00000000`0000007f     0x101
0c ffffaa85`c1d48058 00000000`00fd5e87     0x7f
0d ffffaa85`c1d48060 ffffb18f`82b5a9fb     0xfd5e87
0e ffffaa85`c1d48068 00000000`00000000     0xffffb18f`82b5a9fb
You can clearly see atikmdag.sys called three times before the Windows DirectX kernel (dxgkrnl.sys) realises that the graphics operation has stalled (in the dxgkrnl!TdrTimedOperationDelay+0xce function call) and eventually it times out and we get the BSOD.
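
Counting modules in a stack like the one above can be sketched in a few lines of Python. This is an illustrative sketch only, not a debugger feature: the regular expression and the helper names `module_of` and `count_modules` are assumptions, and the sample frames are copied from the output above.

```python
import re
from collections import Counter

# One frame of `k` output: frame number, Child-SP, RetAddr, call site.
FRAME_RE = re.compile(r"^([0-9a-f]{2})\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def module_of(call_site):
    """Return the module part of a call site, or None for a raw address."""
    if call_site.startswith("0x"):
        return None
    return re.split(r"[!+]", call_site, maxsplit=1)[0]


def count_modules(stack_text):
    """Count how often each module appears in the call-site column."""
    counts = Counter()
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            module = module_of(match.group(4))
            if module:
                counts[module] += 1
    return counts


# A subset of the frames shown above.
SAMPLE = """\
00 ffffaa85`c1d47f58 fffff800`2883540d     nt!KeBugCheckEx
01 ffffaa85`c1d47f60 fffff800`288354ee     dxgkrnl!TdrTimedOperationBugcheckOnTimeout+0x45
02 ffffaa85`c1d47fd0 fffff800`2efe2043     dxgkrnl!TdrTimedOperationDelay+0xce
03 ffffaa85`c1d48010 ffff9a0f`a4a74000     atikmdag+0x42043
04 ffffaa85`c1d48018 ffffaa85`c1d48131     0xffff9a0f`a4a74000
07 ffffaa85`c1d48030 00000000`013105f0     atikmdag+0x90514
0a ffffaa85`c1d48048 00000000`00000101     atikmdag+0x42017
"""

for name, hits in count_modules(SAMPLE).items():
    print(name, hits)
```

Frames that are only raw addresses are skipped, so what remains is the list of drivers actually named in the stack.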

It's clear that your old graphics card must have been an AMD Radeon and you've swapped it for an Nvidia GTX 960 - without removing the AMD graphics driver first. In this situation here's what I would suggest...
  1. Download DDU.
  2. Run DDU and remove the Nvidia driver for the GTX 960 card that is installed (this is a precautionary measure).
  3. Remove the GTX 960 card and replace the old AMD Radeon card.
  4. Run DDU again and remove the AMD driver for the Radeon card.
  5. Remove the AMD Radeon card and install the Nvidia GTX 960.
  6. Install the latest Nvidia driver for the GTX 960.
That's what you should have done when changing graphics cards in this way.
 

ashala2202

I did that. Before removing the old GPU, I used DDU to completely remove all the utilities and the driver, then swapped the hardware and installed the latest Nvidia driver. All in that order.

Anyway, I don't have the old GPU at hand anymore. Is it really necessary to plug it back in?

I've never got the BSOD with the new GPU. My old GPU was faulty and it did cause BSODs from time to time. Maybe these are old dumps?
 

ubuysa

You're right, they are from mid-June, that's my mistake. For the future, when you make hardware changes that cause problems, it would help to state in your post the date the change was made.

The WHEA-Logger events that you complain of are all for the same device, PCI\VEN_8086&DEV_3405; the Intel X58 chipset I/O Hub to ESI Port. According to this, the I/O Hub device interfaces the CPU to the PCIe lanes. Any driver for that device will be part of your chipset package. When I look at the downloads for your motherboard it seems that Asrock only have drivers available for Windows 7, which always raises the question of whether there are Windows 10 drivers available and even whether this board is truly Windows 10 compatible.
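
For anyone wanting to pull the fields out of a hardware ID like that one, here is a small Python sketch. It is an illustration only: the regular expression and the function name `parse_pci_id` are assumptions, not a Windows API. The one lookup it makes is that vendor ID 8086 is Intel.

```python
import re

# Matches IDs such as PCI\VEN_8086&DEV_3405&SUBSYS_34051849&REV_13.
# The SUBSYS and REV parts are optional, so the short form also parses.
HWID_RE = re.compile(
    r"^PCI\\VEN_(?P<vendor>[0-9A-Fa-f]{4})"
    r"&DEV_(?P<device>[0-9A-Fa-f]{4})"
    r"(?:&SUBSYS_(?P<subsys>[0-9A-Fa-f]{8}))?"
    r"(?:&REV_(?P<rev>[0-9A-Fa-f]{2}))?$"
)

# Deliberately tiny lookup table; 8086 is Intel's PCI vendor ID.
KNOWN_VENDORS = {"8086": "Intel"}


def parse_pci_id(hwid):
    """Split a PCI hardware ID string into its hexadecimal fields."""
    match = HWID_RE.match(hwid)
    if match is None:
        raise ValueError(f"not a PCI hardware ID: {hwid!r}")
    fields = {k: v.upper() for k, v in match.groupdict().items() if v}
    fields["vendor_name"] = KNOWN_VENDORS.get(fields["vendor"], "unknown")
    return fields


print(parse_pci_id(r"PCI\VEN_8086&DEV_3405&SUBSYS_34051849&REV_13"))
```

The vendor and device fields are the part to search for when trying to work out which chipset package covers the device.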

It's possible of course that the GTX 960 requires services from the PCIe lanes that the AMD card did not and that's why you're seeing these errors now. The only way to check/confirm that would be to borrow a known good Radeon 5870 and see whether running that card eliminates these errors.

I would suggest that you first download and run the Intel Driver & Support Assistant and see whether that finds any new drivers - install any it finds.

If that doesn't help then download and run the Intel Chipset Installation Utility (now called the Chipset INF Utility) and see whether that can help. It may not be valid for your chipset, however; it will tell you whether it is.

The only other suggestion I can make, since you've been inside the case swapping graphics cards, is that you may have disturbed something else. Check all PCIe cards, remove and reseat them. Check all cables and connectors, at the PSU too, to be sure they are fully home.
 

ashala2202

Thanks for that post! I'll remember to state the dates more accurately next time.

I noticed the error was connected to the chipset and yes, since it's the old Mobo, there are no new drivers whatsoever. However, I've been running Win10 since it was launched with no real issues. It is an old PC, so it may get difficult to discern between the problems cropping up, but generally, I'd say it ran fine throughout the years.

Borrowing the same card is really not a possibility for me at this moment.

Repairing system files seems to have eliminated the WHEA-Logger warnings, which haven't appeared for two days now. There was just that one reappearance, but they never came back. I hope it stays that way, but I'm not really convinced.

As far as cable connections go, yes, I'm aware of that so I always double check all the cables when making some changes. I've reinserted the RAM and the CPU lately, so there's no trouble there.

The Intel Driver & Support Assistant hasn't found any drivers.

The Intel Chipset Installation Utility said that the platform was not supported.
 

ubuysa

"WHEA logging in small amounts is a normal function and needs no action taken."
Can you provide some evidence for that statement? The importance or otherwise of WHEA logging depends on what is being logged, the severity of the logs, and how often the error is repeated. A WHEA error for a fatal failure really shouldn't be ignored at all. Even repeated corrected failures are indicative that something isn't behaving as expected.
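
To make "what is being logged, the severity, and how often" concrete, here is a rough Python sketch of that kind of tally. It is an assumption-laden illustration: the `summarize` helper and the tuple layout are invented here, and the sample data simply restates the two corrected PCI Express Root Port errors in two hours reported earlier in the thread.

```python
from collections import Counter


def summarize(events, window_hours):
    """Summarize WHEA events seen within a time window.

    `events` is a list of (severity, component) tuples, for example
    ("corrected", "PCI Express Root Port").
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return {
        "total": len(events),
        "per_hour": len(events) / window_hours,
        "by_severity": dict(Counter(sev for sev, _ in events)),
        "by_component": dict(Counter(comp for _, comp in events)),
    }


# Two corrected errors in two hours, as reported earlier in the thread.
quiet = [("corrected", "PCI Express Root Port")] * 2
print(summarize(quiet, window_hours=2))
```

A summary like this does not decide anything by itself; it only puts the three things that matter side by side so a rising rate or a new severity stands out.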
 

ashala2202

Just to note: they did reappear multiple times during a day and always in pairs (two warnings with the same timestamp). Something I do must trigger them, but I can't see what. System is working fine as far as I can see.


EDIT: Millions of WHEA-Logger warnings after XCOM 2 froze on me. That's the first time something like that happened.
 

ashala2202

Couldn't boot Win10 this morning; it got stuck on the loading screen, so I ran the startup recovery and it booted after attempting the fix. The system is running very slowly and I have no idea what's slowing it down: I've checked for viruses, run optimization tools, disabled startup items, defragmented the drive and so on. It's as slow as a snail, and recently games started freezing, followed by the WHEA logs. Now this has cropped up:

Distributed COM errors:
DCOM got error "1053" attempting to start the service WaaSMedicSvc with arguments "Unavailable" in order to run the server:
{9EA82395-E31B-41CA-8DF7-EC1CEE7194DF}

and

Service Control Manager errors:
The Windows Update Medic Service service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
 

ubuysa

Hmmmm, that's sounding more like a real hardware issue now. The Windows Update Medic Service is a service designed to keep Windows Update running well, that's going to be a symptom not a cause.

I note that your C: drive has limited free space (25GB) and that's approaching the limit of what Windows needs to function. Your single drive is a 1TB SATA HDD too, so file fragmentation will be having an impact on the apparent speed of your system.

I would move as much non-system data off the C partition on to the D partition as you can. The only things you want on the C partition are Windows and apps. It's also worth running the Windows defragger (if you can in its current state). Buying a SATA SSD will make a huge difference to system performance.
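
If you want to keep an eye on free space while moving data around, a quick check can be sketched in Python. This is a minimal sketch under assumptions: the function names are made up, and the 25GB default threshold just reuses the figure mentioned above rather than any official Windows minimum.

```python
import shutil


def free_space_gb(path):
    """Return the free space, in GiB, of the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024**3


def is_low_on_space(path, min_free_gb=25.0):
    """True if the volume has less free space than `min_free_gb`.

    The default threshold is an assumption for illustration only.
    """
    return free_space_gb(path) < min_free_gb


# "." checks the current volume; on the PC in question use "C:\\" instead.
print(f"{free_space_gb('.'):.1f} GB free, low: {is_low_on_space('.')}")
```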

It might be worth trying to clean boot Windows with as many services and startup apps disabled as you can live with. If you can, start by disabling all third-party drivers and apps (be warned that a lot of things won't work properly if you do this), then gradually enable services in groups, and finally startup apps, and see whether you can find services or apps that trigger the problems.
 

ashala2202

So, this has become a true nightmare. Windows just won't boot after the PC has been shut down. It does after a restart, but booting takes way longer than normal. When it decides to boot, the logo just disappears at one point, as if the resolution has changed, the screen darkens and the little circle keeps spinning until Windows eventually starts, slowly.

At first it was possible to boot up after a recovery disk fix (about an hour every time) and in the end even that wasn't possible.

When it failed to boot, I loaded safe mode and checked the Event Viewer, which had multiple DistributedCOM and Service Control Manager errors, not all the same.

The NcbService service terminated with the following error:
A device attached to the system is not functioning.

This is just one of those I found.

I then ran DDU from safe mode and uninstalled the driver, booted into Windows very slowly and installed the latest driver again. Almost immediately after it had been installed, the PC froze on the desktop (idle). Could it be a driver problem? I've read elsewhere that people have reported the same issues with recent Nvidia drivers. I restarted and now it's working like it has been before: very slowly and unpredictably. I guess I won't be able to boot again if I shut down.

Loads of WHEAs cropped up again.

This is getting really frustrating and it looks as if it's worsening. Don't know what the culprit is.

I ran all the DISMs, defragged, freed up space on C. There's 31GB now.

My GPU runs all the tests fine and usually I'm able to play games and all, until the WHEA decides to interrupt and freezes the game.


EDIT: Now this appeared:

The Peer Name Resolution Protocol service terminated with the following error:
Unable to access a key.

and this:

The Peer Name Resolution Protocol cloud did not start because the creation of the default identity failed with error code: 0x80630203.
 

ubuysa

I don't think this is Windows, it looks much more like a hardware issue now. Something I've just realised is that you seem to be using mismatched RAM...

[Image: fSts0ft.jpg]


Also 6GB is just about at the limit of what Windows 10 will run on, so there is no possibility of removing sticks to test. You should instead run Memtest86 on your RAM...
  1. Download Memtest86 (free), use the imageUSB.exe tool extracted from the download to make a bootable USB drive containing Memtest86 (1GB is plenty big enough). Do this on a different PC if you can, because you can't fully trust yours at the moment.
  2. Then boot that USB drive on your PC, Memtest86 will start running as soon as it boots.
  3. If no errors have been found after the four iterations of the 13 different tests that the free version does, then restart Memtest86 and do another four iterations. Even a single bit error is a failure.
If this were mine I wouldn't muck about with 6GB in three 2GB sticks. If money is tight then buy a single 8GB stick at least - but ensure it's fully compatible with your motherboard (check the QVL). Two matched 8GB sticks in a kit would be better.

You might also try downloading a Linux distro on a different PC (Linux Mint is a good choice). These can run directly off the USB drive without needing any installation. Plug that Linux USB drive into your PC and boot it.

Mess around with that Linux system. It won't be blistering fast running off the USB drive but you should be able to see whether Linux runs near normally or not. If Linux struggles then you definitely have some sort of hardware problem, but if Linux runs smoothly it's more likely to be Windows. The only exception there is if your storage drive is the problem, you won't touch that running Linux off a USB drive and that HDD could easily be flaky. I would replace that with an SSD as I've already mentioned.

If you can, run a chkdsk /r /f command on that HDD and post the last 20 or so lines of output.
 

ashala2202

Thanks for all the tips!

Those RAM sticks have been with me for 15 years now. Sometime during the PC's first year one stick was faulty; the warranty covered that and they gave me a new one. That may explain the mismatch.

Anyway, I've run Memtest many times with them and they passed it every time, as they did again today. I did 2x4 iterations and it returned no errors.

Then I ran the chkdsk /f and /r commands, but at boot, and since they took ages, I missed any information they gave prior to the reboot. However, after that I ran the regular chkdsk scan from Windows. It gave the following results:
https://ibb.co/SwHKY33

After that scan and after the /f and /r commands before, the Event Viewer showed 189 of the following errors:
The device, \Device\Harddisk0\DR0, has a bad block.
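
Tallying those bad-block messages per device from an exported log could look something like the Python sketch below. It is only an illustration: the function name `count_bad_blocks` is made up, it assumes the events have been saved as plain text, and the sample simply repeats the message quoted above.

```python
import re
from collections import Counter

# Matches messages like:
#   The device, \Device\Harddisk0\DR0, has a bad block.
BAD_BLOCK_RE = re.compile(r"The device, (\\Device\\\S+), has a bad block\.")


def count_bad_blocks(log_text):
    """Return a Counter of bad-block messages keyed by device path."""
    return Counter(BAD_BLOCK_RE.findall(log_text))


# Sample text: the message from above, repeated three times.
SAMPLE_LOG = "\n".join(
    [r"The device, \Device\Harddisk0\DR0, has a bad block."] * 3
)

for device, hits in count_bad_blocks(SAMPLE_LOG).items():
    print(device, hits)
```

Comparing the count before and after a scan gives a rough idea of whether the number of bad blocks is still growing.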

The wmic diskdrive get status command said it was OK.

I was aware that my HDD was failing, it's been like that for a long time now, but I hadn't connected the WHEAs and the sudden weird behavior with that. I still don't see the actual connection.
 