Question Chasing the cause of graphics drivers failing

May 20, 2023
I have been having issues with the display drivers crashing and not recovering. The screen will freeze but the system continues to operate in the background. Windows plays the chime you hear when unplugging a USB device when the driver fails. After the screens had frozen, I was able to capture a screenshot of a crash message with WIN+Print Screen. This is what the screenshot showed, from the AMD Bug Report Tool:
"AMD crash Defender has detected an issue with your display driver. To prevent a system crash or hang, the display driver is now operating in safe mode with reduced functionality. It is recommended that you save all work and close any open applications. A system restart is required to restore your graphics hardware acceleration."
In another instance, I got this message:
"AMD software detected that a driver timeout has occurred on your system. An issue report has been created that can be sent to AMD to help us improve our software. Would you like to report this issue"
Often when this occurs, after rebooting the drivers need to be reinstalled as they did not reload and the system falls back to the basic driver. I have used DDU every time I did anything to the drivers as a precaution.

It has recently started beeping through the system speakers while in the OS, usually when I have the RAM at full utilization in a game and then try to switch windows.

I have tried many different things: upgrading the PSU, running Memtest86 for 24 hours (0 errors), disabling MPO, running sfc /scannow (no errors), trying different versions of the AMD graphics drivers from 22.11.2 to the latest, updating the chipset drivers, reseating the GPU, reseating the RAM, running with just one stick of RAM, reseating the power connections, reseating the CPU, updating the BIOS, and changing the CMOS battery.

I even tried swapping in an old RX580. It booted up and ran for 3 minutes, then the system turned off. I rebooted, it reached Windows and shut off again immediately, and then it would no longer power on at all with the RX580 installed. I put the 6700 XT back in and it turns on now at least.

It isn't a temperature issue, as temps settle at about 80°C when running FurMark with the CPU burner.

I have been fighting this for a few weeks and cannot figure this one out. The system was brand new in December.

I was thinking the graphics card was faulty but now I'm suspicious of the motherboard or CPU. Not sure what else to do at this point. I don't have another AM4 board or CPU to swap in.

**EDIT** I played around with the system some more using only one monitor rather than the 4 I normally use. Things seemed to go smoothly, but I only had it in that configuration for about 2 hours while playing Hitman, the game where it seems to crash most frequently. I then plugged the other monitors back in. There was some artifacting as Windows was loading, which seems to be related to the crashes, and I got a double beep as Windows loaded.

System Specs:

CPU: Ryzen 9 5900X

GPU: ASRock Radeon RX 6700 XT Challenger D

RAM: 4x 8GB Corsair Vengeance 3000 MHz

Motherboard: Gigabyte B550M AORUS PRO-P (BIOS: F16c)

PSU: 650W EVGA 650 GQ 80+ Gold
 
Windows plays the chime you hear when unplugging a USB device when the driver fails
Have you tried checking your USB devices? Are any of them old ones that have always worked before?

I had a problem that I too thought was caused by my GPU, as I would lose picture. I didn't have an AMD card, so Crash Defender wasn't there to help. It took us lots of troubleshooting, to the point that I gave the PC to a store to test the hardware. The problem didn't happen for them, but the second I plugged in my mouse at home, it started...

So that USB noise could be a clue to the actual cause.

The GPU and USB devices are all controlled by the same chip on the motherboard, so if one is playing up, it can cause the other to play up.
 
Check both Reliability Monitor/History and Event Viewer.

Either or both tools may be capturing some relevant error codes, warnings, or even informational events related to the problems, likely just before or at the time the problem occurs.

Reliability Monitor is much more end-user friendly, and the timeline format may reveal some pattern.

Event Viewer takes more time and effort to navigate and understand.

FYI:

How To - How to use Windows 10 Event Viewer | Tom's Hardware Forum (tomshardware.com)

Also: How old is the PSU? Any history of heavy gaming use, video editing, or even bit-mining?

The PSU may be nearing its built-in EOL (End of Life) and starting to falter and fail in some manner.
 
Check both Reliability Monitor/History and Event Viewer.
I can already guess what it looks like
[Screenshots: nYgKZ0m.jpg, dH3JIjH.jpg]

fun times.

Event Viewer might help if the USB sound happened during the error you're looking at.

@Ralston18 I don't like Reliability History, as it can show you problems you were happily ignoring and didn't even notice. I refuse to look at it... right now it's just parts of Windows that fail to install that show on mine.
 
@Colif & @Ralston18

I'll look into the USB devices. I have noticed while getting the event viewer and reliability report data that Logitech G Hub is throwing a tantrum, and quite often. I will uninstall that and see what happens.

The reliability report, as @Colif guessed, lights up like a Christmas tree. On the days with no errors or few errors, I didn't use the PC.

[Screenshot: 2W5gsf7.png]

The hardware errors are all LiveKernelEvent with codes 1b0, 117, 193, 144, and 1a8.
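
For reference, LiveKernelEvent codes are hexadecimal and appear to share their numbering with Windows bug check codes. Below is a minimal Python sketch of a lookup for the five codes listed above. The names in the table are an assumption, taken from memory of Microsoft's bug check code reference, and should be verified there before drawing conclusions.

```python
# LiveKernelEvent codes as shown in the reliability report (hex strings).
# ASSUMPTION: the names below come from memory of Microsoft's bug check
# code reference; verify each one before relying on this table.
KNOWN_CODES = {
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
    0x144: "BUGCODE_USB3_DRIVER",
    0x193: "VIDEO_DXGKRNL_LIVEDUMP",
    0x1A8: "VIDEO_DXGKRNL_BLACK_SCREEN_LIVEDUMP",
    0x1B0: "VIDEO_MINIPORT_FAILED_LIVEDUMP",
}


def describe(code_text):
    """Turn a code such as '1b0' into '0x1B0: <name>'."""
    code = int(code_text, 16)  # the report prints codes without the 0x prefix
    name = KNOWN_CODES.get(code, "unknown")
    return f"0x{code:X}: {name}"


for reported in ["1b0", "117", "193", "144", "1a8"]:
    print(describe(reported))
```

If the table is right, four of the five codes are display related and 144 is USB related, which would be consistent with both the driver timeouts and the USB chime.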

Here are the last 24 hours of errors in Event Viewer.

[Screenshot: wtIn4zN.png]

Most, if not all, of the application failures are for this (Logitech again):
Faulting application name: lghub_system_tray.exe, version: 2023.3.6302.0, time stamp: 0x6439ade4
Faulting module name: KERNELBASE.dll, version: 10.0.19041.2913, time stamp: 0xa1c3e870
Exception code: 0xc000027b

The Service Control Manager errors say one of the following, usually occurring in this order:
  • The AMDRyzenMasterDriverV20 service failed to start due to the following error: The system cannot find the file specified.
  • The AMD User Experience Program Data Uploader service terminated unexpectedly. It has done this 1 time(s).
  • The GameSDK Service service failed to start due to the following error: The system cannot find the file specified.

As for the PSU, upgrade probably isn't the best word for it. I had previously used a 600W 80+ Bronze PSU, which was under the 650W minimum suggested by the graphics card manufacturer. So I swapped in the PSU from another machine, the EVGA 650W 80+ Gold that is in this machine now.

My UPS reports, at full utilization, a total draw of less than 500w. The UPS powers my network, NAS, 4 monitors, and the PC. The 650w PSU was purchased new in 2019 and saw light gaming use during its lifetime. I also would either put the PC into sleep mode or power it off each night.

I'm not exactly sure where the 600W PSU came from. I don't have a box for it and I usually keep all of my computer component boxes. It was probably from a CyberPower prebuilt, but it isn't a grey mystery box. I don't recall what brand it was and it is currently working fine in another machine.

Let me know if you need any more information. I'll report back about the G Hub uninstall and if I can get it to crash after.
 
It took less than 10 minutes after uninstalling G Hub and rebooting to get another crash. I'm not reinstalling G Hub for now and will wait to see what you have to say before messing with the system more.

I went back to Event Viewer to see what went on during that time, and it is all Service Control Manager errors plus one from Certificate Services Client - CertEnroll.

The certificate error shows this:
SCEP Certificate enrollment initialization for WORKGROUP\DESKTOP-O6AM8OG$ via https://AMD-KeyId-907d65e9b562315997dd5ad086b2b7598957b92c.microsoftaik.azure.net/templates/Aik/scep failed:

GetCACaps
GetCACaps: Not Found
{"Message":"The authority \"amd-keyid-907d65e9b562315997dd5ad086b2b7598957b92c.microsoftaik.azure.net\" does not exist."}
HTTP/1.1 404 Not Found
Date: Sat, 20 May 2023 15:25:01 GMT
Content-Length: 121
Content-Type: application/json; charset=utf-8
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000;includeSubDomains
x-ms-request-id: bda06145-a173-45ab-93c4-438e11c56c49

Method: GET(203ms)
Stage: GetCACaps
Not found (404). 0x80190194 (-2145844844 HTTP_E_STATUS_NOT_FOUND)
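
The last line of that log packs three views of the same error: the HRESULT 0x80190194, its signed 32-bit form, and the HTTP status. A short Python sketch, assuming the standard HRESULT layout (severity bit, 11-bit facility, 16-bit code), shows how they relate. The function name is hypothetical.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into its standard fields."""
    unsigned = value & 0xFFFFFFFF
    # Event logs often print the same value as a signed 32-bit integer.
    signed = unsigned - (1 << 32) if unsigned & 0x80000000 else unsigned
    return {
        "failure": bool(unsigned >> 31),       # top bit set means failure
        "facility": (unsigned >> 16) & 0x7FF,  # 25 is the HTTP facility
        "code": unsigned & 0xFFFF,             # low 16 bits carry the status
        "signed": signed,
    }


fields = decode_hresult(0x80190194)
print(fields)
# {'failure': True, 'facility': 25, 'code': 404, 'signed': -2145844844}
```

So the error code is just the HTTP 404 from the enrollment URL wrapped in an HRESULT, not a separate fault.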

I also got a new Service Control Manager message:
The AMD User Experience Program Data Uploader service is marked as an interactive service. However, the system is configured to not allow interactive services. This service may not function properly.

Also, the Reliability Report had this to say about the most recent crash:
Description
A problem with your hardware caused Windows to stop working correctly.

Problem signature
Problem Event Name: LiveKernelEvent
Code: 117
Parameter 1: ffffa88ce81af050
Parameter 2: fffff80165db6760
Parameter 3: 0
Parameter 4: 600
OS version: 10_0_19045
Service Pack: 0_0
Product: 256_1
OS Version: 10.0.19045.2.0.0.256.48
Locale ID: 1033
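
To compare crashes over time, a problem signature like the one above can be parsed into fields. This is a minimal Python sketch; the function names are hypothetical, and reading Code and the Parameter values as hexadecimal is an assumption based on how they are printed.

```python
# First six lines of the problem signature, copied from the report above.
SIGNATURE = """\
Problem Event Name: LiveKernelEvent
Code: 117
Parameter 1: ffffa88ce81af050
Parameter 2: fffff80165db6760
Parameter 3: 0
Parameter 4: 600
"""


def parse_signature(text):
    """Split 'Key: value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            fields[key.strip()] = value.strip()
    return fields


def hex_fields(fields):
    """Read Code and Parameter N as hex numbers (an assumption)."""
    return {
        key: int(value, 16)
        for key, value in fields.items()
        if key == "Code" or key.startswith("Parameter")
    }


signature = parse_signature(SIGNATURE)
print(signature["Problem Event Name"], hex(hex_fields(signature)["Code"]))
```

Collecting these per crash makes it easier to see whether the same code and parameters keep recurring.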
 
Try to narrow things down a bit.....

Use "sfc /scannow" and "dism" to look for and fix problem files.

Objective being twofold: 1) regain some stability and 2) reduce the reported problems list.
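
For anyone unfamiliar with the commands, the usual sequence is sfc followed by the three DISM health switches, all run from an elevated prompt. Here is a Python sketch that wraps them. The function name and the injectable `runner` parameter are hypothetical, added only so the loop can be exercised without touching a real system.

```python
import subprocess

# The usual order. /RestoreHealth only matters if a scan reports corruption.
REPAIR_COMMANDS = [
    ["sfc", "/scannow"],
    ["DISM", "/Online", "/Cleanup-Image", "/CheckHealth"],
    ["DISM", "/Online", "/Cleanup-Image", "/ScanHealth"],
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
]


def run_repairs(commands=REPAIR_COMMANDS, runner=subprocess.run):
    """Run each command in order and collect (command line, exit code)."""
    results = []
    for command in commands:
        completed = runner(command)  # needs an elevated prompt on Windows
        results.append((" ".join(command), completed.returncode))
    return results
```

Calling `run_repairs()` from an administrator Python session on Windows runs all four in sequence. Typing the same commands directly into an elevated Command Prompt does the same job.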

I have run both of those. I haven't heard of "dism" before so I hope I ran the commands you were referring to.

Here is the CMD as I ran through the commands.
Microsoft Windows [Version 10.0.19045.2965]
(c) Microsoft Corporation. All rights reserved.

C:\WINDOWS\system32>sfc /scannow

Beginning system scan. This process will take some time.

Beginning verification phase of system scan.
Verification 100% complete.

Windows Resource Protection did not find any integrity violations.

C:\WINDOWS\system32>DISM /Online /Cleanup-Image /CheckHealth

Deployment Image Servicing and Management tool
Version: 10.0.19041.844

Image Version: 10.0.19045.2965

No component store corruption detected.
The operation completed successfully.

C:\WINDOWS\system32>DISM /Online /Cleanup-Image /ScanHealth

Deployment Image Servicing and Management tool
Version: 10.0.19041.844

Image Version: 10.0.19045.2965

[==========================100.0%==========================] No component store corruption detected.
The operation completed successfully.

C:\WINDOWS\system32>sfc /scannow

Beginning system scan. This process will take some time.

Beginning verification phase of system scan.
Verification 100% complete.

Windows Resource Protection did not find any integrity violations.

What would you suggest I do next?
 
I don't think it's software. You would be getting BSODs.

How old are your USB devices? It can happen to anything, I didn't suspect my mouse as it had always worked before, but it was old. I guess age isn't really a sign, it could be a faulty device.

The worst part about mine was it didn't always happen. It would do it once and then a week or so later, it would happen again. We tried all the things you have, I even reinstalled windows once.

I hate how I search for "faulty USB device" in Google and it still shows results about ports. I was specific. The only advice I found was for obviously broken USB devices that don't work when plugged in.

There has to be a way to tell what device caused the problem.

To use Device Manager to display USB info:

  1. Select Windows logo key+R, enter devmgmt.msc into the pop-up box, and then select Enter.
  2. In Device Manager, select your computer so that it's highlighted.
  3. Select Action, and then select Scan for hardware changes.
  4. Select View, and then select Show hidden devices to display more devices (for example, devices that aren't currently active).
  5. Expand the Universal Serial Bus controllers node in Device Manager and select the device in question.
  6. Right-click and select Properties to view summary device status info.
  7. Select the Details tab to view more info.
  8. Select Property to view details such as Status or Problem code.
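
The Device Manager steps above can also be approximated from a script. This is a rough Python sketch that asks PowerShell's `Get-PnpDevice` for the USB class and keeps the entries whose status is not OK. The function names are hypothetical, and a non-OK status is only a hint: devices that are simply unplugged also show up that way.

```python
import json
import subprocess

# USB controllers and hubs; mice and keyboards live under other classes.
PS_COMMAND = (
    "Get-PnpDevice -Class USB | "
    "Select-Object Status, FriendlyName, InstanceId | ConvertTo-Json"
)


def parse_devices(json_text):
    """ConvertTo-Json emits a bare object when there is only one device."""
    data = json.loads(json_text) if json_text.strip() else []
    return data if isinstance(data, list) else [data]


def devices_with_problems(devices):
    """Keep entries whose Status is anything other than OK."""
    return [device for device in devices if device.get("Status") != "OK"]


def query_usb_devices():
    """Windows only: run the PowerShell query and parse its output."""
    output = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_devices(output)
```

On Windows, `devices_with_problems(query_usb_devices())` gives a short list to check against the Properties dialog from steps 6 to 8.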

I might be wrong, but it just looks like what I had. It's not a common problem.
 
It's still way too early to definitively call it solved, but I've tried some things that seem to have fixed the problem. I'd like to see it work without issue for at least a week, but I haven't had an issue for the last 6 hours. Before, I could maybe get 30 minutes in a game without a crash.

Here's what I did.

I came across a Reddit post that seemed to have similar issues and a comment said to do this:
I have been getting this error along with another for AMDRyzenMasterDriverV19 since November 2021 on boot. Seemingly this has not caused any issues as I haven't noticed until I checked event viewer today for an unrelated issue. To get rid of this error, you need to remove the registry entries for any of the old Ryzen Master Drivers. Do the following:

  1. Open regedit
  2. Navigate to: Computer\HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\
  3. Find any entries that begin with AMDRyzenMasterDriver and DELETE ALL BUT THE MOST CURRENT DRIVER VERSION. For example: I had folders titled AMDRyzenMasterDriver, AMDRyzenMasterDriverV17, and AMDRyzenMasterDriverV19. I deleted the first and second folders, but kept V19 as that is the latest (at this time) Driver for the Ryzen Master software.
  4. Restart and you should no longer see this error in event viewer
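
Before deleting anything in regedit, it can help to list which entries the steps above would target. This is a read-only Python sketch under a few assumptions: the function names are hypothetical, an entry with no `V` suffix is treated as the oldest version, and nothing is deleted by the script itself.

```python
import re

SERVICES_PATH = r"SYSTEM\ControlSet001\Services"
PREFIX = "AMDRyzenMasterDriver"


def version_of(name):
    """Return the numeric version, 0 if unversioned, None if unrelated."""
    match = re.fullmatch(PREFIX + r"(?:V(\d+))?", name)
    if not match:
        return None
    return int(match.group(1) or 0)


def stale_entries(names):
    """Every AMDRyzenMasterDriver entry except the highest version."""
    versions = {name: version_of(name) for name in names}
    matching = {name: v for name, v in versions.items() if v is not None}
    if not matching:
        return []
    newest = max(matching, key=matching.get)
    return sorted(name for name in matching if name != newest)


def list_service_keys():
    """Windows only: read the subkey names under the Services key."""
    import winreg  # imported here so the rest works on any platform
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SERVICES_PATH) as key:
        count = winreg.QueryInfoKey(key)[0]  # number of subkeys
        return [winreg.EnumKey(key, i) for i in range(count)]
```

With the example from the comment, `stale_entries` returns `AMDRyzenMasterDriver` and `AMDRyzenMasterDriverV17` and leaves V19 alone. Exporting the Services key first is a sensible precaution before removing any of them by hand.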

Looks like this has solved the driver failing to load.

I also think the certificate issue was resolved by disabling the TPM. (Not sure if this is an issue or not)

I'll try to post an update a few days from now to see if this has resolved the issue entirely.