Question My PC reboots all the time unless I uninstall all GPU drivers (AMD)

Jan 30, 2021
7
0
10
Hello, everybody. I've been struggling recently with a problem that is making me absolutely mad and any help is highly appreciated. Almost 1 month ago I built my very first computer and outside of a fan which came DOA everything went nice and smoothly.
First of all, my specs:

Ryzen 5 3600X, Asus TUF Gaming X570-Plus (Wi-Fi), G.Skill Trident Z 16GB 3600MHz, RX 5500 XT (8GB), WD NVMe M.2 1TB, Thermaltake 650W Gold, Hyper 212 EVO.

I was able to use it without any trouble at all until last Sunday, when I had a sudden crash while playing Albion Online. Then it happened again playing LoL. Both are very undemanding games. The PC would just freeze or become super sluggish and then reboot. Then it started rebooting during simple tasks like browsing the internet or even logging in. There was no GPU driver update, no Windows update, and no device installation prior to these crashes. Completely out of nowhere.

I've been getting a wide variety of errors in Event Viewer, from Kernel-Power 41 to WHEA-Logger ID 18. The only consistent thing is that the PC runs just fine in safe mode 100% of the time, without a single crash. Over the past week I've done as much troubleshooting as is within my reach, and this is what I've tried and discovered so far:

-Fresh Windows Install (Didn't Work)

-Prime95 test (in safe mode), RAM scan, disk check. (Every test was successful)

-Ran the sfc /scannow and DISM /RestoreHealth commands in cmd; no corruption found.

-Disabled the DOCP profile and disabled CPU boost (No difference)

-GPU and CPU temperatures are perfectly normal.

-Disabled auto restart after crashes (I just get a couple of blue screens saying something about the AMD drivers, but other than that the screen just goes black)

Now, I've narrowed it down to the GPU. I uninstalled the drivers completely using DDU, and the PC now works in normal mode WITHOUT any GPU drivers. I've tried both the automatic driver install and a manual driver installation (using the exact same driver version that I had been using flawlessly for a whole month), and the reboots start immediately after installing the Adrenalin drivers.

Unfortunately, I don't have a spare PSU or GPU to test whether this is a hardware problem that needs an RMA, or whether there is something I'm missing.

What is your opinion about this? Does the fact that the PC runs just fine without GPU drivers suggest that something is going on with Windows 10, or does it mean that the GPU itself is faulty? I'm just very confused by the fact that the system was perfectly stable for a whole month and then started rebooting out of the blue. What else should I try?

Sorry for the long post and every bit of help is immensely appreciated!
 
You have troubleshot this very well.

It's not software related, since you did a fresh Windows install. It might be a faulty GPU vBIOS, although that's not very likely.

The two most probable culprits are the PSU and the GPU. It sounds more like a power issue than a GPU issue. You said that you don't have another PSU or GPU to try, but is it possible to borrow one from a friend, or to try your GPU in a friend's system?

Check the temps of the GPU idle and while gaming.
 
So either of those two components could be the culprit, even if the system works just fine without the GPU drivers?
 
The proper drivers make the GPU run at its advertised clocks, drawing the power it needs to deliver that performance, while the basic Windows drivers run the GPU at the lowest possible clocks with minimal power draw, just enough to drive a monitor. So, basic drivers = low clocks, low power draw, and proper drivers = high clocks, high power draw. As you can see, running fine without the drivers doesn't tell us whether the problem is the GPU or the power supply.
 
If possible do check your temps as requested. Use HWInfo64 sensors mode.
Temps are pretty normal. And since the reboot happens within the first 15 mins or so regardless of activity, I have no chance to monitor temps during intense gaming. When the problem first started and the reboots didn't happen as often as they do now, I could actually run several FurMark tests without any issues.
 
You should still check them. You might have a faulty fan, a mining virus, dried-up thermal paste, etc. 15 mins is plenty of time for modern hardware to overheat, even without "you" doing anything.

Isn't it better to cover all bases before a drastic action?
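
Since the system reboots before you can read anything off the screen, one option is to have the monitoring tool log sensors to a CSV file and look at the peak afterwards. Here is a minimal Python sketch of that idea. The column name "GPU Temperature [C]" and the sample rows are made up for illustration; check the header row of your own log for the real column names.

```python
import csv
import io

def peak_reading(csv_text, column):
    """Return the highest numeric value seen in `column`, or None if there is none."""
    peak = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # skip blank cells and any non-numeric rows
        if peak is None or value > peak:
            peak = value
    return peak

# Hypothetical sample log; a real log will have many more columns and rows.
sample = (
    "Time,GPU Temperature [C]\n"
    "12:00:01,34.0\n"
    "12:00:03,36.5\n"
    "12:00:05,35.0\n"
)
print(peak_reading(sample, "GPU Temperature [C]"))
```

With a real log you would read the file's text and pass it in instead of `sample`. The last rows before the crash are the interesting ones.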
 
OK. I'll give it a try as well.
 
OK, so I took out the GPU, cleaned off all the dust it had collected during the month it has been working, and reinstalled it (same PCIe slot, though). I also tried the other PCIe power cable from the PSU. I was trying to run the temp tests with that software, but I simply can't, because the system freezes and gives me a BSOD mentioning amdkmdag. I was watching the temps in Radeon Software just before the crash, though, and they were in the 30s (°C), so nothing out of the ordinary :(

Here are the results of the minidump just in case they're useful

0: kd> !analyze -v
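*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************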

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: ffffdc0fffbe23f0, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff80625811710, The pointer into responsible device driver module (e.g. owner tag).
Arg3: ffffffffc0000001, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000003, Optional internal context dependent data.

Debugging Details:
------------------

Unable to load image amdkmdag.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for amdkmdag.sys

KEY_VALUES_STRING: 1

Key : Analysis.CPU.mSec
Value: 1686

Key : Analysis.DebugAnalysisProvider.CPP
Value: Create: 8007007e on DESKTOP-BLU2SE3

Key : Analysis.DebugData
Value: CreateObject

Key : Analysis.DebugModel
Value: CreateObject

Key : Analysis.Elapsed.mSec
Value: 2895

Key : Analysis.Memory.CommitPeak.Mb
Value: 82

Key : Analysis.System
Value: CreateObject


ADDITIONAL_XML: 1

OS_BUILD_LAYERS: 1

DUMP_FILE_ATTRIBUTES: 0x8
Kernel Generated Triage Dump

BUGCHECK_CODE: 116

BUGCHECK_P1: ffffdc0fffbe23f0

BUGCHECK_P2: fffff80625811710

BUGCHECK_P3: ffffffffc0000001

BUGCHECK_P4: 3

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXWINLOGON: 1

CUSTOMER_CRASH_COUNT: 1

PROCESS_NAME: System

STACK_TEXT:
ffffd58cfcf549f8 fffff8061d6613ee : 0000000000000116 ffffdc0fffbe23f0 fffff80625811710 ffffffffc0000001 : nt!KeBugCheckEx
ffffd58cfcf54a00 fffff8061d6097ab : fffff80625811710 ffffdc0ff9a4a000 ffffdc0ff9a4a0f0 0000000000000000 : dxgkrnl!TdrBugcheckOnTimeout+0xfe
ffffd58cfcf54a40 fffff8061d60a57e : ffffdc0f00000101 0000000000002000 ffffdc0ff9a4a000 0000000001000000 : dxgkrnl!DXGADAPTER::prepareToReset+0x1a3
ffffd58cfcf54a90 fffff8061d660b15 : 0000000000000100 ffffdc0ff9a4aa58 0000000000000000 ffff9346afdcccf8 : dxgkrnl!DXGADAPTER::Reset+0x28e
ffffd58cfcf54b10 fffff8061d660c87 : fffff8060df24440 ffffdc0ffcd6a040 0000000000000000 0000000000000100 : dxgkrnl!TdrResetFromTimeout+0x15
ffffd58cfcf54b40 fffff8060d425975 : ffffdc0ff7a1a040 fffff8061d660c60 ffffdc0ff129ca20 ffffdc0f00000000 : dxgkrnl!TdrResetFromTimeoutWorkItem+0x27
ffffd58cfcf54b70 fffff8060d517e25 : ffffdc0ff7a1a040 0000000000000080 ffffdc0ff12a8040 000f8067b4bbbdff : nt!ExpWorkerThread+0x105
ffffd58cfcf54c10 fffff8060d5fcdd8 : fffff8060b07f180 ffffdc0ff7a1a040 fffff8060d517dd0 84282237a4202b5b : nt!PspSystemThreadStartup+0x55
ffffd58cfcf54c60 0000000000000000 : ffffd58cfcf55000 ffffd58cfcf4f000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x28


SYMBOL_NAME: amdkmdag+b1710

MODULE_NAME: amdkmdag

IMAGE_NAME: amdkmdag.sys

STACK_COMMAND: .thread ; .cxr ; kb

FAILURE_BUCKET_ID: 0x116_IMAGE_amdkmdag.sys

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {1c94a7e8-453f-c7b9-1484-0a0454a0ee36}

Followup: MachineOwner
---------
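
For anyone reading the dump, two of the numbers can be unpacked by hand. The short Python sketch below only does arithmetic on values copied from the output above: it masks Arg3 down to its 32-bit NTSTATUS and subtracts the offset in SYMBOL_NAME from Arg2. The helper name `decode_ntstatus` is just something made up for this post, and reading the subtraction result as the driver's load address is an inference from the `amdkmdag+b1710` notation, not something the dump states directly.

```python
MASK32 = 0xFFFFFFFF

def decode_ntstatus(arg):
    """Split a (possibly sign-extended) bugcheck argument into NTSTATUS fields."""
    status = arg & MASK32              # keep the low 32 bits
    severity = status >> 30            # top two bits: 0 success, 1 info, 2 warning, 3 error
    facility = (status >> 16) & 0xFFF  # 12-bit facility field
    code = status & 0xFFFF             # low 16 bits: the status code itself
    return status, severity, facility, code

arg2 = 0xFFFFF80625811710   # Arg2 from the dump: pointer into the driver module
offset = 0xB1710            # from SYMBOL_NAME: amdkmdag+b1710
arg3 = 0xFFFFFFFFC0000001   # Arg3 from the dump: NTSTATUS of the last failed operation

status, severity, facility, code = decode_ntstatus(arg3)
print(hex(status), severity, facility, code)  # 0xc0000001 3 0 1
print(hex(arg2 - offset))                     # 0xfffff80625760000
```

0xC0000001 is STATUS_UNSUCCESSFUL, a generic "the operation failed" error, so Arg3 does not narrow things down by itself. The second result is page-aligned, which fits Arg2 pointing 0xb1710 bytes into amdkmdag.sys. In other words, the dump confirms the AMD display driver failed to recover from a timeout, but it cannot say whether the cause is the card, the power delivery, or the driver.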