Question VIDEO_TDR_ERROR (I swear I have tried everything)

Tom's Hardware community forums
Feb 23, 2022
14
0
10
I've looked everywhere I can and I just can't seem to solve this issue.

PC Specs:
  • i5-11600KF (NO OC)
  • MSI GeForce RTX 3060 VENTUS 3X 12G OC
  • MSI MPG Z590 Gaming Plus (BIOS is updated)
  • Corsair RM750x PSU
  • 16GB ADATA XPG Z1 (3200 MHz, in slots A2 and B2)
  • Kingston SA400M8240G (boot drive)
  • 1 Seagate Barracuda 2TB HDD
  • Corsair 4000D Airflow
  • Cooler Master Hyper 212 EVO Black CPU air cooler
Things I've already tried:
  • Clean install of Windows
  • DDU to reinstall GPU drivers
  • New mobo
  • New GPU
  • RAM tested with MemTest86
  • Stress test for temps (CPU hit a max of 73 °C and GPU 61 °C)
  • Fast Startup turned off; power plans changed
  • SFC scan
This started toward the end of last year. When it began I had an EVGA RTX 2060 KO and an ASRock Z590 c/ac motherboard. I swapped out the GPU and the issue persisted; I also tried the original GPU in a friend's PC, and he has had no issues with it. I then replaced my motherboard with the MSI one, and the issue still persists. I also replaced the PSU, going from an EVGA 600 to the Corsair RM750x, and the issue still persists.

I have clean installed Windows twice and run DDU multiple times. I have health-checked the drives and they are all good. The RAM passed both MemTest86 and Windows Memory Diagnostic, and I ran the Intel diagnostic tool on the CPU to rule it out. I had logged 38 hours in a game on this new setup without any issues until today, when I received three VIDEO_TDR_ERROR crashes while on the desktop. Temps in game were fine.





This is the latest dump:

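*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************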

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: ffffd28e08e38010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8073810cfe8, The pointer into responsible device driver module (e.g. owner tag).
Arg3: ffffffffc000009a, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000004, Optional internal context dependent data.

Debugging Details:
------------------

Unable to load image \SystemRoot\System32\DriverStore\FileRepository\nvmdig.inf_amd64_48a94de4b861e2fb\nvlddmkm.sys, Win32 error 0n2


KEY_VALUES_STRING: 1

Key : Analysis.CPU.mSec
Value: 2015

Key : Analysis.DebugAnalysisManager
Value: Create

Key : Analysis.Elapsed.mSec
Value: 3116

Key : Analysis.Init.CPU.mSec
Value: 265

Key : Analysis.Init.Elapsed.mSec
Value: 2901

Key : Analysis.Memory.CommitPeak.Mb
Value: 97

Key : WER.OS.Branch
Value: vb_release

Key : WER.OS.Timestamp
Value: 2019-12-06T14:06:00Z

Key : WER.OS.Version
Value: 10.0.19041.1


FILE_IN_CAB: MEMORY.DMP

BUGCHECK_CODE: 116

BUGCHECK_P1: ffffd28e08e38010

BUGCHECK_P2: fffff8073810cfe8

BUGCHECK_P3: ffffffffc000009a

BUGCHECK_P4: 4

VIDEO_TDR_CONTEXT: dt dxgkrnl!_TDR_RECOVERY_CONTEXT ffffd28e08e38010
Symbol dxgkrnl!_TDR_RECOVERY_CONTEXT not found.

PROCESS_OBJECT: 0000000000000004

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

PROCESS_NAME: System

STACK_TEXT:
ffffbe8cbb1e72d8 fffff80733ef663e : 0000000000000116 ffffd28e08e38010 fffff8073810cfe8 ffffffffc000009a : nt!KeBugCheckEx
ffffbe8cbb1e72e0 fffff80733ea6c34 : fffff8073810cfe8 ffffd28e04d448e0 0000000000002000 ffffd28e04d449a0 : dxgkrnl!TdrBugcheckOnTimeout+0xfe
ffffbe8cbb1e7320 fffff80733e9f76c : ffffd28e04fcc000 0000000001000000 0000000000000004 0000000000000004 : dxgkrnl!ADAPTER_RENDER::Reset+0x174
ffffbe8cbb1e7350 fffff80733ef5d65 : 0000000000000100 ffffd28e04fcca70 0000000000000002 ffffd28dfe9dd620 : dxgkrnl!DXGADAPTER::Reset+0x4dc
ffffbe8cbb1e73d0 fffff80733ef5ed7 : fffff8071cb25440 0000000000000000 0000000000000000 0000000000000300 : dxgkrnl!TdrResetFromTimeout+0x15
ffffbe8cbb1e7400 fffff8071c0b86c5 : ffffd28e06c51040 fffff80733ef5eb0 ffffd28df70ce2a0 ffffd28d00000000 : dxgkrnl!TdrResetFromTimeoutWorkItem+0x27
ffffbe8cbb1e7430 fffff8071c155a05 : ffffd28e06c51040 0000000000000080 ffffd28df7096040 0000000000000000 : nt!ExpWorkerThread+0x105
ffffbe8cbb1e74d0 fffff8071c1fea08 : ffffab0078f40180 ffffd28e06c51040 fffff8071c1559b0 0000000000000246 : nt!PspSystemThreadStartup+0x55
ffffbe8cbb1e7520 0000000000000000 : ffffbe8cbb1e8000 ffffbe8cbb1e1000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x28


SYMBOL_NAME: nvlddmkm+e2cfe8

MODULE_NAME: nvlddmkm

IMAGE_NAME: nvlddmkm.sys

STACK_COMMAND: .cxr; .ecxr ; kb

FAILURE_BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys

OS_VERSION: 10.0.19041.1

BUILDLAB_STR: vb_release

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {c89bfe8c-ed39-f658-ef27-f2898997fdbd}

Followup: MachineOwner
---------


Also from WhoCrashed:

On Wed 2/23/2022 6:23:57 PM your computer crashed or a problem was reported
crash dump file: C:\Windows\Minidump\022322-5671-01.dmp
This was probably caused by the following module: nvlddmkm.sys (0xFFFFF8073810CFE8)
Bugcheck code: 0x116 (0xFFFFD28E08E38010, 0xFFFFF8073810CFE8, 0xFFFFFFFFC000009A, 0x4)
Error: VIDEO_TDR_ERROR
file path: C:\Windows\System32\DriverStore\FileRepository\nvmdig.inf_amd64_48a94de4b861e2fb\nvlddmkm.sys
product: NVIDIA Windows Kernel Mode Driver, Version 511.79
company: NVIDIA Corporation
description: NVIDIA Windows Kernel Mode Driver, Version 511.79
Bug check description: This indicates that an attempt to reset the display driver and recover from a timeout failed.
A third party driver was identified as the probable root cause of this system error. It is suggested you look for an update for the following driver: nvlddmkm.sys (NVIDIA Windows Kernel Mode Driver, Version 511.79 , NVIDIA Corporation).
Google query: nvlddmkm.sys NVIDIA Corporation VIDEO_TDR_ERROR
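For anyone reading the dump above: Arg3 (ffffffffc000009a) is a 32-bit NTSTATUS that the debugger displays sign-extended to 64 bits, and SYMBOL_NAME nvlddmkm+e2cfe8 is simply Arg2 written as an offset from the address where nvlddmkm.sys was loaded. The Python sketch below decodes both. It assumes the standard NTSTATUS bit layout (severity in bits 31-30, customer flag in bit 29, facility in bits 27-16, code in bits 15-0) and uses a hypothetical one-entry lookup table; 0xC000009A is STATUS_INSUFFICIENT_RESOURCES in the Windows SDK's ntstatus.h.

```python
"""Decode the VIDEO_TDR_FAILURE (0x116) arguments from the dump above."""

# Hypothetical one-entry lookup; the full list lives in the SDK's ntstatus.h.
KNOWN_NTSTATUS = {0xC000009A: "STATUS_INSUFFICIENT_RESOURCES"}

# Meaning of the two severity bits (31-30) of an NTSTATUS value.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(arg: int) -> dict:
    """Split a (possibly sign-extended 64-bit) NTSTATUS into its fields."""
    status = arg & 0xFFFFFFFF  # keep only the low 32 bits
    return {
        "status": status,
        "name": KNOWN_NTSTATUS.get(status, "unknown"),
        "severity": SEVERITY[status >> 30],
        "customer": bool(status & (1 << 29)),
        "facility": (status >> 16) & 0x0FFF,
        "code": status & 0xFFFF,
    }


def module_base(address: int, offset: int) -> int:
    """Given SYMBOL_NAME module+offset, recover the module's load address."""
    return address - offset


if __name__ == "__main__":
    arg2 = 0xFFFFF8073810CFE8  # pointer into the responsible driver module
    arg3 = 0xFFFFFFFFC000009A  # NTSTATUS of the last failed operation
    info = decode_ntstatus(arg3)
    print(f"Arg3: 0x{info['status']:08X} {info['name']} ({info['severity']})")
    base = module_base(arg2, 0xE2CFE8)  # offset taken from nvlddmkm+e2cfe8
    print(f"nvlddmkm.sys loaded at: 0x{base:016X}")
```

This only restates what the debugger already reports (the timeout was pinned on nvlddmkm.sys, and the last failed operation returned an insufficient-resources error); it does not by itself identify which piece of hardware or software is at fault.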



At this point I am at a loss as to where to go from here. Would appreciate any help.
 

Colif

Win 11 Master
Moderator

gardenman

Splendid
Moderator
Hi, I ran the dump file through the debugger and got the following information: https://jsfiddle.net/fqgdbe51/show This link is for anyone wanting to help. You do not have to view it. It is safe to "run the fiddle" as the page asks.

File information: 040622-7281-01.dmp (Apr 6 2022 - 06:34:11)
Bugcheck: VIDEO_TDR_ERROR (116)
Driver warnings: *** WARNING: Unable to verify timestamp for nvlddmkm.sys
Probably caused by: memory_corruption (Process running at time of crash: System)
Uptime: 1 Day(s), 23 Hour(s), 02 Min(s), and 10 Sec(s)

Comments:
  • The overclocking driver "NTIOLib_X64.sys" was found on your system. (MSI Afterburner or other MSI software)
  • The overclocking driver "RTCore64.sys" was found on your system. (MSI Afterburner)
  • The overclocking driver "IOCBios2.sys" was found on your system. (Intel Extreme Tuning Utility)
  • BIOS info was not included. This can sometimes mean an outdated BIOS is being used.

The nvlddmkm.sys file is an NVIDIA graphics card driver. There are a few things you can do to try to fix this problem. First, try a full uninstall using DDU in Safe Mode, then reinstall the driver. Alternatively, try the latest version of the driver, or one of the three most recent drivers released by NVIDIA. Drivers can be found at http://www.nvidia.com/, or you can let Windows Update download the driver for you, which might be an older (and possibly better) version.

This information can be used by others to help you. Someone else will post with more information. Please wait for additional answers. Good luck.
 

jclaumann

Reputable
Dec 22, 2018
12
0
4,510
So, I've been doing some reading on what could be causing the VIDEO_TDR_ERROR, and I found a post on Reddit explaining a certain "bug" with the memory power states of a GPU (source: here).
TL;DR
The GPU has several performance states (P-states), numbered upward from P0 (maximum power consumption) toward lower consumption; the ones that matter here are P0 and P2. The problem in question MAY happen when the GPU switches from the P0 state to P2.
The solution he proposed is to turn off the setting that forces the GPU into the P2 state. This basically makes the GPU always run at its highest clock and, assuming your problem is this state change, will serve as a workaround.

For this workaround you will want to download NVIDIA Profile Inspector (basically the NVIDIA Control Panel with EVERY option at your disposal, an "advanced mode"); you can get it here.
After downloading and unpacking it, run it, look for the "5 - Common" section and turn OFF the option CUDA - Force P2 State.

I did this on my rig just to see if there would be any side effects, and so far I've had none (2 days with it turned off). As I said before, I'm no expert, so there could be some negative side effect I don't know of; hopefully another user will comment on this and clarify whether there are side effects.
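If you want to check whether the crashes line up with P-state changes, before or after applying this workaround, one option is to log the current P-state with nvidia-smi, which is installed with the NVIDIA driver. The Python sketch below assumes nvidia-smi is on your PATH and supports the `pstate`, `clocks.gr` and `clocks.mem` query fields; the one-second polling interval and the log format are arbitrary choices. It prints a line only when the P-state changes, so the last line before a black screen shows the most recent state it observed. Fields a GPU does not report come back as a non-numeric placeholder, which this sketch does not handle, and one-second sampling can miss very brief transitions.

```python
"""Log NVIDIA P-state changes so they can be lined up with crash times."""
import shutil
import subprocess
import time
from datetime import datetime

# Query the current P-state plus graphics and memory clocks as bare CSV.
QUERY = ["nvidia-smi",
         "--query-gpu=pstate,clocks.gr,clocks.mem",
         "--format=csv,noheader,nounits"]


def parse_line(line: str) -> tuple[str, int, int]:
    """Parse one CSV line such as 'P8, 210, 405' into (pstate, core MHz, mem MHz)."""
    pstate, core, mem = (field.strip() for field in line.split(","))
    return pstate, int(core), int(mem)


def read_gpu0() -> tuple[str, int, int]:
    """Run nvidia-smi once and return the values for the first GPU listed."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_line(out.splitlines()[0])


def watch(interval_s: float = 1.0) -> None:
    """Print a timestamped line each time the P-state changes (Ctrl+C to stop)."""
    last = None
    while True:
        pstate, core, mem = read_gpu0()
        if pstate != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {pstate} core={core} MHz mem={mem} MHz", flush=True)
            last = pstate
        time.sleep(interval_s)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        watch()
```

Redirecting the output to a file (for example `python pstate_log.py > pstate.txt`, where the script name is hypothetical) keeps the log across a forced restart.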
 
Feb 23, 2022
14
0
10
I've pretty much tried everything in this thread. I went a month without any issues, and then while I was watching a video it black-screened again with the same error as before. I'm at a loss at this point. It doesn't seem to be something I can recreate; it just happens.
 
May 23, 2022
1
0
10
Long-time reader, first time poster. I registered just to offer what appears to be a solution for me.

Like @Scallywops, my symptom was a black screen with the GPU fans screaming, but pressing the case's soft power button always led to a clean shutdown. Often this happened in Microsoft Flight Simulator 2020, which is very hard on both GPU and CPU, but occasionally it happened outside a game.

Scallywops' research on behalf of us all, swapping out parts and trying other things, was very helpful. At a similar loss (but without swapping out parts, since no part really seemed to be failing), I looked for configuration issues but didn't find anything promising (XMP off, no overclocking of GPU or CPU, etc.). Also, while I had this problem last year, it eventually stopped happening, then resurfaced in the past few weeks. The GPU and CPU are NOT overheating; GPU temps top out at about 62 °C.

So...what could have changed in the past few weeks, except...the weather?

Finally, I think I have found the problem and, for me, the solution:
THE POWER SUPPLY WAS OVERHEATING

My Seasonic Prime Gold 80+ 750 W seemed to be working perfectly fine, and while I considered swapping it out (it's 3 years old), other people with this problem had replaced theirs with no change in behavior.

Finally I paid attention to the Hybrid Mode on/off switch on the PSU, next to the power switch. Mine was set to Hybrid Mode ON, which I don't think was intentional; I probably pressed it by accident while looking for the power switch. Checking the Seasonic documentation, I found that the way I had mounted it called for Normal mode only (in Hybrid mode the PSU fan should face the motherboard; my case has a bottom grille/outlet).

After changing the switch to Normal, I've had -0- crashes in 3 days, despite many hours running MSFS, including some of the new resource-intensive jetliners.

I hope this helps someone.
 

Karadjgne

Titan
Ambassador
Hybrid is an economy setting. Most companies that offer it on their PSUs normally set it as the default. What it does is turn off the fan below a certain temperature threshold, and that's all. GPUs tend to have the same thing: with a GPU, the threshold is usually set at 60-65 °C. Below that the fans don't spin; they only kick in once the GPU reaches the target temp.

That has its good points and bad. It's good because the PSU is passively cooled until the target temp, so there's no fan noise when you aren't doing anything power hungry. It's bad because the PSU will turn on the fan at a high rate to cool itself below target, then shut the fan off again. Ramps galore, which can get annoying.

Fan up simply means natural thermal convection will pull air in the back of the PSU to replace the heated air that's now rising into the case. That's not necessarily a good thing overall, as the GPU sits right above the PSU.

Personally, I think Hybrid mode is more of a gimmick than something that serves a useful purpose in most cases. It's better to have a fan-down PSU in Normal mode, with a normal fan curve, in its own sealed-off area where it doesn't add to case/GPU temps. Plus, Normal mode will generally run the fan consistently at a much lower speed than Hybrid's off/on ramps.
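Karadjgne's description of Hybrid mode can be illustrated with a toy Python model: in Hybrid mode the fan stays off until a threshold temperature, runs at a high speed until the PSU has cooled below a lower threshold, then switches off again, whereas Normal mode follows a continuous curve. Every threshold, speed and curve shape below is a hypothetical value chosen for illustration, not Seasonic's (or any other vendor's) actual fan profile.

```python
"""Toy model of a PSU fan in Hybrid vs Normal mode (all numbers hypothetical)."""

FAN_ON_C = 60.0    # hypothetical: Hybrid fan starts at this temperature
FAN_OFF_C = 50.0   # hypothetical: ...and stops once cooled below this
HYBRID_RPM = 1500  # hypothetical "catch-up" speed when the fan kicks in


def hybrid_rpm(temp_c: float, fan_running: bool) -> tuple[int, bool]:
    """Off/on control with hysteresis; returns (rpm, fan_running)."""
    if fan_running and temp_c < FAN_OFF_C:
        fan_running = False  # cooled enough: switch the fan off again
    elif not fan_running and temp_c >= FAN_ON_C:
        fan_running = True   # hit the threshold: fan jumps straight to speed
    return (HYBRID_RPM if fan_running else 0), fan_running


def normal_rpm(temp_c: float) -> int:
    """Continuous curve: constant low floor, rising linearly above 40 C."""
    base, slope = 500, 25  # hypothetical: 500 rpm floor, +25 rpm per degree
    return int(base + max(0.0, temp_c - 40.0) * slope)


if __name__ == "__main__":
    running = False
    # A PSU temperature hovering around the thresholds, sampled over time.
    for t in [45, 55, 61, 58, 52, 49, 55, 62]:
        rpm, running = hybrid_rpm(t, running)
        print(f"{t:>3} C  hybrid={rpm:>4} rpm  normal={normal_rpm(t):>4} rpm")
```

With a temperature that hovers around the thresholds, the hybrid column jumps between 0 and full speed (the "ramps galore" pattern) while the normal column changes gradually. The separate off threshold (hysteresis) is an assumption of the sketch; without it the fan would toggle on every small temperature change.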
 

se7enX89X

Reputable
Feb 24, 2019
17
0
4,510
Been going through the same thing, only it never happens when playing games. It only happens when the GPU is under light load, like browsing the internet or watching videos on YouTube. I have also tried many potential fixes with no luck.
 
Feb 23, 2022
14
0
10
At this point I am having the same issue. No problems playing games for several hours, but when I'm on the desktop it just restarts with the same error as before. I haven't been able to find a culprit yet.