**SOLVED** - VIDEO_TDR_FAILURE error 116 - Please help, I'm at my wits' end

Page 2 - Tom's Hardware community forum
Status
Not open for further replies.

Deleted member 2673316

Guest
Hopefully this is the right place for this. It's my first ever post on a forum, so please forgive me if I have made a mistake. I'm at my wits' end with an issue that has been giving me grief for months now.

I have been plagued with BSODs (the screen doesn't actually go blue, but I can see in the dump files that there's a bug check 116, VIDEO_TDR_FAILURE) and display driver issues (sometimes Windows is able to recover and shows a message along the lines of "display driver stopped responding and has recovered").

The dump files and my system info can be downloaded here: dump files and system info

Unfortunately, I can't find a way to trigger this, which makes troubleshooting very difficult. It seems to happen at random, and it doesn't matter whether the computer is under load (playing games, editing video, etc.) or just sitting idle or browsing the internet. Because it's random and I can't reproduce the driver crash or bluescreen, this has dragged on for months: I can essentially only change one thing at a time and then wait for the next crash/bluescreen to see whether it's fixed. That can sometimes take over a week (though it normally happens at least once a day), and obviously I haven't fixed the issue yet, hence this post.

Over the last few months I have done an absolute crap tonne of reading on forums and have tried everything I've read about, plus everything I could think of myself, to no avail. I'm starting to think it's hardware related, but all the hardware I have been able to swap out and test has been fine, so I have no idea. Below is a list of things I have tried to fix the issue:

* Fresh install of Windows (tried Windows 10 64-bit and Windows 7 64-bit, which is the current install)
* Reinstalled several different driver versions (using DDU for a clean install)
* Freed up as many resources as possible by only running essential programs
* Tore down the build, cleaned out the dust, replaced the thermal paste, etc. (also did a quick visual check of cables and components)
* Removed extra hardware not needed to run Windows
* Re-seated the PCI card and tried it in different slots
* Re-seated the RAM modules and tried running with only one DIMM installed (tried each of the 4 DIMMs on its own)
* Swapped out the GPU for another proven GPU (got the same issue)
* BIOS is already at the current version, but I reset everything to defaults. I never overclocked intentionally, but I did notice that at some point I had inadvertently selected the "air cooling OC profile" (a default Asus OC profile in the BIOS that I must have clicked by mistake)
* Checked the crash dumps in BlueScreenView, but it's over my head. The offending files/drivers across the 11 dumps I still have are dxgkrnl.sys, dxgmms1.sys and nvlddmkm.sys. I have obviously googled this information but haven't been able to fix the issue with what I found
* Ran as many diagnostics as I could think of, including:
a) Checked temperatures and voltages with several different tools (all in the normal range)
b) FurMark (ran for over an hour with no crash or overheating)
c) Memory check with Windows' built-in memory test and another tool I can't remember (passed with no errors)
d) Checked the CPU with the Intel Processor Diagnostic Tool (passed)
* Probably a bunch of other stuff I'm not remembering off the top of my head

If anyone has taken the time to read through this, I really appreciate it. I apologise for the long post, but I wanted to give a comprehensive overview of what I have already done to save time.

If anyone has any feedback, knows exactly what the issue is, or can give me some more ideas to try, please let me know. I'm happy to provide any information requested as well. Thanks again for your time.
 

Deleted member 2673316

Guest


Yeah mate, I'm planning on doing it for sure. Thanks.

Thoughts on this?

I have rolled back the driver to several different versions and did a clean install each time using DDU. It has crashed on every one of them. I just noticed something above, though: in the dump text, every dump reports "NVIDIA Windows Kernel Mode Driver, Version 417.01". Version 417.01 on every crash, even though those crashes happened with different driver versions installed? Does that seem right to you?
 
1: Did you use DDU in safe mode when uninstalling the drivers?
2: It's also safe to manually delete the NVIDIA folder located at C:\NVIDIA

Here's what Microsoft has to say regarding VIDEO_TDR_FAILURE
(the power supply and memory are listed among the possible causes):
https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x116---video-tdr-error

I don't want to confuse you, so I would follow what everyone else has been saying here first (too many cooks). Keep this link for after you've tried the other solutions.
 
DDU recommends uninstalling the drivers in safe mode to get a proper clean removal. If you've tried several different NVIDIA driver versions and the dumps still report the same driver version, you may not be getting a proper clean uninstall; there may be remnants of the old driver left behind.
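If you want to double-check which version each dump actually reports, here's a rough Python sketch (my own, not part of DDU or any NVIDIA or Microsoft tool). It assumes you've saved the text output of each dump analysis as a separate .txt file in one folder; the folder name `dumps` and the function names are made up for illustration, and the pattern only matches the driver line in the exact form quoted in this thread. If dumps taken after clean installs of different driver versions all still report the same version, that would be consistent with leftovers from an old driver.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Matches the driver line as it was quoted from the dump text in this thread.
# Other analysis tools may word or format this line differently (assumption).
VERSION_RE = re.compile(r"NVIDIA Windows Kernel Mode Driver, Version (\d+\.\d+)")


def driver_versions(text: str) -> list[str]:
    """Return every NVIDIA driver version string found in one dump's text."""
    return VERSION_RE.findall(text)


def tally_texts(texts) -> Counter:
    """Count how many dump texts mention each version (once per dump)."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(set(driver_versions(text)))
    return counts


def tally_folder(folder: str) -> Counter:
    """Read every .txt file in a folder and tally the versions they report."""
    texts = (
        p.read_text(encoding="utf-8", errors="replace")
        for p in sorted(Path(folder).glob("*.txt"))
    )
    return tally_texts(texts)


if __name__ == "__main__":
    # "dumps" is a hypothetical default folder name; pass your own as an argument.
    folder = sys.argv[1] if len(sys.argv) > 1 else "dumps"
    for version, n in tally_folder(folder).most_common():
        print(f"{version}: found in {n} dump(s)")
```

Each dump counts a version only once, so a single dump that repeats the line doesn't skew the tally.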
 

Deleted member 2673316

Guest


Yes sir, every time I changed the driver version I did it with DDU in safe mode (could you please take a look at everything in bold in the OP? I have tried a great many things). I also made sure Windows Update was turned off so that it didn't automatically install anything.

Also, I just fully stripped the computer again (every single part) and reseated everything. I'm going to do a fresh install of Windows in the next few hours (hopefully) and see if that fixes it. If not, I think I'll just take a chance and buy a new power supply, unless anyone knows how I can test the one I have with software? The only other option is to test with another physical power supply, but the only spare I have isn't capable of running a GTX TITAN.
 

Deleted member 2673316

Guest


Just found this on Reddit.

Read this if you have Corsair iCUE software installed

I think removing iCUE is going to fix my issue. I will report back in a couple of days. Thanks for the help.
 

Deleted member 2673316

Guest
It's been a couple of days now, and it looks like the iCUE software was the culprit. It apparently only affects a small number of people with specific hardware. The most consistent factor I found was that most people in the links below had a motherboard with an X79 chipset.

Helpful links:

Reddit post
Corsair forum

Thanks everyone for your help.
 