[SOLVED] Computer hard-crashes randomly during games

Status
Not open for further replies.

Dambroda

Prominent
Aug 8, 2021
5
0
510
BUILD

OS: Windows 10

CPU: i9-11900K

MOBO: GIGABYTE Z490 VISION G LGA 1200 Intel Z490 ATX

Memory: G.Skill Ripjaws V 32GB (16x2), DDR4-3600

Storage: Samsung 970 EVO Plus 2TB M.2-2280 NVME

Video Card: Gigabyte GeForce RTX 3080 Ti

PSU: SeaSonic FOCUS Plus Gold 850W

-----


I built this computer a couple of months ago and have been plagued by this issue from the beginning. I've put a lot of effort into troubleshooting, but I'm starting to run out of ideas!

General description of the problem: Seemingly at random while I'm playing games, my whole computer will shut down and then automatically restart. Sometimes it's five minutes after launching a game; sometimes I can play games for 4 days and it's fine. Sometimes it's during resource-intensive games, sometimes it's during less demanding ones. I have not experienced any crashes while not playing a game.

Long story short, I'm pretty confident it's related to the graphics card, but I'm not sure if there are other steps I should take to try and resolve the issue or if I should just look into returning it.

Datapoints:

  • Crashes occurred even after disabling XMP for RAM
  • Crashes occurred even after swapping out PSU
  • Crashes were reported as LiveKernelEvent 117, 141, and 144; after restart, Windows would report BlueScreen 116 in the Reliability Monitor, but I never saw any blue screens.
  • Crashes occurred even after uninstalling graphics card drivers and installing older versions with DDU
  • I couldn't "force" a crash with benchmarks or simulators (I didn't try for very long, though)
  • After one crash, the system would not turn on at all until I took out the graphics card. This led me to believe there was something in it that might have shorted, and I sent in the card for repair finally "confident" that I had found the root of the problem.
  • Crashes did NOT occur when swapping out to my GTX 1080 while the 3080 was getting repaired. I had it in for a month.
  • I got the card back a few days ago (RMA didn't say what was "repaired", of course), installed it, and crashed the same day. The crash was slightly different (rather than instantly shutting down, the PC froze for 5 seconds first and I could hear sound still), so I updated graphics drivers again hoping that maybe it was a different issue. Unfortunately, yesterday it crashed during a game in a way identical to the original issues.

I installed various logging tools during my troubleshooting. Here's an example of a crash as logged by GPU-Z (one sample per second):

| Date | GPU Clock [MHz] | Memory Clock [MHz] | GPU Temperature [°C] | Hot Spot [°C] | Memory Temperature [°C] | Fan 1 Speed (%) [%] | Fan 1 Speed (RPM) [RPM] | Fan 2 Speed (%) [%] | Fan 2 Speed (RPM) [RPM] | Memory Used [MB] | GPU Load [%] | Memory Controller Load [%] | Board Power Draw [W] | GPU Chip Power Draw [W] | PWR_SRC Power Draw [W] | PWR_SRC Voltage [V] | 8-Pin #1 Power [W] | 8-Pin #2 Power [W] | Power Consumption (%) [% TDP] | PerfCap Reason [] | CPU Temperature [°C] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8/7/2021 19:34 | 2010 | 1187.7 | 59 | 68.1 | 66 | 69 | 1607 | 68 | 1676 | 3964 | 47 | 16 | 233.6 | 109 | 82.2 | 12.2 | 108.7 | 87 | 63.1 | 4 | 63 |
| 8/7/2021 19:34 | 2010 | 1187.7 | 58.7 | 68 | 64 | 69 | 1604 | 68 | 1676 | 3965 | 44 | 13 | 228.3 | 107.1 | 80.9 | 12.2 | 106.5 | 84.9 | 61.7 | 4 | 63 |
| 8/7/2021 19:34 | 2010 | 1187.7 | 58.6 | 68.1 | 64 | 69 | 1602 | 68 | 1677 | 3959 | 41 | 12 | 216.5 | 99.3 | 79.3 | 12.2 | 101.7 | 80.3 | 58.5 | 4 | 62 |
| 8/7/2021 19:34 | 2010 | 1187.7 | 58.7 | 68.2 | 64 | 69 | 1604 | 68 | 1676 | 3966 | 42 | 13 | 224.9 | 103.6 | 81.1 | 12.2 | 105.2 | 83.5 | 60.8 | 4 | 64 |
| 8/7/2021 19:34 | 2010 | 1187.7 | 58.5 | 67.8 | 64 | 69 | 1602 | 68 | 1676 | 3963 | 43 | 13 | 224.7 | 103.9 | 80.5 | 12.2 | 105 | 83.4 | 60.7 | 4 | 62 |
| 8/7/2021 19:34 | 2010 | 1187.7 | 58.4 | 67.8 | 64 | 69 | 1602 | 68 | 1679 | 3972 | 42 | 12 | 218.7 | 99.4 | 80.3 | 12.2 | 102.5 | 80.9 | 59.1 | 4 | 62 |
| 8/7/2021 19:34 | 2010 | 1187.7 | 58.6 | 68.1 | 64 | 69 | 1602 | 68 | 1674 | 3966 | 42 | 13 | 219.9 | 100.4 | 80.6 | 12.2 | 102.9 | 81.7 | 59.4 | 4 | 62 |
| 8/7/2021 19:34 | 2010 | 1187.7 | 59 | 68.1 | 66 | 69 | 1602 | 68 | 1674 | 3975 | 46 | 14 | 230.7 | 107.6 | 81.9 | 12.2 | 107.3 | 86.1 | 62.4 | 4 | 63 |
| 8/7/2021 19:34 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 0 | 0 | 63 |


This is another datapoint suggesting that the graphics card fails before the rest of the system goes down. As far as I can interpret it, it also rules out high usage or high temperatures as the culprit.
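For anyone wanting to scan a longer log for the same pattern, here is a minimal Python sketch. It assumes the GPU-Z sensor log is a comma-separated text file with a header row and space-padded values; the function name `find_sensor_dropouts` and the shortened sample are made up for illustration and are not part of GPU-Z.

```python
import csv
import io


def find_sensor_dropouts(log_text, gpu_column="GPU Clock [MHz]"):
    """Return (row_index, date) for each row where the GPU sensor
    reads '-' right after a row where it still reported a value."""
    reader = csv.reader(io.StringIO(log_text))
    header = [name.strip() for name in next(reader)]
    gpu_col = header.index(gpu_column)
    date_col = header.index("Date")

    dropouts = []
    previous_ok = False
    for index, row in enumerate(reader):
        cells = [cell.strip() for cell in row]
        if len(cells) <= max(gpu_col, date_col):
            continue  # skip blank or truncated lines
        if cells[gpu_col] == "-":
            if previous_ok:
                dropouts.append((index, cells[date_col]))
            previous_ok = False
        else:
            previous_ok = True
    return dropouts


# Shortened, hypothetical sample with only a few of the logged columns.
sample = """Date , GPU Clock [MHz] , GPU Temperature [°C] , CPU Temperature [°C]
8/7/2021 19:34 , 2010 , 58.6 , 62
8/7/2021 19:34 , 2010 , 59 , 63
8/7/2021 19:34 , - , - , 63
"""

print(find_sensor_dropouts(sample))
```

A row flagged this way means the GPU stopped answering sensor queries while the logger (and therefore the rest of the system) was still running, which is the pattern visible in the last row of the table.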


Here's an !analyze on one of the minidumps (if you feel like you could glean more from further investigation of the minidump, I can upload it for you)

Code:
VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: ffff8a0f8a4bf010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff80368cf6bd8, The pointer into responsible device driver module (e.g. owner tag).
Arg3: ffffffffc000009a, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000004, Optional internal context dependent data.

Debugging Details:
------------------

Unable to load image \SystemRoot\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_5d5c294bb8d17217\nvlddmkm.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for nvlddmkm.sys
*** WARNING: Unable to verify timestamp for win32k.sys

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.Sec
    Value: 2

    Key  : Analysis.DebugAnalysisProvider.CPP
    Value: Create: 8007007e on DESKTOP

    Key  : Analysis.DebugData
    Value: CreateObject

    Key  : Analysis.DebugModel
    Value: CreateObject

    Key  : Analysis.Elapsed.Sec
    Value: 9

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 86

    Key  : Analysis.System
    Value: CreateObject


BUGCHECK_CODE:  116

BUGCHECK_P1: ffff8a0f8a4bf010

BUGCHECK_P2: fffff80368cf6bd8

BUGCHECK_P3: ffffffffc000009a

BUGCHECK_P4: 4

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)

BLACKBOXWINLOGON: 1

CUSTOMER_CRASH_COUNT:  1

PROCESS_NAME:  System

SYMBOL_NAME:  nvlddmkm+dc6bd8

MODULE_NAME: nvlddmkm

IMAGE_NAME:  nvlddmkm.sys

STACK_COMMAND:  .thread ; .cxr ; kb

FAILURE_BUCKET_ID:  0x116_IMAGE_nvlddmkm.sys

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10




Are there ANY variables / compatibility issues that I could be missing besides the card being faulty? The fact that my 1080 works fine could be the nail in the coffin, but I am obviously very reluctant to return the card in this market if there are any other steps I could be taking...
 
Lutfij

Titan
Moderator
Welcome to the forums, newcomer!

The culprit might be your GPU driver. Also, you should check and see if you can update your OS to version 21H1. You should also check and see what your BIOS version is at the time of writing. As for your GPU drivers, use DDU to uninstall your GPU drivers, reboot, and with the latest version sourced from Nvidia, manually reinstall the driver for your GPU with elevated privileges, i.e. right-click the installer > Run as administrator. I would advise against older driver revisions since the GPU had issues with launch-day drivers.

On another note, do you know of any friend or neighbor who owns a system with a reliably built PSU that has 850W of power for the entire system? Yes, Seasonic units are in the top tier, but it's a good idea to rule out a faulty PSU, or one that can't deliver 750W to the entire system when your system/GPU is taxed. To an extent it might rule out a faulty GPU and a faulty PSU on your end.
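
To put rough numbers on the power question, here is a short Python sketch of the arithmetic using four rows from the GPU-Z log posted earlier in the thread (board power in watts and power consumption in % TDP). It only shows what the one-second log recorded; spikes shorter than the logging interval would not appear in it, and the rest of the system's draw is not measured at all. The function name is hypothetical.

```python
def implied_power_limit(board_power_w, percent_tdp):
    """Board power divided by the reported fraction of TDP gives the
    power level that the card treats as 100%."""
    return board_power_w / (percent_tdp / 100.0)


# (Board Power Draw [W], Power Consumption [% TDP]) from the posted log.
samples = [(233.6, 63.1), (228.3, 61.7), (216.5, 58.5), (230.7, 62.4)]

limits = [implied_power_limit(watts, pct) for watts, pct in samples]
average_limit_w = sum(limits) / len(limits)

psu_rating_w = 850  # rated output of the PSU listed in the build
peak_logged_w = max(watts for watts, _ in samples)
gpu_share = peak_logged_w / psu_rating_w

print(f"implied 100% power level: about {average_limit_w:.0f} W")
print(f"peak logged board power: {peak_logged_w} W "
      f"({gpu_share:.0%} of the PSU rating)")
```

On these logged samples the card was running well below its own power limit in the seconds before the crash, so sustained overload is not what the log shows. It cannot say anything about sub-second transients.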
 
Solution

Dambroda

Lutfij said: "Welcome to the forums, newcomer!"
Thank you!

Lutfij said: "Also, you should check and see if you can update your OS to version 21H1."
I will perform that update now!

Lutfij said: "You should also check and see what your BIOS version is at the time of writing."
I had to update the BIOS for it to support my CPU, so I can say for certain it is on F20. That appears to still be the latest.

Lutfij said: "As for your GPU drivers, use DDU to uninstall your GPU drivers, reboot, and with the latest version sourced from Nvidia, manually reinstall the driver for your GPU with elevated privileges, i.e. right-click the installer > Run as administrator. I would advise against older driver revisions since the GPU had issues with launch-day drivers."
I went through this process once before when I was first experiencing the crash, but there has been a new driver version since then, so I will do the full clean reinstall again.

Lutfij said: "On another note, do you know of any friend or neighbor who owns a system with a reliably built PSU that has 850W of power for the entire system? Yes, Seasonic units are in the top tier, but it's a good idea to rule out a faulty PSU, or one that can't deliver 750W to the entire system when your system/GPU is taxed. To an extent it might rule out a faulty GPU and a faulty PSU on your end."
Unfortunately no, but my original thought was that it was the PSU, and I actually returned it for a replacement. So I have used two different 850W PSUs and encountered the issue with both. They are both the same model though, to be fair.


Thank you for the advice. I will perform those two updates (21H1, Nvidia clean reinstall) and follow up if I experience another crash (or if I think it has resolved the issue). I still welcome any further steps people might suggest or requests for more info!
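
One way to confirm that both updates actually took effect is sketched below in Python. It assumes the Windows version string is copied from `winver` or the `ver` command, and that `nvidia-smi` (installed alongside the Nvidia driver) is on the PATH. The helper names are hypothetical, and the build table only covers the Windows 10 releases relevant here.

```python
import subprocess

# Windows 10 build numbers for the releases relevant to this thread.
WIN10_RELEASES = {19041: "2004", 19042: "20H2", 19043: "21H1"}


def windows_release(version_string):
    """Map a version string such as '10.0.19043.1165' to a release name."""
    parts = version_string.strip().split(".")
    try:
        build = int(parts[2])
    except (IndexError, ValueError):
        return None  # not a recognizable version string
    return WIN10_RELEASES.get(build, f"unknown (build {build})")


def nvidia_driver_version():
    """Ask nvidia-smi for the installed driver version.

    Returns None when nvidia-smi is missing or fails.
    """
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


print("Release:", windows_release("10.0.19043.1165"))  # example input
print("Driver:", nvidia_driver_version())
```

The driver version printed here can be compared against the version listed on Nvidia's download page after the DDU reinstall.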
 

Dambroda

Apologies for the late update; unfortunately, the system crashed again within a couple of days of those suggested steps. The old graphics card has been back in for a while now without any issue. I'm at a loss, so I guess I'll be returning the card... but I was lucky enough to grab another one, so I'll be able to truly confirm whether the card itself was the issue or whether it's some other compatibility problem with both of the PSUs I've tried, or maybe even my mobo.
 

Dambroda

New card arrives in a few days, and then I'll have a few days of testing before I can confidently say whether it's something else about my build or my original card was defective. I'll update when I know for sure.

I crashed with the new card today -- there is something else about the system causing the problem. My next best ideas are to try a different mobo and/or a third PSU.
 