Question: GPU upgrade causes reboots on my Dell Precision T3610?

Cyber_Akuma

Distinguished
Oct 5, 2002
451
12
18,785
I have a Dell Precision T3610 Desktop Workstation. I would like to clarify that this PC uses a proprietary PSU/Motherboard/Case so I can't really replace any of those with standard parts.

My PC came with a 685 Watt PSU, an Intel Xeon E5-1620v2, an Nvidia Quadro K4000, and a 15000RPM SAS drive. I replaced the CPU with a Xeon E5-2667v2, upgraded the RAM to 8x16GB, and replaced the SAS HDD with two SSDs and a 7200RPM HDD.

So far the system was working perfectly fine. But then I tried to upgrade the GPU. I got an HP OEM style RTX 2060 Super, hoping that the lower-profile and lower-power GPU would both be easier to fit in my case and not demand too much power. This kind of card:

View: https://i.imgur.com/WrCo4JJ.jpg


I thoroughly tested the card in another machine for about two weeks and had no issues.

However, one problem is that the PSU only has a single proprietary 8-pin connector for GPU power, which by default gets split into two 6-pins, while the RTX 2060 Super I got has a single 8-pin. I wanted to just get an 8-to-8 pin cable, but I could not find one; all of them were 8-pin to 2 x 8-pin, which I do not trust. So I just got a cable that converts my two 6-pins back to a single 8-pin, this one:

https://www.amazon.com/dp/B07V4GGS43

Here it is installed:
View: https://i.imgur.com/mWEvDnX.jpg


One oddity that I noticed was that from the stock OEM cable that connects to my PSU, only 6 of the 8 pins appear to be populated from the part that connects to the PSU: View: https://i.imgur.com/IRwaHXR.jpg


The whole setup looks like this:
View: https://i.imgur.com/n64XbRu.jpg


So after I installed the card I ran more tests to make sure my system can handle it. I tried Furmark's stress test, and it ran fine for about 5 minutes. According to my UPS my system was pulling around 300-360 watts.

Then I closed that and tried Prime95, again my system was pulling in the mid-300s according to my UPS.

Then I ran both... to my surprise it seemed to be pulling the same amount of power, a few spikes to 400 watts but that's it.

I walked away for a minute, and when I came back the system had rebooted.

No doubt some kind of overcurrent protection had kicked in while I was gone, and I am looking for advice on what to do. I would have assumed that 685 watts would be enough for all this, and I have no idea if the PSU is at fault. If so, there are 800 and 1300 watt PSUs, but I am not sure if they would work for my system. I see a few sites listing them for the Precision T3600 series and up... but most sites list them for the T5000 series and up. I have no idea if the 800+ watt PSUs are compatible with my system, and I don't want to risk putting in an incompatible PSU... not like I can use a standard PSU.
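As a rough sanity check on that assumption, here is a back-of-the-envelope sketch in Python. The connector limits are the PCIe spec values (75 W from the slot, 75 W per 6-pin, 150 W per 8-pin). The 175 W GPU figure is the reference RTX 2060 Super board power (the HP OEM card may differ), 130 W is the Xeon E5-2667v2 TDP, and the allowance for everything else is an assumed placeholder, not a measurement.

```python
# Rough power-budget check. All figures are nominal or assumed, not measured.
PCIE_SLOT_W = 75    # PCIe x16 slot limit per spec
SIX_PIN_W = 75      # 6-pin PCIe connector limit per spec
EIGHT_PIN_W = 150   # 8-pin PCIe connector limit per spec


def gpu_supply_limit(six_pins=0, eight_pins=0):
    """Maximum in-spec power available to a card from the slot plus its cables."""
    return PCIE_SLOT_W + six_pins * SIX_PIN_W + eight_pins * EIGHT_PIN_W


def headroom(psu_w, loads_w):
    """PSU rating minus the sum of the estimated DC loads."""
    return psu_w - sum(loads_w.values())


PSU_W = 685  # rating quoted at the top of this post

loads = {
    "gpu": 175,    # reference RTX 2060 Super board power; OEM card may differ
    "cpu": 130,    # Xeon E5-2667v2 TDP
    "other": 100,  # ASSUMPTION: RAM, drives, fans, motherboard
}

print("One 8-pin, in spec:", gpu_supply_limit(eight_pins=1), "W")
print("Two 6-pins, in spec:", gpu_supply_limit(six_pins=2), "W")
print("Estimated headroom:", headroom(PSU_W, loads), "W")
```

On paper the total is well inside the PSU rating, which is consistent with the UPS readings above. Note that a UPS reports AC draw at the wall, which includes PSU conversion losses, so the DC load is somewhat lower than the figure it shows. A total that fits on paper says nothing about how the load is distributed across individual rails or cables.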

Or I wondered if it could be the 2x6 to 1x8 pin adapter. As I mentioned, I found it weird that only 6 pins are populated on the OEM cable of all things, and that my system was not pulling more than 400 watts with the CPU and GPU both stressed when it was pulling in the mid-to-high 300s with each part stressed separately. So I don't know whether the OEM cable or the 6 to 8 pin adapter I got might not be supplying enough power.

And if that is the case, I wonder whether a 1x8 to 2x8 adapter might be a better idea, such as this one:
https://www.amazon.com/dp/B07P82ZH22

I do not trust that kind of cable because it is splitting a single 8 pin to two, something that seems pretty risky and dangerous, but I also would only be using it to power a single 8-pin and it APPEARS to populate all the pins on the PSU side (though it seems like it just re-routes the ground pins on the GPU-end to other pins?)

Or this could be something else entirely, or I am just pushing my system too hard with all the modifications I made: upgrading the CPU, maxing out the RAM, putting in three drives in place of the one, which required a SATA splitter (though that is replacing one 15000RPM SAS drive with a 7200RPM drive and two SSDs on a system designed to handle up to two of those SAS drives, so I would assume the SATA power is not being overloaded), and now upgrading the GPU.

Any advice on what could be the issue and how to try to solve it?
 

Cyber_Akuma

I posted about this before but wasn't able to get help, I have been trying everything I can think of for the last week and I am still having the random crashes.

So the story is that I currently have two PCs I am using: an 11700K-based system with an 850 watt PSU that I built myself, and a Dell Precision T3610 with a 685 watt PSU. The Dell was supposed to be a backup system, but the 11700K system, while functional, is only half-built and it will be a while until I can finish it, so I am mostly using the "backup" Dell system as my main for now and remoting into the 11700K when I need it.

The Dell originally had a Xeon E5-1620v2 and 16GB of RAM, but has been upgraded to a Xeon E5-2667v2 and an 8x16GB RAM configuration. After the RAM upgrade I very rarely got an error in OCCT, and only when I tested all 128GB. It did not happen within the amount of memory I generally use, no other RAM test ever showed an error, and it eventually seemed to go away.

Since I will be stuck on this Dell for longer than I expected, I decided to also get a GPU upgrade so I can at least game on it a little as the Quadro K4000 it came with was rather useless for that. I got an HP OEM 2060 Super, these are the specs:

View: https://i.imgur.com/WrCo4JJ.jpg


View: https://i.imgur.com/K8xN64m.gif



Since I wanted to make sure this card worked on its own, I installed it in my 11700K system first and ran every benchmark, stress test, and any other test I could think of on it. I ran a neural network for a short time too and played several games over the course of a week or two. No issues whatsoever; it even performed exactly as expected for this OEM model card according to the benchmarks.

So then I installed it in the Dell (yes, I used DDU in safe mode first). I had to use a 2x6 pin to 1x8 pin cable to power it, but many others have used this exact same model desktop to do something similar. It looked like this when installed:

View: https://i.imgur.com/IRwaHXR.jpg


View: https://i.imgur.com/mWEvDnX.jpg


View: https://i.imgur.com/n64XbRu.jpg


The side cover was left off so I could monitor the wiring to make sure nothing is getting too warm or showing obvious faults, the cover is still off right now, not sure if this makes a difference.

I wanted to make sure the PSU could handle it so I ran Furmark's stress test and saw no issues. Then I ran Prime95 and no issues. Then I tried both at the same time and walked away for a minute, when I came back the system had rebooted.

Event log showed nothing other than an unexpected reboot, and Prime95+Furmark ran again together for 20 minutes with no issues, so I figured it was a one-off and proceeded to do the same tests on the Dell. All of them again passed (Although OCCT at first was claiming the CPU benchmark was crashing, but nothing in the event logs) and some things that I thought were issues such as mentions of nvlddmkm crashing in the event log turned out to just be red herrings from a demo I was using to test that is a known issue for many. One issue that did crop up however is that the GPU ran much hotter, not a surprise as the Dell's case is much smaller and has much lower airflow. Card was around 70C-80C in the 11700K, but it was constantly at 80C and many times hit 85C in the Dell.
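One way to compare thermals between the two machines more systematically is to log temperature and power draw with `nvidia-smi` and flag the hot samples. The sketch below parses the CSV rows produced by `nvidia-smi --query-gpu=timestamp,temperature.gpu,power.draw --format=csv,noheader,nounits`. The 85C default threshold is simply the peak mentioned above, not a vendor limit, and the demo rows are made up for illustration.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=timestamp,temperature.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def parse_line(line):
    """Parse one CSV row into (timestamp, temp_c, power_w or None)."""
    ts, temp, power = [field.strip() for field in line.split(",")]
    try:
        power_w = float(power)
    except ValueError:
        # Some cards report "[N/A]" instead of a power reading.
        power_w = None
    return ts, int(temp), power_w


def hot_samples(lines, limit_c=85):
    """Return the parsed samples whose temperature is at or above limit_c."""
    samples = [parse_line(line) for line in lines if line.strip()]
    return [s for s in samples if s[1] >= limit_c]


def sample_once():
    """Take one live reading; needs an NVIDIA driver and nvidia-smi on PATH."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return [parse_line(line) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Made-up rows in the format nvidia-smi emits, for illustration only.
    demo = [
        "2023/03/01 12:00:00.123, 81, 172.35",
        "2023/03/01 12:00:01.123, 86, [N/A]",
    ]
    for sample in hot_samples(demo):
        print(sample)
```

Running `sample_once()` in a loop while gaming, and writing the rows to a file, would leave a record of the last readings before a crash.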

I thought I was done, but then two days later, as I was using the PC normally and not even doing anything demanding on it, my monitor went blank claiming there was no signal. It looked like it was constantly about to get the signal back and then losing it again. I tried soft-rebooting/shutting down by pressing the power button, Win+X then U U, and mashing Ctrl+Alt+Del but nothing happened, so I forced a hard power off.

Event Viewer said nvlddmkm had crashed three times, and then it was flooded with warnings about the display driver having "successfully" recovered.

I tried OCCT's benchmark again as that was the only thing that really claimed to have crashed more than once in the same test. I ran every CPU benchmark separately several times, and then the whole suite several times... no problems.

I then tried the CPU stress test.... never heard my fans go that hard before, but it seemed to be working... from when I could focus on my screen throughout the fan noise at least. After about 20 minutes however the same no signal issue happened.

At this point I was just confused about what the issue could possibly be. First thing that instantly comes to mind is the GPU, and I am still barely within the time window to claim it's defective on eBay if it is the GPU, but it never gave me any problems no matter how hard I pushed it in the 11700K system. I also considered maybe it's the power, but 685 watts should be enough unless this cabling is wrong (I was a little confused why the 8-pin power that connects to the PSU side only had 6 wires populated) or the PSU is just too old. Also wondered if it's heat (and if having the side panel off contributed to this) since the GPU is running much hotter and when I stressed the CPU I never heard my fans run that hard before. Or it could be a faulty CPU or even RAM after all, but if it's the CPU or RAM that's faulty and crashing the system why am I getting an error that it's the GPU driver that crashed?

After several days of trying to get help with no luck, someone suggested maybe it's my drivers, so I updated to the latest version (531.18, released 2/28/2023) and spent an entire day running every OCCT stress test for an hour. Every CPU test, the RAM test, every GPU test, the GPU VRAM test, even the Power test that rocketed my power usage to 450W+. I was also using my system for about 20-60 minutes between every test.

No problems whatsoever.

I started to assume those must have been a random occurrence and that my system must finally be working fine now and started playing a game.... 10 minutes into the game the system crashed in the exact same way again. Image freezes for about 2 seconds, then I get the "no signal" message on my monitor. I could still hear the game's music in the background but it didn't appear to actually still be running since I could not hear anything else actually happening in the game.

I have no idea where to even begin considering what could be the fault, and how to try to fix it. Especially in a way that won't leave my system down for days or even weeks while I try to hunt down replacement parts. At the same time since I use this system 24/7 I don't want it just randomly crashing on me and forcing a shutdown, especially since that can cause data corruption.

Starting to lose my sanity here. Every stress test I throw at it shows no issues, but then when I just normally use my system I get that crash.
 
The problem with Dell PSUs is that they are multi-rail. When using a 2x6 pin to 8 pin adapter to power the GPU, you have no way of forcing equal power draw from each line, so in the worst case the GPU will try to draw (nearly) all of its power from a single 6-pin, possibly causing a voltage drop and, in consequence, a driver failure.
The other problem is running a single-fan GPU in a hot case. This could just as well cause those crashes if the temperature exceeds safe limits, even temporarily and locally.
Unfortunately you are trying to run a gaming GPU in a system that is ill-suited for it.
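To put rough numbers on the uneven-draw scenario, here is a minimal sketch assuming a nominal 12 V supply and the PCIe spec limit of 75 W per 6-pin connector. How much the card actually pulls through the cable versus the slot is not known here, so `CABLE_W` is a pessimistic assumption (the whole 175 W reference board power of an RTX 2060 Super over the cable), and the actual per-rail limits of the Dell PSU are not modeled at all.

```python
NOMINAL_V = 12.0        # nominal rail voltage
SIX_PIN_SPEC_W = 75.0   # PCIe spec limit for one 6-pin connector


def lead_loads(cable_w, share_on_first):
    """Return [(watts, amps), (watts, amps)] for the two 6-pin leads."""
    if not 0.0 <= share_on_first <= 1.0:
        raise ValueError("share_on_first must be between 0 and 1")
    watts = (cable_w * share_on_first, cable_w * (1.0 - share_on_first))
    return [(w, w / NOMINAL_V) for w in watts]


def describe(cable_w, share_on_first):
    """One-line summary of both leads, flagging any lead over the 6-pin spec."""
    parts = []
    for w, a in lead_loads(cable_w, share_on_first):
        flag = "over 6-pin spec" if w > SIX_PIN_SPEC_W else "within spec"
        parts.append(f"{w:.0f} W / {a:.1f} A ({flag})")
    return " | ".join(parts)


# ASSUMPTION: pessimistic case, nothing supplied by the slot.
CABLE_W = 175.0

for share in (0.5, 0.75, 1.0):
    print(f"{share:.0%} on first lead: {describe(CABLE_W, share)}")
```

With an even split both leads stay close to the 6-pin limit, while a fully lopsided split puts one lead at more than double it. Whether that is enough to trip protection or sag the voltage depends on the PSU's real rail limits, which would have to come from its label.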
 

Cyber_Akuma

What about a cable like this?

https://www.amazon.com/COMeap-Power-Adapter-Cable-13-inch/dp/B07HCYDK5K

I didn't get one of these because I don't trust them, since they split a single 8-pin into two, which sounds like a dangerous thing to do, but I will only be using one of the plugs anyway.

The cable that plugs into my PSU only has 6 pins populated on the 8 pin plug anyway.
 

DSzymborski

Titan
Moderator
Yeah, I think the problem is PSU-related.

If you're determined to modernize this PC, I'd honestly try to find an inexpensive used Sandy Bridge/Ivy Bridge motherboard and an inexpensive airflow-oriented case. Prebuilts simply are poor upgrade candidates because you run into myriad issues like this, and it's better to just cut out the cancer.
 

Cyber_Akuma

Well great, it JUST happened again, and this time the RTX2060 isn't even in my system! About a week ago I swapped it out temporarily for a GT720 until I can figure this out. I thought for sure it was the RTX2060, since from the day I installed it my system would crash within roughly 24 hours; at most I got it running for 48, and many times it wasn't even up for 24. Ever since I swapped to the GT720 it had been running fine, so I was trying to look into whether it was the card, cooling, or power. But just now the same thing happened with the GT720 after being fine for a week. I was going between Chrome windows when suddenly the window I swapped to was completely black, then my entire display became garbled and the video signal was lost. Trying to reset the video driver with Win+Ctrl+Shift+B did nothing, and mashing Ctrl+Alt+Del did nothing. The only difference was that the second I pressed the power button it hard shut down instead of making me hold it down for 10 seconds.

I heard someone mention that I might have to disable SERR messages or VT-x in my BIOS? Has anyone ever heard of having to do that to fix something like this? I use VirtualBox on my system as well; would disabling VT-x impact that?

The event log didn't show much, just made a mention of "The computer has rebooted from a bugcheck" and generated a minidump.

The minidump seems to imply that it's somehow STILL the Nvidia driver.... despite having completely wiped my drivers and re-installed a much older driver that was the last one that supported the GT720:

Microsoft (R) Windows Debugger Version 10.0.22621.755 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Windows\Minidump\031023-11156-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: srv*
Executable search path is:
Windows 10 Kernel Version 19041 MP (16 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Machine Name:
Kernel base = 0xfffff80550200000 PsLoadedModuleList = 0xfffff80550e2a210
Debug session time: Fri Mar 10 13:40:58.599 2023 (UTC - 6:00)
System Uptime: 4 days 20:46:45.393
Loading Kernel Symbols
...............................................................
................................................................
................................................................
....
Loading User Symbols
Loading unloaded module list
.................................
For analysis of this file, run !analyze -v
15: kd> !analyze -v
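*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************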

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: ffffc206961ea010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff80571da5838, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 000000000000000d, Optional internal context dependent data.

Debugging Details:
------------------

Unable to load image nvlddmkm.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for nvlddmkm.sys

KEY_VALUES_STRING: 1

Key : Analysis.CPU.mSec
Value: 2890

Key : Analysis.DebugAnalysisManager
Value: Create

Key : Analysis.Elapsed.mSec
Value: 5844

Key : Analysis.Init.CPU.mSec
Value: 3858

Key : Analysis.Init.Elapsed.mSec
Value: 75133

Key : Analysis.Memory.CommitPeak.Mb
Value: 103


FILE_IN_CAB: 031023-11156-01.dmp

DUMP_FILE_ATTRIBUTES: 0x8
Kernel Generated Triage Dump

BUGCHECK_CODE: 116

BUGCHECK_P1: ffffc206961ea010

BUGCHECK_P2: fffff80571da5838

BUGCHECK_P3: 0

BUGCHECK_P4: d

VIDEO_TDR_CONTEXT: dt dxgkrnl!_TDR_RECOVERY_CONTEXT ffffc206961ea010
Symbol dxgkrnl!_TDR_RECOVERY_CONTEXT not found.

PROCESS_OBJECT: 000000000000000d

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

CUSTOMER_CRASH_COUNT: 1

PROCESS_NAME: System

STACK_TEXT:
ffffae89095aa808 fffff8056769555e : 0000000000000116 ffffc206961ea010 fffff80571da5838 0000000000000000 : nt!KeBugCheckEx
ffffae89095aa810 fffff80567694bc1 : fffff80571da5838 ffffc206961ea010 ffffae89095aa919 0000000000000000 : dxgkrnl!TdrBugcheckOnTimeout+0xfe
ffffae89095aa850 fffff80567abd483 : ffffc206961ea010 0000000000989680 ffffae89095aaa30 00000000019a8d58 : dxgkrnl!TdrIsRecoveryRequired+0x1b1
ffffae89095aa880 fffff80567b1b2fb : ffffc2068b491000 0000000000000001 ffffc2068b491000 0000000000000000 : dxgmms2!VidSchiReportHwHang+0x62f
ffffae89095aa980 fffff80567ae8142 : ffffae89095aaa01 00000000019a8cd7 0000000000989680 0000000000000040 : dxgmms2!VidSchiCheckHwProgress+0x3318b
ffffae89095aa9f0 fffff80567a8a11a : 0000000000000000 ffffc2068b491000 ffffae89095aab19 ffffc2068b491000 : dxgmms2!VidSchiWaitForSchedulerEvents+0x372
ffffae89095aaac0 fffff80567b0d405 : ffffc2068f3f8000 ffffc2068b491000 ffffc2068f3f8010 ffffc2068b541620 : dxgmms2!VidSchiScheduleCommandToRun+0x2ca
ffffae89095aab80 fffff80567b0d3ba : ffffc2068b491400 fffff80567b0d2f0 ffffc2068b491000 fffff8054d64b100 : dxgmms2!VidSchiRun_PriorityTable+0x35
ffffae89095aabd0 fffff80550455485 : ffffc20689929080 fffff80500000001 ffffc2068b491000 00078425bd9bbfff : dxgmms2!VidSchiWorkerThread+0xca
ffffae89095aac10 fffff80550602cc8 : fffff8054d64b180 ffffc20689929080 fffff80550455430 0000000000000000 : nt!PspSystemThreadStartup+0x55
ffffae89095aac60 0000000000000000 : ffffae89095ab000 ffffae89095a5000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x28


SYMBOL_NAME: nvlddmkm+dd5838

MODULE_NAME: nvlddmkm

IMAGE_NAME: nvlddmkm.sys

STACK_COMMAND: .cxr; .ecxr ; kb

FAILURE_BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {c89bfe8c-ed39-f658-ef27-f2898997fdbd}

Followup: MachineOwner
---------
 

DSzymborski

I would like more details, like what's going on with your PSU. Did you ever upgrade your old one as advised numerous times, or did you decide to just wing it and stick with the old one?
 

Cyber_Akuma

I have since returned the GPU because I noticed that it actually had some components broken off the back (actually about to drop it off at the post office tomorrow):


And rather than play whack-a-mole with replacing the PSU, CPU, RAM, etc. one by one until the random crashing that can take up to a week to manifest stops, I am just replacing the entire system with a slightly newer Dell T5810. And yes, an 825 watt PSU for said system is on its way right now.
 

DSzymborski

Good deal, hopefully this will resolve your issues. I'm sorry I missed your post for this long, apparently past when discussion about it would be useful.
 

Cyber_Akuma

Yeah, I hope so. If somehow an entirely new PC with a different socket, CPU, entirely different type of RAM and even a new GPU still somehow has these same issues my scream will likely manage to get heard by extraterrestrial life.

I AM planning to clone my OS drive AFTER I set up a new Windows install and test the new system for a few weeks or so on that clean install, so if it has no issues for weeks and then crashes again when I clone my OS over I will also know it was a software issue. I can't imagine how it can be though, since the system worked fine being on 24/7 for two years and as soon as I replaced the GPU it started going nuts, constantly giving me errors about the GPU driver crashing even though I used DDU in safe mode to do a complete clean reinstall of the drivers multiple times, and even did it offline to make sure Windows Update didn't muck with it.... even cleared the shader cache just to be safe.