Question: Galax RTX 3080 SG crashed to black screen?

ksh.sharma

Hi All,
I recently installed a used Galax RTX 3080 SG LHR 10GB in my system. While stress testing with 3DMark, and once while playing No Man's Sky, the system crashed to a black screen and the fans ramped up to full speed.
When I restarted immediately, the screen remained blank and the VGA POST light stayed on. After waiting 5 minutes the system booted, but it crashed again as soon as I started the 3DMark stress test. Looking at the GPU-Z logs afterwards, temperature and power draw appeared to be nominal. I was able to start the system up again, but I have refrained from gaming since.
I have heard that 3080s are notorious for transient spikes, and was wondering whether that is the issue here, or whether the card is broken. With that in mind, I have an 850W PSU (Corsair RM850e) and have connected the two 8-pin power connectors to different PCIe power rails.

Here's my configuration:
CPU: Ryzen 5 5600X
RAM: 32 GB DDR4
Motherboard: MSI B450M PRO-VDH MAX
GPU: Galax RTX 3080 SG LHR 10GB
PSU: Corsair RM850e 850W
Any help will be much appreciated.
 
crashed to a black screen, and the fans ramped up to full speed.
I have heard that 3080's are notorious for transient spikes, and was wondering if that was an issue here...
The entire 30 series is like that, but it gets worse as you go up the stack. Culprits could be:
-PSU, but this one is quite new, so... 'press X to doubt'.
-Motherboard. Try updating the BIOS if you haven't already done so.
-GPU, or more specifically its voltage regulator, is getting hammered. A workaround is to set an fps limit and lower the GPU power limit (to 90% or so).
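
For anyone who prefers to set the cap in watts rather than with a slider, here is a rough Python sketch of what "90% or so" works out to. It assumes a 320 W default board power, which is the reference figure for an RTX 3080 10GB and may not match a particular card's BIOS. The helper names are made up for illustration. `nvidia-smi -pl` is the driver tool's power-limit option; it needs admin rights and only accepts values inside the range the card allows.

```python
# Assumption: 320 W is the reference board power of an RTX 3080 10GB.
# A given card's default limit may differ; check it before relying on this.
DEFAULT_POWER_LIMIT_W = 320.0


def power_limit_watts(percent: float, default_w: float = DEFAULT_POWER_LIMIT_W) -> int:
    """Convert a power-limit percentage into whole watts."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    return round(default_w * percent / 100.0)


def nvidia_smi_command(limit_w: int) -> list[str]:
    """Build (but do not run) the nvidia-smi call that sets the limit."""
    return ["nvidia-smi", "-pl", str(limit_w)]


if __name__ == "__main__":
    limit = power_limit_watts(90)
    print(limit, "W ->", " ".join(nvidia_smi_command(limit)))
```

The sketch only prints the command, so nothing changes on the card until you run it yourself from an elevated prompt.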


I would also like to add that the temperature during gaming and 3DMark was hitting 80 to 82 C.
That's fine. What are the GPU hot spot and memory junction temperatures at that time?
 
Memory junction is generally around 98 to 100C. I had repasted the TIM, but didn't change the pads on the VRMs and VRAM.
Motherboard and GPU BIOS have also been updated to the latest ones available.
 
Memory Junction is generally around 98 to 100C,
Hmm, that's up there, but OK if it only spikes to that and isn't sitting there... and the hot spot? What's that one at? ~20C or lower is the typical gap between it and the GPU core. FYI, hot spot can go up to 110C+, but the GPU fans will be at full blast before then.

I had repasted the TIM, but didn't change the pads on VRMs and VRAMs.
Good. Pads should be a last resort.
That said... you wouldn't happen to recall, or have kept a record of, the 3 operating temperatures BEFORE the paste job?
 
Unfortunately I did not. One thing I'd like to add is that the thermal pads had darkened in appearance and felt smooth to the touch, without the usual stickiness that pads generally have. I'm thinking they might be worn out and causing these issues, but then again the memory junction temps were only around 98C.
 
Unless they're torn to pieces, or disintegrated into flakes over the PCB, leave pads alone - for now. Don't want to go down that rabbit hole until the very end.

Have you tried fps cap and lower gpu power limit?
 
I have yet to try the fps cap and GPU power limit; I will check and get back. I was however wondering if an 850W PSU is enough for an RTX 3080, given that transient power spikes might be triggering OCP etc. on the PSU. Will switching to a good 1000W PSU make it more stable? Does it matter if the PSU has a single 12V rail or dual 12V rails?
 
I was however wondering if 850W PSU is enough for a RTX 3080, given that transient power spikes might be triggering OCP etc. on the PSU.
Yes, but wattage rating doesn't tell a thing about quality.
There's a member here that runs/ran a 3080 with a RM650X/750X, and didn't have any issues. @sizzling

Will switching to a good 1000W PSU make it more stable ? Does it matter if the PSU has a single 12V rail or dual 12V rails ?
1)No.
2)No.
 
Hi,
I lowered the GPU power limit to 90% and ran the 3DMark Time Spy Stress Test, but the system still crashed around the 12th iteration. I logged the temperature and other values through GPU-Z. Here is a snippet of the values right before the crash (with the overall highest in parentheses):
1. GPU Temperature: 79.8 C (79.8 C)
2. Hot Spot: 90.5 C (91 C)
3. Memory Temperature: 90 C (90 C)
4. Board Power Draw: 286 W (294.8 W)
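
For reference, a "last value before the crash vs. overall highest" summary like the one above can be pulled from a sensor log with a few lines of Python. This is only a sketch: it assumes the log is comma-separated text with a header row, and the column names and timestamps in the sample are placeholders rather than the exact headers GPU-Z writes.

```python
import csv
import io


def summarize_log(text: str) -> dict[str, tuple[float, float]]:
    """Return {column: (last value, highest value)} for numeric columns."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    last: dict[str, float] = {}
    peak: dict[str, float] = {}
    for row in reader:
        for name, cell in zip(header, row):
            try:
                value = float(cell.strip())
            except ValueError:
                continue  # skip timestamps and empty trailing cells
            last[name] = value
            peak[name] = max(value, peak.get(name, value))
    return {name: (last[name], peak[name]) for name in last}


# Hypothetical sample in the assumed layout (made-up timestamps and headers).
SAMPLE = """Date, GPU Temperature [C], Hot Spot [C], Board Power Draw [W],
2024-01-01 12:00:00, 78.9, 91.0, 294.8,
2024-01-01 12:00:01, 79.8, 90.5, 286.0,
"""

if __name__ == "__main__":
    for column, (last_value, highest) in summarize_log(SAMPLE).items():
        print(f"{column}: {last_value} ({highest})")
```

To use it on a real file, read the file's contents into a string first and adjust the encoding if the degree symbol in the headers causes a decode error.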


After this crash, however, the system kept crashing 3 to 4 minutes after startup with just Chrome open. I am beginning to think that this card is a complete loss. Any ideas here?
 
Hi,
I lowered the GPU power limit to 90% and ran the 3DMark Time Spy Stress Test, but the system still crashed around the 12th iteration. I logged the temperature and other values through GPU-Z. Here is a snippet of the values right before the crash (with the overall highest in parentheses):
1. GPU Temperature: 79.8 C (79.8 C)
2. Hot Spot: 90.5 C (91 C)
3. Memory Temperature: 90 C (90 C)
4. Board Power Draw: 286 W (294.8 W)
Those are all good.
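
As a quick sanity check of the readings quoted above against the ~20C core-to-hot-spot gap mentioned earlier in the thread, a small sketch like this does the arithmetic. The 20 C default is just that rule of thumb, not a manufacturer spec, and the function names are made up.

```python
def hot_spot_gap(core_c: float, hot_spot_c: float) -> float:
    """Difference between hot spot and core temperature, in degrees C."""
    return round(hot_spot_c - core_c, 1)


def gap_is_typical(core_c: float, hot_spot_c: float, max_gap_c: float = 20.0) -> bool:
    """True when the gap is within the rule-of-thumb limit (assumed 20 C)."""
    return hot_spot_gap(core_c, hot_spot_c) <= max_gap_c


if __name__ == "__main__":
    # Values from the post: 79.8 C core and 90.5 C hot spot before the crash.
    print(hot_spot_gap(79.8, 90.5), gap_is_typical(79.8, 90.5))
```

With the numbers from the post the gap comes out at roughly 11 C, comfortably inside that rule of thumb.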


After this crash, however, the system kept crashing 3 to 4 minutes after startup with just Chrome open. I am beginning to think that this card is a complete loss. Any ideas here?
No, but I'm guessing software/driver in that scenario.
Open command prompt as admin, and enter the following: sfc /scannow (space included)
If it finds any violations, then enter: DISM.exe /Online /Cleanup-image /Restorehealth (spaces included)
After that, scannow again, to be sure it's fixed up, then test the gpu again.
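
If it helps to see the order of operations written down, here is a minimal Python sketch of the sequence above. It does not try to interpret sfc's output or exit codes; whether violations were found is something you read off the screen and pass in yourself. The function names are made up, and the commands still need an elevated (admin) prompt on Windows.

```python
import subprocess

SFC = ["sfc", "/scannow"]
DISM = ["DISM.exe", "/Online", "/Cleanup-image", "/Restorehealth"]


def repair_plan(violations_found: bool) -> list[list[str]]:
    """Commands to run, in order, after the first `sfc /scannow`."""
    if not violations_found:
        return []  # nothing to repair; move on to the other suggestions
    return [list(DISM), list(SFC)]  # repair the image, then re-verify


def run_all(commands: list[list[str]], runner=subprocess.run) -> None:
    """Run each command in turn; `runner` can be swapped out for a dry run."""
    for command in commands:
        runner(command, check=False)


if __name__ == "__main__":
    # Dry run: print the plan instead of executing it.
    run_all(repair_plan(violations_found=True),
            runner=lambda cmd, check: print(" ".join(cmd)))
```

The dry run just prints the two commands in order; drop the `runner` argument to actually execute them.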

If the first scannow didn't find anything, try disabling GPU hardware acceleration in Chrome.

That's about all I got.
 
Alright, good news: sfc /scannow found some corrupt files, which I fixed with the DISM command and re-verified with scannow. I also uninstalled the Xtreme Tuner utility by Galax and disabled the 'Enable Hardware Control and Monitoring' and 'Enable low-level IO driver' options in MSI Afterburner.

Then I ran the 3DMark stress test with a 90% power limit, which didn't crash, although the stability result was 95.2%. Finally I ran it with a 100% power limit, which also completed, with 96.2%.
The max temperatures in the 100% power limit run were:
82.4 C for Core, 93.4 C Hot Spot, and 92 C for the memory as logged using GPU-Z.

So are these results enough to say that the GPU is functioning well? Do you suggest any further tests to finally put this issue to rest?
 
Alright, good news: sfc /scannow found some corrupt files, which I fixed with the DISM command and re-verified with scannow. I also uninstalled the Xtreme Tuner utility by Galax and disabled the 'Enable Hardware Control and Monitoring' and 'Enable low-level IO driver' options in MSI Afterburner.

Then I ran the 3DMark stress test with a 90% power limit, which didn't crash, although the stability result was 95.2%. Finally I ran it with a 100% power limit, which also completed, with 96.2%.
The max temperatures in the 100% power limit run were:
82.4 C for Core, 93.4 C Hot Spot, and 92 C for the memory as logged using GPU-Z.
That's great! 👍

So are these results enough to say that the GPU is functioning well? Do you suggest any further tests to finally put this issue to rest?
3DMark is not a be-all and end-all stability test. Trying the card out across multiple titles is the superior method.
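
On the percentages themselves: as far as I know, 3DMark's stress test reports frame rate stability as the worst loop score divided by the best loop score, and it counts a run as passed at 97% or higher. If that holds, both 95.2% and 96.2% are completed runs that would still be reported as not passed. A small sketch of that arithmetic, with the 97% pass mark treated as an assumption and made-up loop scores:

```python
# Assumption: 97% is the pass mark used by the 3DMark stress test.
PASS_THRESHOLD_PCT = 97.0


def frame_rate_stability(loop_scores: list[float]) -> float:
    """Worst loop score as a percentage of the best loop score."""
    if not loop_scores:
        raise ValueError("need at least one loop score")
    return 100.0 * min(loop_scores) / max(loop_scores)


def passes(stability_pct: float, threshold_pct: float = PASS_THRESHOLD_PCT) -> bool:
    """True when stability meets the assumed pass mark."""
    return stability_pct >= threshold_pct


if __name__ == "__main__":
    # Hypothetical loop scores, not taken from the thread.
    print(frame_rate_stability([100.0, 98.0, 96.0]))
    # Stability figures reported in the thread.
    print(passes(95.2), passes(96.2))
```

A lower stability figure usually just means the card slowed down over the run (heat, power limit), so it is a hint to keep testing rather than proof of a fault.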
 
Thanks. Just for the heck of it I'm returning the Corsair RMe 850W, and getting a Corsair RMe 1000W. I'll see if it works, otherwise will try refunding/exchanging the GPU.
 
Did getting a new PSU work?
Nope, I tested the same GPU with the Corsair RMe 1000W and got the same failures, so I sent it back. They tested it at their end and reported the same issue, so I got a full refund.
I have since upgraded to a GALAX GeForce RTX 3080 Ti SG, and it's worked without any issues whatsoever with the Corsair RMe 1000W.
 
I see. By any chance, do you remember whether you had 2 PCIe cables from the PSU connected to your GPU, or was it just 1 PCIe cable?