Question GPU sometimes not detected during boot, random crashes (display turns off and fans speed up), only runs more stably with the PCIe Gen 1 BIOS setting

Nov 30, 2022
Hi,

I recently started having a weird problem with my GPU. It is sometimes not detected during boot (the VGA light on the EZ Debug LED lights up) and needs a few restarts/boot attempts before it works and gets into Windows. However, even when it boots successfully, the VGA LED still lights up for a few seconds during boot. Is that normal?

That's one problem. The other issue is that when it is working, it sometimes crashes even though I'm not doing anything heavy. I even ran a stress test for a few minutes to see if it would crash, but everything seemed fine. Then, a few minutes or a few hours later, it crashes: the display turns off and the keyboard lights stop responding, but I can still hear music playing normally in the background. And this happens with the PCIe link speed set to Gen 1 (via the motherboard BIOS).
I said in the title that this setting is more stable, because...

When the PCIe link speed is set to the default (Auto) or to anything else (Gen 2 or Gen 3), the problem is much worse. It takes many more boot attempts/restarts to reach the Windows login screen (the VGA EZ Debug LED still lights up during these boots too, btw). And after I get to the Windows login screen, just a few seconds later the display crashes and the fans speed up. Even if it runs for a few minutes, crashes are more frequent with this setting than on Gen 1.
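For anyone who wants to confirm which link speed the card actually negotiated after changing the BIOS setting, here is a minimal Python sketch. It assumes an NVIDIA driver with `nvidia-smi` available on the PATH; the helper names `parse_link_info` and `read_link_info` are made up for illustration. The current generation may read lower at idle because of power saving, so it is worth checking while the GPU is under load.

```python
import subprocess

# Query fields understood by nvidia-smi's --query-gpu option.
QUERY = "pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current"

def parse_link_info(csv_line: str) -> dict:
    """Parse one CSV line such as '1, 3, 16' into named integer fields."""
    gen_now, gen_max, width = (int(x.strip()) for x in csv_line.split(","))
    return {"gen_current": gen_now, "gen_max": gen_max, "width_current": width}

def read_link_info() -> dict:
    """Ask nvidia-smi for the PCIe link state of the first GPU it lists."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_link_info(out.strip().splitlines()[0])

if __name__ == "__main__":
    try:
        print(read_link_info())
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError) as err:
        # nvidia-smi missing, driver not loaded, or unexpected output
        print(f"Could not query nvidia-smi: {err}")
```

If `gen_current` stays at 1 under load while the BIOS is set to Auto, the link is being trained down, which would be consistent with a signal problem on the card or the slot rather than a driver issue.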

I've tried different methods:
  • Uninstalled the driver via DDU and reinstalled it (older and recent versions)
  • Replaced the thermal paste and thermal pads
  • Replaced the power supply with the one I'm currently using
  • Cleaned the PCB with 99% IPA and a brush, and also sprayed some contact cleaner
  • Checked for fried parts/bulging capacitors and found nothing (idk if there are tiny parts that I missed)

I suspect this is caused by a power control issue within the GPU hardware? I also noticed that the 8-Pin #1 voltage averages around 10V. Is this normal? See the GPU-Z screenshot attached.





Could this be caused by a bad GPU fan replacement? A few months before this problem started, I replaced both fans because one of them was dying and running slower than usual, and the card ran perfectly fine until now. The new fans are 12V 0.42A, while the old ones were 12V 0.35A. Do you think this slight difference is causing the problem?
Another thing I want to mention from before this problem started: I sometimes heard a weird high-pitched noise from the GPU for a few seconds. Coil whine? Is this likely related to the problem? I also tried the RT feature sometimes, and when it was turned on, I got a weird grinding noise from inside the GPU.
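To put the fan difference in numbers, here is a small Python sketch of the arithmetic using the label ratings above (P = V × I). It only compares rated maximums, not what the fans actually draw, and it says nothing about how much current the card's fan header can supply, which is not known here.

```python
def fan_power_watts(volts: float, amps: float) -> float:
    """Rated electrical power of a DC fan: P = V * I."""
    return volts * amps

old_fan = fan_power_watts(12.0, 0.35)  # original fan rating from the post
new_fan = fan_power_watts(12.0, 0.42)  # replacement fan rating from the post
extra_per_fan = new_fan - old_fan
extra_total = 2 * extra_per_fan        # both fans were replaced

print(f"old: {old_fan:.2f} W, new: {new_fan:.2f} W")
print(f"extra per fan: {extra_per_fan:.2f} W, extra for both: {extra_total:.2f} W")
```

At full rated current the two new fans together would ask for under 2 W more than the old pair.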

I haven't tried this GPU in another system yet, but I'm curious about what might be causing this problem before deciding whether to replace the GPU if it's indeed dying.
This is a 3.5-year-old GPU, used mainly for gaming, animation rendering and 3D editing with Unreal Engine 5. I've never overclocked the graphics card, but it has run for tens of hours at a time at 100% usage while rendering.
I'd appreciate any answers, and sorry if my English is a little bit off... I'll try to explain if any part is hard to understand.

Thank you!

My System:

GPU: Zotac RTX 2060 AMP 6GB
Proc: Intel i7 8700
Mobo: MSI H310 HDV
RAM: Corsair Vengeance LPX 2x8GB
PSU: NZXT C550 550W Semi Modular
OS: Windows 10 Pro, NVIDIA Driver Version 527.56
 
Nov 30, 2022
I noticed your PCIe slot voltage is at 14V; it should be at 12V. The sensor readout also looks staticky. It should be a nice, steady, solid red line.

Now that you mention it, I think my system is quite unstable compared to the other sensor readings I've found through a Google search.

I'm running my PC on a battery-backup device (UPS) with a built-in voltage regulator, plus a grounding cable attached from a PSU screw to the wall, because my mains supply doesn't have proper grounding. Is it possible that this is the cause of the instability?

Edit:
  • Using a multimeter, I tested the power outlet voltage and the UPS output (power supply input) voltage: steady at 230 VAC
  • Tested the PCIe 6+2 pin connector: got a steady 12V from all 3 yellow pins.
  • Voltages from HWMonitor (image below)


Everything looks fine, or does it? I'll try the card in another system to confirm whether my GPU is indeed malfunctioning.
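As a rough sanity check on the numbers in this thread, the sketch below compares each 12V reading against the ±5% tolerance the ATX spec allows on the +12V rail (11.4V to 12.6V). The function name is made up for illustration, and the values are the ones quoted above: two come from software sensors and one from the multimeter, which was presumably measured without the GPU under load.

```python
ATX_12V_NOMINAL = 12.0
ATX_12V_TOLERANCE = 0.05  # +/-5% on the +12 V rail

def within_tolerance(volts, nominal=ATX_12V_NOMINAL, tol=ATX_12V_TOLERANCE):
    """Return True if a reading sits inside nominal +/- tol."""
    low, high = nominal * (1 - tol), nominal * (1 + tol)
    return low <= volts <= high

# Readings quoted in this thread (not new measurements).
readings = {
    "GPU-Z 8-pin #1 (software sensor, average)": 10.0,
    "PCIe slot (software sensor, read off the screenshot)": 14.0,
    "6+2 pin connector (multimeter)": 12.0,
}

for name, volts in readings.items():
    status = "OK" if within_tolerance(volts) else "OUT OF RANGE"
    print(f"{name}: {volts:.1f} V -> {status}")
```

Only the two on-card sensor values fall outside the window while the multimeter reading is in range, which could mean the card's own voltage sensing is off rather than the supply, though a multimeter reading under load would be needed to say so.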
 
Nov 30, 2022
Try taking the GPU out and running on the CPU's integrated graphics through the motherboard's video port. See if the system boots up.

Yes, everything works fine when the GPU is unplugged from the motherboard.
In fact, that's the first thing I tried, to see if it might be a RAM or CPU problem. And before I plugged the GPU back into the motherboard, I set the primary display to Integrated Graphics and enabled the iGPU in the BIOS.
I'm using dual monitors btw, one plugged into the onboard HDMI port and the other into the GPU, so if the GPU fails to be detected during boot, I can still control the BIOS or the OS.
 
Everything looks fine on the other voltage readouts, so I don't think you have a power problem. What do the voltages show in HWMonitor when the GPU is under load? Do you have access to another GPU to test the PCIe voltage draw with? Also check CSM in your BIOS: is it on or off?