Question: 3 GPUs in 2 years, is something in my PC killing them?

Status
Not open for further replies.
Apr 13, 2024
I want to begin by stating that I'm not particularly computer savvy, but I'll try my best to list everything as clearly as I can. I guess I need to start from the beginning: around 2022, I got a prebuilt PC from NZXT. I've since changed the RAM and memory, but all the other components listed below (besides the GPU, for obvious reasons) are what it originally came with and still has installed:

AMD Ryzen 9 5900X 12-Core
NVIDIA GeForce RTX 3080 - GIGABYTE
Gigabyte X570S Aorus Master Wi-Fi
Team T-FORCE XTREEM ARGB 4000MHz DDR4 - 16 GB x 4
EVGA SuperNOVA 1000W G5 Gold
Boot drive: 1TB Samsung SSD
Cooler: NZXT Kraken X73

Things were perfectly fine for around a year, with no issues that I can recall. I was 3 hours into playing War Thunder at max settings when my PC did what I will call a "partial shutdown". Why partial? Because the PSU and motherboard still seemed to be on, but the GPU and all the case fans were off. To fix this, all I had to do was turn the PSU off and back on.

This happened again 3 days later, same game. Then it started happening every 2 days, same game. Then it happened 2 days later in a different game. This is when it raised an alarm for me: it became clear the issue wasn't being caused by a game, but by something in the PC. The issue began happening once every day without any games running, with just YouTube open or with no applications open at all. At this point it would sometimes do a full shutdown as well, with everything completely off. It seemed roughly 50/50 whether it would be a full or a partial shutdown.

When you google random shutdowns, the most common cause is overheating. My CPU never went above 65 °C, and my GPU never went past 70 °C. I have run a full 100% stress test on the computer for 8 hours, twice, and never hit an issue. I have also run individual stress tests of the GPU, CPU, PSU, UPS, and RAM. To me this proved it was not a temperature issue, or a component failing from being at max use.

The second biggest cause of shutdowns I heard of was power. My system normally draws around 450 watts and never really gets past 650. The PSU is 1000W and is plugged into a 1200 UPS. I have run the entire build on the UPS battery alone. To me this showed that power was not the concern.
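For what it's worth, the headroom arithmetic in that paragraph is easy to check. The minimal Python sketch below just divides the draw figures quoted above (about 450 W typical, 650 W peak, as read by the poster) by the PSU's 1000 W rating; the helper name `load_fraction` is hypothetical. It only shows that the unit was not undersized on paper, which is a different question from whether it was degrading, the possibility the later replies point to.

```python
def load_fraction(draw_watts: float, rated_watts: float) -> float:
    """Return a power draw as a fraction of the PSU's rated output."""
    if rated_watts <= 0:
        raise ValueError("rated_watts must be positive")
    return draw_watts / rated_watts


if __name__ == "__main__":
    PSU_RATED_W = 1000  # EVGA SuperNOVA 1000W G5 rating from the post
    # Typical and peak draw figures as reported in the post
    for label, draw in (("typical", 450), ("peak", 650)):
        frac = load_fraction(draw, PSU_RATED_W)
        print(f"{label}: {draw} W = {frac:.0%} of rating, "
              f"{PSU_RATED_W - draw} W headroom")
```

Both figures land well under the rating, which is consistent with the poster's conclusion that capacity was not the problem.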

The shutdowns started to occur every few hours, then every hour, then every half hour. You can see where this is going. It got to the point where it was every few minutes. Then it shut down before I could even log in. This almost seemed to "reset" the issue, with shutdowns occurring every 3 days or so and the time between them progressively decreasing. The pattern became incredibly apparent.


What I did to attempt to resolve the issue

Regarding Software:

-Updated everything to the latest drivers
-Downgraded to older drivers
-Reset BIOS
-Updated BIOS
-Clean Reinstall of Windows

Regarding Physical Stuff:
-Unplugged each peripheral one at a time (unplugged a different one every shutdown)
-Unplugged all peripherals (including mouse and keyboard)
-Disconnected all peripherals AND monitor after logging in
-Reset CMOS on motherboard
-Plugged PSU into different outlets, as well as outlets in different buildings

Nothing worked. Because the motherboard and PSU stayed on during the "partial shutdowns", I concluded the GPU was at fault. So I used it as an excuse to upgrade.


2nd GPU

I got a brand new MSI 4080 from Amazon, as well as the appropriate cables for it to work with my now previous-gen PSU. Performance was far better, but more importantly, the issue seemed to have been completely resolved.

Fast forward 4 months...

Random shutdown, oh no. The issue was back, and it was following the exact same pattern. As far as I could tell, nothing performance-related seemed to trigger it. By now I knew how this would go. This relatively new GPU was still under warranty, so I would send it in for RMA. I could not afford to be without my computer for the weeks, or even the month, it would take for that to be handled, so I got another GPU.


3rd GPU

I got another 4080 from Amazon, this time an XLR8. It was $400 cheaper than the MSI one and the performance was notably worse, but who cares, the PC was running again.

This lasted a month? Maybe 2 at most.

My PC randomly shut down on YouTube two days ago. The problem is back. It also reminded me that I forgot to send the MSI GPU in for RMA, so I'm getting ready to do that now. There is something I have noticed before these shutdowns appear: things get slower. Not in terms of performance, or at least not in any way I was able to detect. I know that doesn't make a ton of sense, but I'll try to explain it as best I can. Loading into a game went from taking 2 seconds, to 4 seconds, to 10 seconds, and then the shutdown issue popped up. Thumbnails on YouTube videos would take longer to load as I scrolled down the page. Neither of these was tied to my internet connection.

But my performance in the actual games did not change: same loads, temperatures, and FPS. Logging in also got slower. When powering on, the screen would first show my login with something like a 30% dark filter before switching to the normal view 1-2 seconds later, as if it was struggling to load the login screen? I'm not sure. Across the board things seemed to have gotten slower, even though performance in applications didn't seem to take a hit whatsoever.

Event History shows these shutdowns as "Unexpected Shutdown Occurred" and never points in any direction. Event History also shows multiple critical errors, but again, they all come from the random unexpected shutdowns and are classified as "Stopped working" or "Stopped responding and was closed", never stating more than "a problem stopped this program from interacting with Windows". As far as I can tell, all of these errors are simply triggered by the shutdown, and nothing identifies what actually triggers the shutdown itself.

I have been worn down trying to troubleshoot this issue, and it's become evident that buying new GPUs no longer even works as a band-aid solution. My new theory is that it's something related to the motherboard. Why? Because that's what the GPU is directly connected to, and my build has otherwise slowly killed a 3080 and 2 perfectly good 4080s. I don't have another system, or a friend with a build, that I can test the GPUs or any of my other components in. Something is happening that no built-in or third-party software is picking up, which to me means that some hardware component is faulty in a way software can't detect. I would be very grateful for any support that could be provided. I have truly hit the bottom of the barrel in terms of what I'm capable of doing to resolve this issue.

Here are two old images of the shutdowns being recorded:
 
Increasing numbers of errors and/or varying errors make the PSU a prime suspect.

You mentioned testing the PSU: how was the PSU tested?

Do you have a multimeter and know how to use it, or know someone who does?

FYI:

https://www.lifewire.com/how-to-manually-test-a-power-supply-with-a-multimeter-2626158

Not a full test because the PSU is not under load. However, any voltages out of tolerance indicate a faltering/failing PSU.
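If you do take readings, comparing them against the allowed range is simple arithmetic. The sketch below assumes the ±5% tolerance that the ATX power supply design guide gives for the +3.3 V, +5 V and +12 V rails; the helper names are hypothetical, and the readings in the example are made up for illustration, not measurements from this thread. As noted above, a no-load reading that passes does not clear the PSU.

```python
# Nominal main-rail voltages; the ±5% tolerance is an assumption taken
# from the ATX power supply design guide for these three rails.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def rail_in_tolerance(rail: str, measured: float,
                      tolerance: float = TOLERANCE) -> bool:
    """Return True if a measured voltage falls within nominal ± tolerance."""
    nominal = NOMINAL_RAILS[rail]
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    return low <= measured <= high


def check_readings(readings: dict) -> dict:
    """Map each rail name to whether its reading is within tolerance."""
    return {rail: rail_in_tolerance(rail, volts)
            for rail, volts in readings.items()}


if __name__ == "__main__":
    # Hypothetical multimeter readings, for illustration only
    example = {"+3.3V": 3.31, "+5V": 5.02, "+12V": 11.2}
    for rail, ok in check_readings(example).items():
        print(f"{rail}: {'OK' if ok else 'OUT OF TOLERANCE'}")
```

With ±5%, the +12 V rail must read between 11.4 V and 12.6 V, so the made-up 11.2 V reading above would be flagged.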

Any way to borrow another known working PSU to install in your build? (Remember to use only the cables that come with any given PSU.)

Also (late edit): look at those informational errors as well. They could be caused by some action on your part or by some app.

https://learn.microsoft.com/en-us/t...erformance/incorrect-shutdown-reason-code-sel

Google other error codes and look for some common factor(s).
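One way to look for common factors is to export the relevant Windows log (Event Viewer can save a log as a CSV file) and count which sources and event IDs keep recurring. The sketch below assumes the export has columns named `Source` and `Event ID`; check the header row of your own file, since that naming is an assumption here. `tally_events` is a hypothetical helper, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter


def tally_events(csv_text: str, source_col: str = "Source",
                 id_col: str = "Event ID") -> Counter:
    """Count how often each (source, event ID) pair appears in an exported log.

    The column names are assumptions; adjust them to match your export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row[source_col], row[id_col]) for row in reader)


if __name__ == "__main__":
    # Invented sample rows, for illustration only
    sample = (
        "Level,Date and Time,Source,Event ID\n"
        "Critical,4/10/2024 9:12 PM,Kernel-Power,41\n"
        "Error,4/10/2024 9:13 PM,EventLog,6008\n"
        "Critical,4/12/2024 8:01 PM,Kernel-Power,41\n"
    )
    # Most frequent (source, event ID) pairs first
    for (source, event_id), count in tally_events(sample).most_common():
        print(f"{source} {event_id}: {count}")
```

For a real export, read the file with `open(path, newline="")` and pass its contents to `tally_events`; the pairs at the top of the list are the ones worth googling first.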
 
I'll look into buying a new PSU then. I don't own a multimeter, nor do I know how to use one. I can't borrow a PSU from anyone either.

PSUs are relatively cheap from what I understand, so I'll purchase one. Worst case scenario, I end up with a brand new PSU; best case, it fixes the issue.

I'll also look into the error codes a bit more. Thank you
 
Increasing numbers of errors and/or varying errors make the PSU a prime suspect.
I still don't understand how replacing the GPU temporarily fixes this problem, though, assuming the PSU is at fault. Especially since I first switched from the 3080 to a 4080, which has a higher power draw, yet that instantly alleviated the shutdown issue, which only came back months later.
 
Quick update: I purchased and installed a new PSU, the exact same model as before. So far, no issues. Assuming nothing happens in the next week or so, I think it's safe to say the PSU was the issue.
 
Found a review of that power supply.

https://www.tomshardware.com/reviews/evga-supernova-1000-g5-power-supply,6337.html

It seems like they felt the G5 was a little worse than the G3.

Good luck with it though. Maybe just purchase an extended warranty on any cards you install and keep an eye on how the new power supply behaves.
Do you really think video card makers don't know a power supply problem when they see one? Buying an extended warranty is just a waste of money when it is voided due to a bad power supply.
 
I’d like to think EVGA makes decent units, but there are so many power supplies in the wild, of varying quality, that if I were manufacturing GPUs or any other equipment, I think it might be hard to test against all of them. There are so many overseas and lesser brands that may be virtually unknown, so I imagine hardware manufacturers have to trust that units live up to the standards. Have a look here. This may be out of date, but it should give an idea of the quality of various units. In general I think you don’t want anything below tier C, but of course higher is better. I can say I’ve fried components with a cheap power supply in my time, so it’s definitely important to get a good one. Hopefully the one you just got is a good unit; I think it was tier B, maybe. But I’m on my phone at the moment as well.


But card makers may know, depending on the evidence. Not sure where you sourced your parts, but if you are in the USA and shop at Micro Center, their store warranties used to be pretty good. I have rebuilt entire systems with parts from warranties I bought through them, for almost no money out of pocket. I think I had to buy the warranties again, but it was definitely worth it.
 
There have been 0 shutdowns since the PSU replacement. I think this shows the original PSU was most likely at fault. Just wanted to give a (hopefully) final update to bring this post to a conclusion.
 