Question My PC freezes - GPU or PSU problem?

Seanchaoz

Reputable
Aug 1, 2016
9
0
4,510
GPU: Asus nVidia 1080 TI (factory OC)
PSU: Corsair TX650 (it's around 6-8 years old and has seen extensive use; 12+ hours daily)

Basically when playing games (and only when playing games) my PC will randomly freeze up for a few seconds and then usually recover with no error messages. Very rarely it will crash and reboot (maybe 1 in 50 times if not less).

The only solution has been to reduce the GPU Power Target% using afterburner software. This fix only lasts for a few months and then I have to further reduce the Power Target to keep things stable.

Initially the Power Target was 100% and my rig ran just fine for around a year. Then the problem started and I reduced the PT% down to around 95%, which fixed the issue for half a year or so. It happened again, and once more I had to reduce the PT% to get things stabilized.

I've been doing this constant Power Target% reduction incrementally for about 3 years now, roughly once every 3-6 months or so, and I am now currently sitting at 80%. I have no doubt that in a few months I will need to lower it even further.

There are no other known issues with my system, and reducing the PT% always fixes all freezes/crashes and instabilities.

I have not tested my GPU on another machine because I don't have the option to. I am on the verge of buying a new PSU, but I'm not sure if that's the problem - and installing a new PSU is a big undertaking since I'll have to take the entire PC apart, not something I'm particularly hyped about, especially if that's not even the issue.

I've run some wattage calculations using online calculators and they all estimate that my system requires roughly 550-570 watts, based on my hardware. Seeing as how my PSU is getting old and has seen a lot of use, is it not a fairly reasonable assumption that the problem is a result of the GPU not getting enough power during load because the PSU just can't keep up anymore?
 
Either the PSU or the GPU could be at fault. Testing the GPU in another PC would tell you which, but as you said, you can't do that. It does sound like a PSU problem, though, since you have used it for 6-8 years.

You can monitor the PSU's +3.3V, +5V, and +12V rails with HWiNFO or HWMonitor:
HWiNFO: https://www.hwinfo.com/download/
HWMonitor: https://www.cpuid.com/softwares/hwmonitor.html

Keep the software open in another window while you are gaming. You want these voltages to stay within +/- 5% of nominal; the +3.3V rail, for example, should stay in the +3.13V to +3.46V range. If any of them (+3.3V, +5V, or +12V) falls outside the +/- 5% range, the PSU is probably no longer good.
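As a rough illustration of the +/- 5% rule above, here is a minimal Python sketch that computes the allowed range for each rail. The function names `rail_range` and `in_range` are made up for this example, and the 5% figure is simply the tolerance quoted in this reply, not something the script measures.

```python
# Nominal rail voltages and the +/-5% tolerance quoted in the reply above.
NOMINAL_RAILS = (3.3, 5.0, 12.0)
TOLERANCE = 0.05


def rail_range(nominal, tolerance=TOLERANCE):
    """Return the (low, high) voltage bounds for a rail (hypothetical helper)."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def in_range(nominal, reading, tolerance=TOLERANCE):
    """True if a reading lies inside the tolerance band for its rail."""
    low, high = rail_range(nominal, tolerance)
    return low <= reading <= high


# Print the allowed band for each rail so it can be compared with HWiNFO/HWMonitor.
for nominal in NOMINAL_RAILS:
    low, high = rail_range(nominal)
    print(f"+{nominal:g}V rail: {low:.3f} V to {high:.3f} V")
```

You would compare the minimum and maximum values shown by the monitoring tool against these bands, for example `in_range(12.0, 11.2)` for a +12V reading of 11.2 V.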

One more thing: you can use the same software to monitor temperatures too. Also open the case and check that the GPU fan(s) are working, so you can rule out GPU overheating as the cause.
 

Seanchaoz

Thanks for the suggestion, I've tried what you said and the voltages look fine - even when it freezes/crashes.
Temperatures are also fine, hovering around 70-71 Celsius on the GPU at the time of crashing, with the CPU in the mid 70s.

The +5V has been consistently within 4.880 V and 4.920 V
The +3.3V sits within 3.312 V and 3.328 V
The +12V sits within 12.000 V and 12.096 V
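For anyone who wants to check the numbers, the readings above can be expressed as a percentage deviation from nominal. This is only a sketch: `deviation_percent` and `worst_deviation` are hypothetical helper names, and software-sampled sensor readings may miss brief dips under load, so a clean result here does not prove the PSU is healthy.

```python
# Readings copied from this post: (nominal, lowest seen, highest seen), in volts.
READINGS = {
    "+3.3V": (3.3, 3.312, 3.328),
    "+5V": (5.0, 4.880, 4.920),
    "+12V": (12.0, 12.000, 12.096),
}


def deviation_percent(nominal, reading):
    """Signed deviation of a reading from its nominal value, in percent."""
    return (reading - nominal) / nominal * 100


def worst_deviation(nominal, low, high):
    """Largest absolute deviation across the observed minimum and maximum."""
    return max(abs(deviation_percent(nominal, low)),
               abs(deviation_percent(nominal, high)))


# Report the worst-case deviation per rail for comparison with the 5% limit.
for rail, (nominal, low, high) in READINGS.items():
    print(f"{rail}: worst deviation {worst_deviation(nominal, low, high):.2f}%")
```

The largest deviation in these readings is on the +5V rail, at about 2.4% below nominal, which is still inside the +/- 5% band.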

I'm currently throttling the GPU Power Target% at 70 in order to keep things stable. Just for experimentation's sake I ran a game, kept all monitoring tools open, and increased the Power Target to 90%. The game froze and hitched and the graphics drivers reset and recovered after only about 30 seconds.

I immediately lowered Power Target back down to 70% and the game ran for 3 hours with no issues.
So I don't know. I mean if it's the PSU not delivering enough power then it doesn't show up in HWMonitor at least, yet the problem is clearly tied to the amount (or lack) of power going into the GPU.
 

Seanchaoz

Alright just in case anyone cares or experiences similar issues here's an update and another clue to the puzzle.

Things were getting worse and worse and I had to reduce Power Target% on the GPU all the way down to 60%. Yet games still kept crashing, but in a different manner now. They were either crashing straight to desktop or my PC would blue screen with a kernel error. No freezes, no screen flickering, no stuttering as had been the case previously - just straight up crashes or BSOD.

Today it just so happened I was playing WoW which, during repeated crashes, kept popping up an error message related to memory being unreadable or something similar. This prompted me to run MemTest86 and it came up with a whopping 552 errors. I ran it a few more times and the number of errors was random, ranging from as low as 12 to several hundred.

Eventually I tinkered around with the BIOS and remembered that the built-in Turbo feature (an auto-overclock setting) was something I had modified many years ago to get some extra juice out of the system. I reset the feature back to default and ran another MemTest. 0 errors!

Back in Windows my games are not only running perfectly fine now, I am even able to set my GPU Power Target back to 90% with no problems. It still randomly stutters and hangs if I go to 100% or higher though.

So, once again, evidence points to my system suffering from a lack of power, seeing as how reducing the CPU and RAM overclock feature (which also reduces power drain) has restored stability. I have a new 850 watt Titanium PSU coming tomorrow and will be installing that over the weekend. If that doesn't help then I will swap out the power cable and even the electric socket extension the PC is plugged into - just to be absolutely sure. If all of that still doesn't solve it, then I think it's probably the motherboard itself that is slowly dying.

Either way, the list of possible culprits grows smaller and smaller. My bet is still on the PSU.
 

Seanchaoz

The new one I have installed now is a Seasonic Prime PX 850 Platinum. Seeing as how everything still crashes exactly the same I guess the old 650 watt PSU isn't faulty at all, so yeah I might build a second PC and try the GPU out in that one. But I'm fairly sure it's the GPU being faulty AND my CPU/motherboard/RAM also starting to break from wear and tear.
 
Seanchaoz, what are your full specs, including CPU / motherboard / storage / RAM mfg, model, configuration? Are you running any automatic or manual overclock on either your CPU or RAM? You mentioned an overclocking feature that you had previously set. What speed is the RAM set to now, and what was it set to previously? Speaking of your RAM, did you purchase it all at the same time as a single kit, or did you combine two kits purchased at separate times? If you have an AMD system, did you reinstall Windows when you built it, or simply move existing storage over from a previous Intel build? Every one of my questions is important to diagnosing your issue.

Another suggestion: download a fresh copy of the drivers, but do not install them yet. Next, download and run DDU (second link below). After your system reboots to remove the old drivers, run the fresh driver installer that you just downloaded. DDU temporarily sets the display resolution to a low setting, so it's easier to locate the drivers on your system if you've already downloaded them.
https://www.nvidia.com/en-us/geforce/drivers/

In addition to the above questions, I recommend that you download and run Display Driver Uninstaller (DDU):
https://www.wagnardsoft.com/forums/viewtopic.php?f=5&t=3216
<click blue font text> "Official Download Here"

Also, please take two photos of the inside of your system. The first should show the side of your power supply, with the mfg. and model visible; yes, I'm aware that you've already told us this information. The second should show the entirety of your system's innards. Despite your normal temps, I'd like to get an idea of air flow and anything else that may stand out. Upload those two images to imgur, and share the link here.
 

Seanchaoz

Update: GPU crashes have been fixed - I am now running at 100% power target with no issues. The solution was (relatively) simple; I took the GPU apart, cleaned off the old thermal paste and applied a fresh new layer. Even though it was showing acceptable temperatures during the crashes it was overheating anyway and once I saw the factory thermal mess it was obvious why.

What prompted me to do so was a particular crash a few days ago where, during WoW gameplay, my monitors went black, PC was unresponsive and the GPU fans went to 100% and stayed that way until I physically shut it down by holding the power button. I recognized this as a very typical symptom of overheating and decided to take my chances at doing manual repairs. Just sharing in order to add to the solution base for others who might be dealing with similar symptoms.

Basically you might not be able to trust the reported temperatures on your GPU - even though they look acceptable, overheating can still be the cause of crashes. And honestly, if you're worried about taking your GPU apart and changing thermal paste, don't be - this was my first time ever trying and it was really quite easy. Check YouTube for disassembly videos, there's one for nearly every model/brand of GPU out there.

GPU_thermalRep.jpg