Question Recurring Crash - is my GPU failing?

Jan 10, 2024
Hi everyone,

My current setup is:
GPU: Zotac Gaming GeForce RTX 4070 Ti
CPU: AMD Ryzen 9 5900X
PSU: Corsair RM750x
Motherboard: ASUS TUF Gaming X570-Plus (Wi-Fi)
Monitor: Gigabyte 32" 144 Hz 4K curved monitor

I'll try to be thorough while staying reasonably brief, but I've been working on this issue for nearly 10 months and I'm completely lost as to which symptoms actually mean something and which don't, so please bear with me.

At the beginning of the year, I was experiencing crashes that were caused by an old and incompatible motherboard. Upon upgrading, things seemed to be working well for some time until I began to experience various display issues, namely:

-My display would go to sleep and could not be woken without resetting my PC
-My monitor would lose display and it would not come back until the PC was reset
-Same as the above, but the GPU fans would begin running at maximum speed

As time went by, these issues became more frequent, so I began troubleshooting as best I could. The biggest difficulty, however, was that these issues were impossible to reproduce. Sometimes I would be gaming or doing something similarly graphically intensive; other times I would just be idling on my desktop. Sometimes one situation would cause a crash repeatedly; other times the same situation would run perfectly well. The crash logs never showed anything beyond the unexpected shutdown caused by my own hard shutdown. I ran FurMark and other GPU stress tests, ran OCCT on every component I could, and tried over and over to recreate the circumstances in which I had experienced these issues, without any consistency. I will note, however, that no stress test ever caused a crash at any point. There was also never any stuttering, slowdown, or other performance issue - these crashes happened without any forewarning and were the only problems I encountered.

Here is what I tried. I:
-completely reseated all of my components, blowing compressed air into all connections
-ran SFC scans and DISM tools
-started running my BIOS in Compatibility Support Module (CSM) mode
-replaced the HDMI cable connecting my PC to my monitor with a new one
-updated my BIOS
-updated my drivers
-changed settings in the NVIDIA management tool, ranging from preferring performance under 3D settings to changing how much graphical data it would cache
-ran memtest
-replaced my CPU (this was unrelated but is worth mentioning)

At various points, these appeared to produce "false positive" fixes, as the issue would subside and then reappear after several weeks. Eventually, spurred on by some corruption on my boot drive, I did a completely clean Windows install on a new SSD about 6 weeks ago. After doing this, the issue evolved somewhat.

In order to install Windows, I had to begin using a DisplayPort cable. (I know this makes absolutely no sense in isolation, but apparently there are some issues involving Windows 11 installation and CSM. Without CSM I was getting no signal until I used a DisplayPort cable, and I am apparently not the only one to have experienced this.) The DisplayPort cable showed very promising improvements, as I went several weeks without a crash, but then bizarre issues started emerging. Namely, on startup, when my PC should have displayed the Windows login screen, the screen was instead just grey (it did not lose signal - it was a "lit up" shade of grey), and once either
a.) the monitor went to sleep and was woken up or
b.) the DP cable was removed and reinserted
it would display the Windows login screen and proceed without issue. Since the display no longer failed to wake from sleep and the PC no longer crashed, I took this as a win overall.

Unfortunately, the latter two issues from my list above (losing display, with or without the GPU fans running at maximum speed) returned in exactly the same way. They were more consistent in that they only ever happened under some type of graphical strain, but they were still not reproducible at all. I essentially repeated every troubleshooting step on this new Windows install, and this past weekend I got my hands on a different monitor (I will note this is a smaller monitor, 1440p with a 120 Hz refresh rate) to see if my monitor was the issue.

Again, the new monitor seemed promising, as I no longer had the strange delay before the Windows login screen and did not experience crashes for a few days. Eventually, I had one crash occur. I had also tested my own monitor on other electronics and could not get it to cause any problems there, so today I switched back to my own monitor.

Instantly, the crashes became more frequent. I crashed once right after switching monitors, again in the middle of playing a game, and then once this morning while my PC was just idling. After that last crash, I can no longer get a signal on either monitor, regardless of what type of connection I use. I have reseated my GPU and my CMOS battery with no progress. My GPU lights up and its fans spin as normal. The VGA light on my motherboard is on (though when I was having issues installing Windows 11, this light would come on when there was no display connected at all, so I'm not sure how much it says about the GPU).

Has this just been the slow death of my GPU? Is there anything else worth considering, or do I have to replace my 4070 Ti? Could this still reasonably be another hardware component or a software issue?

Also, I will note here since I did not mention it elsewhere - no, my GPU is not connected to my PSU with a single daisy-chained connection.

Thank you for any help you can offer!
 
PSU: Corsair rm750x
How old is the PSU?

What BIOS version are you on for your motherboard? Speaking of motherboard, what board were you on prior to the swap?

As for the Windows install, did you recreate the installer to rule out any inherent corruption, and then install the OS in offline mode (without an internet connection)? If you're met with the account creation screen, follow through this guide and then install all relevant drivers for your platform with elevated privileges, i.e. right-click the installer > Run as administrator. You will need to download the latest versions of the drivers beforehand.
 
The PSU is about 10 months old.

I am currently on BIOS version 5021, the most recent one available. Prior to the swap, I was using an ASRock B450M/ac. The crashes I experienced with that old motherboard were completely different: the PC would shut itself down, and the crashes were reproducible. I posted about it here previously, which is how I identified that the motherboard was the issue.

I did recreate the installer and do a second install when I first experienced the delay before the login screen. I have not done a completely offline install, however; if I can get a signal to the monitor again, I will try.
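
Since the crashes described above never reproduced on demand, one thing that might help with an intermittent fault like this is keeping a small crash journal and tallying it by condition (monitor, cable, workload). Below is a minimal Python sketch of that idea. The names CrashNote and tally are made up for illustration, and the entries are hypothetical placeholders, not real crash records from this thread.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashNote:
    """One observed crash and the conditions at the time (hypothetical schema)."""
    monitor: str
    cable: str
    workload: str


def tally(notes, field):
    """Count how many crashes share each value of the given field."""
    return Counter(getattr(note, field) for note in notes)


# Placeholder entries for illustration only; replace with real observations.
notes = [
    CrashNote("4K 144 Hz", "HDMI", "gaming"),
    CrashNote("4K 144 Hz", "HDMI", "idle"),
    CrashNote("4K 144 Hz", "DisplayPort", "gaming"),
    CrashNote("1440p 120 Hz", "DisplayPort", "gaming"),
]

by_monitor = tally(notes, "monitor")
by_cable = tally(notes, "cable")
by_workload = tally(notes, "workload")

# Print each tally, most frequent condition first.
for label, counts in (("monitor", by_monitor), ("cable", by_cable), ("workload", by_workload)):
    print(label, counts.most_common())
```

A tally like this does not prove a cause, and raw counts are only meaningful relative to how long each configuration was in use, but a condition that shows up in nearly every entry is a reasonable candidate to swap out next.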