[SOLVED] Mysterious GPU Spikes Leading to Soft and Hard Crashes

Mar 16, 2020
Hi,
I'm new to this forum, despite having lurked for some years, and I'm new to owning a PC, sort of. I've never built a custom PC before, nor have I had a machine at home (desktop or otherwise) that could handle much beyond office work and web browsing.

Before I go into it, I'll describe my build:

AMD Ryzen 5 3600X 3.8GHz 6-core processor
MSI B450 Tomahawk Max motherboard
Corsair Vengeance LPX 16GB (two sticks) DDR4-3000
Seagate Barracuda 2TB 3.5" 7200RPM internal hard drive
Western Digital Blue SN550 500GB M.2 drive
Nvidia GeForce RTX 2070 Super 8GB
Corsair iCUE 465X ATX mid-tower case
and a Corsair RMx 650 W 80+ Gold Certified Fully Modular ATX Power Supply

I've tried quite a few things: clean installs of new drivers, re-installing Windows, and checking the power connections at both the GPU and the PSU. I've re-installed Steam and re-installed games, and I've been monitoring temps, which all look healthy. It actually seems to crash when playing older games (TF2, Max Payne 3, XCOM Enemy Unknown). I can't say I've seen a game from this year or last crash, but that could be a coincidence of my playing habits and a red herring.

Most of what I get are soft crashes to desktop, and those are very common. Sometimes a game crashes at boot, sometimes a game runs for hours without issue, and there's a lot in between.

BSODs were more common early on, but last night I got my first one in a week or two. I've gotten SYSTEM_SERVICE_EXCEPTION, IRQL_NOT_LESS_OR_EQUAL, and last night PFN_LIST_CORRUPT. It's been all over the board. At this point I'm starting to suspect that my GPU is faulty, and if it is, I'd like to return and exchange it sooner rather than later.

I do have a few of the blue screen dumps and have also gotten some info from LatencyMon, which tends to cite issues with dxgkrnl.sys and storport.sys.

Honestly, at this point it's too much information for me to parse as a novice, and it isn't like I once had a perfectly fine PC and then something changed. This PC was only built a month ago, so it's hard for me to pinpoint. If someone would like the reports from BlueScreenView or LatencyMon, feel free to ask and I can post them. I know this forum is full of one-time users with problems, and I want to respect you people as a new user, so I didn't want to make my first post here 3000 words with a giant LatencyMon report. Thanks so much, in any case.
 
Look in Reliability History and Event Viewer for error codes, warnings, and even informational events that correlate with the crashes.

My focus would start with the PSU. How old and in what condition? Lots of gaming or even mining?
The PSU is brand new. Every aspect of my PC build (that I built last month) is brand new. Which makes it a little tougher for me to test components out one-by-one since I can't swap out stuff I have lying around, because I don't have anything else lying around. No mining, some gaming--what I can pull off before a crash anyway. Shorter sessions, maybe two or three times a week. Typically older games, or less needy games anyway.

I can see some stuff in Event Viewer and my Reliability History. How can I make better sense of it? Here's a link to Imgur with snippets of the windows in both programs. Does this seem like an abnormal amount of stuff going wrong? Thank you again for your help.
 
A variety of error codes is generally a clue that the PSU is at fault, especially alongside events such as "Windows was not properly shut down".

[Note: you can right click on error codes to get more information/technical details. Not always meaningful or useful but the details are available.]
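One way to make sense of a long event list is to keep only the entries logged within a couple of minutes of each crash. Here is a minimal Python sketch of that idea. The sample records and crash times are hypothetical and typed in by hand for illustration; the event IDs are standard Windows ones, but the function name `events_near` and the two-minute window are my own assumptions, not anything Event Viewer provides.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical sample records (timestamp, source, event ID), for illustration only.
# Real entries would be copied out of Event Viewer or Reliability History.
EVENTS = [
    ("2020-03-15 21:02:10", "Display", 4101),            # display driver stopped responding
    ("2020-03-15 21:02:14", "Application Error", 1000),  # application crash
    ("2020-03-15 23:40:55", "Kernel-Power", 41),         # rebooted without a clean shutdown
    ("2020-03-15 23:41:30", "BugCheck", 1001),           # blue screen recorded
    ("2020-03-16 09:15:00", "Service Control Manager", 7036),  # routine informational event
]

# Hypothetical times at which a crash was noticed.
CRASHES = ["2020-03-15 21:02:15", "2020-03-15 23:41:00"]


def events_near(crash_time, events, window_minutes=2):
    """Return (source, event_id) pairs logged within the window around crash_time."""
    crash = datetime.strptime(crash_time, FMT)
    window = timedelta(minutes=window_minutes)
    hits = []
    for stamp, source, event_id in events:
        when = datetime.strptime(stamp, FMT)
        if abs(when - crash) <= window:
            hits.append((source, event_id))
    return hits


if __name__ == "__main__":
    for crash_time in CRASHES:
        print(crash_time, events_near(crash_time, EVENTS))
```

Events that keep turning up next to crashes are the ones worth right-clicking for details; routine informational entries far from any crash can usually be ignored.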

Even new PSUs can be faulty.

However, being a new build there are some things to check:

Power down, unplug, and open the case.

Double-check that all cards, cables, jumpers, and RAM are fully and firmly in place, with everything seated straight and square.

Use a bright flashlight and even a magnifying glass to check the connections and seatings.

Making connections in a new build can be difficult. Fits can be stiff/tight. No one wants to force anything - especially with a new build.

Plus, using the computer makes everything heat up and cool down, so parts expand and contract. Over time that can let components and connections creep loose.
 
No problem, I'll do this later in the afternoon when I've wrapped up work. I was curious, though--if I do have a faulty PSU, would it be weird that I only have problems while gaming? This is my work computer that I use all day, and it's never so much as hiccuped while working.
 
Gaming demands more power.

If the PSU is continually running near its rated output and/or nearing its designed EOL (End of Life), then problems can occur.

Mining and high-end graphics work will do much the same.

You can observe what your system is doing via Task Manager or Resource Monitor.

Use one or the other (not both together) to watch what is happening when the computer is idling, doing light work, online browsing, office work and gaming.
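To put rough numbers on "gaming demands more power", here is a back-of-the-envelope sketch in Python for the build listed in the first post. The CPU and GPU figures are the manufacturers' rated values, which are not the same as measured peak draw. The allowance for everything else and the light-load fractions are assumptions of mine, so treat the result as an estimate only.

```python
# Manufacturer ratings for the parts listed in the first post.
CPU_TDP_W = 95           # AMD Ryzen 5 3600X rated TDP
GPU_BOARD_POWER_W = 215  # Nvidia reference board power for the RTX 2070 Super
PSU_RATED_W = 650        # Corsair RMx 650 W

# Assumed allowance, not a measurement: motherboard, RAM, drives, fans, lighting.
OTHER_COMPONENTS_W = 75


def estimate_load(cpu_fraction, gpu_fraction):
    """Estimate system draw in watts for the given share of rated CPU/GPU power."""
    return (CPU_TDP_W * cpu_fraction
            + GPU_BOARD_POWER_W * gpu_fraction
            + OTHER_COMPONENTS_W)


def psu_utilization(load_w):
    """Return load as a fraction of the PSU's rated output."""
    return load_w / PSU_RATED_W


if __name__ == "__main__":
    # Assumed fractions: office work barely touches the GPU, gaming loads both.
    for label, cpu, gpu in [("office work", 0.3, 0.1), ("gaming", 1.0, 1.0)]:
        load = estimate_load(cpu, gpu)
        print(f"{label}: ~{load:.0f} W, {psu_utilization(load):.0%} of PSU rating")
```

On paper the gaming estimate sits well inside a 650 W rating, which is why a healthy unit should cope. A defective unit can still misbehave under the higher load even though it is fine at desktop levels, and short transient spikes are not captured by rated figures like these.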
 
I'm having a hard time pinpointing it, but today I re-checked all the connections and seatings on my motherboard, right down to the CPU. I ran DDU and cleaned out my old GPU drivers, ran Windows in safe mode to make sure the drivers were right and nothing weird was happening, and I still get intermittent soft crashes.

I'm planning on exchanging my PSU; I'll be making that call in the morning.

One thing that I hadn't been concerned about but figured I might as well ask: is it weird that I have this little extra two-slot cable hanging off the cables that go into my GPU? Could this be an incorrect cable, or be causing a power issue? Here's a link to a close-up of the orphan cable, and some other photos.
 
Extra two-slot cable; not weird.

The "extra" two pins are there so the same plug can be installed in either a 6-pin or an 8-pin power connection.

General reference:

https://graphicscardhub.com/graphics-card-pcie-power-connectors/

Refer to your GPU and PSU documentation to confirm.
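For a sense of why both socket sizes exist, here is a small Python sketch using the standard PCIe power limits (75 W from the slot, 75 W per 6-pin, 150 W per 8-pin). The function names are made up for illustration, and the 6-pin plus 8-pin layout in the example is an assumption; check what your own card actually has.

```python
# Standard PCIe power delivery limits, in watts.
SLOT_W = 75
CONNECTOR_W = {6: 75, 8: 150}


def max_board_power(connectors):
    """Maximum in-spec power for a card with the given auxiliary sockets.

    `connectors` is a list of socket sizes, e.g. [6, 8].
    """
    return SLOT_W + sum(CONNECTOR_W[pins] for pins in connectors)


def six_plus_two_usage(socket_pins):
    """Return (pins plugged in, pins left hanging) for a 6+2 plug in a socket."""
    if socket_pins == 8:
        return (8, 0)   # clip the extra 2 pins alongside the 6 and insert both
    if socket_pins == 6:
        return (6, 2)   # the extra 2 pins simply hang unused
    raise ValueError("PCIe auxiliary sockets are 6-pin or 8-pin")


if __name__ == "__main__":
    # Assumed layout for illustration: one 6-pin and one 8-pin socket.
    print(max_board_power([6, 8]), "W available in spec")
    print(six_plus_two_usage(6), "-> a loose 2-pin piece on a 6-pin socket is expected")
```

So a dangling 2-pin piece next to a 6-pin socket is normal, not a sign of a wrong cable.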

And you mentioned DDU.

Do the driver installations yourself - no third party utilities/tools.

Go to the manufacturer's website and download the applicable drivers directly. Install and configure them per the GPU's User Guide/Manual.
 
Solution