Question: Is it my GPU? Running out of ideas to troubleshoot game crashes

Status
Not open for further replies.

Sinistra03

Distinguished, joined Jun 7, 2013
I can't figure out why my video games keep crashing. I'm looking for advice based on the troubleshooting I've done so far.

CUSTOM PC SPECS
  • MSI Gaming GeForce RTX 4090 24GB GDDR6X 384-Bit HDMI/DP Nvlink Tri-Frozr 3 Ada Lovelace Architecture Graphics Card (RTX 4090 Gaming Trio 24G)
  • Intel Core i9-13900K Desktop Processor, 24 cores (8 P-cores + 16 E-cores) with Integrated Graphics - Unlocked
  • GIGABYTE Z790 AORUS Elite AX (LGA 1700/ Intel Z790/ ATX/ DDR5/ Quad M.2/ PCIe 5.0/ USB 3.2 Gen2X2 Type-C/Intel WiFi 6E/ 2.5GbE LAN/Q-Flash Plus/PCIe EZ-Latch/Gaming Motherboard)
  • GIGABYTE M32U 32" 144Hz 4K FreeSync Compatible Gaming Monitor, SS IPS, 3840x2160 Display, 1ms Response Time (MPRT), 1x Display Port 1.4, 2x HDMI 2.1, 3x USB 3.0, 1x USB Type C,BLACK
  • G.SKILL Ripjaws S5 Series (Intel XMP 3.0) DDR5 RAM 32GB (2x16GB) 6000MT/s CL30-40-40-96 1.35V Desktop Computer Memory UDIMM - Matte Black (F5-6000J3040F16GA2-RS5K)
  • Cable Matters 3-Pack 8K HDMI Cable, 48Gbps, 6.6 ft, Supports 8K@60Hz, 4K@240Hz, HDR - For PS5, Xbox Series X/S, RTX3080/3090, RX 6800/6900, Apple TV
  • PSU EVGA Supernova 1600 P+, 80+ Platinum 1600W, Fully Modular, 10 Year Warranty, Includes Free Power On Self Tester, Power Supply 220-PP-1600-X1
  • SAMSUNG 980 PRO SSD 2TB PCIe NVMe Gen 4 Gaming M.2 Internal Solid State Drive Memory Card + 2mo Adobe CC Photography, Maximum Speed, Thermal Control MZ-V8P2T0B/AM
  • Noctua NH-D15 chromax.Black, Dual-Tower CPU Cooler (140mm, Black)
  • Windows 11
  • Backup monitor: ASUS VG248QE 24" Full HD 1920x1080 144Hz 1ms HDMI Gaming Monitor.
  • Backup graphics card: Nvidia GTX 1080, which I've been troubleshooting with.
  • Built the PC in March 2023, all components new. I'm an amateur PC builder; this is only my 2nd build.
  • I didn't overclock the GPU at the time, but I remember disabling FPS caps for the GPU somewhere and tweaking some memory settings after reading optimization articles for several hours. The system was stable, with no cores overheating. It ran everything excellently at max settings, 4K 144 Hz, for 10 months on a single monitor.
  • About 20-30 days ago I started noticing performance issues playing Heroes of the Storm (HOTS). Games randomly crashed with fatal error exceptions, usually in between matches. I kept troubleshooting and playing over the next few weeks, since the problem was intermittent, and then I had my first-ever blue screen. I also played a fair amount of Stardew Valley (low graphics requirements) and never had crashes there. Mostly it's HOTS or other games crashing to desktop, and it happens more often in between matches or when the next HOTS match is launching right after the previous one ended. Odd. Eventually I got a BSOD and the system crashed. I started troubleshooting seriously at that point but couldn't figure it out. I put up with intermittent crashes while trying to play, and almost brought the PC to a local repair shop last week, but held off because it wasn't crashing that often.
  • I realized the GPU actually wasn't even recognized by the PC; it didn't show up in Device Manager. I'm unsure how long I had been playing without the GPU recognized, but I didn't notice the game graphics looking any different.
  • I suspect something is amiss with the GPU drivers (12/12/2023, NVIDIA 546.33). When I went to update the drivers, the update kept failing. I couldn't get the GPU recognized until I completely uninstalled the old drivers with the DDU program and then reinstalled the new drivers. The GPU is now recognized, but I still have the same crashing problems.
  • Over the last 7 days, crashing became more and more frequent, until now I can't even launch Heroes of the Storm anymore; it usually just crashes right away.
  • I spent about 8 hours troubleshooting on Sunday 1.14.2024.
  • I used DDU to uninstall and then reinstalled the Game Ready drivers. Later I did a clean install of the 546.33 Game Ready drivers a second time, with no improvement.
  • Swapped graphics cards: the 4090 came out, and in went my old GTX 1080, which I know functions normally. (This created a new problem: for some reason my M32U monitor wouldn't display with the GTX 1080 installed, so I used a different monitor.) The system appeared stable with the GTX 1080 and the old ASUS VG248QE monitor, and I had no crashes for about an hour across a few different games. So now I'm trying to rule out: GPU problem or monitor problem? Side quest, troubleshooting the M32U not displaying: I connected the M32U to a different PC that I know works fine, and the monitor turned on fine. So I put the 4090 back in my case and plugged the M32U back into it, and the monitor boots up again like it used to. I'm perplexed as to why the M32U briefly wasn't turning on, but I'm focused on resolving the repeated game crashes, which happen regardless of whether I'm using the new or old monitor. With the 4090 and M32U back together I still get game crashes, so I'm back to square one.
  • Tried a DisplayPort cable instead of HDMI (so different ports on both the GPU and the monitor as well): still the same crashing problem. I didn't try a new HDMI cable, but the ones I have are HDMI 2.1 cables.
  • Tried reducing the refresh rate in display settings to 60 Hz instead of 144 Hz. Still crashes.
  • It doesn't crash nearly as often outside of games, but it has happened.
  • Reset BIOS to defaults.
  • Ran the Windows Memory Diagnostic tool; no errors found.
  • Tried disabling Discord, Steam, and other overlays.
  • Tried limiting other background processes.
  • Reseated the GPU, twice, and the RAM. Checked GPU cabling.
  • Read about 4090 power adapter problems.
  • Read about M32U monitor to 4090 GPU potential compatibility problems.
  • Fans seem to be functioning normally: CPU heatsink, GPU, PSU, and case fans.
  • Screen sometimes goes black for a moment, then back to normal.
  • I've also seen the screen flicker FULL WHITE multiple times, and had 3-4 BSODs, some with unusual screen coloring.
  • Played an entire game of Heroes of the Storm trying to crash the system; right after I quit the program, blue screen of death.
  • Chrome started showing STATUS_ACCESS_VIOLATION errors (twice) while I was drafting this email. I've never seen that before.
  • CoreTemp 1.18 doesn't seem to show any cores overheating, and overheating seems physically unlikely, but I don't know at this point.
  • Used DDU to uninstall the NVIDIA Game Ready drivers and attempted to install the latest Studio drivers instead. The NVIDIA INSTALLER FAILS to install the Studio drivers after I select Advanced > Custom > Clean install. Great. Tried launching the driver .exe file a second time; it fails again. Used DDU to remove the leftover cruft. Let's try again with an older driver version, maybe?
  • Trying GeForce Game Ready Driver version 546.17 WHQL (release date 2023.11.14; operating system: Windows 10 64-bit, Windows 11).
    Express install works, and the 4090 appears under Display adapters in Device Manager. Time to stress test again. Opening HOTS, still using the old VG248QE monitor. Odd: the Battle.net launcher says "reclaiming space - updating HOTS" when I go to launch the game, so I had to launch it from app search instead of via Battle.net.
  • Closed HOTS; Battle.net doesn't load properly. Reloaded, relaunched HOTS: unexpected fatal error, more crashes. OKAY, so maybe drivers are not the issue; changing them doesn't seem to help. GPU problem or PSU problem?
IDEAS I AM CONSIDERING NEXT
  • Studio driver instead of Game Ready driver? Unable to test so far because the install failed. Could try again.
  • Likely a GPU problem? How do I do a warranty return?
PROBLEM LIST
  • GPU hardware problem vs GPU drivers problem?
  • PSU problems? I don't have the tools to diagnose them.
  • Bad memory? Not sure this belongs on my differential diagnosis list, but memory and memory troubleshooting are what I understand least.

APPS
  • CoreTemp
  • HWInfo
  • DDU
If you're reading this, I've exhausted my troubleshooting ability! Thank you for any help!
 
One immediate thought: discontinue any "automated" installers.

Do the driver reinstalls manually: go to the applicable manufacturer's website to download, install, and configure the drivers. No third-party installers or tools. You have no control over what those third-party applications may be doing, or how.

Also: look in Reliability History/Monitor and Event Viewer for error codes, warnings, and even informational events that were captured just before or at the time of the crashes.

Reliability History is much more end user friendly and the timeline format can reveal patterns.

Event Viewer is more complex and thus requires more time and effort. To help:

How To - How to use Windows 10 Event Viewer | Tom's Hardware Forum (tomshardware.com)

Hopefully there will indeed be some error codes and details that will help troubleshoot the crashing problems.
 
@Ralston18 Thank you for weighing in.

Do you mean don't use DDU? Or don't use auto-installers for GPU drivers? I have always gone directly to the NVIDIA website to download the 4090 drivers.
Hm, Reliability History/Monitor and Event Viewer are new to me; I will read up.

Reviewed Reliability Monitor for errors, uninstalled the Gigabyte Control Center (GCC) app, looked into Antimalware Service Executable (Windows Defender), and reset Windows Defender following this guide for Windows 11:
https://www.youtube.com/watch?v=tynCmVNKUOg&ab_channel=GeekerMag
Malwarebytes scan does not reveal any malware.
Reinstall GCC, reset it to defaults, then uninstall again? I installed GCC and the updates it asked me to install, which I remember also doing when I first set up this computer.
Uninstalled GCC; everything feels smoother. Switching back to the M32U monitor, still at 60 Hz.

Summary: I found 2 pins that were not fully seated in one of the three PCI Express power cables, at the end where it connects to my PSU. I moved that cable to a different socket on the PSU entirely, with all pins fully seated.
I also removed Gigabyte Control Center from the computer completely.
It's like night and day so far, no crashes. Will update if more problems!!!
 
I am not one for using DDU, auto-installers, and other similar tools for the most part.

You have little or no control over what they are doing or may be trying to do. Some include add-ins that are not actually necessary. Probably unwanted as well....

Plus the tool could be corrupted, buggy, or out-of-date. And one must be very careful about the source website(s). Just because a manufacturer's name appears in the website pathname (URL) does not mean that the website is the real manufacturer.

Simply take a few extra minutes to do some additional research to validate any given tool or utility.
 
First thought is the evil that is Win10/11 being bloated and having fits.
Second obvious thought is something is dying.


Seeing as you have fixed the problem for now (btw, mark it as solved :] ), the above posts set you up for a bit of preventative maintenance, but I'd go further.

Bracing for the tirade of people ranting about security issues: I disable all Windows automatic updates immediately, and only ever manually install KB updates if they are required for specific functions or critical issues. As suggested above, auto-updates for everything else are generally evil too. Manually update your AV/AS software once a week, and only do driver updates cautiously. If it ain't broke...

Generally, any performance gains from driver updates are very game-specific; the recent Cyberpunk update was a standout example. Otherwise the gains are marginal to the point of being unnoticeable. Sometimes new feature sets are introduced, such as RTX support for games X, Y, or Z, so for those you just need to keep an eye out yourself.

You also seem to have cleared most of the clobber already, but you can add things like GeForce Experience (unless you specifically use its functions) to that list. Carefully go through every application and feature set currently installed on your PC and ruthlessly rip out anything you haven't used recently, don't recognize, or never will use. It is not a big deal to install an app for a single use once every couple of months, such as a specific editing tool.

Google is your friend, but Tom's has many articles on it, as does MajorGeeks (along with piles of tools), for optimising Win10/11 and stripping out evil things like Indexing (and replacing it with Everything, for example) that just eat up resources and, over time, bog down the PC with collywobble and excess Windows tomfoolery (system restore points are another). This includes disabling and limiting certain Windows services, things like (I mean really) Windows Phone, Tablet, Biometric (unless it's a laptop with a fingerprint reader you actually use...), Shadow Copy, and possibly even Print Spooler.

It really is a fine art, and it is of course YOUR PC, so do as you wish after reading as much as is available.

To close: if your issue was indeed the power connectors not being seated correctly, you may have a damaged unit on hand. That will become very apparent after a burn test. Download a benchmark/stress-test tool and leave it running for an afternoon; if nothing evil happens... woohoo!
 
It's not a GPU issue, it's a 13th-gen Intel CPU issue. You have to underclock it by a few hundred MHz. Test it: just set it to 5.2 GHz max and games will run again with no issues.

Btw, the issue is not with the Intel CPUs... my system was running fine until some update.
 