Question: Random crashes, no display after reboots, potential VRAM/driver instability?

CiaranM578

Feb 6, 2023
I bought a used MSI RTX 3080 Ti Trio on eBay from a reputable seller some months ago. The card arrived in excellent condition and working order and gave zero issues until a couple of days ago. I was playing when I got my first crash: the screen flickered, the game crashed, the screen went black, and the PC rebooted to Windows. This happened a couple more times when I retried the game, until eventually after a crash the display just stayed off and the card gave no output. I put a different GPU in the system to boot into Safe Mode and uninstall the drivers using DDU, then put the original card back in and installed new drivers.


The GPU initially worked after reinstalling drivers, with several hours of gaming and idle stability. After leaving the PC off for ~20 hours, I started a game and returned to find the system had crashed. Upon reboot, there was no display output, and testing confirmed the issue is isolated to the GPU (motherboard, PSU, and other components were ruled out using another GPU).

A second attempt to reinstall the drivers, after a clean removal with DDU in Safe Mode, resulted in a blue screen during installation with the error "Attempted to write to a read-only segment." After that crash, the system rebooted without a display.

A third attempt at cleaning and reinstalling the drivers worked with no issues (the first time I installed the most up-to-date driver, so this time I tried a slightly older one), but the system crashed when put under a FurMark stress test.

Additionally, after I installed MemTestG80, artifacts such as flickering and black squares briefly appeared while the system was idle, before MemTestG80 had even been opened for the first time and even without drivers installed.

* Tested the GPU in another monitor and cable setup: still no display output.
* Another GPU works flawlessly in the same system, ruling out issues with the motherboard, PSU, or display.
Driver Diagnostics:
* NVIDIA drivers installed successfully after a clean removal via DDU, but crashes persisted during stress or testing.
VRAM Diagnostics:
* Ran MemTestG80 to test VRAM but encountered crashes and app closure during testing.
* Artifacts (flickering, black squares) suggest potential VRAM instability or hardware-level faults.
System/Power Monitoring:
* Power delivery to the GPU monitored via MSI Afterburner during the stable period; no anomalies detected.
* Temperatures were within safe operating ranges under load (no signs of overheating or throttling).
RAM and CPU Stability:
* System RAM passed MemTest86 with no issues.
* No signs of CPU-related instability or thermal issues.
Any ideas what’s going on here? Pretty expensive card to be failing on me…


P.S. Some extra info: when analysing the crash dumps in WhoCrashed, it shows NVIDIA kernel driver faults.

Thanks!
 
You can try underclocking the memory to see if it improves, but if one of the chips is failing, there is not much you can do.

Just because some of the GPU sensors are returning good temperatures doesn't mean every part of the card is fine. It may be worth taking the card apart, making sure all the memory has good contact with the thermal pads, and redoing the paste.

Also do a visual inspection for PCB cracks and the like.

You can potentially get it repaired, but it will cost a few hundred, if that is worth it to you.
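If you want a record of what the exposed sensors were reporting right before a crash, a small logger can help. This is only a minimal sketch, assuming `nvidia-smi` is installed and on the PATH; the function names and the `gpu_log.csv` file name are made up for illustration. Note that on consumer cards the memory junction temperature may not be available through this interface, so a clean log here does not rule out hot VRAM.

```python
import csv
import os
import subprocess
import time

# Fields requested from nvidia-smi; all are standard --query-gpu properties.
FIELDS = ["timestamp", "power.draw", "temperature.gpu", "clocks.gr", "clocks.mem"]


def parse_sample(line):
    """Turn one CSV line of nvidia-smi output into a dict keyed by field name."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError("unexpected nvidia-smi output: " + line)
    return dict(zip(FIELDS, values))


def read_sample():
    """Query the first GPU once and return the parsed readings."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])


def log_forever(path="gpu_log.csv", interval_s=1.0):
    """Append one reading per interval until interrupted (hypothetical helper)."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        while True:
            writer.writerow(read_sample())
            # Push each row to disk so the last readings survive a hard crash.
            f.flush()
            os.fsync(f.fileno())
            time.sleep(interval_s)
```

Start `log_forever()` in a terminal before launching the game or stress test, then look at the last few rows of the file after the next crash.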
 
When posting a troubleshooting thread, it's customary to include your full system specs. Please list the specs for your build like so:
CPU:
CPU cooler:
Motherboard:
Ram:
SSD/HDD:
GPU:
PSU:
Chassis:
OS:
Monitor:
Include the age of the PSU in addition to its make and model, and the current BIOS version for your motherboard.

Upon reboot, there was no display output, and testing confirmed the issue is isolated to the GPU (motherboard, PSU, and other components were ruled out using another GPU).
You might also want to mention the GPU used to troubleshoot your system. Be aware that an RTX 3080 Ti-equipped system would need an 850W+ PSU due to transient load spikes.

Any ideas what’s going on here? Pretty expensive card to be failing on me…
It's possible that the card conked out after being in someone else's hands prior to your purchase.
 
I haven’t tried underclocking the memory yet; I will try that today, thanks. If all else fails I will probably just pay for a repair, as a couple hundred is better than paying 600+ for a new one…
 
PC Specs
Mobo: Gigabyte B450M DS3H
CPU: Ryzen 5 3600
GPU: MSI RTX 3080 Ti Trio
RAM: 16GB
PSU: Seasonic g80 750w (brand new)

Spare GPU is a GTX 1050 Ti
 
I was going to say, an RTX 3080 Ti would need an 850W ATX 3.0 or a 1000W ATX 2.0 PSU for transient load spikes.
 
Power could definitely be an issue, although the fact that it worked for so long without any issues makes that seem strange. The Ryzen 5 shouldn’t require much power either, so you’d think it would leave enough for the GPU, but it’s still possible. Online sources say 750W should be good for this combo, but like you said, spikes could require more.
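To put rough numbers on that reasoning, here is a back-of-the-envelope sketch. Every figure in it is an assumption rather than a measurement from this system: 350W is the reference 3080 Ti board power (the Trio's own limit may differ), 88W is the stock package power limit of a Ryzen 5 3600, 75W for everything else is a guess, and the 2x spike factor is purely illustrative. Whether a real PSU trips also depends on how long the spike lasts and how its protection is tuned.

```python
def psu_headroom(psu_w, gpu_w, cpu_w, rest_w=75.0, spike_factor=2.0):
    """Compare sustained and short-spike draw against the PSU rating.

    All inputs are watts. spike_factor is a hypothetical multiplier applied
    to the GPU only, standing in for brief transient load spikes.
    """
    sustained = gpu_w + cpu_w + rest_w
    transient = gpu_w * spike_factor + cpu_w + rest_w
    return {
        "sustained_w": sustained,
        "transient_w": transient,
        "sustained_margin_w": psu_w - sustained,
        "transient_margin_w": psu_w - transient,
    }


# Assumed figures for this build (see the caveats above).
budget = psu_headroom(psu_w=750.0, gpu_w=350.0, cpu_w=88.0)
for name, watts in budget.items():
    print(f"{name}: {watts:.0f} W")
```

With these assumed inputs the sustained load fits comfortably while the spike case overshoots the rating, which would be consistent with a system that runs fine most of the time and only falls over in particular scenes. It does not prove power is the cause here.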
 
You are trying to run a custom 3080 Ti on a 750W PSU. From my personal experience, I can tell you this won't fly.

When I bought a 3080 Ti Aorus Master, I also ran it on a 750W Seasonic Prime PSU, which was as premium as it got at the time. 99% of the time it ran okay, but there was a specific game with a specific scene where it crashed about 90% of the time.

After I upgraded the PSU to 1000W, this never happened again. I am certain you are experiencing this exact issue: you finally found a game where, for some reason, a power spike happens and results in a hard crash that requires a reboot.
 
Just an update: I managed to keep the card stable long enough to undervolt, underclock, and try to reduce power usage, but when I run a stress test (FurMark) it crashes after a couple of seconds, unfortunately…
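For anyone repeating this test, one way to make the "reduce power usage" step systematic is to walk the board power limit down in fixed steps and rerun the same workload at each one. A minimal sketch, assuming `nvidia-smi` is on the PATH and the script runs with administrator rights; the helper names and the example wattages are hypothetical, and the card's real minimum and default limits should be read from `nvidia-smi -q -d POWER` first.

```python
import subprocess


def candidate_limits(default_w, min_w, step_w=25):
    """Descending power limits to try, from the default down to the minimum."""
    if step_w <= 0 or min_w > default_w:
        raise ValueError("need step_w > 0 and min_w <= default_w")
    limits = list(range(int(default_w), int(min_w) - 1, -int(step_w)))
    # Always finish on the card's minimum, even if the step does not land on it.
    if limits[-1] != int(min_w):
        limits.append(int(min_w))
    return limits


def set_power_limit(watts):
    """Apply a board power limit with nvidia-smi's -pl option."""
    subprocess.run(["nvidia-smi", "-pl", str(watts)], check=True)


# Example values only; substitute the limits your card actually reports.
print(candidate_limits(default_w=350, min_w=250, step_w=50))
```

If the card still crashes within seconds at its lowest limit, that points away from power draw and toward a fault on the card itself.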
 
FurMark is quite punishing; I would try a game or a simpler benchmark to see if it is stable under more normal conditions. If power is the problem, then that means buying or borrowing a PSU, or having the GPU tested somewhere else.

Regarding the post above: the PRIME PSUs available before the launch of the 30 series were well known to have issues with power spikes. That is basically why ATX 3.0/3.1 was created, since even many older high-end PSUs would still trip overcurrent protection with the 30-series cards.