Question GPU Crash Black Screen Fans Max Audio Plays - Vega 64

Dec 4, 2019
I built a PC over the summer using all new parts and have been having the issue described in the title since pretty much the start. I have tried a few fixes and otherwise just put up with it, but nothing has helped. The crashes don't happen at a regular time and aren't limited to a single use: they happen in CPU-intensive games, GPU-intensive games and even while running crypto mining software. Another potentially useful detail is that the power LED indicators on the GPU all turn off and the VGA EZ DEBUG LED on the motherboard lights up when the crash occurs. Very rarely I get a blue screen with the "thread stuck in device driver" error message, if that gives any indication of the issue. The fan does not go to max during BSODs, but during the typical crash it goes to 100% most of the time.

My specs are:
Ryzen 5 3600 - Stock cooler
MSI Vega 64 Air Boost
Corsair Vengeance LPX 2x8GB 3000MHz C15 RAM
Intel 660p 1TB NVMe SSD - Boot Drive
MSI B450 Gaming Plus
Corsair CX750M 750W Bronze PSU - Grey label version
Windows 10

Some of the things I have tried in order to fix the issue or find the cause are (in no particular order, just trying to remember them all):
Updated/rolled back graphics drivers, including using DDU and safe mode for this
Increased/decreased the power limit of the GPU using Wattman and Afterburner
Updated Windows
Updated/rolled back the BIOS (limited to the first BIOS that supports Ryzen 3000)
Increased/decreased RAM frequency/timings
Tried with XMP and without XMP
Ran memtest86 multiple times with no errors
Ran prime95 for multiple hours with no issues
Moved GPU to different PCIe slot
Checked temperatures during use and nothing abnormal
Took side panel off and turned a fan on to cool parts more just in case
Used sfc and dism commands to check Windows files (at first there were corruptions, but now it comes up clean and the issue persists)
Used Windows install media to repair the installation
Checked PSU voltages using BIOS/HWiNFO64 and all seem within reason
Checked Event Viewer for any details; the only entry is the one about Windows not shutting down properly, which comes from having to hold the power button to turn the PC off
Tried daisy chain and separate power cables to GPU
Inspected parts for physical damage
Cleaned dust on parts
Ensured parts seated correctly
Fresh Windows 10 install
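
For reference, "within reason" for the PSU voltages in the list above can be given a number: the ATX specification allows roughly plus or minus 5% on the +12V, +5V and +3.3V rails. The following is only a minimal Python sketch of that check. The sample readings are hypothetical placeholders, not values measured on this system, and software readings from BIOS/HWiNFO64 are approximate compared with a multimeter.

```python
# Sketch: check PSU rail readings against the ATX +/-5% tolerance.
# The sample readings below are made-up placeholders, not values from this system.

NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05  # ATX allows about +/-5% on the positive rails


def rail_limits(nominal, tolerance=TOLERANCE):
    """Return the (low, high) acceptable voltage for a rail."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def check_rails(readings):
    """Map each rail name to True if its reading is inside the tolerance band."""
    results = {}
    for rail, volts in readings.items():
        low, high = rail_limits(NOMINAL_RAILS[rail])
        results[rail] = low <= volts <= high
    return results


if __name__ == "__main__":
    sample = {"+12V": 11.9, "+5V": 5.04, "+3.3V": 3.1}  # hypothetical readings
    for rail, ok in check_rails(sample).items():
        print(rail, "OK" if ok else "out of spec")
```

A reading that passes this check at idle can still sag under a sudden GPU load spike, which a slow software sensor may not catch.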

The performance is as expected outside of this issue and there is no physical warning before the crash. Because of some of the things listed above, I don't think it is a hardware issue, but I can't rule it out. I don't have any spare hardware to try out, and I need the GPU to get a display output, so it has to stay in the system. I would prefer to avoid getting replacement parts because I am at university and need my computer, but if push comes to shove it is possible as a last resort.

I hope that gives a good overview of my issue. If any additional tests can be done or more information is needed, just let me know and I will see what I can do. Any help or advice would be much appreciated, as I have no idea what else can be done.
 
Dec 4, 2019
I would start by using a hardware monitoring app to check your voltages while you game. My second troubleshooting move would be to uninstall the GPU drivers and reinstall them. https://www.amd.com/en/support/grap...eries/radeon-rx-vega-series/radeon-rx-vega-64 If you are using the December 2nd drivers then I would try the September 23rd drivers, and vice versa.
Thanks for your reply.

I have tried using various versions of the drivers (including trying DDU to remove other versions completely) and the issue still persists.

In game, the GPU is usually in power state 6, so the voltage should be 1.150V at stock settings according to Wattman and Afterburner, but the GPU core voltage I saw using RTSS/HWiNFO64 was about 1.050V. The GPU memory voltage appears to be stuck at 1.356V, which seems very high, and it does not change even when I manually change the voltage in Wattman. I'm not sure if this is some sort of bug with HWiNFO64 or if it is actually that high. There wasn't anything drastically different on the on-screen display just before the crash.

Another thing I tried was installing Windows onto an external hard drive and booting from that, with the same result, so the SSD seems fine.

Would it be some software issue or hardware issue assuming the readings are correct?
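
One way to compare the logged voltages with the expected ones is to let the monitoring tool write a sensor log and look at the last samples before the crash. Below is a minimal Python sketch of that idea. It assumes the log has been saved as plain CSV with one row per sample, and the column names `GPU Core Voltage [V]` and `GPU Memory Voltage [V]` are hypothetical; the real headers depend on the tool and the card.

```python
import csv
import io

# Assumed column names; real sensor logs use headers that vary per system.
CORE_V = "GPU Core Voltage [V]"
MEM_V = "GPU Memory Voltage [V]"


def last_samples(csv_text, count=5):
    """Return the last `count` rows of a sensor log as dicts."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return rows[-count:]


def voltage_gap(rows, column, expected):
    """Return expected minus the average logged voltage, or None if no data."""
    values = [float(r[column]) for r in rows if r.get(column)]
    if not values:
        return None
    return expected - sum(values) / len(values)


if __name__ == "__main__":
    # Placeholder log using the figures quoted in the post above.
    log = (
        "Time,GPU Core Voltage [V],GPU Memory Voltage [V]\n"
        "20:01:01,1.050,1.356\n"
        "20:01:02,1.050,1.356\n"
        "20:01:03,1.050,1.356\n"
    )
    tail = last_samples(log, count=2)
    print("core gap vs 1.150V:", voltage_gap(tail, CORE_V, 1.150))
    print("last memory voltage:", tail[-1][MEM_V])
```

A steady gap between the set and the logged core voltage is not a fault by itself, since the reported value can reflect load-dependent droop, so this only helps to spot a change in the final seconds before a crash.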
 
Dec 4, 2019
When you built the PC, did you download the IDE/SATA, chipset, audio and LAN drivers from the motherboard website?
Yes, I installed all the drivers from the motherboard vendor's website and also the NVMe drivers for the SSD from Intel. I did a clean install of the chipset drivers, as there has been an update since then, but the crashes still happen.
The only thing that seems to delay the crashes at the moment is turning the quality down and capping the FPS so the GPU is not being fully used, if that helps.
 
Dec 4, 2019
Try uninstalling all overclocking apps for the GPU. Then uninstall the GPU driver once again and reinstall it. See if the problem goes away with the GPU running at its normal settings.
I uninstalled Afterburner and GPU drivers (using DDU), reinstalled GPU drivers and the crash happened yet again. Thank you for the effort by the way, I wish I could try more but I'm completely out of ideas :LOL: