[SOLVED] Constant BSOD While Gaming for Past 7 Months, Tried Hardware Swap, Troubleshooting, no Minidump, Custom Watercooling

Byte11

Distinguished
May 24, 2015
23
1
18,515
Basically, after a few minutes of playing pretty much any game, my PC blue screens with WHEA_UNCORRECTABLE_ERROR. I haven't been able to detect any patterns between CPU, GPU, RAM, Disk, or Network usage and my crashes, but I haven't really studied those stats. It always happens within 25 minutes. Also, my temps seem fine. I'm not overclocking either.

Here are my PC Specs:
GTX 1070
Ryzen 5 5600X
EVGA 860 Watt PSU
G.Skill Trident Z 3200
WD Black SN750 NVMe
Crucial MX500
6TB Seagate 7200 RPM drive
Gigabyte B550 AORUS PRO

Thermaltake Core P3 Open air case
EK P360 Watercooling Kit
Custom Micro Center Red and Black sleeved cables

Watercooling issue:
I've only had one leak with my watercooling when I was transporting the PC. The PC was off, it barely spilled anything, and I cleaned everything and let it sit for more than a day after the leak and I had no problems. The BSOD problems started maybe a month or two after the leak occurred. The leak was from my CPU block, a small amount got on my drive, the top PCI-E slot on the mobo, and the ribbon cable going to the GPU. The ribbon cable prevented any water from getting on the PSU, GPU, or on any of the lower PCI-E slots.

Troubleshooting steps:
  • I got my drive RMA'd by Western Digital. I'm pretty sure I had real drive issues on top of whatever this issue is, so now I've got a brand new drive
  • Swapped my GPU for a spare GTX 970
  • Swapped my PSU (I'm gearing up for a GPU upgrade, and I needed the extra PSU for a server I've got at home)
  • Wiped Windows
  • Installed all drivers
  • Updated my mobo BIOS
  • Ran MemTest86: 4 passes, no errors
  • Tried plugging my GPU directly into the mobo, without the riser cable
  • Tried using the other PCIe slot
  • Tried Windows 11
  • I've been using Linux for the past few months (my Linux install is on a separate MX500), and I was able to game perfectly fine. I was using Proton, which presents its own challenges, so I wasn't able to game very often, but it did work.
  • There are no Minidump files
Running without the riser cable and trying the other PCIe slot rules those out, so the only common denominator I can see is my motherboard. I want some more conclusive tests before I deal with replacing it, though, since I'm pretty sure it's out of warranty.
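
The elimination reasoning above can be written down as a simple set difference. This is only a rough Python sketch: the part labels come from the spec list and troubleshooting steps in this post, but the split into "swapped or bypassed" and "tested in place" is an assumed reading of those steps, not a diagnosis.

```python
# Parts named in the post; the strings are labels for illustration only.
components = {
    "CPU", "motherboard", "RAM", "GPU", "PSU", "NVMe drive",
    "SATA drives", "riser cable", "sleeved cables", "Windows install",
}

# Parts that were replaced or bypassed while the crash persisted
# (assumed reading of the troubleshooting list).
swapped_or_bypassed = {
    "GPU", "PSU", "NVMe drive", "riser cable", "Windows install",
}

# Parts that only passed a test in place, which is weaker evidence than a swap.
tested_in_place = {"RAM"}


def remaining_suspects(all_parts, cleared):
    """Return the parts that no swap or test has covered yet, sorted."""
    return sorted(all_parts - cleared)


suspects = remaining_suspects(components, swapped_or_bypassed | tested_in_place)
print(suspects)
```

By this accounting the motherboard is not the only part left: the CPU, the SATA drives, and the sleeved cables were never swapped out either.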

I'm hoping debugging this is an interesting challenge for some of the veterans here (or maybe it's super easy and I totally missed something) because I really don't know what else to do at this point. One thing, I pretty much only push my PC when I'm gaming, so maybe some other targeted stress tests could reveal the problem, or at least eliminate some components from the equation. Maybe I should write the Minidump files to another drive?
 

Byte11

I updated my BIOS a couple of months ago. I guess a new update has come out since then, so I'll install it, but the issue has persisted for a very long time.

UPDATE: Updated my BIOS.
 

Byte11

Also, I ran the CPU stress test and there was no blue screen. I then ran the GPU stress test and there was no blue screen there either, so it's definitely not a heat issue: the GPU was at 68 degrees and the CPU was at 62.
 

Byte11

I just realized that's only the memory dump. Here's a Google Drive folder with the memory dump and the Minidumps: https://drive.google.com/drive/folders/1T86pylVIrRYGDMunLdJWIHWJ9_nCYWQP?usp=sharing. One thing I noticed is that the issue was supposedly in the Valorant drivers, but I uninstalled Valorant and the issue persisted, so I ran driver verifier again and here are the results. Also, I had a BSOD while not in a game, and this time it was able to write a Minidump, so that's in there too.
 
The first memory dump had verifier turned on. It found a bug in a network driver:
netr28ux.sys Thu May 28 07:28:52 2015
The bug was that it was allocating regular pool memory rather than non-execute pool memory. The system was up for only 10 seconds.
netr28ux.sys | Sysnative Forums
Info on the driver and update. Most likely this is just a change in the recommendations for building a driver, meant to prevent malware infections. Non-execute memory keeps the code separate from the data.
[CPU Information]
~MHz = REG_DWORD 3693
Component Information = REG_BINARY 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Configuration Data = REG_FULL_RESOURCE_DESCRIPTOR ff,ff,ff,ff,ff,ff,ff,ff,0,0,0,0,0,0,0,0
Identifier = REG_SZ AMD64 Family 25 Model 33 Stepping 0
ProcessorNameString = REG_SZ AMD Ryzen 5 5600X 6-Core Processor
Update Status = REG_DWORD 1
VendorIdentifier = REG_SZ AuthenticAMD
10: kd> !sysinfo machineid
Machine ID Information [From Smbios 3.3, DMIVersion 0, Size=2611]
BiosMajorRelease = 5
BiosMinorRelease = 17
BiosVendor = American Megatrends International, LLC.
BiosVersion = F15c
BiosReleaseDate = 05/11/2022
SystemManufacturer = Gigabyte Technology Co., Ltd.
SystemProductName = B550 AORUS PRO
SystemFamily = B550 MB
SystemVersion = Default string
SystemSKU = Default string
BaseBoardManufacturer = Gigabyte Technology Co., Ltd.
BaseBoardProduct = B550 AORUS PRO
BaseBoardVersion = Default string


---------------------------------

The second bugcheck was also due to verifier testing.
The system was up for 4 seconds.
Driver: Riot Vanguard\vgk.sys
It allocated pool that should have been no-execute, the same type of issue as the first bugcheck.
Code Integrity Issue: The caller specified an executable pool type. (Expected: NonPagedPoolNx)
vgk.sys Fri Apr 8 12:12:35 2022

This rule makes it harder for malware to infect data buffers and attempt to run data as code.

It's a minor bug. The driver should be fixed, but you can run verifier.exe with the /driver.exclude option to skip drivers that you already know have some minor problem.
Example:
verifier.exe /driver.exclude vgk.sys netr28ux.sys

This would exclude these two drivers from further testing and let you look for more problems.
 

Byte11

Ok, I ran driver verifier without those two drivers, and it did boot to Windows; the only difference was that everything was laggy. I started playing Witcher 3 and after about 10 minutes, another BSOD happened. This one was the same as the earlier ones: WHEA_UNCORRECTABLE_ERROR and no Minidump generated. I've also tried setting the minidump folder to another drive, but it doesn't help.
 
Bugcheck 0x124 is a panic bugcheck: the last attempt to save info before the system dies. Sometimes you can get info, sometimes not.
If you can see the value of the first parameter, you can get an idea of the cause.
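
As a rough illustration of reading that first parameter, here is a small Python lookup. The table is an assumption: it is recalled from Microsoft's documentation for bug check 0x124 and only covers the first few values, so check it against the current docs before relying on it. The function name is made up for this sketch.

```python
# Parameter 1 of bug check 0x124 names the kind of hardware error source.
# ASSUMPTION: values recalled from Microsoft's bug check 0x124 page; verify them.
WHEA_ERROR_SOURCES = {
    0x0: "machine check exception",
    0x1: "corrected machine check exception",
    0x2: "corrected platform error",
    0x3: "non-maskable interrupt (NMI)",
    0x4: "uncorrectable PCI Express error",
    0x5: "generic hardware error",
}


def describe_whea_param1(param1: int) -> str:
    """Map parameter 1 to a readable label (hypothetical helper)."""
    return WHEA_ERROR_SOURCES.get(param1, f"unlisted source 0x{param1:X}")


print(describe_whea_param1(0x4))
```

A value pointing at PCI Express, for instance, would steer attention toward the GPU slot, riser, and power delivery rather than the CPU itself.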

Things that will cause this: overheating, or power problems caused by the GPU pulling too much power through the PCIe bus. This is very common. In that case the motherboard detects the problem and resets the CPU.
If you can get any minidump with this error, always look at the system uptime as a hint at the cause. If the motherboard reset the CPU, the system uptime will be very short (under 15 seconds), and you should ignore this error and focus on why the motherboard reset the CPU, i.e. the motherboard detecting overcurrent on the PCIe bus.
But note: a power supply can also detect too much current going to a graphics card via the card's supplemental power connection (this generally happens if you overclock or use a splitter for the PSU-to-GPU power connection). If the PSU detects an issue, it can send a signal to the motherboard, and the motherboard will reset the CPU to stop the high-power condition. This happens with overheated power supplies, or if a splitter is used, and I have seen it happen with bad connections where the metal connector got pushed out of the plastic holder when the power supply lead was connected to the motherboard main socket, the GPU power connector, or even the CPU supplemental power connector.
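
The uptime rule of thumb above could be sketched like this in Python. The 15-second threshold is the one given above; the record type, field names, and wording of the results are hypothetical, and the values would be copied by hand from the debugger output.

```python
from dataclasses import dataclass

RESET_UPTIME_THRESHOLD_S = 15  # "under 15 seconds", per the advice above


@dataclass
class DumpSummary:
    """Hypothetical record, filled in by hand from a dump's header."""
    bugcheck_code: int
    uptime_seconds: float


def triage(dump: DumpSummary) -> str:
    """Apply the uptime heuristic to a single dump."""
    if dump.bugcheck_code != 0x124:
        return "not 0x124: analyze the bugcheck itself"
    if dump.uptime_seconds < RESET_UPTIME_THRESHOLD_S:
        return "likely a board-initiated reset: check power delivery and connectors"
    return "0x124 after normal uptime: inspect the error record in the dump"


print(triage(DumpSummary(bugcheck_code=0x124, uptime_seconds=9)))
```

This is only a first-pass filter. A short uptime says the interesting event happened before this boot, not what that event was.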

You can also sometimes fail to get a memory dump when the error is in the drive controller driver (generally you would get a different bugcheck code).

Running verifier will make a game laggy, since it adds a lot of overhead checking every memory allocation, from the network driver to the GPU driver.

For the most part, you would not run verifier for a bugcheck 0x124 error. If you wanted to debug it, you would just set the memory dump to kernel or full, and compress it if you wanted to send it out to be looked at. A full memory dump is useful if you are running a game; otherwise you cannot trace into the 32-bit subsystem. But for bugcheck 0x124, a kernel dump will show the bus and what was running, and give a good idea as to the problem.
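
To check which dump type a machine is currently set to write, one option is to read the CrashDumpEnabled value under the CrashControl registry key. The Python sketch below assumes the commonly documented meanings of that value. The helper names are made up for this example, and the registry read only runs on Windows.

```python
import sys

# Commonly documented meanings of CrashDumpEnabled (assumed; verify on your build).
CRASH_DUMP_MODES = {
    0: "none",
    1: "complete (full) memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}


def describe_dump_mode(value: int) -> str:
    """Translate a CrashDumpEnabled value into a readable label."""
    return CRASH_DUMP_MODES.get(value, f"unknown mode {value}")


def read_dump_mode() -> int:
    """Read CrashDumpEnabled from the registry (Windows only)."""
    import winreg  # imported here so the module still loads on other systems
    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


if sys.platform == "win32":
    print(describe_dump_mode(read_dump_mode()))
```

The same setting is exposed in the Startup and Recovery dialog under System Properties, which is the safer place to change it.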

--------
Note: make sure you do not have any overclock set in the BIOS.
Make sure you do not install any overclocking tools like EasyTune or anything to speed up your GPU. These will get marked as suspect for this kind of problem.

You might even set the system to run in high performance mode, since so many things get messed up by sleep functions.
 
Solution

Byte11

I'd replaced the PSU and all the cables it came with, but I hadn't removed the sleeved extension cables attached to them, and removing them worked!!! I've been testing it for a bit and no blue screens. I've had these cables for 4 years and there was no issue. I've unplugged them multiple times before too, so it's not like they were improperly attached. I'm super pissed off; I should've spent the extra $20 on CableMod cables instead of Inland, but normally Micro Center is really good and these cables seemed so solid.

That solved the WHEA_UNCORRECTABLE_ERROR BSOD, but I'm still getting another BSOD with nvddlk.sys. Once it happened just on my desktop, but every other time it happened in a game. I was messing around with Minecraft shaders in FTB, and I would teleport (meaning that everything around me had to be loaded at once) and sometimes it would BSOD. I tried swapping back from the 970 to the 1070: performance was greatly improved, GPU usage fell to 60%, and the BSOD rate went down, but it was still there. My temps were good on both cards too. I'm also getting these slight freezes every few seconds, but I think that's more of a Minecraft issue; I'm more concerned about the blue screens.

Here are my minidumps: https://drive.google.com/drive/folders/1T86pylVIrRYGDMunLdJWIHWJ9_nCYWQP?usp=sharing

The last one was with a 1070 and the rest were with a 970. Thanks!
 
It looks like a bug where something wanted to write text to the graphics screen, took ownership of a lock, and never released it. 16 threads were waiting, so the system figured the graphics driver had stopped working and attempted to reset it. After some timeout, the system called a bugcheck.
I might be able to look at the 16 waiting threads to determine which program is NOT causing the problem.
(I have to run in a few minutes, though.) (My guess is it will turn out to be an issue with Overwolf.)

I would remove gdrv3.sys using Microsoft Autoruns.
-------
wm.exe
overwolfhelper64.exe
desktop window manager
and 16 threads were trying to get access to the GPU
5.9 million times, and dwm.exe attempted to reset the GPU,
failed, and called a bugcheck.

I would remove the GPU overclock driver from Gigabyte.

Any idea what gv.exe is for?
It was terminated but still listed as running.

It looks like the Gigabyte interface to overclock the GPU.
(You should disable it.)
 

Byte11

I don't have a Gigabyte GPU, but I do have their software for controlling my mobo. Honestly, if it's just some freak error with Overwolf then I should be good. Also, if it's a GPU issue, I'm buying a new one anyway. As long as the rest of the system hardware is good, then I'm good. Thanks for your help!
 
Manufacturer Gigabyte Technology Co., Ltd.
Product Name B550 AORUS PRO

You are running the Gigabyte tools, including the graphics overclocking driver and its interface. Download and run Autoruns from here:
Autoruns for Windows - Windows Sysinternals | Microsoft Docs
Find the menu item to hide Microsoft entries.
Look for
C:\Windows\System32\drivers\gdrv3.sys Fri Nov 5 01:35:11 2021
Disable it and reboot.

The interface tool was terminated while holding a lock. Other processes cannot get the lock until the app releases it, and it never will, since its code is no longer running. The result is a deadlocked system, which leads to a graphics timeout and a bugcheck to stop the system.
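
The failure pattern described above (a holder goes away without releasing a lock, and every waiter eventually times out) can be imitated in user space. This Python sketch is only an analogy under that assumption: it uses ordinary threads and a `threading.Lock`, not the kernel or GPU locks involved in the actual crash, and all names are made up.

```python
import threading


def crashed_tool(lock: threading.Lock) -> None:
    """Stand-in for the terminated tool: takes the lock and never releases it."""
    lock.acquire()


def waiter(lock: threading.Lock, results: list, index: int, timeout_s: float) -> None:
    """Try to take the lock, giving up after timeout_s seconds."""
    got_it = lock.acquire(timeout=timeout_s)
    results[index] = got_it
    if got_it:
        lock.release()


def run_demo(n_waiters: int = 16, timeout_s: float = 0.2) -> list:
    lock = threading.Lock()  # fresh lock per run so repeated calls do not hang
    holder = threading.Thread(target=crashed_tool, args=(lock,))
    holder.start()
    holder.join()  # the holder is gone, but the lock stays held

    results = [None] * n_waiters
    threads = [
        threading.Thread(target=waiter, args=(lock, results, i, timeout_s))
        for i in range(n_waiters)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_demo())  # every entry is False: all waiters timed out
```

Here the waiters simply give up after a timeout. In the crash described above, the analogous timeout is what triggered the graphics reset and then the bugcheck.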
 

Byte11

Just did it and all my LEDs are still working with no bsod on the desktop so far. If one comes up, I'll make a new post since it's a pretty different issue. I played Rocket League last night for a few hours and no issues! I've really missed gaming, thanks for all your help!