[SOLVED] STOP 0xD1 when playing full-screen games

NotWD · Feb 6, 2020

Okay, so, recently I've started experiencing NVIDIA kernel-mode display driver crashes and blue screens (STOP code in thread title, of course) when placing significant load upon my six-year-old GTX 760. There is a small wrinkle here, though:

Gaming in full screen locks up the whole rig and, unless I hold the power button down first, produces a D1 STOP error. These STOPs all blame the NVIDIA kernel-mode graphics driver.
When NOT in full screen and applying significant stress to the GPU (for example, when using the Unreal Editor), the GPU driver seems to just crash silently and recover. UEd handles this situation well, so I don't get a screen flicker; the editor just locks up and crashes, leaving the OS intact.

In the split second before a full-screen game would crash Windows and generate a D1, though, I get artifacting. Usually snowflake artifacts, though some games like FFXIV will let me play for a moment with alpha-to-coverage (black square) artifacts before BugChecking into a D1 STOP. Since I'm running in maximum power mode in an oven of an apartment, GPU temperature usually climbs to around 72 C before crashing while CPU temp usually hovers around 64 C.

Things I have tested to eliminate components and drivers:

Loading the RAM with contiguous data to eliminate memory errors. I have run a particularly heavy FL Studio project (full of instances of EastWest PLAY 6 with Hollywood Orchestra parts loaded, for an idea of just how much data is being loaded into RAM). No crash.
Running a heavy DSP chain with Ozone 8 on the master in that same project, which has FL's CPU meter hovering around 70. No crash.
PLAY 6 will stream its audio data from disk when RAM is unavailable. Since I only have 16 GB, it has to do this regularly. This does not cause any issues.
Checked the motherboard for busted caps. There are none.
Reinstalled the NVIDIA drivers using the Display Driver Uninstaller in safe mode. They still crash when the GPU is under stress.

This leaves two components: the old, generic PSU (I primarily bought this machine as an upgrade platform) or the 760. Included in the ZIP are the two most recent minidumps, each a week old: Minidumps.zip.

I've looked at these in WinDbg myself a couple of times, and one thing in particular sticks out to me, though I'm unsure of its significance: it seems like the driver is trying to write to null at an IRQL of DISPATCH_LEVEL. I'm not sure if this means null in VRAM or system memory, or even how significant it is in relation to the STOP errors. Even though I have my suspicions about the cause, a second opinion here would help out a lot.

SPECS:

ASRock X99 Extreme3
Intel Core i7-5820K [stock clocks]
Corsair H75 dual-fan liquid cooler
16 GB DDR4-2133 [Micron]
1 TB WD Blue main drive
6 TB WD MyBook external drive
Gigabyte WindForce GTX760OC [2 fans]
Audient iD4 USB audio interface
Generic 550W PSU [replacement planned during tax season]
Windows 10 build 1909
NVIDIA GeForce Drivers, version 441.66

Thanks for any help,
NotWD

Colif · Feb 7, 2020

Bug Check 0xD1: DRIVER_IRQL_NOT_LESS_OR_EQUAL

Nvidia drivers rarely cause IRQL errors. Most of the time they throw other errors and you have to ferret them out. Being this obvious is pretty rare.

Have you tried running DDU and then either installing older drivers from Nvidia, or running Windows Update and letting it find Nvidia drivers that are more stable than the current ones? - https://forums.tomshardware.com/faq...n-install-of-your-video-card-drivers.2402269/

Having the newest drivers from Nvidia these days is hit and miss, and I see lots of misses.

I will ask a friend to read the dumps

define generic PSU, do you know its brand? How old is it?

NotWD · Feb 7, 2020

Colif said:
define generic PSU, do you know its brand? How old is it?

As with most (if not all) iBUYPOWER rigs, this is an Apex 80+ Bronze unit, which I've heard are VERY hit or miss. I'd consider this particular unit a home run, since it's lasted six years pretty much to the day.

The plan is to swap it out for a Seasonic Focus GX650 650W 80+ Gold unit, which at the time of writing, has over 450 reviews on the Egg and is holding steady at a 5 Egg rating.

As for letting Windows Update install older drivers, I haven't tried that yet, no. I'll get on that later tonight when things settle down a bit here.

NotWD · Feb 7, 2020

Okay, used DDU to roll back to the last stable driver according to Windows Update (432.00) and opened WoW for about 5 minutes.

On the default "optimal power" setting, three NVIDIA driver crashes, followed by strange display anomalies on exit (parts of taskbar turning black, WoW character select screen having model corruption). No BSoD.

On the maximum power setting, one (1) driver crash, no artifacting on exit, no BSOD.

Hm... still not quite conclusive, but the problems worsening in Optimal Power mode might suggest a PSU problem? I don't know. These drivers certainly seem more stable, though.

E: Hmm... UE4 crash log might provide a bit of insight here.

Fatal error: [File: D:/Build/++UE4/Sync/Engine/Source/Runtime/Windows/D3D11RHI/Private/D3D11Util.cpp] [Line: 198] Unreal Engine is exiting due to D3D device being lost. (Error: 0x887A0006 - 'HUNG')

UE4Editor_D3D11RHI
UE4Editor_D3D11RHI
UE4Editor_D3D11RHI
UE4Editor_D3D11RHI
UE4Editor_RHI
UE4Editor_Renderer
UE4Editor_Renderer
UE4Editor_Renderer
UE4Editor_Core
UE4Editor_Core
UE4Editor_RenderCore
UE4Editor_RenderCore
UE4Editor_Core
UE4Editor_Core
kernel32
ntdll

Looks like the GPU is hanging when put under stress.
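
For reference, the 0x887A.... value in that UE4 message is a DXGI device-lost code. Below is a minimal sketch of a lookup, using code values from Microsoft's DXGI_ERROR documentation. The function name and the selection of codes in the table are illustrative choices, not anything UE4 or NVIDIA ship:

```python
import re

# Device-lost reason codes as listed in Microsoft's DXGI_ERROR documentation.
# Only a handful are included here; the selection is illustrative.
DXGI_DEVICE_ERRORS = {
    0x887A0005: "DXGI_ERROR_DEVICE_REMOVED",
    0x887A0006: "DXGI_ERROR_DEVICE_HUNG",
    0x887A0007: "DXGI_ERROR_DEVICE_RESET",
    0x887A0020: "DXGI_ERROR_DRIVER_INTERNAL_ERROR",
}


def device_lost_reason(log_line):
    """Return (code, name) for the first 0x887A.... code in a log line, or None."""
    match = re.search(r"0x887A[0-9A-Fa-f]{4}", log_line)
    if match is None:
        return None
    code = int(match.group(0), 16)
    return code, DXGI_DEVICE_ERRORS.get(code, "unknown DXGI error")


line = ("Unreal Engine is exiting due to D3D device being lost. "
        "(Error: 0x887A0006 - 'HUNG')")
print(device_lost_reason(line))
```

Feeding it the fatal error line from a crash report makes it easy to see whether the reason changes between crashes.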

Colif · Feb 8, 2020

NotWD said:
The plan is to swap it out for a Seasonic Focus GX650 650W 80+ Gold unit, which at the time of writing, has over 450 reviews on the Egg and is holding steady at a 5 Egg rating.

I can't argue with Seasonic. I hope you don't have to wait too long; the problem could be the GPU or the PSU. Both are about the same age, and a PSU at 6 years is in replacement territory. Even good-brand PSUs get a little suspect around 6 years.

I sent dumps to a friend but he hasn't replied yet. Hopefully tonight he will. Otherwise I can ask others to look for you.

NotWD · Feb 8, 2020

Colif said:
I can't argue with Seasonic. I hope you don't have to wait too long; the problem could be the GPU or the PSU. Both are about the same age, and a PSU at 6 years is in replacement territory. Even good-brand PSUs get a little suspect around 6 years.

Yeah, sadly it's against the laws of physics for electronics to last forever. I wish these limitations didn't exist at times, even if some classes like CRT screens can last decades. Thankfully though, tax refunds come in around early-mid April if done online in March in Canada, so I shouldn't be waiting too long. It just sucks how much rate of exchange adds to the price, though. When all is said and done, the new PSU and 2060 will have run me about $800. 🤢

Colif said:
I sent dumps to a friend but he hasn't replied yet. Hopefully tonight he will. Otherwise I can ask others to look for you.

Given the state of this machine at the moment, I'm not about to try stressing the GPU again until I can get the parts swapped out, so I can wait.

gardenman · Feb 8, 2020

Hi, I ran the dump files through the debugger and got the following information: https://unprofessedcase.htmlpasta.com/

File information:	013120-38875-01.dmp (Jan 31 2020 - 15:04:08)
Bugcheck:	DRIVER_IRQL_NOT_LESS_OR_EQUAL (D1)
Driver warnings:	*** WARNING: Unable to verify timestamp for nvlddmkm.sys
Probably caused by:	memory_corruption (Process: ShellExperienceHost.exe)
Uptime:	0 Day(s), 0 Hour(s), 23 Min(s), and 49 Sec(s)

File information:	013020-37828-01.dmp (Jan 30 2020 - 06:40:21)
Bugcheck:	DRIVER_IRQL_NOT_LESS_OR_EQUAL (D1)
Driver warnings:	*** WARNING: Unable to verify timestamp for nvlddmkm.sys
Probably caused by:	memory_corruption (Process: System)
Uptime:	4 Day(s), 5 Hour(s), 00 Min(s), and 30 Sec(s)

(This may have already been mentioned above, if so, ignore it).
The nvlddmkm.sys file is an NVIDIA graphics card driver. There are a few things you can do to fix this problem. First off, try a full uninstall using DDU in Safe Mode, then re-install the driver (more information). Or try getting the latest version of the driver. Or try one of the 3 most recent drivers released by NVIDIA. Drivers can be found here: http://www.nvidia.com/ or you can allow Windows Update to download the driver for you, which might be an older/better version.

Possible Motherboard page: https://www.asrock.com/mb/Intel/X99 Extreme3/
I can't tell if there are BIOS updates or not due to the confusing mess on their website.

This information can be used by others to help you. I can't help you with this. Someone else will post with more information. Please wait for additional answers. Good luck.

NotWD · Feb 8, 2020

Memory corruption? Normally that would worry me but in this case it may provide a clue.

UE4 crashes "gracefully" (i.e.: without taking Windows with it) and its crash reporter points toward a hung (and momentarily missing) GPU. This confuses the less-than-stable 441.xx drivers into crashing the whole system in any other scenario by trying to write VRAM data to null in system memory since VRAM is inaccessible. The 432.00 drivers are more stable, so they recover fast enough that Windows doesn't panic.

All that's left is figuring out what's causing the old 760 to hang when under pressure.

(Note: this isn't actually solved yet. Selecting something as Best Answer automatically marks as solved. I marked gardenman's post as it may have cleared the problem up quite a bit.)

Colif · Feb 8, 2020

well, those 2 sure do mention Nvidia, but if you're unsure, the mouse drivers are from 2010, so maybe update those too
you seem to have 2 different mice. Elecom & iBall?

Date	Driver	Description
Oct 04 2010	ElcMouLFlt.sys	ELECOM Mouse Device driver
Nov 30 2010	ElcMouUFlt.sys	ELECOM USB Driver for the Mouse Device

Dec 03 2012	t_mouse.sys	Advanced Mouse driver from iBall

Any driver older than 2015 is a possible cause of problems, and you have a lot of them.

Feb 12 2014	vbaudio_hfvaio64_win7.sys	VB Virtual Audio Device driver
Aug 14 2014	vbaudio_cable64_win7.sys	VB Virtual Audio Device driver https://www.vb-audio.com/

Nov 25 2014	BazisVirtualCDBus.sys	WinCDEmu Virtual CD-ROM driver (Bazis Inc) http://wincdemu.sysprogs.org/

your realtek sound drivers are fairly old, can get newer from here - grab the drivers in post under the information one - https://www.tenforums.com/sound-audio/135259-latest-realtek-hd-audio-driver-version-2-a.html

Not sure what Nvidia have done but almost every new driver since October has caused BSOD for people with older cards. Better to just run windows update and use the older Nvidia drivers it has.

NotWD · Feb 8, 2020

I should probably remove the Realtek drivers since I don't even use the motherboard OR monitor outputs, I use the iD4 interface! VB virtual cable hasn't caused any problems wrt Windows or application crashes, it's there so I can apply a software processing chain to my microphone input and it does its job without issue.

The double mouse drivers are certainly weird though, wonder where those came from?

E: Actually, may have an idea, and it's something I forgot to list! I have a Logitech G910 RGB keyboard, so it's very possible (and quite likely) that G Hub installed a second driver.

As for WinCDEmu, I think I can place the blame squarely on Native Instruments for installing that particular out-of-date driver, as Kontakt Player (a sampler VST plugin) has support for direct ISO mounting I believe.

As for the NVIDIA drivers, even using the older 432.00 set, I still get GPU hangs. No STOP errors, though.

Colif · Feb 8, 2020

I'm just letting you know. If they work fine, cool.

When was the last time you did a clean install? The mouse drivers could just be old; the Elecom ones are for Win 7, as are the iBall ones (likely).

If you aren't using either, you could use autoruns to stop them loading at startup - if any program needs the drivers it can still run them - https://docs.microsoft.com/en-us/sysinternals/downloads/autoruns

but they may not be problems. See if it was just nvidia.

NotWD · Feb 8, 2020

This machine was pretty much a direct Windows 8.1 upgrade, which probably explains some of the egregiously old drivers, to be honest. There have been no problems with them, though.

I've reinstalled the GPU drivers twice now, so I have my doubts it's the drivers so much as the aged hardware finally, well, showing its age. I cleaned it out when I replaced the cooler a few months back, but there's been tons of construction activity here over the past couple months so there's likely more dust in there now.

Colif · Feb 8, 2020

It's possible. As I said before, it's really odd for Nvidia drivers to actually cause IRQL errors. I see them causing all sorts of other ones but not the obvious driver ones. Those 2 mention Nvidia drivers in the stack text, so it's pretty clear.

As I said, it could be the PSU or the GPU. They're both about the same age. Power can make good parts look bad, or go bad. I killed a stack of HDDs in the early 00's by using a bad PSU. I didn't know any better. I have killed a GPU by trying to make it do things it couldn't.

GPUs that are going bad often won't let you install drivers on them. Yours seems to accept them; it's just the Nvidia drivers themselves that are to blame here.

Obvious answer - give it another clean.

NotWD · Feb 8, 2020

Colif said:
Obvious answer - give it another clean.

I'll try that out once I have access to some more canned air (hopefully this week coming; Costco was packed to the gills so I wasn't even able to get in, let alone pick up a six pack of Falcon air dusters haha).

The main pointer towards it possibly being a power issue is that it only happens when the GPU tries to draw more power (past base clock, in the case of maximum perf mode), and the issue worsens when the card is in optimal power mode. This could also suggest a heat spiking problem, given the dust collection that happens in this complex.

Curious quirk: The card will often render some frames completely fine before it hangs.

Colif · Feb 8, 2020

You can always monitor the PC's sensors during one of these episodes to see if the logs show any obvious clues, like the PSU rails dipping way below what they should be. This shows the variances allowed - https://www.lifewire.com/power-supply-voltage-tolerances-2624583

Download HWINFO - https://www.guru3d.com/files-details/hwinfo64-download.html
When you run it, tick Sensors only and click Run.
In the next view, along the bottom there are a number of buttons. You want the one next to the clock that says "Logging start". It opens File Explorer so you can create a file.

Run it in the background. The output is a CSV file that Excel and (probably) Google Docs can open. You can upload it to a file sharing site and show it here if you like...
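
A minimal sketch of how such a log could be scanned for rail dips, assuming the usual ATX tolerance of ±5% on the +12V, +5V and +3.3V rails. The column names below are assumptions (HWiNFO names them after whatever sensor chip the board exposes, so they will likely need adjusting), and the sample rows are made up for illustration:

```python
import csv
import io

# Nominal voltages for the positive ATX rails. The column names are
# assumptions; check the header row of the actual log and adjust.
NOMINAL_RAILS = {"+12V [V]": 12.0, "+5V [V]": 5.0, "+3.3V [V]": 3.3}
TOLERANCE = 0.05  # ATX allows +/-5% on these rails


def out_of_spec_samples(csv_lines):
    """Yield (row_number, column, volts) for readings outside tolerance."""
    reader = csv.DictReader(csv_lines)
    for row_number, row in enumerate(reader, start=1):
        for column, nominal in NOMINAL_RAILS.items():
            try:
                volts = float(row.get(column))
            except (TypeError, ValueError):
                continue  # column missing, or a non-numeric cell
            if abs(volts - nominal) > nominal * TOLERANCE:
                yield row_number, column, volts


# Hypothetical two-sample log, not data from any real machine.
sample = io.StringIO(
    "Time,+12V [V],+5V [V],+3.3V [V]\n"
    "00:00:01,12.096,5.040,3.312\n"
    "00:00:02,11.232,5.040,3.312\n"
)
for hit in out_of_spec_samples(sample):
    print(hit)
```

For a real log, pass an open file instead of the `io.StringIO` sample. Software readings of the rails are only approximate, so a flagged sample is a hint rather than proof.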

NotWD · Feb 8, 2020

Great suggestion, as it provided more information (albeit indirectly)!

I opened Unreal Editor to test this as it is guaranteed to trigger the issue without crashing the machine, thereby creating a clean log file, but curiously, this time the crash report's D3D device lost "reason" changed:

Fatal error: [File: D:/Build/++UE4/Sync/Engine/Source/Runtime/Windows/D3D11RHI/Private/D3D11Util.cpp] [Line: 198] Unreal Engine is exiting due to D3D device being lost. (Error: 0x887A0005 - 'REMOVED')

In the moments leading up to this driver crash, Event Log became absolutely flooded with nvlddmkm errors (Event ID 13), here's a sampling of them:

The description for Event ID 13 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\Video3
Graphics Exception: Shader Program Header 18 Error

- other header numbers also showed up

The description for Event ID 13 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\Video3
Graphics Exception: ESR 0x405840=0xa2040a0e

- one other, nearby hexadecimal value showed up for the "right operand": 0xa2040a06

Here is the log leading up to and shortly after the driver crash and loss of GPU device: Logs-NvidiaCrash.CSV

Temperatures are elevated due to higher-power modes active on CPU and GPU. The CPU idles at about 46 due to this but never breaches 70 thanks to the H75.
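
With a flood of Event ID 13 entries like the ones above, it can help to tally which exception strings repeat. A minimal sketch, assuming the events have been copied or exported from Event Viewer as plain text. The function name is hypothetical, and the repeat counts in the sample are invented for illustration; only the message strings come from the events quoted above:

```python
import re
from collections import Counter

# Matches the payload line that nvlddmkm includes with each event.
GRAPHICS_EXCEPTION = re.compile(r"Graphics Exception:\s*(.+)")


def tally_graphics_exceptions(event_text):
    """Count each distinct 'Graphics Exception: ...' message in the text."""
    counts = Counter()
    for line in event_text.splitlines():
        match = GRAPHICS_EXCEPTION.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


# Sample text: message strings as quoted in the thread, counts made up.
sample = """\
\\Device\\Video3
Graphics Exception: Shader Program Header 18 Error
\\Device\\Video3
Graphics Exception: ESR 0x405840=0xa2040a0e
\\Device\\Video3
Graphics Exception: ESR 0x405840=0xa2040a0e
\\Device\\Video3
Graphics Exception: ESR 0x405840=0xa2040a06
"""
for message, count in tally_graphics_exceptions(sample).most_common():
    print(count, message)
```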

NotWD · Feb 9, 2020

Well, another day, another blasted DRIVER_IRQL_NOT_LESS_OR_EQUAL with the same characteristics (driver attempting a null write at DISPATCH_LEVEL) blaming the same NVIDIA kernel mode display driver. This time's a little odd, though, since I have a full 2GB memory dump, but the minidump is empty so it's of no use to anyone!

This one, however, came as I started up a meme video on YouTube, and on the supposedly more stable 432.00 drivers, no less. I'm not going to upload a 2 GB full memory dump that'll take gardenman possibly days to download, so we're possibly a bit stuck.

Stranger still, though, is that there were none of the usual precursors (driver crashes); Windows just threw a STOP in my face.

E: Never mind about the minidump! It just took a while to write from the main dump I guess. Here it is: 020920-32828-01.dmp

gardenman · Feb 9, 2020

I ran the dump file through the debugger and got the following information: https://slouchiestechidna.htmlpasta.com/

File information:	020920-32828-01.dmp (Feb 9 2020 - 16:09:12)
Bugcheck:	DRIVER_IRQL_NOT_LESS_OR_EQUAL (D1)
Driver warnings:	*** WARNING: Unable to verify timestamp for nvlddmkm.sys
Probably caused by:	memory_corruption (Process: System)
Uptime:	1 Day(s), 22 Hour(s), 25 Min(s), and 34 Sec(s)

There are others here who can get more info from the full dumps, but they are not around that often. For me, I pull the same info from a full dump that I do from a minidump, except full dumps take hours to download (1 GB per hour) and are often corrupted, either from the crash or a bad upload.

This information can be used by others to help you. I can't help you with this. Someone else will post with more information. Please wait for additional answers. Good luck.

NotWD · Feb 9, 2020

The common thread with these dumps is an attempted null write by the kernel-mode GPU driver. Obviously, figuring out the cause of these null writes would put us on the right track, as Windows rightfully doesn't like anything writing there. The kernel considers that to be trashing system-reserved memory.

The GPU itself is still functional enough to accept drivers, as Colif pointed out earlier in the thread. While it could be linked to this in some way, it's not likely to be the cause on its own.

The PSU is a question mark. It's an old, dated Apex unit so it's probably slowly dying anyway.

The RAM is fine, most likely. The analyzer gardenman used was able to properly suss out each DIMM's full details, and I can load it with contiguous data fine.

The motherboard would probably throw other stops or even make the machine a brick if it were at fault. These are all D1s that blame the same driver for the same illegal memory write.

I have no cause to think it's the 5820K either.
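
For anyone reading along, the "null write at DISPATCH_LEVEL" comes straight from the four bug check parameters of a 0xD1. A minimal sketch of how they read, based on Microsoft's documented parameter layout for this bug check (1: memory referenced, 2: IRQL, 3: access type, 4: address of the code that made the reference). The function name is illustrative and the sample values are hypothetical, not copied from the dumps in this thread:

```python
# Access types documented for bug check 0xD1, parameter 3.
ACCESS_TYPES = {0: "read", 1: "write", 2: "execute", 8: "execute"}

# The low IRQLs on x64 Windows; anything else is printed numerically.
IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}


def describe_d1(arg1, arg2, arg3, arg4):
    """Turn the four 0xD1 bug check parameters into a readable sentence."""
    access = ACCESS_TYPES.get(arg3, f"unknown access ({arg3})")
    irql = IRQL_NAMES.get(arg2, f"IRQL {arg2}")
    return (f"{access} of address 0x{arg1:016x} at {irql}, "
            f"attempted by code at 0x{arg4:016x}")


# Hypothetical parameters for a null write at DISPATCH_LEVEL.
print(describe_d1(0x0, 0x2, 0x1, 0xFFFFF80012345678))
```

The fourth parameter is the one the debugger resolves to a module name, which is how a dump ends up pointing at a particular driver.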

Colif · Feb 10, 2020

could run DDU and install the newest drivers from here - https://www.gigabyte.com/au/Graphics-Card/GV-N760OC-2GD-rev-20/support#support-dl-driver - they are old but they should still work.

If those throw the error, um... refusing driver installs isn't the only way a card shows it's going bad, so just because this one isn't refusing drivers, I wouldn't use that as evidence it's okay. Running benchmarks is the best way to check.

NotWD · Feb 10, 2020

I'll try those drivers on Wednesday. I have D&D every Tuesday so I don't want to try any major changes the day before.

NotWD · Feb 12, 2020

Alright so the GPU drivers hit the TDR retry limit (crashed 15+ times so I was left with a black screen, but the system didn't shut off) after recovering from yet another of the same blue screen. I'm not entirely convinced it's the drivers themselves at this point.

Colif · Feb 13, 2020

I thought maybe it's RAM, but then RAM errors are usually more random and don't keep blaming the same driver.

Can you put the GPU in another PC and see if you get the same errors? That would imply it's the card.

If it is indeed the card, I would contemplate replacing it and the PSU at the same time, as there's no point having a better card if the PSU is still generic.

NotWD · Feb 13, 2020

Colif said:
I thought maybe it's RAM, but then RAM errors are usually more random and don't keep blaming the same driver.

Can you put the GPU in another PC and see if you get the same errors? That would imply it's the card.

If it is indeed the card, I would contemplate replacing it and the PSU at the same time, as there's no point having a better card if the PSU is still generic.

This... is the hard part. I don't have a spare PC just lying around and my friends are all hours away so using one of theirs to test isn't practical unless I can fit another reason in to go out that far (gas is expensive).

Given that UE4's crashes have all pointed towards the GPU going missing in some way, and that this usually happens when the card sees significant use (i.e. when it's drawing power), it's possible it's the PSU, but I can't test that either.

For the record, I do have it on a 1300VA APC UPS, but I intentionally didn't use the USB shutdown feature so it's basically a dumb unit.

The sequence so far is Open 3D app -> a few frames get rendered -> GPU vanishes -> driver tries to write incomplete data and shader code to a missing GPU, can't, and crashes -> either program crash or BSoD.

E: I got curious so I decided to read over the more in-depth dump data that GM's been posting, and noticed something interesting. The drivers are crashing during the same page fault operation at the same stack offset every time. Unsure of how meaningful this is, but it's certainly interesting.

Colif · Feb 14, 2020

all errors mention nvidia. It seems a really big clue.
I don't want to ignore clues.
IRQL errors are rare for Nvidia driver-based errors. I have said that. It could well be the RAM on the GPU for all we know.

I can ask @axe0 to have a look, as he is better at BSODs than I am. I don't know if there is any significance to it always showing the same memory location. I am not sure if he is around at the moment, so he may not reply.

NotWD said:
This... is the hard part. I don't have a spare PC just lying around and my friends are all hours away so using one of theirs to test isn't practical unless I can fit another reason in to go out that far (gas is expensive).

Is there a repair shop nearby that would have a GPU they could throw in to see if it produces the same errors, or another PC they could put your GPU in to test it?

Without throwing money at new parts randomly, this is the best idea I can think of.
