[SOLVED] STOP 0xD1 when playing full-screen games

Feb 6, 2020
99
10
45
Okay, so, recently I've started experiencing NVIDIA kernel-mode display driver crashes and blue screens (STOP code in thread title, of course) when placing significant load upon my six-year-old GTX 760. There is a small wrinkle here, though:

  • Gaming in full screen will absolutely lock up the whole rig and produce a D1 STOP error if I don't hold the power button down. These STOPs all blame the NVIDIA kernel mode graphics driver.
  • When NOT in full screen and applying significant stress to the GPU (for example, when using the Unreal Editor), the GPU driver seems to just crash silently and recover. UEd is good at handling this situation so I don't get a screen flicker, it just locks up, crashes and leaves the OS intact.
In the split second before a full-screen game would crash Windows and generate a D1, though, I get artifacting. Usually snowflake artifacts, though some games like FFXIV will let me play for a moment with alpha-to-coverage (black square) artifacts before BugChecking into a D1 STOP. Since I'm running in maximum power mode in an oven of an apartment, GPU temperature usually climbs to around 72 C before crashing while CPU temp usually hovers around 64 C.

Things I have tested to eliminate components and drivers:

  • Loading the RAM with contiguous data to eliminate memory errors. I have run a particularly heavy FL Studio project (full of instances of EastWest PLAY 6 with Hollywood Orchestra parts loaded, for an idea of just how much data is being loaded into RAM). No crash.
  • Running a heavy DSP chain with Ozone 8 on the master in that same project, which has FL's CPU meter hovering around 70%. No crash.
  • PLAY 6 will stream its audio data from disk when RAM is unavailable. Since I have 16 GB, it has to do this regularly. This does not cause any issues.
  • Checked the motherboard for busted caps. There are none.
  • Reinstalled the NVIDIA drivers using the Display Driver Uninstaller in safe mode. They still crash when the GPU is under stress.
This leaves two components: the old, generic PSU (I primarily bought this machine as an upgrade platform) or the 760. Included in the ZIP are the two most recent minidumps, each a week old: Minidumps.zip.

I've looked at these in WinDbg myself a couple of times, and one thing in particular sticks out to me, though I'm unsure of its significance: it seems like the driver is trying to write to null at an IRQL of DISPATCH_LEVEL. I'm not sure if this means null in VRAM or system memory, or even how significant it is in relation to the STOP errors. Even though I have my suspicions about the cause, a second opinion here would help out a lot.
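For anyone reading along, here is a rough Python sketch of how the four 0xD1 bugcheck parameters map onto that reading (address referenced, IRQL, access type, address of the code that made the access). The function name `describe_d1`, the 64 KB "near-null" cutoff and the sample instruction address are all assumptions made up for illustration; they are not taken from the dumps. As far as I understand it, the first parameter is a CPU virtual address, so "null" here would mean null in the kernel's view of system memory rather than an offset into VRAM.

```python
# Sketch: describe the four DRIVER_IRQL_NOT_LESS_OR_EQUAL (0xD1) parameters.
# Parameter meanings follow Microsoft's bug check 0xD1 reference page.

IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}
ACCESS_NAMES = {0: "read", 1: "write", 2: "execute", 8: "execute"}

# Heuristic (assumption): the lowest 64 KB of the address space is never
# mapped, so an address below this is treated as a null-pointer access.
NULL_PAGE_LIMIT = 0x10000


def describe_d1(address, irql, access, faulting_ip):
    """Return a one-line, human-readable summary of a 0xD1 bugcheck."""
    irql_name = IRQL_NAMES.get(irql, f"IRQL {irql}")
    operation = ACCESS_NAMES.get(access, f"access type {access:#x}")
    if address < NULL_PAGE_LIMIT:
        target = "a null or near-null address"
    else:
        target = "a non-null address"
    return (
        f"{operation} of {address:#018x} ({target}) at {irql_name}, "
        f"from code at {faulting_ip:#018x}"
    )


if __name__ == "__main__":
    # Address 0, IRQL 2, access 1 matches "write to null at DISPATCH_LEVEL".
    # The instruction address below is a made-up placeholder.
    print(describe_d1(0x0, 2, 1, 0xFFFFF80012345678))
```

The real values can be read from the `!analyze -v` output in WinDbg and passed in as integers.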

SPECS:

  • ASRock X99 Extreme3
  • Intel Core i7-5820K [stock clocks]
  • Corsair H75 dual-fan liquid cooler
  • 16 GB DDR4-2133 [Micron]
  • 1 TB WD Blue main drive
  • 6 TB WD MyBook external drive
  • Gigabyte WindForce GTX760OC [2 fans]
  • Audient iD4 USB audio interface
  • Generic 550W PSU [replacement planned during tax season]
  • Windows 10 build 1909
  • NVIDIA GeForce Drivers, version 441.66
Thanks for any help,
NotWD
 
Solution
Hi, I ran the dump files through the debugger and got the following information: https://unprofessedcase.htmlpasta.com/

File information: 013120-38875-01.dmp (Jan 31 2020 - 15:04:08)
Bugcheck: DRIVER_IRQL_NOT_LESS_OR_EQUAL (D1)
Driver warnings: *** WARNING: Unable to verify timestamp for nvlddmkm.sys
Probably caused by: memory_corruption (Process: ShellExperienceHost.exe)
Uptime: 0 Day(s), 0 Hour(s), 23 Min(s), and 49 Sec(s)

File information: 013020-37828-01.dmp (Jan 30 2020 - 06:40:21)
Bugcheck: DRIVER_IRQL_NOT_LESS_OR_EQUAL (D1)
...
NotWD
Is there a repair shop nearby that would have a GPU they could throw in to see if it produces the same errors? Or another PC they could put your GPU in, to test it there?

Without throwing money at new parts randomly, this is the best idea I can think of.

I'm in Toronto so probably. I'll get the inside dusted out and check nearby.

IRQL errors are rare for Nvidia driver-based crashes; I have said that. It could well be the RAM on the GPU, for all we know.

This... actually wouldn't shock me for a couple of reasons:

  1. Geometry corruption and artifacting in WoW when in "optimal power" mode. Consistent corruption of data resident in memory would suggest problems with that specific part of physical memory, no?
  2. The "Shader Header ##" errors. I actually Googled this specific error and a good number of results pointed towards faulty VRAM. Naturally, since VRAM operations are far more commonplace when gaming, this produces a problem, which leads into...
  3. ... When the driver tries to map this memory as a page, it runs into the corruption, which makes it dump the data it was going to write in a mapped page of system memory. Windows no likey, since it's trashing memory it shouldn't be. Crash.
Now, this could naturally be completely wrong. Computers are a bunch of disparate, heavily specialised parts held together by operating system duct tape. But I'm not willing to discount the possibility.
 
The sequence so far is Open 3D app -> a few frames get rendered -> GPU vanishes -> driver tries to write incomplete data and shader code to a missing GPU, can't, and crashes -> either program crash or BSoD.

E: I got curious so I decided to read over the more in-depth dump data that GM's been posting, and noticed something interesting. The drivers are crashing during the same page fault operation at the same stack offset every time. Unsure of how meaningful this is, but it's certainly interesting.

I'm not sure if it's incomplete data; minidumps tend to lack data.

The display driver crashing at the same offset over and over simply means that every time it crashes, it's doing the same thing.

Why don't you try a driver from this year?
 
NotWD
Windows tried to trap me in update hell when I went to use DDU to install 442.19, so I'll wait until later tonight for that. I do have a full memory dump that I saved from one of the BSoDs that should have more information. I've taken the liberty of running 7zip on it with Ultra compression, reducing the download size from 1.95 GB to 261.6 MB, making it far more gardenman-friendly if he wants to take a look at it.

FullMemoryDump.7z

List of driver versions I've tried:

  • 384.76 (Gigabyte drivers)
  • 432.00
  • 441.66
All have produced the same result so far.
 

gardenman

Splendid
Moderator
Thanks for compressing it. Axe is the one you want looking at the memory.dmp. However I downloaded it and here are the basic results: https://overdresseddutch.htmlpasta.com/

A while back I was having game crashes (still do). I tried the following drivers: 388.13, 432.00, 390.65, 397.93, 398.18 and 436.02. I finally ended up on 391.05. Newer drivers screw up AVI files in Media Player (for me). Games still crash, but work 1 out of every 4 tries. It's probably my hardware, an older Alienware and GTX 770, but it's weird how some drivers work better than others. It seems that if it was hardware, then all drivers would be bad and games would crash every time (same game). I've given up, but my PC is 10 years old. I don't game that much and I want a new PC. I'm not saying to give up, I'm just telling you what's happened to me with multiple driver versions. After I find one that works, it will work OK for a month or two, then things get worse once again.

Wait for more replies.
 

Colif

Win 11 Master
Moderator
I have found Nvidia drivers to be less reliable on older cards in the last few months. I am not exactly sure why, but if I see a PC with an Nvidia card (which is most) and it's not an RTX card, and they have the latest drivers, I have to stop myself from suggesting DDU.

I would love to know why.

that last BSOD couldn't blame Nvidia any more if it tried

Axe may see more than I can... he knows what to look for and where to find it, I just learned this all as I go.
 
NotWD
It seems that if it was hardware, then all drivers would be bad and games would crash every time (same game).
This has largely been my issue. Together with the age of the card and the geometry corruption I've observed (which can apparently show up in any power state, though I personally saw it in optimal mode), it is what pointed me towards the card possibly being on its last legs.

Initially I was getting checkerboard artifacts on the desktop back in December, but a combination of different drivers and Maximum Performance mode seems to have stopped that. Now all I get is in-game artifacting in optimal mode, hangs in games following less pronounced artifacting, and BSoDs.

that last BSOD couldn't blame Nvidia any more if it tried

This full dump was one I saved from a STOP on the 12th, but yes, they have all blamed the same driver for the same thing.

Interestingly, when I hit the TDR retry limit last week, I hadn't thought to check C:\Windows\LiveKernelReports. I did so today and it turned up a "STOP" 0x141 (VIDEO_ENGINE_TIMEOUT_DETECTED). Quotes because, in Microsoft's own words, 141 isn't an actual STOP error.
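In case it saves someone a trip through Explorer, a small Python sketch like the one below can list any live dumps, newest first. The folder is passed in as a parameter; `C:\Windows\LiveKernelReports` is assumed as the usual location on Windows 10, and `list_dumps` is just a name picked for this example. It only lists files and does not read or interpret the dumps.

```python
from datetime import datetime
from pathlib import Path


def list_dumps(folder):
    """Return (path, size_bytes, modified) for each .dmp under folder, newest first."""
    root = Path(folder)
    if not root.is_dir():
        return []
    rows = []
    # rglob also walks subfolders, where watchdog-style dumps may be stored.
    for path in root.rglob("*.dmp"):
        if path.is_file():
            info = path.stat()
            rows.append((path, info.st_size, datetime.fromtimestamp(info.st_mtime)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    # Assumed default location; pass a different folder if yours differs.
    for path, size, modified in list_dumps(r"C:\Windows\LiveKernelReports"):
        print(f"{modified:%Y-%m-%d %H:%M}  {size / 1_048_576:8.1f} MB  {path}")
```

Reading that folder will likely need an elevated prompt, otherwise the walk may fail with a permission error.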
 

Colif

Win 11 Master
Moderator
This full dump was one I saved from a STOP on the 12th, but yes, they have all blamed the same driver for the same thing.

What I meant was this: most stack texts will mention the driver once if you're lucky. This one is screaming nvidia:
ffff930774e287e8 fffff800393d32e9 : 000000000000000a 0000000000000000 0000000000000002 0000000000000001 : nt!KeBugCheckEx
ffff930774e287f0 fffff800393cf62b : ffff930700000020 ffff930774e28950 0000000000000000 fffff80052267b80 : nt!KiBugCheckDispatch+0x69
ffff930774e28930 fffff80052263f62 : ffffe30dd74e9000 ffffe30dd74102f0 ffffe30dd74e9000 ffffe30dd7e0d010 : nt!KiPageFault+0x46b
ffff930774e28ac0 fffff8005275eb9f : 000000000000000c ffffe30dd75541c0 00000000ffffffff 0000000000000000 : nvlddmkm+0x1e3f62
ffff930774e28af0 fffff8005275ed82 : ffffe30dd7e0d010 ffff930774e28c10 ffffe30dd74e9000 ffffe30dd75541c0 : nvlddmkm+0x6deb9f
ffff930774e28b20 fffff800526ca272 : 0000000000000000 ffff930774e28c10 ffffe30dd74e9000 0100000000100000 : nvlddmkm+0x6ded82
ffff930774e28b90 fffff800526c9f42 : ffffe30dd74ac400 000000000000000c ffffe30dd74e9000 0000000000000000 : nvlddmkm+0x64a272
ffff930774e28bd0 fffff80052251ca1 : ffffe30dd28158a0 fffff8005226a73f ffffe30dd74e9000 0000000000000000 : nvlddmkm+0x649f42
ffff930774e28db0 fffff8005275f288 : ffffe30d00000000 ffffe30dd7e0d010 ffffe30dd74e9000 ffffe30dd74e9001 : nvlddmkm+0x1d1ca1
ffff930774e28e00 fffff80052408294 : ffffe30dd7fe7000 ffffe30dd7e0d010 0000000000000020 ffff930774e28e30 : nvlddmkm+0x6df288
ffff930774e28e70 fffff80052408443 : ffffe30dd74e9000 0000000000000010 0000000000000000 fffff8005267ead8 : nvlddmkm+0x388294
ffff930774e28eb0 fffff8005247a4cf : ffff930700000070 0000000000000000 ffffe30dd7e6f010 ffff930774e28fb0 : nvlddmkm+0x388443
ffff930774e28f10 fffff8005247ab29 : ffffe30dd74e9000 ffffe30dd74e9000 ffffe30dd25bcf00 ffffe30dd7e6f010 : nvlddmkm+0x3fa4cf
ffff930774e28f80 fffff8005263cd07 : ffffe30dd74e9000 ffffe30dd74e9000 ffffe30dd25bcf40 ffffe30dd74e9000 : nvlddmkm+0x3fab29
ffff930774e29050 fffff8005276007f : ffffe30dd25bcd30 ffffe30dd25bcd30 ffffe30dd7e6c0c0 ffffe30dd25bcf40 : nvlddmkm+0x5bcd07
ffff930774e290b0 fffff80052761153 : ffffe30dd74e9000 ffff930774e29720 ffffe30dd25bcd30 fffff800522674f0 : nvlddmkm+0x6e007f
ffff930774e29100 fffff80052423e92 : ffffe30dd74e9000 ffffe30dd7e6c0c0 ffffe30dd74ea298 fffff800522507a0 : nvlddmkm+0x6e1153
ffff930774e29130 fffff80052268628 : ffffe30dd74ea298 0000000000400100 ffff930774e29220 0000000000100000 : nvlddmkm+0x3a3e92
ffff930774e29180 fffff8005226a6ff : ffffe30dd74e9000 ffff930774e29230 0000000000000000 0000000000000000 : nvlddmkm+0x1e8628
ffff930774e291e0 fffff8005226a609 : 0000000000000000 0000000000400100 0000000000000000 ffffe30d00400100 : nvlddmkm+0x1ea6ff
ffff930774e29250 fffff8005226a840 : 0000000000000051 ffff930774e293d0 ffffe30dd7f08000 0000000000000000 : nvlddmkm+0x1ea609
ffff930774e29290 fffff800523e63d8 : 0000000000000051 000000000000000c 0000000000000000 0000000000000000 : nvlddmkm+0x1ea840
ffff930774e292d0 fffff8005247a898 : 0000000000000000 ffffe30dd7e6f010 0000000000000000 ffffe30dd74e9000 : nvlddmkm+0x3663d8
ffff930774e29500 fffff80052479f12 : ffffe30dffffffff ffffe30dd74e9000 ffffe30d00000000 fffff8005227f14f : nvlddmkm+0x3fa898
ffff930774e29570 fffff8005247a11f : ffffe30de9e9a370 fffff80000000017 ffffe30dd74e9000 ffffe30dd7df7950 : nvlddmkm+0x3f9f12
ffff930774e295c0 fffff8005247adc6 : 0000000000000040 ffff930774e297d0 0000000000000040 ffffe30dd74e9000 : nvlddmkm+0x3fa11f
ffff930774e29640 fffff8005261bacc : 000098020e1713f0 000098020e1713f0 ffffe30dd74e9000 ffffe30dd7e6f010 : nvlddmkm+0x3fadc6
ffff930774e296a0 fffff80052241fca : ffffe30dd74e9000 ffffe30dd7e6f010 ffffe30dd7559030 0000000000000000 : nvlddmkm+0x59bacc
ffff930774e296d0 fffff8003926ae95 : ffffd10020b82f80 0000000000000001 ffffe30dd7ddf820 ffffd10020b80180 : nvlddmkm+0x1c1fca
ffff930774e298f0 fffff8003926a4ef : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiExecuteAllDpcs+0x305
ffff930774e29a30 fffff800393c5024 : ffffffff00000000 ffffd10020b80180 ffffd10020b91340 ffffe30de2dd10c0 : nt!KiRetireDpcList+0x1ef
ffff930774e29c60 0000000000000000 : ffff930774e2a000 ffff930774e24000 0000000000000000 0000000000000000 : nt!KiIdleLoop+0x84

I can't tell who is a pro and who just picked up as they went along, like me... I am still learning too :)
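The "screaming nvidia" observation can be checked mechanically. Below is a minimal Python sketch that counts how many frames in a pasted WinDbg stack belong to each module. It assumes the `child-SP return-address : args : symbol` layout shown in the stack text above; `count_modules` and `module_of` are hypothetical helper names, and the sample only reuses four lines from that stack.

```python
import re
from collections import Counter

# One stack frame: two 16-digit hex columns, " : args : ", then the symbol.
# The backtick is allowed because WinDbg sometimes splits 64-bit addresses.
FRAME_RE = re.compile(
    r"^\s*[0-9a-f`]{16,17}\s+[0-9a-f`]{16,17}\s+:\s+.*\s+:\s+(\S+)\s*$",
    re.IGNORECASE,
)


def module_of(symbol):
    """'nt!KiPageFault+0x46b' -> 'nt'; 'nvlddmkm+0x1e3f62' -> 'nvlddmkm'."""
    return re.split(r"[!+]", symbol, maxsplit=1)[0]


def count_modules(stack_text):
    """Count frames per module; lines that are not frames are ignored."""
    counts = Counter()
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            counts[module_of(match.group(1))] += 1
    return counts


SAMPLE = """\
ffff930774e287e8 fffff800393d32e9 : 000000000000000a 0000000000000000 0000000000000002 0000000000000001 : nt!KeBugCheckEx
ffff930774e28930 fffff80052263f62 : ffffe30dd74e9000 ffffe30dd74102f0 ffffe30dd74e9000 ffffe30dd7e0d010 : nt!KiPageFault+0x46b
ffff930774e28ac0 fffff8005275eb9f : 000000000000000c ffffe30dd75541c0 00000000ffffffff 0000000000000000 : nvlddmkm+0x1e3f62
ffff930774e28af0 fffff8005275ed82 : ffffe30dd7e0d010 ffff930774e28c10 ffffe30dd74e9000 ffffe30dd75541c0 : nvlddmkm+0x6deb9f
"""

if __name__ == "__main__":
    for module, frames in count_modules(SAMPLE).most_common():
        print(f"{module}: {frames} frame(s)")
```

Pasting the full stack text in place of `SAMPLE` gives the per-module tally for the whole trace. A high count for one driver only shows where the call chain was running, not why it faulted.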
 
NotWD
Jesus, yeah, Windows is pretty clear here about where the memory corruption is coming from. With the number of errors found in the dump's memory snapshot being over ten thousand it also lends credence to my theory that the driver is panicking and dumping unuploaded GPU data and shader code somewhere it's not supposed to be (read: most anywhere in main memory).

For the record, LZMA2 Ultra compression is hard on the CPU and always uses 6.6GB of main memory during the compression operation, so if those weren't ruled out before, they likely are now.
 

Colif

Win 11 Master
Moderator
Interestingly, when I hit the TDR retry limit last week, I hadn't thought to check C:\Windows\LiveKernelReports. I did so today and it turned up a "STOP" 0x141 (VIDEO_ENGINE_TIMEOUT_DETECTED). Quotes because, in Microsoft's own words, 141 isn't an actual STOP error.


and yet it is - https://docs.microsoft.com/en-us/wi...g-check-0x141---video-engine-timeout-detected

If it was me, I would get a new GPU. I don't like suggesting it without solid proof and I could be wrong. So I apologise in advance, but you yourself are leaning towards it being hardware and a 760 isn't the newest card these days. It could be age.

Not accepting drivers is just 1 sign a card is dying, others are the images you were getting before you messed with settings. It might not be dead now but there are signs. I have killed cards before, trying to push them too hard. Very few of my cards died from old age. My GTX 980 may as I hardly use it.
 
NotWD

This is a case where MSDN disagrees with WinDbg, at least for the livedumps.

If it was me, I would get a new GPU. I don't like suggesting it without solid proof and I could be wrong. So I apologise in advance, but you yourself are leaning towards it being hardware and a 760 isn't the newest card these days. It could be age.

Not accepting drivers is just 1 sign a card is dying, others are the images you were getting before you messed with settings. It might not be dead now but there are signs. I have killed cards before, trying to push them too hard. Very few of my cards died from old age. My GTX 980 may as I hardly use it.

The fact that I ran extreme-quality ENB presets as well as 2K+ texture mods on it (Skyrim is ass-ugly without visual mods lol) probably contributed, haha. At this point the machine is years out of date so I'll probably opt for a Ryzen build before I move out to Nova Scotia by year's end. The board, RAM and CPU are all fine according to my admittedly unscientific testing, and those 1TB WD Blues are essentially immortal, so I may salvage the drive from it.

If not, I'll buy a 2060 Super and new PSU and see if it works.
 
NotWD
Alright, so, a bit of a related update.

First, something I forgot to mention that may confirm dying VRAM / GPU core: I was unable to get a picture at the time, but during one of my test runs where I didn't immediately get an IRQL error, I experienced full-screen image corruption. First there were snowflakes on the game's current rendered frame, then it very quickly devolved into an absolute mess of pixels all squashed into the middle rows of my monitor, with all other pixels black. I use a USB interface / DAC, so this didn't affect my audio while on Discord. A moment later, the driver crashed and recovered. I think this was not long before YouTube triggered the IRQL error.

Second, I will be picking up some new hardware in a couple weeks (tax forms came pretty early!) including a 2060 Super so if a new GPU doesn't fix it then the problem lies either with RAM or Windows itself.