Question: BSOD when enabling NVIDIA GPU, DXGKRNL and Video Memory Errors

Jan 3, 2025
Alienware m17 R4, Laptop
MB: Custom Alienware design for laptops, Intel chipset.
CPU: Intel Core i7-10870H 8-core.
GPU: NVIDIA GeForce RTX 3070 Laptop GPU
RAM: 32GB DDR4-2933MHz
PSU: 240W power adapter
Battery: 68Whr lithium-ion battery
Storage: 1TB PCIe NVMe SSD (dual drives in RAID)
OS: Windows 11 24H2 (I was on the previous version during the first round of BSOD loops)
BIOS: 1.26.0 (the latest version according to Dell's website)

I started getting BSODs after exiting a game (Dave the Diver) on December 31st. I had automatic driver updates on, but I can't see whether anything was installed in that time range. I entered a BSOD restart loop: it would crash three times and I would get this error:
VIDEO_MEMORY_MANAGEMENT_INTERNAL
I'm not sure whether I also got this one at first, but I did get it in subsequent BSOD loops:
VIDEO_DXGKRNL_FATAL_ERROR

Usually, after enough loops, the NVIDIA GPU stops being detected, and for some reason the charging icon on my laptop disappears (even though the battery keeps charging). Restarting doesn't get the GPU redetected, but starting up from a full shutdown does, especially after some time has passed. No warning or conflict symbols appear in Device Manager for any drivers.

Things I tried:
  • System file scan (SFC) and DISM (everything came back fine).
  • Updated Windows and turned off automatic driver updates. I also disabled a bunch of background services in case they were interfering (e.g. Gaming Input Service, various Dell SupportAssist services, etc.).
  • Updated various Intel drivers to the latest versions (e.g. chipset, GPU, Management Engine, etc.), both automatically through Device Manager and manually using the drivers listed on Dell's website.
  • Reinstalled the BIOS and restored the default settings.
  • Ran full malware scans with Windows Defender and Kaspersky (nothing detected). This was a long shot, though; I haven't been downloading anything fishy.
  • Used DDU in Safe Mode to uninstall the NVIDIA graphics drivers, then installed the latest, second-latest and October versions of the Game Ready drivers (and the latest Studio driver) from NVIDIA's website, as well as the latest and second-latest drivers from Dell's website. I usually installed the NVIDIA drivers in Safe Mode, but also tried after a normal startup. I don't know if this matters, but the PC works fine with the GPU detected in Safe Mode.
  • Switched my power plan from High Performance to Balanced. This seemed to make things worse and gave a different BSOD (DRIVER_POWER_STATE_FAILURE), so I switched back to High Performance. I also had virtual memory disabled, so I re-enabled it (in case Alienware Command Center was overclocking anything despite the default BIOS settings).
  • Disconnected and reconnected the battery while holding the power button for 20-30 seconds. This usually helped the PC detect the NVIDIA GPU again on startup when a simple shutdown and startup wouldn't, so I could install the different drivers.
I'm now at a point where the charging icon is back and new drivers (NVIDIA and Intel) are installed. I also managed to take pictures of various GPU-Z stats for both GPUs before the BSOD hit. Links to the images:

https://imgur.com/a/gpu-z-stats-on-intel-nvidia-gpu-s-AI6nmu1


I can use my PC with the new NVIDIA GPU driver, but only while the GPU is disabled in Device Manager. Enabling it causes a BSOD shortly afterwards and starts the loop again until the GPU stops being detected.

With the Balanced power plan, the bug check code was 0x9F, caused by ntoskrnl.exe. With the High Performance plan (my current and original setting), the codes were 0x10E and/or 0x113.
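For anyone reading the dumps, each hex code corresponds to a named stop code. A minimal Python sketch of that lookup, covering only the three codes seen in this thread (the table is deliberately small and `describe_bugcheck` is just an illustrative name):

```python
# Bug check (stop) codes mentioned in this thread, mapped to their names.
# Only the three codes reported above are included; extend as needed.
BUGCHECK_NAMES = {
    0x9F: "DRIVER_POWER_STATE_FAILURE",
    0x10E: "VIDEO_MEMORY_MANAGEMENT_INTERNAL",
    0x113: "VIDEO_DXGKRNL_FATAL_ERROR",
}


def describe_bugcheck(code: str) -> str:
    """Accept a code like '9f', '0x10E' or '0x00000113' and return its name."""
    value = int(code, 16)  # base-16 int() also accepts an optional 0x prefix
    name = BUGCHECK_NAMES.get(value, "unknown (not in this table)")
    return f"0x{value:X} {name}"


if __name__ == "__main__":
    for reported in ("9f", "10e", "113"):
        print(describe_bugcheck(reported))
```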

Here is a link to a zip of the minidump files from the original errors (not including the power-related ones). I also included some Event Viewer events and a saved Reliability History file, in case they're of any use:

https://www.mediafire.com/folder/zcb8n6996g2al/PC_Error_Stuff

(I suggest only looking at 010225-9687-01.dmp and 010225-14796-01.dmp since those two are the different errors)
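The dump filenames also appear to carry the crash date. A small Python sketch, assuming the usual MMDDYY-<number>-01.dmp minidump naming (the middle number is treated as opaque, the two-digit year is assumed to be 20YY, and `minidump_date` is a hypothetical helper):

```python
import re
from datetime import date

# Assumption: minidumps are named MMDDYY-<number>-01.dmp, so the first six
# digits encode the crash date. The middle number is not interpreted.
MINIDUMP_NAME = re.compile(r"^(\d{2})(\d{2})(\d{2})-(\d+)-(\d{2})\.dmp$", re.IGNORECASE)


def minidump_date(filename: str) -> date:
    """Return the date encoded in a name such as 010225-9687-01.dmp."""
    match = MINIDUMP_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a minidump-style name: {filename!r}")
    month, day, year = (int(g) for g in match.group(1, 2, 3))
    return date(2000 + year, month, day)  # two-digit year assumed to be 20YY


if __name__ == "__main__":
    for name in ("010225-9687-01.dmp", "010225-14796-01.dmp"):
        print(name, minidump_date(name).isoformat())
```

Under that naming assumption, both suggested dumps decode to 2025-01-02.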

I don't know what else to do at this point. I'm considering rolling back to Windows 11 23H2 and trying DDU and a fresh NVIDIA driver install there, but I don't know if that will help. I was also considering a cloud reset in case other corrupted files are the cause. I'm worried my GPU is fried, but given the above I don't think that's the case, and I'm not savvy with hardware, so I'm too scared to open it up and check myself. If I can't figure it out myself, I might have to take it to a repair shop. I heard some driver updates are coming in about a week and a half, but I don't know if I'm willing to wait that long, since I need the NVIDIA GPU for ML work. Is resetting the PC a worthwhile option? Should I do a cloud reset, or is a local one preferable?

What do you guys think is going on? Corrupted system/driver files or is my hardware screwed? What do you think my next steps should be?
 
Strange CPU speed: Current Speed 2079MHz

Start by downloading Microsoft Autoruns (autoruns64.exe) from here:
https://learn.microsoft.com/en-us/sysinternals/downloads/autoruns
Run it as admin, then find this driver and disable it:
iocbios2 Fri Jun 16 15:05:53 2023
Then reboot and see if you have the same problem without the overclocking driver.

Note: code at a valid kernel address outside of any loaded file made a call to the GPU related to GPU power functions. My guess is that it's a service talking to the BIOS via iocbios2.sys. It's best to disable the strange clocking and see if you still have the same problem. Your CPU
Intel(R) Core(TM) i7-10870H CPU @ 2.20GHz
was being underclocked; this might indicate overheating, or that the driver does not match the BIOS version.
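To put a number on "underclocked", here is a quick arithmetic check in Python using only the two values quoted above (the constants are copied from the post; nothing is measured live):

```python
# Compare the reported CPU speed with the rated base clock of the i7-10870H.
# Both values come from the readings quoted in this thread.
BASE_CLOCK_MHZ = 2200  # "@ 2.20GHz" from the CPU name string
REPORTED_MHZ = 2079    # "Current Speed 2079MHz" from the screenshot


def percent_of_base(current_mhz: float, base_mhz: float) -> float:
    """Return the current clock as a percentage of the base clock (1 decimal)."""
    return round(100.0 * current_mhz / base_mhz, 1)


if __name__ == "__main__":
    pct = percent_of_base(REPORTED_MHZ, BASE_CLOCK_MHZ)
    print(f"{REPORTED_MHZ} MHz is {pct}% of the {BASE_CLOCK_MHZ} MHz base clock")
```

That works out to 94.5% of the rated base clock.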

It could also be something like updating the NVIDIA GPU driver without updating the NVIDIA GPU sound driver. This happens when people aren't using the GPU's audio and forget to update the sound driver along with a graphics driver update.
 

Sorry it took me a while; tough times, unfortunately. Regarding your reply, I downloaded Autoruns and ran it, and noticed 4 drivers highlighted in red (which I assume means they're unverified, correct?):
1) DDDriver DDDriver: (Not Verified) C:\WINDOWS\System32\drivers\dddriver64Dcsa.sys Mon Oct 26 09:26:22 2020
2) NvModuleTracker NvModuleTracker: Process and module monitoring driver (Not Verified) C:\WINDOWS\System32\drivers\NvModuleTracker.sys Thu Mar 5 06:54:38 2020
3) nvvad_WaveExtensible NVIDIA Virtual Audio Device (Wave Extensible) (WDM): (Not Verified) C:\WINDOWS\system32\drivers\nvvad64v.sys Sat Mar 7 04:03:32 2020
4) nvvhci NVVHCI Enumerator Service: (Not Verified) C:\WINDOWS\System32\drivers\nvvhci.sys Thu Mar 12 14:26:38 2020
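If there are many entries to go through, the pasted Autoruns lines can be split into fields so the unverified ones are easy to filter. A minimal Python sketch; the regex assumes the exact layout of the lines pasted above (this is not a documented Autoruns export format), and `parse_line`/`DriverEntry` are hypothetical names:

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Matches lines shaped like the ones pasted above:
# "<entry> <description> (Not Verified|Verified) <path>.sys <timestamp>".
# Assumption: this is just how the lines were copied, not an official format.
LINE = re.compile(
    r"^(?P<entry>\S+)\s+(?P<description>.*?)\s*\((?P<status>Not Verified|Verified)\)\s+"
    r"(?P<path>[A-Za-z]:\\.*?\.sys)\s+"
    r"(?P<stamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})$"
)


@dataclass
class DriverEntry:
    entry: str
    verified: bool
    path: str
    timestamp: datetime


def parse_line(line: str) -> DriverEntry:
    """Split one pasted Autoruns driver line into its fields."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return DriverEntry(
        entry=match["entry"],
        verified=match["status"] == "Verified",
        path=match["path"],
        # %a/%b expect English day and month names (C or English locale).
        timestamp=datetime.strptime(match["stamp"], "%a %b %d %H:%M:%S %Y"),
    )


if __name__ == "__main__":
    sample = r"DDDriver DDDriver: (Not Verified) C:\WINDOWS\System32\drivers\dddriver64Dcsa.sys Mon Oct 26 09:26:22 2020"
    e = parse_line(sample)
    print(e.entry, e.verified, e.path, e.timestamp.date())
```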

Also, whenever I used DDU, I made sure to do a clean install with all the driver components checked. I checked Device Manager just in case, and the audio driver is there: NVIDIA High Definition Audio, 1.4.0.1, 25/6/24 (the NVIDIA GPU driver is dated 30/8/24; I've been using the latest NVIDIA GPU driver provided by Dell for now).

Following your comments about the NVIDIA drivers and the unverified ones above, I tried disabling driver signature enforcement with this command in cmd: bcdedit.exe /set nointegritychecks on

The crash didn't happen quite as suddenly, but I still got the same BSOD. I did manage to get more GPU readings from GPU-Z, though; images here: https://imgur.com/a/ISuMvz4


In Autoruns I couldn't find iocbios2.sys, so I used File Explorer and found it at "C:\Windows\System32\drivers\iocbios2.sys". Instead of deleting it, I renamed it to iocbios22.sys (so whatever service loads it couldn't find it) and rebooted. I still got the same crash.

A little while later, I tried again, this time renaming the file to 'iocbios243ffd.sys', and took new readings of everything:
https://imgur.com/a/RTrIocO


Pics 1-6 are the overall readings from CPU-Z and GPU-Z (in pics 4-5, the bottom tab is for the NVIDIA GPU and the top one for the Intel GPU). Since everything was oddly stable, I opened the control panel and changed the GPU selection from auto-select (between the two GPUs) to the NVIDIA GPU only, and in the per-program settings I selected cmd. Those readings are in pics 7-10 (I also ran nvidia-smi in cmd for more info; oddly, some fields showed N/A). I waited for quite a while, then closed all the applications, and when I closed GPU-Z, my whole screen went blank and I got a BSOD with the error DRIVER_POWER_STATE_FAILURE.
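Since nvidia-smi showed some N/A fields, it can help to capture the same readings in a machine-readable form over time. A minimal Python sketch, assuming nvidia-smi is on PATH and prints unavailable values as bracketed placeholders such as [N/A]; the field list is an illustrative choice, not a recommendation:

```python
import csv
import io
import subprocess

# Fields to request; these are standard --query-gpu field names.
FIELDS = ["name", "pstate", "temperature.gpu", "power.draw", "clocks.gr"]


def query_nvidia_smi() -> str:
    """Run nvidia-smi (must be on PATH) and return its raw CSV output."""
    return subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_rows(raw: str) -> list[dict]:
    """Turn CSV rows into dicts; bracketed values like [N/A] become None."""
    rows = []
    for record in csv.reader(io.StringIO(raw), skipinitialspace=True):
        if not record:
            continue
        values = [None if v.strip().startswith("[") else v.strip() for v in record]
        rows.append(dict(zip(FIELDS, values)))
    return rows


if __name__ == "__main__":
    for row in parse_rows(query_nvidia_smi()):
        print(row)
```

Fields that come back as None are the ones the driver did not report at that moment.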

The three minidumps can be seen here: https://www.mediafire.com/file/1djl5um3u27ay27/new_minidumps.zip/file

A little side update: I reset my PC, so I'm stuck on 24H2 now (unfortunately?). Since then, I've noticed 3 devices listed as unknown in Device Manager (I don't know if this is relevant; these might be the red drivers in Autoruns?):
1) Unknown Driver 1
Device settings for ROOT\UNNAMED_DEVICE\0000 were migrated from previous OS installation.
Last Device Instance ID: ROOT\UNNAMED_DEVICE\0000
Class GUID: {4d36e96c-e325-11ce-bfc1-08002be10318}

2) Unknown Driver 2
Device settings for ROOT\UNNAMED_DEVICE\0001 were migrated from previous OS installation.
Last Device Instance ID: ROOT\UNNAMED_DEVICE\0001
Class GUID: {aa018edf-4915-415e-9c17-d7ebec8917d2}

3) Unknown Driver 3
Device settings for ROOT\UNNAMED_DEVICE\0002 were migrated from previous OS installation.
Last Device Instance ID: ROOT\UNNAMED_DEVICE\0002
Class GUID: {4d36e97d-e325-11ce-bfc1-08002be10318}
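Two of the three class GUIDs above are standard Windows device setup classes, which hints at what kind of device each migrated entry belonged to. A small Python lookup that maps only those two (the third GUID is deliberately left unmapped rather than guessed; `setup_class` is an illustrative name):

```python
# Device setup class GUIDs seen in the "Unknown Driver" entries above.
# Only well-documented classes are mapped; anything else stays unknown.
KNOWN_CLASSES = {
    "{4d36e96c-e325-11ce-bfc1-08002be10318}": "MEDIA (Sound, video and game controllers)",
    "{4d36e97d-e325-11ce-bfc1-08002be10318}": "System (System devices)",
}


def setup_class(guid: str) -> str:
    """Look up a setup class GUID, ignoring case and surrounding braces."""
    key = "{" + guid.strip().strip("{}").lower() + "}"
    return KNOWN_CLASSES.get(key, "not in this table")


if __name__ == "__main__":
    for guid in (
        "{4d36e96c-e325-11ce-bfc1-08002be10318}",
        "{aa018edf-4915-415e-9c17-d7ebec8917d2}",
        "{4d36e97d-e325-11ce-bfc1-08002be10318}",
    ):
        print(guid, "->", setup_class(guid))
```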


Given the above, does it seem likely to be a hardware issue, or do I still have hope?
 
Looks like you updated the BIOS but not all of the drivers. You should go here: https://www.dell.com/support/home/en-ie/product-support/product/alienware-m17-r4-laptop/drivers
then select Find Drivers and do the updates.
Many of the special Intel drivers are from an old build and were not updated, for example:
iaLPSS2_GPIO2_CNL.sys Tue May 12 00:32:17 2020
iaLPSS2_I2C_CNL.sys Tue May 12 00:31:46 2020
iaStorAC.sys Fri Nov 25 02:32:36 2022
dptf_acpi.sys Fri Mar 12 15:40:08 2021
dptf_cpu.sys Fri Mar 12 15:40:10 2021

HID_PCI.sys Sun Jul 25 22:45:34 2021
Not sure what this driver is for:
ISH_BusDriver.sys Sun Jul 25 22:45:31 2021

This sound driver is old:
RTKVHD64.sys Tue Aug 9 03:15:32 2022
SteamStreamingMicrophone.sys Fri Jul 28 08:33:15 2017 (597B593B)
SteamStreamingSpeakers.sys Thu Jul 20 17:56:15 2017

TbtBusDrv.sys Mon Jun 27 03:36:05 2022
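To check a long driver list like this for stale builds, the timestamps can be parsed and compared against a cutoff. A minimal Python sketch; the cutoff date is a hypothetical choice, and it assumes the hex value in parentheses (597B593B) is the file's TimeDateStamp in seconds since 1970-01-01 UTC:

```python
from datetime import datetime, timezone

# A few of the driver lines quoted above, as "<file> <timestamp>".
SAMPLE = """\
iaLPSS2_GPIO2_CNL.sys Tue May 12 00:32:17 2020
iaStorAC.sys Fri Nov 25 02:32:36 2022
TbtBusDrv.sys Mon Jun 27 03:36:05 2022
"""


def older_than(listing: str, cutoff: datetime) -> list[str]:
    """Return the file names whose listed timestamp is before the cutoff."""
    stale = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, stamp = line.split(maxsplit=1)
        if datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y") < cutoff:
            stale.append(name)
    return stale


def decode_pe_timestamp(hex_stamp: str) -> datetime:
    """Decode a hex TimeDateStamp (assumed seconds since 1970-01-01 UTC)."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


if __name__ == "__main__":
    # Hypothetical cutoff: flag anything built before 2022.
    print(older_than(SAMPLE, datetime(2022, 1, 1)))
    print(decode_pe_timestamp("597B593B"))
```

Decoded that way, 597B593B is 2017-07-28 15:33:15 UTC, which matches the listed 08:33:15 if the listing is shown in UTC-7.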

The bugcheck I looked at was in the NVIDIA driver: it took too long to respond to a power request. I would just update the other Intel chipset drivers and retest.
Note: on the Dell driver page, you have to select "expand all drivers"; on the second page you'll see the extra Intel drivers for Intel HID, Intel Rapid Storage and the Intel sensors.