Question PC Randomly Shuts Off when using MS Edge Browser (Critical Error: Kernel-Power)

Grovest

About once an hour the system reboots for no reason. The interval is random: sometimes it has been only a few minutes, other times half a day. The only other program this has happened with is Photoshop. There is no BSOD; the system just reboots and then seems to work fine after it restarts. The sequence of errors in the system log is:


Log Name: System
Source: Microsoft-Windows-Kernel-PnP
Date: 2023-11-29 12:55:08 AM
Event ID: 219
Task Category: (212)
Level: Warning
Keywords:
User: SYSTEM
Computer: Thinkpad_P16G2
Description:
The driver \Driver\WUDFRd failed to load for the device ROOT\WINDOWSHELLOFACESOFTWAREDRIVER\0000.


Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 2023-11-29 12:55:08 AM
Event ID: 41
Task Category: (63)
Level: Critical
Keywords: (70368744177664),(2)
User: SYSTEM
Computer: Thinkpad_P16G2
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

Log Name: System
Source: volmgr
Date: 2023-11-29 12:55:07 AM
Event ID: 162
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: Thinkpad_P16G2
Description:
Dump file generation succeded.


I am at a complete loss as to the cause. I have run diagnostics and nothing shows up as wrong. Any help or tips as to where to look would be greatly appreciated. I have attached the event log for the time around the error in case it has any clues.

SystemLog

Thanks.
 
Turned off hardware acceleration in Microsoft Edge and switched power profile to performance. Should know in a few hours if it solved the problem. As Edge was open in the background when Photoshop crashed, it may have been Edge which triggered the crash. Other activities like gaming have never triggered a crash.

Are there known problems with Edge hardware acceleration?

Machine Type: Lenovo Thinkpad P16 Gen 2 Laptop
CPU: Intel Core i7-13850HX
Video Card: NVIDIA RTX 4000 Ada
 
Before turning hardware acceleration off, edge://gpu was reporting:

Graphics Feature Status
=======================
* Canvas: Hardware accelerated
* Canvas out-of-process rasterization: Enabled
* Direct Rendering Display Compositor: Disabled
* Compositing: Hardware accelerated
* Multiple Raster Threads: Enabled
* OpenGL: Enabled
* Rasterization: Hardware accelerated
* Raw Draw: Disabled
* Skia Graphite: Disabled
* Video Decode: Hardware accelerated
* Video Encode: Hardware accelerated
* Vulkan: Disabled
* WebGL: Hardware accelerated
* WebGL2: Hardware accelerated
* WebGPU: Hardware accelerated
 
You appear to have a hardware issue related to a PCIe device. The log contains these errors (and many instances of them too)...
Code:
Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Date:          29/11/2023 07:55:20
Event ID:      1001
Task Category: None
Level:         Error
Keywords:      
User:          SYSTEM
Computer:      Thinkpad_P16G2
Description:
The description for Event ID 1001 from source Microsoft-Windows-WER-SystemErrorReporting cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

0x00000124 (0x0000000000000004, 0xffffc90f7dbed028, 0x0000000000000000, 0x0000000000000000)
C:\Windows\Minidump\112923-20000-01.dmp
f206a13d-7863-4d6a-b233-0bf958ddc349

The message resource is present but the message was not found in the message table


Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          29/11/2023 07:53:10
Event ID:      17
Task Category: None
Level:         Warning
Keywords:      
User:          LOCAL SERVICE
Computer:      Thinkpad_P16G2
Description:
A corrected hardware error has occurred.

Component: PCI Express Endpoint
Error Source: Advanced Error Reporting (PCI Express)

Primary Bus:Device:Function: 0x1:0x0:0x1
Secondary Bus:Device:Function: 0x0:0x0:0x0
Primary Device Name:PCI\VEN_10DE&DEV_22BC&SUBSYS_230617AA&REV_A1
Secondary Device Name:


Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          29/11/2023 07:55:21
Event ID:      1
Task Category: None
Level:         Error
Keywords:      WHEA Error Event Logs
User:          LOCAL SERVICE
Computer:      Thinkpad_P16G2
Description:
A fatal hardware error has occurred. A record describing the condition is contained in the data section of this event.
Note that the VEN_10DE&DEV_22BC in the middle message above relates to an Nvidia device, but I cannot establish to which device DEV_22BC refers.
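
For anyone who wants to pull those log fields apart themselves, here is a rough Python sketch. It splits the hardware ID string from the WHEA event into its fields and reads the bug check line from the Event ID 1001 entry. The function names and the two small lookup tables are my own and deliberately incomplete; the only entries included are the NVIDIA and Lenovo vendor IDs and two of the documented first-parameter values for bug check 0x124 (WHEA_UNCORRECTABLE_ERROR).

```python
import re

# Illustrative, incomplete tables. A real lookup would use the PCI ID database
# and the full bug check 0x124 parameter list.
KNOWN_VENDORS = {"10DE": "NVIDIA", "17AA": "Lenovo"}
WHEA_ERROR_SOURCES = {
    0x0: "machine check exception",
    0x4: "uncorrectable PCI Express error",
}

HARDWARE_ID_PATTERN = re.compile(
    r"PCI\\VEN_([0-9A-F]{4})&DEV_([0-9A-F]{4})"
    r"&SUBSYS_([0-9A-F]{4})([0-9A-F]{4})&REV_([0-9A-F]{2})"
)


def parse_hardware_id(hardware_id):
    """Split a PCI hardware ID string into its named fields."""
    match = HARDWARE_ID_PATTERN.fullmatch(hardware_id)
    if match is None:
        raise ValueError(f"not a PCI hardware ID: {hardware_id!r}")
    vendor, device, subsys_id, subsys_vendor, revision = match.groups()
    return {
        "vendor_id": vendor,
        "vendor_name": KNOWN_VENDORS.get(vendor, "unknown"),
        "device_id": device,
        "subsystem_id": subsys_id,
        "subsystem_vendor_id": subsys_vendor,
        "subsystem_vendor_name": KNOWN_VENDORS.get(subsys_vendor, "unknown"),
        "revision": revision,
    }


def parse_bugcheck(line):
    """Return (bug check code, [parameters]) from a '0x... (0x..., ...)' line."""
    values = [int(v, 16) for v in re.findall(r"0x[0-9a-fA-F]+", line)]
    return values[0], values[1:]


info = parse_hardware_id(r"PCI\VEN_10DE&DEV_22BC&SUBSYS_230617AA&REV_A1")
code, params = parse_bugcheck(
    "0x00000124 (0x0000000000000004, 0xffffc90f7dbed028, "
    "0x0000000000000000, 0x0000000000000000)"
)
print(info["vendor_name"], info["device_id"], info["subsystem_vendor_name"])
print(hex(code), WHEA_ERROR_SOURCES.get(params[0], "other"))
```

The vendor and subsystem vendor fields place the device on an NVIDIA part in a Lenovo machine, and the first bug check parameter (0x4) is the value documented for an uncorrectable PCI Express error. The sketch does not resolve the device ID itself.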

The one dump that was uploaded is a PAGE_FAULT_IN_NONPAGED_AREA, which means that a page that should have been fixed in RAM was not found. The trap frame in the dump shows the failing instruction...
Code:
TRAP_FRAME:  ffffb588851a71e0 -- (.trap 0xffffb588851a71e0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000a80
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8024806187a rsp=ffffb588851a7370 rbp=ffffba09302b6960
 r8=0000000000000a80  r9=ffffb588851a7420 r10=fffff802483cc720
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
nvlddmkm+0x9187a:
fffff802`4806187a f3aa            rep stos byte ptr [rdi]
Resetting default scope
At the bottom there you can see that the instruction that failed was part of the nvlddmkm.sys driver, which is the Nvidia graphics driver.
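
As a quick cross-check of that attribution, the faulting address in rip can be compared against the start and end addresses that the lmDvm listing in this post reports for nvlddmkm. This is only a small Python sketch of the arithmetic; the function name is made up for illustration.

```python
def module_offset(address, module_start, module_end):
    """Return the offset of address inside [module_start, module_end)."""
    if not module_start <= address < module_end:
        raise ValueError("address is outside the module's range")
    return address - module_start


RIP = 0xFFFFF8024806187A        # rip from the trap frame
NV_START = 0xFFFFF80247FD0000   # nvlddmkm start from the lmDvm listing
NV_END = 0xFFFFF8024B8E8000     # nvlddmkm end from the lmDvm listing

offset = module_offset(RIP, NV_START, NV_END)
print(f"nvlddmkm+{offset:#x}")  # nvlddmkm+0x9187a
```

The result matches the nvlddmkm+0x9187a label the debugger printed, so the failing instruction does sit inside the driver image.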

We thus have errors for a PCIe device and a dump for the Nvidia graphics card - a PCIe device. The problem area is thus most likely to be the Nvidia graphics card. The version of nvlddmkm.sys that you have installed is not current...
Code:
0: kd> lmDvm nvlddmkm
Browse full module list
start             end                 module name
fffff802`47fd0000 fffff802`4b8e8000   nvlddmkm T (no symbols)           
    Loaded symbol image file: nvlddmkm.sys
    Image path: nvlddmkm.sys
    Image name: nvlddmkm.sys
    Browse all global symbols  functions  data
    Timestamp:        Fri Oct 27 00:24:56 2023 (653AD928)
    CheckSum:         03804D3A
    ImageSize:        03918000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
The Nvidia driver download site has an RTX 4070 driver dated 14th November 2023 (546.17). I suggest you install this driver and see whether the issues remain - be sure to do a Custom (Advanced) install and check the 'Perform a clean install' box.
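
If you want to check the driver age yourself, the hex value in brackets on the Timestamp line is a count of seconds since 1 January 1970 (UTC). A minimal Python sketch for decoding it follows; the function name is mine, and the comparison date is simply the 14th November 2023 release date mentioned above.

```python
from datetime import date, datetime, timezone


def pe_timestamp_to_utc(hex_stamp):
    """Convert the hex timestamp shown by the debugger to a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


built = pe_timestamp_to_utc("653AD928")
newer_release = date(2023, 11, 14)  # release date quoted in this post

print(built.isoformat())                     # 2023-10-26T21:24:56+00:00
print(built.date() < newer_release)          # True
print((newer_release - built.date()).days)   # 19
```

The debugger appears to print the time in the local time zone of the machine running it, which would explain why the listing shows 00:24:56 on 27th October while the UTC value is 21:24:56 on the 26th.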
 
Thanks...

Although those drivers didn't help, switching to the Studio Driver rather than the Game Ready Driver did resolve the problem. Unfortunately, the Studio Driver behaves very strangely when playing games: the operating temperature drops to 60C, power draw drops to 65W, and frame rates in games drop by over 60%.

This got me thinking about and looking at Card Settings and differences between the 4000 ADA and the 4080.

The ADA supports ECC memory, while the 4080 doesn't. As ECC was turned on, I tried turning that off in the NVIDIA Control Panel and then tried returning to the RTX 4070 driver (546.17). The problems with Windows Reboots (and the corresponding errors in the system log) stopped and the card performance jumped, running at 80C drawing 130W. However, now games are crashing every couple of hours with the following error.

Any tips on what the new problem may be?

++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Log Name: System
Source: nvlddmkm
Date: 2023-12-03 10:32:07 PM
Event ID: 0
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: Thinkpad_P16G2
Description:

The description for Event ID 0 from source nvlddmkm cannot be found.

Either the component that raises this event is not installed on your local computer or the installation is corrupted.

You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\Video8

Error occurred on GPUID: 100

The message resource is present but the message was not found in the message table

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="nvlddmkm" />
<EventID Qualifiers="0">0</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2023-12-04T03:32:07.8271945Z" />
<EventRecordID>25448</EventRecordID>
<Correlation />
<Execution ProcessID="4" ThreadID="17348" />
<Channel>System</Channel>
<Computer>Thinkpad_P16G2</Computer>
<Security />
</System>
<EventData>
<Data>\Device\Video8</Data>
<Data>Error occurred on GPUID: 100</Data>
<Binary>00000000020030000000000000000000000000000000000000000000000000000000000000000000</Binary>
</EventData>
</Event>
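
In case it helps anyone comparing several of these entries, here is a rough Python sketch that pulls the main fields out of the Event XML with the standard library. The XML in the sketch is a trimmed copy of the event above, and the function name and the level table are just for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Standard Windows event levels.
LEVEL_NAMES = {1: "Critical", 2: "Error", 3: "Warning", 4: "Information"}

# Trimmed copy of the event posted above.
EVENT_XML = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="nvlddmkm" />
<EventID Qualifiers="0">0</EventID>
<Level>2</Level>
<TimeCreated SystemTime="2023-12-04T03:32:07.8271945Z" />
<Computer>Thinkpad_P16G2</Computer>
</System>
<EventData>
<Data>\Device\Video8</Data>
<Data>Error occurred on GPUID: 100</Data>
</EventData>
</Event>"""


def summarize_event(xml_text):
    """Extract provider, level, UTC time string and data strings from Event XML."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    level = int(system.find("e:Level", NS).text)
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "level": LEVEL_NAMES.get(level, str(level)),
        "time_utc": system.find("e:TimeCreated", NS).get("SystemTime"),
        "data": [d.text for d in root.findall("e:EventData/e:Data", NS)],
    }


summary = summarize_event(EVENT_XML)
print(summary["provider"], summary["level"], summary["time_utc"])
print(summary["data"])
```

Note that SystemTime is in UTC, so 03:32:07 on 2023-12-04 is the same moment as the 10:32:07 PM local time shown in the Date field.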
 
If you've been swapping Nvidia drivers and cards it might be wise to use DDU to remove all traces of earlier drivers (so you start with a clean slate) and then install the latest game ready driver for whichever card you plan to use. If you swap cards again for testing, be sure to use DDU to remove the driver for the other card.
 
Good tip.

I hadn't thought of possible remnants of the Studio Driver being left behind when switching back to the Game Ready driver. I will give DDU a try. At least the new problem is a minor one: a game crashing to Windows is more of an annoyance, as it happens at times even when everything is working properly.