You appear to have a hardware issue related to a PCIe device. The log contains many instances of each of these errors...
Code:
Log Name: System
Source: Microsoft-Windows-WER-SystemErrorReporting
Date: 29/11/2023 07:55:20
Event ID: 1001
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: Thinkpad_P16G2
Description:
The description for Event ID 1001 from source Microsoft-Windows-WER-SystemErrorReporting cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.
If the event originated on another computer, the display information had to be saved with the event.
The following information was included with the event:
0x00000124 (0x0000000000000004, 0xffffc90f7dbed028, 0x0000000000000000, 0x0000000000000000)
C:\Windows\Minidump\112923-20000-01.dmp
f206a13d-7863-4d6a-b233-0bf958ddc349
The message resource is present but the message was not found in the message table
Log Name: System
Source: Microsoft-Windows-WHEA-Logger
Date: 29/11/2023 07:53:10
Event ID: 17
Task Category: None
Level: Warning
Keywords:
User: LOCAL SERVICE
Computer: Thinkpad_P16G2
Description:
A corrected hardware error has occurred.
Component: PCI Express Endpoint
Error Source: Advanced Error Reporting (PCI Express)
Primary Bus:Device:Function: 0x1:0x0:0x1
Secondary Bus:Device:Function: 0x0:0x0:0x0
Primary Device Name:PCI\VEN_10DE&DEV_22BC&SUBSYS_230617AA&REV_A1
Secondary Device Name:
Log Name: System
Source: Microsoft-Windows-WHEA-Logger
Date: 29/11/2023 07:55:21
Event ID: 1
Task Category: None
Level: Error
Keywords: WHEA Error Event Logs
User: LOCAL SERVICE
Computer: Thinkpad_P16G2
Description:
A fatal hardware error has occurred. A record describing the condition is contained in the data section of this event.
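As a rough illustration of how the bug check line in the Event ID 1001 record can be read, here is a minimal Python sketch. The function name and the two lookup tables are my own and deliberately partial; they list only the values that matter for this log. Bug check 0x124 is WHEA_UNCORRECTABLE_ERROR, and a first parameter of 0x4 indicates an uncorrectable PCI Express error.

```python
import re

# Line copied from the Event ID 1001 record above.
line = ("0x00000124 (0x0000000000000004, 0xffffc90f7dbed028, "
        "0x0000000000000000, 0x0000000000000000)")

# Partial, illustrative lookup tables (not a complete list of values).
BUGCHECK_NAMES = {0x124: "WHEA_UNCORRECTABLE_ERROR"}
WHEA_ERROR_SOURCES = {
    0x0: "machine check exception",
    0x4: "uncorrectable PCI Express error",
}


def parse_bugcheck(text):
    """Return (bug check code, [four parameters]) from the logged line."""
    values = [int(v, 16) for v in re.findall(r"0x[0-9a-fA-F]+", text)]
    return values[0], values[1:]


code, params = parse_bugcheck(line)
print(BUGCHECK_NAMES.get(code, hex(code)))
print(WHEA_ERROR_SOURCES.get(params[0], "other (check the 0x124 documentation)"))
```

The second parameter is the address of the WHEA error record, which is only meaningful when you have the dump open in a debugger.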
Note that the VEN_10DE&DEV_22BC in the middle message above relates to an Nvidia device, but I can't establish which device DEV_22BC refers to.
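For anyone wanting to pull the IDs out of a hardware ID string like the one in the WHEA-Logger event, a small Python sketch follows. The function name and the tiny vendor table are assumptions of mine for illustration; the sketch only splits the string into its fields and does not resolve the device name, which would need a lookup in a PCI ID database.

```python
import re

# Hardware ID copied from the WHEA-Logger Event ID 17 record.
hardware_id = r"PCI\VEN_10DE&DEV_22BC&SUBSYS_230617AA&REV_A1"

# Illustrative table with only the two vendors seen in this ID.
KNOWN_VENDORS = {"10DE": "NVIDIA", "17AA": "Lenovo"}

PATTERN = re.compile(
    r"VEN_([0-9A-F]{4})&DEV_([0-9A-F]{4})"
    r"&SUBSYS_([0-9A-F]{4})([0-9A-F]{4})&REV_([0-9A-F]{2})",
    re.IGNORECASE,
)


def parse_pci_hardware_id(hwid):
    """Split a Windows PCI hardware ID into its component IDs."""
    match = PATTERN.search(hwid)
    if match is None:
        raise ValueError(f"not a PCI hardware ID: {hwid!r}")
    vendor, device, subsys_id, subsys_vendor, revision = match.groups()
    return {
        "vendor": vendor.upper(),
        "device": device.upper(),
        "subsystem_id": subsys_id.upper(),
        "subsystem_vendor": subsys_vendor.upper(),
        "revision": revision.upper(),
    }


ids = parse_pci_hardware_id(hardware_id)
print(ids)
print(KNOWN_VENDORS.get(ids["vendor"], "unknown vendor"))
print(KNOWN_VENDORS.get(ids["subsystem_vendor"], "unknown subsystem vendor"))
```

The SUBSYS field holds the subsystem ID followed by the subsystem vendor ID, which is why 17AA (Lenovo) appears at the end of it on a ThinkPad.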
The one dump that was uploaded is a PAGE_FAULT_IN_NONPAGED_AREA, which means that a page that should have been fixed in RAM was not found. The trap frame in the dump shows the failing instruction...
Code:
TRAP_FRAME: ffffb588851a71e0 -- (.trap 0xffffb588851a71e0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000a80
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8024806187a rsp=ffffb588851a7370 rbp=ffffba09302b6960
r8=0000000000000a80 r9=ffffb588851a7420 r10=fffff802483cc720
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
nvlddmkm+0x9187a:
fffff802`4806187a f3aa rep stos byte ptr [rdi]
Resetting default scope
At the bottom there you can see that the instruction that failed was part of nvlddmkm.sys, which is the Nvidia graphics driver.
We thus have errors for a PCIe device and a dump for the Nvidia graphics card - a PCIe device. The problem area is thus most likely to be the Nvidia graphics card. The version of nvlddmkm.sys that you have installed is not current...
Code:
0: kd> lmDvm nvlddmkm
Browse full module list
start end module name
fffff802`47fd0000 fffff802`4b8e8000 nvlddmkm T (no symbols)
Loaded symbol image file: nvlddmkm.sys
Image path: nvlddmkm.sys
Image name: nvlddmkm.sys
Browse all global symbols functions data
Timestamp: Fri Oct 27 00:24:56 2023 (653AD928)
CheckSum: 03804D3A
ImageSize: 03918000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
Information from resource tables:
The Nvidia driver download site has an RTX 4070 driver dated 14th November 2023 (546.17). I suggest you install this driver and see whether the issues remain - be sure to do a Custom (Advanced) install and check the 'Perform a clean install' box.
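The hex value shown in brackets after the Timestamp in the module listing is the driver's build time in seconds since 1 January 1970 (UTC). A short Python sketch to decode it, in case you want to check a driver date yourself (the variable names are just for illustration):

```python
from datetime import datetime, timezone

# Hex timestamp copied from the nvlddmkm module listing.
raw_timestamp = "653AD928"

# Interpret the value as seconds since the Unix epoch, in UTC.
built = datetime.fromtimestamp(int(raw_timestamp, 16), tz=timezone.utc)
print(built.isoformat())  # 2023-10-26T21:24:56+00:00
```

The debugger output shows 00:24:56 on 27 October, presumably because it prints the time in the time zone of the machine doing the analysis. Either way the installed driver predates the 14th November release.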
Thanks...
Although those drivers didn't help, switching to the Studio Driver rather than the Game Ready Driver did resolve the problem. Unfortunately, the Studio Driver behaves very strangely when playing games: the operating temperature drops to 60C, power draw drops to 65W, and frame rates in games fall by over 60%.
This got me thinking about and looking at Card Settings and differences between the 4000 ADA and the 4080.
The ADA supports ECC memory, while the 4080 doesn't. As ECC was turned on, I tried turning that off in the NVIDIA Control Panel and then tried returning to the RTX 4070 driver (546.17). The problems with Windows Reboots (and the corresponding errors in the system log) stopped and the card performance jumped, running at 80C drawing 130W. However, now games are crashing every couple of hours with the following error.
Any tips on what the new problem may be?
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Log Name: System
Source: nvlddmkm
Date: 2023-12-03 10:32:07 PM
Event ID: 0
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: Thinkpad_P16G2
Description:
The description for Event ID 0 from source nvlddmkm cannot be found.
Either the component that raises this event is not installed on your local computer or the installation is corrupted.
You can install or repair the component on the local computer.
If the event originated on another computer, the display information had to be saved with the event.
The following information was included with the event:
\Device\Video8
Error occurred on GPUID: 100
The message resource is present but the message was not found in the message table
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="nvlddmkm" />
<EventID Qualifiers="0">0</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2023-12-04T03:32:07.8271945Z" />
<EventRecordID>25448</EventRecordID>
<Correlation />
<Execution ProcessID="4" ThreadID="17348" />
<Channel>System</Channel>
<Computer>Thinkpad_P16G2</Computer>
<Security />
</System>
<EventData>
<Data>\Device\Video8</Data>
<Data>Error occurred on GPUID: 100</Data>
<Binary>00000000020030000000000000000000000000000000000000000000000000000000000000000000</Binary>
</EventData>
</Event>
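If these nvlddmkm events keep recurring, it may help to extract the fields from the Event Xml so the crashes can be tallied over time. Below is a minimal Python sketch that does this for an abridged copy of the record above; the function name is hypothetical and it assumes the XML has been saved or pasted as text. Note that SystemTime is in UTC, which is why it reads 03:32 on 4 December while the log header shows 10:32 PM on 3 December local time.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Abridged copy of the event record above (raw string keeps the backslashes).
EVENT_XML = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="nvlddmkm" />
    <EventID Qualifiers="0">0</EventID>
    <TimeCreated SystemTime="2023-12-04T03:32:07.8271945Z" />
    <Computer>Thinkpad_P16G2</Computer>
  </System>
  <EventData>
    <Data>\Device\Video8</Data>
    <Data>Error occurred on GPUID: 100</Data>
  </EventData>
</Event>"""

# Namespace used by Windows event records.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def summarize_event(xml_text):
    """Return (provider name, UTC timestamp, list of Data strings)."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    provider = system.find("e:Provider", NS).get("Name")
    stamp = system.find("e:TimeCreated", NS).get("SystemTime")
    # SystemTime has 7 fractional digits; keep 6 so that %f can parse it.
    when = datetime.strptime(stamp[:26], "%Y-%m-%dT%H:%M:%S.%f")
    when = when.replace(tzinfo=timezone.utc)
    data = [item.text for item in root.findall("e:EventData/e:Data", NS)]
    return provider, when, data


provider, when, data = summarize_event(EVENT_XML)
print(provider, when.isoformat(), data)
```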