Question PCIe Bus errors - how to check and correct?

S13ontap

Reputable
Jul 3, 2020
Hey guys, hoping someone can point me in the right direction here. I have a new build with some parts moved over from my old build. I've been having random crashes: hard freezes where the only way out is holding the power button to reboot. No Ctrl+Alt+Del, no reset button, fans still running but the system locked up.

I've been running some tests and thought the GPU was causing the errors, but I dunno. I'm NOT the guy. I'm kinda the guy, in that I've built and managed my own systems over the years, but I always seem to have random issues I can't resolve. So here's hoping you folks can help out.

HWiNFO is reporting WHEA errors from the PCI/PCIe bus. I checked the event log and these are Event ID 17, as described below:

A corrected hardware error has occurred.

Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)

Primary Bus: Device:Function: 0x0:0x1:0x0
Secondary Bus: Device:Function: 0x0:0x0:0x0
Primary Device Name: PCI\VEN_8086&DEV_A70D&SUBSYS_88821043&REV_01
Secondary Device Name:
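
For anyone trying to read that event, here is a minimal Python sketch that splits the device name and the Bus:Device:Function value into their fields. The function names and the tiny vendor table are made up for illustration; the only assumption about the format is the usual `VEN_/DEV_/SUBSYS_/REV_` layout, where the last four hex digits of SUBSYS are the subsystem vendor.

```python
import re

# Values copied from the event text above.
HARDWARE_ID = r"PCI\VEN_8086&DEV_A70D&SUBSYS_88821043&REV_01"
PRIMARY_BDF = "0x0:0x1:0x0"

# Tiny illustrative lookup table, not a complete vendor list.
KNOWN_VENDORS = {"8086": "Intel", "1043": "ASUSTeK", "10DE": "NVIDIA"}

HWID_PATTERN = re.compile(
    r"^PCI\\VEN_(?P<vendor>[0-9A-F]{4})"
    r"&DEV_(?P<device>[0-9A-F]{4})"
    r"&SUBSYS_(?P<subsys_id>[0-9A-F]{4})(?P<subsys_vendor>[0-9A-F]{4})"
    r"&REV_(?P<revision>[0-9A-F]{2})$",
    re.IGNORECASE,
)


def parse_hardware_id(hwid):
    """Split a PCI hardware ID string into its hex fields (hypothetical helper)."""
    match = HWID_PATTERN.match(hwid.strip())
    if match is None:
        raise ValueError(f"not a recognised PCI hardware ID: {hwid!r}")
    return {name: value.upper() for name, value in match.groupdict().items()}


def parse_bdf(text):
    """Turn 'bus:device:function' written in hex into a tuple of ints."""
    bus, device, function = (int(part, 16) for part in text.split(":"))
    return bus, device, function


if __name__ == "__main__":
    fields = parse_hardware_id(HARDWARE_ID)
    for name, value in fields.items():
        label = KNOWN_VENDORS.get(value, "") if "vendor" in name else ""
        print(f"{name:14s} {value} {label}".rstrip())
    print("bus/device/function:", parse_bdf(PRIMARY_BDF))
```

Read this way, the device that logged the error has an Intel vendor ID with an ASUS subsystem vendor, so it is the root port on the platform side of the link, not the graphics card's own ID. That fits the "PCI Express Root Port" component line, but on its own it does not say which end of the link is at fault.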

---
Could this have something to do with my GPU being PCIe 4.0 (I believe) and connected to the PCIe 5.0 slot? It's jumpered through a vertical GPU stand, and I'm not sure whether that is PCIe 3.0 or 4.0... Is this an obvious red flag? Errors seem to rack up when running a GPU benchmark, and not as much when running CPU benchmarks.
--

Microsoft Windows 11 Pro Version 10.0.22631 Build 22631
Processor - Intel Core i9-14900KF
BIOS Version/Date - American Megatrends Inc. 1501, 2023-10-05. Memory Express was supposed to update the BIOS, and they did, just not to the most recent version. It appears that 18xx is available now.
MB - ROG STRIX Z790-E GAMING WIFI
G.Skill Trident Z5 RGB 7200 2x24GB
GPU - Strix 4090
ROG Strix 1000W modular with 16-pin for GPU
Fractal Prisma S+ or something, 360 AIO
Samsung 990
ADATA Swordfish M.2
3 older SSDs

Now, the 4090 was in my old system and ran fine. No hangups, though it was probably just underutilized: a 10700K build on an MSI MPG Z490 board, DDR4, etc.
So, the PSU is new, downgraded from a 1200W to a 1000W to get the 16-pin. Cooling and case fans are the same. The cooler is LGA1700 compatible and appears to mount fine. Most drives are the same except the 990, which is new and replaces an old XPG Spectrix so I could use the board heatsink.

Any thoughts here, guys? I'm going to try remounting my GPU in the 4.0 slot below the 5.0 slot it's jumpered into now, skipping the vertical mount, and see if that does anything.

Thoughts? Help me, Obi-Wan Kenobis... you're my only hope, short of dropping this off somewhere or continuing to poke at it with a stick...
 
Anyone? This got buried quick... guess everyone is busy with their new Christmas presents. If anyone can help shed some light on this... My days are so depressing right now, not being able to work the kinks out of this system. This was supposed to be GOD MODE. But, story of my life, it's Icarus.

I'm seeing board codes that say CPU mismatch and possible cache error. Is that chipset drivers? Everything is up to date except the BIOS. I tried to re-slot the GPU and read the manual a bit.

The PCIe 5.0 slot drops from x16 to x8 when an SSD is installed there. I moved the GPU to slot 2, so it's on PCIe 4.0 anyway.
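
To put rough numbers on what those slot options mean, here is a small Python sketch of theoretical one-direction link bandwidth. It uses the standard per-lane rates (8, 16, and 32 GT/s for PCIe 3.0, 4.0, and 5.0) and 128b/130b encoding; the function name is just for illustration, and real throughput is lower once protocol overhead is counted.

```python
# Raw per-lane transfer rate in GT/s for each PCIe generation.
GT_PER_SECOND = {3: 8.0, 4: 16.0, 5: 32.0}

# PCIe 3.0 and later use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_per_s(generation, lanes):
    """Approximate one-direction bandwidth in GB/s, before protocol overhead."""
    gigabits = GT_PER_SECOND[generation] * ENCODING_EFFICIENCY * lanes
    return gigabits / 8  # bits to bytes


if __name__ == "__main__":
    for generation, lanes in [(5, 16), (5, 8), (4, 16), (4, 8), (3, 16)]:
        bandwidth = link_bandwidth_gb_per_s(generation, lanes)
        print(f"PCIe {generation}.0 x{lanes}: {bandwidth:.2f} GB/s")
```

A link only runs at the highest generation and width that both ends (and anything in between, like a riser) support, so a PCIe 4.0 card in a 5.0 slot still negotiates 4.0. If that slot also drops to x8, the card is left with roughly the bandwidth of a 3.0 x16 link.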

With the code I see on the board when I try gaming, and cross-referencing the HWiNFO log to the device, it says it's PCIe root port #5. But no issues with drivers. I feel like I should move the SSDs around, reflash the BIOS, and reinstall Windows with fresh drivers. What do you gurus think?
 
I have the exact same problem. I am pretty sure it is related to the PSU. I already reset my PC, reinstalled drivers multiple times, and restored Windows to a point before I had the problem. My PC was working fine for the first 2 months, and then a couple of weeks ago it started crashing every time I try to play a game or run a benchmark, a couple of minutes after I start it. I checked Event Viewer and I see this error and Kernel-Power 41. Tell me if you got it fixed.
 
For the WHEA errors, I traced them to the GPU riser. I removed the riser and the hardware errors went away.
I also had some other errors related to the i9, which I worked around with an undervolt.

I did troubleshoot some stuff with Microsoft and - big surprise - they told me to reinstall Windows. Didn't help.

What was your issue? Were you getting loads of WHEA errors in HWiNFO, or just seeing errors in Event Viewer?
I'm not really one of these guys, insomuch as being a computer tech, but if I can help with some of the stuff I got sorted, I will. I'm stable and running fine now. There's the odd lockup related to sleep and standby issues, but otherwise good.

It ended up being a combination of things... What are your system specs?