Question: HWMon - What Is the PCIe Errors Recovery Counter?

Heat_Fan89

I'm seeing 11 at startup. I played MSFS 2024 and noticed around 25 more after 15 minutes. I played MSFS 2020 and after 15 minutes the count went from 11 to 130. It doesn't seem to increase while I'm idle. Nothing is OC'd, just stock settings.

System is extremely stable. No black screens, CTD, reboots, nada.


Gigabyte B650 Eagle AX Mobo
Ryzen 7 9800X3D
Teamgroup 6000MHz CL30-36-36-76 (4x16GB) 64GB
ASUS TUF OC RTX 5080 Nvidia 576.02 Driver
Corsair RMe 1000W PSU
 
It is what it sounds like. Are you running the most recent chipset drivers?

https://www.amd.com/en/support/downloads/drivers.html/chipsets/am5/b650.html

Is your motherboard BIOS current?

https://www.gigabyte.com/Motherboard/B650-EAGLE-AX/support#support-dl-bios
Yes to both.
BIOS F33
Chipset 7.01.08.129

I'm wondering if the 12-pin to 12-pin cable supplied with the Corsair PSU might be a factor? The ASUS TUF came with a 12-pin to three 8-pin pigtail cable. I'm using the cable that came with the PSU, which is convenient and not as stiff as the cable supplied with the GPU.

Also, no overheating issues with the ASUS TUF. Idle temps are around 29°C, max under load is 49°C. Memory idles at 40°C and maxes out at 56°C under load.
 
The one that came with the PSU should be sufficient.

What PCIe devices other than the GPU do you have installed?
That's pretty much it.

I have a Thermalright Prism 360 AIO. I have a rear case fan, three front fans that came with the NZXT H7 Flow 2024 case. I just ordered three additional Thermalright fans I plan on installing on the bottom of the case to blow cool air up towards the GPU.

Oh, I forgot to mention this since it wasn't brought up. The 9800X3D has an iGPU and I plugged the HDMI cable into its port. So my HDMI video output is not from the RTX 5080 itself but from the iGPU. That said, the RTX 5080 is doing the work.
 
AER is "advanced error reporting". It is a feature of some PCI cards (PCIe is just the PHY, it is still PCI, just serialized). There are a number of errors a PCI card can see and either fail or correct for. I would not say this is something to ignore, it depends on the error. The AER itself is a pointer to a chain of errors; if NULL, then there is no error. There should normally not be any error. The causes which are possible are quite numerous. If you want an overly detailed description (but under Linux), see:
AER in the Kernel

If there is a signal quality issue, then normally the card would fall back to a slower standard. If something else is wrong, e.g., a firmware version that doesn't match the driver version, then you might see this. If there is some driver conflict, then this too could result in an AER. Maybe there is a bit flip. The list of possible causes is long and you really won't know if you don't look at the error list. A Google search for finding out what the errors are:
how to examine pcie advanced error in windows

It is also possible that unstable power is the cause. Or heat. Or a poorly seated card. If it were me I'd really want to find out which error is reported and on which slot. Maybe it isn't even the card you think it is since there are often embedded PCIe devices.
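
For anyone who can boot the same machine into Linux (a live USB would do), here is a minimal sketch of how the per-device error list could be pulled from sysfs. It assumes a kernel that exposes the `aer_dev_correctable`, `aer_dev_nonfatal`, and `aer_dev_fatal` files under `/sys/bus/pci/devices/<address>/`. The function names `parse_aer_stats` and `nonzero_aer_counts` are made up for this example. It is not a Windows solution and not a description of how HWMonitor gets its number.

```python
from pathlib import Path

# Per-device AER statistics files (assumed to be exposed by the running kernel).
AER_FILES = ("aer_dev_correctable", "aer_dev_nonfatal", "aer_dev_fatal")


def parse_aer_stats(text):
    """Parse 'ErrorName count' lines into a dict of name -> int."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def nonzero_aer_counts(root="/sys/bus/pci/devices"):
    """Return {pci_address: {file_name: {error_name: count}}} for non-zero counters."""
    results = {}
    base = Path(root)
    if not base.is_dir():
        return results  # not Linux, or sysfs is not mounted
    for dev in sorted(base.iterdir()):
        for kind in AER_FILES:
            try:
                text = (dev / kind).read_text()
            except OSError:
                continue  # device has no AER capability or the file is unreadable
            nonzero = {
                name: count
                for name, count in parse_aer_stats(text).items()
                if count and not name.startswith("TOTAL_")
            }
            if nonzero:
                results.setdefault(dev.name, {})[kind] = nonzero
    return results


if __name__ == "__main__":
    for address, kinds in nonzero_aer_counts().items():
        print(address, kinds)
```

Each key in the output is a PCI address, the same one `lspci` prints, so it should show which slot or embedded device is actually reporting the errors and whether they are correctable.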
 
I do want to add that you can largely ignore this count as long as the system is otherwise performing properly.
That is probably going to be the outcome. I still got those errors with the 5080 out of the motherboard. I even went into the BIOS and changed the IO port setting from iGPU "Auto" to "Forced" to "Disabled". Nothing seemed to make a difference.

It will be interesting to see if these errors go away with an all AMD setup once I get my mitts on the 9070XT Red Devil. 🤔
 
Only one PCIe slot is being used. The Corsair RMe 1000W is a Tier A PSU and the recommended PSU rating is 850W per ASUS. Heat also isn't a factor, as the card idles at around 30°C and GPU memory is also low at 40°C, with Florida ambient temps of 78°F. The errors appear from a cold boot after the PC has been turned off overnight.

That said, the system is extremely stable.
 
Seems an odd "side-grade" to change from the 5080 to the 9070 XT, especially given the costs of GPUs these days.

I have a Ryzen 9950X and 7900 XTX and HWMONITOR doesn't indicate any similar errors. This may have nothing to do with your situation. Just a point of comparison.
 
I want to see how the Red Devil 9070XT compares to the 5080, especially in Microsoft Flight Sim 2024. I also got a nice deal on the Red Devil at $849 from Amazon, whereas I paid the MSRP of $1484.99 for the 5080. ASUS just upped that price by a little more than $100, to $1599.99, so now Amazon is charging that price. If anything, I can also return either card.
 
Another experiment you could try (if you haven't already) is to disable any XMP/EXPO profiles. Just run memory at default.
That I had not considered, as I set all 4 sticks to the EXPO 1 profile. All four are running at 6000MHz with CL30-36-36-76 timings. I might give that a try, but the alternative is running all four sticks at the default 4800MHz, and the timings are nowhere near as quick as CL30.
 
If the errors persist without the dGPU installed then they are likely caused by another PCIe device. It can be anything from a WiFi card to an NVMe SSD. I had a similar issue on my Intel-based G15 laptop and it was the SSD. A BIOS update (there have been several) seemed to fix it. If you peruse the HWiNFO forum, there's some good information there.

This thread in particular, it's the one that pointed me in a direction:

 
I just realized that the Gigabyte Eagle AX is an AM5 PCIe 5.0 motherboard. The RTX 5080 is PCIe 5.0 and the NVMe is a Fanxiang S770 PCIe Gen 4.0.

I bet that's what is generating those errors, since I put it on the same bus as the RTX 5080. Both share the path to the CPU, and Resizable BAR is enabled. I bet another Gen 4 SSD is not going to solve the problem. It looks like I may have to get a Gen 5 SSD.
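
Before buying a new SSD, it may be worth checking what link each device has actually negotiated, since each PCIe device trains its own link and the fall-back to a slower standard mentioned earlier in the thread would show up there. Below is a rough sketch for Linux only, assuming the kernel exposes `current_link_speed` and `max_link_speed` under `/sys/bus/pci/devices/<address>/`. The helper names `parse_gts` and `downtrained_links` are hypothetical.

```python
from pathlib import Path


def parse_gts(text):
    """Extract the GT/s number from a string such as '16.0 GT/s PCIe'."""
    try:
        return float(text.split()[0])
    except (IndexError, ValueError):
        return None  # empty or 'Unknown' value


def downtrained_links(root="/sys/bus/pci/devices"):
    """List (pci_address, current_gts, max_gts) for links running below their maximum."""
    results = []
    base = Path(root)
    if not base.is_dir():
        return results  # not Linux, or sysfs is not mounted
    for dev in sorted(base.iterdir()):
        try:
            current = parse_gts((dev / "current_link_speed").read_text())
            maximum = parse_gts((dev / "max_link_speed").read_text())
        except OSError:
            continue  # device does not report link speed
        if current is not None and maximum is not None and current < maximum:
            results.append((dev.name, current, maximum))
    return results


if __name__ == "__main__":
    for address, current, maximum in downtrained_links():
        print(f"{address}: running at {current} GT/s, supports {maximum} GT/s")
```

A lower current speed is not proof of a fault on its own: GPUs commonly drop their link speed at idle to save power, so the check is more meaningful while the card is under load.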