Question BSOD 124 - Only 1st Boot After Power Down (or Sleep)

Ange1ofD4rkness

Distinguished
Apr 25, 2013
7
0
18,520
Okay this BSOD error is driving me up the wall.

It started about a week or two ago: I booted my PC and got a BSOD, error code 124. I didn't think much of it. Then it happened again, but the occurrences were spread out, so once again I was really unfazed.

However, it has now become consistent: any time I turn on my PC after it's been turned off (or the one time after I put it to sleep), the BSOD happens. It can happen during booting (I think around the Windows splash screen) or after I have logged in (takes about 30-60 secs).

No, I have not updated the BIOS, and no, I don't think it's the issue, because the problem started out of nowhere and nothing had been updated. I have since tried updating my GPU drivers to see if that was the cause, with no luck.

Yesterday I tore the case down, gave it a good cleaning (it needed it), changed out the liquid cooling water, and reseated my RAM and GPU as well. After the cleaning it booted fine, so I thought it had just been dirty and getting warm. But then today, it did it again.

The only thing I can think of is that the PSU is the cause. I have been playing with this idea before (sadly, the last time I tested it the machine refused to leave the Windows splash screen, so I had to restart ... compromising the test).

I can't figure it out, and I am stumped (even worse, I wasn't able to do a startup repair; it said the boot master couldn't be repaired ... despite the OS booting fine).

Specs
  • Intel i7-5930
  • MSI X99A
  • Corsair Vengeance LPX
  • Samsung 950 PRO (my boot drive ... which has been the nightmare to repair because of drivers and such)
  • Couple Toshiba HDD drives
  • EVGA Geforce GTX 980 Ti Classified
  • EVGA SuperNOVA P2 1200 W
  • LG WH14NS40 Blu-Ray drive
  • Win 7 Professional

I have minidumps for people who care to see them (just let me know how to send them). But the call stack is always the same:
  • hal.dll
  • ntoskrnl.exe
  • pci.sys
 

Ange1ofD4rkness

Update on it.

I completely powered down the machine. I turned off the PSU and the power strips it's connected to, and even unplugged it from the wall (shoot, even the coax for the modem). The thing couldn't draw any power.

It probably sat like that for over 12 hrs.

However, still a blue screen, same error, same pattern. Just once. What's interesting is that now it's mostly happening after I log in, about the time Steam starts (I think that's partially a coincidence, because a few times my config settings for Steam got corrupted due to the BSOD, but not always). This could all be coincidence, but it's not happening at all at the splash screen (it rarely did in the first place, to be honest).

Also, during this process, I was trying my best to watch the temps of my system using an infrared thermometer. During the boot sequence and at the time of the BSOD I measured (as best I can remember):
  • M.2 Drive: 40-45C
  • MOBO and RAM: ~ 37C
  • Area around the CPU: ~35-40C
  • GPU Heat Sinks: ~37C
Nothing really spiked temperature-wise (which, to be honest, I didn't think was the issue, because I had it running for almost 3 days straight playing games on and off, never once with an issue).

At this point I am looking at either the GPU or the MOBO because, out of all the items that were disconnected when I was cleaning, the GPU is really the only logical one (also, it says "pci.sys" in the stack trace). It was reseated and its power cables reconnected (technically the RAM was reseated as well, but I have seen what bad RAM does and it doesn't usually act like this; in fact, sometimes I've seen bad RAM just shut off a system ... however, I am thinking of running a RAM diagnostic as well if time allows).

Next step is to move the GPU to a different PCI slot on the MOBO ... if it can fit, since it may collide with my liquid cooling reservoir. If this is unsuccessful, or I can't get it to fit in another slot, I plan to dig up an old GPU of mine sitting in an old case, an EVGA 560 Ti, and if needed, I can even grab my old BFG 285 GTX.

(This is also partially because, one time when I booted the computer, I noticed the lighting on my GPU kind of dimmed or flickered for a sec ... I haven't paid attention to see if that happens on each boot or only the first.)
 

Ange1ofD4rkness

Okay another update.

I was poking around some more, and in the Event Viewer the following message is spammed in the "System" log, and it is the last one before the system crashes (I am basing this on the gap in timestamps and how close it sits to a Critical-level event about not shutting down successfully):



Event[16797]:
Log Name: System
Source: Microsoft-Windows-WHEA-Logger
Date: 2020-08-11T15:32:22.773
Event ID: 17
Task: N/A
Level: Warning
Opcode: Info
Keyword: N/A
User: S-1-5-19
User Name: NT AUTHORITY\LOCAL SERVICE
Computer: Ma3a
Description:
A corrected hardware error has occurred.

Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)

Bus:Device:Function: 0x0:0x1:0x0
Vendor ID:Device ID: 0x8086:0x2f02
Class Code: 0x30400

The details view of this entry contains further information.
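
For anyone who wants to look this device up, here is a rough Python sketch that pulls the Bus:Device:Function and Vendor ID:Device ID values out of the entry text above and prints them in two common forms. The function name `parse_whea_pci_fields` is made up for illustration, and the sketch assumes the description text has exactly the layout quoted in this post.

```python
import re

# Description text copied from the event entry quoted above.
ENTRY = """\
Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)

Bus:Device:Function: 0x0:0x1:0x0
Vendor ID:Device ID: 0x8086:0x2f02
Class Code: 0x30400
"""

HEX = r"(0x[0-9a-fA-F]+)"


def parse_whea_pci_fields(text):
    """Hypothetical helper: return the PCI address and IDs as integers."""
    bdf = re.search(r"Bus:Device:Function:\s*" + ":".join([HEX] * 3), text)
    ids = re.search(r"Vendor ID:Device ID:\s*" + ":".join([HEX] * 2), text)
    if bdf is None or ids is None:
        raise ValueError("entry does not contain the expected PCI fields")
    bus, device, function = (int(v, 16) for v in bdf.groups())
    vendor, dev_id = (int(v, 16) for v in ids.groups())
    return {
        "bus": bus,
        "device": device,
        "function": function,
        "vendor_id": vendor,
        "device_id": dev_id,
    }


fields = parse_whea_pci_fields(ENTRY)

# bus:device.function form, as used by many PCI listing tools
address = "{bus:02x}:{device:02x}.{function:x}".format(**fields)
# VEN_/DEV_ form, as shown under Hardware Ids in Device Manager
hardware_id = "PCI\\VEN_{vendor_id:04X}&DEV_{device_id:04X}".format(**fields)

print(address)      # 00:01.0
print(hardware_id)  # PCI\VEN_8086&DEV_2F02
```

Vendor ID 0x8086 is Intel, so the device reporting the error is on the Intel side of the link rather than the GPU itself.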
 

Ange1ofD4rkness

So another follow up.

I had my PC running for about 5-6 days straight, no problem. Anyway, I ended up shutting it down Saturday and reseated my GPU and turned it on. No BSOD. I thought I had succeeded.

Well, today I went and rebooted again and waited for a while on the login screen: no BSOD. I thought "okay, we may be good". I went into Event Viewer and quickly saw a ton of "Microsoft-Windows-WHEA-Logger" Event ID 17 warnings. So I quickly shut it down, then turned it on and logged in. I saw them again, so this time I restarted, went into the BIOS, and then exited (which reboots). This time I went to Event Viewer and saw the warning occur only maybe 5 times, then stop.

It appears to be an issue with my PC "warming up", so to speak. I am going to try a few other scenarios, such as using msconfig to disable a lot of startup services to see if they contribute (the more I look at it, maybe it never blue-screened on the splash screen), with a focus on Steam. I am also going to try going into the BIOS right away and then restarting from there.
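
Rather than eyeballing the count in Event Viewer, the warnings could be counted from a text export of the System log. The sketch below is only a rough idea: it assumes the export has the same "Event[n]:" record layout as the entry quoted earlier in this thread, and the function name `whea_17_times` is made up. Only the first sample record comes from this thread; the other two are invented purely so the example has something to filter.

```python
import re
from datetime import datetime

SAMPLE = """\
Event[16797]:
Log Name: System
Source: Microsoft-Windows-WHEA-Logger
Date: 2020-08-11T15:32:22.773
Event ID: 17
Level: Warning

Event[16798]:
Log Name: System
Source: Microsoft-Windows-WHEA-Logger
Date: 2020-08-11T15:32:23.101
Event ID: 17
Level: Warning

Event[16799]:
Log Name: System
Source: Service Control Manager
Date: 2020-08-11T15:33:00.000
Event ID: 7036
Level: Information
"""
# Records 16798 and 16799 above are invented sample data, not real log entries.


def whea_17_times(log_text):
    """Hypothetical helper: timestamps of WHEA-Logger Event ID 17 records."""
    times = []
    # Each record starts with a header line such as "Event[16797]:".
    for record in re.split(r"^\s*Event\[\d+\]:\s*$", log_text, flags=re.MULTILINE):
        fields = {}
        for line in record.splitlines():
            key, sep, value = line.strip().partition(": ")
            if sep:
                fields[key] = value.strip()
        if (fields.get("Source") == "Microsoft-Windows-WHEA-Logger"
                and fields.get("Event ID") == "17"):
            times.append(datetime.strptime(fields["Date"], "%Y-%m-%dT%H:%M:%S.%f"))
    return sorted(times)


times = whea_17_times(SAMPLE)
print(len(times), "WHEA-Logger ID 17 warnings")
if times:
    print("first:", times[0], "last:", times[-1])
```

Comparing the count and the first/last timestamps between a cold boot and a reboot would show whether the warnings really do stop after the "warm up".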
 

Ange1ofD4rkness

Okay, so I have been doing a lot of work on this. I started investigating the warning reported earlier, and started to see MSI software showing up (which I totally forgot I had even installed). Well, I uninstalled each one hoping to fix the problem, but sadly, it didn't.

Now I should point out that after uninstalling them all, I let the computer run for about 5 mins, longer than usual before a BSOD, and nothing. I went ahead and reset anyway, because the warnings just didn't stop in the Event Viewer. After a reboot, the warnings show up about 3 times, then stop (note, this reboot-once trick has now worked about 5 days in a row).

I started digging more and more into this warning (this one here):
A corrected hardware error has occurred.

Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)

Bus:Device:Function: 0x0:0x1:0x0
Vendor ID:Device ID: 0x8086:0x2f02
Class Code: 0x30400

The details view of this entry contains further information.

Finally I was able to figure out what the Vendor/Device IDs are: "Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 1".

Now, if I read that right, that's the PCI slot my GPU is plugged into. The only thing is, I don't know what to do next. I mean, sure, the reboot trick works, but it's kind of annoying, and I would rather not risk the workaround ceasing to work.