Sep 27, 2020
Hello everybody,

So I’ve had a problem for two years now, but since it didn’t get worse I could live with it. Now I’ve got time, so I’m trying to figure out what the hell is going on with my PC.

The symptoms:
Rare BSOD errors, very unpredictable and without any consistency. What I mean is that when I play a game on ultra graphics (Project Cars, GTA 5, Watch Dogs), sometimes it crashes in the first 5 seconds, sometimes after 10 minutes, and sometimes I play 2 hours without an issue. Sometimes I play 4 hours and it crashes as soon as I’m back at the desktop. But it is not limited to gaming (although it is more frequent then): I might just be browsing in Chrome, or working around the house with only Spotify playing… There are times when it happens every other day or a couple of times a day, and then nothing for a month.

Logs, memory dump, other debugging results:
Just to be clear: before I did a clean install of Windows I got all sorts of error messages when a BSOD occurred; since then I get far fewer kinds. Running WhoCrashed on the dumps:

This was probably caused by the following module: ntoskrnl.exe (nt+0x3F3EA0)
Bugcheck code: 0xD1 (0xFFFFF8027AD07CE7, 0xFF, 0xC4, 0xFFFFF8027AD07CE7)
Error: DRIVER_IRQL_NOT_LESS_OR_EQUAL

This was probably caused by the following module: ntkrnlmp.exe (nt!setjmpex+0x8279)
Bugcheck code: 0xD1 (0xFFFFF8027AD07CE7, 0xFF, 0xC4, 0xFFFFF8027AD07CE7)
Error: DRIVER_IRQL_NOT_LESS_OR_EQUAL

This was probably caused by the following module: ntoskrnl.exe (nt+0x3F3EA0)
Bugcheck code: 0xD1 (0xFFFFF80280078DD7, 0xFF, 0xE4, 0xFFFFF80280078DD7)
Error: DRIVER_IRQL_NOT_LESS_OR_EQUAL

This was probably caused by the following module: ntoskrnl.exe (nt+0x3F3EA0)
Bugcheck code: 0x7F (0x8, 0xFFFF8600321DCE50, 0x321C7D1F, 0xFFFFF8021151BF4C)
Error: UNEXPECTED_KERNEL_MODE_TRAP

This was probably caused by the following module: ntoskrnl.exe (nt+0x3F3EA0)
Bugcheck code: 0x9C (0x80000001, 0xFFFFBF8058FF8B10, 0x0, 0x0)
Error: MACHINE_CHECK_EXCEPTION

There were a couple of WHEA_UNCORRECTABLE_ERROR crashes as well, but I didn’t save the dumps from them.
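For anyone who wants to tally results like these, here is a minimal Python sketch that parses the "Bugcheck code" / "Error" line pairs in the format WhoCrashed printed above and counts how often each error occurs. The `REPORT` text is copied from this post; the function names `parse_bugchecks` and `tally` are made up for illustration and are not part of any tool.

```python
import re
from collections import Counter

# Excerpt copied from the WhoCrashed output quoted in this post.
REPORT = """\
Bugcheck code: 0xD1 (0xFFFFF8027AD07CE7, 0xFF, 0xC4, 0xFFFFF8027AD07CE7)
Error: DRIVER_IRQL_NOT_LESS_OR_EQUAL
Bugcheck code: 0xD1 (0xFFFFF8027AD07CE7, 0xFF, 0xC4, 0xFFFFF8027AD07CE7)
Error: DRIVER_IRQL_NOT_LESS_OR_EQUAL
Bugcheck code: 0xD1 (0xFFFFF80280078DD7, 0xFF, 0xE4, 0xFFFFF80280078DD7)
Error: DRIVER_IRQL_NOT_LESS_OR_EQUAL
Bugcheck code: 0x7F (0x8, 0xFFFF8600321DCE50, 0x321C7D1F, 0xFFFFF8021151BF4C)
Error: UNEXPECTED_KERNEL_MODE_TRAP
Bugcheck code: 0x9C (0x80000001, 0xFFFFBF8058FF8B10, 0x0, 0x0)
Error: MACHINE_CHECK_EXCEPTION
"""

# One match per "Bugcheck code: ... / Error: ..." pair.
BUGCHECK_RE = re.compile(
    r"Bugcheck code:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)\s*Error:\s*(\w+)"
)


def parse_bugchecks(text):
    """Return a list of (code, [four parameters], error name) tuples."""
    results = []
    for code, args, name in BUGCHECK_RE.findall(text):
        params = [int(a.strip(), 16) for a in args.split(",")]
        results.append((int(code, 16), params, name))
    return results


def tally(bugchecks):
    """Count how many times each error name appears."""
    return Counter(name for _, _, name in bugchecks)


if __name__ == "__main__":
    for name, count in tally(parse_bugchecks(REPORT)).most_common():
        print(f"{count} x {name}")
```

Grouping the crashes this way only shows which error types dominate; it does not identify the faulty component by itself.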

The whole dumps uploaded to onedrive:
https://1drv.ms/u/s!Am86NGE33yonbkDJn43uB5d3x4Q?e=1FscG0

PC specs:
Motherboard: ASRock Z170 Extreme4
CPU: Intel Core i7 6700K
RAM: Corsair Vengeance 16GB DDR4 (2x8GB)
Graphics card: Gigabyte GTX 970 4GB
Power: FSP 650W Raider 80
SSD: Kingston 120GB SATA3
HDD: 1TB WD 3.5" Caviar Blue SATA3 (WD10EZEX)

More details in CPU-Z report: https://1drv.ms/u/s!Am86NGE33yoncyXFJGLOYYM1MMQ

Temperature and voltage data:
I’ve logged a couple of values with SpeedFan to see if there is anything suspicious before a crash, you can find them here:
https://docs.google.com/spreadsheet...YkEXh4mdbyN2_nn-H4XLQR9wQ/edit#gid=1426000315

I started logging right before I launched F1 2018, and some 20 minutes later I got a BSOD. I don’t see any anomalies in the log.
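Eyeballing a sensor log for anomalies is easy to get wrong, so here is a rough Python sketch that summarizes each numeric column (min, max, mean, spread), optionally only for the last few samples before a crash. It assumes the log has been exported as CSV with a header row; the column names and values in `SAMPLE` are invented for illustration and are not taken from the linked spreadsheet.

```python
import csv
import io
import statistics

# Hypothetical sample data; a real log export may use different
# column names and a different delimiter.
SAMPLE = """Seconds,Vcore,+12V,CPU Temp
0,1.20,12.10,45
3,1.22,12.05,61
6,1.21,11.98,70
"""


def summarize(csv_text, last_n=None):
    """Return {column: {min, max, mean, spread}} for numeric columns.

    If last_n is given, only the last_n rows are considered, which is
    handy for looking at the samples right before a crash.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if last_n is not None:
        rows = rows[-last_n:]
    if not rows:
        return {}
    summary = {}
    for column in rows[0].keys():
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip columns that are not purely numeric
        summary[column] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "spread": max(values) - min(values),
        }
    return summary


if __name__ == "__main__":
    for column, stats in summarize(SAMPLE).items():
        print(column, stats)
```

Keep in mind that software sensor readings are sampled every few seconds at best, so a clean summary does not rule out short voltage dips.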

Attempts to fix it:
  • First off, I’ve updated every driver, nothing changed
  • Moved my graphics card to a different PCIe slot and also shuffled the RAM sticks around (tried every permutation: one stick, two sticks, in every slot)
  • Ran chkdsk and sfc /scannow, all good
  • Windows was up-to-date
  • Updated the BIOS
After that I realized that I needed to go deeper, so I continued with these steps:
  • Made a completely clean install of Windows
  • Ran MemTest86, 4 passes, all tests (free version), each stick separately, no errors found
  • Removed the CMOS battery for 3 hours to reset BIOS
  • Checked both SSD and HDD with Crystal Disk Mark and both are in good condition
  • Ran 3DMark benchmarking, with no issues, no BSOD
  • Ran Prime95 torture test for 4 hours, with no problems
Other things that might be worth mentioning:
  • I’ve never overclocked anything
  • The PC is protected by Nod32 Internet Security from day 0
  • Switching parts wasn’t an option as I didn’t want to mess around with that during COVID

Soo… yes, this is it in a nutshell, I’m pretty much out of ideas so if anyone has any idea I would be grateful for some help here. Thank you.
 
Is that 120 GB SSD the C:\ (boot drive)? 120 GB is about half of what it should be.

How full is the 120 GB? You mentioned reinstalling Windows but again, 120 GB is not enough.

In any case the SSD may not be the direct culprit.

Numerous and varying errors are, to me, a sign of a faltering PSU.

Two years may be okay but if the PSU has been used for heavy gaming, video editing, or even bit-mining then the PSU may be nearing its designed EOL (End of Life).

Look in Reliability History and Event Viewer for any other error codes, warnings, and even informational events that are being captured. Especially any that precede or correspond to the BSODs and crashes.

Overall, my first suspect would be the PSU.
 
Thanks for the ideas.
The SSD part: I've got 62 GB free on the SSD. It is solely for Windows; every game and every other piece of software goes on the HDD (sure, if I were building now I'd go for at least 256 GB, if not 512 GB). It seems odd that the space would not be sufficient for Windows. Should I upgrade?

The PSU is a guess of mine as well. The thing that bugs me the most is that the problem has been in a steady state for two years now: it is not getting worse, and I would expect a failing PSU to cause more frequent or more obvious crashes over time. Do you have any suggestions on how to test the PSU without ordering a replacement?
(note: the build itself is 4 years old, but the issues started after the first 2 years)

I've checked Reliability History and Event Viewer; the errors point in the same direction as WhoCrashed, and I didn't find any new information.
 
Do you have a multimeter and know how to use it?

Or have a family member or friend who does?

https://www.lifewire.com/how-to-manually-test-a-power-supply-with-a-multimeter-2626158

Not a full test because the PSU is not under load. However, any out-of-spec voltages would be a fair confirmation of PSU problems.
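As a rough illustration of what "out of spec" means here, the Python sketch below checks multimeter readings against the ±5% tolerance that the ATX specification allows on the +3.3 V, +5 V and +12 V rails. The readings in the example are invented, and `check_rails` is a hypothetical helper name, not part of any tool.

```python
# Nominal positive rail voltages; the ATX specification allows +/-5% on these.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def check_rails(readings):
    """Map each measured rail to True (within tolerance) or False."""
    report = {}
    for rail, measured in readings.items():
        nominal = NOMINAL_RAILS[rail]
        low = nominal * (1 - TOLERANCE)
        high = nominal * (1 + TOLERANCE)
        report[rail] = low <= measured <= high
    return report


if __name__ == "__main__":
    # Hypothetical multimeter readings, for illustration only.
    readings = {"+3.3V": 3.31, "+5V": 5.08, "+12V": 11.2}
    for rail, ok in check_rails(readings).items():
        print(rail, "OK" if ok else "OUT OF SPEC")
```

With these made-up numbers the +12 V rail would be flagged, since 11.2 V is below the 11.4 V lower limit. Readings that pass at idle still do not prove the PSU holds up under load.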

Thanks, I'm no stranger to a multimeter but I don't own one, I could get one from my old uni, but not during COVID. I'll keep it in mind.

In the meantime, I've turned off every CPU C state support (C1E, C3, C6, C7, Package C state support, CFG Lock) in the BIOS, so far so good, but can't be sure yet because the errors were random, so I'll keep trying to provoke it and we'll see.
 
Solution
It has been almost 3 weeks since I turned off the CPU C state support and so far so good, no BSOD errors since then. I'll mark this as solved for the time being and we'll see the rest.

Cheers.