May 18, 2024
Greetings! As this is my first forum question, I apologize if anything is missing or not in the preferred format.

Since around the start of this month (May 2024), my system has been suffering from random hangs and BSODs, and it sometimes even fails to POST. Unfortunately, I haven't been able to find a pattern or an apparent cause.
Throughout the month I've run an assortment of tests and benchmarks with different OSes and different components to narrow down the culprit(s). Below are the tests I've done, followed by my specs:

Tests

At first I thought the problem was caused by Windows or a malfunctioning driver, so I ran system file checks with SFC and DISM and checked for driver updates, to no avail.
While the system was stable, PCMark 10 benchmarks completed successfully without putting major stress on the system, and temperatures were fine. Playing a variety of games also showed nothing conclusive (GPU and CPU temperatures did not rise abnormally, and usage did not look abnormal either).
To completely rule out software, I tested the system with a variety of OSes: booting fresh Windows 10 installation media resulted in a BSOD (I didn't even know that was possible), and a spare stick with HBCD also resulted in a BSOD. Some Linux live USBs hung during startup, and occasionally the system would hang in the UEFI settings screen.
Having ruled out software, I began removing components and checking whether the problem persisted, although during these tests the system would sometimes refuse to POST. I went all the way down to the minimal possible configuration (CPU + GPU + a single RAM stick + keyboard, with a single monitor via HDMI), and the system still hung even then.
After this, I tried some simpler tests to narrow down the cause further: up to 7 passes of MemTest86 all completed without errors, and testing with a spare, known-good RAM stick still resulted in hangs.
Later, I checked the PSU, and it seemed fine: the 12 V and 5 V rails were stable at their nominal voltages, while the 3.3 V rail sat just below nominal at 3.296 V. Testing with a spare PSU also did not fix the problem.
As the system has no integrated graphics, I could not test it without the GPU. Unfortunately, I couldn't get hold of a spare motherboard or CPU to test with.
As a last attempt before seeking outside help, I flashed the BIOS with the latest version available on the manufacturer's page, only to get a BSOD on the very next boot.
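To sanity-check the PSU readings mentioned above, here is a minimal Python sketch that compares each rail against the ±5% tolerance the ATX specification allows for the +12 V, +5 V and +3.3 V rails. Only the 3.296 V figure is an actual measurement; the 12.0 V and 5.0 V values in the script are placeholders standing in for "stable at nominal", since I didn't write down their exact readings.

```python
# Sketch: check measured PSU rail voltages against the ATX +/-5% tolerance.
# Assumption: 12.0 V and 5.0 V are placeholders for "stable at nominal";
# only the 3.296 V reading is an actual measurement.

ATX_TOLERANCE = 0.05  # +/-5% allowed on the +12 V, +5 V and +3.3 V rails


def rail_status(nominal: float, measured: float) -> tuple[float, float, bool]:
    """Return (low limit, high limit, within tolerance) for one rail."""
    low = nominal * (1 - ATX_TOLERANCE)
    high = nominal * (1 + ATX_TOLERANCE)
    return low, high, low <= measured <= high


# nominal voltage -> measured voltage
readings = {12.0: 12.0, 5.0: 5.0, 3.3: 3.296}

for nominal, measured in readings.items():
    low, high, ok = rail_status(nominal, measured)
    deviation = (measured - nominal) / nominal * 100  # percent from nominal
    print(
        f"{nominal:>5.1f} V rail: {measured:.3f} V ({deviation:+.2f}%), "
        f"allowed {low:.3f}-{high:.3f} V -> {'OK' if ok else 'OUT OF SPEC'}"
    )
```

The 3.3 V reading works out to about 0.12% below nominal, well inside the 3.135-3.465 V window, which is consistent with the PSU seeming fine (and with the spare PSU not changing anything).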

Current hypothesis

After all these tests, I would wager the culprit is either the motherboard or the CPU, though the GPU may also be contributing to the instability. Analyzing the dump files that Windows managed to generate was unfortunately beyond my expertise. Given the random nature of the problem (either frequent hangs or a full two weeks of stability), the fact that some of the BSODs pointed to subsystems such as networking, and the fact that the electrical grid in my city is not the most stable, the motherboard seems the most likely culprit. However, since the system shows occasional periods of stability and the board has no visible (to my knowledge) signs of damage, I'm hesitant to order a new one for testing only to find that another component was responsible.

Digression

The latest upgrades to the system were the addition of the 1 TB NVMe SSD and the 2 TB HDD, along with a case change. Since the system still hangs without these components, and the problems only started long after those changes (made in January), I left them out of the main question. I'm including this digression because one of my earlier suspicions was the case layout (Aerocool SI-5100): the two HDDs might have had their SATA connectors damaged by being squeezed against the case's back panel. I tested with other cables and with the drives outside the case, but the system still hung, ruling out this possibility.

Extra information

Here is a Google Drive folder with two zip files containing minidumps from two Windows installations on different drives (each marked as 'main drive' in the specifications below): https://drive.google.com/drive/folders/1d1WLdilHzdqnry_NENm1pAKEp4o2vyZ9
The folder also contains photos of some of the BSODs; unfortunately, I couldn't capture them all.
Some of the stop codes were:
SYSTEM_SERVICE_EXCEPTION
DRIVER_IRQL_NOT_LESS_OR_EQUAL
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED
KMODE_EXCEPTION_NOT_HANDLED
PAGE_FAULT_IN_NON_PAGED_AREA
IRQL_NOT_LESS_OR_EQUAL
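For anyone cross-referencing the minidumps, here is a small Python lookup from the stop code names above to their numeric bug check values. The hex values are the standard ones from Microsoft's bug check code reference, not anything read from my dumps; dump analysis tools such as WinDbg also report the numeric value, so this is just a convenience sketch for matching the two.

```python
# Sketch: map the stop code names listed above to their standard numeric
# bug check values, formatted the way they appear on BSOD screens and in docs.
BUGCHECK_CODES = {
    "IRQL_NOT_LESS_OR_EQUAL": 0x0A,
    "KMODE_EXCEPTION_NOT_HANDLED": 0x1E,
    "SYSTEM_SERVICE_EXCEPTION": 0x3B,
    "PAGE_FAULT_IN_NON_PAGED_AREA": 0x50,
    "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED": 0x7E,
    "DRIVER_IRQL_NOT_LESS_OR_EQUAL": 0xD1,
}


def bugcheck_hex(name: str) -> str:
    """Return the bug check value as an 8-digit hex string, e.g. 0x000000D1."""
    return f"0x{BUGCHECK_CODES[name]:08X}"


# Stop codes in the order they are listed in this post.
observed = [
    "SYSTEM_SERVICE_EXCEPTION",
    "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    "KMODE_EXCEPTION_NOT_HANDLED",
    "PAGE_FAULT_IN_NON_PAGED_AREA",
    "IRQL_NOT_LESS_OR_EQUAL",
]

for name in observed:
    print(f"{bugcheck_hex(name)}  {name}")
```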

Most of the BSODs pointed to the NT kernel; two that I found pointed to a Windows networking driver (related to TCP, but I can't remember its exact name), and one seemed to implicate a video driver (I also can't remember its exact name).

Specifications

  • MOBO: MSI A320M-A PRO MAX
  • CPU: Ryzen 5 4500
  • GPU: Powercolor RedDragon RX-580 8GB
  • RAM: XPG Gammix D20 16GBx2 3200 MHz
  • PSU: Corsair CX550
  • Storage:
    • Kingston SFYRS1000G 1TB NVMe SSD (Current main drive)
    • WD WDCWDS100T2B0A 1TB SATA SSD (Previous main drive)
    • WD WD20EZAZ 2TB HDD
    • Seagate ST1000DM003 1TB HDD
  • Optiarc AD-72 DVD-RW Drive
  • Peripherals
    • USB Keyboard + Wireless USB Mouse (Logi Bolt)
    • Dual monitors via HDMI and single-link DVI-D
    • Headphones plugged into Audio Out (back panel) + USB microphone
    • Ethernet connection

Due to some unforeseen circumstances (including a natural disaster not seen in the region for almost a century!), I was unable to send the machine to a technician for professional analysis. And since the system has been heavily modified and its warranty expired some time ago, warranty service was not an option either.

Any help with testing the remaining components would be appreciated! I apologize if this post is missing anything required by the guidelines, and I extend my thanks to the entire forum and its members, as many of the procedures I used came from questions and tutorials here on Tom's Hardware!

My greetings to this community, and many thanks to anyone who reads this thread.
 
Thanks for the answer, appreciate the compliments!
What are provenance of listed equipment?
The parts were purchased from various online IT retailers in my country. While they may not be the cheapest, their products are usually fine; any defects in my purchases from them would most likely be factory faults anyway.

Using Amazon's return policy for testing seems like a nice suggestion. Though I'm not very familiar with buying PC parts on Amazon, I will look through some offers and see if they are attractive enough.

Thanks to an acquaintance, I got hold of a functioning (albeit old) GPU: a GeForce 210. The system booted fine with the 'new' graphics card but unfortunately hit the same issues while downloading its drivers, so for now I would consider the current GPU functional and not the culprit.

I will keep checking with some friends to see if I can get some compatible components for further testing, and I'll update the thread as I go.

Appreciate the helpful suggestions!