Question: "Ghost in the machine" random crashes that I can't figure out

Apr 17, 2024
Motherboard: MSI B450 A-Pro
CPU: Ryzen 7 5700x
GPU: Radeon RX 6750 XT
PSU: brand new EVGA 80+ White 700W (it was an Apevia PSU when this started)
RAM: G.Skill 2666 MHz

I've been having a smattering of "ghost in the machine" type random crashes and all attempts to troubleshoot have turned up with nothing.

Couple weeks ago, computer out of the blue started acting up. There wasn't some big cumulative update or something, it just started misbehaving one day. The system will go into an old school hang, with looping audio and all, or will rarely blue screen. System struggles to boot, with peripherals occasionally failing to light up.

Blue screen stop codes point to memory issues: PAGE_FAULT_IN_NONPAGED_AREA, UNEXPECTED_KERNEL_MODE_TRAP, etc. BlueScreenView on the minidumps points to ntoskrnl.exe, meaning that if a specific device driver is at fault, it's not being reported as such. I reseated my RAM modules, fully cleaned the dust and debris out of the case, and ran Memtest86+ as well as Windows Memory Diagnostic; all clear on both. Reinstalled Windows with fresh drivers: no change in behavior. That suggested the PSU might be the issue. I knew I'd cheaped out on the PSU back in the day, so I swapped it out for a brand new one. No change.

I'd heard the new AMD drivers were a bit unstable, and though I didn't do a driver update immediately prior to the problems starting, I rolled them back to 23.11.1, regarded as the most stable recent version. No change. I ran chkdsk on all my drives; they're all clean. I disconnected most nonessential peripherals, including PCIe devices, set the CPU to eco mode and underclocked my RAM. Still no change in behavior. I subjected the CPU to a stress test, which it ran effortlessly for several hours with core temps never rising above 80°C, and then it crashed when I went back to using the thing normally a few hours later. Infuriating.

There's no common pattern to what leads up to these crashes. I could be playing a game, or the computer could be sitting completely idle. I could be in safe mode, or running with nonessential services and startup apps disabled, and there's no change in behavior. I don't think it's my outlet or surge protector, since other plugged-in devices have no such issues and the PC acts the same plugged into a different outlet or directly into the wall.
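For anyone wanting to double-check a "no common pattern" impression, here is a rough sketch that tallies crash times from minidump files. It assumes the dump file's modified time is close to the crash time, and the function names `crash_times` and `summarize` are made up for illustration; they are not part of any tool mentioned in this thread.

```python
from collections import Counter
from datetime import datetime
from pathlib import Path


def crash_times(dump_dir):
    """Return sorted modified times of *.dmp files in dump_dir.

    Assumption: a dump's modified time approximates when the crash happened.
    """
    return sorted(
        datetime.fromtimestamp(p.stat().st_mtime)
        for p in Path(dump_dir).glob("*.dmp")
    )


def summarize(times):
    """Count crashes per hour of day and list the gaps (in hours) between them.

    `times` must already be sorted, as returned by crash_times().
    """
    by_hour = Counter(t.hour for t in times)
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    return by_hour, gaps
```

On Windows the dumps usually live in `C:\Windows\Minidump`, so something like `summarize(crash_times(r"C:\Windows\Minidump"))` would give the tally. Hard hangs that never write a dump won't show up, so an empty or flat result doesn't prove much on its own.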

I'm ripping my hair out at this point. Complete ghost in the machine, I feel like I've exhausted every avenue and. No. Change. Nothing. Not a single thing I've done has worked to put this thing back on track or even showed signs of increasing system stability. There's gotta be something I'm missing?
 
Apr 17, 2024
I need a full spec sheet of your system including how much RAM and storage as I MAY have a hypothesis of my own.....................
my PC was down when i first typed the OP, but right now it's chugging along, so here's a more "complete" system spec:

Code:
OS: Windows 11 Pro V 10.0.22621 build 22621
Motherboard: MSI B450 A-Pro with SMBIOS version 2.8 (dated 8/8/2022)
CPU: Ryzen 7 5700x, 8 core 16t. PBO and other OC disabled, eco mode on.
GPU: Radeon RX 6750 XT
RAM: G.Skill 2666 MHz, 16 GB DDR4 (F4-2666C19-8GNT x2 in slots 3 and 4)
Storage:
    OS drive (C:): SPCC M.2 PCIe SSD (2 TB, 482 GB in use)
    Secondary drive (G:): PNY CS900 240 GB SSD (30.8 GB in use; this mostly stores backups and some games that are very disk intensive)
    Bulk storage (E:): WD Black 4 TB drive, currently disconnected. Mostly stores games and media files.
sorry for putting it in a code block, emotes were popping up

i also have some external PCIe devices, an elgato HD60 Pro and a USB 3.0 hub, but both of those have been for the most part removed from the system since troubleshooting began a couple weeks ago

A little confused about the PSU situation. Why did you replace one low-quality PSU with another one?
hard up for cash r/n and actually between jobs. i figured EVGA's been in the game for a long time though, a bit more of a trusted name than Apevia etc and the likelihood of having 2 PSUs broken in the same exact way sounds pretty low, though apparently it's happened to a few people here over the years? I don't have access to a multimeter to test the thing properly. should i get in touch with a shop, see if they can test the new PSU and possibly RMA mine?
 

DSzymborski

Curmudgeon Pursuivant
Moderator
EVGA sells some good quality PSUs. This isn't one of them. The W1 series would be the worst product ever released by EVGA if the N1 series of PSUs didn't exist. It's not broken; it's simply dirt-cheap, group-regulated junk that should never be turned on with the hardware of a modern gaming PC, even a budget one. There's nothing to test and most local PC shops aren't going to have a load tester and oscilloscope. A multimeter does nothing here as it doesn't measure voltage regulation and ripple on load. An RMA, if accepted, would just result in you having another identical inappropriate PSU that needs to be replaced. This would be a problem even if you *weren't* having these random crashes.

I understand that budgets are an issue, but it's way more economical to buy a quality PSU once rather than a junk PSU two or three or four times (and possibly new components). You have crashes that suggest a power issue and a really poor quality PSU. It may not be the direct cause of the problem, but the first thing to do in a situation like this is to get a junk power source out of the mix.
 
Apr 17, 2024
that bad of a product, huh? hmm. when i have the budget available for a fresh PSU again i'll try another, higher quality replacement then, with a bit more research.

if i keep crashing after that i'll run the same diagnostics on RAM, OS storage and CPU and if i get no luck i'll bump this probably.

what would you recommend i shoot for, going for a 700-800w supply?
 
Corsair RM750e is usually decently priced.
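On the 700-800w question, a back-of-the-envelope sketch of the wattage math might look like the following. The figures are assumptions: roughly 65 W is the published TDP for the 5700X and roughly 250 W the typical board power for the RX 6750 XT, while the 75 W allowance for everything else and the 1.5x headroom factor are illustrative guesses, not a standard. The function name `estimate_psu_watts` is hypothetical.

```python
def estimate_psu_watts(cpu_w, gpu_w, other_w=75.0, headroom=1.5):
    """Return (sustained_load, suggested_rating) in watts.

    other_w and headroom are assumed values for illustration only:
    other_w covers board, RAM, drives and fans; headroom leaves margin
    for short load spikes and keeps the PSU away from its limit.
    """
    sustained = cpu_w + gpu_w + other_w
    return sustained, sustained * headroom


# Approximate published figures, treated here as assumptions:
# Ryzen 7 5700X ~65 W TDP, Radeon RX 6750 XT ~250 W board power.
sustained, suggested = estimate_psu_watts(cpu_w=65, gpu_w=250)
print(f"sustained ~{sustained:.0f} W, suggested rating ~{suggested:.0f} W")
```

By this estimate a 750 W unit has plenty of capacity. Wattage alone says nothing about voltage regulation or ripple, though, which is the actual complaint about the W1 above.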