[SOLVED] System Instability/Hang - Highly Random

FelixLive44

Reputable
Jan 2, 2017
23
1
4,515
Hi, I posted about this some time back, but I was not able to reach a conclusion or find a fix from the suggestions I received. I have now tried virtually every test and troubleshooting step I could find, so I am asking here whether there is anything else I should try or, more likely, which part of my system is causing the instability.

Quick note: I built this as my dad's work computer. He still has his previous one, so it's OK if this takes some time.

Here are the specs :
CPU - AMD Ryzen 5 3400G (Stock)
Mobo - ASRock B450M-HDV R4.0 (Updated BIOS to latest)
RAM - 2×8GB G.Skill Aegis F4-3000C16D-16GISB (@ Default 2133)
SSD - Crucial BX500 1TB (clean Windows 10 Pro install)
PSU - Corsair CV450-450W
CPU Cooler - NH-U9S
Case fan - Scythe Kaze Flex (×1)
Case - DEEPCOOL Matrexx 30

The instability shows up as system hangs or restarts, with no blue screen at all. BlueScreenView can't find anything, but Event Viewer shows Kernel-Power errors, usually with BugcheckCode 234 and somewhat varying parameters that are usually all 0x0 or 0.
The crashes happen randomly, but somehow never seem to happen under full load. They started occurring when the computer was put into use. All parts are new, except for the drive, which was wiped and is less than 6 months old. The system is plugged into a UPS but behaves the same when plugged directly into an outlet.

The system needs to do the following tasks for my dad :
  • Work for web tasks (most of his work is through web interfaces)
  • Run in 4k
  • Be able to run wallpaper engine in 4k (most of the time)
  • Be able to record videos (has been confirmed to work on the 3400G using his software)
  • Do all of this relatively silently (see cooling components chosen)
All of these seem to "work when it works", or so I could say...

The crashes are either system hangs or spontaneous restarts. In a hang, the last played sound buzzes, the screen freezes and quickly loses connection to the GPU, and a restart is required. Both kinds seem to report essentially the same error, as explained previously. When the system restarts by itself, there is a weird effect on the screen during boot until the OS loads: green noise across the screen, concentrated in horizontal stripes. The noise seems to be denser when the RAM is overclocked too aggressively (see "Other stuff I have done" below).

Here are some of the troubleshooting steps I have done :
  • I have tried changing the RAM, even with some on the QVL list from the mobo manufacturer. Same issue
  • I have tried a different disk. Same issue
  • I have tried a different power supply confirmed to work. Same issue
  • I CANNOT try another GPU.
  • I CANNOT try another CPU.
  • I CANNOT try another Mobo.

I have successfully run Prime95 (mixed, stress test) and FurMark (4k, stress test) simultaneously on the system for 24 hours and 12 minutes. CPU, RAM and GPU were essentially at 100% use (slightly varying) the whole time with absolutely no issues. I have also run 3DMark benchmarks and small stress tests successfully. The only conclusion I could come to is that it will not crash under full load, or that it will not crash under artificial load.

I am currently running Memtest86+. As of now it has gone through 5 passes and found no errors, but I'll leave it running for at least 8 passes and likely up to 10 and interpret the results then.

Other stuff I have done (No note = didn't change anything) :
  • Windows is up to date
  • AMD Adrenalin drivers up to date (tried both versions)
  • Motherboard BIOS is up to date (Helped, about 50% fewer crashes)
  • BIOS with all default settings
  • XMP Settings loaded for RAM (seemingly less stable, understandable as 3000 is over the 2933 max for stock 3400G)
  • Manual RAM OC to 2933 @ 1.3V (seems ok)
  • sfc /scannow (occasionally fixed some files after crashes. Wouldn't go past 5% before BIOS update)
  • chkdsk (nothing)

Considering the crashes happen at a random time and have absolutely no pattern to speak of and that the PSU and Disk are fine, I am suspecting hardware issues, in order of likelihood, from Mobo, RAM or CPU. After the memory test I'm running, I'll know if the RAM is to blame. If so, I'll try to get new sticks on the QVL list. If not, I'll RMA the mobo. If new mobo doesn't work better, I'll RMA the CPU.

That being said, is there any noticeable pattern someone here can notice? Is there something I'm missing, something I haven't tried, or a clear indicator to a specific issue? This is not my first build but I'm far from a pro in this. I've used some of these parts before without any issues, notably the SSD and the Mobo. I've been messing with this for way too long and I've found next to nothing that could help me, so I come here hoping someone can help me get this over with even if just a little faster.

Thank you for any input.
 

Ralston18

Titan
Moderator
Look in Reliability History and Event Viewer for error codes, warnings, and even informational events that correspond with the times that the system hangs/crashes or has other noted problems.
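One detail that may help when reading those entries: Kernel-Power (Event ID 41) logs BugcheckCode in decimal, while bug check references list stop codes in hexadecimal, so the value needs converting before it can be looked up. Below is a minimal Python sketch of that conversion. The function name and the one-entry lookup table are assumptions made for illustration and are not part of any Windows tool.

```python
# Illustrative helper: Event Viewer shows BugcheckCode in decimal,
# while bug check references use hexadecimal stop codes.

# Deliberately tiny, hand-written table (an assumption for this sketch).
# Extend it from Microsoft's bug check code reference as needed.
KNOWN_STOP_CODES = {
    0xEA: "THREAD_STUCK_IN_DEVICE_DRIVER",
}


def describe_bugcheck(decimal_code: int) -> str:
    """Convert a decimal BugcheckCode to hex and name it if known."""
    if decimal_code == 0:
        return "0x00000000 (no stop code was recorded)"
    name = KNOWN_STOP_CODES.get(decimal_code, "not in this small table")
    return f"0x{decimal_code:08X} ({name})"


if __name__ == "__main__":
    # 234 is the BugcheckCode reported in the first post.
    print(describe_bugcheck(234))
```

For the value reported in this thread, 234 converts to 0x000000EA, which Microsoft's reference lists as THREAD_STUCK_IN_DEVICE_DRIVER. That stop code is commonly associated with a display driver or graphics hardware, so it is at least consistent with the frozen screen described above, though on its own it does not prove the cause.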

Noted that you tried another PSU - was that PSU also 450 watts?

Take a look at what the system is doing via Resource Monitor and Task Manager - just use only one at a time.

You can also use Process Explorer in much the same manner although you may need to download Process Explorer via Microsoft's website.

https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer

You may discover some unexpected app or utility being launched that could be related to the problem's occurrence.

"sfc /scannow" noted: there is also "dism".
 

FelixLive44

Thank you, I'll try this when the RAM test is over. Almost done with 8th pass and no errors yet.

Thanks again!

Edit: The PSU I tried is a lower-power 410W, essentially no-name unit, but it has run a system with an HDD, an FX-8320 and a 1060 for over a year perfectly well.
 

Ralston18

Titan
Moderator
Regarding "for over a year perfectly well": that is no problem per se, and the PSU's calendar age is not necessarily a problem.

However, usage over time, especially heavy usage at or near max wattage, is a problem.

Most PSUs (like many products) are designed with a certain EOL (End of Life) in mind.

After so much "wattage", heating up, cooling down, etc., the PSU will start to degrade and falter. In many cases there will be varying problems of some sort that grow in both number and nature.

Remember that PSUs provide 3 voltage levels (rails), with each voltage serving different components.

It only takes one voltage going out of tolerance to start causing problems...

Or the PSU might simply fail catastrophically. The host computer will then not boot despite a lit LED or two.

Depends in many ways on the design, the quality of components, and the manufacturing/assembly process. Also known as "Quality".
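
To illustrate the point about a single rail going out of tolerance: the ATX specification allows roughly ±5% on the +12V, +5V, and +3.3V rails. Here is a minimal Python sketch of that check, assuming readings taken with a multimeter or a monitoring tool. The function name and the sample readings are made up for illustration, and software sensor readings are often inaccurate.

```python
# Nominal voltages of the three main ATX rails.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05  # +/-5%, the ATX allowance for these three rails


def out_of_tolerance(readings: dict) -> list:
    """Return the names of rails whose reading deviates by more than 5%."""
    bad = []
    for rail, nominal in NOMINAL_RAILS.items():
        measured = readings[rail]
        if abs(measured - nominal) > nominal * TOLERANCE:
            bad.append(rail)
    return bad


if __name__ == "__main__":
    # Hypothetical readings, not measurements from this thread.
    sample = {"+12V": 11.30, "+5V": 5.02, "+3.3V": 3.31}
    print(out_of_tolerance(sample))
```

In the sample, only the +12V rail is flagged, since 11.30V is more than 0.6V away from nominal. A single flagged rail like that can be enough to cause hangs or restarts even when the other two look fine.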
 