Question Computer is freezing at random intervals, out of ideas

exzachtly1

Honorable
Jan 18, 2014
8
0
10,510
Hi all! I am having a problem with my computer that I'm hoping someone can help me with. This is a custom built desktop PC that I built in March 2019. Upgraded a couple items along the way. The specs are:
  • ASUS Prime X470-Pro motherboard
  • AMD Ryzen 2700X CPU, no overclock
  • G.Skill Trident Z RGB DDR4-3200 32GB (4 x 8 GB, currently running at the default 2133 MHz)
  • MSI GeForce RTX 2060 Super
  • OS Drive: Samsung 970 Pro 512 GB M.2 NVMe SSD
  • Drive 2: Samsung 850 EVO 250 GB
  • Drive 3: Seagate FireCuda 2TB
  • Drive 4: WD Black 2TB
  • Corsair RM750x Power Supply
  • NZXT H500i Case
  • Windows 10 Professional
As noted, this was built 3/2019 and ran flawlessly until about 3 weeks ago, I think. Since then, the computer has started randomly locking up. The behavior is erratic... sometimes it is a hard freeze, where the image on the screen persists but everything is locked - no mouse, keyboard, nothing responds. More often, it is a gradual failure where I will notice things like the Windows Start menu or the taskbar stop responding, eventually leading to a total lockup. A reboot fixes it temporarily. The time until it locks up is totally random. I've seen it go days, other times only hours.

I have NOT received any blue screens. Just freezing.

Here are the steps I've taken to troubleshoot so far:
  1. Rule out OS issues by clean installing Windows 10. This was a complete reformat and reinstall of the main drive. No luck - the issue still happens.
  2. Move on to hardware, starting with the hard drives. Downloaded Seagate SeaTools, then scanned and tested all drives. Drive 3 - the Seagate FireCuda - reported a FAIL status during the self-diagnostic test. The program advised me to save my data and request a warranty exchange, or use the bootable USB software to attempt to fix bad sectors (I did not have luck with this - I created the USB stick but couldn't get the PC to boot whatever image was on there, no idea why). I backed up all data from the (presumably) failing drive and submitted a warranty claim with Seagate. The new drive is on the way and the old drive has been physically disconnected.
  3. To test the memory, ran MemTest86 with default settings (all tests, 4 passes). The first run took about 7 hours and reported 2 errors during pass 3 of test 7.
  4. Did multiple additional runs of MemTest86 - no errors this time. I also ran at least one pass with two sticks of RAM at a time to narrow down whether a specific module is faulty, but I have not been able to reproduce the errors from the first run.
  5. I've also updated the motherboard BIOS to the latest version. Previously was running a very old version, probably whatever was available at the time I first built it. I don't think I've updated since.
  6. Finally - during this whole process, I realized my RAM is running at a lower frequency than what it's capable of. It's at 2133 but could be 3200. So I changed my RAM configuration in the BIOS to load the XMP profile and run it at this speed (ASUS calls this "D.O.C.P."). I have since reverted this change because I'm not sure it was stable and I didn't want to introduce another variable - it seems like the freezes happen sooner when I'm running the memory at the higher speed, but I haven't been able to prove that yet.
  7. After all of this, I've been letting the computer run with only 2 sticks of memory installed at a time. I'm still getting lockups regularly with either pair installed. It usually takes a few hours to happen, sometimes longer. The system will be partially responsive: I can minimize and expand windows and certain applications will respond, but things like Task Manager will be frozen, and the more I click around the more messed up it gets until eventually everything is locked up. I made sure to have Task Manager and NZXT CAM (for hardware monitoring) open when this happens. Nothing looks abnormal: memory and CPU usage are normal, and there is nothing useful in the Windows logs. This makes me think the RAM is not at fault, since both pairs produce the same behavior, which would require more than one bad stick, and that seems very unlikely.
  8. Also tried installing Windows to the secondary SSD and running off of that. Still getting lockups, so I've ruled out the primary SSD as the problem.
So that is where I'm at... I'm about at my wit's end here. Really confused about what could be going wrong and looking for ideas of what to check. I'm starting to worry that my CPU, motherboard, or power supply could be on the fritz, but I'm not sure how to test or prove that. Everything should still be under warranty, I hope... but I'm not sure what I should be looking to RMA first.

This has been SO frustrating, because every step of the diagnosis takes hours. Literally been at this for a week. Hoping to get some thoughts from anyone who has experienced similar issues. Thanks in advance!
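Since the time to a lockup is random and nothing useful shows up in the Windows logs, one cheap way to pin down when a hard freeze actually happened is a heartbeat file. The following is only a minimal sketch in Python: the file name `heartbeat.log`, the 10 second interval, and the function names are all assumptions for illustration. Note that during the "gradual" failures some programs keep responding, so the heartbeat may keep ticking until the final lockup.

```python
import os
import time
from datetime import datetime, timedelta


def write_heartbeat(path, now=None):
    """Append one timestamp and force it to disk so it survives a hard freeze."""
    now = now or datetime.now()
    with open(path, "a", encoding="utf-8") as f:
        f.write(now.isoformat(timespec="seconds") + "\n")
        f.flush()
        os.fsync(f.fileno())


def find_gaps(path, interval_s=10, slack=3.0):
    """Return (last_seen, resumed) pairs where heartbeats stopped for
    longer than slack * interval_s, i.e. a probable freeze plus reboot."""
    with open(path, encoding="utf-8") as f:
        stamps = [datetime.fromisoformat(line.strip()) for line in f if line.strip()]
    limit = timedelta(seconds=interval_s * slack)
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > limit]


def run(path="heartbeat.log", interval_s=10):
    """Leave this running in the background; stop it with Ctrl+C."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

After a reboot, `find_gaps("heartbeat.log")` lists each window in which the machine stopped writing, which can then be lined up against whatever was changed for that test (RAM pair, drivers, background programs).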
 

exzachtly1

Still no luck. I spoke to ASUS support two days ago and they suggested making sure I have the latest chipset drivers installed, which I did not (I had just let Windows handle the drivers on my latest install). So I installed all the chipset drivers and rebooted, and I am still experiencing freezes.

I've been digging a little more and made some additional observations, curious if they might be related. Ryzen Master was showing a near constant 99-100% EDC (CPU) value. I did a little reading and saw that most people suggest using a different Windows power plan - either Balanced or AMD Ryzen Balanced - and lowering the minimum processor state in the advanced options to something like 5% or 20%. So I did that, and it did not have much effect. I checked Task Manager to see if anything might be causing my CPU to stay at high voltages, and I did observe that Corsair iCUE and NZXT CAM both seem to be constantly using a very small percentage of CPU, so I think they are preventing it from going fully idle. If I kill both of those programs my EDC value goes down at idle to ~50-60% and the CPU temps go down too. Not sure if any of that matters here, but it seemed strange to me that the value is always spiked.
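For anyone who would rather script the minimum processor state change than click through the advanced power options, here is a rough sketch in Python that builds the `powercfg` calls. It assumes the built-in aliases `SCHEME_CURRENT`, `SUB_PROCESSOR` and `PROCTHROTTLEMIN` are available on your install (`powercfg /aliases` lists them) and that you run it from an elevated prompt; the function names are just illustrative.

```python
import subprocess


def min_processor_state_commands(percent):
    """Build the powercfg calls that set 'minimum processor state'
    on the currently active power plan."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    target = ["SCHEME_CURRENT", "SUB_PROCESSOR", "PROCTHROTTLEMIN", str(percent)]
    return [
        ["powercfg", "/setacvalueindex", *target],      # value used on AC power
        ["powercfg", "/setdcvalueindex", *target],      # value used on battery (harmless on a desktop)
        ["powercfg", "/setactive", "SCHEME_CURRENT"],   # re-apply the plan so the change takes effect
    ]


def apply(commands):
    """Run the commands on Windows; raises if powercfg reports an error."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `apply(min_processor_state_commands(5))` would correspond to the 5% setting mentioned above. Building the commands separately from running them makes it easy to print and check them first.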

I also noticed that it won't idle below 4025-4050 MHz on any core (according to Ryzen Master). Task Manager says it's lower - in the 3.7-3.9 GHz range. I don't really know which one I should be paying attention to. I have not applied any overclocking.

So far I've been running all of my tests with these programs active. However - I never messed with this (or the power plan) previously when the behavior originally appeared, so I'm not hopeful that I will see a different result. I'm just desperate to try something different to see if the issue repeats.

Other questions:
Could it be the GPU? I've not tried playing any games. Maybe I should try removing the GPU and temporarily swapping in my old one?
Is re-seating power cables / cleaning connectors on GPU and RAM worth the effort?
Any other ideas?
 

Phaaze88

Titan
Ambassador
Your issue sounds like mine, just with completely different hardware.
I went and ordered a new psu a few days ago - it's expected next week - but now the issue seems to have fixed itself, and now I have to rush to figure out what the cause is before the package gets here.

When it was freezing, I tried:
1) Disconnecting and reconnecting all my psu cables, and reinserting the gpu.
2) Windows DISM and sfc scan.
3) I clean installed Windows like a month or 2 ago, so I don't think I need to mess with that right away...
4) Updated Windows to 20H2.
5) I tried a few different gpu drivers: 456.71, 460.89, 457.30...
5.5) Then I rolled all the way back to 384.76 - but that's when the system fixed itself. I even went back to the previous drivers, and all clear, like "WTH?"
6) All 3 of my storage drives are Samsung: Magician tells me all 3 are fine.
7) I checked my gpu for any physical damage. Granted, I've already taken mine apart to install a hybrid cooler on it.
8) Gpu isn't overclocked.
9) Swapped out the PCIe cables for the gpu, but it had 'fixed itself' by then, so I put the old ones back.
10) I cleared CMOS on my motherboard and ran the cpu at stock.
11) I did 8 passes of Memtest. 10hrs and 57mins later, it passes with flying colors.
12) I also tried a Clean Boot with no luck there either.

I did find ONE thing that 'fixed' it, and it's what made me suspect the psu - or the gpu - in the first place. It might help you.
I opened up MSI Afterburner and applied a 50% power limit. The freezing stopped. I even tried raising it: 60% was ok. 70%, nope.
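If you want to narrow that threshold down further, the 60% ok / 70% not ok result is basically the start of a bisection. A minimal sketch in Python, assuming you set each limit by hand in Afterburner and then report whether the system stayed up; `find_highest_stable` and `ask_user` are made-up names, and with freezes this intermittent a single "stable" answer is only as good as how long you waited.

```python
def find_highest_stable(lo, hi, is_stable, step=1):
    """Bisect between a power limit known to be stable (lo) and one known
    to freeze (hi). Returns the highest limit found stable, to within step."""
    while hi - lo > step:
        mid = (lo + hi) // 2
        if is_stable(mid):
            lo = mid   # no freeze: the threshold is above mid
        else:
            hi = mid   # froze: the threshold is at or below mid
    return lo


def ask_user(limit):
    """Manual check: set the limit yourself, use the PC for a while, then answer."""
    answer = input(f"Set power limit to {limit}% and test. Stayed stable? [y/n] ")
    return answer.strip().lower().startswith("y")
```

Starting from the values above, `find_highest_stable(60, 70, ask_user)` needs at most four test sessions to get to the nearest percent.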

So I'm like, "Something like this doesn't just disappear like that, does it?"
I'm trying to run stress tests to try and trigger it again... last night I ran Unigine Heaven Benchmark and Cinebench R23 at the same time, but couldn't trigger it.
Tonight I'm going to try Prime95 (Small FFT, all AVX off) and MSI Kombustor.
I'm going to keep at it until the new psu gets here. It (Corsair AX850 Titanium) cost me 300USD, and I don't want to be stuck with it if I don't need it.
The current one is a Seasonic Prime Titanium 750. It's seen about 4 years and 5 months of use.
 

exzachtly1

Thanks for the reply. Interesting to hear that it might be GPU related... I think I will maybe try reverting to an older driver like you did, or perhaps putting my 970 back in temporarily. What GPU do you have?

What's interesting is that as of right now I've had 22 hours of uptime with no freeze which is the longest I've gone so far since I started troubleshooting things. That's with iCUE and CAM turned off, EDC (CPU) was hovering around 50-60% while idling. I'm going to put it to the test a little bit today and actually let it do some stuff in the background while I work - primarily using Synergy (mouse and keyboard software) to share my peripherals between my main PC and my work laptop. This is what prompted the issue and all the investigation in the first place - I use this all day to work and it suddenly started doing the random freezing, causing me to lose control of the mouse and keyboard on my work laptop... super annoying!

I may also run a game in the background or something just to see how it behaves under some load.

Keep us posted on what you find, I'm very intrigued.
 

exzachtly1

A strong argument here for NZXT CAM being suspect:


Now that I think about it, I believe I did update CAM a while back and it probably roughly correlates to when the freezes started occurring. Seems very suspect that the computer, so far, has been stable for the longest time yet with CAM disabled. Up to this point I've done ALL of my testing with CAM running because I use it for RGB, fan control, monitoring, etc.
 

exzachtly1

Thank god I kept my downloads in my downloads folder haha - it looks like the version of CAM I was using previously, which was apparently stable, was CAM_Installer V3.7.5.exe, installed on 3/15/2019 around the time I did my build. Then, at some point I tried updating CAM because it was giving me weird errors with using my Google account for single sign-on. Unfortunately it doesn't look like I have that file in my downloads folder, but I can confirm that the version I currently have installed (not running) is 4.17.0.
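For digging through a downloads folder like this, a small script can pull the version out of each installer name and show the file date next to it. This is just a sketch in Python: the `CAM_Installer` prefix is taken from the one file name I have, the function names are made up, and a file's modified time is normally when it was downloaded, not necessarily when it was installed.

```python
import re
from datetime import datetime
from pathlib import Path

# Dotted version such as 3.7.5 or 4.17.0
VERSION_RE = re.compile(r"(\d+(?:\.\d+)+)")


def parse_version(text):
    """Return the first dotted version in text as a tuple of ints, or None."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def list_installers(folder, prefix="CAM_Installer"):
    """Return (version, modified_time, file_name) rows sorted by version."""
    rows = []
    for path in Path(folder).glob(prefix + "*.exe"):
        version = parse_version(path.name[len(prefix):])
        if version is not None:
            modified = datetime.fromtimestamp(path.stat().st_mtime)
            rows.append((version, modified, path.name))
    return sorted(rows)
```

Comparing versions as tuples of integers matters here: as plain strings, "4.9.0" would sort after "4.17.0", while `(4, 9, 0) < (4, 17, 0)` gives the right order.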