Question: System freezing at random times

aweber1nj

Nov 26, 2007
I'm looking for advice on how to track down what is likely a failed component...

I built a Win10 (Pro) system from a refurb Dell Precision T7600. Has dual 6-core Xeons. Originally had 32GB ECC RAM, added a second "set" - so it has 64GB of ECC RAM. Dell PERC RAID, and GTX 570 video. (System is for work and running multiple VMs simultaneously...not gaming.)

I leave the system on 24x7. "Sleep/Hibernate" is disabled, but monitor-off is set to 2hrs.

At random times - usually after the system has been running for a week or two - I will return to my desktop and shake the mouse to wake the screen...the monitors will appear to wake (based on their LED changing status), and usually I'll get a mouse pointer, but the screen remains blank/black. Kbd is non-responsive (even ctrl-alt-del). Trying to ping the PC from another PC gets no response. So the system is mostly hung.

I have also seen the scenario where the monitors WILL wake, and you can click on windows, but they are all unresponsive. The clock in the Win Tray shows a date/time from the past and is not updating. Nothing seems to function, and I cannot ping it. The only thing I can do to get it back is force a power-off and power it back on...and it will appear to run fine for another period of time.

I have run some CPU stress tests, and they run for however long I try. The Windows System Log has no adverse events correlated to the time it hangs. Memtest86+ seems to stop and reboot almost immediately during the first test - IDK if that's due to the ECC RAM? I have tried the built-in/BIOS memtest, and it has not reported any issues.
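Since the System Log shows nothing at the time of the hang, one way to pin down when the freeze actually starts is a small heartbeat logger. What follows is only a sketch, not a tested diagnostic: the file path, the 60-second interval, and the function names (`write_heartbeat`, `last_heartbeat`, `run`) are all made up for illustration. It appends a timestamp to a file and forces it to disk, so after a forced power-off the last line approximates when the system stopped responding.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path, now=None):
    """Append one ISO-8601 timestamp to path and force it to disk."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        # fsync so the line is not lost in a write cache when the box hangs
        os.fsync(fh.fileno())
    return stamp


def last_heartbeat(path):
    """Return the final timestamp in the file, or None if missing or empty."""
    try:
        with open(path, encoding="utf-8") as fh:
            lines = [ln.strip() for ln in fh if ln.strip()]
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None


def run(path, interval_s=60, beats=None):
    """Write a heartbeat every interval_s seconds; beats=None runs until killed."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path)
        count += 1
        time.sleep(interval_s)
```

Leaving something like `run(r"C:\heartbeat.log")` going in the background (the path is just an example) and calling `last_heartbeat` after the next forced reboot gives a time to compare against the event logs and any scheduled tasks.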

So given that info, how can I track down what's happening? I'm thinking I have to methodically check components, but I'm unsure what I can reliably run to check the system.

So I appreciate any links/tips.

-AJ
 
Sounds like the ECC RAM is the culprit. Have you tried the built-in memory test (Windows 10) rather than Memtest?

Use the Windows key + R keyboard shortcut, type mdsched.exe, and click OK to start the tool.
Have not tried the Win10 test. Will try that ASAP. Thank you for the quick reply!

EDIT: Tried that. It rebooted and started the util...then rebooted almost immediately! Does it log anywhere that I can see what it did/happened?
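On the logging question: as far as I know, Windows Memory Diagnostic writes its results to the System event log under the provider name `Microsoft-Windows-MemoryDiagnostics-Results` once Windows is back up. If the tool reboots before it finishes, there may be nothing to find. Here is a rough sketch that pulls the most recent of those events with the stock `wevtutil` command; the provider name is my assumption from memory, and the helper names `build_query_command` and `read_memdiag_events` are just for illustration.

```python
import subprocess

# Assumed provider name for Windows Memory Diagnostic result events.
PROVIDER = "Microsoft-Windows-MemoryDiagnostics-Results"


def build_query_command(max_events=5):
    """Build a wevtutil command that lists the newest matching System events."""
    xpath = f"*[System[Provider[@Name='{PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",        # filter on the provider name
        f"/c:{max_events}",   # cap the number of events returned
        "/rd:true",           # newest first
        "/f:text",            # human-readable output
    ]


def read_memdiag_events(max_events=5):
    """Run the query (Windows only) and return whatever wevtutil printed."""
    result = subprocess.run(
        build_query_command(max_events),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Running `print(read_memdiag_events())` on the affected PC should show any results that were recorded. Empty output would suggest the diagnostic never got far enough to log anything, which fits with it rebooting almost immediately.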
 
OK, pulled one set of RAM (re-configured to correct DIMM locations) and tried again. WinMemDiag crashes/reboots almost immediately when it starts. Pulled all those and replaced with the other set...same thing happens!

Is there something about the Dell BIOS, or ECC RAM more generally, that is keeping these standard memory diags from running?