Hard locks following RAM upgrade


Nanako

May 7, 2017
Symptom:
My PC periodically locks up hard. There's no sign that it's about to happen; everything simply stops moving. All audio cuts out instantly with no stuttering, the screen freezes, and nothing responds. The only recourse is the power switch.

This issue occurs frequently but irregularly: on average once or twice a day, sometimes more. It seems to occur most often during gaming or when the PC is under heavy load. Cryptocurrency mining seems to greatly increase the chances of it happening, though it's quite unpredictable. If I run a mining program on all eight cores and play a high-end 3D game, the issue reliably occurs in under 20 minutes. I can often run a miner on six cores overnight, with the PC otherwise idle, without issue, though it will sometimes lock up. It's a gamble, really.

The issue occurs very rarely, or not at all, when the PC is used for light tasks like web browsing and non-3D gaming (and when I'm not doing any mining).
I'm not certain exactly what the cause is, but I have three notable suspects, and I need help narrowing it down.

Suspect 1: RAM
Quite recently (late October 2017), I bought a new DIMM: 8GB of DDR3 PC3-12800 (DDR3-1600). This is the closest match I could find to my existing memory, which is identical in everything except brand.

During installation, I tested both sticks in isolation and they worked perfectly. I also tested every RAM slot on my motherboard; more about that later.
Running both sticks together, though, would not initially work: the PC refused to boot with both. A little research indicated voltage might be the issue, so I raised the DRAM voltage in the BIOS from its original value of 1.5V to 1.625V. This immediately solved the problem and allowed the PC to boot; however, there is this freezing issue.
I cannot say with 100% certainty that the freezing problem started at that time, but I'm about 85% sure it did.
Since the freezing issue started, I have, in an attempt to solve it, further raised the DRAM voltage up to 1.7V. This had no discernible effect. At this point I realised I don't really know what I'm doing when it comes to voltage configuration in the BIOS, and should seek advice.

I believe this is the most likely cause of the problems, and that it might be fixed with the correct BIOS configuration.
I never suffered any issues with the old single stick, so dropping down to just that for debugging is feasible. I know the values for a safe working config, so I can test different values as recommended.



Suspect 2: Videocard
Around June 2017, I suffered a total failure which I later traced to the videocard. It's a Radeon R9 270X; the thing was about three years old at the time, and it was completely dead. Sadly, out of warranty.
I pondered replacing it, but I eventually went for a Hail Mary and repaired it by baking. In an oven. Yes. It's a thing that works sometimes.
As far as I can tell, the card worked perfectly after that, but it's possible it was just a temporary fix, and my current problems are a symptom of the card suffering a slow death.
I have a backup videocard that can be used for debugging: an old GeForce 8800 GTS 512.




Suspect 3: Motherboard
Gigabyte 970A-UD3P
During the installation of the RAM above, I tested every memory slot on the board. Three of them worked perfectly, but one did not. The third slot out of four is nonfunctional, and the computer refuses to even POST if a DIMM is installed in it. This clearly indicates the motherboard is not functioning perfectly and may have other problems as well. I was able to work around this by simply using two of the other slots, but it may point to a wider board problem.
I have no backup motherboard, so debugging this will be extremely hard.


Unlikely suspects:
CPU:
My CPU is an AMD FX-9370, and it's pretty new. I bought it, along with a new cooler, in about July 2017, just after the videocard was fixed by baking.
The cooler is a Corsair H100i closed-loop watercooler. I've tested the CPU extensively, and it can run at 100% load for an hour without going over 50°C. I'm pretty confident that the CPU and its cooling system are working fine and are not the cause of the problems. At the very least, I can say with 100% certainty that the CPU is not overheating.

My system specifications:
Windows 8.1
AMD FX-9370 CPU
2x 8GB DDR3 PC3-12800 (brands are not identical)
Gigabyte 970A-UD3P Motherboard
 
Solution
Well, you can eliminate memory and CPU overheating from the equation.
There should be a sensor for package temps in HWiNFO64; check that.

Your rail voltages under stress are within spec, so that eliminates the PSU.

Your GPU test looks fine and may eliminate it, but that test isn't very reliable. Further testing under more load using Cinebench will eliminate it from the equation. My concern is that your card is dying after baking.

A cache crash is an indication that the CPU is unstable, especially if you're at stock frequency.

Prime95 (P95) is another CPU tester that will push your CPU to the max. If you run it, choose the Small FFTs test for 20 minutes. Keep an eye on the test while it's running and stop it if temps approach 80°C.




Thank you for this advice. With a 15% drop in clock speed and a slight reduction in voltage, I've now managed to stabilise the system. After stress testing for several hours at a constant 100% load, the VRM temps now stabilise at 92°C. This is still too high, I'm sure, but it doesn't cause a crash, and it'll do for the next week.

New board ordered, ETA six days. I'll post a (hopefully final) update once it arrives and is installed.

 
Well, I'm now running on the new motherboard.
With the clock speed returned to normal (4.7GHz) and everything else the same as when I first made this thread, the system is now stable.

It does the job, but frankly not as well as hoped. The voltage regulator temperature has stabilised at 92°C under load. That is low enough to not crash, but I'm not sure it's safe for long-term use. By comparison, the old board hit 114°C (then crashed) under the same conditions, so this is a 22-degree improvement, enough to stay clear of the failure point.

Immediate stability problems are solved, but even with this new board, the voltage regulator is still the hottest part of the system and remains the bottleneck preventing overclocking :/

Maybe some aftermarket cooling solutions are available to target it? I'm not even entirely sure where the voltage regulator is on the board.

Overall, this is an adequate but disappointing result. It is nevertheless a step forward.
 
Ahhh, darn my hubris. I thought I had everything worked out; now I feel rather lost again. Oh MeanMachine senpai, please help ;-;

OK, the current situation: over the past couple of days, since installing the new board, I've been trying to stress test and tweak the system. As near as I can tell, the CPU is stable, and I don't have any issues running a mining program overnight on all eight cores. But it seems like adding videocard load on top of that still generates too much heat: I've been having a lot of lockups running a miner while gaming.

Since installing the new board I've had five lockups, mostly while mining and gaming, but...

I'm doing my best to gradually lower voltages while keeping things stable. I've gotten the DRAM voltage down to 1.52V and stable, or so I thought.
Tonight the system locked up while I was watching a video. Worse still, it wouldn't boot afterwards. The PC seemed completely dead: no POST, no beeps. I spent 15 minutes retrying in vain.
Running on a hunch, I took out a RAM DIMM, and it worked. I put it back in, and it still worked. So things are working again, but why would removing and replacing one memory stick fix that?

My best guess is that the 1.52V DRAM voltage I'm testing with is too low, and the RAM is unstable due to being undervolted. But that is just a guess. My earlier statement that it's stable at this voltage is also just a guess; I'm not really sure how to test that. This is the crux of the matter:

I can't tell for certain whether system instability is caused by the voltage settings I'm experimenting with or by the voltage regulator overheating. I don't know how to conclusively determine which is responsible for a crash, and adjusting voltages feels like a shot in the dark: very uncertain trial and error.

How can I remove some of that uncertainty, and test conclusively whether or not a voltage config is good?
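One way to take some of the guesswork out is to log sensors while testing. HWiNFO64 can write its sensor readings to a CSV file, and if the log stops at the moment of a hard lock, its last rows show the conditions right before the crash (the final few seconds may not reach the disk). Below is a minimal Python sketch of that idea. It assumes you have already pulled the timestamp and VRM temperature columns out of the log (column names vary by board, so that parsing step is left out). The function names are hypothetical, and so is the 105°C threshold, which is picked to sit between the 92°C the new board holds and the 114°C at which the old one crashed. A crash with VRM temps well below the threshold would point more toward the voltage settings than toward heat.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    t_seconds: float   # seconds since logging started
    vrm_temp_c: float  # VRM temperature in degrees Celsius


def peak_before_crash(samples: Sequence[Sample], window_s: float = 60.0) -> float:
    """Highest VRM temperature in the final `window_s` seconds of the log.

    Assumes the log stops at (or shortly before) the hard lock, so the last
    samples describe the conditions just before the crash.
    """
    if not samples:
        raise ValueError("log contains no samples")
    end = samples[-1].t_seconds
    recent = [s.vrm_temp_c for s in samples if s.t_seconds >= end - window_s]
    return max(recent)


def looks_thermal(samples: Sequence[Sample], threshold_c: float = 105.0,
                  window_s: float = 60.0) -> bool:
    """Heuristic: call the crash heat-related if the VRM peaked at or above
    the hypothetical `threshold_c` shortly before the log ended."""
    return peak_before_crash(samples, window_s) >= threshold_c


if __name__ == "__main__":
    # Hypothetical log from a crash under load.
    log = [Sample(0, 80.0), Sample(30, 95.0), Sample(60, 110.0), Sample(90, 114.0)]
    print(peak_before_crash(log))  # 114.0
    print(looks_thermal(log))      # True
```

Comparing the last minute of each crash log, rather than the whole session, keeps a long cool stretch earlier in the run from hiding a spike right before the lock.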
 
Well, you have the tools to test your system under load.
If you're not sure how to interpret the results from AIDA64, take screenshots of your results at the 10-minute mark, along with HWiNFO64, and post them here for analysis.
I will be looking for rail voltages, your clock frequency, core voltage and temps under load.
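As a rough illustration of the rail-voltage part of that check, the Python sketch below compares the logged minimum and maximum of each rail against the nominal +12V, +5V and +3.3V values with the ±5% tolerance the ATX specification allows for them. The function name, rail labels and sample readings are hypothetical. Motherboard sensors only report approximate voltages, so a borderline result is a prompt for closer testing rather than proof of a bad PSU.

```python
# Nominal ATX rail voltages; the ATX specification allows +/-5% on each.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def out_of_spec(readings: dict[str, list[float]],
                tolerance: float = TOLERANCE) -> dict[str, tuple[float, float]]:
    """Return {rail: (min, max)} for every rail whose logged minimum or maximum
    falls outside nominal +/- tolerance. `readings` maps a rail name to the
    voltages logged under load (hypothetical values, e.g. read off HWiNFO64)."""
    failures = {}
    for rail, values in readings.items():
        nominal = NOMINAL_RAILS[rail]
        low, high = nominal * (1 - tolerance), nominal * (1 + tolerance)
        v_min, v_max = min(values), max(values)
        if v_min < low or v_max > high:
            failures[rail] = (v_min, v_max)
    return failures


if __name__ == "__main__":
    # Hypothetical readings: the +12V rail sags below 11.4V under load.
    logged = {"+12V": [12.1, 11.9, 11.3], "+5V": [5.02, 4.98], "+3.3V": [3.31, 3.29]}
    print(out_of_spec(logged))  # {'+12V': (11.3, 12.1)}
```

Checking the minimum and maximum separately matters because a rail that averages close to nominal can still dip out of spec at the moment of peak load.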

If the RAM you're using is the same non-identical pair, just use one module in the correct slot for single-module operation, as this can be an issue. Check the manufacturer's recommendation for DIMM voltage. If it's 1.5V, you can go to a maximum of 1.6V for stability; however, keep an eye on module temperatures during stress testing.

If the system is stable with one module or the other but not with both, then you have a mismatch. In that case you will have to get a tested 16GB (2x8GB) kit to replace the mismatched modules.
 
