Hard locks following RAM upgrade

Nanako

May 7, 2017
Symptom:
My PC is periodically locking up, hard. There's no sign that it's about to happen; everything simply stops moving. All audio cuts out instantly with no stuttering, the screen freezes, and nothing responds. The only recourse is the power switch.

This issue occurs frequently but irregularly: on average 1-2 times a day, sometimes more. It seems to occur most often during gaming or when the PC is under heavy load. Cryptocurrency mining seems to greatly increase the chances of it happening, though it's quite unpredictable. If I run a mining program on all eight cores and play a high-end 3D game, the issue reliably occurs in under 20 minutes. I can often run a miner on six cores overnight, with the PC otherwise idle, without issue, though it will sometimes lock up. It's a gamble, really.

The issue occurs very rarely or not at all when the PC is used for light tasks like web browsing and non-3D gaming (and when I'm not doing any mining).
I'm not certain exactly what the cause is, but I have three notable suspects, and I need help narrowing it down.

Suspect 1: RAM:
Quite recently (late October 2017), I bought a new DIMM: 8GB of DDR3, PC3-12800 (DDR3-1600). This is the closest match I could find to my existing memory, which is identical in everything except brand.

During installation, I tested both sticks in isolation and they worked perfectly. I also tested every RAM slot on my motherboard; more about that later.
Running both sticks together, though, would not initially work. The PC refused to boot with both. A little research indicated voltage might be the issue, so I raised the DRAM voltage in the BIOS from its original value of 1.5 V to 1.625 V. This immediately solved the problem and allowed the PC to boot; however, there is this freezing issue.
I cannot say with 100% certainty that the freezing problem started at that time, but I'm about 85% sure it did.
Since the freezing issue started, I have further raised the DRAM voltage to 1.7 V in an attempt to solve it. This had no discernible effect. At this point I realised I don't really know what I'm doing when it comes to voltage configuration in the BIOS, and should seek advice.

I believe this is the most likely cause of the problems, and that it might be fixed with correct BIOS configuration.
I never suffered any issues with the old single stick, so dropping down to just that for the purpose of debugging is feasible. I know the values for a safe working config, so I can test different values as recommended.



Suspect 2: Videocard
Around June 2017, I suffered a total failure which I later traced to the video card. It's a Radeon R9 270X; the thing was about three years old at the time, and it was completely dead. Sadly, it was out of warranty.
I pondered replacing it, but I eventually went for a Hail Mary and repaired it by baking. In an oven. Yes. It's a thing that works sometimes.
As far as I can tell, the card worked perfectly after that, but it's possible it was just a temporary fix, and my current problems are a symptom of the card suffering a slow death.
I have a backup video card that can be used for debugging. It's an old GeForce 8800 GTS 512.




Suspect 3: Motherboard
Gigabyte 970A-UD3P
During the installation of the RAM above, I tested every memory slot on the board. I found that three of them worked perfectly but one did not. The third slot of four is nonfunctional, and the computer refuses to even POST if a DIMM is installed in it. I was able to work around this by simply using two of the other slots, but it clearly indicates the motherboard is not functioning perfectly, and it may have other problems as well.
I have no backup motherboard, so debugging this will be extremely hard.


Unlikely suspects:
CPU:
My CPU is an AMD FX-9370, and it's pretty new. I bought it, along with a new cooler, in about July 2017, just after the video card was fixed by baking.
The cooler is a Corsair H100i closed-loop water cooler. I've tested the CPU extensively, and it can run at 100% load for an hour without going over 50 C. I'm pretty confident that the CPU and its cooling system are working fine and are not the cause of the problems. At the very least, I can say with 100% certainty that the CPU is not overheating.

My system specifications:
Windows 8.1
AMD FX-9370 CPU
2x 8GB DDR3 PC3-12800 (brands are not identical)
Gigabyte 970A-UD3P Motherboard
 
From your description, I think your problem is a RAM mismatch.
It's never a good idea to mix RAM kits, even of the same spec, and getting them to work together can be a frustrating experience. Adjusting timings and voltage in the BIOS may not be enough, because minor manufacturing differences in latency between modules can create this kind of instability. That's why manufacturers bin their kits to match.
Modules may test OK individually, yet a crash occurs when the other module comes into play.

My advice is to get a single 1600MHz kit, chosen from the MB's QVL, since those kits have been tested and are almost guaranteed to work.

Memtest86 is a good memory tester for modules that may be failing and producing errors.
 
To be blunt, I am poor. Obtaining new computer parts always involves going into debt for me.
I'd rather have a frustrating experience than an expensive one. Buying one new stick was cheaper than buying two, and I don't really have a budget to speak of.

I'm pretty much shooting in the dark with voltage adjustment, and I haven't even attempted anything with timings. Frustrating or not, I'd like to try out those options before opting for a solution that will cost money.
Can anyone advise on good practices for BIOS configuration to make two sticks work together?
 
You have a lot more problems than just memory: your motherboard is not made to support a 220 watt processor. That is probably the problem, the motherboard overheating.
Memory that has the same specs can be made with different parts (even with the same brand and model number), and sometimes it just will not work together.
 
I totally agree with Zerk2012: the 220W TDP of your FX-9370 is beyond the capacity of your MB, which is designed for 125W TDP max. What that means is the VRMs on the MB will overheat when the system is under load and shut the MB down.

Your budget constraint makes this an unfortunate situation. As far as your RAM is concerned, read some tutorials on how to manually enter primary timings and voltage in the BIOS. There is no fixed set of instructions to make your RAM work together, and no two systems are the same. It's a matter of trial and error with no guarantee of success.

Here is a guide to start off with: http://www.overclockers.com/a-newbies-guide-to-overclocking-memory/
There are others to research too.
 


I see, this is an interesting observation.
But if this is the case, wouldn't I have noticed it during my initial burn tests after installing the CPU? Would installing additional memory or increasing DRAM voltage be contributing factors?

Do you know of any way to diagnose or test whether this is the issue? Does the board itself typically have a temperature sensor?

And in case this is the problem, can anyone recommend a motherboard to upgrade to that is as similar to my current one as possible? I tend to buy Gigabyte because they're a cheap Chinese brand.
 

You're confusing me here. Isn't TDP a measure of the heat output, rather than the power drawn?

"your MB which is designed for 125W TDP MAX."
Where did this information come from?

I'm looking at this page
https://www.gigabyte.com/Motherboard/GA-970A-DS3P-rev-10#sp
(I know this isn't my board, I'm looking at it as a potential replacement)
I can't seem to find any information about TDP listed anywhere there.
How can I tell if this board, or any other, is adequate to support my CPU?

"Read some tutorials on how to manually enter Primary Timings and Voltage in Bios. There is not a fixed set of instructions to make your RAM work together and no two systems are the same. It's a matter of trial and error with no guarantee of success.

Here is a guide to start off with: http://www.overclockers.com/a-newbies-guide-to-overcloc...
There are others to research too."

I understand that there won't be a straight answer or known values that just work. I was hoping more for a rigorous method: steps to follow and repeat in order to stabilize things.
I had a peek at that link. Are you sure it's what I need? It throws out a lot of technical information, but it seems aimed more at overclocking already-stable RAM than at syncing up multiple sticks. I'm not sure the instructions there are applicable, but I'll read into it more thoroughly.
 
Sorry for the confusion. TDP is "Thermal Design Power" and relates to the amount of wattage drawn by the CPU. Essentially, your H100i AIO cooler will take care of dissipating heat from the processor, but on a MB without a decent VRM (voltage regulator module) phase design and high-end capacitors, the VRMs will overheat under load.
Sensors designed to protect your system would then shut the MB down to prevent damage.

There are solutions to this, and one of the cheapest is to add a case fan directed at the NB VRMs. There are dedicated NB liquid cooling solutions too; however, they're expensive.
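
As a rough back-of-the-envelope illustration of why the VRMs are the weak point here, the sketch below estimates the current the regulators would have to deliver at a given core voltage. It assumes the CPU's power draw can be approximated by its TDP rating (TDP is nominally a thermal figure, so this is only an approximation), and the function name is made up for this example. The 125 W and 220 W figures and the two voltages are the ones mentioned in this thread; pairing 125 W with 1.375 V is purely illustrative.

```python
def vrm_current_amps(power_watts: float, vcore_volts: float) -> float:
    """Current the VRM must supply for a given power and core voltage: I = P / V."""
    if vcore_volts <= 0:
        raise ValueError("vcore_volts must be positive")
    return power_watts / vcore_volts


# Assumption: CPU power draw is approximated by its TDP rating.
board_class = vrm_current_amps(125, 1.375)   # a 125 W part at 1.375 V (illustrative pairing)
fx_9370 = vrm_current_amps(220, 1.5375)      # a 220 W part at the 1.5375 V BIOS default

print(f"125 W @ 1.375 V  -> about {board_class:.0f} A")
print(f"220 W @ 1.5375 V -> about {fx_9370:.0f} A")
print(f"ratio: {fx_9370 / board_class:.2f}x")
```

Under these assumptions the regulators are asked for well over one and a half times the current, and resistive losses in the VRM grow faster than the current itself.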

Review your MB's CPU support list here: https://www.gigabyte.com/Motherboard/GA-970A-UD3P-rev-2x#support-cpu
There are two revisions of the GA-970A-UD3P, and neither supports an FX CPU beyond the 8370.

No matter where you source your information on RAM timings and BIOS configuration, it will be technical, and although the tutorial is about overclocking RAM, the info is relevant.
There is much info on this topic and Google is your friend. If you are going to mix RAM kits, then a basic understanding of DRAM timing control is required.
 


Thank you, this helped. Looks like I need a GA-990 series board to properly support my CPU. I've found one at a not-too-scary price, so I know what to get if necessary. I'm not gonna buy anything yet though; for now I'll work on eliminating other possibilities.

I'll do some testing with RAM and post back.
 
Small update.

I've run Memtest86 overnight: 13 passes, zero errors. I'm thinking that this at least excludes the possibility of a faulty RAM stick.
Could RAM timing or voltage settings still be the cause after that?

I've also tried monitoring temperatures. Running at 100% load caused the crash in about five minutes. During that time, the highest reading from any temperature sensor in the computer was a motherboard sensor that went up to 64 C, but it fluctuated up and down, and was only 61 C at the time of the crash.

Maybe it's possible some part of the board which doesn't have a sensor is overheating? But it looks to me like motherboard overheating isn't the issue (nor overheating of any other component).
 
Well, your modules passed Memtest with zero errors, which indicates the modules themselves are OK.
Some timing adjustments may still be required for stability.

Some apps don't read sensors correctly, and I have found HWiNFO64 to be the best.
Determining exactly why the system crashes after 5 mins will require evaluating it under stress, as it could be one of many things. 64C is certainly not going to crash the system.

The first step is to establish CPU stability under load by testing CPU, FPU and cache. I use AIDA64 in conjunction with HWiNFO64 to assess core temps and PSU rail voltages. Then test the GPU and RAM.

Once this information is at hand, an assessment can be made. Run the stress test for 10 mins and post your results.
If you are unable to determine the cause, post some screenies from AIDA64 and HWiNFO64 and I will analyze what's going on.
 


Hardware Info: http://i66.tinypic.com/idxaph.png

A stress test on CPU+FPU+Cache fails: the crash occurred 3 minutes 51 seconds after starting the test, and the maximum measured temperature was 60 C.
The stability test on system memory completed 10 minutes without incident: http://i64.tinypic.com/jqoq42.png
Cache only: crashed at 8.10

The GPU stress test completed 10 minutes successfully: http://i63.tinypic.com/s0xxsx.png
It kept doing microfreezes but never fully crashed.


These apps give a ton of information, and I'm not entirely sure what would be most useful to see.
In all tests so far, temperatures anywhere never rose above 64 C.

Does this help any so far?
 
Well, you can eliminate memory and CPU overheating from the equation.
There should be a sensor for package temps in HWiNFO64; check that.

Your rail voltages under stress are within spec, so that eliminates the PSU.

Your GPU test seems fine and maybe eliminates the card, but that test is not very reliable. Further testing under more load using Cinebench will settle it. My concern is that your card is dying after baking.

A cache crash is an indication that the CPU is unstable, especially if you're at stock frequency.

P95 (Prime95) is another CPU tester that will push your CPU to the max. If you run it, choose the Small FFTs test for 20 mins. Keep an eye on it while it runs and stop the test if temps approach 80C.


 
I found the DRAM timings panel and noticed some differences between the two DIMMs. Could this be significant?

http://i67.tinypic.com/21agzo7.png
http://i68.tinypic.com/ilbsyv.png


Something MUCH more worrying though:
http://i64.tinypic.com/1z6ys82.png

There are several sets of temperatures listed for the motherboard. This picture was taken when the computer was basically idle.
VR T1 and VR T2 are too high. I'm guessing those mean voltage regulator temperature? If so, then it sounds like the idea about the motherboard overheating might have been correct.

Then I tried running it under heavy load while looking at this:

The VR temps got as high as 114 C, and the PC crashed about 30 seconds later. At the exact time of the crash they were at 110 C.
I'm guessing those temperatures are above a safe level, although if the crash were temperature-induced, wouldn't it instantly shut down at a specific threshold?

Not all the pieces of the puzzle quite fit into place, but this looks like the most promising lead.


I started logging from the HWiNFO sensor screen while doing the test. I'm not sure if this is any use, or if there's maybe a way to open and view it in HWiNFO.
https://mega.nz/#!48IRwTIT!NsV5h0ffitqAgugn0GmdDQlCXsMimEf3UMt51uRBqAg
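
A sensor log like this can also be summarized outside HWiNFO. The following is a minimal sketch, assuming the log is a CSV file with one header row and one row per sample, and that the regulator columns contain "VR T" in their names as in the screenshot above. The function name, the sample rows and the exact column labels are made up for illustration; a real log will have many more columns and may be laid out differently.

```python
import csv
import io


def summarize_columns(csv_text: str, needle: str = "VR T"):
    """Return {column name: (peak value, last value)} for headers containing `needle`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = {i: name.strip() for i, name in enumerate(header) if needle in name}
    stats = {}
    for row in reader:
        for i, name in wanted.items():
            try:
                value = float(row[i])
            except (IndexError, ValueError):
                continue  # skip blank cells and any non-numeric rows
            peak, _ = stats.get(name, (value, value))
            stats[name] = (max(peak, value), value)
    return stats


# Made-up sample in the assumed layout (not taken from the real log).
sample = (
    "Date,Time,CPU [°C],VR T1 [°C],VR T2 [°C]\n"
    "1.1.2018,12:00:00,45,88,90\n"
    "1.1.2018,12:00:02,58,114,112\n"
    "1.1.2018,12:00:04,60,110,109\n"
)

for name, (peak, last) in summarize_columns(sample).items():
    print(f"{name}: peak {peak:.0f} C, last reading {last:.0f} C")
```

For a real file, something like `summarize_columns(open("log.csv", encoding="latin-1").read())` might work; the file name and the encoding are guesses. The last reading before the log stops is the interesting one, since it shows how hot the regulators were when the system froze.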

If replacing the motherboard is the way forward to fix this, I can afford to do it. I'd just like to be sure though.
 


How would I fix the Vcore?

Could the voltage regulators be causing that? Not regulating voltage sounds like something overheating voltage regulators would do.

What actions do I take to fix this problem, and what do I replace?

Assuming it's the board, is this one an adequate replacement?
https://www.gigabyte.com/Motherboard/GA-990FXA-UD3-rev-40#support-cpu

The 9370 is listed there in the support list, so hopefully it has better VRMs.

EDIT: I found the Vcore setting in the BIOS. I tried dropping it to 1.4 V, and the PC crashed while booting Windows. It seems like that's not enough to run it. (I've set it back to default now.)

1.5375 V is the default and apparently recommended voltage in the BIOS. I haven't ever altered it from that value. It's probably autodetected. And I think it's worth noting that, AFAIK, the FX-9000 series are just factory-overclocked versions of the FX-8000 series, so an increased voltage may be necessary to support that.
 
The BIOS would be on AUTO by default, and that can overvolt the CPU.
Core voltage control can be found in BIOS > M.I.T. > Advanced Voltage Settings. Change it from AUTO to 1.375V. You may be able to have stability at an even lower voltage if at stock frequency.

The only ways to prevent the VRMs from overheating are to lower Vcore and/or lower CPU frequency, or to upgrade the MB.
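
To put rough numbers on why lowering Vcore or frequency helps, the sketch below uses the common first-order approximation that a CPU's dynamic power scales with voltage squared times frequency. This ignores leakage and other effects, so it is only an estimate, and the function name is made up for this example. The two voltages are the ones quoted in this thread; the 10% frequency reduction is an arbitrary illustration.

```python
def relative_dynamic_power(v_new: float, v_old: float, f_ratio: float = 1.0) -> float:
    """Approximate new/old dynamic power, assuming P is proportional to V^2 * f."""
    return (v_new / v_old) ** 2 * f_ratio


# Dropping from the 1.5375 V default to the suggested 1.375 V at the same frequency.
same_clock = relative_dynamic_power(1.375, 1.5375)

# The same voltage drop combined with a hypothetical 10% lower frequency.
lower_clock = relative_dynamic_power(1.375, 1.5375, f_ratio=0.9)

print(f"same frequency:      {same_clock:.0%} of the original power")
print(f"10% lower frequency: {lower_clock:.0%} of the original power")
```

Under this approximation the voltage drop alone takes roughly a fifth off the load on the VRMs, provided the CPU stays stable at that voltage.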
 


It would appear that the GA-990FXA-UD3 rev. 4.0 MB is a better board for the FX-9370, if you want Gigabyte, that is.
I do have opinions on Gigabyte boards, however controversial, and no doubt fanboys would contradict them. Most of my observations relate to overclocking. Some will say that recorded temps of 100C+ are OK, and that this is backed up by Gigabyte. IMO any temps approaching 100C are not good for the VRM caps, and they would not last long at sustained temps like that.
Don't forget the FX-9370 has a TDP of 220W, and that says it all.

 


Oops, I was too slow; it looks like you missed my edit.
I found the setting and tried lowering it, and the system crashed during startup.


"The only actions to prevent the VRMs from Overheating is to lower Vcore and/or lower CPU frequency or upgrade the MB."
Then it looks like upgrading the motherboard is the way to go!
I might try lowering the frequency as an interim solution until the new board arrives, though.
 


I've marked this post as the solution because it's these words that helped me find the window in HWiNFO that showed the VRM temperatures. It's weird that no other program showed them.
 


I edited my previous post!!!

 

I don't really have a preference; I go for Gigabyte because I go for Gigabyte. Circular logic, I know. Basically it's "better the devil you know". I've been too afraid to even look at alternatives.

Also they're cheaper, AFAIK, but maybe I'm wrong about that too.
Does anyone else even make socket AM3+ motherboards? Is there any non-Gigabyte board that could hold my CPU?


 
Well, if your budget is a concern, then you won't want to consider the alternatives.

The best MB for that CPU is the ASUS Crosshair V Formula-Z. It's more expensive, and for good reason.
It has a better VRM phase design and good quality caps that can endure overclocking. It also incorporates DIGI+ for much better voltage control.
The CVFZ is becoming rare; however, eBay has listings. Don't buy second hand though.

As another alternative, you could do away with the issue of VRMs altogether.
Consider selling your CPU and MB on eBay and updating your system to the new AM4 platform and Ryzen 7, with lower-TDP processors and much faster DDR4 RAM.
 


Welp, both of these options are out of reach, so I'll go with the Gigabyte board.

Any thoughts as to the instability when I tried to lower the voltage? Is there a good practice for that?
 
Yes, good practice is to get your system stable at stock frequency with the lowest Vcore that is stable.

When overclocking, don't jump to a high frequency in one hit; go up in 200MHz steps and stress test after each step.
If at any stage the system refuses the overclock, increase Vcore in 0.01V steps until stable.
During stress testing, keep an eye on temperatures, as heat output climbs faster than linearly with each voltage increase (dynamic power scales roughly with the square of the voltage). Happy overclocking, or underclocking till your new board arrives.
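
The procedure above can be written down as a simple loop. This is only a sketch of the bookkeeping, not a tool that changes any settings: `is_stable` is a hypothetical stand-in for "set these values in the BIOS, run a stress test, and report whether it survived", and the `v_max` ceiling is an assumed safety limit that the advice above does not specify. The toy stability model and its numbers are invented purely so the example runs.

```python
from typing import Callable, List, Tuple


def step_up(start_mhz: int, target_mhz: int, vcore: float,
            is_stable: Callable[[int, float], bool],
            step_mhz: int = 200, v_step: float = 0.01,
            v_max: float = 1.45) -> List[Tuple[int, float]]:
    """Raise frequency in small steps; add Vcore only when a step is unstable."""
    stable_points = []
    freq = start_mhz
    while freq <= target_mhz:
        # Bump voltage in small increments until this frequency passes.
        while not is_stable(freq, vcore):
            vcore = round(vcore + v_step, 3)
            if vcore > v_max:
                return stable_points  # give up rather than exceed the ceiling
        stable_points.append((freq, vcore))
        freq += step_mhz
    return stable_points


def toy_is_stable(freq_mhz: int, vcore: float) -> bool:
    """Invented model: needs 1.30 V at 4000 MHz plus 10 mV per extra 100 MHz."""
    required_mv = 1300 + (freq_mhz - 4000) // 10
    return round(vcore * 1000) >= required_mv


points = step_up(4000, 4400, 1.30, toy_is_stable)
for freq, volts in points:
    print(f"{freq} MHz stable at {volts:.2f} V")
```

In practice each `is_stable` call is a manual stress test with temperatures watched throughout, so the loop is best read as a checklist for what to record at each step.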
 
