
Question Unknown system shutdowns and crashes (possible CPU temp issues)

Apr 30, 2019
Specs
Case: Diablotek CPA-0170
Mobo: Gigabyte GA-78LMT-USB3
CPU: AMD FX 6300 6 core, 3.5 GHz
RAM: 2x 8GB PNY DDR3 (original: 2x 8GB Crucial Ballistix Sport DDR3)
GPU: NVIDIA GeForce GT 720
Storage: 2TB Seagate ST2000DM001-1ER164 (Firmware CC26), Samsung Super Writemaster
PSU: 550 W PowerSpec Bronze (original: 400 W Inland Gold)
Displays: 2 Dell S2209W 1920x1080 monitors
OS: Win7 Pro 64-bit (SP1)

First post and I already have to apologize for the lengthy novel below. It all seems like pertinent information, but I've included a TL;DR below.

I built this PC a few years ago to handle a decent load for graphic design work, and to double as a mid-level gaming rig. It had some issues roughly a year after I built it that I chalked up to some bad software. After formatting the drive and installing Windows fresh, I seemed to be cruising with no further issues, until about 8 months ago, when problems similar to what I'd had originally started coming up.

It started with random freezing, system shut downs, and blue screens, eventually capping itself off with the endless Windows boot to blue screen to reboot cycle. After searching numerous forums and websites for possible causes and solutions, someone advised I run MemTest on the RAM (Ballistix Sport). I was able to make a bootable USB of MemTest86+, and after letting it run for multiple passes, it seemed like one or both of the DIMMs were bad. Before going out and buying new RAM, I decided to try the cheapo fix I'd seen somewhere, and simply moved the DIMMs to the other two slots on the mobo. This seemingly fixed the issue until it all started happening again about 4 months ago. Knowing that I'd already tested it, and it seemed bad, I went out and bought new RAM (PNY) to install.

Everything was working gravy until about 2 weeks ago, when I started getting random shutdowns again. When I would reboot after the shutdowns, I would check for a mini-dump, but only found one once, and it indicated driver issues as the cause of the crash. I also checked to see what was showing in the event log, but again, most everything seemed to be related to driver issues or GPU crashes (I've struggled with the NVIDIA drivers for this card since Day 1). I downloaded Snappy Driver Installer, and over the course of many crashes and reboots, I was finally able to get every single one of the drivers on the system updated. As before, this seemingly fixed the issue, until I came home one day and found the system completely shut down for no discernible reason. When I checked the event log, it listed the item as either a temperature or power related problem. After talking it over with some friends and colleagues, I decided to try a new PSU (PowerSpec) to see if it would resolve the issues. Spoiler, it didn't.

After the PSU replacement failed to resolve the issue, I decided to try swapping the RAM DIMMs back to the original slots. This has sort of resolved the issue, insofar as I can use the system under minimal load, but the moment I try to do anything that's even remotely memory intensive, the system shuts down. Obviously I can't use the system for what it was intended at this point, and I'm at a loss as to how to go forward. I don't know if it's yet another RAM issue, or if it's something larger, such as an issue with the mobo.

TL;DR - System crashes/shuts down under any sort of memory load above surfing the web. System is on its second set of 16GB DDR3 RAM (2 8GB DIMMs) in less than 2 years, and its second PSU in about 5 years.

Any thoughts, suggestions, or recommendations on this issue?

Thanks.
 
What about the CPU temp? Are you sure that it isn't overheating? Have you tested the new RAM with memtest? Have you configured the RAM in the BIOS according to the specs of your specific RAM modules? Sometimes a bad motherboard memory slot can cause memory errors and not a faulty RAM module. So in order to test the RAM thoroughly you have to test each RAM module separately and in each motherboard memory slot. I know that it will take a lot of time but it's the only way to know if you have bad RAM or a faulty motherboard (or perhaps CPU since the memory controller is part of the CPU). Finally you can increase RAM stability by increasing RAM voltage or memory controller voltage (northbridge) in the BIOS. Good luck.
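
The one-module-at-a-time testing described above boils down to a small test matrix. Here is a rough sketch in Python of how it could be planned and read. The module and slot labels are made up for illustration, and it assumes two modules and four slots, so adjust them to match what is printed on your own board.

```python
from itertools import product

# Hypothetical labels; use whatever is printed on your board and modules.
MODULES = ["module_A", "module_B"]
SLOTS = ["slot_1", "slot_2", "slot_3", "slot_4"]


def build_test_plan(modules, slots):
    """List every (module, slot) pair, to be tested with only that module installed."""
    return list(product(modules, slots))


def diagnose(results):
    """Take {(module, slot): passed} for the full plan and return (bad_modules, bad_slots).

    A module that fails in every slot is suspect; so is a slot that fails
    with every module (which points at the board or the memory controller).
    """
    modules = sorted({m for m, _ in results})
    slots = sorted({s for _, s in results})
    bad_modules = [m for m in modules if not any(results[(m, s)] for s in slots)]
    bad_slots = [s for s in slots if not any(results[(m, s)] for m in modules)]
    return bad_modules, bad_slots
```

With two modules and four slots that comes to eight MemTest runs. Mixed results that fit neither pattern would need a closer look, since this sketch only flags the two clear-cut cases.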
 
What about the CPU temp? Are you sure that it isn't overheating?
Well, I honestly wasn't sure about the CPU temps, since I don't have a digital readout on my case. I installed Speccy to check the temps on both the CPU and the GPU. On initial boot, the CPU was sitting around 65-70. When idle, it seems to float between 58-65. The minute I started to put any sort of load on the system, it spiked up to 80 and kept climbing. Eventually it stopped around 90, but when I pushed it really hard with some 3D rendering utilities, it topped out over 100 (the top end of Speccy's readout), and the system shut down.

I should note here that I'm using stock coolers throughout the system.

I just updated my bootable flash drive for MemTest, and will be going about testing the modules as I can, although I'm not entirely sure that's the issue anymore...
 
The 1st thing you should always check in similar situations is the CPU and GPU temps. In your case you definitely have a CPU overheating issue and a CPU protection mechanism is shutting the system down in order to prevent thermal damage.

You have to make sure that:

1) The CPU fan is working and is dust free.
2) The CPU cooler is installed correctly and makes direct contact with the CPU heat spreader.
3) You may have to remove the CPU cooler, clean the old thermal paste from the heat sink and the CPU heat spreader, and then apply new paste. Over the years the thermal paste may dry out, which breaks the thermal transfer between the CPU and the cooler. Therefore you have to replace that thermal interface material.

Additionally your PC case should have adequate airflow inside (with at least 2 case fans) and you should clean any dust you may find. Now if you do all the necessary cleaning and re-apply new paste and the CPU is still overheating you may have to get a new (better) CPU cooler, or increase the airflow inside your PC case with more case fans. Good luck.
 
Well, can you please tell us which CPU cooler you are using? Also, that power supply seems to be a pretty low quality unit.
Either way, please mention the CPU cooler you are using and what kind of push/pull configuration it has. If there is any dust in the case, clean it out, since that could help the airflow, and tidy up the cables inside the case if you don't have cable management.
 
Let me add an educated guess based on experience with all sorts and kinds of computers in general, PCs especially, as well as internal and external peripherals, from almost day one.

It may turn out it has nothing to do with it at all, but it may help you or the other participants (well-versed, with a good 'feel' for the case and sensible recommendations) to help you; at least to find new starting points to latch on to.

That intro may have sounded high-horsey, but i mean it: the actions taken and recommended so far are helpful and very good advice, even if they sometimes, as you already experienced, end in happy messages like ...

YOUR RAM IS FAULTY - FULL STOP - STOP

On topic:

All in all it seems (to me) that there is either a problem in providing stable core voltages, or the motherboard is making a wrong assumption about the effective RAM clock, the cache/latency settings or the voltage, or it is mishandling one or all of those factors. Any of this might in turn be caused by the drivers used for the graphics card and/or the PCI bridge. Those can render a system unstable through dynamic overclocking of the GPU: the system tries to accommodate by raising the bus clock and/or the CPU frequency, and either manages to follow along or goes haywire because it is relying on decoupled or nonsensical readings.

YES. THAT IS A THING.

Or ...

YOUR RAM IS FAULTY. AGAIN. I MEAN IT.

Okay, (not really) joking aside:

It is difficult to recommend a | the one truly true way to go on from this point.

OPTION A:

Easiest would be if you could switch device by device into a second system, that may totally differ in anything but the necessary connections. Optimal would be to run a (sysprepped, not necessarily, but helpful) clone of the original OS, which can be created by cloning the partition for example with:

  • CloneZilla, if you are well versed with TUI/CLI and Linux
  • AOMEI Partition Assistant Free (i think?)
  • EasyUSB or WinToHDD (i think? - again ...)
Sorry, i use them now and then but am too lazy to actually look them up, because i use DISM to create a WIM or VHD and boot into that ..

WAIT! YES!

You can use
- Disk2VHD, from the Windows SysInternals suite!

Simple. Lean. Neat. Does the Job without learning fancy stuff.

This is for sure the long way, but with the benefit of learning each and any detail popping up alongside.

But i assume you want to try fix it the Windows-way aka heart-surgery without trippy tranquilizers, right? So you can take on to ...

Option B:

Start by assuming it is in fact an overclocking/underclocking or overvoltage/undervoltage problem. From inside the running Windows you can take a detailed look via a set of standard tools, like:

  • CPU-Z
  • TechPowerUp GPU-Z
  • Motherboard-Monitors provided by the vendor of your Mobo
  • RivaTuner
  • EVGA Precision X
  • MSI AfterBurner
  • NVidia SystemTools
  • GigaByte Extreme Engine Utility
  • Gigabyte OC Guru II
  • Asus GPU-Tweak
  • Nvidia Inspector
  • Zotac FireStorm
.. to name a few that are, as far as i can remember, largely independent of specific models and/or series.

Just give some of them a try, to gain a basic insight into the actual rates for the core-clocks and the voltages.

Try to compare that with information by the manufacturer about the advised and normally driven values.

If you have problems finding that data, use databases like gpucheck.com and/or put load on the GPU and CPU, as a whole and unit by unit, by making use of free benchmark tools.

HALT. CATCH FIRE. CONTINUE.

Just test them and choose one or two to your liking. They essentially do the same tests, just with different visualizations of the processing and the resulting data, so find the tools that fit your profile.

GURU MEDITATION FAILURE.

So to sum Option B up:

Research the nominal operating values and concentrate your active testing on necessary voltages for a given set of clockrates.

If you do not have a second monitor to watch the values in real time via the aforementioned clocking and info tools, while you run the benchmarks, you can analyse the logs later.

The latter takes a bit of understanding and learning how to read the logs, but i think almost all tools that i named have at least basic logging functionality.

Again: decide which tools fit your purpose and are in fact understandable to you, showing what you need to know.

All that matters is the basic comparison from idle to full throttle of GPU, CPU and RAM, each on its own via MemTest and burn-in tools, and also in combination via benchmarking software.
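
For the log route, the idle versus full-throttle comparison could be done with a few lines of Python. This is only a sketch under assumptions: the CSV layout (a `phase` column plus numeric columns) and the sample numbers are invented for illustration, and the real monitoring tools each export their own format, so the column names would need adapting.

```python
import csv
import io

# Made-up sample in a hypothetical layout: one row per reading, tagged by phase.
SAMPLE_LOG = """phase,cpu_temp_c,vcore_v
idle,58,1.20
idle,62,1.21
load,88,1.32
load,95,1.35
"""


def summarize(log_text):
    """Return {phase: {column: (min, max)}} for every numeric column in the log."""
    summary = {}
    for row in csv.DictReader(io.StringIO(log_text)):
        phase = row.pop("phase")
        stats = summary.setdefault(phase, {})
        for name, raw in row.items():
            value = float(raw)
            lo, hi = stats.get(name, (value, value))
            stats[name] = (min(lo, value), max(hi, value))
    return summary


if __name__ == "__main__":
    for phase, columns in summarize(SAMPLE_LOG).items():
        for name, (lo, hi) in columns.items():
            print(f"{phase:5} {name:12} min={lo} max={hi}")
```

Putting the min/max per phase side by side makes it easy to see whether a voltage or temperature drifts out of its nominal range only under load.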

YOU DECIDE WHICH IS RIGHT


OPTION C:


This one is for when you just want to make an assumption based on what you know by now and you'd like to shuffle things up a bit, mainly for the purpose of cleansing your plagued soul by a reset of what you already probed and tried.

+ Remove DIMMs until only one memory module remains, and put it not in the standard/default position but in slot 3 or 4 or lane C, whatever. Leave the others out for your new, refreshed testing session.

... Positive side effects:

- You may identify a faulty module just by using it on its own

- You may gain a feel for what Dual- | Triple- | Quad- | -Channel sync and async can provide regarding the boost

... when you are through with your testing and have two or more modules again in such configurations running.

Especially helpful for estimating necessity of upgrades and a must-have knowledge to provide yourself with hardware for the reasons you use your machine for: Professional editing and creation of audio-visual content, if i remember right?

SYNTAX ERROR

+ Also if possible remove additional components in use, like:

- WLAN - If you can, connect via LAN, or vice versa.

- ASND - Autonomous sound card, since it has its own codec unit built in.

- SCCM - Separate Cooling Control Management, like boards which are interlinked with the sensors, which can produce unpredictable voltages.

WHY? BECAUSE.

... the internal sensors in CPU and possibly existing on-board GPU and on the motherboard report (in that situation) impossible values.

.. etc. pp. and so on ...


OPTION 0
... The Bonus Level

- Write everything down. Create a Blog out of it.


EARN CASH BY PLASTERING 8K-DISPLAYS WITH ADS!

- Report your findings to the manufacturers, with links to help requests like this thread, in a mail to the support of your motherboard manufacturer, your graphics card manufacturer and your RAM vendor.

DO NOT HOPE FOR ANSWER. JUST WAIT AND GET IT. REPORT BACK. HERE.

- Drive everyone crazy that is trying to help you, by stating things like:

I DO NOT WANT ANY MORE. NO. I DO NOT.

or by simply ignoring the thread for 4 weeks and the 103 helpful messages that arrived in between.

---


Not totally serious, okay. But nonetheless ...

---

Hope it helps. Thanks for reading.

---

POST SCRIPTUM:

It could also be that an initial overheating of the CPU has burnt the cooling paste (which was potentially pad-like, or too thick, or spread too unevenly over the processor).

Excessive heat resulting from that initial burn, which might have been caught by doing a burn-in right after setting up the hardware, can have led to a minimal rift that ruined the paste completely, or can have loosened a not perfectly installed clamp holding the cooler in place, in the end leaving a deformed clamp and/or a deformed notch that it hooks on to.

MAYBE. IT IS POSSIBLE. IT HAPPENS. OFTEN.
 
Those temps definitely sound like the problem to me; your CPU should not be sitting at 60+ at idle. Is your CPU overclocked at all? The stock cooler is garbage, especially for a 6-core 3.5 GHz CPU. The PC will automatically shut itself down when temps get too high to avoid damaging the hardware, and 100+ is definitely a danger zone for a CPU. I would suggest upgrading the stock cooler and using a thermal compound with good heat transfer, such as Arctic Silver 5. A good cooler should be able to keep your CPU below 85C under load; just make sure you check the size to be sure it will fit in your case.
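
To put the numbers from earlier in the thread against those rules of thumb, here is a quick Python sketch. The limits (60 at idle, 85 under load, 100 as the danger zone) are simply the figures quoted in this post, not vendor specifications, and the function name is made up for illustration.

```python
# Rules of thumb from this post, in degrees C; not official AMD limits (assumption).
IDLE_LIMIT_C = 60
LOAD_LIMIT_C = 85
DANGER_C = 100


def classify(temp_c, under_load):
    """Label a CPU temperature reading as 'ok', 'too hot' or 'danger'."""
    if temp_c >= DANGER_C:
        return "danger"
    limit = LOAD_LIMIT_C if under_load else IDLE_LIMIT_C
    return "too hot" if temp_c >= limit else "ok"


if __name__ == "__main__":
    # Readings reported by the original poster via Speccy.
    readings = [(58, False), (65, False), (80, True), (90, True), (100, True)]
    for temp, under_load in readings:
        state = "load" if under_load else "idle"
        print(f"{temp:>3} C ({state}): {classify(temp, under_load)}")
```

Only the lowest idle reading and the first load reading come out as acceptable, which fits the overheating diagnosis.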
 
I have had this issue many times before with the exact same side effects. 9 times out of 10 it was a faulty motherboard: first the RAM slots start dying, and then you start getting interference on the board, causing random shutdowns and CPU sensor errors. The only temporary fix for me was cycling the RAM and never letting the PC sleep or turn off. Sometimes it could go weeks without a skip, but other times it would be non-stop.