Custom watercooled system is crashing due to overheating

singo79

Jan 11, 2014
Hi All,

Just under a year ago I built a new system and installed a water cooling loop. To get the housekeeping out of the way, these are the components I am running, followed by the water cooling products I've installed:

Components
Intel i7-4770k (Stock Clock)
Asrock Z87 Extreme9ac Motherboard
16GB G.Skill 2400MHz Trident DDRIII Ram (XMP Profile)
2 x Gigabyte R9 290X (Stock Clock) (Latest firmware and drivers installed)
Asus 28" 4K Monitor (3840 x 2160 @ 60Hz)
Corsair Carbide Air 540
500GB Samsung Evo SSD
2TB WD Caviar Black
Windows 8.1 Pro 64-bit

Water Cooling
1 x XSPC EX360 Radiator with 3 x Corsair AF120mm fans at 1000rpm (pushing into case)
1 x XSPC EX240 Radiator with 2 x Corsair AF120mm fans at 1000rpm (pushing out of case)
Phobya UC-2 LT CPU Water Block
2 x XSPC R9 290X GPU Water Blocks
1 x XSPC D5 Photon 170 Pump/Reservoir Combo (Running at 3700rpm)
EK Pre-mixed Coolant
Arctic Silver Thermal Paste

Order of components
Start=> Reservoir/Pump => EX240 Radiator => CPU Water Block => 1st GPU => 2nd GPU => EX360 Radiator => Reservoir/Pump.

Now that that part is out of the way, on to the issue.

I have been having overheating issues recently whereby the computer doesn't crash completely; rather, it just crashes out of the game that I am playing and back to the desktop.

At idle the temperature of the liquid is around 29 to 35 degrees depending on the ambient temperature of the room. The idle temps of the two R9 290Xs are 42 and 41 degrees respectively; this is also dependent on the ambient temperature. The idle temp of the CPU is around 44 degrees, again depending on the ambient temperature.

When gaming, not even for a long period of time, the temps ramp up and the liquid in the loop gets up to 51 degrees. The two R9 290X GPUs get up to 67 degrees and the CPU gets up to around 67 degrees.

To me all of those temperatures look fine. The GPUs are manufactured to operate at 100% all the way to 94 degrees and the CPU should easily handle mid to high 80 degrees mark. Yet the system still crashes to desktop when running at high temperatures.
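As a quick sanity check on that reasoning, the headroom can be worked out directly from the figures above. This is only a minimal sketch: the 94 degree GPU limit is the one quoted in this post, while the 85 degree CPU figure is an assumed stand-in for "mid to high 80s" and not a published spec.

```python
# Load temperatures reported in the post (degrees C).
load_temps = {"gpu": 67, "cpu": 67}

# 94 is the GPU figure quoted in the post. 85 is an ASSUMED stand-in for
# "mid to high 80s" on the CPU, used here purely for illustration.
limits = {"gpu": 94, "cpu": 85}

def headroom(temps, limits):
    """Return how many degrees each component sits below its limit."""
    return {part: limits[part] - temps[part] for part in limits}

margins = headroom(load_temps, limits)
print(margins)  # {'gpu': 27, 'cpu': 18}
```

On those numbers both the GPU cores and the CPU are well inside their limits, which is why core temperature alone doesn't obviously explain the crashes.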

The first sign that the computer is about to crash to desktop is the glitching and artifacting on the screen. As the components get hotter, up to the temps listed above, the graphics glitch and do funny things until eventually the game crashes and I'm back at desktop.

As soon as I am back at desktop there are no further issues, there are no graphics problems or glitches and the temperature of the loop and the components start to drop quite quickly.

Only recently did I do some maintenance on the water cooling loop, draining the loop and installing new liquid and blowing out the dust that had accumulated in the several months since building it. This did not change the performance of the cooling system, in that it didn't affect the temperatures that I have already recorded.

Ideally the CPU and the GPUs would be on separate water loops; however, despite the size of the case I was only able to fit one loop into it, though I know other, more capable installers have managed to fit two.

Is anyone able to offer some advice or guidance in relation to my issue? I just can't seem to put my finger on it and am hoping that someone might have some knowledge in this area to assist.

Thanks for taking the time to read this and I look forward to any assistance offered.

redir
 
The only issue I'm seeing with your components is the fans. The AF series isn't optimized for static pressure (pushing air through obstructed space), so running those specific fans at only 1000rpm probably isn't doing as much as you'd think.

Otherwise, I'd double check your VRAM modules are cooled properly with thermal tape making proper contact with the waterblocks and if that doesn't work, I'd try testing each card individually to make sure one isn't going bad (I know, that's going to be a pain in the neck when they're in a waterloop)
 
As it's been nearly a year since you built your rig, it would be a good idea to check inside the GPU and CPU blocks for any build-up of gunk on the fins/channels where the water flows through to remove the heat. As these clog, you will get reduced efficiency like you have stated. Another thing you may want to consider is using MSI's Afterburner program to undervolt the cards and cut the heat output, which the 290X is renowned for. I have an MSI 290X, and after rebuilding my rig and installing the latest AMD drivers (14.11.2 beta) I have had several crashes in Battlefield 4. I have just rolled back to 14.9.
 
Thanks very much for the suggestions and comments thus far. Though it is going to be a massive pain in the backside, I think I will have to pull the GPUs out and inspect them a bit closer.

When I installed the water blocks onto the GPUs I was very careful to ensure that thermal pads were sufficiently covering the VRAM modules; however, a closer look is probably a good idea.

Since it's a big job I'll have to fit it around work, but hopefully I'll be able to post back here in the coming days with the results of my inspection.
 


yeah we will be here.. report back
 
Can you show a picture of your GPUs? Are they run in serial or parallel? It looks to be serial from your list above.

This being said, I don't think this is watercooling related.

Also, 67C isn't anywhere near temps that would cause crashing, although it could be related to incorrect seating on your VRMs and MOSFETs...although I'd think if that were the case, you'd see temps higher on your GPU cores also, as they wouldn't seat properly.

Have you overclocked your GPUs at all before? It says stock clocks, but have they always been this way? Also says new drivers/firmware...are you certain that firmware flashing didn't cause any issues? This can often lead to things you are experiencing if a flash went wrong or you flashed to something that was incorrect...that being said...my definition is that BIOS=firmware, so, if you got a bad flash or something beyond what your cards can handle...this is likely your culprit. Try flashing back, hoping you kept a copy of the original.
 
Ok, so I managed to drain my loop and pull the GPUs and heatsinks apart and everything looks fine there. The thermal paste was in good condition and the thermal pads were also making good contact.

I decided to get onto the Gigabyte website and download the latest R9 290X firmware and have upgraded accordingly. This still made no difference as after about 15mins sitting on about 62 degrees the game I was playing crashed. However, this time at least I got some sort of insight as to what exactly is causing the crashing.

I was playing Bioshock Infinite with graphics set to max and at 4K. The same glitching started to happen when I got up over the 60 degree mark, and after about 15mins at this temperature the game crashed, but this time I actually got an error message rather than a crash back to desktop with no explanation.

The error reads;

"The Direct3D 11 device has been removed or has crashed. This requires the game to exit. Please ensure that you have the latest drivers for your graphics device and please restart the game"

It is now becoming clearer that the issue surrounds the graphics cards or maybe even just one card. I am going to start testing the cards individually, without crossfire enabled, and see if it is one particular card that is causing the crash or whether both cards actually crash.
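The logic of that single-card test can be written down as a small decision table. This is only a sketch of rule-of-thumb reasoning: the function name is made up for illustration, and the "shared component" and "combined load" conclusions are assumptions about where to look next, not a definitive diagnosis.

```python
def interpret_single_card_tests(card1_crashes, card2_crashes):
    """Map the results of running each card alone to a likely suspect.

    The conclusions are rules of thumb only; they suggest where to look
    next rather than prove a fault.
    """
    if card1_crashes and card2_crashes:
        return "both fail alone: suspect something shared (PSU, motherboard, RAM, drivers)"
    if card1_crashes:
        return "only GPU 1 fails: suspect GPU 1"
    if card2_crashes:
        return "only GPU 2 fails: suspect GPU 2"
    return "neither fails alone: suspect CrossFire or the combined load"

print(interpret_single_card_tests(card1_crashes=True, card2_crashes=False))
```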

But to answer a few questions, the GPUs are run in serial within the water loop. The clocks on the cards are set at Core = 947MHz Memory = 1250MHz and the GPU BIOS is set to "Silent" and not the "Performance" mode. The two GPUs are both Gigabyte branded but they are different model numbers due to supplier constraint at the time of purchase.

GPU 1 is a GV-R929XD5-4GB-B (BIOS 015.041.000.001)
GPU 2 is a GV-R929XD5-4GB-B-GA (BIOS 015.041.000.002)

I'm assuming that the "GA" in the part number is there because the card was bundled with Battlefield 4 and that GA stands for GAME, rather than indicating anything specific about the hardware on the card.

Hopefully this link will work showing the loop ===> https://onedrive.live.com/redir?resid=993E63923523C7E6%21178

I will post back a bit later once I've been able to test the cards individually and the results of those tests.
 
From your picture, your top fans are blowing air into the case, pulling it through the radiator and heating it in the process. You need to reverse that so all rad air is exhausting out of the case, then turn the rear exhaust around and make it a fresh air intake, and your overall cooling will improve.

Turning the rear exhaust around into an intake will serve 2 purposes: #1 supply fresh air to the rads, #2 supply fresh, cooler air to the M/B VRs.

The way it is set up, you are blowing hot air onto your M/B VRs. I see no mention of what your M/B temperatures are getting to, but gradual failure is common on M/Bs whose VRs are overheating. That usually happens when the stock air cooler, which was also cooling the M/B VRs, is replaced with a water block that no longer cools them.

The M/B controls all communication between your hardware, and if it is running hotter than it should be, it is just a matter of time before it begins to fail. When you remove the stock cooling fan, you either replace the same volume of air the VRs were getting or water cool the VRs.

In the link below scroll down and read Question #8.

http://www.tomshardware.com/forum/id-2196038/air-cooling-water-cooling-things.html

So what are your M/B temperatures reaching when the gaming crashing occurs?

What power supply is powering all of this? I don't see any mention of that.
 


Thanks for the suggestions Ryan, I have made these changes to see if that makes any difference.

However, to answer your questions on the M/B temps, when it crashes the M/B temperature is generally at 47 - 50 degrees.

The power supply is a Silverstone 1000W Gold PSU.
 
The changing of the fans has made a little difference, but still has not resolved the issue. I have noticed that the rig will last longer before failure so that is an improvement. I am thinking of replacing the Corsair AF fans with some AP fans to see if that makes any more difference. It could simply be that there isn't enough cool/fresh air entering the case and it is the MOSFETs that are actually overheating and disconnecting the video card(s).

Clearly I haven't isolated the issue at this stage, but I am very appreciative of everyone's help and suggestions. I will post back once the new fans arrive and I have installed them to see if that makes any positive change.
 
Can you monitor your VRM temps with GPU-Z?
Try running your memory at 1600MHz CL9. XMP doesn't always work.
And maybe try running your screen at 1080p. 4K drivers are new, so maybe they don't work that well yet?

If this all fails I would try unplugging one of the GPUs and trying again. After running a game, switch the GPUs and try again. If they both fail, you know it's not the GPUs.
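For the VRM check, GPU-Z can log its sensor readings to a text file, and a short script can then pull out the peak VRM values after a gaming session. This is a rough sketch that assumes the log is comma-separated with a header row and that the VRM columns have "VRM" somewhere in their names; the exact column names vary by card and GPU-Z version, so the sample below is made-up data in that assumed layout.

```python
import csv
import io

def max_vrm_temps(log_text):
    """Return the peak value of every column whose header mentions 'VRM'.

    Assumes a comma-separated log with one header row. Cells that are
    not numeric are skipped.
    """
    reader = csv.reader(io.StringIO(log_text))
    header = [h.strip() for h in next(reader)]
    vrm_cols = [i for i, h in enumerate(header) if "vrm" in h.lower()]
    peaks = {}
    for row in reader:
        for i in vrm_cols:
            if i >= len(row):
                continue
            try:
                value = float(row[i].strip())
            except ValueError:
                continue
            name = header[i]
            peaks[name] = max(peaks.get(name, value), value)
    return peaks

# Made-up sample in the ASSUMED layout, not real readings.
sample = (
    "Date , GPU Temperature [C] , VRM Temperature 1 [C] , VRM Temperature 2 [C] ,\n"
    "2000-01-01 20:00:00 , 62.0 , 71.0 , 58.0 ,\n"
    "2000-01-01 20:00:01 , 64.0 , 88.0 , 60.0 ,\n"
)
peaks = max_vrm_temps(sample)
print(peaks)
```

To run it on a real log, read the file GPU-Z was told to write to and pass its contents to `max_vrm_temps`. If the VRM peaks are far above the core temperature when the artifacting starts, that would point at VRM cooling rather than the loop as a whole.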
 
Hi All,

Many apologies for going cold there for a while, especially after asking for help. Essentially I had to put everything on hold whilst I dealt with some personal matters, but I'm sure no one wants to hear about that.

I finally managed to get the new Corsair AP fans and they have had some impact, but not a great deal in terms of overall temperatures.

In the end I actually decided to sell the 2 x R9 290X and go back to nVidia cards, as such I purchased 2 x EVGA GTX980 cards and am waiting for waterblocks and backplates to arrive in order to put them into my water loop.

I'll be interested to see if this makes any difference whatsoever. However, it was interesting to note that the new owner of the R9 290X cards has been using them without issue, though he did state that he is running dual loops, one for the CPU and one for the GPUs. He has indicated that the temps have been very stable, often staying below 46 degrees for the coolant and 61 degrees on the cards.

So it is certainly confusing why I was getting the issues with crashes with similar temps, yet the new owner hasn't had one drama.
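Putting the two sets of figures side by side shows how close they really are. This is just a rough comparison using the numbers quoted in this thread; the new owner's values are "often below" figures, so they are treated here as approximate upper bounds, and the dictionary keys are made-up labels.

```python
# Figures quoted in the thread (degrees C). "original_build" is the single
# loop under load; "new_owner" is the dual-loop build, using the quoted
# "often below" values as approximate upper bounds.
setups = {
    "original_build": {"coolant": 51, "gpu": 67},
    "new_owner": {"coolant": 46, "gpu": 61},
}

def core_over_coolant(setup):
    """GPU core temperature minus coolant temperature."""
    return setup["gpu"] - setup["coolant"]

deltas = {name: core_over_coolant(s) for name, s in setups.items()}
print(deltas)  # {'original_build': 16, 'new_owner': 15}
```

The cores sit about the same distance above the coolant in both builds, and the absolute GPU temperatures differ by only around 6 degrees.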

Anyway, I'll post up the temps and figures once the new GPU waterblocks and backplates turn up and I get them into my loop.

Thanks for everyone's help.
 
I still maintain that this isn't watercooling related, as 50-60s C is nowhere near hot enough to cause problems for a GPU. Stock cooling will often see hotter temps than this. Either the ancillary components on the GPU PCB aren't being cooled correctly (although you did have full cover blocks), or the BIOS flash caused a problem, or it's something else altogether. It just seems this is nowhere near related to your watercooling loop, simply due to the temps you are reporting. Had you said your GPU cores were hotter than 90-100C, I would be more apt to agree with you.

Possible issues:
Bad BIOS/firmware flash
Bad RAM (download and run Memtest86+, you might need to burn to CD or run from USB stick)
PSU unstable or going bad (do you have another PSU? Try testing, if it is rated for enough watts)
Motherboard unstable or going bad (I'm not sold this is an issue, but could be if you had a power surge or lightning strike)
CPU (although at stock clocks, not likely as an OC could cause this and yours currently isn't, however, are any other BIOS settings amiss?)