EVGA Addresses GeForce GTX 1080 FTW PWM Temperature Problems

Status
Not open for further replies.
I'm still a big fan of EVGA, I have a SC 780 that's lasted me years. It's only retired now because I don't have anything I can productively use it in. I also had an OC 640 in my old quad2core. That being said, I'm disappointed in them. When the 1080s came out, they and Gigabyte were my first two choices because they typically have the higher clocks for their highest tuned cards, with Zotac and MSI being tied for third.

Even though they're kind of ugly I'm loving these extreme editions even more now.
 


Hi, see JayzTwoCents video on this. The cards at EVGA were hitting over 100C on an open test bench as in the video.
https://www.youtube.com/watch?v=URyG1OP8p8I
 


I do not completely agree with you. Yes, some people don't know anything about airflow, but this issue has nothing to do with airflow. See JayzTwoCents video on this. The cards at EVGA were hitting over 100C on an open test bench as in the video. https://www.youtube.com/watch?v=URyG1OP8p8I
 


Companies make mistakes, people make mistakes. See JayzTwoCents video on this, where he asks the CEO of EVGA why this happened. https://www.youtube.com/watch?v=URyG1OP8p8I
 


Thanks for the link. Even though I didn't get one of these, I believe it's a great tutorial to help those who did. I also appreciate the honesty from the CEO. I also agree with you about people/companies making mistakes. It seems that 2016 was definitely a good study year for q/c on both sides of the fence.
 
I don't see the big surprise here. This problem goes back at least to the 5xx series. If you were paying attention back then, the EVGA 570 SC in particular, along with the reference cards, had a habit of frying the VRMs (heat failure) when you overclocked them more than moderately.

The 560 Ti series (look up the old THG review) showed that the EVGA SC model had significantly fewer VRM phases (4+1 IIRC) than the major competitors' AIB Gaming cards (Asus and Giga had 7, MSI 6).

http://www.tomshardware.com/reviews/geforce-gtx-560-ti-roundup-asus-engtx560-graphics-card-overclocking,2858.html

The EVGA 970 SC continued the habit. Reviews that went significantly beyond re-worded press releases and did teardown analyses showed that the SC line was still being equipped with reference PCBs / VRMs and still lacked adequate VRM / memory cooling.

http://www.bit-tech.net/hardware/graphics/2014/09/19/nvidia-geforce-gtx-970-review/3

"Examining the [EVGA 970 SC] PCB reveals a 4+2 phase power design – four phases near the rear I/O for the GPU, and two in the bottom right corner for the memory. This is a slight upgrade from the 4+1 stock specification but unlike MSI (6+2 phases) and ASUS (6 phases), EVGA does not use any specially crafted components.

On the other side of the GPU is a metal contact plate that partially cools two of the four memory chips on this side, leaving the other two exposed. It also cools the MOSFETs of the power phases serving the memory, but no thermal pads are used, so heat transfer is likely to be limited."

The heat sink on the early 970 SC also had one of the three heat pipes "miss the GPU", and again there, EVGA's press release attempted to assure customers that they designed it that way on purpose and that it really wasn't a problem that 1/3 of the cooling system capacity was rendered ineffective. The review also mentioned that users could opt for the FTW if this SC design error was a concern.

"There are 3 heat pipes on the heatsink – 2 x 8mm major heat pipes to distribute the majority of the heat from the GPU to the heatsink, and a 3rd 6mm heatpipe is used as a supplement to the design **to reduce another 2-3 degrees Celsius**....

Due to the GPU small die size, **we intended for the GPU to contact two major heat pipes with direct touch** to make the best heat dissipation without any other material in between."

To be fair the FTW line, AFAIK, didn't suffer these problems.... up till now. However, with Boost 3, we have seen very little performance differences between the AIB cards and it would appear that EVGA hoped to reap some cost savings by leaving these cooling features out of the FTW line as well as the SC series.

For those users shy about tackling water cooling, EVGA should be swapping the cards out for ones that include the proper thermal pads, not asking users to take the coolers off and install the pads themselves.
 
Wow, if that's true, they should issue an immediate and unconditional recall.

I would hope Tom's would be looking into that, as well.

Are these things UL-certified, or anything similar?
 


A couple of fishy things about that link.

1. It's very clear that none of the power connectors were hooked up. A few people also commented on it there. Generally speaking I try to operate all my hardware by the book so I'm not sure what that would do. It's even more strange when you might be trying to document something for legal purposes.

2. There's a link to the original video. If you click on it a disclaimer comes up stating that it has been removed by the poster.
 
@inovermyhead2

Thank you for your response on this. I do not condone in any way flaming/trolling/or any kind of abusive/aggressive behavior towards posters. It's truly a shame this man had to go through that. No one should ever have to, and it's definitely one of the ugliest sides of the internet.

I still, however, remain skeptical about this. I admit he could be totally innocent of wrongdoing, but the shortness of the video also leaves a lot of doubt about missing information on this event. In light of the many possibilities that can happen in a home build or modded system, it's too difficult for me to draw a credible conclusion from such a video. In 30 years of building/modding/repairing systems I have seen a lot of user issues; most were innocent mistakes while some not so much. I'm ultimately only interested in the truth here.

Got any more links for others? I'm curious, not sarcastic.
 
I really wish people would stop reporting this as an EVGA 10x0 FTW problem. It's an ACX 3.0 problem, not limited to FTW cards! Please fix the title.
 


That is odd since, well EVGA is normally the best NVidia OEM out there. However, their support is also normally top notch and I am sure they will make it right for those who need it.
 


Yes, their support is definitely great. Just sucks that I have to use it due to TWO design issues (vs just random defects).
 
I have never understood that "internet forum" based "best out there" conclusion, as the EVGA SC line has consistently ...

a) been the bottom ranked card in most roundups of the "Big 4" (Asus, EVGA, Gigabyte, MSI) "Gaming Cards" (and by that definition I mean the card geared towards the "game oriented consumer" as opposed to the likes of the Lightning, Classified, Matrix level cards)

b) used a reference PCB / VRM design whereas the other 3 almost always use a custom PCB design

c) skimped on component quality .... almost always has the fewest power phases on the VRM and has been less than attentive to VRM / cooling needs, as is the case here.

As for support, they have some pretty good written policies, but their practice leaves a bit to be desired and falls into the category of Comcast and DirecTV. In our last EVGA experience, an FTW card simply would not run at its advertised speed. I set up intermediate points between the reference GPU speed and the FTW advertised speed in MSI Afterburner. It remained stable at 10% of the way towards the boost that the FTW was advertised to deliver... IOW, if reference was 1500 and FTW was 1550 (out of the box), best stable setting was 1505.

On the 1st call, they blamed the MoBo, blamed the RAM, blamed the PSU, all of which were premium enthusiast components running at stock speed. That call ended with a "To Do List" and "call us back when you are done". The callback went right back to square 1 .... we had to do all the things the tech wanted performed on the 1st call. After naming all the components, CPU speed, RAM, timings, etc..., that was not good enough. I had to take screenshots and e-mail them.

The next call ... again back to square 1 ... they had no record of the call, the e-mail or any previous call to TS. After about the 5th call, they RMA'd the card. Same issue. This went on for 18 months, 20 support calls and 5 RMAs... all of which produced the same result. They still blamed all the other components as being at fault. Finally, I had a user build to do with two Asus cards in SLI and as he wasn't in a rush, I "borrowed" the cards and installed them in the system. Not only was I able to get them to run at their advertised speeds, I was able to OC them 28% ... in SLI. At that point, they finally gave a workable solution and sent me a reference card but, since it was 18 months later, it was a next generation card. So yes, they did provide a workable solution eventually, but requiring 20 calls (every one of which I had to start at square 1 and describe the entire history), 18 months of frustration and 5 RMAs was a bit much.


 
So as an engineer and electronics designer myself, I am confused by what EVGA is showing us. First, how in the world can a BIOS update with increased fan speed help to cool components by cooling a heatsink that has no physical contact with the parts at risk? That is just complete BS. Also, that metal panel is not a heatsink; it does not have enough air contact surface to dissipate heat properly! So a BIOS update is not a solution at all.

Next, what is it that they are trying to show us with their thermal images? They are all taken with the metal panel on, so how in the world can they deduce the temperature of the component itself? There is no way to do it unless they are looking at the exposed PCB. If they used thermocouples, then why display a thermal camera image at all? It is useless.

Next, are these components military grade or consumer? If they are military then they are indeed spec'd to work up to 100 deg, but if they are consumer then it is just 90 deg. So even 85 deg is awfully close to the limit. It is the MAX allowed temp, and the lifetime of these components will be very short if they are continuously operated at these temperatures.

What I will do on my card: order these thermal pads ASAP (in my case I actually have plenty of them in my drawer, I just have to confirm the proper thickness). Install them, then attach some kind of heatsink directly to that metal panel by removing any coating and applying thermal compound between the heatsink and the metal panel. Still not the best, but at least better than what EVGA is currently proposing. They are just delaying the inevitable and not solving the problem.
 
I'm a little confused. There is mention that the GTX1060 version 06G-P4-6368 is eligible for an update under these conditions, yet no BIOS update has been released for it so far. I have put in a request for the thermal pad mod kit, though I'm worried about whether there is ever going to be a VBIOS update for it as well.
 
Moral of the story: when you use the PCB as a heatsink for hot devices, don't put decorative shields on top which will hinder airflow across the PCB. Especially if you didn't bother putting thermal pads in-between.

No shields is much cheaper than shields + thermal transfer pads, and would likely perform just as well if not better.
 


This is the same solution that AMD used for the reference cards on the 480. Why? It's cheap. But here, after you pay a premium for a non-reference card, we get the standard EVGA "It's not really a problem" answer. Let's look at the statement EVGA issued in response to the last ACX cooling solution failure.

The way the EVGA GTX 970 ACX heat sink was designed is based on the GTX 970 wattage plus an additional 40% cooling headroom on top of it. There are 3 heat pipes on the heatsink – 2 x 8mm major heat pipes to distribute the majority of the heat from the GPU to the heatsink, and a 3rd 6mm heatpipe is used as a supplement to the design to reduce another 2-3 degrees Celsius. Also we would like to mention that the cooler passed NVIDIA Greenlight specifications.

Due to the GPU small die size, we intended for the GPU to contact two major heat pipes with direct touch to make the best heat dissipation without any other material in between.

We all know the Maxwell GPU is an extremely power efficient GPU, our SC cooler was overbuilt for it and allowed us to provide cards with boost clocks at over 1300MHz. EVGA also has an “FTW” version for those users who want even higher clocks.

Regarding fan noise, we understand that some have expressed concerns over the fan noise on the EVGA GTX 970 cards, this is not a fan noise issue but it is more of an aggressive fan curve set by the default BIOS. The fan curve can be easily adjusted in EVGA PrecisionX or any other overclocking software. Regardless, we have heard the concerns and will provide a BIOS update to reduce the fan noise during idle.

1. They said that they had a 3rd heat pipe to "supplement to the design to reduce another 2-3 degrees". So you paid extra to get the overclocked design and higher performance.

2. Then they actually said they designed it so only 2 of the heat pipes touched the GPU. Well then, how can they actually deliver that extra 2-3 C if the danged heat pipe misses the thing it's trying to cool? They say it was designed for 40% extra headroom over the spec'd 970 wattage. That wouldn't include the factory or any manual OC. So if it was designed for 140% of the stock wattage, and then you lose 1/3 of the cooling capability, that only gives you 93% of the stock cooling need. Add 20% for overclocking and you are 27% short. Is this a case of:

a) EVGA's brilliant engineering department designed a cooler that they charged a premium for, but then didn't actually check to see if it fit, so that 1/3 of it missed the thing they are trying to cool?

b) The bean counters just said, "use the design from the 7xx series, it will be fine".

c) It's the spin masters at work (these guys have a future in politics) who actually had the bawlz to say "we intended it" to be deficient.

d) all of the above
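
For anyone who wants to check the headroom arithmetic in point 2, here is a rough sketch. It assumes, as the post does, that each of the three heat pipes carries an equal third of the cooler's capacity. EVGA's own statement describes the third pipe as a smaller 6mm supplement, so the equal split is a simplification and probably overstates the loss. The function names are just for illustration.

```python
# Rough check of the headroom arithmetic in point 2.
# Assumption (the post's, not a measured fact): each of the three heat
# pipes carries an equal third of the cooler's capacity.

def effective_capacity(design_headroom, pipes_total, pipes_touching):
    """Cooling capacity as a fraction of the stock heat load."""
    return design_headroom * pipes_touching / pipes_total

def shortfall(capacity, demand):
    """Percentage points by which demand exceeds capacity."""
    return (demand - capacity) * 100

# 140% design figure, with only 2 of 3 pipes touching the GPU
capacity = effective_capacity(1.40, pipes_total=3, pipes_touching=2)
overclocked_demand = 1.20  # stock load plus the post's 20% for overclocking

print(f"capacity: {capacity:.0%} of stock load")
print(f"shortfall when overclocked: {shortfall(capacity, overclocked_demand):.0f} points")
```

Note that the "27% short" figure is a gap in percentage points of stock load (120 minus about 93), not 27% of the demand.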

3. The 970 SC card had a weak VRM. They strayed from the usual procedure of using a reference PCB (they couldn't ... there was no real 970 reference design) but the stock specification for the 970 called for a 4 phase VRM. MSI went with 6+2 and Asus went with 6. Asus and MSI used upgraded components / EVGA did not.

"Examining the PCB reveals a 4+2 phase power design – four phases near the rear I/O for the GPU, and two in the bottom right corner for the memory. This is a slight upgrade from the 4+1 stock specification but unlike MSI and ASUS, EVGA does not use any specially crafted components."

4. As in the past, EVGA skipped out on cooling the PCB components, as was described in the previous post.


With the 970, despite saying everything was fine, they released the SSC model soon after, which fixed the heat pipe deficiency but left the same weak VRM and again didn't address the cooling deficiencies of the PCB components.

And here's where the consumer gets to blame themselves. Despite these obvious deficiencies and the weaker performance that comes with them, many ... whether due to brand loyalty or forum recommendations ... continue to buy the SC cards. And now with the 10xx series, where the design has gotten even more efficient, they figured "Ahh... let's cut some costs and eliminate these improvements on the FTW line too". Peeps are paying attention but not as many as you might think given the long history of negative SC reviews .... The MSI 7970 Gaming has 624 reviews on newegg, Asus had 476 and EVGA has

And yes, it's hard to imagine that anyone would go for this "BIOS Fix". How can a new BIOS **fix** a deficient design? To reduce the temps of the VRM and other components, you'd want to directly address the problem by making available the design measures (thermal pads, heat sinks, upgraded PCB componentry) that competing vendors provide. Again, this problem has been reported on previous SC designs:

"On the other side of the GPU is a metal contact plate that partially cools two of the four memory chips on this side, leaving the other two exposed. It also cools the MOSFETs of the power phases serving the memory, but no thermal pads are used, so heat transfer is likely to be limited."

A new BIOS can do one of two things:

a) Reducing the clock speeds will prove less taxing on the affected components thereby cutting temps

b) Increasing fan speeds will push more air thru the shroud and this will have a significant effect on the GPU as it is aided by thermal interface material and heat sink with large surface area to dissipate the heat. The affected components are not so equipped and therefore the effect will be minimal.

Both these "solutions" are unacceptable as the consumer isn't getting either the graphics performance or the noise performance "as advertised".

The only viable solution is to install the componentry that Asus, MSI, Gigabyte, etc provide on similarly priced gaming oriented cards.





It doesn't affect the warranty so no real risk.


Moral of the story: when you use the PCB as a heatsink for hot devices, don't put decorative shields on top which will hinder airflow across the PCB. Especially if you didn't bother putting thermal pads in-between.

If by decorative shields, you mean aftermarket backplates, especially the plastic ones, then yes. But when done right a backplate is an inherent part of card cooling design.

While high end GPUs generally produce twice the heat of our CPUs, water cooled GPUs run much cooler ... usually in the high 30s or low 40s C. The reason is the large thermal mass and surface area of the water block. The large surface area of the metal backplate, coupled with thermal pads AND thermal interface material, plays a large part in PCB cooling.

Here's a pic of the installation instructions for water block. The backplate instructions also include the same thermal pad (2) to transfer the heat from the PCB to the backplate. This allows the entire surface area of the backplate (40 square inches) to move the heat instead of just the small 4 square inch surface of the chips themselves.

Like a CPU cooler..... the contact area between the CPU and the mounting surface is small, but you need the large surface area of the heat sink fins for the unit to be effective because, while metal to CPU heat transfer is very efficient, the heat transfer from metal surface to air is very inefficient.

[Attached image 1830159]








 

Which of the following has the lowest thermal resistance?
- the chip bonding to its package heat pad soldered to the PCB's power/ground plane, die-metal-metal thermal path
- the chip's plastic/epoxy package, thermal pad and shield, die-plastic-thermal pad-shield thermal path

My bet is on the solder pads and that the boards would fare better without shields and some forced convection. The shield may add "thermal mass" but it has a much higher thermal path impedance than the PCB and the PCB still has a significant amount of specific heat thanks to having multiple power planes under the FETs.

In high performance, high reliability applications, the heatsink for surface-mount components is either attached to the other side of the PCB with thermal vias to couple the heat through or soldered next to the part to more efficiently couple heat from the part's thermal pad.
 
No need to bet on what might be happening when you can just read the temps with a thermometer from ya tool box or a utility on your computer.

Let's try approaching the question from this perspective. Again, how is it possible for a water cooling system to manage to keep a GPU cooler than, say, an i7 CPU? The die sizes are not that different. On our test system we use 6 temp sensors, infrared thermometers and a fog machine to do thermal and air flow testing. Here's what we have observed. Measurements were made with an infrared thermometer and HWiNFO 64.

a) The GFX cards produce much more heat (295 watts each) / the CPU about 130.... advantage => CPU by 2 to 1
b) The GFX cards receive half the water flow of the CPU as the 2 cards are piped in parallel.... advantage CPU by 4 to 1.

So how do the GFX cards overcome that 8 to 1 disadvantage? How is it that 1.25 gpm acting on 130 watts can't keep up with 0.62 gpm acting on 590 watts?

Despite those disadvantages, the GPU manages to remain at 39C, the VRMs are at 52C and yet the CPU is way up in the low - mid 70s ? Any thermal heat transfer must eventually get "to air" and the large metal backplate provides a greater surface area for the metal to air heat exchange.
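
To put a number on the comparison, here is a quick sketch of heat load per unit of coolant flow. It uses only the wattage and flow figures quoted in this post, and assumes the two cards split the loop flow evenly because they are piped in parallel. By this measure the gap comes out at about 4.5 to 1 rather than 8 to 1, which doesn't change the direction of the argument.

```python
# Heat load per unit of coolant flow, using only the figures quoted in the post.
# Assumption: the two cards split the loop flow evenly because they are
# piped in parallel, so each block sees half of 1.25 gpm.

def watts_per_gpm(watts, gpm):
    """Heat load carried per gallon-per-minute of coolant flow."""
    return watts / gpm

cpu = watts_per_gpm(130, 1.25)        # CPU block gets the full loop flow
gpu = watts_per_gpm(295, 1.25 / 2)    # each GPU block gets half

print(f"CPU block: {cpu:.0f} W per gpm")
print(f"each GPU block: {gpu:.0f} W per gpm")
print(f"GPU to CPU ratio: {gpu / cpu:.1f} to 1")
```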

If, as you say, heat transfers efficiently from the chip to the PCB, then what? Since the chip is exposed to air with no backplate, any additional cooling is limited to the minute surface area of those VRMs where they are exposed on the back of the board.... the backplate quickly expands that VRM cooling surface area to 45 sq in.

The same thing applies to a CPU air cooler... what transfer co-efficient is higher ?

- Internal CPU heat to IHS
- IHS to copper heat plate on cooler
- Copper plate to heat pipes
- Heat pipes to aluminum fins
- Alum fins to air

With the 1st 4, it doesn't matter which is higher and which is lower because, as a chain is only as strong as its weakest link, it doesn't matter what is the 2nd or 3rd weakest ... performance is limited by the final, least efficient heat transfer, that being from aluminum fins to air. Decrease the surface area of the fins and performance will decrease regardless of those other coefficients. Same thing with the VRMs.
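
The chain above can be sketched as thermal resistances in series. Strictly speaking the stages add up rather than only the worst one counting, but when one stage is much larger than the rest it dominates the total, which is the point being made. All the resistance values below are made-up placeholders, not measurements of any real cooler.

```python
# Series thermal path for an air cooler. All resistance values (deg C per watt)
# are made-up placeholders chosen only to illustrate the shape of the argument.

stages = {
    "die to IHS": 0.10,
    "IHS to copper plate": 0.05,
    "copper plate to heat pipes": 0.03,
    "heat pipes to aluminum fins": 0.04,
    "aluminum fins to air": 0.30,
}

def temperature_rise(power_watts, resistances):
    """Steady-state rise above ambient: power times the sum of series resistances."""
    return power_watts * sum(resistances)

def scaled_fin_stage(chain, area_factor):
    """Copy of the chain with the fin-to-air resistance scaled by 1 / area_factor."""
    out = dict(chain)
    out["aluminum fins to air"] = chain["aluminum fins to air"] / area_factor
    return out

power = 130  # watts, the CPU figure quoted earlier in the thread
full = temperature_rise(power, stages.values())
half_fins = temperature_rise(power, scaled_fin_stage(stages, 0.5).values())

print(f"rise with full fin area: {full:.1f} C")
print(f"rise with half the fin area: {half_fins:.1f} C")
```

With these placeholder numbers, halving the fin area raises the total far more than any change to the first four stages could.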

Also... if the thermal transfer coefficient of the PCB material was better than metal, gotta ask the obvious question. Why don't we see air coolers and water blocks being made of PCB materials ? Better yet, why aren't PCB mounted heat sinks made of PCB material instead of highly conductive metals.

Now if you were to ask "Does the overall temperature of the PCB (say averaging 16 points around the board) increase or decrease with a backplate?" Can't say; I have never bothered to measure it since the results have 0 significance, but common sense says slightly warmer. However, I don't really care about the overall average temperature of the PCB as it will have no impact on card performance... the articles on the failing cards certainly don't list this as an issue.

Looking at this pic, why did they add a heat sink / thermal pad to the VRMs and not the entire PCB? Cause the VRMs need to be cooled; the PCB is of no concern.
[Image: normal_ASUSGTX760StrikerPlantinum_067.jpg]


So why not put a heat sink on instead of a backplate ? Certainly a heatsink w/ fins would be more effective, especially with a fan.

a) On many MoBos, it would eliminate SLI as an option as the taller fins would hit the top card
b) Another purpose of backplates is to keep dust off the PCB surface, dust and humidity can be a bad thing for contacts that are close to one another.
c) Water cooled systems benefit from backplates, as small leaks found during installation, testing and even later usage oft get "saved" by the backplate

But getting back to the articles, I don't see any of them covering this issue pointing to the temperature of the PCB as being **the** problem. No, the only problem is the temperature of the VRMs. So, if we focus on **the problem**, it's both obvious and inarguable that a heat sink does a better job cooling VRMs (or any chip) than air does. Use a heat sink when it's needed ... and don't use a heat sink when it's not needed ... unless ofc, you just want cooler temps. And the bigger the heat sink, all other things being equal, the cooler the chips will be.

Certainly, I agree, EVGA created this problem by using a backplate that was not thermally coupled to the VRM. That is certainly worse than no backplate at all, which is why the non-ACX models w/ no backplate are not on the list.

Yes you could improve things @ the VRM by blowing air across the PCB but you can improve **VRM temps** more by blowing air across the heat sink which has that 45 sq in of surface area. The VRMs are exposed on both sides of the board. So if you have VRM temp problems and you've done what ya can on the top of the PCB, the best / easiest way to attack the problem is from the back side of it. The relevant question therefore is ...

What will cool the VRMs better ?

a) exposing the VRMs (whose exposed surface area on the back is probably 1 - 2 sq in) to air
b) affixing a heat sink to the VRMs (w/ 45 sq in of surface area) and exposing all that to air

If it's not b) then MoBo manufacturers and other electronics vendors have it all wrong.
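
A back-of-the-envelope way to compare a) and b) is the convective resistance to air, R = 1 / (h * A). The sketch below makes some loud assumptions: the film coefficient h is a placeholder for gentle forced air, the backplate is treated as uniformly hot (optimistic, since heat has to spread across it), and the thermal pad between VRM and plate is ignored. The 2 sq in and 45 sq in areas are the ones given above.

```python
# Convective resistance to air, R = 1 / (h * A).
# Assumptions: h is a placeholder film coefficient for gentle forced air,
# the plate is treated as uniformly hot (optimistic), and the thermal pad
# between the VRMs and the plate is ignored.

SQ_IN_TO_SQ_M = 0.00064516  # exact: (0.0254 m) squared

def convective_resistance(area_sq_in, h_w_per_m2k=25.0):
    """Thermal resistance from a surface to air, in deg C per watt."""
    return 1.0 / (h_w_per_m2k * area_sq_in * SQ_IN_TO_SQ_M)

bare = convective_resistance(2)      # option a: exposed VRM area only
plated = convective_resistance(45)   # option b: backplate area

print(f"bare VRMs: {bare:.1f} C/W")
print(f"with backplate: {plated:.1f} C/W")
print(f"ratio: {bare / plated:.1f}x")
```

Whatever h is picked, the ratio between the two cases is just the area ratio, so the real-world question is how much of that the pad and the spreading losses in the plate give back.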

I haven't done detailed thermal testing on Nvidia cards since the 7xx series. I'd like to do so on the next 10xx series build but would prefer to test on a SLI build (one w/ backplate on and one off). But w/ current CPU and driver limitations, I'm only recommending SLI builds for 4K, and can't recommend 4K until we break 120 Hz w/ Display Port 1.4 monitors, so don't expect an opportunity anytime soon. If one comes up I will come back and post the results.


 
A little something I noticed in general card design. It seems nearly everyone uses a standard fan, but the way they are mounted seems to make them work like a blower. Using a fan like a blower is very inefficient.
 
True. I had similar thoughts. It's one thing to mount a fan above a heatsink that it can blow through. But, I've wondered about the way fans are being used to essentially push air at a wall. I think you're right that there must be some losses, here.

Maybe, with fewer airflow constraints, fans could be a bit more efficient & therefore run at lower RPMs, for the amount of cooling they provide. With HBM, I think the natural thing for board designers to do is reduce board size. But, I wonder if some might put holes in the board, so you can actually have air flowing through the card and exiting out the other side.
 