The Skylake-X Mess Explored: Thermal Paste And Runaway Power


bit_user

Polypheme
Ambassador

I know I'm an outlier, but I still like down-draft air coolers (not that it'd be an option for this beast). With a 120 or 140 mm fan, you get a fair amount of VRM and DIMM cooling.
 

Adroid

Distinguished
I don't know why I can't quote you, ElMojoMikeo, but how "us overclockers" think has nothing to do with it. If you're buying a chip for $1,000, it shouldn't downclock itself while running at stock speeds.

And I don't agree with the posts suggesting that the "thermal limits" are to blame.

What is to blame is the continued push to shrink the processor's footprint. The 2500K didn't have these kinds of unacceptable thermal issues. At this point I think Intel has hit a wall, not only in its manufacturing capability but also in mindset: chasing the bragging rights of a smaller process node while simultaneously increasing clock speed and core count on a largely unchanged CPU design from several years ago simply isn't working.

Maybe Intel needs to swallow its pride and increase the CPU footprint, and possibly the thermal envelope; then it could make processors that would take the industry to the next level. Laptop CPUs from three generations ago are already fast enough for 99% of the population.

If we wanted to save energy we could turn off a couple of light bulbs. The power draw is so negligible; does it really make that much of a difference? How about fixing the thermal issues now and shrinking the footprint AFTER we have a working solution?

If the designers made heat transfer out of the CPU the primary goal, with footprint size and efficiency secondary, we might see some real progress.
 

ElMojoMikeo

Prominent
Adroid, you're right, of course. I think it is a little bit about state of mind. Maybe it's because we are the F1 team, processor-wise. Only overclockers can fill the corridors of Intel with laughter, and I don't think Intel will be giving up on its F1 team just yet, because every now and again we do something that even surprises them. So you want me to drive the car at that wall at 50 mph? No problem.

If you have followed me so far, my point is this: once you start feeding big wattage into a chip that is already around 90°C, most coolers cannot get back on top of it, and temperatures run away uncontrollably.

The pros and cons of air versus liquid cooling have already been explained very well above.

Neither seems to fit the bill any more. Liquid cooling works at its best in cars, because once a car engine is at operating temperature the coolant flow is self-regulating: the pump is driven mechanically, off the cam belt, so flow is regulated by engine revs. It is a totally integrated system.

A processor that can swing through major temperature fluctuations over a short period needs something other than plain air or liquid, because neither is anywhere near fast enough to react. Maybe a hybrid system of liquid and the old frozen electro copper tube trick. Anything, just get us away from these tractor cooling systems.
 

ElMojoMikeo

Prominent
Out of interest I did a bit of research and found some cool stuff on the Peltier effect. All that generated heat could be used to cool the liquid on the other side of the plate. OK, where you put this is open to debate, but it is definitely not tractor technology, and it is efficient.

https://www.youtube.com/watch?v=D3llYyzVxR0

I hope cooler manufacturers are aware of this; or is there some good reason they can't use it?
 

bit_user

Polypheme
Ambassador

Peltier devices have certainly been tried in CPU cooling. The waste heat of the Peltier device itself keeps them from being a net positive.

I feel bad for the guys who tried this, but I'm glad they did (and that Tom's reviewed it):

http://www.tomshardware.com/reviews/phononic-hex-2-thermoelectric-cpu-cooler,4665.html
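To put some rough numbers on that point, here is a minimal back-of-the-envelope sketch (the wattage and COP figures are hypothetical, not taken from the review): whatever heat a TEC pumps off the cold side comes out of the hot side together with the TEC's own electrical power, so the heat sink behind it has to shed considerably more than the CPU alone produces.

```python
# Back-of-the-envelope TEC heat balance; all figures are illustrative only.
def tec_heat_balance(q_cold_watts, cop):
    """Return (electrical power drawn, heat the hot side must shed) for a TEC
    pumping q_cold_watts with a given coefficient of performance (COP)."""
    electrical_power = q_cold_watts / cop      # power the TEC consumes itself
    q_hot = q_cold_watts + electrical_power    # everything ends up on the hot side
    return electrical_power, q_hot

# Example: pump 150 W of CPU heat with a (generous) COP of 1.0.
p_in, q_hot = tec_heat_balance(150.0, 1.0)
print(f"TEC draws {p_in:.0f} W; the heat sink must now dissipate {q_hot:.0f} W")
# The cooler has to handle 300 W instead of 150 W, which is why the TEC's own
# waste heat so easily erases any net benefit.
```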
 

ElMojoMikeo

Prominent
I didn't see that review. Well, that's version 1 of a new technology, and they haven't implemented it the way I was thinking of using it. You can guess that mounting it directly on the CPU is bad for several reasons, even though that is of course the best place to get the maximum Peltier effect. But, as the review and you say, the effect is somewhat lost once the heat stabilises against the cold side of the plate. I was thinking of my joke about 40 ft of copper pipe around the giant fan in the post above, without the 40 ft of pipe, of course. He is actively cooling by using a bucket of ice.

My thought is to use the effect in the radiator of a liquid-cooled system. Used at the hot liquid inlet of the radiator, it could actively cool the liquid as it enters, via a short copper pipe with Peltier devices around it. I fully expect the Peltier effect to be less efficient in the radiator; however, the devices are potentially low cost, so use as many as it takes. You also get a chance to power something else in the radiator that is directly related to temperature, maybe another electronic cooling device inside the radiator as well. The reason I would do it in the radiator is the new risk of condensation: once you start actively cooling, you introduce the next phase of energy exchange, but condensation problems in one location in a build are more manageable than runaway heat issues everywhere else.

Just a couple of hours of browsing has turned up a few things that could seriously improve computer cooling. Sure, it's a difficult environment to cool things in, but I can't see any good reason why coolers have not kept pace with processor development. The amount of untapped energy escaping as heat that could be directed back at cooling is nothing short of crazy.

The reviewer is right: we are running cutting-edge processors on cooling technology over a hundred years old. It's embarrassing for the industry that we haven't done the groundwork before reaching this stage.
 

Crashman

Polypheme
Former Staff
Here's one from 2007:
http://www.tomshardware.com/reviews/vigors-monsoon-ii-tec-cpu-cooler,1565-2.html
I reviewed one prior to that as well.

The current problem is that the heat isn't getting to the CPU cooler efficiently. The only way a Peltier could assist is to cool the heat spreader below ambient, and then you have the condensation problem.

You could have a diamond cooler the size of the Titanic with some theoretical perfect thermal compound keeping the surface of the heat spreader at 0° above ambient, and the CPU would still overheat beneath the heat spreader.

The solution, then, is to fix the CPU's thermal conductivity problem.
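To illustrate why, here is a minimal sketch of the series thermal-resistance chain (the resistance values are made up for illustration, not measurements): even with a perfect cooler holding the heat spreader at ambient, the interface between die and spreader sets a floor on die temperature.

```python
# Hypothetical thermal resistances in degrees C per watt; illustrative only.
R_TIM1   = 0.30   # paste between the die and the heat spreader (the weak link)
R_IHS    = 0.02   # the heat spreader itself
R_COOLER = 0.00   # pretend the cooler and its mounting interface are perfect

def die_temp_c(power_w, t_ambient_c=25.0):
    """Steady-state die temperature for a simple series resistance chain."""
    return t_ambient_c + power_w * (R_TIM1 + R_IHS + R_COOLER)

for watts in (140, 200, 250):
    print(f"{watts} W -> die at about {die_temp_c(watts):.0f} C")
# At 250 W, a 0.3 C/W interface alone puts the die roughly 75 C above ambient,
# no matter how large or exotic the cooler bolted on top of the spreader is.
```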

 
At this point, I wouldn't be surprised if Intel decided to "make life easier for partners" by applying a conformal coating to the sockets, thereby "making sub-ambient cooling consumer-friendly".

Seriously, why is Intel trying to dump its own engineering problems on its partners and system builders?
 

Crashman

Polypheme
Former Staff

I can't answer for Intel, but it appears that they looked at the variety of systems they were expected to fill, decided on the 140W TDP to fit those enclosures, decided it would be acceptable to cut performance to achieve that 140W TDP, and then said to each other "the crap we're already using is good enough if we hold the TDP down to 140W".
 

ElMojoMikeo

Prominent
Only Intel can sort out that problem. The way I see it, the lid is the dividing line between third parties and Intel. But this is such new territory; what do you do in Intel's position? Remain chained to the almost complete lack of engineering imagination shown in the research that has been done and implemented so far?

We should be at the stage now where we can regulate the temperature of the liquid and deliver it smoothly to the processor as it requires it. So control is key, and I can't see an alternative to some form of active cooling. I looked at the Peltier effect because it has the capability both to cool and to heat.
So: Peltier devices attached to copper studs embedded in a radiator split into two separate chambers at different temperatures. You could have a lower tank of pre-chilled fluid ready to mix with the warmer fluid. That gives you the beginnings of cooling on demand at roughly the right temperature, which also avoids the risk of thermal shock to the materials in the processor.

Yes, the condensation issue is unacceptable if the device sits on the CPU itself. But if you look at how these devices are used in coolers, they require a very good heat block across the joint and thermal insulation around the Peltier device itself; all really low-cost stuff.
With the right humidity sensors, a lot of the condensation could be managed out by adjusting the temperature of the fluid in the radiator. However, I can imagine a good chance of it occurring under processor load, since you would be trying to make colder liquid more often. The only answer I have is that future case design will have to change.
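For what it's worth, here is a minimal sketch of the humidity-based limit being described, using the standard Magnus approximation for dew point; the sensor readings and the safety margin are hypothetical. Keep the coolant above the dew point of the air inside the case and condensation is avoided.

```python
import math

def dew_point_c(air_temp_c, rel_humidity_pct):
    """Approximate dew point via the Magnus formula."""
    a, b = 17.62, 243.12
    gamma = (a * air_temp_c) / (b + air_temp_c) + math.log(rel_humidity_pct / 100.0)
    return (b * gamma) / (a - gamma)

# Hypothetical case sensor readings: 30 C air at 45 % relative humidity.
dew = dew_point_c(30.0, 45.0)
desired_coolant_c = 15.0
coolant_setpoint_c = max(desired_coolant_c, dew + 2.0)   # keep a 2 C safety margin
print(f"Dew point {dew:.1f} C -> lowest safe coolant temperature {coolant_setpoint_c:.1f} C")
```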

Even if you just pass the fluid through a water block with a Peltier device attached, the whole assembly could be mounted externally. Either way, you are into active cooling.

We should have been doing this stuff years ago. Bigger radiators, bigger fans, flashing disco lights and rev counters are all good for sales, but where is the science?
The question now is
 

Crashman

Polypheme
Former Staff
The only way the TEC element will benefit the CPU is if it's on the CPU. Remember what I said about a diamond heat sink the size of the Titanic.

 

ElMojoMikeo

Prominent
Sorry about that. The question now is, as alluded to above: who will lead the standardisation of this part of the PC system? You would think Intel, AMD and NVIDIA all have a vested interest in setting some standards, or at least minimum and recommended levels of cooling.

If they were to start imposing some standards, it might make the cooler manufacturers start spending some money on improving the technology.
 

ElMojoMikeo

Prominent
Oh! I am obviously missing something here. I am not thinking of using the TEC element on the CPU at all. The fan-driven heat sink can be located just under the radiator, or wherever you want it; I am not trying to do all of this on the CPU like the previous ventures. The heat generated by the processor passes into the liquid flowing over it and then into the radiator, and the thermal difference steadily rises until the TEC device kicks in and cools the fluid. Think of a Corsair H100 type of device, but with active cooling. The magic happens in the radiator, not on top of the CPU.
 

Crashman

Polypheme
Former Staff
Right, I got that. But my statement about a diamond heat sink the size of the Titanic still applies. The thing is, we already have big enough coolers in the retail industry, but the heat is still trapped inside.

So, say Intel decided, "We're going to need to increase the TDP to 220W to maintain the expected level of performance." Well, that's just fine and dandy: system builders would be required to use a big cooler like the ones retail builders already use. And... the heat would still be trapped under the CPU's heat spreader. So maybe Intel would have to use a different material to conduct it. And... a TEC still doesn't do anything.

Now, if Intel were to put a TEC between the CPU core and heat spreader, that might do something. But I don't think you could get that much cooling power into such a small space, and you wouldn't have enough metal to act as a thermal buffer between the TEC's cycles. TEC goes "pop" and turns into a resistor, then you have a hotplate on top of your CPU core.
 

ElMojoMikeo

Prominent
Ah, I see where you're going now. Undoubtedly some change has to be made, because this isn't the first time I have heard this. The thermal gap has been mentioned for years and led to the art of de-lidding. Again, these are decisions that need to be made and agreed upon. System builders must be scratching their heads saying, "So what now?"

Some sort of standardisation has to be agreed. Does Intel fill the gap and rely totally on the integrity and capability of the cooler? Fill it with, as you say, an active cooler that kills the chip on failure? What did Intel use to cool it in house? Was what they used reasonable and doable in the real world? Why haven't they addressed the thermal gap?

What's the right way to go?

In fairness, what they do owe us is a clearer roadmap of what to expect in the future in terms of power consumption and heat. So they have to take some blame, in that they haven't been doing that very effectively.

I hope Skylake-X owners are not sitting around in four years' time thinking, "I wonder how fast I could have run this on a daily basis, if only I could control the temperature."
Don't do what I did and assume it will all be sorted within a year.

I have been running i7-3970Xs for four years and have never been able to cut them loose. The reviews I have seen all stop, sensibly, when the temperature starts to climb rapidly and you have to back off. I haven't seen much on my combination with the Sabertooth X79. So who knows!

We need to move together in the same direction.
 
A TEC would be unable to move the amount of heat we're talking about. The only way to move that much heat from such a tiny area is to either turn the CPU into a vapor chamber or use something with better thermal conductivity. Honestly, it sounds like the first option may be Intel's best bet if solder causes reliability issues.
 

bit_user

Polypheme
Ambassador

Yes. That point was made clear in the article (did you read it?) - that even the best coolers are constrained by the conductivity of the material between the die and heat spreader.


The only thing remotely new about it is the amount of heat emitted per mm^2 of die area. But not really, as I think this thing is made on the same process they've been using since Broadwell (including Broadwell-EP).

Intel has used solder before, and others have made chips dissipating even more heat. For one, GPUs. Secondly, as I pointed out and aldaia clarified, IBM has dissipated up to a whopping 1.8 kW from a single package:

http://www.tomshardware.com/forum/id-3445596/intel-core-7900x-review-meet-skylake/page-3.html#19850379

So, as Nerd and I were saying, the best option would be to build a vapor chamber inside the package.

http://www.tomshardware.com/forum/id-3464475/skylake-mess-explored-thermal-paste-runaway-power/page-2.html#19924391

Perhaps you could use lithography to fashion capillaries right into the upper surface of the die!

But Crashman has a valid point: Intel is saying their thermal solution is good for 140 W. They're not promising anything beyond that. Maybe such a stance is good enough for their commercial customers, but the mistake they made is that they're selling this as an enthusiasts' part, with a premium price tag. That market expects to be able to push things a bit further.

So, do we know if the new "Precious Metals" Xeons use solder under the IHS?
 

ElMojoMikeo

Prominent
I agree with just about everything you have said. Certainly the gap is an issue that will have to be addressed; it can almost be used as a yardstick to see whether Intel deals with it or not. If they don't, we can assume things will not be changing soon.

As for "this isn't new technology": well, where are the devices that can deal with it? I mean a proper, smart closed-loop system with an ADC reading the input and responding almost immediately, right down on the metal.

There are purists who wouldn't even call these closed-loop systems, because they have to rely on feedback from an external source: the operating system. If, like me, you regularly get warnings like 12 V = 0 V and 3.3 V = 24 V, you know what I mean. If any of my firmware did that I would be run out of town. That is just an engineer who has no idea how to filter ADC readings; probably a software engineer feeding out raw ADC data without knowing how to process it. A firmware engineer will know how; a softie is less likely to.
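Purely as an illustration of the sort of filtering I mean (the class and values here are hypothetical sample code, not any vendor's firmware), a median filter over the last few samples discards a single bogus conversion, such as a momentary "12 V = 0 V", before it ever reaches the user:

```python
from collections import deque
from statistics import median

class FilteredChannel:
    """Median-of-N filter for a noisy ADC channel (illustrative sample code)."""
    def __init__(self, window=5):
        self.samples = deque(maxlen=window)

    def update(self, raw_volts):
        self.samples.append(raw_volts)
        return median(self.samples)   # one wild spike cannot dominate the output

rail_12v = FilteredChannel()
for raw in (12.1, 12.0, 0.0, 12.2, 11.9):   # one glitched conversion in the middle
    reported = rail_12v.update(raw)
print(f"Raw glitch of 0.0 V is filtered to a reported {reported:.1f} V")
```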

The truth is that cooling systems have been plodding along, getting creative only on the user interface: the integration of nice-to-have things such as LED lighting, RAM setup and power supply monitoring, and oh yes, somewhere in there we will also decide to set the cooler fans. Again, as a firmware engineer, experience tells me NO! All this stuff living in the same memory space, doing monitoring and adjustments to hardware, means that the software forming an important part of the closed-loop system is at risk from things that don't belong to it. It's just good old-fashioned best practice. I can't help thinking that at some point engineers must have expressed their concern; however, a management decision was made to please the customers.


Yes, the vapour chamber, with all its size and complexity, would be nice, but it would be so difficult to install compared to what we have now. Four years ago I don't think as many options existed. OK, I have been a bit Tomorrow's World about the Peltier stuff; even that is based on work by Seebeck in the early 1800s. What can be new is how you combine old technologies given the small improvements in electronics over time, so you have to constantly go back and check whether things have moved on.

I thought the 140W rating applies at base clock. I know from my Intel reference manuals that my chip's TDP is 150W at base clock, which you have to do something weird to change.

The IBM method seems by far the best thought out. Fear not on the question of solder under the IHS: someone, somewhere, will be testing it right now. Give it about a fortnight for the first LN2 extreme-overclocking run and we will know all, if it hasn't already happened. :)


 

aldaia

Distinguished


If I may correct myself: 1.8 kW was the nominal TDP for the zEC12, but peak power exceeded that value. If anyone has access to the IEEE Xplore Digital Library, the following paper is worth a read:
G. Goth, R. Mullady, R. Zoodsma, A.C. VanDeventer, D. Porter, P. Kelly, "An Overview of the IBM zEnterprise EC12 Processor Cooling System", Proceedings of IEEE ITherm, 2014.
A few quotes from that paper:
In rare applications, the power in these MCMs can exceed 2000W, well beyond air cooling capability. This paper describes a new cooling methodology IBM employs in zEC12 to cool its processor MCMs.
The MCM uses a 96 mm square glass ceramic substrate to interconnect six processor and two system controller (SC) chips.
While MCM powers are typically between 1200 and 1500 watts, in an absolute extreme workload application and maximum ambient a single processor chip may generate 400W and MCM power may reach 2150W
The heat is conducted through the silicon, and then crosses a Thermal Interface Material (called TIM1) between the silicon and lid or hat of the MCM.
In zEC12, this TIM1 material was changed to a thinner gel compound with no thermal degradation over system life.
Lastly, the TIM1 thermal resistance is critical. In zEC12, the TIM1 resistance improved from z196 sufficient to offset the small 2°C to 4°C hat temperature rise created by replacing refrigeration with the air to water heat exchanger.
TIM1 (see the quoted sentences above) is precisely where Intel is being miserly, just to save a few cents.
 


They say the TIM is good for 140 watts. That's fine.

Why do they also have features that can draw over 200 watts at stock settings combined with those thermal limits, though?
 

ElMojoMikeo

Prominent
Bit_User:
Perhaps you could use lithography to fashion capillaries right into the upper surface of the die!

That started me thinking. Then, whilst trying to work out why a PIO pin on an ARM processor was being pulled down, it struck me: we already have a type of capillary into the package, lots of them in fact, the processor pins. Are we thinking about this the wrong way around? Cool both sides of the package, particularly the pins! A socket cooler: perfect, it's Intel's and the motherboard manufacturers' problem.

However, a Peltier device on the rear of the socket assembly... joking, ha ha. Maybe not a Peltier, but you can see where I am going with this. Yet another option.

On the watts question, I think you are getting confused between 140W at base clock and the recommended burst maximum, which Intel has set at 300W on this package. How you interpret this is a little vague; the generally accepted interpretation is that if you stay under it you should be OK. I think it is best managed with a safety margin as well. On this 300W package I would set the Short Duration Power Limit to 270W and then increase the burst time (Long Duration Maintained) to allow longer periods of intense processing. I can do this because I have left 30W of headroom, of which I have never seen more than 10W used.
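As a rough illustration of how a two-level limit like that behaves (the numbers and the averaging scheme are hypothetical, not Intel's actual firmware), the package can burst to the short-duration limit, but a running average is held at the long-duration limit, which is exactly the "downclocking at stock" effect people complain about when a heavy workload never lets up:

```python
# Hypothetical two-level power limit; illustrative only, not vendor firmware.
PL_LONG, PL_SHORT, TAU_S = 140.0, 270.0, 28.0   # sustained W, burst W, averaging time in s

def allowed_power(requested_w, running_avg_w):
    """Clamp an instantaneous power request against both limits."""
    budget = PL_SHORT if running_avg_w < PL_LONG else PL_LONG  # burst only while the average is low
    return min(requested_w, budget)

avg = 0.0
for second in range(120):            # a two-minute all-core load asking for 300 W
    draw = allowed_power(300.0, avg)
    avg += (draw - avg) / TAU_S      # simple exponential running average
    if second in (0, 10, 30, 119):
        print(f"t={second:3d}s  draw={draw:5.1f} W  running avg={avg:5.1f} W")
# The initial 270 W burst collapses to roughly 140 W once the average catches up.
```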
 

ElMojoMikeo

Prominent
Ironic, if not the perfect proof of what I have been saying: Corsair's latest Link software is broken, probably something in the lighting effects they are having trouble with. Meanwhile I am sat here with free-running fans, unable to push anything, because my closed-loop system is no more thanks to a disco-lighting issue. The previous installer can't fix it. What is this crap doing in my cooler's closed loop? It's insane.
 

ElMojoMikeo

Prominent
Yes. In engineering speak, a closed loop is just that. A closed loop in electronics terms reacts as fast as possible to small changes in input; you don't wander off to check the football results and come back to find you have lost a few readings. Sometimes I think dealing with software engineers is like herding cats.
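For illustration only, here is a tiny sketch of the kind of tight loop I mean: poll the sensor, react immediately, and never hand the decision to anything that might wander off. The sensor and fan interfaces are hypothetical placeholders, not Corsair's API.

```python
import time

def fan_control_loop(read_temp_c, set_fan_pct, setpoint_c=70.0, gain=4.0, period_s=0.25):
    """Simple proportional loop: every quarter second, nudge the fan duty toward
    whatever holds the sensor at the setpoint. read_temp_c and set_fan_pct are
    placeholder callables standing in for real sensor and fan drivers."""
    while True:
        error = read_temp_c() - setpoint_c
        duty = max(20.0, min(100.0, 20.0 + gain * error))   # clamp to 20-100 %
        set_fan_pct(duty)
        time.sleep(period_s)   # nothing slower than this ever blocks the loop
```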
 