Intel Core i9-7900X Review: Meet Skylake-X

Status
Not open for further replies.


The real question is how we're defining feasible. If the 7900x qualifies as feasible to Intel, I have to question what brought them to that conclusion.
 
I'm still thinking this thermal problem can't be so new or unique. IBM Power 4 burned a whopping 500 W:

http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/power4/

And I'm sure I remember reading about some IBM Power MCM (perhaps quad-die) that dissipated multiple kW. I think it was intended for use in mainframes. Perhaps Power 7?

Anyway, this is hands-down the coolest looking MCM I've seen:

https://en.wikipedia.org/wiki/IBM_POWER_microprocessors#/media/File:POWER5-MCM.jpg
 

As far as I can tell, the 500W Power4 CPU you're talking about is a quad-die MCM, based on the regular (single die) Power4 specs here: https://en.wikipedia.org/wiki/POWER4

Which would help explain how they could cool such a beast, given that the heat generating cores are spread across the package and overall there's a large die surface area for heat conduction.
 


IBM, but not Power4. The Power7 MCM was intended for HPC and dissipated 800 W (hot, but not yet in the kW league). The IBM z196 (2010) was the one intended for mainframes, backward compatible all the way back to the IBM System/360 (1965). The z196 MCM dissipated 1800 W, and the more recent zEC12 (2012) also uses MCM modules rated at 1800 W.
 


The FX 9590 burned a ton of power as well. It is not new at all. So did the first Pentium 4 EEs.

I am sure they will find ways to handle it in later revisions. I am actually a bit surprised, as Intel's 14nm seemed to be doing pretty well for their older CPUs.

I wonder if the Mesh is causing it to heat up, especially with that much bandwidth being pumped through it.
 


It's not the mesh. It's the thermal interface material. The thermal resistance between the CPU die and the heat spreader is extremely high for a CPU of this TDP. It's nothing as complicated as "some architectural component is responsible". It's just a terrible design. They would need to make quite a few bad engineering decisions to back themselves into this corner.
 

I think we all understand that much.

The discussion had already moved on to whether solder would've been a viable option (some have speculated that the mechanical stress of repeated thermal cycling would be too great), and considering what approaches were used by other high-power silicon.
 


I don't think the mesh is causing the heat directly; I'm talking about power draw. Skylake-X isn't that different from Broadwell-E aside from the mesh and the cache changes, yet it draws quite a bit more power and puts out far more heat than the equivalent Broadwell-E part. The TIM wouldn't cause higher power draw either, though it might explain the higher temps, although everything I've read shows a difference of only a few °C, which is pointless to anyone except the extreme overclockers.

Honestly I doubt OCing is even needed anymore. If apps can become highly parallel and utilize the threads efficiently more cores would finally make sense for people.
 


As far as I know, mechanical stress from repeated thermal cycling can be an issue for small dies; however, the 7900X is 300+ mm².

The rule of thumb is to use thermal paste below 100 W and solder above that. However, IBM developed an "advanced" thermal paste for the 300+ W z196 dies, and for the 400+ W zEC12 dies IBM used a gel compound.

The heat is conducted through the silicon, and then crosses a Thermal Interface Material (called TIM1) between the silicon and lid. In zEC12, this TIM1 material was changed to a thinner gel compound with no thermal degradation over system life

If IBM could dissipate 300+ W ten years ago, Intel can do it today. Soldering also shouldn't be an issue on a 300+ mm² die. My opinion is that the only reason Intel isn't using solder or a more adequate thermal paste is to save a few pennies, which is quite lame on a $1000 CPU.
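For a sense of scale, here's a back-of-the-envelope 1-D conduction estimate of the temperature drop across TIM1 (ΔT = P·t / (k·A)). This is only a sketch: the power, bond-line thickness, and conductivity figures below are assumed ballpark values, not measured numbers for the 7900X.

```python
# Rough 1-D conduction estimate of the temperature drop across TIM1.
# All material values below are assumed ballpark figures for illustration,
# not Intel's actual specs.

def tim_delta_t(power_w, thickness_m, conductivity_w_mk, area_m2):
    """Temperature drop (K) across a TIM layer: dT = P * t / (k * A)."""
    return power_w * thickness_m / (conductivity_w_mk * area_m2)

POWER = 200.0        # W dissipated by the die (assumed)
THICKNESS = 100e-6   # 100 um bond line (assumed)
AREA = 300e-6        # ~300 mm^2 die, expressed in m^2

paste = tim_delta_t(POWER, THICKNESS, 5.0, AREA)    # paste k ~ 5 W/mK
solder = tim_delta_t(POWER, THICKNESS, 50.0, AREA)  # solder k ~ 50 W/mK

print(f"paste:  {paste:.1f} K drop across TIM")   # ~13.3 K
print(f"solder: {solder:.1f} K drop across TIM")  # ~1.3 K
```

Even with generous assumptions, a tenfold difference in conductivity translates directly into a tenfold difference in the temperature drop across the interface, which is why the TIM choice dominates at these power levels.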

 


There's another potential reason: the testing and verification phases are longer when they use solder, as solder can introduce additional stress to the CPU die. Paste doesn't introduce those forces, and can therefore skip a good deal of testing.

The remaining tests should have told Intel that this was a bad idea, though.
 

Why would mechanical stress be less for large dies? I would have guessed the opposite: a larger surface area means more potential for temperature deltas across the die, so different areas undergo thermal expansion to different degrees, producing mechanical stress. But this is just back-of-a-napkin reasoning; maybe I'm out to lunch on this.
 


I have no idea, but people who know a lot about the issue say so:
"Void and micro crack occurrence is mainly affected by the solder area – thus the DIE size. Small DIE size (below 130 mm²) e. g. Skylake will facilitate the void occurence significantly. However, CPUs with a medium to large DIE size (above 270 mm²) e. g. Haswell-E show no significant increase of micro cracking during thermal cycling."
 

Where is that quote from?
 
I just checked the spec sheet for Skylake-X. Apparently, the maximum sustained storage temperature for these CPUs is only 40C. It can deal with 85C for up to 72 hours.

Considering that storage specs are almost invariably higher than operating specs, this is more than slightly concerning. If the specs are accurate, this could be the least reliable CPU from Intel since the P4. That spec implies worse reliability than the C2000.

What was Intel thinking?
 


http://overclocking.guide/the-truth-about-cpu-soldering/
Keep in mind that what they call "intensive thermal cycling" is under extreme overclocking with liquid nitrogen, and their thermal cycles run from -55 °C to 125 °C. Regular users have much narrower thermal cycles, so they should be much less concerned about those potential issues. Sandy Bridge was a soldered, relatively small die, and I never heard of anyone developing any of those issues.

Figure 12 in particular is self-explanatory: http://overclocking.guide/wp-content/uploads/2015/11/Rjc_cycling.png
Small-die thermal resistance is significantly affected, while there is no noticeable change for the medium-size die.
 

Are you looking at the "Tsustainedstorage" value in the datasheet? Because it's the same value (+40C) for the last several generations of Intel CPUs.
 


That's correct. The difference is that these operate at significantly higher sustained temperatures than any recent generation. Specifically, these CPUs struggle to stay under 40C at even the least intensive loads. Older generations didn't normally have that issue. It's more likely the CPU will go for extended periods (>72 hours) without dropping below 40C. If that keeps up, the spec sheet seems to state that reliability would be compromised.
 

I'm guessing it's something like what they happened to test. I'm sure you're reading too much into that number.


That's nuts. Datacenters run their CPUs hot in order to save on cooling costs. Unless this is a consumer-only part with no Xeon version planned, it can surely handle higher sustained temps, probably well into the 70s.

https://forums.anandtech.com/threads/what-temperatures-do-datacenter-cpus-run-at.2487803/
 

I can't think of any CPU that will reliably stay <40C under load, unless you're using a hugely overkill cooling solution (read: almost nobody). Depending on the ambient temp, it may very well idle at 40 or higher.

I can only assume that we must be interpreting the Tsustainedstorage value incorrectly, or Intel specced it ridiculously conservatively. Because if any Intel CPU from the last several generations is prone to failure if it's at 40+C for > 72 hours, there would have been failures all over the place and I think we would have heard something.
 


If that's the case, why spec the storage temp so low?
 

Good question. Obviously, we don't know. Maybe there's a good reason for it, but maybe not. Like, I once heard that the shelf life listed on prescription drugs (or was it OTC drugs?) wasn't based on models of the active molecules' degradation, but rather just based on what they actually tested.

Could it have anything to do with something weird, like electrostatic charge buildup? There could be effects that occur in storage that you wouldn't see in actual operation or even when it's plugged into a machine that's powered down.

I think we at least have consensus that it should not be taken as a long-term operational limit.
 
OK, the article said they were doing something different with pricing? I don't see it. It looks like the same old shit from Intel: higher and higher pricing. The trade-off of bang per buck really goes over their heads every time. Not a single processor they've released for over $300 has been worth the $500, $600, $1000, etc. it cost. This is why I hate Intel. Only 10% of the population can afford to throw down $1000+ on a CPU, and only 1% of software in the world will ever see an actual benefit from those $1000+ CPUs. As for that 10% that can afford it: sure, if you can afford one, go for it, but don't brag about it unless you just like being an uppity rich snob.
 


The same could be said about people who can't afford it griping about those who can; get over it. If you've been around PCs for 20+ years, this pricing isn't even that high. I recall spending $3K for a middle-of-the-road, non-gaming PC back in 1995 (that was with monitor, keyboard, and mouse), and adjusted for inflation that would be right around $4,800 in today's money. And if you wanted high end, it was more like $4-5K back then.
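That inflation figure is easy to sanity-check. A minimal sketch, assuming the ~1.6 cumulative CPI factor for 1995 to 2017 implied by the numbers above (the factor itself is an assumption, not an official CPI value):

```python
# Rough inflation adjustment for the 1995 PC price mentioned above.
# The cumulative CPI factor (~1.6 for 1995 -> 2017) is an assumed round number.

def adjust_for_inflation(price_then, cpi_factor):
    """Scale a historical price by a cumulative CPI factor."""
    return price_then * cpi_factor

PRICE_1995 = 3000.0
CPI_FACTOR = 1.6  # assumed cumulative 1995 -> 2017 inflation

print(f"${adjust_for_inflation(PRICE_1995, CPI_FACTOR):,.0f} in today's money")
# -> $4,800 in today's money
```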

The pricing did change some: the 8-core went down from $1K to $600. Intel didn't adjust much because Threadripper isn't out yet; once it lands, expect pricing adjustments, especially on the 10+ core parts, as Intel's frequency advantage will disappear at the higher core counts.
 


As someone who works with microcontrollers, that just seems lazy to me. This is the first datasheet I've run across that specs storage temps lower than safe operating temperatures. Either the datasheet is wrong (and given the number of errata in Intel's specification updates, that wouldn't surprise me), or they have a seriously unique design. And by "unique", I do not mean "good". If it is wrong, it's anyone's guess as to why. It's interesting that they went to the trouble to spec out the 72-hour limit and absolute maximum values, though. Those look about right for a processor/microcontroller on this node.

That said, I'm still concerned that the CPU runs as hot as it does, even with the best available cooling at stock settings. The engineer in me says that it's asking for trouble.

Also, it's worth mentioning that this design simply wouldn't fly in a datacenter. The bottleneck in heat transfer means significantly greater cooling costs with no corresponding increase in reliability. While I hope the Xeons use solder, the fact that Intel decided not to offer consumers the kind of reliability that previous HEDT platforms were known for doesn't sit well with me.

As far as electrostatic buildup goes, that's not really tied to temperature very strongly. The only other thing I can think of is that the epoxy they use on the IHS may have a limit beyond which its adhesive properties degrade. Again, that would be a pretty terrible design that shouldn't fly.
 