Thoughts and questions regarding C2D Temperature Guide

mikaelc

Distinguished
Dec 21, 2007
3
0
18,510
First things first... No matter what doubts and questions I have, I must admit that the guide is excellent, and I really appreciate the amount of research and work CompuTronix must have put into it. Now to the point...

I've read the first version of the guide and recently found that it had been revised, and I was very surprised by all the changes...

[1] Delta to Tcase... Unless CompuTronix has a reasonable theory and calculations (and I suppose he has), I consider Y values to be a weak point. Where did he get those numbers from? And how should I tell whether my cooler is a "mid-low", "high-mid" or other range? :) Reviews differ :) But seriously... Where do those numbers come from?

No matter what CompuTronix's answer is or might be, the truth is that readings must be offset. Otherwise my CPU would appear to run below ambient temperature, and that's impossible with air cooling (btw. stock Intel cooler, but I'm going to change it soon). Before CompuTronix revised his guide, I simply subtracted the expected 15* from Tj and computed Tc that way. Well, of course, that assumed the delta to Tj max (returned by CPU registers) was correct and reliable... There always must be some assumption, right?

And once again... No matter what his answer is - that makes sense. I've computed my delta step by step, pursuant to the directions. And it works... Calibration was simple:

IDLE:

Ambient: 24.5 *
E2160 L2 idle power: 8W, X = 2
Intel stock cooler: Y = 11
Delta = (2+11)/2 = 6.5*

Tcase = 31*
core0 = 37*
core1 = 35*

LOAD:
Ambient: 24.5*
Tcase = 49*
Core0 = 56*
Core1 = 56*

Tjunction may need an offset (possibly -1*), but I consider a slightly hotter reading to be better (for safety reasons).

[2] Delta to Tjunction... OK. So it is 5*, not 15*. That changes a lot of things (and that's what worries me the most). Since it is 5*, we've lost 10*, right? Previously we had, let's say, an overclocked CPU running "safely" with Tc = 50* and Tj = 65*. Now we have a different situation: with the very same Tj (65*) we have a CPU running at Tc = 60*. For a CPU with TDP = 65W and Tcase max = 61.4*, that's the red, hot end of the scale! What was the reason for that change in findings?
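To make the shift concrete, here is a minimal sketch of the arithmetic above. The function name is only illustrative; the 15* and 5* deltas, the 65* reading and the 61.4* Tcase max are the figures from this post.

```python
# Illustrative only: how the assumed Tjunction-to-Tcase delta changes the
# Tcase implied by the very same core reading.

TCASE_MAX = 61.4  # Tcase max quoted above for the 65W part, in degrees C


def tcase_from_tjunction(tj: float, delta: float) -> float:
    """Estimate Tcase by subtracting an assumed delta from a Tjunction reading."""
    return tj - delta


tj = 65.0
for delta in (15.0, 5.0):  # old assumption vs. revised guide
    tc = tcase_from_tjunction(tj, delta)
    margin = TCASE_MAX - tc
    print(f"delta {delta:.0f}: Tcase = {tc:.1f}, margin to Tcase max = {margin:.1f}")
```

With the 15* delta the margin to Tcase max is roughly 11*; with the 5* delta it shrinks to roughly 1.4*.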

[3] Tcase as a limiting factor... The last doubt... Since Tcase is the limiting factor for CPU's safety (and overclocking), why does Intel rely on Tj? Why does thermal throttling depend on Tjunction? Since Tcase is the limiting factor, we should never be able to reach that point (my CPU would melt or burn in living flames at Tc = 70* and Tj only 70 + 5 = 75*, 25* below Tj max), unless...

Unless... What will happen if we cross the Tcase max? Math tells us that the point where we cross Tcase max will always be (more or less) well below Tj max. So what are the possible consequences? Why should we stick 10* below Tcase max and not, let's say, 20* below Tj max?

And one more thing... What is a possible reason for such a huge difference between different revisions of the same CPU model? The same E2160 in rev. M0 has 74* Tc max and 85* Tj max, so it should be possible to push that baby higher in overclock and still stay within safety margins... On the other hand: if I cross the Tc max on my L2, I will still have about 39* to Tj max. But on the M0 revision, if I cross the Tc max, I will have only 11* to Tj max... What is that? Different IHS construction? What's the reason for such a difference?
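A small sketch of that comparison, just to lay the numbers side by side. The Tj max of 100* for the L2 is an assumption (it is what the "about 39*" gap implies, not a published value); the M0 figures are the ones quoted above, and the names are only illustrative.

```python
# Illustrative only: gap between Tcase max and Tjunction max per stepping.
# The L2 tj_max of 100 is an ASSUMPTION implied by the "about 39*" gap above.

steppings = {
    "E2160 L2": {"tcase_max": 61.4, "tj_max": 100.0},  # tj_max assumed
    "E2160 M0": {"tcase_max": 74.0, "tj_max": 85.0},
}


def headroom(spec: dict) -> float:
    """Degrees left between Tcase max and Tjunction max."""
    return spec["tj_max"] - spec["tcase_max"]


for name, spec in steppings.items():
    print(f"{name}: {headroom(spec):.1f} degrees between Tcase max and Tj max")
```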

Thanks in advance for sharing your thoughts and information regarding these questions and doubts...
Michael
 

CompuTronix

Intel Master
Moderator
mikaelc,

Wow! Hell of a first post! You've now obligated me to write a detailed and lengthy response, which I presently have no time for, until perhaps the weekend. Sorry you got me at a hectic time during my work week.

I can assure you that I'm not accustomed to distributing FUD, so I didn't just pull these numbers out of the clouds. A great deal of research and testing was involved. There are perfectly sound reasons and explanations for each of your questions, so please be patient, I'll get back to you.

Also, where did you find a "first version"? There have been countless changes, and they haven't all happened overnight. Do you have any idea how many updates, revisions and major edits have gone into the Guide since it was first Sticky'd here at Tom's on 2/2/07?

From the Guide:


Section 16: Comments

■This Guide may be frequently revised as new processors and information becomes available.


Comp :sol:
 

mikaelc

Thanks for the reply, Comp.

As for the "first version": I have it saved somewhere at home and cannot tell you now which revision it is. But it was surely dated December 2007 (the time I joined the forum and wanted to take part in the discussion, but in the end didn't, and now I don't recall why :)). As I said, I've noticed the latest revisions, and they turned my understanding of Tc/Tj upside-down :)

I believe you didn't pull the numbers out of the clouds. And I won't try to hurry your answer, so take your time. Additional information will be a benefit to everyone.

Even after all the changes to Tc and Tj calibration, my OCed temps are still between the green and yellow zones (52 Tc of 61.4 max), so I'm not in need of a solution to a problem. I just hope we'll have a chance to discuss a few things :)

Michael
 

CompuTronix

mikaelc,

I've had an unexpected change in my schedule, so here we go. Let the inquisition begin.


graysky's Overclocking Guide:


graysky doesn't want to maintain a list of heat sinks, and neither do I, as it's a part-time job in itself just to keep up with new coolers. Nevertheless, he offers 1 link, while I offer 3. My Guide is not a cooler review or SpeedFan tutorial. Since I clearly state in the second paragraph:

"Scope:

This Guide is intended for intermediate to advanced users..."

I must assume that this level of enthusiast is generally intelligent and capable enough to be able to determine for themselves how to rank their coolers according to the six classes shown in the Guide as "Y".

Y = 2 . . . . Cooler efficiency: high-end
Y = 3 . . . . Cooler efficiency: high mid-range
Y = 4 . . . . Cooler efficiency: mid-range
Y = 5 . . . . Cooler efficiency: low mid-range
Y = 7 . . . . Cooler efficiency: low-end
Y = 11 . . . Cooler efficiency: Stock Intel

Concerning "Y", I was able to determine these values based upon researching idle temperatures from several cooler reviews, as well as gathering empirical data from hands-on IR testing of numerous combinations of Core 2 variants and coolers, using the standardized Test Setup shown in the Guide as a control. I then cross-referenced "Y" values against known thermal resistance values (degrees C per Watt) for many of the most popular coolers. Once I was able to plot a curve, other coolers could be interpolated with reasonable accuracy.

As an engineer, I'm very familiar with technical documents, and having studied hundreds of pages of Intel papers, specifications and formulas for nearly 2 years, I know there's theory, and then there's application. Often they agree very closely, but regardless of the math, there are always deviations, tolerances and variables to consider, which include IHS and heat sink flatness, and thermal compound, just to mention a few.

Even if processors which idle at 8 Watts with high-end coolers have both surfaces lapped, and the best thermal compound is properly applied, IHS temperature (Tcase) is at minimum about 1c above ambient, using the standardized Test Setup. So the vast majority of 8 Watt idle processors with high-end coolers which are not lapped, will typically idle about 2c above ambient.

"X" values, however, are very straightforward. When "X" and "Y" values are used to calculate "Z", or offset, then Tcase idle is about as accurate as the device used to measure ambient. It's not perfect, but it's the simplest method I could develop to accurately calibrate Tcase idle without overwhelming users with discouraging calculations.



Last year, in previous versions of the Guide, I had included Tjunction Max values in Section 6: Scale. Although I was in correspondence at that time with the author of Core Temp, Arthur Liberman, Tjunction Max values for desktop processors were undisclosed by Intel, and still are for 65 nanometer processors. So Core Temp, Everest and SpeedFan used the published values for mobile processors, and educated "guesstimates" to fill in the blanks. As we now know, the values for mobile variants don't conform well to desktops, and in some instances, aren't even close.

This created so much confusion in the overclocking community regarding Tjunction accuracy, and debate concerning the Guide, that I decided I must somehow discover a method to achieve accurate calibrations without using Tjunction Max values. The current version of the Guide accomplishes this goal based upon the Tcase idle calibration, and a few obscure Intel documents, which inadvertently revealed Intel's thermal "Holy Grail", that shows Tjunction load is consistently 5c higher than Tcase load. http://arxiv.org/ftp/arxiv/papers/0709/0709.1861.pdf See page 4, Figure 5.

Hence, calibrating Tjunction load is very easy, which again is performed using the standardized Test Setup. Consequently, there are no longer any Tjunction Max values in the Guide, because I now have enough information to reverse-engineer Intel's thermal secrets to within a few degrees. So where other utilities such as Real Temp, approach the solution for accurate Core temperatures from the "top down" using Intel's questionable Tjunction Max values, my calibration procedures for SpeedFan approach the solution from the "bottom up" using as many known values as is practical.


The Thermal Specification shown in Intel's Processor Spec Finder - http://processorfinder.intel.com/details.aspx?sSpec=SLB8W - is Tcase Max, NOT Tjunction Max. This is a very common misconception among many enthusiasts. Intel uses Tjunction Max for thermal protection only, and does not recommend that the Tjunction sensors be used for temperature monitoring due to linearity issues such as "slope error", the severity of which was revealed at Intel's recent IDF 2008 DTS presentation. http://rapidshare.com/files/141444140/SF08_TMTS001_100r.pdf.html Conversely, the Tcase sensor is designed specifically for temperature monitoring, and is relatively linear from idle through load. This is why Tcase Max values are given in tenths of degrees, while Tjunction Max values, which Intel disclosed for 45 nanometer variants at the DTS presentation, are given in round figures.

Essentially, Intel tests and certifies stability for each processor variant to a model and stepping specific Tcase Max, using a standardized Test Setup in a laboratory environment at various ambients, at a certain TDP, using a model specific Intel cooler. So Tcase Max is the Thermal Specification Intel shows in their Processor Spec Finder, which regardless of variant, is consistently about 26c to 30c below Tjunction Max, or shutdown temperature. If Tcase Max is exceeded, then stability may be compromised, which of course, is highly dependent upon load. Although the processor will typically continue to function normally under moderate loads up to the point of throttling, Tjunction Max is factory calibrated so that processors are capable of exceeding Tcase Max while tolerating extreme thermal environments. However, since experienced overclockers know that cool = stable, Tcase Max is always the limiting thermal specification.

Since it was cleverly discovered that DTS sensors in later P4's and Pentium D's could also be used to monitor Core temperatures, and the utility "Core Temp" was released, everyone has become so brainwashed on Core temperatures that they've dismissed CPU temperature (Tcase) as a valid and accurate temperature monitoring method. Many users still believe that CPU temperature comes from a thermistor located in the socket, touching the bottom of the CPU, as it did many years ago.


Very astute observations. The processors you've brought into question are the best examples of inconsistent Tcase to Tjunction Delta values, since the Tjunction Max values themselves are in question, which still remain undisclosed from Intel for 65 nanometer variants. Check out rge's post # 2152 on page 87 of the Real Temp thread over at Xtreme Systems - http://www.xtremesystems.org/forums/showthread.php?t=179044&page=87 - Don't be misled by undisclosed Tjunction Max values, because some former assumptions are incorrect. Remembering these 3 simple expressions brings temperatures into perspective:

■(X + Y) / 2 = Z

■Z + ambient = Tcase idle

■Tcase load + 5c = Tjunction load
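As a minimal sketch, the three expressions can be written out as follows. The function and dictionary names are only illustrative and are not part of the Guide or of SpeedFan; the "Y" values are the six classes listed above, and the sample inputs are the ones from your own idle calibration.

```python
# Sketch of the three expressions above. All names here are illustrative.

# Cooler efficiency classes ("Y") as listed earlier in this post.
Y_BY_COOLER_CLASS = {
    "high-end": 2,
    "high mid-range": 3,
    "mid-range": 4,
    "low mid-range": 5,
    "low-end": 7,
    "stock intel": 11,
}


def idle_offset(x: float, y: float) -> float:
    """(X + Y) / 2 = Z"""
    return (x + y) / 2


def tcase_idle(ambient_c: float, x: float, y: float) -> float:
    """Z + ambient = Tcase idle"""
    return idle_offset(x, y) + ambient_c


def tjunction_load(tcase_load_c: float) -> float:
    """Tcase load + 5c = Tjunction load"""
    return tcase_load_c + 5.0


# E2160 idling at 8 Watts (X = 2) on the stock Intel cooler, ambient 24.5c.
y = Y_BY_COOLER_CLASS["stock intel"]
print(idle_offset(2, y))        # Z
print(tcase_idle(24.5, 2, y))   # Tcase idle
print(tjunction_load(49.0))     # Tjunction load for a Tcase load of 49c
```

With those inputs, Z works out to 6.5c and Tcase idle to 31c, which matches the calibration you posted.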

I've been corresponding since April with the author of Real Temp, Kevin Glynn, who is unclewebb. I've followed the Real Temp thread since it began, and I've read all 2267 posts in 91 pages. Real Temp is the only temperature monitoring utility which is based upon real world IR testing. unclewebb has invested a tremendous degree of effort in researching and revealing the variables involved in Tjunction Max values. However, although he was closer than any other utility, and dead-on in one instance, the 45 nanometer Tjunction Max values from Intel's DTS presentation show that Real Temp may be a few degrees low.

However, since Intel refers to "worst case conditions" frequently in their technical documents, and Tjunction Max values are round figures, such as variants which use 100c, we think that he was closer at 95c than Intel would care to admit. Personally, I think he was only low by 1c due to omitting a minor variable. He and I, along with the insights of a few extremely sharp members at Xtreme Systems, now think that the Tjunction Max specification of 100c might actually be 98c +/- 2c, since Intel now clearly states that Tjunction Max varies from part to part, which makes perfect sense.

So consider the following; since my calibration procedures for SpeedFan don't use Tjunction Max values to achieve accuracy, then why, when calibrations are completed, do users find that their Tjunction Max values are about 97c to 98c? As you've pointed out, my calibration procedures work! :)
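One way to see where such a figure comes from, as a rough sketch: the DTS reports how far a core is below Tjunction Max, so a calibrated Tjunction load plus that distance implies a Tjunction Max. The function name and the sample readings are hypothetical and are not measured data.

```python
# Sketch under assumptions: implied Tjunction Max from a calibrated reading.
# The sample readings below are HYPOTHETICAL, chosen only for illustration.

TJ_OFFSET_C = 5.0  # Tcase load + 5c = Tjunction load, per the expressions above


def implied_tj_max(tcase_load_c: float, dts_distance_c: float) -> float:
    """Calibrated Tjunction load plus the DTS distance to Tjunction Max."""
    tjunction_load_c = tcase_load_c + TJ_OFFSET_C
    return tjunction_load_c + dts_distance_c


# Hypothetical: Tcase load of 58c while the DTS reports 35c below Tjunction Max.
print(implied_tj_max(58.0, 35.0))
```

With these made-up readings the implied Tjunction Max is 98c; real parts will vary with the sensor deviations discussed above.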

Regardless, since Intel's IDF 2008 DTS presentation, I've been re-testing several 65 and 45 nanometer variants over the past few weeks, where I've included some previously unobserved calibration points, so as to reveal more information concerning "slope error" characteristics, and to validate previous findings from the data I've collected over the past 20 months. I hope to complete the work early this coming week when I'll post the results on the Real Temp thread.

To summarize, I think that we've finally reached a point where we're able to calibrate temperatures within a few degrees, or 5%, which is quite acceptable, so it's unlikely, given the sensor deviations and variables involved, that we'll zero in much closer. Although my findings will serve to confirm that unclewebb's work on Real Temp, and my work on SpeedFan calibrations are both well grounded, I don't see any major changes in the Guide until Core i7. I will, however, continue to edit the Guide for content and clarity.

My personal opinion? All of this is so much overkill... 4 cylinder engines don't have 5 temperature gauges, and cores don't have intercoolers!

Do these explanations answer your questions?


Comp :sol:
 

mikaelc

Thanks for the reply, Comp...

I must assume that this level of enthusiast is generally intelligent and capable enough to be able to determine for themselves how to rank their coolers

Ok. I understand, and I think that no one really expects you to maintain such a list (it would be long and need frequent updates). I just wanted to know where these numbers come from, and since you say (as I assumed) that they're based on your research, observations and tests, I believe you :) And since the difference between a "high-mid" and "mid" or "mid" and "low-mid" range will only be 0.5*, I have nothing against a more modest approach in classifying my new aftermarket cooler...

Intel uses Tjunction Max for thermal protection only, and does not recommend that the Tjunction sensors be used for temperature monitoring due to linearity issues such as "slope error"

That's what I meant by saying that Intel relies on Tj (as a trigger for the "thermal brake"). But there is one more thing I'd like to point out and reconsider. Intel says that "the system/processor thermal solution should be designed such that the processor remains within the minimum and maximum case temperature (Tc) specifications when operating at or below the Thermal Design Power (TDP) value listed per frequency in Table /.../" and "Thermal Design Power (TDP) should be used for processor thermal solution design targets. The TDP is not the maximum power that the processor can dissipate".

It is obvious that running a CPU at a higher frequency and overvolting it leads to higher power consumption and dissipation. So... Since we overclock, the CPU runs over the maximum power consumption level specified by Intel. Shouldn't we extend the curve then? What I mean is: the maximum Tcase was specified for the maximum TDP of the CPU. Let's stick with my E2160 L2... For a maximum TDP equal to 65W, the maximum Tc is specified at 61.4*. And it is more or less linear (as the curve plotted by Intel says, max Tc = 0.28 x Power + 43.2 - Datasheet for Dual-Core Desktop Processor E2000 Series). Now let's say we have my CPU overclocked and overvolted in such a way that it consumes about 85W. Doesn't that mean that the maximum Tc allowed in such conditions would be 0.28 x 85 + 43.2 = 67.0*? And once again, if that were true, we would fall back to Tj max as the limiting factor for overclocking (the spot where throttling occurs).
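Here is a minimal sketch of that extrapolation. The slope and intercept are the ones from the formula quoted above; the function name is just for illustration, and extending the line past 65W is my own assumption, not something the datasheet states.

```python
# Illustrative only: the linear thermal profile quoted above,
# max Tc = 0.28 x Power + 43.2, evaluated at the rated TDP and beyond it.
# Extending it past 65W is an ASSUMPTION, not Intel's specification.

SLOPE_C_PER_W = 0.28
INTERCEPT_C = 43.2


def max_tcase(power_w: float) -> float:
    """Maximum Tcase (degrees C) allowed by the linear profile at a given power."""
    return SLOPE_C_PER_W * power_w + INTERCEPT_C


for power in (65.0, 85.0):
    print(f"{power:.0f}W -> max Tc = {max_tcase(power):.1f}")
```

At 65W this gives back the specified 61.4*, and at the assumed 85W it gives 67.0*.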

As a continuation, for a throttling spot at 95* Tj (it doesn't matter if that is the exact number, it's just an example), Tc would be 90*, and it would mean that the theoretical, tolerable power consumption of that CPU is (90 - 43.2) / 0.28 = ~167W. Assuming that I computed the capacitance of my CPU correctly, 167W corresponds to something like 3.6 GHz @ 1.45+ Vcore...
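The same line can be turned around to get power from temperature. Again just a sketch: the function name is illustrative, the 95* throttling point is only my example figure, and the 5* delta is the one from the guide.

```python
# Illustrative only: invert max Tc = 0.28 x Power + 43.2 to estimate the power
# at which a given Tcase would sit exactly on the (extrapolated) profile.


def tolerable_power(tcase_c: float, slope: float = 0.28, intercept: float = 43.2) -> float:
    """Power (W) at which the linear profile reaches the given Tcase."""
    return (tcase_c - intercept) / slope


tj_throttle = 95.0           # example throttling point, not an exact value
tc_at_throttle = tj_throttle - 5.0  # Tcase load + 5 = Tjunction load
print(f"Tc = {tc_at_throttle:.0f} -> about {tolerable_power(tc_at_throttle):.0f}W")
```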

I'm sorry if it was just a naive idea... I'm only trying an opposite side approach :)

everyone has become so brainwashed on Core temperatures that they've dismissed CPU temperature (Tcase) as a valid and accurate temperature monitoring method

Well... Before you revised your findings... I was no better... :) I was even ready to fight blindly in the name of "stick to the Tj (CoreTemp) reading"... Simply because without calibration Tc is usually incorrect, and Tj is reported by CPU registers (what could be more accurate?). And I assure you, there are lots of people who don't know your guide and stick with Tj.

And since your findings have changed, it turns out that I was running my CPU in the hot zone of Tcase (remember? we've "lost" an imaginary 10* delta between Tj and Tc). And that could be a problem for a bigger group of "enthusiasts" (esp. the very young ones, who overclock like children playing a "who's better" game and disregard CPU temps altogether)...

Huh... I will have to dig through the discussion on Xtreme Systems (Real Temp thread), as I haven't before (there were too many posts like "works great", "doesn't work on my Vista", and so on in the beginning, so I gave up). If I come up with another question, I'll surely ask :)

Michael