Thermal Protection: is it all that?

Out of the 60-some server boxes that they pay me to tinker with, one has lost a CPU fan. Fortunately, the BIOS was set up to halt on CPU fan error (the RPMs got too low). How handy is that? One new CPU fan (actually, I replaced both while I was in there) and the box was off and running.
What do the developers kill most often? Hard drives without a doubt.

Sweating like a rancid chunk of pork
 
How old are those boxes?

In theory, there is no difference between theory and practice.
In practice, there is.
 
Man, in all my years of working with computers (and I have worked with many, many comps), I have replaced two CPU fans--not because they failed, but because they were noisy.

I have assembled perhaps hundreds of rackmount systems, each with upwards of 6 fans--one fan per power supply, three fans blowing over the card chamber, and one fan per CPU. Some (the 7u's, 9u's, and 11u's) have an additional exhaust fan in the back, and three power supplies instead of two. Some are dual-CPU, meaning yet another fan. Some are split-backplane, meaning perhaps four CPUs. Generally all of them stay powered on constantly, in service for years and years. Very, very few customers ever order spare fans (and these fans are not the kind of fans you can buy at your local computer store, mind you). The one system we had come back for service because of fan failures came back because the fools had disconnected the alarm board that delivered power to the fans.

The IT department at work has to maintain about 75 workstations. We replaced a fan in one simply because it was abnormally noisy out-of-the-box. Note that these were systems built by the lowest bidder.

My own chassis has fourteen fans (count 'em, <i>fourteen</i>), and I've never had to replace one.

Fan failures do happen (every mechanical piece of equipment will fail eventually if you use it long enough), but they are extremely rare.

Kelledin

bash-2.04$ kill -9 1
init: Just what do you think you're doing, Dave?
 
My point is this: if you improperly install the heatsink, or it falls off during transport, and you power on the system without knowing, will the thermal protection kick in fast enough to save it? I don’t think it will, but let’s see some Intel supporter prove me wrong.

The summary of my point is this: if the thermal protection will not save a P4 with no heatsink, the fact that AMD has no protection is a non-issue.

Thx & Cya


<font color=red>There are only 2 types of hard drives. Ones that have crashed and ones that are about to.</font color=red>
 
We're talking about <b>CPU fans</b>, not power supply fans or other case fans. I just wanted to make that clear.
CPU fans are the most likely to fail because they are so small. CPU fan failures do happen, and they become more likely after about 2 years of use. The main reason is lint or dust buildup on the small fan blades, which throws the fan slightly out of balance; over a short period of time the bearings or brushes wear out and the fan fails. Have you ever opened up a computer that has been in a busy office environment for 1-2 years? The amount of dust in them can be outrageous. With servers this is less likely to happen, because servers are usually in a clean, sealed room. <b>CPU fan</b> failure is not as extremely rare with personal PCs as you tend to think. I agree that CPU fan failures are not so common as to call them typical or normal, but they happen more than I like to see.



(A)bort, (R)etry, (G)et a beer?
 
Aren’t the watts a CPU draws directly related to how much CPU utilization is going on? Like when you are benchmarking a P4, or running SETI or a 3D game that is really taxing the CPU. Wouldn’t the P4 be at 72.9 watts when CPU utilization is 100%? My Athlon crunches SETI 24/7, and it’s nice to know it’s running at 1 GHz all the time and not throttled back by as much as 50%.

Thx & Cya


<font color=red>There are only 2 types of hard drives. Ones that have crashed and ones that are about to.</font color=red>
 
My point is this: if you improperly install the heatsink, or it falls off during transport, and you power on the system without knowing, will the thermal protection kick in fast enough to save it? I don’t think it will, but let’s see some Intel supporter prove me wrong.
Yes, the thermal protection on Intel CPUs will kick in. That's why Intel engineers designed thermal protection into the CPUs. That's what it's there for: fan failures or other situations where heat can be extreme enough to damage the CPU. Can I prove it? No, I can't; I don't have a P4 to try it with.


(A)bort, (R)etry, (G)et a beer?
 
Why would a small fan fail faster than a big one? Assuming the same RPM, of course.

In theory, there is no difference between theory and practice.
In practice, there is.
 
Wouldn’t the P4 be at 72.9 watts when the utilization of the CPU is 100%?
Intel specs show that the 1.3 GHz draws 48.9 watts MAX. It's nice to know that if I were to have a CPU fan failure with a P4, it would shut down <b>before</b> the CPU burns itself up.


(A)bort, (R)etry, (G)et a beer?
 
Why would a small fan fail faster than a big one?
Because any small imbalance will affect a smaller fan more than a larger fan. Lint and dust buildup on small CPU fan blades takes its toll on the smaller bearings much quicker than on the big 3.5" or 4" case fans.


(A)bort, (R)etry, (G)et a beer?
 
We're talking about CPU fans, not power supply fans or other case fans.
I'm talking about all fans--and I'm including CPU fans as well. If anything, our systems should be coming back in for warranty service on CPU fans more than any other fan--the CPU fans are attached almost immovably onto the CPUs or the SBCs, whereas most of the rest are hot-swappable. They <i>never</i> get sent back for CPU fan failures.

Oh, and among the 75 workstations we have at work, the only fan that has failed (or rather, been abnormally noisy) is a case fan, not a CPU fan.

Kelledin

bash-2.04$ kill -9 1
init: Just what do you think you're doing, Dave?
 
You keep talking about your servers. I thought this discussion was about CPU fans in personal computers. Again, the main reason CPU fans fail in <b>personal computers</b> is dust and lint buildup on the small fan blades, which causes imbalance, which in turn causes the bearings to fail, and thus the fan freezes or stops turning. I have seen this happen on numerous occasions. Again, this is less likely to happen with servers because they are typically kept in an environment that is clean and sealed.


(A)bort, (R)etry, (G)et a beer?
 
If the fan dies after about 2 years of service, it is likely to be a cheapo sleeve-bearing fan. You want something with a ball bearing, preferably a double ball bearing. Those last for ages of constant service, even at a pretty high RPM.

What I see happen in big companies is that computers get replaced every couple of years. The performance computers tend to get replaced every year. The servers usually stay for quite a while, but their role changes. For example, a live server of today may be used as a stress test server 2-3 years from now.

This usually means the administrator never sees a fan, and thus the CPU, die on them. The puters get replaced well before that happens.



<font color=red>"My name is Ozymandias, King of Kings:
Look on my works, ye Mighty, and despair!"</font color=red>
 
Errr...you apparently haven't been out on many service calls.

Big professional companies (the ones that know how to maintain servers) usually keep their servers in more or less a clean-room environment. This was the case at MCI when I was working for them--the server rooms were always clean, and the temperature was always slightly below room temperature.

Not all companies are as diligent as that though. A lot of our customers have absolutely disgusting server rooms--spaghetti cabling, racks always left open, temperatures above 28 C, servers stacked on top of one another instead of being put in a rack. Some don't even keep their servers in server rooms, but stick them on <i>top</i> of servers that have been stuck where it shouldn't be possible to stick servers. The systems that come back often carry exactly the kind of dust you're talking about, as well as a dozen other marks of maltreatment. A few have actually come back shaped like parallelograms because of the beating they've taken.

Kelledin

bash-2.04$ kill -9 1
init: Just what do you think you're doing, Dave?
 
Actually, dust unbalancing the fan will have a <b>bigger</b> effect on a bigger fan. A mass imbalance has a longer moment arm to work on with a bigger fan.

And the systems I was referring to were desktop machines, not servers.


In theory, there is no difference between theory and practice.
In practice, there is.
 
Actually, dust unbalancing the fan will have a bigger effect on a bigger fan. A mass imbalance has a longer moment arm to work on with a bigger fan.
Yes, an imbalance on a larger fan can have more effect, but getting enough dust on a large 4" fan could take many, many years. The smaller CPU fans are more sensitive to small amounts of debris than the larger fans and are likely to fail much sooner.



(A)bort, (R)etry, (G)et a beer?
 
I believe large fans accumulate just as much dust as small ones--if not more, due to the fact that they move more air. With the same amount of dust, it's still worse on a larger fan. What's critical in the imbalance is the moment of inertia--and the moment of inertia increases when you have more of a rotating body's mass located further from its axis of rotation.
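To put rough numbers on the imbalance argument: the force an off-center speck of dust puts on the bearing is F = m * r * w^2, so it grows with distance from the axis and with the square of the rotation speed. The sketch below is only an illustration. The dust mass, fan radii, and RPM figures are assumptions picked for the example, not measurements from anyone's hardware.

```python
import math


def imbalance_force(mass_kg: float, radius_m: float, rpm: float) -> float:
    """Centrifugal force in newtons from a point mass stuck at radius_m on a rotor."""
    omega = rpm * 2.0 * math.pi / 60.0  # convert RPM to radians per second
    return mass_kg * radius_m * omega ** 2


# Assumed values, for illustration only.
DUST_KG = 0.0001         # 0.1 g of dust at the blade tip (hypothetical)
SMALL_RADIUS_M = 0.03    # roughly a 60 mm CPU fan (hypothetical)
LARGE_RADIUS_M = 0.05    # roughly a 4" case fan (hypothetical)

same_rpm_small = imbalance_force(DUST_KG, SMALL_RADIUS_M, 3000)
same_rpm_large = imbalance_force(DUST_KG, LARGE_RADIUS_M, 3000)
fast_small = imbalance_force(DUST_KG, SMALL_RADIUS_M, 6000)

print(f"small fan @ 3000 RPM: {same_rpm_small:.3f} N")
print(f"large fan @ 3000 RPM: {same_rpm_large:.3f} N")
print(f"small fan @ 6000 RPM: {fast_small:.3f} N")
```

With the same dust mass and the same RPM, the larger fan sees more force, in proportion to its radius. Doubling the RPM quadruples the force, so the speed each fan actually runs at matters at least as much as its size.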

Kelledin

bash-2.04$ kill -9 1
init: Just what do you think you're doing, Dave?
 
Keep in mind also that CPU fans generally spin at twice the RPM of large case fans, or more. Add to that the fact that CPU fans are mounted directly on top of a heatsink, which makes debris buildup more common, and failure is more common on CPU fans.


(A)bort, (R)etry, (G)et a beer?
 
Ok I’m obviously confused.

The article says a P4 1.5 gig uses 72.9 watts of power, but you say it only runs at 50 watts of power 99.9% of the time. When does it run at the 72.9 watts? I don’t see how having a heatsink (or not having one) affects how much power it dissipates. Is the article lying? How does the fan dying make the CPU use more power? The way I see it, either the P4 uses 72.9 watts or it uses 50 watts, regardless of whether the fan is running or not. Also, you keep using the P4 1.3 gig for comparison; I’m sure the P4 1.7 uses more power. That’s like me using the Tbird 900 for comparison.

Thx & Cya


<font color=red>There are only 2 types of hard drives. Ones that have crashed and ones that are about to.</font color=red>
 
Then why do case fans fail just as often, if not more often, than CPU fans? This has been the case at home and at every place I've worked, in servers and in workstations. Failures of either kind are rare, even working at places with hundreds of desktop workstations. Overall, even in desktops, CPU fans seem to last even longer than most hard drives--and I'd consider hard drive failure to be a far more serious setback than a toasted CPU.

And even if a CPU fan does fail, there's still plenty of time for health monitoring software to shut the system down. Remember, a T-bird lasts about an hour with just a heatsink.
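A sketch of the kind of check that health monitoring software (or a BIOS fan alarm) could make. Everything here is hypothetical: the threshold values, the function names, and the idea that readings arrive as plain numbers. Real monitoring tools read the sensors through their own interfaces.

```python
def fan_alarm(rpm: float, min_rpm: float = 2000.0) -> bool:
    """True when the fan has slowed below the (assumed) minimum safe speed."""
    return rpm < min_rpm


def seconds_until_limit(temp_c: float, limit_c: float, rise_c_per_s: float) -> float:
    """Estimate the time left before temp_c reaches limit_c at a steady rise rate.

    Returns 0.0 if the limit is already reached, and infinity if the
    temperature is not rising.
    """
    if temp_c >= limit_c:
        return 0.0
    if rise_c_per_s <= 0:
        return float("inf")
    return (limit_c - temp_c) / rise_c_per_s


def should_shut_down(rpm: float, temp_c: float, limit_c: float,
                     rise_c_per_s: float, margin_s: float = 30.0) -> bool:
    """Shut down when the fan has failed and the limit is within margin_s seconds."""
    remaining = seconds_until_limit(temp_c, limit_c, rise_c_per_s)
    return fan_alarm(rpm) and remaining <= margin_s


# Hypothetical readings: dead fan, 80 C and climbing 0.5 C per second.
print(should_shut_down(rpm=0, temp_c=80, limit_c=90, rise_c_per_s=0.5))
```

The slower the temperature climbs after a fan failure, the more slack a software monitor has before it needs to act.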

Kelledin

bash-2.04$ kill -9 1
init: Just what do you think you're doing, Dave?
 
Then why do case fans fail just as often, if not more often, than CPU fans?
I don’t think either of us can back up our experiences with hard, solid facts or data, so let’s just leave it at that. In my experience I have seen CPU fans die with much more frequency than case fans. You have a different view. There are other factors, such as the quality of the fans, that should be taken into account. Neither of us has the facts to back ourselves up. I enjoyed the debate, but I’m tired of discussing CPU fans and dust buildup. Not really a thrilling subject, is it?



(A)bort, (R)etry, (G)et a beer?
 
Power increases linearly with frequency and with the square of voltage. That is why I compared the two CPUs on a clock-for-clock basis. When talking about power consumption, it wouldn’t be fair to compare a higher-clocked CPU to a lower-clocked one.
Intel’s Thermal Monitor includes an accurate on-die temperature sensing circuit which can tell when the CPU gets too hot. If the CPU exceeds a certain temperature, the Thermal Monitor will clock down the CPU using Thermal Modulation until it cools down to normal operating temps.
The author of the article you refer to fails to mention that Intel tested the P4 with over 200 apps under very stressful conditions. These included Transaction Processing Performance Council TPC-C, SPEC, SPECint, SPECfp, SPECweb, Ziff-Davis 3D WinBench and Winstone, Microsoft desktop applications, Quake, CorelDraw, video playback, and other compute-intensive applications, several of which were run under multiple operating systems (including Microsoft Windows 98, Microsoft Windows NT, and Linux).
During this testing, power consumption never exceeded 75% of max. So CPU throttling is very unlikely to occur unless there is a fan failure or some other major heat-related problem. Maybe using a P4 in the Sahara Desert would trigger Thermal Modulation. :)
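For anyone who wants to play with the "linear in frequency, square of voltage" rule, here is a minimal sketch. It covers switching power only and ignores leakage, and the reference wattage, clock, and voltage values are made up for the example, not taken from any datasheet.

```python
def scale_power(ref_watts: float, ref_mhz: float, ref_volts: float,
                new_mhz: float, new_volts: float) -> float:
    """Scale a reference power figure using P proportional to f * V^2."""
    return ref_watts * (new_mhz / ref_mhz) * (new_volts / ref_volts) ** 2


# Hypothetical reference point: 50 W at 1000 MHz and 1.75 V.
REF = dict(ref_watts=50.0, ref_mhz=1000.0, ref_volts=1.75)

faster = scale_power(**REF, new_mhz=1300.0, new_volts=1.75)   # clock bump only
hotter = scale_power(**REF, new_mhz=1000.0, new_volts=1.85)   # voltage bump only

print(f"1300 MHz at the same voltage: {faster:.1f} W")
print(f"same clock at 1.85 V:         {hotter:.1f} W")
```

This is why a clock-for-clock comparison is the fairer one: a chip at a higher clock, or a higher core voltage, lands on a different point of the same curve.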



(A)bort, (R)etry, (G)et a beer?
 
>I believe large fans accumulate just as much dust as small
>ones--if not more, due to the fact that they move more air.

Agreed: they have more surface area and move much more air. Airflow is roughly proportional to the cross-sectional area of the fan, and that goes up as the square of the radius.
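A quick sketch of that square-law relationship. It is a rough approximation only: it assumes equal RPM and blade design and ignores the hub, and the fan sizes are example values.

```python
import math


def swept_area_cm2(diameter_mm: float) -> float:
    """Area of the circle swept by the blades, in square centimeters."""
    radius_cm = diameter_mm / 20.0  # mm diameter -> cm radius
    return math.pi * radius_cm ** 2


def relative_airflow(diameter_mm: float, reference_mm: float) -> float:
    """Airflow relative to a reference fan, assuming airflow scales with area."""
    return (diameter_mm / reference_mm) ** 2


# Example sizes (assumed): a 60 mm CPU fan and a 120 mm case fan.
print(f"60 mm:  {swept_area_cm2(60):.1f} cm^2")
print(f"120 mm: {swept_area_cm2(120):.1f} cm^2")
print(f"ratio:  {relative_airflow(120, 60):.0f}x")
```

Doubling the diameter gives four times the swept area, so under these assumptions the bigger fan pulls roughly four times the air (and the dust in it) past its blades.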

And on your later point ... I'll take a CPU failure over a drive failure any day!

In theory, there is no difference between theory and practice.
In practice, there is.
 
"If you improperly install the heatsink or it falls off during transport, Power on the system without knowing. Will the Thermal protection will kick in fast enough to save it?"

Yes it will. The thermal sensor is actually on the CPU. Within a few seconds the CPU will get excessively hot and throttle down to half speed. If it still continues to rise in temperature the thermal diode will completely shut down the CPU.
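The two-stage behavior described above (throttle to half speed first, shut down if it keeps heating) can be sketched as a tiny state check. This is not Intel's actual logic: the temperature thresholds and function names are placeholders invented for the example, and only the "half speed" figure comes from the description above.

```python
def thermal_state(temp_c: float, throttle_at_c: float = 75.0,
                  shutdown_at_c: float = 95.0) -> str:
    """Classify a die temperature against two assumed trip points."""
    if temp_c >= shutdown_at_c:
        return "shutdown"
    if temp_c >= throttle_at_c:
        return "throttle"
    return "normal"


def effective_mhz(rated_mhz: float, state: str) -> float:
    """Clock the CPU effectively runs at in each state (half speed when throttled)."""
    if state == "shutdown":
        return 0.0
    if state == "throttle":
        return rated_mhz * 0.5
    return float(rated_mhz)


# Walk a hypothetical 1500 MHz part through a rising temperature.
for temp in (40, 70, 80, 100):
    state = thermal_state(temp)
    print(f"{temp} C -> {state}, {effective_mhz(1500, state):.0f} MHz")
```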

-Raystonn

= The views stated herein are my personal views, and not necessarily the views of my employer. =
 
"Wouldn’t the P4 be at 72.9 watts when the utilization of the CPU is 100%?"

No, the 72.9 watt measurement would be the power draw if all transistors were flipped on and off repeatedly at the same exact time over and over. That's not possible outside of a special electronic testing unit. Your software will never make it go that high. Power draw depends on what instructions are being executed.
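One crude way to picture this is to treat the rated maximum as the draw at 100% switching activity and scale it by the fraction of that activity a real workload reaches. The sketch below does just that. The single "activity" number is a simplification made up for the example; the 72.9 W and 75% figures are the ones quoted earlier in this thread.

```python
def estimated_draw(max_watts: float, activity: float) -> float:
    """Estimate power draw as a fraction of the rated maximum.

    activity is the share of worst-case switching a workload reaches,
    from 0.0 (idle, in this simplified model) to 1.0 (the test-rig worst case).
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be between 0.0 and 1.0")
    return max_watts * activity


MAX_WATTS = 72.9       # maximum figure quoted in the thread
PEAK_ACTIVITY = 0.75   # "never exceeded 75% of max", as quoted in the thread

print(f"{estimated_draw(MAX_WATTS, PEAK_ACTIVITY):.1f} W")
```

Under that model, even the heaviest software load quoted here tops out at about 54.7 W rather than the full 72.9 W.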

-Raystonn

= The views stated herein are my personal views, and not necessarily the views of my employer. =
 
