Discussion: What made the Pentium 4 so hot?

i76700hquser

Reputable
Jul 3, 2019
292
26
4,740
To this day, even being from Gen Z and having never used a Pentium 4 until two years ago, I'm curious why the Pentium 4 ran so hot compared to, say, a Core 2 Duo, which was much cooler and got by with a less beefy cooling system.
My Northwood Socket 478 Pentium 4 idles at 50 °C in the BIOS, and so does my Prescott, which is pretty hot.
The stock Intel fan (on the Northwood) is also really noisy and always runs at max, but that may just be a bad fan.
 

Aeacus

Titan
Ambassador
Pentium 4 was first released in 2000 and, depending on the core, was built on one of these nodes:
  • 180nm node (Willamette) released 2000.
  • 130nm node (Northwood) released 2002.
  • 90nm node (Prescott) released 2004.

There were also Hyper-Threading variants of the Pentium 4:
  • 130nm node (Northwood) released 2002.
  • 130nm node (Gallatin XE) released 2003.
  • 90nm node (Prescott) released 2004.
  • 90nm node (Prescott 2M) released 2005.
  • 90nm node (Prescott 2M XE) released 2005.
  • 65nm node (Cedar Mill) released 2006.

The Intel Core 2 Duo, by comparison, is much newer:
  • 65nm node (Conroe) released 2006.
  • 65nm node (Allendale) released 2007.
  • 45nm node (Wolfdale) released 2008.

Making the processor out of smaller transistors (a smaller fabrication node) generally means it can run at higher clock speeds while producing less heat.
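The clock/heat trade-off above can be sketched with the classic dynamic-power relation P = C·V²·f. This is only a back-of-the-envelope model; the capacitance and voltage figures below are illustrative assumptions, not datasheet values.

```python
# Dynamic CPU power scales roughly as P = C * V^2 * f (switched
# capacitance, core voltage squared, clock frequency). The C and V
# figures below are illustrative assumptions, not datasheet values.

def dynamic_power(c_farads, v_volts, f_hertz):
    """Approximate dynamic (switching) power in watts."""
    return c_farads * v_volts**2 * f_hertz

# Hypothetical older node: higher voltage, lower clock
p_old = dynamic_power(c_farads=12e-9, v_volts=1.5, f_hertz=2.8e9)

# Hypothetical shrink: lower C and V allow a higher clock at less power
p_new = dynamic_power(c_farads=9e-9, v_volts=1.3, f_hertz=3.4e9)

print(f"old node: {p_old:.0f} W, new node: {p_new:.0f} W")
```

Note this only covers switching power; leakage current, which got much worse at 90nm, is a separate term and is exactly what bit Prescott.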
 

i76700hquser

Pentium 4 was first released in 2000 and, depending on the core, was built on one of these nodes:
  • 180nm node (Willamette) released 2000.
  • 130nm node (Northwood) released 2002.
  • 90nm node (Prescott) released 2004.
There were also Hyper-Threading variants of the Pentium 4:
  • 130nm node (Northwood) released 2002.
  • 130nm node (Gallatin XE) released 2003.
  • 90nm node (Prescott) released 2004.
  • 90nm node (Prescott 2M) released 2005.
  • 90nm node (Prescott 2M XE) released 2005.
  • 65nm node (Cedar Mill) released 2006.
The Intel Core 2 Duo, by comparison, is much newer:
  • 65nm node (Conroe) released 2006.
  • 65nm node (Allendale) released 2007.
  • 45nm node (Wolfdale) released 2008.
Making the processor out of smaller transistors (a smaller fabrication node) generally means it can run at higher clock speeds while producing less heat.
Interesting, there were so many models. Why was the Prescott Pentium 4 known to be hotter than the Northwood though?
 

Aeacus

Titan
Ambassador
Why was the Prescott Pentium 4 known to be hotter than the Northwood though?

My best guess: neglect.

With Pentium 4 builds, was servicing a PC even a thing? Meaning opening it up and cleaning the dust from its innards? If not, it doesn't take much for a PC to fill with dust. And depending on the PC case and CPU cooler used, some are more prone to collecting dust than others.

So, at the CPU level alone, Prescott isn't hotter-running than Northwood. But if you factor in the environment (e.g. PC on the table vs. on the floor), neglect, and the design of the PC case and CPU cooler, the results get skewed, so that on average a Prescott runs hotter than a Northwood.
 
Most of the P4s I've ever had were usually in systems with crappy cooling, and if I worked on a system with a P4, it was usually a Dell OEM packed with dust. I never really spent the time to figure out why one ran hotter than the other, and I don't recall keeping track either. Most of the systems I came across had dead fans due to dust buildup; dust them out, change a fan, and usually they were fine afterwards.

Not to mention computers at the time were so large that people put them on the floor, where they collect dust like nothing else.
 
To this day, even being from Gen Z and having never used a Pentium 4 until two years ago, I'm curious why the Pentium 4 ran so hot compared to, say, a Core 2 Duo, which was much cooler and got by with a less beefy cooling system.
My Northwood Socket 478 Pentium 4 idles at 50 °C in the BIOS, and so does my Prescott, which is pretty hot.
The stock Intel fan (on the Northwood) is also really noisy and always runs at max, but that may just be a bad fan.
I seem to remember comments about a transistor power-leakage problem that resulted in high current and temperatures. It also used the NetBurst architecture, with super-deep pipelines that allowed extremely high clocks; in fact it relied on them to overcome the problem of pipeline stalls, helped a lot by Hyper-Threading, which added to the thermal load in the package. The plan was to push it to 10 GHz, but high clocks led to uncoolable temperatures, so it never got beyond 3.8 GHz or so.

It was an overclocker's dream, though, if you could cool it. I think this was around the time of the birth of serious sub-ambient cooling, with things like Peltiers, phase change, and manufactured (not DIY, cobbled together from auto heater cores and aquarium pumps) water cooling. Cool it well enough on a motherboard with sufficient power delivery, and the sky was the limit with what you could do.
 

i76700hquser

And another thought: when was the last time the thermal paste (between CPU and CPU cooler) was replaced, if ever? As thermal paste ages, it hardens and loses its thermal conductivity, worsening the cooling of the CPU.
I haven't used that computer in a few months, but before using it I replaced the thermal paste; the old stuff was pretty much rock solid.
 

i76700hquser

I seem to remember comments about a transistor power-leakage problem that resulted in high current and temperatures. It also used the NetBurst architecture, with super-deep pipelines that allow extremely high clocks and rely on them to overcome the problem of pipeline stalls, helped a lot by Hyper-Threading. The plan was to push it to 10 GHz, but high clocks led to uncoolable temperatures, so it never got beyond 3.8 GHz or so.

It was an overclocker's dream, though, if you could cool it. I think this was around the time of the birth of serious sub-ambient cooling, with things like Peltiers, phase change, and manufactured (not DIY, cobbled together from auto heater cores and aquarium pumps) water cooling.
Yeah, though I don't think a 10 GHz Pentium 4 would be as powerful as CPUs are today, so I'm thankful they went back to basing their CPUs on the Pentium III (if I recall correctly?).
 
Yeah, though I don't think a 10GHz Pentium 4 would be as powerful as CPUs are today...
Nowhere close, because today's computers have 6-16 cores and 12-32 hardware threads to work with, as well as DDR4 memory. And, of course, an OS that (now) knows how to use them. Two, actually.

Also, designers stopped focusing so much on clock speed and more on instructions per clock (IPC) to improve performance. That's where AMD really screwed up with Bulldozer and Excavator: deeply pipelined processors that needed really high clocks to perform, with the penalty of high TDPs to go with it.

I'm not really that familiar...I thought the successor was Pentium MD? or was it the Core processors?

Pentium M was mobile...D was desktop.
 

Aeacus

I'm not really that familiar...I thought the successor was Pentium MD? or was it the Core processors?

Pentium D was the dual-core successor of the (single-core) Pentium 4, with releases in 2005 and early 2006. Pentium M, despite the name, was a mobile chip derived from the Pentium III design rather than from the Pentium 4 (also single-core).

Pentium D was short-lived, after which Intel moved on to the Core 2 microarchitecture, consolidating the previous CPUs (Pentium 4, D, and M) under a single name for marketing purposes. And after Core 2, Intel moved onwards to the current Core lineup: the i3, i5, i7, and i9 we know today.
 
To answer your question precisely: it was partly because branch prediction was not very good back then, and partly because very long pipelines were employed, since clocks were expected to ramp ever higher.

Northwood had a 20-stage pipeline and was expected to scale to 4 GHz. Prescott had a 31-stage pipeline and was expected to cover 4-7 GHz, while Tejas would have had a 40-50 stage pipeline and was targeted at 7-10 GHz. Long pipelines are supposed to help improve clock speeds because each stage does less work: more, but smaller, steps. This is why Prescott was such a slow and hot disappointment; it never even reached the speed Northwood was predicted to top out at, so the longer pipeline made it perform worse than Northwood despite the doubled cache.

The problem with long pipelines is that if anything goes wrong, the entire pipeline must be flushed. Not only does performance tank, because you have to start over and all the time that went into filling the pipeline is lost, but all of the power used to fill it is wasted too.
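The flush penalty above scales with pipeline depth, which can be shown with a toy cost model. The stage counts come from this thread; the branch frequency and predictor hit rate below are illustrative assumptions, not measured figures.

```python
# Toy model of the flush penalty: every mispredicted branch throws
# away a full pipeline's worth of in-flight work. Stage counts are
# from this thread (45 picked as the midpoint of Tejas's 40-50);
# branch rate and hit rate are illustrative assumptions.

def cycles_per_instr(depth, branch_rate=0.2, hit_rate=0.9):
    """Average cycles per instruction: a 1-cycle baseline plus a
    full-depth flush on every mispredicted branch."""
    miss_rate = branch_rate * (1 - hit_rate)
    return 1 + miss_rate * depth

for name, depth in [("Northwood", 20), ("Prescott", 31), ("Tejas", 45)]:
    print(f"{name:9s} {depth:2d} stages -> {cycles_per_instr(depth):.2f} CPI")
```

Even with identical predictors, the deeper pipe loses more cycles (and power) per mispredict, which is the mechanism behind Prescott's per-clock regression.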

Speculative execution is a way to increase performance over in-order processing in much the same way as caching: you start working based on a prediction that the result or data will be needed shortly, and discard it if it wasn't. The problem is that predicting what will be needed in the cache, or which calculation result, is not easy, and the accuracy of early branch prediction (which tries to statistically predict which of two possible branches will be taken) was not good.

Well, if you don't care about power consumption, there is a perfectly obvious way to make up for a poor branch predictor: simply execute both branches and discard the one that isn't needed, essentially doubling your power consumption (or halving your efficiency). But the result will be ready in time, without any stalls from having guessed wrong.
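To make the "statistical prediction" idea concrete, here is a minimal 2-bit saturating-counter predictor, the classic early scheme. This is a sketch for illustration only, not a model of any specific CPU's predictor.

```python
# A minimal 2-bit saturating-counter branch predictor: a sketch of
# the classic early scheme, not any specific CPU's design.

def predictor_accuracy(outcomes):
    """Run one 2-bit counter over a sequence of branch outcomes
    (True = taken) and return its prediction accuracy."""
    state = 2          # states 0,1 predict not-taken; 2,3 predict taken
    hits = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            hits += 1
        # saturating update toward the actual outcome
        state = min(3, state + 1) if taken else max(0, state - 1)
    return hits / len(outcomes)

# A loop branch: taken nine times, then falls through, repeated
loop_branch = ([True] * 9 + [False]) * 100
print(f"accuracy: {predictor_accuracy(loop_branch):.0%}")  # accuracy: 90%
```

The two-bit hysteresis means a single loop exit doesn't flip the prediction, but the exit itself is always mispredicted; it takes the smarter history-based predictors of the Core 2 era to push past that.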

The problem with logic is that sometimes a branch cannot continue until the result of something else is known. There is no way to pause something in the pipeline (there is no special parking area to pull it out until that dependency is available), so normally when this is encountered the pipeline gets flushed and everything starts over from the beginning. The Pentium 4 had a unique "replay" system that redirected instructions back to the start of the execution units, where they could loop over and over, wasting power, until the result was available and the branch could continue. That means the Pentium 4 could load its execution units to 100% while doing no work at all, just waiting for a result.

NetBurst was the first Intel architecture that was undocumented (to stymie competitor AMD), so we only know this from the Russians over at iXBT Labs.

As an aside, both speculative execution and caching are very power-hungry ways to improve performance, so the original Atom went back to in-order execution with very little cache (like a Pentium 1) to save power at the expense of performance. That's why those chips weren't affected by Spectre/Meltdown, which exploit speculative execution. By the time of Core 2, the branch predictor had improved enough to guess correctly an average of 96% of the time.


The Pentium 4 also ran at full speed all of the time, and Windows 9x didn't even issue the HLT instruction at idle, so you'd expect it to idle hot. The exception was the Pentium 4-M for laptops, which had a low default multiplier and would raise it under periods of high demand. Core 2 works the same way: the default multiplier was 6x and it could increase to the maximum rated multiplier as needed, somewhat like the Turbo in today's CPUs (which is very different from the Turbo button of yore, which could lock the CPU at 4.77 MHz).
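The multiplier scheme is simple arithmetic: core clock = bus clock × multiplier. As a worked example, here are the figures for a Core 2 Duo E6600 (266 MHz bus, 9x rated multiplier); the 6x idle floor is the one mentioned above.

```python
# Core clock = bus clock x multiplier. SpeedStep drops the multiplier
# at idle and restores it under load. Figures are for a Core 2 Duo
# E6600 (266 MHz bus, 9x rated multiplier) as a worked example.

BUS_MHZ = 266

def core_clock_mhz(multiplier, bus_mhz=BUS_MHZ):
    """Effective core clock in MHz."""
    return bus_mhz * multiplier

idle = core_clock_mhz(6)   # the 6x idle floor mentioned above
load = core_clock_mhz(9)   # rated multiplier under load
print(f"idle: {idle} MHz, load: {load} MHz")  # idle: 1596 MHz, load: 2394 MHz
```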
 
Pentium 4-M was a true Pentium 4, slightly tweaked for laptop use. Pentium D was two entire Pentium 4 cores placed onto the same package to make a dual-core chip; they were not connected except via the FSB, just like in a dual-socket machine.

Pentium M was essentially the Pentium III on the Pentium 4's FSB, for low-power laptops. ASUS made an adapter to fit it into desktop boards, and its overclocked performance made the Pentium 4 look silly.

Core and Core Duo were the successors to the Pentium M for laptops. The Duo was dual-core on the same die, so it could share the same L2 cache.

Core 2 was the successor to Core, and the desktop Core 2 Duo and Core 2 Quad came to be after the cancellation of the Tejas project.
 

jnjnilson6

Distinguished
What do you think about this wonderful piece of comparison?

nw8000-vs-dc7600.jpg
 
And Northwood wasn't even that bad. Sure, it pulled more power than the Athlon XP at the time, but there were a couple of things it did quite well, like video encoding: basically all the kinds of tasks that managed to keep the long pipeline filled.
In games it was quite different; a cheaper and less power-hungry Athlon was often faster.

Now, Prescott is a different beast. An even longer pipeline so that it could reach even higher clocks, which didn't happen. It pulled more power, was slower per clock, and didn't clock any higher.
There was a die shrink, Cedar Mill, that reduced power consumption again and was fairly decent, but still no Northwood.
(I have a 3.0 GHz Cedar Mill here, and it runs pretty cool even overclocked to 4 GHz with a proper cooler: under 70 °C at 1.4 V under a Hyper 212 Evo.)

Northwood had a 20-stage pipeline; Prescott had 31. And there were already early samples of the next version, Tejas, which lengthened the pipeline again.
A Tejas test chip at 2.8 GHz would have been rated for 150 W, while Prescott could do that clock at 84 W, a Smithfield Pentium D (dual-core) at 95 W, and a Northwood at 68 W. Of those, the fastest would be Northwood/Smithfield (if the software could use both cores) > Prescott > Tejas.
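Since all three figures are at the same 2.8 GHz clock, performance tracks IPC, and the wattages translate directly into perf-per-watt. The wattages come from this thread; the relative IPC values below are illustrative assumptions chosen only to match the stated ordering, not measurements.

```python
# At the same 2.8 GHz clock, performance tracks IPC. Wattages are
# from this thread; the relative IPC values are illustrative
# assumptions matching the ordering Northwood > Prescott > Tejas.

chips = {
    "Northwood": {"ipc": 1.00, "watts": 68},
    "Prescott":  {"ipc": 0.90, "watts": 84},
    "Tejas":     {"ipc": 0.80, "watts": 150},
}

for name, c in chips.items():
    ppw = c["ipc"] / c["watts"] * 100   # perf per watt, arbitrary units
    print(f"{name:9s} rel. perf {c['ipc']:.2f}, perf/W {ppw:.2f}")
```

Under any reasonable IPC assumption the perf-per-watt gap is brutal, which is why Tejas was cancelled rather than shipped.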

In the end it was a design aimed at high clocks (10 GHz by 2011). A long pipeline helps with that, but reduces IPC.
AMD actually did something similar with their FX chips: a hit in IPC, but high clocks. And IBM did it as well with POWER6.

And we're approaching the same limit again.
From a 3.8 GHz Pentium 4 at 115 W, to a 3.73 GHz Pentium D at 130 W, to a 5 GHz POWER6 at 160 W, to a 5 GHz FX-9590 at 220 W, and now we've got some nice 5 GHz i9s at 350 W.
 
Assuming the power consumption of the processor was more or less in line with its TDP, how hot something runs really just depends on the cooling system around it. If you run a 65 W TDP Ryzen processor (which really consumes ~85 W under full load) with a Wraith Prism cooler, it'll run just as hot as a Pentium 4 did back in the day.
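The "depends on the cooling system" point can be written as T ≈ T_ambient + P × R_theta, where R_theta (°C per watt) is the cooler's thermal resistance. The R_theta values below are illustrative assumptions for a small stock heatsink versus a large tower cooler, not measured figures.

```python
# Steady-state CPU temperature: T = T_ambient + P * R_theta, where
# R_theta (degC per watt) is the cooler's thermal resistance. The
# R values below are illustrative assumptions, not measurements.

def cpu_temp_c(power_w, r_theta, ambient_c=25):
    """Rough steady-state temperature for a given cooler."""
    return ambient_c + power_w * r_theta

same_chip_w = 85                               # the ~85 W Ryzen above
stock = cpu_temp_c(same_chip_w, r_theta=0.60)  # small stock heatsink
tower = cpu_temp_c(same_chip_w, r_theta=0.25)  # large tower cooler
print(f"stock: {stock:.0f} degC, tower: {tower:.2f} degC")
```

Same chip, same watts: only the cooler's R_theta decides whether it reads "hot" or "cool", which is the whole argument above in one equation.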

This was the stock cooler for a Northwood Pentium 4. Apparently it used to be worse.
hsf.jpg


And a typical case of the time would've had at best an 80mm fan in the front and an 80mm fan in the back.

Otherwise, the answer to why a Pentium 4 consumed so much more power has already been given above.