Netburst's problem wasn't so much "too much power" as "too many sacrifices for clocks": Intel gave up 40-50% of its IPC relative to Coppermine/Tualatin to get there. Netburst needed to clock almost twice as high out of the gate to beat the P3 under all circumstances, which is kind of awkward when your 2GHz top-end new part barely beats your 1-1.3GHz previous-gen parts.
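As a rough back-of-the-envelope check on that, assume performance scales simply as IPC times clock. That is a big simplification that ignores memory, cache, and workload effects, and the function name and inputs below are illustrative, not measurements:

```python
def break_even_clock_ghz(old_clock_ghz: float, ipc_loss: float) -> float:
    """Clock the new chip needs to match the old one.

    Assumes performance = IPC * clock (a simplification) and that the new
    chip has lost `ipc_loss` (a fraction in [0, 1)) of the old chip's IPC.
    """
    if not 0 <= ipc_loss < 1:
        raise ValueError("ipc_loss must be in [0, 1)")
    return old_clock_ghz / (1.0 - ipc_loss)


# The 40-50% IPC loss and the 1-1.3GHz P3 clocks are the figures quoted above.
for loss in (0.40, 0.50):
    for old in (1.0, 1.3):
        needed = break_even_clock_ghz(old, loss)
        print(f"{loss:.0%} IPC loss vs {old} GHz P3: needs {needed:.2f} GHz")
```

Under those assumptions, a 40-50% IPC deficit means needing roughly 1.7x to 2x the clock just to break even, which is where the "almost twice as high" figure comes from.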
As for 10nm and beyond, hitting the physical node dimensions (or at least something close enough to call it as such) is only half the battle; you still have to get the process to deliver the expected performance and volume. As far as we can tell from available Ice Lake SKUs, 10nm+ does not appear to be there yet. Hopefully things will go more smoothly for 7nm in 2021.
I'm not expecting much out of 10nm. At this point, 10nm is mainly about Intel having to prove to investors that the billions it spent getting 10nm to work were not completely wasted, after years in a row of telling them 10nm was "on track" while delaying 10nm products due to setbacks.
It wasn't quite that bad. In certain tasks, yes, an overclocked 1.4GHz Tualatin P3 could beat a 1.4GHz Willamette P4, but in other stuff the P4 was quite a bit faster. And once P4 hit 2+ GHz, it generally wasn't even close. You had to carefully cherry pick tests to get better results on P3.
Plus, P4 Willamette was the 'bad' first gen part, but Northwood was very good overall and rapidly scaled to 3GHz and more. At that point, it was faster than any of AMD's Athlon XP chips -- it was only Athlon 64 that actually took the lead, in part thanks to its integrated memory controller. And really, you needed the much more expensive socket 940 chips, or later socket 939 -- socket 754 was okay but didn't always match or beat P4 Northwood chips.
I had an overclocked Tualatin, incidentally, and later upgraded to a Pentium 4 rig. It was a very noticeable jump in overall performance. And at stock clocks, Tualatin for desktops topped out at 1.13GHz (which was why it was overclocked to 1.4GHz). Some of that was memory as well -- P4 could do quite nicely with the right memory setup. Rambus was technically faster, but the later chipsets with DDR support weren't bad.
The thing is, Willamette was first gen NetBurst, so you can sort of understand some of the mistakes that were made in retrospect. Northwood fixed a lot of those, but then Prescott went off a cliff and just couldn't scale to the frequencies and performance Intel wanted. Even worse, Tejas -- which actually was super close to a public release and had been sampling for many months to testers and other places -- didn't really help and had even worse heat and power characteristics. Which is why Intel pivoted, killed off P4, and came out with Core 2 Duo / Merom / Conroe (after the success of Yonah and the Core Solo/Duo chips).
Anyway, I'm very curious to hear what precisely Intel does with Rocket Lake to try and keep it relevant. PCIe Gen4 is not going to be nearly sufficient. Integrated Xe Graphics won't really matter either, since these are desktop chips that will just use a dedicated GPU. I have trouble imagining the Willow Cove architecture alone will be anywhere close to sufficient to compete against Zen 3 and maybe even Zen 4. It's all just a holding pattern and business as usual while Intel works to get out proper 10nm or 7nm desktop parts.