P-M on desktops: Nice review

Mephistopheles

Distinguished
Feb 10, 2003
A nice review from <A HREF="http://www.behardware.com/articles/565/page1.html" target="_new">behardware.com</A>.

It shows that a 2 GHz Dothan on an 800 MHz FSB with two channels of DDR400 is about <b>a full 10% faster</b> than a 2 GHz Dothan on a slower 533 MHz FSB with two channels of DDR400...
Results are interesting and show that the Pentium M architecture still has some room for improvement and is clearly held back by its official platform.
Indeed, what Intel clearly should have done is scrap all thoughts of dual-core Prescotts and get Yonah out for desktops. There have been numerous reports of this chip being OK at ~2.5 GHz <i>stable</i> operating frequencies without a hitch! This obviously points to the fact that Intel doesn't want to jeopardize Prescott, because Dothan would shine more on a desktop board.
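To put the FSB comparison above in numbers, here is a minimal sketch of peak theoretical bandwidth. It assumes a 64-bit (8-byte) data path for both the FSB and each DDR channel, and treats the quoted "533 MHz" and "800 MHz" figures as millions of transfers per second; the function name is made up for illustration. Real-world throughput is lower than these peaks.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: float, bus_width_bytes: int = 8) -> float:
    """Peak theoretical bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mega_transfers_per_s * 1e6 * bus_width_bytes / 1e9


# FSB figures from the review (assumed 64-bit wide bus).
fsb_533 = peak_bandwidth_gbs(533)
fsb_800 = peak_bandwidth_gbs(800)

# Two channels of DDR400, each assumed 64 bits wide at 400 MT/s.
dual_ddr400 = 2 * peak_bandwidth_gbs(400)

print(f"533 MHz FSB     : {fsb_533:.2f} GB/s")
print(f"800 MHz FSB     : {fsb_800:.2f} GB/s")
print(f"Dual-ch. DDR400 : {dual_ddr400:.2f} GB/s")
```

Under these assumptions the 533 MHz FSB cannot carry everything dual-channel DDR400 can deliver, while the 800 MHz FSB matches it, which is one plausible reading of why the faster bus helps in bandwidth-heavy benchmarks.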

This also puts to rest some concerns about P-M's lower-than-optimal scaling with FSB. I mean, once equipped with a 10% faster FSB/memory combo and put to stable operation at 2.4 GHz/2.5 GHz, which seems doable according to a <i>lot</i> of sources on the net, it's pretty obvious that P-Ms would smoke Prescott easily. And they would probably even manage to match AFXs in gaming benchmarks as well. And thermal solutions currently available for Prescotts could probably manage to cool even 4 Dothan cores right now, let alone an eventual dual-core P-M. All they needed to do was put the engineers working on Smithfield to work adding 64-bit extensions to Dothan and doing the wiring for a dual-core Dothan...

Pretty stupid, Intel.

Why insist on prescott?
 
According to some news, Intel is working on Conroe. This will be Intel's first 65 nm dual-core technology, due in early 2006. I'm not sure if Intel can deliver by that date, but I heard their 65 nm process is going well.

Conroe (I wonder what Conroe is or where the name Conroe comes from) will be a new architecture, different from NetBurst. I wonder what they mean by different: does that mean NetBurst is dead, or will it work like NetBurst, only more efficiently?

If Conroe ever gets out, expect the heat to drop. No more having to eat ice cream to cool down while the Intel CPU is working 🙂. I also heard the northbridge is the weakest link and will be replaced by CSI (I forget what it means, maybe something like Combining System Integration). This will allow Intel CPUs to have better memory access. If my memory is correct.

Live long, Live to the fullest and be Bless :wink:
____________
Aspire X-Alien / Antec SX1040B
P4 3.2E OC to 3.6 / AXP 1.67 OC to 1.9
Abit IC7 / Abit KR7A
 
>It shows that a 2Ghz Dothan running on an 800Mhz FSB with >two
> channels of DDR400 is about a full +10% faster than a 2Ghz
>Dothan running on a slower 533Mhz FSB with two channels of
>DDR400..

I wouldn't be too quick to conclude a "full 10%" boost by looking at just two benchmarks widely known to be memory-bandwidth dependent, like WinRAR and UT.

>Indeed, what Intel clearly needed to have done is scrap all
>the thoughts on dual-core prescotts and get out Yonah for
>desktops. There have been numerous reports about this chip
>being OKed for ~2.5Ghz stable operating frequencies without
>a hitch!

Reread the article:

We were able to reach this FSB under Windows via Clockgen, but from 200 MHz upwards we had inexplicable 3D graphic bugs with some of the games, even when the AGP frequency was locked in. Furthermore, the system didn’t want to boot with this FSB directly configured via the bios.

200 MHz FSB corresponds to 2.4 GHz.

Furthermore, you still seem to think that because people can overclock CPUs to this frequency, Intel could just pump them out in volume at that frequency. It's just not true! What overclocking sites may report as "stable" will almost certainly not pass Intel's (or AMD's) qualification, let alone have enough margin to be produced in any quantity.

>And it would probably even manage to match AFXs in gaming
>benchmarks as well

Dothan is, without a doubt, an excellent performer, but try to keep things in perspective: you are comparing a max-overclocked, overvolted mobile chip on a mature 90nm process that roughly matches a stock 130nm chip (with 90nm products around the corner, if not on the shelves). That is not all *that* impressive in my book, especially not when prices are more or less comparable. It's even less impressive when you consider one is a 32-bit-only chip.

>All they needed is put the engineers working on smithfield
>to add 64-bit extensions to dothan and do the wiring for a
>dual-core dothan...

LOL, is that all? I hope you realize adding 64-bit support is not like patching in SSE3 through microcode. It requires you to redo the entire core layout; it's basically a completely new chip. "Adding" 64-bit support to Dothan is most likely a greater change than designing Dothan starting from the Pentium Pro/2/3 core.

Intel is working on it (and certainly has been for quite some time), and if they could, rest assured they would ship it ASAP, but it seems it will take another ~18 months, simply because designing, validating, taping out, testing and producing these things takes that much time.

Just for the record, AMD started working on K8 back in 1997 or 1998 if I'm not mistaken (likely, Intel started not all that much later with their Yamhill). But a 64-bit Dothan was not likely started until the Prescott fiasco became apparent, somewhere in 2003. If true, and Intel manages a "64-bit Dothan" (forgot its code name) ~3 years later, that would be quite an achievement. 4-5 years is more typical.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
>(I wonder what Conroe is or where the name Conroe comes from)

Google is your friend.. Conroe seems to be a city in Texas.

BTW, you mixed up some products/code names. AFAIK, Yonah is the Dothan-derived 65nm product scheduled for early 2006. It will not be an all-new core, basically a shrink of Dothan, possibly with some tweaks (like SSE3), and of course also available as dual core.

The "all new" cores are Merom (mobile) and Conroe (desktop), 64-bit enabled, most likely Dothan-esque but with far greater changes, but these chips are only scheduled for Q3 2006.

Until that time, all Intel desktop chips will still be NetBurst based; they will shrink Prescott to 65nm early next year, and these chips are called Presler (dual-core successor of Smithfield) and, if I'm not mistaken, Cedar Mill (single core? or Xeon?).

>Conroe (I wonder what Conroe is or where the name Conroe
>comes from) will be a new architecture and is different from
> netburst. I wonder when they mean by different, does that
>mean netburst is dead or create a function like netburst,
>but more efficent?

No, NetBurst will indeed be dead and buried by the end of next year. Merom is said to be a mostly from-scratch design (but Intel and AMD always claim that :)), but it seems to resemble Dothan much more than P4. The INQ reported it would have higher IPC than Dothan, but would clock in the 2.5 GHz range.

My best guess is that it will borrow quite a lot from Dothan, but feature 64-bit extensions (which would already qualify it as a new design), a slightly longer pipeline, possibly a trace cache, and most likely redone x87 FPU and SSEx units.

>I also heard the northbridge is the weakest link and will be
> replace be CSI (forget what it mean, maybe something like
>combining System Integration).

CSI = Common System Interconnect. This will be a HyperTransport-like system bus, but it will not be present yet on Merom. It's scheduled for some chips after that, in 2007.

BTW, this bus topology is mostly important for multi-CPU systems; for a single-CPU (or dual-core) setup, the gain over a traditional FSB isn't all that great. However, what does matter (closely related, but not the same) is the memory controller, which is on-die for AMD and resides in the northbridge for Intel systems. Whether or not Intel will move to an on-die memory controller is not known yet (AFAIK), although it seems likely. But it would be perfectly possible to implement CSI and still use off-die memory controllers, like they do now.

> If my memory is correct

You might want to upgrade to registered ECC memory 😀

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
>A nice review from behardware.com.
A cute article, but like all, to be taken with a grain of salt.

The PM architecture really does look promising. But then the P4 architecture <i>had</i> promise at one point too. I even remember ancient schematics when the P4 was first introduced that had a second FPU, but it was cut to release the architecture prematurely. :O Gee, where'd Intel go with <i>that</i> intended design? Intel seems to have almost completely wasted the P4's potential by now. They even killed the Scotty cache improvements with an even longer pipeline. So I wonder how badly they'd screw up the PM's potential if they officially ported it to desktop too. Is the potential there? Yeah. But Intel has a bad track record with turning potential into advantage.

:\

But back to current implementations. Personally I wouldn't toy with a PM with DC DDR400 (When will it be DDR2?) until the motherboards are a bit better. (Well, mostly BIOS support really.) Asus makes some good products, especially for Intel CPUs, so I wouldn't mind being stuck with their mobos for a good desktop PM. But it's pretty clear that things are still very much in a teething stage here.

One other thing that greatly irks me is that damned HSF solution. I'd much rather see something that bolts into the P4 HSF mount and lets you in turn mount a P3 (or even one of AMD's sockets?) HSF to the PM than run with that thing. Or even one that just reproduces the P4 HSF mount, but elevated to the height of the adaptor so that you could put a P4's HSF on the PM. Whatever. Just <i>anything</i> so that you're not limited to that single HSF.

<i>Jesters do oft prove prophets.</i> -Regan in King Lear (Act V, Scene iii) by William Shakespeare
@ 187K -> 200,000 miles or bust!
 
>200 MHz FSB corresponds to 2.4 GHz.
<i>Point taken on the limitations of a 200 MHz FSB on these processors.</i> However, a 200 MHz FSB doesn't correspond to 2.4 GHz at all. The 2.4 GHz result with the highest FSB came from the OCed 1.73 GHz P-M working at 13x185 MHz, which according to my math <i>isn't</i> a 200 MHz FSB.

In addition to that, <i>they validated a 2680 MHz OC with a 134 MHz FSB on the 2.0 GHz, 400 MHz FSB Dothan.</i> This is quite impressive for a perfectly stable and functional OC. It indicates <i>they could probably release 2.26, 2.40 and maybe even 2.53 GHz 533 MHz FSB parts if they wanted to at some point in time.</i> But they don't want to make Dothan more powerful than Prescott, because, well, that would make them look very silly. There are a gazillion different sites that feel exactly this way and a lot of people out there think like that - hell, that's even why the 479/478 socket adapter <i>exists</i> in the first place!
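The clock figures being argued over all come from the same relation: core clock = multiplier x base FSB clock. A minimal sketch of that arithmetic follows. The 13x multiplier and the 185, 134 and 200 MHz base clocks are from this thread; the 20x multiplier is an assumption derived from a 2.0 GHz part on a 100 MHz base (400 MHz quad-pumped) bus, and the 12x row only shows what multiplier a 2.4 GHz result at 200 MHz would imply.

```python
def core_clock_mhz(multiplier: int, base_fsb_mhz: float) -> float:
    """Core clock in MHz from the CPU multiplier and the base (not quad-pumped) FSB clock."""
    return multiplier * base_fsb_mhz


# 1.73 GHz P-M overclocked at 13 x 185 MHz (figures from the post above).
print(core_clock_mhz(13, 185))   # 2405 MHz, i.e. the "2.4 GHz" result

# 2.0 GHz, 400 MHz FSB Dothan at a 134 MHz base clock (20x is an assumed multiplier).
print(core_clock_mhz(20, 134))   # 2680 MHz

# A 200 MHz base clock only gives 2.4 GHz with a 12x multiplier.
print(core_clock_mhz(12, 200))   # 2400 MHz
```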

You know, P4Man, you always manage to surprise me. You are obviously very literate about this stuff and you know a lot, but from what I've seen, you base some of your conclusions on what you want to believe. Why is that? You really know stuff... and yet some of your posts aren't quite coherent...

For instance, a Venice core sold at 2.4 GHz in retail products does 2.8 GHz on air all over the net. That's a 17% overclock. We reckon that AMD probably has clock headroom and could release more speed bins. Even the 2.4 GHz A64 X2 can do 2.7 GHz all over the net too, indicating additional headroom. Quoting you, you agree:
>It indicates that a 2.6Ghz dual-core could probably run OK
>with 90nm..

>Keep in mind, these are just the very first 90nm chips from AMD! It's a new process, the scaling has yet to begin. Power consumption is down dramatically compared to 130nm (which wasn't all that high either), so I wouldn't be surprised if 3 GHz dual core chips will be upon us before AMD moves to 65nm around this time next year.

So the single-core chip can do 2.8 GHz (keeping in mind that sources reported processing errors in torture tests beyond 2.8 GHz), the dual-core can do 2.7 GHz, and then you suddenly expect 3.0 GHz dual-core chips with this technology. OK.

But then, we report that 2.0 GHz P-Ms can do a <b>30%</b> overclock! We reckon, as we did before, that P-Ms have huge frequency headroom. But wait! You object:
>Furthermore, you still seem to think that because people can overclock CPUs to this frequency, Intel could just pump them out in volume at that frequency. It's just not true! What overclocking sites may report as "stable" will almost certainly not pass Intel's (or AMD's) qualification, let alone have enough margin to be produced in any quantity.
Did you notice how you reacted differently to two symmetrical situations? In one case, with the Venice cores, you even added some extra frequency headroom for the dual-cores (3.0 GHz, while even single-core Venices haven't been reaching 3.0 GHz with ease)! Where's the oh-so-careful validation you were speaking of in the AMD case? Or haven't you heard that even a simple Prime95 stress test produced wrong numbers for >2.8 GHz Venice cores in a few cases?
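For reference, the overclock percentages quoted in this post can be checked with a one-line formula. This is only a sketch of the arithmetic using the clock figures mentioned in the thread; the function name is made up for illustration, and the percentage alone says nothing about what would pass a manufacturer's validation.

```python
def overclock_percent(stock_ghz: float, overclocked_ghz: float) -> float:
    """Relative clock gain over stock, in percent."""
    return (overclocked_ghz - stock_ghz) / stock_ghz * 100.0


cases = {
    "Venice 2.4 -> 2.8 GHz": (2.4, 2.8),
    "A64 X2 2.4 -> 2.7 GHz": (2.4, 2.7),
    "Dothan 2.0 -> 2.6 GHz": (2.0, 2.6),
    "Dothan 2.0 -> 2.68 GHz": (2.0, 2.68),
}

for label, (stock, oc) in cases.items():
    print(f"{label}: {overclock_percent(stock, oc):.1f}%")
```

The Venice case works out to roughly 17% and the Dothan cases to roughly 30% and 34%, in line with the figures quoted above.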

Be careful not to jump to whichever conclusion you set your mind on... I might as well have answered your post here with your <i>"you still seem to think..."</i> quote - it would have been very appropriate.

I mean, I'd love to see a 3.0Ghz dual-core A64 X2 on watercooling (heck, if possible, I'd be the first one to buy one), but I'd also love to see P-M shining on desktops. Question is, why wouldn't you?

Edited by Mephistopheles on 05/11/05 05:23 PM.
 
...and that's why we're talking about P-Ms on desktops, which wouldn't have different cooling potentials, and we conclude that P-M would do better than Prescott...

...that's more or less what I was thinking...
 
>Point taken on the limitations of 200Mhz FSB on these
>processors. However, 200Mhz FSB doesn't correspond to 2.4Ghz
>at all.

It does for the 740, whose CPU-Z screenshot was posted above that paragraph, but I could have misread.

As for the rest of your post, let me just tell you that I don't apply a percentage to achieved overclocks to (gu)esstimate what would be possible to produce.

I use many other criteria as well, like maturity of the process, scaling on the previous process, design of pipeline and cache, remaining "time to live" of the core, as well as roadmaps and the rumour mill. If you want, go ahead and look up my posts about the initial Tbreds and early K8s: they didn't overclock diddly squat, yet I was confident they would scale once the process matured or some speedpath issues were ironed out. If I had used your overclocking logic, I would have made the claim many made back then, that K7 couldn't break 2 GHz.

Furthermore, I never said I thought 3 GHz was possible for K8 *today*; the quote you put there was, if I remember correctly, in the context of a Q3 2006 scenario (versus Conroe). And I stand by that quote, simply because unlike Dothan, K8 *is* designed for higher clocks, doesn't feature an ultra-low-latency & ultra-low-power (but therefore low-clocking) L2 cache (*), and unlike Dothan, it is being produced on a brand new, immature process and has at least 12-18 months of tweaking ahead of it. Finally, with FX57 expected anytime now, it's really not a bold claim, or is it?

(*) Just to go a bit more in depth on this one, you do realize Dothan has a <b>10</b> cycle L2? You do realize that is almost half of K8's L2 cache latency (18), and almost 1/3 of Prescott 2M's cache latency (27)? Why do you think Intel upped cache latency from 16 cycles on Northwood to 27 on Prescott 2M? Simple: clockability. When they designed it, they were still shooting for 5 GHz, remember?

But there is no way they can make a 10 cycle L2 scale to anywhere near 3 GHz. Dothan was designed to perform at low clock and low power, and it does terrifically at what it was designed for, but if you think that because it has low power draw or overclocks well it can clock indefinitely, then you are hallucinating.
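The clockability argument is easier to see when the cycle counts are converted to absolute time: latency in ns = cycles / clock in GHz. The sketch below uses the cycle counts from the post above; the clock speeds paired with them (2.0 GHz Dothan, 2.4 GHz K8, 3.6 GHz Prescott 2M) are illustrative assumptions, not measured configurations.

```python
def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Absolute latency in nanoseconds for a given cycle count and core clock."""
    return cycles / clock_ghz


# (cycles from the post, assumed clock in GHz)
chips = {
    "Dothan, 10-cycle L2 @ 2.0 GHz": (10, 2.0),
    "K8, 18-cycle L2 @ 2.4 GHz": (18, 2.4),
    "Prescott 2M, 27-cycle L2 @ 3.6 GHz": (27, 3.6),
    "Dothan, 10-cycle L2 @ 3.0 GHz (hypothetical)": (10, 3.0),
}

for label, (cycles, ghz) in chips.items():
    print(f"{label}: {latency_ns(cycles, ghz):.2f} ns")
```

Under these assumptions, holding a 10-cycle L2 at 3 GHz would mean the cache array answering in about 3.3 ns instead of 5 ns, which is the kind of tightening the post argues the design was never meant for.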

BTW, don't underestimate the performance effect of this either; it's an essential part of Dothan's excellent performance, and a far greater factor in Prescott's "mediocreness" (word?) than the increased pipeline length that is always quoted.

Anyway, I stand by my predictions/claims. *Dothan* (and not a revised, tweaked Dothan core with a Prescott-esque L2), even if freed from its 27W or whatever thermal envelope, is not likely to scale to much more than ~2.4-2.5 GHz on 90nm as a standard product. For K8, I firmly expect to see 3(+) GHz parts on 90nm. And no liquid-cooled 3 GHz Dothan screenshots or 5% max Venice overclocking reports will make me change my mind on that any time soon.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
Edited by P4Man on 05/11/05 07:49 PM.
 
>It does for the 740 which CPU-Z screenshot was posted above that paragraph, but I could have misread.
Actually, after you posted, I checked that part more thoroughly and I agree that they don't make things very clear.

They state "2.6Ghz is stable and validated for use", with a huge CPU-Z screenshot showing ~2670 MHz speed and an 888 (4x222) MHz FSB, but I'm thinking the stable one wasn't this config, but rather the 755 OCed to 2.68 GHz on a more conservative ~533 (4x134) MHz FSB; I think that's it, because there's a picture of such a configuration earlier on that same page.
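One way to tell the two screenshots apart is to work backwards from the numbers: the effective FSB is four times the base clock (quad-pumped), and the multiplier is roughly core clock divided by base clock. A small sketch, using the figures quoted above; rounding to the nearest integer multiplier is an assumption, since the screenshot values are approximate.

```python
def effective_fsb_mhz(base_mhz: float) -> float:
    """Quad-pumped FSB rating: four transfers per base clock."""
    return 4 * base_mhz


def infer_multiplier(core_mhz: float, base_mhz: float) -> int:
    """Nearest integer multiplier implied by a core clock and base FSB clock."""
    return round(core_mhz / base_mhz)


# Screenshot config: ~2670 MHz core, 222 MHz base clock.
print(effective_fsb_mhz(222), infer_multiplier(2670, 222))   # 888, 12

# Conservative config: 2680 MHz core, 134 MHz base clock.
print(effective_fsb_mhz(134), infer_multiplier(2680, 134))   # 536, 20
```

So the two configurations reach nearly the same core clock with very different multipliers (about 12x versus 20x), which fits the reading that they are two different chips.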

Kind of confusing though. :frown: Not very straightforward to read.

Anyway, you do have a point. I'm sorry for being pedantic earlier, but it's just that Dothan sometimes feels so much superior to Prescott that we think it could do just about anything. Of course, that would also be naive. I just felt that I had to try and convey the impression that a dual-core Dothan could possibly be better suited than Prescott as a <i>temporary</i> solution until Conroe/Merom show up. Many people think that, and with that particular general idea, I'm rather inclined to agree.

Then again, <b>this is probably all just a consequence of Prescott being such an incredibly sh!tty processor! </b>Centrino as is, as nice and beautiful as it is for notebooks, is no serious contender to K8, and there's no way to disagree with <i>that</i>. So while there are daydreams about Centrino ruling over Prescott, it isn't and will not be a solution to Intel's problems. Intel is in deep trouble right now. Got to agree with that too. So, putting everything in perspective: you're absolutely right, Centrino isn't quite such a good idea either. Not for the big picture; maybe just enough to warrant that converter for a very few enthusiasts though. It's nice and colorful for a few guys, but it's no long-term solution, and it's probably not a viable mass-market solution... if you think about it...

In any case, enough said. Nice discussion here. Bottom line (old one, but just in case):

OMFG, PRESCOTT <b>S U C K S</b> SOOO MUCH!!!!!!

Edited by Mephistopheles on 05/12/05 02:13 AM.
 
>You still don't get it do you?
>Pentium-M is targeted at mobile users

It doesn't matter where Intel's marketing *targets* it. A64 X2 is targeted at enthusiasts; does that mean it wouldn't make a suitable chip for a workstation, or even a low-end server if you use registered RAM with it? XP-M was targeted at mobile users; does that mean it wasn't a great overclocking desktop chip too?

What matters is that Dothan is *designed* for low power and high efficiency, which by itself doesn't mean it couldn't be considered elsewhere, but in this case the design simply involves compromises that make it unsuitable for the high-end desktop. 64-bit is one factor, clockability another.

>Even though it can go a lot more higher Intel doesn't want
>to cuz it's SOLE FUCCKING PURPOSE is for mobiles and mobiles
> only!

That's hardly an argument. If Dothan could have clocked higher, who cares what purpose it currently serves?

>Don't try to disagree with other unless you know what it
>really it. *sigh*

He knows what it is, just wondered what it could also be.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
>So uhhh... what were you two arguing about again? I forgot!

You could try scrolling up and reading? But since I'm in a good mood, let me summarize: Mephisto claimed Dothan *could* be a K8/P4 killer if Intel wanted; I claim it can't, even if Intel wanted; both agree it isn't and Intel doesn't want it :)

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
And finally, I did have to give in to P4Man - the original enthusiasm was exaggerated. He made me think about it, and his conclusion sounded perfect to me.

My latest point was that the exaggerated enthusiasm is probably because Prescott stinks so badly. Dothan was a step forward and is great, and this causes frustration, but dothan is no K8 killer, P4Man is right.

I just hope Intel does good with Conroe/Merom - the market has been a little stalled in terms of scaling because of Intel.

BTW, I've been having a really hard time to avoid writing "Monroe/Cerom", for some reason. Damn you, Marilyn - and I don't even know who you are...

Edited by Mephistopheles on 05/12/05 11:28 PM.
 
>My latest point was that the exaggerated enthusiasm is
>probably because Prescott stinks so badly

Does it? Some not quite related issues aside (new socket, new motherboards, DDR2... and the associated performance "gains" and higher prices), its only real major flaw is power consumption. And yes, performance may not be what was expected compared to NW, but mostly, like I said, because the cache and pipeline were designed to hit far higher clock rates. I have little or no doubt it could clock *far* higher if Intel could fix the power consumption.

This is IMHO the irony: if 65nm indeed is all Intel claims it to be, it sounds like the *perfect* medicine for Prescott. Yet Intel won't fully apply this cure, because they have now decided both to abandon NetBurst and to go dual core all the way, two major factors which mean we may never see Prescott's true potential.

So I think Prescott's design wasn't that bad, it was mostly badly timed: Northwood should have been the 90nm P4, and Prescott its 65nm successor. I think both could have made excellent chips. Could it be Intel pulled Prescott forward to be able to enable iAMD64?

As for Conroe... clearly no one knows, but I fear Intel keeps overcorrecting, going from one extreme to the other and never striking the right balance. Prescott, especially on 90nm, was overpipelined and "too clockable", but I'm really not sure a 2.5 GHz core by the end of 2006 will be such a formidable all-round performer either. Especially not if it inherits most of Dothan's basic layout (my expectation, a bit like K7->K8), because no matter how many IPC-increasing tricks you implement, some things just only scale with clock.
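As a rough way to frame the IPC-versus-clock trade-off, one can use the first-order model performance = IPC x clock, which ignores memory effects entirely. Under that model, a lower-clocked core needs an IPC advantage equal to the clock ratio just to break even. The 2.5 GHz figure is from the post above; the 3.8 GHz competitor clock is a hypothetical value chosen for illustration.

```python
def required_ipc_ratio(fast_clock_ghz: float, slow_clock_ghz: float) -> float:
    """IPC advantage the slower-clocked core needs to match the faster one,
    assuming performance is simply IPC x clock (a deliberate simplification)."""
    return fast_clock_ghz / slow_clock_ghz


# Hypothetical: a 2.5 GHz core against a 3.8 GHz core.
ratio = required_ipc_ratio(3.8, 2.5)
print(f"Break-even IPC advantage: {ratio:.2f}x")
```

Anything in the workload that scales only with clock, as the post puts it, pushes the real break-even point higher than this simple ratio suggests.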

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
>Especially not if it inherits most of Dothan's basic layout (my expectation, a bit like K7->K8), because no matter how many IPC-increasing tricks you implement, some things just only scale with clock.
You do have a point there, but in any case, there's a discussion to be had: clock speed can physically only go so high. Gigahertz is already pretty damned fast. Tens of gigahertz is probably unfeasible, if you consider current design paradigms. You'd probably have to change a LOT of things and a LOT of the ways things are getting done now.

In any case, either the way we do chips changes to accommodate more clock, or there has to be a clever application of resources. And personally, I'm inclined to think that a clever application of resources can probably still give surprisingly good results, regardless of clock speeds.

It's still a rather good debate, though. A question: Is K8 such a well-designed architecture that it already is close enough to the maximum IPC asymptote? I mean, K8 is good, but the CPU designers can't possibly have exhausted all IPC-increasing options. Not just a few small tricks, but streamlining system interfaces, completely changing processor layout and inner workings, and so on...

But then again, I'm the forum's resident Eternal Optimist, so I'm not truly qualified to answer that with 100% accuracy. I for one would just be way too disappointed if all we could do to increase current performance levels by more than a notch would be to increase clock...

Edited by Mephistopheles on 05/13/05 02:07 AM.
 
>You do have a point there, but in any case, there's a discussion there: clock speed can only go physically so high. Gigahertz is already pretty damned fast. Tens of gigahertz is probably unfeasible, if you consider current design paradigms. You'd probably have to change a LOT of things and a LOT of the ways things are getting done now.

>In any case, either the way we do chips changes to accommodate more clock, or there has to be a clever application of resources. And personally, I'm inclined to think that a clever application of resources can probably still give surprisingly good results, despite of clock speeds.

>It's still a rather good debate, though. A question: Is K8 such a well-designed architecture that it already is close enough to the maximum IPC asymptote? I mean, K8 is good, but the CPU designers can't possibly have exhausted all IPC-increasing options. Not just a few small tricks, but streamlining system interfaces, completely changing processor layout and inner workings, and so on...

>But then again, I'm the forum's resident Eternal Optimist, so I'm not truly qualified to answer that with 100% accuracy. I for one would just be way too disappointed if all we could do to increase current performance levels by more than a notch would be to increase clock...

There are probably more "tricks". However, the thing to keep in mind with any of this is how much those tricks will gain you vs. the cost in transistors, power, etc. The K7/K8 is already beyond the point of diminishing returns IMO when it comes to ILP. We've got 9 execution units and 9 issue ports, and its achieved IPC is maybe 1-2 on a good day, and that's with 3-way decoders. Like it or not, there simply isn't that much ILP to be extracted out of your everyday code.

However, that is pretty much irrelevant. Memory is the main limitation, and the processor that has the better memory subsystem wins. Prescott has quadruple the L1 data latency of NW and 2x the L2 latency. Its other architectural enhancements, which did fix a lot of problems that existed on NW, only serve to mask this rather than provide any substantial gains in ILP.
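The point about cache latency dominating can be illustrated with the standard average memory access time (AMAT) model: L1 latency plus the L1 miss rate times the cost of going to L2, and so on down to memory. The sketch below is not a model of Northwood or Prescott; every latency and miss rate in it is a made-up illustrative value, chosen only to show how doubling cache latencies moves the average.

```python
def amat_cycles(l1_hit: float, l1_miss_rate: float,
                l2_hit: float, l2_miss_rate: float,
                memory: float) -> float:
    """Average memory access time in cycles for a two-level cache hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * memory)


# Hypothetical "low latency" hierarchy: all numbers are illustrative.
low = amat_cycles(l1_hit=2, l1_miss_rate=0.05, l2_hit=10, l2_miss_rate=0.2, memory=200)

# Same miss rates and memory latency, but L1 and L2 latencies doubled.
high = amat_cycles(l1_hit=4, l1_miss_rate=0.05, l2_hit=20, l2_miss_rate=0.2, memory=200)

print(f"low-latency caches : {low:.2f} cycles per access")
print(f"high-latency caches: {high:.2f} cycles per access")
```

With these made-up inputs the average access cost rises from about 4.5 to about 7 cycles, even though nothing about the execution units changed.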

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
 
>I wouldnt be too quick to conclude a "full 10%" boost from by looking at just two widely known memory bandwith dependant benchmarks like WinRAR and ...<

IMHO, for the "same CPU core" at the same final CPU speed, the boost in performance could be like this:
if it is running at a 133 MHz FSB clock, raising it to 200 MHz will give approx. a 13% boost [200/133 = 1.5, then take the cube root to get an index of roughly 1.13-1.14], as some tests published a while ago showed. If at the same time you raise the memory clock by 50% (at a 1:1 divider), you get 25% more system performance in total [1.25 index in total].
For memory performance, see my article under comp/benches (updated) on my site ...
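The cube-root rule of thumb described above is easy to evaluate directly. This is only a sketch of that arithmetic: the cube-root exponent is the poster's empirical rule, not an established law, and the function name is made up for illustration.

```python
def fsb_speedup_index(old_fsb_mhz: float, new_fsb_mhz: float,
                      exponent: float = 1 / 3) -> float:
    """Estimated performance index from an FSB increase at constant core clock,
    using the rule of thumb index = (new / old) ** exponent."""
    return (new_fsb_mhz / old_fsb_mhz) ** exponent


index = fsb_speedup_index(133, 200)
print(f"133 -> 200 MHz FSB: index {index:.3f} ({(index - 1) * 100:.1f}% faster)")
```

Evaluated exactly, the rule gives an index a little above 1.14, so the quoted 13% is best read as a rounded, empirically observed average rather than the precise output of the formula.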

--
Regards , SPAJKY ® http://www.spajky.vze.com
3rd Ann.: - "Tualatin OC-ed/BX-Slot1/inaudible setup!"
 
>You do have a point there, but in any case, there's a
>discussion there: clock speed can only go physically so high.
> Gigahertz is already pretty damned fast.

I'm not sure there is a hard limit somewhere, let alone I'd know where it would be, but I remember when 10 MHz was already pretty damn fast.

>It's still a rather good debate, though. A question: Is K8
>such a well-designed architecture that it already is close
>enough to the maximum IPC asymptote?

There is no IPC asymptote; it's not like >1 IPC isn't possible. I do think, however, that K8 is currently one of the most (if not the most) <i>balanced</i> designs.

>I mean, K8 is good, but the CPU designers can't possibly
>have exhausted all IPC-increasing options.

Of course not. Just look at the late Alpha EV8 design. AMD and Intel, however, are not on a quest to achieve maximum theoretical IPC. They also need to take into account economic aspects as well as power, design complexity, yield, die size and, last but not least, clockability. What good is an overly complex chip design that achieves uber theoretical IPC, but consumes excessive amounts of power, is 400mm² large, hardly scales beyond 1.5 GHz, and doesn't perform anywhere near its theoretical performance on real-world code? Oh wait, it exists; it's called Itanium 😛

>Not just a few small tricks, but streamlining system
>interfaces, completely changing processor layout and inner
>workings, and so on...

Not sure what you are trying to say here, and I assume you don't either 😛. However, if your question is whether there is any low-hanging fruit left to extract more ILP from current x86 designs, I would be inclined to say: not a whole lot. The best tricks have been implemented, though not necessarily all in the same CPU, but if you mix and match the best features of current CPUs, I think you're getting pretty close to what is realistically achievable. A few decades ago there were many big things on the horizon, like OoO, pipelining, on-die cache, etc. Today, I don't see any of those, at least not as far as ILP is concerned. IMHO, ILP scaling will be all but dead and buried by the end of this decade, and any performance boosts will mainly come from TLP (multicore), transistor scaling (frequency), and hopefully from further decreasing the memory bottleneck.

>I for one would just be way too disappointed if all we could
>do to increase current performance levels by more than a
>notch would be to increase clock...

Performance has always been increased in small steps. Not a single x86 cpu launched in the last two decades suddenly gave a huge performance boost.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
A while ago there was a lot of testing & reviews of the then-new P4 Northwood on a 200 MHz FSB (800 quad-pumped) against its predecessor on a 133 MHz FSB (533 quad-pumped) at the same final core clock, which revealed approximately the difference in performance I stated. Gains ranged from a few % to a few tens of %, but the average was that 13%, which is not huge, but noticeable (some applications were more affected than others) ...

--
Regards , SPAJKY ® http://www.spajky.vze.com
3rd Ann.: - "Tualatin OC-ed/BX-Slot1/inaudible setup!"
 
But wasn't that mainly due to the P4's architecture, e.g. the really long pipeline and so forth? Much like how Hyperthreading would benefit the longer-pipelined P4 more than, say, the K8.

<A HREF="http://nfiniti.blogspot.com" target="_new">nfiniti plus one - my blog</A>
 
>I'm not sure there is a hard limit somewhere, let alone I'd know where it would be, but I remember when 10 MHz was already pretty damn fast.

With modern manufacturing technology, sure there is. You'll hardly get flip-flops that work with a delay of 1 ps to reach 1 THz, let alone the combinational logic in between that those pipeline registers are supposed to speed up.
The operational delay of your slowest component (which, if you continually break up your circuit, will be your flip-flops) will most certainly limit your clock speed, and I doubt any have reached 1 ps yet.
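The limit described here follows from the fact that one clock period has to cover the flip-flop overhead plus the logic between two pipeline registers. A minimal sketch of that relation; the 50 ps and 200 ps delays are hypothetical values for illustration, and only the 1 ps / 1 THz pair comes from the post.

```python
def max_clock_ghz(flip_flop_delay_ps: float, logic_delay_ps: float) -> float:
    """Highest clock in GHz for one pipeline stage: the period must cover the
    flip-flop overhead plus the combinational logic delay."""
    period_ps = flip_flop_delay_ps + logic_delay_ps
    return 1000.0 / period_ps  # 1000 ps period corresponds to 1 GHz


# A 1 ps total stage delay is what 1 THz (1000 GHz) would require.
print(max_clock_ghz(1.0, 0.0))

# Hypothetical stage: 50 ps of flip-flop overhead plus 200 ps of logic.
print(max_clock_ghz(50.0, 200.0))
```

Splitting the logic into more stages shrinks the second term, but the flip-flop overhead stays, which is why it ends up setting the ceiling.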

>There is no IPC asymptote, its not like >1 IPC isn't possible. I do think however, that K8 is currently one of the most (if not, the most) balanced designs.

Depends on what you're talking about. Theoretical software models can provide plenty of IPC. Realistically, that's not so. Hell, look at Itanium.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
 
It has nothing to do with that; it is the same with AMD. When you raise the FSB, you also raise system performance a bit; if you add a memory clock increase to that, you gain even more. That's why OC-ers like it when they can unlock the CPU and push the FSB (and/or memory) higher with a lower multiplier. Well, the latest CPUs are all more or less locked ...

--
Regards , SPAJKY ® http://www.spajky.vze.com
3rd Ann.: - "Tualatin OC-ed/BX-Slot1/inaudible setup!"