AMD states K8L aka Barcelona faster than all Intel cores

There will be no clock speed war. That has been over ever since NetBurst died. Intel could reach 3.8GHz only with the NetBurst architecture, not with Core. Remember, to clock higher your pipeline needs to be deeper (P4 Prescott has 31 stages, Core has 14).
This is true, but remember that people have overclocked Core 2 beyond 3.8GHz, which suggests that the architecture itself is capable of exceeding that. My understanding is that a deeper pipeline reduces the logic delay in each stage, thus allowing you to reach higher clock speeds. Bear in mind though that NetBurst was designed to reach 10GHz, which explains the very long pipeline. Core 2 would never be able to reach this, but it clearly has no problem going over 4GHz.
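
To put rough numbers on the pipeline-depth argument, here is a minimal sketch of the usual idealized model: the cycle time is the total logic delay split evenly across the stages, plus a fixed per-stage latch overhead. The stage counts (31 and 14) come from the post above; the delay and overhead values are made-up placeholders, not measurements of any real chip, so only the direction of the result means anything.

```python
def max_clock_ghz(total_logic_delay_ns: float, stages: int, latch_overhead_ns: float) -> float:
    """Upper bound on clock frequency for an idealized, evenly split pipeline."""
    # Each stage has to fit its share of the logic plus the latch overhead into one cycle.
    cycle_time_ns = total_logic_delay_ns / stages + latch_overhead_ns
    return 1.0 / cycle_time_ns  # 1 / ns = GHz


# Hypothetical values, for illustration only.
TOTAL_LOGIC_DELAY_NS = 3.0
LATCH_OVERHEAD_NS = 0.05

deep_pipeline = max_clock_ghz(TOTAL_LOGIC_DELAY_NS, 31, LATCH_OVERHEAD_NS)
short_pipeline = max_clock_ghz(TOTAL_LOGIC_DELAY_NS, 14, LATCH_OVERHEAD_NS)

print(f"31 stages: {deep_pipeline:.2f} GHz upper bound")
print(f"14 stages: {short_pipeline:.2f} GHz upper bound")
```

The model ignores thermals and branch misprediction cost entirely, which is exactly where the real chips ran into trouble.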

Something I did forget though is that they reached 3.8GHz with a single core CPU. Given process refinements, 3.6 to 3.8GHz is probably the limit they could do at present on a dual core with decent thermals. Intel's limit has always been keeping thermals under control rather than architectural limits, which seem to be what is holding AMD back at present.
 
I must have missed something... you think "K8L" can beat Core 2, but not Kentsfield? Isn't Kentsfield just 2 dual core Conroes with a little more cache?
So if dual core K8L could beat Core 2 in your opinion, then why couldn't quad core K8L beat "quad core" Core 2? I know the article refers to 4x4, but I don't really see 4x4 as a permanent solution. It seems to me that it is more of a stopgap or marketing ploy to slow down the Intel train. The true quads will be where it's at.

And to the other poster, Julain, AMD has sold 3GHz+ Opterons to Sun Micro. At least that is what was reported. We cannot buy them; all we have is the report that AMD sold these 3GHz CPUs to Sun.

And my final statement.... who the hell cares.... and who the hell knows. You kids need to stop wasting your time on this crap. You guys are about the only ones that really care. In the end nothing you guys talk about will either make or break either company. AMD has a foothold now and they are not going anywhere. Go outside, get some sun.... do something other than argue about something that cannot be proven.
 
Okay, after reading through this thread it seems there are 2 points of view:
Intel fanboys - Intel pwns, AMD can't beat them with K8L, and
AMD fanboys - Haha, Intel is gonna lose the lead again

All this aside, it annoys me when people say "AMD can't come up with a Conroe counter in the near future". Well, unless you work for AMD's tech dept. I don't think you are qualified to say that. Anyway, my $0.02 worth.

That's because you have people on both extremes fighting over something that does not need to be fought over.

Truth is, K8L may in fact be faster. We cannot say this factually, since no one has toyed with a working sample. We can only speculate from the specifications released.

At the same time, Intel are not sitting on their laurels the way AMD did with their K8. By the time K8L launches (the server Opteron variant first), Intel will be ready to launch their 45nm parts and the first Xeon processors using the CSI shared interconnect bus (IMC will only come in 2008).

K8L's desktop variant will likely face Intel's 45nm native quad core, CSI-interconnected desktop variant. The main difference is that this new Intel architecture will need a new platform (motherboard, socket, chipset).

So as I've been saying for quite some time... we will not be witnessing either company dominating the other for 3 years as we did with the K8 over NetBurst. It will be as it was before, with each side trading blows, kind of like an nVIDIA vs. ATi scenario for those not old enough to remember the Pentium !!! vs. Athlon days.
 
The proverbial proof is in the proverbial pudding.

Jack

Just bookmarking.

Anyway, I found this interesting (and so did the author, it seems):

When talking to both Phil and Pat, we got some interesting answers about advancements in Instructions Per Clock with the Barcelona. Pat Conway said that we will see a “performance improvement of 150% over the next two years that will be within the same power envelope.” If you take into account that you can back off frequency and add more cores, some of this can be achieved now if you are counting IPC across all cores. (See below how simply pulling back on frequency by as little as 16% can have a huge impact on power draw.) Intel is bound by the same principles as well.

On IPC and AMD’s Barcelona, Phil Hester had this to say: “We want to stay focused on upgrade compatibility. We must realize more IPC per watt, and our next-generation architecture will show a 50% improvement of IPC per watt.”
(http://enthusiast.hardocp.com/article.html?art=MTE3NCwxLCxoZW50aHVzaWFzdA==)
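
As a rough check on the "16% less frequency, big drop in power" remark in the quoted article, here is a small sketch using the textbook dynamic power relation P ≈ C·V²·f. The 16% figure is from the article; the assumption that core voltage can be lowered in the same proportion as frequency is mine, and leakage power is ignored, so treat the result as a ballpark only.

```python
def relative_dynamic_power(freq_scale: float, voltage_scale: float = 1.0) -> float:
    """Dynamic power relative to baseline, using P ~ C * V^2 * f with C unchanged."""
    return voltage_scale ** 2 * freq_scale


freq_scale = 1.0 - 0.16  # frequency pulled back by 16%, as in the article

# Case 1: frequency lowered, voltage left alone.
same_voltage = relative_dynamic_power(freq_scale)

# Case 2 (assumption): voltage lowered in the same proportion as frequency.
scaled_voltage = relative_dynamic_power(freq_scale, voltage_scale=freq_scale)

print(f"Same voltage:   {same_voltage:.0%} of baseline dynamic power")
print(f"Scaled voltage: {scaled_voltage:.0%} of baseline dynamic power")
```

Under the scaled-voltage assumption the cubic dependence does most of the work, which is why backing off the clock and adding cores is attractive within a fixed power envelope.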

(Just haven't got the time to go through C2D vs Barcelona IPC/Watt; it should be interesting, though, supposing Barcelona's 50% IPC/Watt improvement over the K8...).


Cheers!
I was starting to think I was the only one who noticed this sentence in HardOCP's article, and it comes straight from AMD.
Well, 50% more IPC per watt (over K8) means that the actual IPC advantage will be (sadly) much less, and far from the 80% that the Baron speculates.
Why?
Because K8L has many more advanced power saving features, including individual clocking of the 4 cores, compared with a standard 2x dual-core K8 setup (if that is the reference for the comparison).
But AMD is quite ambiguous in its terms here...
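
The reasoning above is simple arithmetic, sketched here. The 50% IPC-per-watt figure is the one quoted from AMD; the power ratios are hypothetical placeholders, since AMD did not say how much of the gain comes from lower power and how much from higher IPC.

```python
def implied_ipc_gain(ipc_per_watt_gain: float, power_ratio: float) -> float:
    """IPC ratio (new / old) implied by an IPC-per-watt ratio and a power ratio (new / old)."""
    # (IPC_new / W_new) / (IPC_old / W_old) = gain  =>  IPC_new / IPC_old = gain * (W_new / W_old)
    return ipc_per_watt_gain * power_ratio


IPC_PER_WATT_GAIN = 1.5  # "50% improvement of IPC per watt", per the quoted statement

# Hypothetical power ratios, for illustration only.
for power_ratio in (1.0, 0.9, 0.8):
    gain = implied_ipc_gain(IPC_PER_WATT_GAIN, power_ratio)
    print(f"power at {power_ratio:.0%} of K8 -> IPC at {gain:.0%} of K8")
```

Only if power stays exactly the same does the full 50% show up as IPC; any power saving eats into it.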
 
At least for us megataskers. :wink:
You mean megabull$hiters?


That's your job. Should I take a pic of my multimon setup with VS 2005 and SQL Server running?

With lots of IE tabs and 2 different email programs, plus a systray full of items? Noo, that would just make you even worse. I guess if I actually said what my current project is that would be even worse.

So what's it like to have no life?
 

What CPU are you running now?
 
Can anyone help me on this one? The only tangible information in the article on which this topic is based is the alleged “50% improvement of IPC per watt”.

Now, how would that work exactly? Instructions per Cycle per watt… :? :roll:

You know what? I don’t care.

I didn’t even find anything in the article that would remotely suggest AMD or its representatives stated K8L would be faster than C2D (so much for the title of this thread).


I really hope AMD comes up with something great. But I have to say this post can only be for the joy of fanboys (from either side).

It's simple. If your drive current can be lowered, you will save power running each cycle.

AMD HAS said that Barcelona will be 80% faster than Opteron and FX. That is their measure, not Core 2, not Kentsfield.
 
You don't need to work in AMD's tech department to know this: roadmaps show at best a 6000+ by end of year, and 4x4 is not going to cut it. Intel will have the performance lead into 2007, and AMD will not have anything to compete with C2D until mid 2007; this is from AMD.

Aside from that, AMD has a rough year ahead, but as long as they deliver the playing field will remain competitive.

Yeah, Intel got a real jump with their C2D. But what about the quad cores? What I mean is, with Intel leading the way in the new year, wouldn't they move to quad cores, raising the bar even higher for AMD?
 
second, since when does a cpu with 4 cores not get to be called "true" quad core because it's not designed a certain way.

That's like saying a car with 2 V6s is the same as a car with a V12. Not saying that the 2 V6 car would be slower, just that it is not the same because of the design.

Quite like the analogy i made.

3 teams are trying to break the land speed record.

Team A uses 4 cars which all reach 400mph.
Team B uses 2 cars which both reach 500mph.
Team C uses 1 car which reaches 800mph.

Which team went the fastest? Cumulatively Team A won, but in the case of the record it's the fastest speed per instance that counts. If Team C were to use 2 cars they would blow them all away. Raw power is what is needed.
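
Spelled out with the numbers from the analogy, the two ways of scoring look like this (a quick sketch; the team data is exactly what is listed above, nothing else is assumed):

```python
# Speeds in mph for each car, straight from the analogy.
teams = {
    "A": [400, 400, 400, 400],
    "B": [500, 500],
    "C": [800],
}

# "Cumulative" score: add up every car (the multicore throughput view).
cumulative = {name: sum(speeds) for name, speeds in teams.items()}

# "Record" score: only the single fastest car counts (the per-core view).
peak = {name: max(speeds) for name, speeds in teams.items()}

print("Cumulative winner:", max(cumulative, key=cumulative.get), cumulative)
print("Record winner:    ", max(peak, key=peak.get), peak)
```

Which score matters depends on whether the workload can actually be spread across the cars.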

It seems people are going mad with multicore lately, which is a mistake. Core 2 Duo isn't successful because of its multiple cores, it's because of its raw power. Until AMD increase the power of each core they are going to sit behind Intel. There is a limit to what the human brain can do at once. Marketing always makes me laugh: "play games, surf and check email all at once". If someone can teach me how to do all those simultaneously I'll stand on my head too.
 
AMD HAS said that Barcelona will be 80% faster than Opteron and FX. That is their measure, not Core 2, not Kentsfield.
Baron, could you please provide a link?
I really can't remember reading anything like that..
It would also be interesting to know what they mean by 80% faster.. is it core for core?
Or a quad core K8L outperforming a dual core K8, and in which application scenario..?
Frankly, according to what they have published about the K8L architecture so far, I could foresee an 80% improvement (core for core) only in synthetic SSE benchmarks.
 

He's making it up.. 😛
 

Yes Baron, please give us a picture! Just make sure not to leave any of your janitorial supplies in the background this time.
 

It's simple. If your drive current can be lowered, you will save power running each cycle.

AMD HAS said that Barcelona will be 80% faster than Opteron and FX. That is their measure, not Core 2, not Kentsfield.


What I meant with my previous post is that the statement is rather vague.

According to the article, Phil Hester stated: “50% improvement of IPC per watt”. To say he’s talking about a more energy efficient design is a given… So what!?

Hence my question: how would that work? How will they manage to do that? Those questions are not addressed in the article (and I can’t agree with your simplistic approach to it). The article is, by the way, mostly a long wander around whether quad cores are going to be needed, by whom and for what purpose.

There’s nothing in that article that justifies this thread. At least not in the way it was proposed.


Also, you should pay more attention to what’s written. It was not I who said Barcelona would be faster than Core 2 (or Kentsfield for that matter). It was a criticism of the title of this thread.
 
The Baron has predicted it and has also told AMD how to do it.

How can anyone here doubt the word of such a legend? He has told AMD and Intel how to improve their CPUs for years now, and every one of his ideas that has been implemented has put the industry years ahead in development. He will even point you to the sites where he told AMD and Intel what to do.
 
I do want to address this better in the near future. However, we can get a lower-limit impression using a Core 2 Duo for this purpose, assuming linear scaling. I tend to agree with you to an extent: Tom's data set is rather good, but not enough has been done to be conclusive. I believe I worded my response as 'the trend appears to not support' rather than 'it will not saturate under any circumstance', a major difference in wording; hopefully I was careful enough to leave reasonable doubt.

However, I believe the trend is in a direction that suggests that for most applications FSB throttling will be a non-factor overall.

I have run 2 major experiments on the X6800. The first ratchets the system clock down to 100 MHz (a 400 MHz effective bus), yielding a paltry 3000 MB/s read BW and about 2000 MB/s write (measured by Everest). I then ramped the multiplier up to 20, giving an effective CPU clock to system bus ratio of 5:1. I then measured 2 games, rendering, and synthetics with a mix of rendering, encoding, etc. (PCMark05). The speed scaled linearly all the way to the 20x multiplier. I repeated this with a 133 MHz (533 MHz effective) FSB, and again could not saturate the bus with any application except WinRAR using its direct memory bench algorithm. WinRAR easily saturates the memory-GMCH link before the FSB for memory speeds < DDR-667.
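
For reference, the theoretical ceiling those measured figures sit under can be worked out in a few lines. This sketch assumes the standard Core 2 front side bus layout (64-bit data path, quad-pumped) and decimal megabytes; the 3000 MB/s and 2000 MB/s figures are the measured ones from the post above.

```python
FSB_WIDTH_BYTES = 8        # 64-bit data bus
TRANSFERS_PER_CLOCK = 4    # quad-pumped


def fsb_peak_mb_s(base_clock_mhz: float) -> float:
    """Theoretical peak FSB bandwidth in MB/s (1 MB = 10^6 bytes)."""
    return base_clock_mhz * TRANSFERS_PER_CLOCK * FSB_WIDTH_BYTES


peak_at_100 = fsb_peak_mb_s(100)   # the 100 MHz (400 MHz effective) test
measured_read = 3000               # MB/s, as reported above
measured_write = 2000              # MB/s, as reported above

print(f"Peak at 100 MHz base: {peak_at_100:.0f} MB/s")
print(f"Read reaches  {measured_read / peak_at_100:.1%} of peak")
print(f"Write reaches {measured_write / peak_at_100:.1%} of peak")
```

So the read figure at 100 MHz is already close to what that bus can deliver on paper, which makes the linear scaling up to 20x all the more interesting.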

Effectively, I could foresee a few cases where the FSB demand is high enough to saturate, but based on the data set I collected, it would appear that a 5.3 GHz Core 2 Duo running both cores at 100% across a variety of apps does not saturate the FSB... given this, I can see why Intel chose 2.67 GHz as the release speed, aside from power concerns.

I will publish my data set in a few weeks, I am still collecting data at 200 MHz, 233, and 267 to ensure that the scaling goes correctly.

One thing is certain, I have completely been able to blow the Xbitlabs memory part 1 analysis out of the water.

I am now working on a Perfmon counter scheme to try to detect the BW utilization on a C2D in real time. I don't think a P4 comparison is valid: the memory subsystem (cache, prefetchers) is so different that equating P4 bus demands to C2D bus demands would be erroneous at best.

Jack
You have a X6800? Lucky. Just for interest did you get a P965, i975X or other chipset? People say the P965 is better for overclocking, but the i975X gives better performance at a given clock speed.

In any case, I may have worded my response too strongly, so hopefully I wasn't taken the wrong way. From your description I gather you were testing for decreases in performance scaling as you increased clock speed via the multiplier at a given FSB speed. The linear scaling even on a paltry 400MHz FSB is pleasantly surprising. I guess we have the large 4MB L2 cache and the superior prefetchers to thank for that, since most of the FSB traffic is just pre-filling the L2 cache, something that is rather orderly and planned by the prefetchers, rather than being consumed by supplying data to satisfy pipeline stalls.

I was wondering if you could run some tests with the following configurations: 2.67GHz (133x20), 2.67GHz (167x16), 2.6GHz (200x13), 2.67GHz (267x10), and 2.67GHz (333x8), with as standard a memory speed as possible. I'm interested in this sort of configuration since it'll take into account the latency benefits of a faster FSB in addition to the bandwidth benefits. Personally, I don't think there will be much difference between a 1333MHz and 1067MHz FSB, but I'm thinking there may be a more noticeable difference between 1067MHz and 800MHz, and performance will drop off from there. Obviously, if you don't have time or aren't interested that is fine.

Your efforts are much appreciated and I'd certainly like to see more of the linear scaling data that you've already obtained. Maybe you could make another chart like the one with SuperPi scaling?
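
The proposed configurations are easy to tabulate. A small sketch, assuming the rounded base clocks in the post stand for the usual thirds (133 = 400/3 MHz and so on) and that the effective FSB rate is four times the base clock:

```python
# (base clock in MHz, multiplier) for each requested configuration.
configs = [
    (400 / 3, 20),   # "133x20"
    (500 / 3, 16),   # "167x16"
    (200.0, 13),     # "200x13"
    (800 / 3, 10),   # "267x10"
    (1000 / 3, 8),   # "333x8"
]


def core_clock_mhz(base_mhz: float, multiplier: int) -> float:
    """CPU core clock: base clock times multiplier."""
    return base_mhz * multiplier


def effective_fsb_mts(base_mhz: float) -> float:
    """Effective (quad-pumped) FSB rate in MT/s."""
    return base_mhz * 4


for base, mult in configs:
    print(f"{base:6.1f} x {mult:2d} -> core {core_clock_mhz(base, mult):6.0f} MHz, "
          f"FSB {effective_fsb_mts(base):6.0f} MT/s")
```

Holding the core clock near 2.67GHz while the bus goes from 533 to 1333 is what isolates the bus effect from the clock effect.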
 
I wonder if Socket F's registered DDR2 memory controller is also backwards compatible with unregistered DDR2 memory. Otherwise they would have to design a separate chip, which isn't very convenient. The Opteron 1xxx parts, which use unregistered DDR2, use AM2 rather than Socket F, so there isn't really an example of that yet.

As far as I see it (no pun intended!), there would be no probs for both AMD's sockets (AM2 & F) regarding unregistered DIMMs, since AMD & mainboard manufacturers could use terminators for the extra [registered] DIMMs & Socket's pins/pads traces (is this correct?); if true, the issue would rely on the mainboard's HT traces, hence, on the very compatibility when using registered DIMMs. This would lead, compulsorily I think, to two different mainboard designs for the same socket, one with & one without terminators. Would it have to imply a chip redesign, in any case? (assuming the IMC could support both DIMM types & no chip's pinout diff.).

Edit: I'm taking into account BM's words, when he addresses socket AM2 & a "crippled" 1 pin less ES chip, for compatibility's sake.


Cheers!
 
Now, here is what is really fun about this chip... at 400 MHz FSB, here are the idle and full load temperatures. Ambient was 72 deg C; core temperatures were measured with Core Temp.

Multiplier/Speed (MHz)/Idle (deg C)/Full load (deg C)
6/600/31/31
8/800/31/31
10/1000/31/32
12/1200/31/32
14/1400/31/32
16/1600/32/33
18/1800/32/34
20/2000/32/34

Phenomenal.

Jack

Amazing results indeed, if you consider a X6800 (top of the line).

Just curious: When you state «ambient was 72 deg C», I'd assume you're reporting 22º C (72º F) ambient temp. since, if the system's reporting 72º C (inside the case), that'd be too hot.


Cheers!
 
At least for us megataskers. :wink:
You mean megabull$hiters?

That's your job.
I believe you, Baron. With therapy like yours, objective reality is disordered and you have the opposite image of the world that surrounds you.
Should I take a pic of my multimon setup with VS 2005 and SQL Server running?
No Baron, it is OK. Just don't jump (out of the window); the hallucinations will be gone after 16 hours. Then you will be able to conclude that your multimon setup is the floor you are mopping.

WIth lots of IE tabs and 2 different email programs, plus a systray full of items? Noo, that would just make you even worse. I guess if I actually said what my current project is that would be even worse.
Oh... it seems that you are a very communicative person. Do you remember the last time you were somewhere with a friend (if you have any) or with a girlfriend/boyfriend (depending on which one you prefer)?

SO what's it like to have no life?
I don't know, but I really feel sorry about people like you.
 
Corrected. Yeah 72 F or 22 C, slip of the type, corrected above. Thanks, at stock conditions (Vcore is 1.30, 2.93 GHz) I idle 34-36 and full load at 40-41 measured by Core Temp.

I think I am getting these good temps for a few reasons: first, low ambient (22 C, as you pointed out); second, I used AS5 sparingly (a very thin layer) after thoroughly cleaning both the CPU and HSF with acetone and then alcohol; and finally, the case has outstanding front-to-back flow-through.

I positioned a thermocouple mid-case, coaxial with the Zalman CNPS9500 HSF (it has a vertically oriented fan) and about 6 inches away. It measures 25 deg C for the air temperature at the intake to the CPU fan.

Jack

Jack, when you ran it down as low as 6x100, what was the vCore? Was it still at ~1.30v, or can you drop it below that in your BIOS? I'm assuming that it was either 1.3v, or not too much lower, and the reason the temps didn't increase much from 600MHz to 2GHz was because of static vCore. :wink:
 
second, since when does a cpu with 4 cores not get to be called "true" quad core because it's not designed a certain way.

Thats like saying a car with 2 V6's is the same as a car with a V12. Not saying that the 2 V6 car would be slower, just that it is not the same because of the design.

Quite like the analogy i made.

3 teams are trying to break the land speed record.

Team A uses 4 cars which all reach 400mph.
Team B uses 2 cars which both reach 500mph.
Team C uses 1 car which reaches 800mph.

Which team went the fastest? Cumulatively Team A won but in the case of the record its the fastest speed per instance. If Team C were to use 2 cars they would blow them all away. Raw power is what is needed.

It seems people are going mad with multicore lately which is a mistake. Core2Duo isn't sucessful because of it's multiple cores, it's because of it's raw power. Until AMD increase the power of each core they are going to sit behind Intel. There is a limit to what the human brain can do at once. Marketing always makes me laugh "play games, surf and check email all at once". If someone can teach me how to do all those simultaneously anyway I'll stand on my head too.


You make an excellent point. The C2D's individual core performance is what is powering the CPU's success. Were Intel to market a single core variant, it would outclass, by no small margin, any of AMD's A64 single core CPUs.

It is worth pointing out though, that it is cheaper and faster to simply double the number of processors to increase potential performance than to design a new processor which doubles performance on a single core.

Using the automotive analogy, it would be cheaper and easier to put 2 GM 454 cu engines into a Lakes racer than to design and manufacture a 908 cu engine.

You are quite correct that the current state of programming (single program) does not make use of the potential that additional cores provide. However, multiple cores can increase single program performance when multiple programs are running, by allocating the tasks to separate cores. You, as the user, can assign which core a program will "prefer" (in Win XP) through the Task Manager "affinity" setting. As such, you can "free" a processor to devote its full power to a single program while the other core processes other programs. Not an ideal solution, but with the node "wall" not too far away, multicore CPUs are here to stay.
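
The same idea can be done from code rather than through Task Manager. A minimal sketch, with the caveat that this is not the XP mechanism described above: it uses Python's `os.sched_getaffinity` and `os.sched_setaffinity`, which are only available on Linux. The helper name `pin_to_one_core` is made up for this example.

```python
import os


def pin_to_one_core(pid: int = 0) -> set:
    """Restrict a process (0 = the current one) to a single core it may already use."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available on this platform")
    allowed = os.sched_getaffinity(pid)   # cores the process is currently allowed on
    chosen = {min(allowed)}               # pick the lowest-numbered allowed core
    os.sched_setaffinity(pid, chosen)
    return os.sched_getaffinity(pid)


if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    before = os.sched_getaffinity(0)
    after = pin_to_one_core()
    print(f"Allowed cores before: {sorted(before)}, after: {sorted(after)}")
    os.sched_setaffinity(0, before)       # restore the original mask
```

Pinning everything else to the other core is the manual part, which is why this is not an ideal solution.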
 
Yeah, look at tractor-pulls with 6-8 big-blocks in the tractors. They aren't necessarily the best pullers. Even though they have enormous amounts of torque, you have to make full use of the power (traction)... much the same as having software to take advantage of the quad's power. :wink:
 
AMD HAS said that Barcelona will be 80% faster than Opteron and FX. That is their measure, not Core 2, not Kentsfield.
Baron, could you please provide a link?
I really can't remember reading anything like that..
It would be also interesting to know, how they intend this 80% faster.. is it core for core?
Or a quad core K8L outperforming a dual core K8, and in which application scenario..?
Frankly, according to what they have published about the K8L architecture so far, i could foresee an 80% improvement (core for core) only in synthetic SSE benchmarks.

It's in one of the Hector Ruiz interviews. It was also mentioned in the AMD Analyst Day slides from June. According to these, they expect Barcelona to give around 80% and the 2008 core to give around 150% (over Opteron perf/watt). I think I posted the Analyst Day link.