Clovertown (double Conroe) and Athlon64 compared

Are you saying that you would willingly trade <$300 worth of chips for one worth $3million? Wow, what a deal!! (Of course that Clovertown won't be worth much next year; there will be lots of takers then.)
 
On inverse threading, it's hard to believe it works across processors due to the IPC latency. However, within the same processor, multiple AMD64 cores can act as one, which can push single-thread performance through the roof.
 
Stop making an i...t of yourself. 1 core can act as many, but many cannot act as 1.
You have no idea about any CPU architecture or programming... so it is better for you, and others like you who firmly believe the stories and wishes written by fanboy bloggers who also have no idea how CPUs work or how programming works, to be QUIET!
It is just stupid and pointless...
 
Word.
 
You need to have a bit of brain juice. Show me your diploma and your grades in school, I can bet they are not that good.

The whole idea of superscalar architecture is to merge multiple execution engines into one from the program's point of view. You run multiple instructions simultaneously on multiple pieces of data, but it appears to the external world as if you run them one by one, in order. It's only natural to take it a step further.

I won't elaborate more.
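
For readers following the argument, the superscalar idea described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real CPU schedules instructions: every instruction takes one cycle, each destination name is written only once, and the function `issue_cycles` and the two sample programs are made up for illustration.

```python
import math


def issue_cycles(program, width):
    """Count cycles needed to issue `program` on a toy machine with `width` units.

    `program` is a list of (dest, sources) pairs in program order.
    Assumptions: one-cycle latency, each dest is written once, and an
    instruction may issue (possibly ahead of earlier stalled ones) as soon
    as all of its sources are ready.
    """
    # Results of not-yet-issued instructions are unavailable;
    # names that no instruction writes are program inputs, ready at cycle 0.
    ready_at = {dest: math.inf for dest, _ in program}
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        issued = []
        for idx in pending:
            if len(issued) == width:
                break
            _, sources = program[idx]
            if all(ready_at.get(src, 0) <= cycle for src in sources):
                issued.append(idx)
        # Results become visible to dependents in the next cycle.
        for idx in issued:
            ready_at[program[idx][0]] = cycle + 1
        pending = [idx for idx in pending if idx not in issued]
        cycle += 1
    return cycle


# Four instructions that only read the inputs x and y.
independent = [("a", ("x", "y")), ("b", ("x", "y")),
               ("c", ("x", "y")), ("d", ("x", "y"))]
# Four instructions where each one needs the previous result.
chain = [("a", ("x", "y")), ("b", ("a", "y")),
         ("c", ("b", "y")), ("d", ("c", "y"))]

for width in (1, 4):
    print(width, issue_cycles(independent, width), issue_cycles(chain, width))
```

In this toy model the extra execution units help only the program whose instructions are independent; the dependent chain takes four cycles at any width, which is the limit both sides of this argument are circling around.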
 
http://sharikou.blogspot.com/

Please visit the link to see more details

Clovertown scores revealed

Clovertown compared to Athlon 64 2800+ (1.8GHZ, socket 754, 130nm, single channel DDR)


Intel showed off Clovertown quad-core server CPUs running on the Bensley platform with FB-DIMM memory at Spring IDF Taipei. Clovertown is basically two 65nm Conroe CPUs stacked together, with a total of 8MB of L2 cache. This page contained the benchmark scores for a 2P Clovertown. The clock speed was 2GHz. In the single-threaded test, it got a Cinebench 9.5* score of 362. Daniel J. Casaletto, Intel Vice President, Digital Enterprise Group Director, Microprocessor Architecture and Planning, was running the demo. With 2P and 8 cores, the score scaled to 1723, or 4.7x. In other words, adding 7 cores led to 3.7x more performance. I think this is quite poor: you get only about half a core's worth when you add a core. The FSB is the bottleneck.
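
As a rough check of the scaling arithmetic above, here is a small Python sketch. It uses only the two scores quoted in the post (362 single-threaded, 1723 on 8 cores); the function name `scaling_summary` is made up for illustration.

```python
def scaling_summary(single_score, multi_score, cores):
    """Return (speedup, gain per added core) for a multi-core benchmark run."""
    speedup = multi_score / single_score
    # Under perfect scaling, each core beyond the first would add 1.0.
    gain_per_added_core = (speedup - 1) / (cores - 1)
    return speedup, gain_per_added_core


speedup, per_core = scaling_summary(362, 1723, 8)
print(f"speedup: {speedup:.2f}x")
print(f"gain per added core: {per_core:.2f}")
```

With the quoted scores the speedup comes out at roughly 4.76x, or a bit over half a core's worth for each added core, which is in line with the estimate in the post.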

Let's pay more attention to this photo here, which shows the 2P Clovertown in action and is quite exciting. Look at the upper left corner, it reads Cinebench 64 Bit Edition. Finally, we can see Intel got 64 bit working, it's running the 64 bit version of Cinebench 9.5!

In comparison, a 3 year old 2GHZ single core Opteron 246 achieves a score of 366 in single threaded test, 1.1% faster than the NGMA Core at the same clockspeed. Clock for clock, Intel CORE (Merom/Conroe) is slower than Hammer.

On my old Athlon 64 2800+ (1.8GHz, Socket 754, 130nm), I got a Cinebench 9.5 score of 294. My ClawHammer is a bit slower than Conroe CORE, but only a little. Considering that my CPU runs at only 1.8GHz, uses only single-channel DDR, and sits in an old PC with integrated S3 UniChrome graphics that eats some memory, that's quite good. I managed to overclock it to 1.9GHz and got a score of 312. I expect the old ClawHammer to get a score of 294*2/1.8 = 327 at 2GHz.
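
The clock-for-clock comparison and the extrapolation above can be reproduced with a short Python sketch. The scores and clocks are the ones quoted in the post; the helper names `points_per_ghz` and `extrapolate` are made up, and the extrapolation assumes the score scales exactly linearly with clock speed, which is the post's assumption rather than an established fact.

```python
# Cinebench 9.5 single-threaded scores and clocks (GHz) as quoted in the post.
results = {
    "Clovertown (one core)": (362, 2.0),
    "Opteron 246": (366, 2.0),
    "Athlon 64 2800+": (294, 1.8),
    "Athlon 64 2800+ overclocked": (312, 1.9),
}


def points_per_ghz(score, ghz):
    """Score normalized by clock speed."""
    return score / ghz


def extrapolate(score, ghz, target_ghz):
    """Linear extrapolation; assumes the score scales exactly with clock."""
    return score * target_ghz / ghz


for name, (score, ghz) in results.items():
    print(f"{name}: {points_per_ghz(score, ghz):.1f} points/GHz")

print(f"294 @ 1.8GHz extrapolated to 2.0GHz: {extrapolate(294, 1.8, 2.0):.0f}")
print(f"294 @ 1.8GHz extrapolated to 1.9GHz: {extrapolate(294, 1.8, 1.9):.0f}")
```

The 1.9GHz extrapolation can be compared with the measured overclocked score of 312 as a sanity check on the linear-scaling assumption.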

I am interested in seeing some Clovertown and Sempron socket 939 comparisons. If you have such a machine running Windows x64, please submit your results in the comments. Don't underestimate AMD desktop CPUs; check out this Athlon 64 and Xeon comparison.

The Conroe performance analysis is here. I pointed out that when the working set is larger than Conroe's cache (4MB), Conroe performs slower than Athlon 64. Cinebench 9.5 needs over 150MB to run; as a result, Clovertown's 8MB cache didn't help.


LOL, um, judging from your username and the site's name, I'm going to have to say this is your blog. It's been posted many times, and all of us, minus the clueless ones, have said this blog is 100% AMD fanboy...ism...thingy.

Look, dude or dudette, it doesn't matter anymore. Intel has a better chip for now; oh noes! AMD will come out with something, then Intel, and so on and so on, and guess what? The next thing you know it will be just like ATI and Nvidia: it won't freaking matter, because the difference will be so small that the only thing that will matter is how pricey it is...
 
Figure that last part out all on your own? It's the generic response when you have nothing meaningful to add to the conversation.
 
IDF Taipei, benchmark run by Daniel J. Casaletto, Intel Vice President, Digital Enterprise Group Director, Microprocessor Architecture and Planning

Setup: Clovertown (Double Conroe), 65nm, 2x4MB cache, FB-DIMM, 2GHZ

CINEBENCH 9.5 64 bit edition
Score: 362

clovertown-3_rSGeUlFKkfnX.jpg



Setup: Athlon 64 3000+, 90nm, 0.5MB L2, DDR500, 2GHZ
CINEBENCH 9.5 64 bit edition

Score: 370

cine6tm.gif
 
I'm sorry, but do you have to blatantly lie to everyone in order to struggle to prove some point you are trying to make?

Setup: Athlon 64 3000+, 90nm, 0.5MB L2, DDR500, 2GHZ
CINEBENCH 9.5 64 bit edition
Your conveniently placed "Created with HyperSnap 6" logos covered the title bar of the program so we can't confirm that this is the 64-bit version. However, I can confirm that this is not Cinebench 9.5 but actually Cinebench 2003 from examining the text inside the program itself. Once again these results are not comparable.

Secondly, the HTT base speed has been increased to 250MHz which yields an unfair advantage because even AM2 will still be using a 200MHz base speed. I'll let you get away with the DDR500 though since AM2 will be using higher bandwidth RAM.

Frankly, next time you lie you should try harder. A bit more image manipulation is needed to make it more believable or more likely than not less so.
 
I would have to say he's one of our regulars trying to stir the pot.
 
Well put, though mostly a waste of time. I don't see the point of trying to compare against a chip that is being shown as proof of potential.
There are no comparisons here, because that was not the point.
The point is that Intel has a functional 4-core chip. That's all.
(Hardly all, really, since that is impressive.)
Anyone who thinks those scores are the end result for this chip has some serious deficiencies.
 
And in other news on the CPU front:
I ROCK. Yes. It's true. Unlike this thread. Which is totally useless. Having been recently dumped by my girlfriend (for the postman, how humiliating) and left with nothing to do, I googled 'sharikou' only to find that there has been mucho AMD-centric material posted under this guy's name on various message boards. Now, this is just what THEY want you to think. THE TRUTH? AMD employs a small, elite band of NINJAS that ALL go by the name of Sharikou. They also conduct the odd assassination. Ever wondered where FUGGER went? *Nods* Beware of (the) sharikou(s)! They're evil. Unlike me. I rock, remember?
 
These are additional proof that Intel CORE won't demonstrate any IPC advantage over current AMD64 implementation.

You:

#1 do not understand CPU architecture
#2 are comparing "grandmothers to frogs"
#3 deriving faulty conclusions based on your lack of knowledge and logic reasoning

On cache size, 4MB

Is pretty much the standard nowadays (as in 2x 2MB L2 in Presler) thank you.

where the big apps which really need performance are also memory intensive.

Name one memory intensive application where bigger cache doesn't help but instead hurts.

Adding cache is not an architectural solution.

Neither is adding REX prefixes to enable 64-bit computing on top of legacy x86 junk, thus breathing life back into something that should have been left to die.

Dude, try to comprehend, be calm.

Of course adding cache improves performance; you don't have to be a genius to know that. The question is to what extent. Once your working set is much bigger than the cache size, doubling the cache won't do miracles. Right now, we are talking about whether Intel's CORE will show a 20% IPC advantage over AMD, as Mooly Eden bragged. All the evidence shows that to be false. As I have pointed out, from all independently verifiable data, the only cases where Intel showed a 20% IPC advantage are those where the whole working set fits in the 4MB cache.
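
The "working set much bigger than the cache" point can be illustrated with a deliberately crude Python model. Everything in it is an assumption for illustration: memory accesses are treated as uniformly random over the working set (real programs have locality, so real hit rates are usually far better), and the two latencies are invented round numbers, not measurements of Conroe, Clovertown or Athlon 64.

```python
# Illustrative latencies only (assumptions, not measurements of any real CPU).
CACHE_HIT_CYCLES = 14
MEMORY_CYCLES = 200


def hit_rate(working_set_mb, cache_mb):
    """Hit rate under uniformly random access: the fraction of data that fits."""
    return min(1.0, cache_mb / working_set_mb)


def avg_access_cycles(working_set_mb, cache_mb):
    """Average cycles per access for the toy two-level model."""
    h = hit_rate(working_set_mb, cache_mb)
    return h * CACHE_HIT_CYCLES + (1.0 - h) * MEMORY_CYCLES


for ws in (3, 150):
    for cache in (4, 8):
        print(f"working set {ws}MB, cache {cache}MB: "
              f"{avg_access_cycles(ws, cache):.1f} cycles per access")
```

Under these assumptions a 3MB working set runs entirely at cache speed, while for a 150MB working set going from 4MB to 8MB of cache changes the average access time by only a few percent. How closely any real benchmark follows this model depends on its access pattern.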

There is a major difference between Presler's 2x2MB and Conroe's 4MB. I hope you understand the difference.

Regarding the 64 bit extension on x86, Intel had 4 groups trying to figure it out, but they failed. Now Intel is simply following AMD.

See this http://news.com.com/2100-1001-985432.html

"Four separate design teams at Intel examined how the company could take one of its 32-bit chips and transform it into a 64-bit machine, said Richard Wirt, another senior fellow at Intel. After running simulations, all four teams concluded that such a transition wouldn't be economically feasible, he said. "

So, it's not trivial to do AMD64.


Sharikou made a good point there. I don't know why people start talking about processor architecture without even knowing what it is. If they studied a bit of microprocessor architecture, they should go back to their books; if they didn't, they had better not give an opinion. A REX prefix is simply the way to access the 64-bit registers; it's just a name. 64-bit computing is not only double-sized registers and more memory address space, but also twice as many registers, which is closer to the RISC idea, but that's another story.

If your register space is bigger, you need less stack allocation for local variables, so you use less L1 cache, which somewhat limits the pollution of the other cache levels every time you store something.

To keep the party going you need a legacy-compatible system, and if you can do it effectively, why not extend x86-32 to AMD64? It simply works. They made it work and Intel simply had to follow.

Bigger caches help when your memory architecture is the bottleneck and, of course, when your working set fits in the cache. If you are still stuck in the FSB era, of course you need bigger caches to reduce the impact of the latency introduced by a northbridge. Take a look at the AMD64 and Sun T1/T2 approach: their caches are sized according to need.

IPC? Well, it's a really important issue, but stall cycles are what matter; shorter pipelines are the solution, and that's maybe a plus for Conroe. Again, if your compiler is able to fill the gaps, then you are saved.
 
You need to have a bit of brain juice. Show me your diploma and your grades in school, I can bet they are not that good.

The whole idea of superscalar architecture is to merge multiple execution engines into one from the program's point of view. You run multiple instructions simultaneously on multiple pieces of data, but it appears to the external world as if you run them one by one, in order. It's only natural to take it a step further.

I won't elaborate more.
I don't feel like getting into a measuring contest with you. Grades and diplomas are some proof that someone knows something, but they are not enough. In this case, for example, my diplomas, the number of languages I understand and speak fluently, the number of programming languages I know, my working experience and the other things I have achieved have nothing to do with the fact that YOU DON'T UNDERSTAND WHAT YOU ARE SAYING.
That idea was found to be unsuccessful a long time ago (30 years), which is why chip producers kept improving the performance of single-core CPUs rather than building CPUs out of many low-performance cores. But that part of your IT education is also missing, like many facts that are missing not only for you but for many fanboy posters on forums like this one, e.g.:
The first 64-bit CPU was made 15 years ago by MIPS Technologies, and 64-bit was implemented by Intel before AMD did it in their mainstream CPUs.
The first multicore CPU was made by IBM, not by Intel or AMD.
The number of registers is independent of how many bits the CPU has...
 
How do you keep on missing the obvious Cinebench 2003 references in that second score?

He's delusional.

Here we go. I will step back, use the same twisted methods as Shakrrimsue (whatever the hell his name is) and restate the argument. In his "Conroe performance busted" rant, he used linear scaling to extrapolate expected clock-for-clock performance based on two anomalous scores from the Victor Wang bench (ignoring, BTW, all the other data).

Using this logic, then we can go to this bench:

This is what I got when I ran Cinebench on an Arima SW500.

Processor : Opteron 880 x 4
MHz : 2.4 GHz
Number of CPUs : 8
Operating System : Windows 2003 server 64 bit

****************************************************
64 bit version
Rendering (Single CPU): 397 CB-CPU

Rendering (2 CPU): 741 CB-CPU
Multiprocessor Speedup: 1.85

Rendering (4 CPU): 1277 CB-CPU
Multiprocessor Speedup: 3.19

Rendering (8 CPU): 1929 CB-CPU
Multiprocessor Speedup: 4.86

****************************************************
32 bit version
Rendering (Single CPU): 358 CB-CPU

Rendering (Multiple CPU): 1719 CB-CPU
Multiprocessor Speedup: 4.80
Link: http://forums.2cpu.com/showpost.php?p=624674&postcount=9

Based on this, the single-thread score is 397 for a 2.4 GHz 4-way 880. Thus, if you scale the Clovertown score of 362 up to match the clock, you would get
2.4/2.0 * 362 = 434, which shows the 2-way, 4-core Clovertown kicking the crap out of a 4-way, 2-core 880. Since the CPU scaling factor in the Clovertown demo was 4.7, this would mean an 8-way bench would result in 2042, still beating the Opty 880 clock for clock in the bench above. :)

Now is this right? Nope, but at least I compared Cinebench 9.5 64-bit to Cinebench 9.5 64-bit, so the comparison is likely more valid. Or is it? Anyway, this is the same twisted logic Mr. Shakirou (whatever the heck his name is) uses... and it is completely, utterly, and totally incorrect.

What I do put faith in is the CPU scaling factor: the Opteron's is 4.86 and the Clovertown showed roughly 4.7 (not far off). In other words, the snoop filter, cache coherency improvements in the dual FSB, and better cache management overall may propel Intel, at least temporarily, past the scaling problems in the 4-way and 8-way space.
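
The numbers in this post can be re-derived with a short Python sketch. It uses only the scores quoted above; the variable names are made up, and the clock scaling is the same deliberately naive linear extrapolation the post is parodying, not a claim about how either chip would really perform.

```python
# Opteron 880 (2.4 GHz), 64-bit Cinebench 9.5 scores quoted above.
opteron_single = 397
opteron_runs = {2: 741, 4: 1277, 8: 1929}

for cpus, score in opteron_runs.items():
    print(f"Opteron, {cpus} CPUs: speedup {score / opteron_single:.2f}")

# Clovertown demo figures quoted in the thread.
clovertown_single = 362      # single-threaded score at 2.0 GHz
clovertown_scaling = 4.7     # multi-core speedup quoted from the demo

# Naive linear scaling from 2.0 GHz to 2.4 GHz.
scaled_single = clovertown_single * 2.4 / 2.0
scaled_multi = scaled_single * clovertown_scaling

print(f"Clovertown single-thread, scaled to 2.4 GHz: {scaled_single:.0f}")
print(f"Clovertown 8-core estimate at 2.4 GHz: {scaled_multi:.0f}")
```

The ratios recomputed from the raw scores can differ slightly from the speedup figures Cinebench printed in the listing, so treat the second decimal place with caution.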

Word.
 
(...)

Pardon me for barging in, sir, but... benchmark issues, new vs old m/arch, Intel vs AMD & all aside, do you - honestly - believe that your Athlon 64/2800+/1.8GHz/socket 754/130nm/single-channel 64-bit wide IMC/HTT 1.0/512KB L2 cache & $50 later, overclocked to 1.9GHz, stays "within 10% of the future Conroe", i.e., this http://arstechnica.com/articles/paedia/cpu/core.ars, without even thinking, for a moment, "what tha heck's going on???!!!"
Not to mention your article "Conroe performance claim being busted" (well, I just did!): what about specs, system configuration, where did that Conroe come from & all?!
From all things biased, Intel/Anand's was a truly fair comparison, compared to the one you... I mean, are you serious at all?! The Intel/Anand's [biased] test was done with a 2.66GHz Conroe vs an overclocked FX-60 @2.8Ghz!!!

And, sir, what intrigues me most, is this:

The reason why Conroe did so well in the MolDyn test is simple: Conroe has a huge 4MB of unified cache, so for single-threaded tests that can fit in 4MB*, Conroe can just run off the cache at very high speed. Since cache misses drastically reduce performance, applications that run off cache exhibit unrealistic performance numbers.

However, once you go over the 4MB limit, Conroe is slower than Athlon 64 at the same clock. Both the Cryptography and STREAM tests use a lot more than 4MB, larger than Conroe's 4MB cache, and Conroe immediately falls below Athlon 64 on the performance curve.

I know the IMC helps a lot but... the Athlon 64 has a meagre 1024KB L2 cache! Speaking about cache coherency (pun intended), well, where is it?!


Cheers, sir!