Clovertown?? ... I don't get it!


1Tanker

Splendid
Apr 28, 2006
4,645
1
22,780
Intel just managed to get a negative performance increase by doubling the number of cores from 4 to 8 on 'multi threaded software'. Can someone please explain to me how this is possible??? Someone must have screwed up big time, but that just doesn't quite say it.

Even worse, the negative performance increase is achieved using Intel's best architecture, Core 2.

At this point I'm quite sure Intel's upcoming 45 nm chips won't make much difference in 2007; the FSB is killing them.

And I thought the AMD 4x4 launch was poor ... well, actually 4x4 managed to bring a positive performance increase ...

Share your thoughts on this with me, please!


No you AMD fanboy. That's why an FX-62 outpaces an FX-74 in single threaded applications :roll:

This is not the case of single threaded applications running on a multi core CPU, but the case of multi threaded software that actually benefits from multi core getting a negative performance increase when the number of cores is doubled, which is an entirely different issue.

And don't call me a fanboy!!

Good Luck with that. :)

Hey, I'm not an AMD fanboy. I've owned several Intel PCs and only one AMD, a K6-III, and even now my rig is on Intel ... no matter how much this upsets you true Intel fanboys, there's something wrong in the Clovertown picture, and I was hoping to get some smart answers, not this.

I agree servers should be tested using server software, but I don't see any reason why a perfectly good threaded application that takes full advantage of multicore should not receive any kind of advantage from doubling the number of cores....

I didn't mean to imply that you are. What I meant was GL with getting AM to stop calling you a fanboy. Telling him to stop calling you a fanboy won't work. AM can be quite headstrong at times. :)
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
If the slower clocked cpu with more cores isn't using all those cores because the app doesn't have that many threads then *GASP* it's going to be slower!

image063.gif


Notice how with 1 cpu the Clovertown is slower but with more threads *GASP* it's faster! OMG NO WAI!

image043.gif

image045.gif

image046.gif

image051.gif

image053.gif

image028.gif

etc .... I'm not going to show them all, but if this isn't a negative perf increase I don't know what is.

BTW the results make Cinebench the most flawed benchmarking application, as it doesn't realistically show real world application scaling ...

So don't throw that Cinebench crap in my eyes, surely you could do better ....
 

Action_Man

Splendid
Jan 7, 2004
3,857
0
22,780
Once again, moron, if the slower clocked cpu with more cores isn't using all those cores because the app doesn't have that many threads then *GASP* it's going to be slower! OMFG NO WAI!

etc .... I'm not going to show them all, but if this isn't a negative perf increase I don't know what is.

What a moron.

BTW the results make Cinebench the most flawed benchmarking application, as it doesn't realistically show real world application scaling ...

If the app uses more than 4 threads then it is, moron.

So don't throw that Cinebench crap in my eyes, surely you could do better ....

Surely you can't be this stupid?
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
Once again, moron, if the slower clocked cpu with more cores isn't using all those cores because the app doesn't have that many threads then *GASP* it's going to be slower! OMFG NO WAI!

etc .... I'm not going to show them all, but if this isn't a negative perf increase I don't know what is.

What a moron.

BTW the results make Cinebench the most flawed benchmarking application, as it doesn't realistically show real world application scaling ...

If the app uses more than 4 threads then it is, moron.

So don't throw that Cinebench crap in my eyes, surely you could do better ....

Surely you can't be this stupid?

Well, I don't mind you insulting me if it actually shows your class for the world to see ...... fanboy
 

WR

Distinguished
Jul 18, 2006
603
0
18,980
The scalability of an application is not based merely on the number of threads it spawns but the workload balancing among those threads and the spread of data transfer to avoid creating a bottleneck. Encoding applications are notorious for uneven workloads because the output is a stream or consists of blocks requiring varying amounts of CPU time, and switching to a new block requires burst data transfer to/from HD or RAM.

So far I don't see a lot of evidence of 4x4 scaling much better than Clovertown in any one application. I was expecting a much smoother NUMA setup for 4x4 to help with memory intensive applications and hope AMD finds its way around the high cHT latency.
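
To make the load-balancing point concrete, here is a minimal sketch. It assumes a toy scheduler (greedy: each block goes to whichever worker is least loaded) and made-up block costs; none of this comes from the benchmarks discussed in the thread. It only illustrates that when one block dominates, extra workers stop helping.

```python
import heapq

def makespan(block_costs, workers):
    """Greedy list scheduling: hand each block to the least-loaded worker.

    Returns the finishing time of the busiest worker.
    """
    loads = [0.0] * workers
    heapq.heapify(loads)
    for cost in block_costs:
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    return max(loads)

def speedup(block_costs, workers):
    """Speedup over running every block on a single worker."""
    return sum(block_costs) / makespan(block_costs, workers)

# Hypothetical workloads (assumed numbers, in arbitrary time units).
even = [1.0] * 8              # eight blocks of equal cost
uneven = [8.0] + [1.0] * 8    # one expensive block plus eight cheap ones

for name, blocks in (("even", even), ("uneven", uneven)):
    for workers in (4, 8):
        print(name, workers, "workers -> speedup", speedup(blocks, workers))
```

In the uneven case the expensive block sets the finishing time, so going from 4 to 8 workers changes nothing even though the program happily spawns 8 threads.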
 

Action_Man

Splendid
Jan 7, 2004
3,857
0
22,780
Good comeback assclown, completely wrong and you've got nothing. Remember when the dual cores came out and people were crying because they didn't run games faster? Games used one thread while there were two cores, same thing here but there's more cores and threads.

Come back when you know something.
 

Action_Man

Splendid
Jan 7, 2004
3,857
0
22,780
The scalability of an application is not based merely on the number of threads it spawns but the workload balancing among those threads and the spread of data transfer to avoid creating a bottleneck. Encoding applications are notorious for uneven workloads because the output is a stream or consists of blocks requiring varying amounts of CPU time, and switching to a new block requires burst data transfer to/from HD or RAM.

So far I don't see a lot of evidence of 4x4 scaling much better than Clovertown in any one application. I was expecting a much smoother NUMA setup for 4x4 to help with memory intensive applications and hope AMD finds its way around the high cHT latency.

Yay, someone else out there gets it! It's a Christmas miracle!
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
Games used one thread while there were two cores, same thing here but there's more cores and threads.

Wrong! Most (almost all) multi threaded applications are not dual threaded or quad threaded!!! They simply spawn worker threads to do work. The number of worker threads is either equal to the core count, configurable, or more!!!

This is where the difference lies! Single threaded apps don't take advantage of multicore; multi threaded ones should take advantage of all the cores, and they do[!] but not on the Clovertown architecture!
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
output is a stream or consists of blocks requiring varying amounts of CPU time, and switching to a new block requires burst data transfer to/from HD or RAM.

Well, it is a fact that threads need to exchange some data, and I don't see how one can build software without it. Should we make compression software that never joins the pieces together or what, so that Clovertown performs well? No, this is a limitation of the architecture: it is unable to exchange data well without a critical performance drop!


So far I don't see a lot of evidence of 4x4 scaling much better than Clovertown in any one application. I was expecting a much smoother NUMA setup for 4x4 to help with memory intensive applications and hope AMD finds its way around the high cHT latency.

Doesn't scale much better? It actually manages to close the gap between Core 2 and K8 ... which is infinitely better than negative scaling.
 

bullaRh

Distinguished
Oct 6, 2006
592
0
18,980
Seriously, I wonder if you're dumb or joking, because you really don't sound very intelligent.

Don't call other people morons or stupid if you don't have anything to back it up! It really shows your lack of intelligence.
 

WR

Distinguished
Jul 18, 2006
603
0
18,980
Well, it is a fact that threads need to exchange some data, and I don't see how one can build software without it. Should we make compression software that never joins the pieces together or what, so that Clovertown performs well? No, this is a limitation of the architecture: it is unable to exchange data well without a critical performance drop!

I was not explicit enough and perhaps contributed to a misunderstanding. The lack of scalability of many benchmarks from four to eight cores in the 2P Clovertown review probably has nothing to do with bandwidth constraints but program limitations.

When I mentioned encoding threads encountering a bottleneck at the RAM/HD when trying to move on to the next block, I was speaking theoretically of bad software design to illustrate the difficulty of multithreaded programming. Such a design would hurt 4x4, too, not just Clovertown - though I see no clear evidence of such a problem in any of the published benchmarks. In reality, any decent encoder or archival application employs buffers and keeps track of work finished for multiple worker threads... but only up to a limit.

Two years ago all the popular consumer-level encoders/archivers didn't scale from one to two cores. Today, they pretty much all scale to two cores, but only some continue onto four cores. And in THG's review of 2P Clovertown, it seemed apparent that none of the tested encoders or archival tools scaled to eight-core, although proof of that would involve comparative benchmarks on eight-core Opteron systems, which I have yet to find. Typically, programming for massively parallel workloads is reserved for professional level applications. You don't hear anyone today seriously complaining that DivX or WME can't keep track of more than four cores because most people are still running on one or two.

Doesn't scale much better? It actually manages to close the gap between Core 2 and K8 ... which is infinitely better than negative scaling.

This is what I fail to see - K8 closing the gap with C2D in scaling. I'm looking at this pretty comprehensive benchmark review at Xbitlabs which includes identical clock comparisons of the FX-62 (2x 2.8GHz), FX-72 (4x 2.8GHz), E6700 (2x 2.66GHz), and QX6700 (4x 2.66GHz): http://xbitlabs.com/articles/cpu/display/amd-quad-fx_9.html.

That page contains the encoding benchmarks, and I've also looked at the other pages except for the purely synthetic SysMark/PCMark page. I fail to see a single instance where Kentsfield scales noticeably worse than 4x4. Can you find one and write back with the name of the benchmark?

So far, the only thing 4x4 has helped AMD accomplish is to spread out the heat dissipation of the four K8 cores such that they can keep the voltages and clock speeds higher than they could under a single socket. This is not bandwidth scaling but rather processor frequency scaling. But the K8 core is far enough behind the C2D that this doesn't make 4x4 faster overall.
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
A few threads??? yahoo messenger has around 30 threads, the winlogon process can have 10, 20 or more ... FireFox with 5 open pages has 17 ... I'm quite sure any one of the multi threaded benchmarks spawns (can spawn) more than 8 threads. The system process in windows can have hundreds of threads ... A typical instance of SQL Server running on my dev. machine has close to 300 threads ...

As a software developer that works with databases every day I sure would like to see some test on database performance.

As far as I know the best threaded software and most scalable is encoding software so I'm expecting to see even worse performance for database tests ... also databases don't even require that much number crunching power they just need fast memory and disk access to scale well.

Sure, they all run heaps of 'threads', for overlapped workloads, etc, but how many are actually multi-threaded to the point where they'll use more than 1 CPU core's worth of processing power?

The perfect example is game server software from 2000: it runs with 6-12 threads, but will only use the equiv of 1 CPU core (eg: 80% of one core, 5% of another, 10% of another, 2.5% on another two).

The thread count in TaskManager does not indicate how many threads can be scaled over multiple processor cores.
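
A quick way to see this is to add up what each thread actually consumes. The sketch below uses the percentages from the game server example above; the 12-thread total and the idle threads are assumptions for illustration, and the function name is made up.

```python
def core_equivalents(per_thread_percent):
    """Sum per-thread CPU usage (percent of one core) into core-equivalents."""
    return sum(per_thread_percent) / 100.0

# Busy threads as described in the post: 80%, 5%, 10%, and two at 2.5%.
busy_threads = [80.0, 5.0, 10.0, 2.5, 2.5]
# Assumed: the remaining threads of a 12-thread process sit idle.
idle_threads = [0.0] * 7
usage = busy_threads + idle_threads

print(len(usage), "threads,", core_equivalents(usage), "core-equivalents of work")
```

Twelve threads, but the work adds up to a single core, so three extra cores (or seven) have nothing to do.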
 

belvdr

Distinguished
Mar 26, 2006
380
0
18,780
Ugh, this name calling is elementary school material.

Do people really believe that if you call someone names (and more names = more proof) that you will actually come across as believable and you will prove your point?

I cannot see why people cannot have a civil discussion and explain their points of view without having to resort to name calling. Not everyone will agree, and if you don't like someone in general, then just avoid discussion with them altogether. All it takes is a little self-control.
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
That page contains the encoding benchmarks, and I've also looked at the other pages except for the purely synthetic SysMark/PCMark page. I fail to see a single instance where Kentsfield scales noticeably worse than 4x4. Can you find one and write back with the name of the benchmark?

Kentsfield doesn't scale worse than 4x4; Clovertown scales far worse than Kentsfield or 4x4. I'm not expecting a doubling in perf from doubling the cores, but at least some gain like 50%, although with a good architecture the gains should be around 80% if bandwidth scales accordingly and latency doesn't increase very much!

If the same encoding benchmarks showed a perf increase of sometimes 80% by going from dual to quad core although they were never recompiled (which means they are properly threaded), why aren't they gaining at least a mere 25% with Clovertown?

I highly doubt that all of the encoders are poorly written and limited to 4 cores, because even before the Clovertown launch, computers with more than 4 processor cores were available, and video encoding is also done on high end computers, sometimes with 8 or 16 procs, so there was absolutely no reason in the world not to do proper threading for encoding applications.

TH would do all of us a great favor if they published the load percentage of all the cores during the tests, to see if it's a software or hardware problem, but the fact is that they never mentioned that only half of the cores were used!
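
One way to sanity-check these expectations is Amdahl's law. Nobody in the thread invokes it; it is brought in here only as a back-of-the-envelope model that ignores bandwidth, latency, and any cap on worker threads. The parallel fraction below is an assumed value, picked so that going from 2 to 4 cores yields the 80% gain mentioned above.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

def gain(parallel_fraction, cores_before, cores_after):
    """Fractional performance gain when the core count changes."""
    before = amdahl_speedup(parallel_fraction, cores_before)
    after = amdahl_speedup(parallel_fraction, cores_after)
    return after / before - 1.0

# Assumed parallel fraction: 16/17 (about 94%) gives an 80% gain from 2 to 4 cores.
p = 16 / 17

print("2 -> 4 cores: %.0f%% gain" % (100 * gain(p, 2, 4)))
print("4 -> 8 cores: %.0f%% gain" % (100 * gain(p, 4, 8)))
```

Under this model, code that gains 80% from 2 to 4 cores would still be expected to gain roughly two thirds from 4 to 8, so a flat or negative result has to come from something the model leaves out, such as a limit on worker threads or a shared bottleneck.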
 

WR

Distinguished
Jul 18, 2006
603
0
18,980
Kentsfield doesn't scale worse than 4x4; Clovertown scales far worse than Kentsfield or 4x4. I'm not expecting a doubling in perf from doubling the cores, but at least some gain like 50%, although with a good architecture the gains should be around 80% if bandwidth scales accordingly and latency doesn't increase very much!

Kentsfield and Clovertown are basically the same chip connected to different packages to fit their respective sockets/motherboards. But I guess what you mean is that Clovertown doesn't appear to be scaling from 4->8 cores in many applications (meanwhile, there is no 8-core setup with Kentsfield or 4x4, so we're speaking here of 2->4 scaling) - you're right that the benchmarks are suggesting this.

But the lack of any scaling at all on a CPU-based benchmark is a strong indicator of a software limitation and not of full FSB saturation.

If the same encoding benchmarks showed a perf increase of sometimes 80% by going from dual to quad core although they were never recompiled (which means they are properly threaded), why aren't they gaining at least a mere 25% with Clovertown?

It means that when the encoding software was updated from single-core to multi-core, the programmers planned to support up to 4 cores, not just 2. Some developers saw a little ahead and thought correctly that dual core would be followed by quad core on the desktop.

I highly doubt that all of the encoders are poorly written and limited to 4 cores, because even before the Clovertown launch, computers with more than 4 processor cores were available, and video encoding is also done on high end computers, sometimes with 8 or 16 procs, so there was absolutely no reason in the world not to do proper threading for encoding applications.

Those were server and workstation computers, and they normally run professional software, which tends to support large core counts. Encoding companies do not use Windows Media Encoder, Xvid, AutoGK, or DivX to make their HD-DVDs. End-users run this stuff. :)

TH would do all of us a great favor if they published the load percentage of all the cores during the tests, to see if it's a software or hardware problem, but the fact is that they never mentioned that only half of the cores were used!

I completely agree - THG didn't explore why there was no scaling at all, nor did they comment on it. The readership needs to know that 0% scaling is probably caused by software limitations.
 

IcY18

Distinguished
May 1, 2006
1,277
0
19,280
That page contains the encoding benchmarks, and I've also looked at the other pages except for the purely synthetic SysMark/PCMark page. I fail to see a single instance where Kentsfield scales noticeably worse than 4x4. Can you find one and write back with the name of the benchmark?

Kentsfield doesn't scale worse than 4x4; Clovertown scales far worse than Kentsfield or 4x4. I'm not expecting a doubling in perf from doubling the cores, but at least some gain like 50%, although with a good architecture the gains should be around 80% if bandwidth scales accordingly and latency doesn't increase very much!

If the same encoding benchmarks showed a perf increase of sometimes 80% by going from dual to quad core although they were never recompiled (which means they are properly threaded), why aren't they gaining at least a mere 25% with Clovertown?

I highly doubt that all of the encoders are poorly written and limited to 4 cores, because even before the Clovertown launch, computers with more than 4 processor cores were available, and video encoding is also done on high end computers, sometimes with 8 or 16 procs, so there was absolutely no reason in the world not to do proper threading for encoding applications.

TH would do all of us a great favor if they published the load percentage of all the cores during the tests, to see if it's a software or hardware problem, but the fact is that they never mentioned that only half of the cores were used!

Ha, you must have missed the part where Kentsfield and Clovertown are the exact same cpu except one is for socket LGA775 and one is for LGA771...

And I'll just send the reminder: if the program does not utilize all 4 cores on a 4 core cpu, it will lose every time to a cpu that is clocked faster, regardless of core count. But since it is shown against a dual core, you are like "omg it's slower than dual cores" when that is not the case at all. What shows the truth is where 4 cores clocked slower (2.66GHz) are faster than a dual core clocked at 2.9GHz.

Your thinking is flawed, and all this bs is software limited, not hardware limited. With the correct code, doubling the cores would theoretically double performance, minus about 3% due to memory or other random bottlenecks.

Also, the FSB is not limiting any of Intel's current CPUs when they are clocked at stock speed. It's an old technology, but it gets the job done for now till they come up with something better.
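
The clock-versus-cores argument can be written down as a toy model: throughput is clock speed times the number of cores the application can actually keep busy. This is a rough sketch that assumes identical per-clock performance and perfect scaling up to the thread limit; the function name is made up, and the clocks are the ones quoted above.

```python
def relative_throughput(clock_ghz, cores, app_threads):
    """Toy model: only min(cores, app_threads) cores do useful work."""
    usable_cores = min(cores, app_threads)
    return clock_ghz * usable_cores

# (label, clock in GHz, core count); the 8-core entry stands for two quad cores.
systems = [
    ("dual core @ 2.9 GHz", 2.9, 2),
    ("quad core @ 2.66 GHz", 2.66, 4),
    ("8 cores @ 2.66 GHz", 2.66, 8),
]

for app_threads in (2, 4, 8):
    print("application with", app_threads, "worker threads:")
    for label, clock_ghz, cores in systems:
        score = relative_throughput(clock_ghz, cores, app_threads)
        print("   ", label, "->", round(score, 2))
```

With 2 worker threads the faster-clocked dual core wins, with 4 the quad core pulls ahead, and only with 8 does the 8-core system separate from the quad. An application capped at 4 threads scores the same on 4 and 8 cores in this model.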
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
Not actually in reply to: IcY18

Man, people...

8-cores, over 2 sockets, and a heap of L2 cache, with 2 independent FSBs at 1333 MHz. (Screw ccNUMA for now if this provides 21.33 GB/sec peak without the NUMA complexity or 'stuttering' issue).
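
For anyone wondering where the 21.33 GB/sec figure comes from, the arithmetic can be sketched as follows. It assumes a 64-bit (8-byte) wide data bus per FSB and treats "1333 MHz" as the effective transfer rate (1333 million transfers per second); the function and parameter names are made up for illustration.

```python
def fsb_peak_gb_per_s(mega_transfers_per_s, bus_width_bits=64, buses=1):
    """Peak theoretical bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_s = mega_transfers_per_s * 1e6 * bytes_per_transfer * buses
    return bytes_per_s / 1e9

single_bus = fsb_peak_gb_per_s(1333)
dual_bus = fsb_peak_gb_per_s(1333, buses=2)

print("one FSB:  %.2f GB/s" % single_bus)
print("two FSBs: %.2f GB/s" % dual_bus)
```

This is a peak number only; sustained bandwidth is lower, and each socket still sees just its own bus.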

It is made for consolidation of servers as virtual machines to a single platform that is only 4.5 cm tall, and 19" wide. 8)

Next, people will be complaining that no software takes advantage of 'Sh' or 'C+' style code on GPUs, so they are not seeing a 20x to 200x increase in floating point stream processing over huge arrays (eg: encoding HD video in minutes, not hours or days, using ATI's next card with Shader Model 4.0, or nVidia GeForce 8800 series cards or better).

Seriously - :roll: :lol: :?

8)

You want software to scale over 8 cores, or be 20x to 200x faster by offloading some calcs to an SM4.0 capable GPU (even if only DX 9.0c is installed, btw)? How badly?....

Badly enough to write it yourself ?

- No I didn't bloody think so - :p
 

bullaRh

Distinguished
Oct 6, 2006
592
0
18,980
He calls everyone with fewer than 1000 posts names, I believe.

I haven't been here that long, but I have seen him sometimes just burst into a topic and call people stupid and insult them because they have a different opinion than his.
 

CaptRobertApril

Distinguished
Dec 5, 2006
2,205
0
19,780
Ugh, this name calling is elementary school material.

Do people really believe that if you call someone names (and more names = more proof) that you will actually come across as believable and you will prove your point?

I cannot see why people cannot have a civil discussion and explain their points of view without having to resort to name calling. Not everyone will agree, and if you don't like someone in general, then just avoid discussion with them altogether. All it takes is a little self-control.

Completely agreed. Civil and informed discourse is the only way to coherently exchange information. And if you weren't so ugly and your mother didn't dress you so funny and if you didn't have cooties you'd know that.

:lol:
 

Action_Man

Splendid
Jan 7, 2004
3,857
0
22,780
But the lack of any scaling at all on a CPU-based benchmark is a strong indicator of a software limitation and not of full FSB saturation.

It means that when the encoding software was updated from single-core to multi-core, the programmers planned to support up to 4 cores, not just 2. Some developers saw a little ahead and thought correctly that dual core would be followed by quad core on the desktop.

Those were server and workstation computers, and they normally run professional software, which tends to support large core counts. Encoding companies do not use Windows Media Encoder, Xvid, AutoGK, or DivX to make their HD-DVDs. End-users run this stuff.

Hurrah for people who get it, someone buy this guy a drink.
 
