Clovertown?? ... I don't get it!


BaronMatrix

Splendid
Dec 14, 2005
6,655
0
25,790
Intel just managed to get a negative performance increase by doubling the number of cores from 4 to 8 on 'multi threaded software'. Can someone please explain to me how this is possible? Someone must have screwed up big time, but that just doesn't quite say it.
Most of the benchmarks in the review are workstation tests that don't really take advantage of more than a few threads. Plus, the comparison is 3GHz Woodcrests vs 2.67GHz Clovertowns. Even for good scaling tests,

And I thought AMD 4x4 launch was poor ... well actually 4x4 managed to bring positive performance increase ...
The FX-74 gets beat by the FX-62 in most single-thread applications, especially in games.

A few threads??? Yahoo Messenger has around 30 threads, the winlogon process can have 10, 20 or more ... Firefox with 5 open pages has 17 ... I'm quite sure any one of the multi threaded benchmarks spawns (or can spawn) more than 8 threads. The System process in Windows can have hundreds of threads ... A typical instance of SQL Server running on my dev machine has close to 300 threads ...

As a software developer who works with databases every day, I sure would like to see some tests on database performance.

As far as I know, the best threaded and most scalable software is encoding software, so I'm expecting to see even worse performance in database tests ... Also, databases don't even require that much number crunching power; they just need fast memory and disk access to scale well.

You should visit tpc.org if you haven't. Opteron rules. You're right, the FSB is choking above 4 cores, so finally we see why industry people embraced Opteron.

Transactional databases are most effective when totally stored in memory and the bandwidth of HT allows greater throughput in conjunction with any necessary disk access.

The term platformance has been bandied about lately and this clearly shows what it is and what it isn't.
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
This thread is retarded.
Maybe some Task Manager pics in future to go alongside the benchmarks?
This would show what CPU power isn't being used, thus showing what can be done with less utilization.
I personally don't need them, but it would be useful for the brain dead around here.

I think a ban for everyone in this thread who fails to understand what the symbol GHz stands for and why 2.66 of these strange GHz is less than 3 GHz of them.
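
The clock-speed point can be put into rough numbers. The sketch below assumes identical per-clock performance per core and perfect scaling, neither of which holds exactly in practice, so the aggregate figure is only an upper bound. The function names are made up for illustration.

```python
# Idealized comparison of the two test systems discussed in this thread.
# Assumption: same per-clock performance per core, which is only roughly true.

def per_thread_ratio(clock_a_ghz: float, clock_b_ghz: float) -> float:
    """Relative speed of one thread on system A versus system B."""
    return clock_a_ghz / clock_b_ghz

def ideal_aggregate_ghz(cores: int, clock_ghz: float) -> float:
    """Upper bound on total throughput if work scaled perfectly across cores."""
    return cores * clock_ghz

clovertown = ideal_aggregate_ghz(8, 2.66)   # two quad-core Clovertowns
woodcrest = ideal_aggregate_ghz(4, 3.0)     # two dual-core Woodcrests

print(f"per-thread: {per_thread_ratio(2.66, 3.0):.2f}x")
print(f"aggregate:  {clovertown:.2f} vs {woodcrest:.2f} GHz-cores")
```

Under these assumptions, any benchmark that keeps four or fewer threads busy would be expected to run about 11% slower on the 2.66 GHz parts. Only workloads that scale beyond four threads can use the extra cores.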
 

Heyyou27

Splendid
Jan 4, 2006
5,164
0
25,780
Wrong! Most (almost all) multi threaded applications are not dual threaded or quad threaded!!! They simply spawn worker threads to do work. The number of worker threads is either equal to the core count, configurable, or more!!!

This is where the difference lies! Single threaded apps don't take advantage of multicore; multi threaded ones should take advantage of all the cores, and they do[!] but not on the Clovertown architecture!
LOL! Did AMD tell you that? Clovertown = Xeon Variant of Core 2 Extreme QX6700 (Kentsfield)
 

Action_Man

Splendid
Jan 7, 2004
3,857
0
22,780
This thread is retarded.
Maybe some Task Manager pics in future to go alongside the benchmarks?
This would show what CPU power isn't being used, thus showing what can be done with less utilization.
I personally don't need them, but it would be useful for the brain dead around here.

I think a ban for everyone in this thread who fails to understand what the symbol GHz stands for and why 2.66 of these strange GHz is less than 3 GHz of them.

Someone buy this guy a drink as well.
 

accord99

Distinguished
Jan 31, 2004
325
0
18,780
You should visit tpc.org if you haven't. Opteron rules. You're right, the FSB is choking above 4 cores, so finally we see why industry people embraced Opteron.

Transactional databases are most effective when totally stored in memory and the bandwidth of HT allows greater throughput in conjunction with any necessary disk access.

The term platformance has been bandied about lately and this clearly shows what it is and what it isn't.
Xeon MPs and DPs dominate the most important TPC benchmark in the x86 world. Well-designed chipsets, multiple FSBs and decent-sized caches scale better than HyperTransport.
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
This thread is retarded.
Maybe some Task Manager pics in future to go alongside the benchmarks?
This would show what CPU power isn't being used, thus showing what can be done with less utilization.
I personally don't need them, but it would be useful for the brain dead around here.

I think a ban for everyone in this thread who fails to understand what the symbol GHz stands for and why 2.66 of these strange GHz is less than 3 GHz of them.

There was no indication in the TH article that all of the benchmarks used only half of the cores ... so I'm right, not you! You're the one who fails to see the obvious, or doesn't want to see it, and keeps posting lame excuses for Clovertown's underperformance!
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
This thread is retarded.
Maybe some Task Manager pics in future to go alongside the benchmarks?
This would show what CPU power isn't being used, thus showing what can be done with less utilization.
I personally don't need them, but it would be useful for the brain dead around here.

I think a ban for everyone in this thread who fails to understand what the symbol GHz stands for and why 2.66 of these strange GHz is less than 3 GHz of them.

There was no indication in the TH article that all of the benchmarks used only half of the cores ... so I'm right, not you! You're the one who fails to see the obvious, or doesn't want to see it, and keeps posting lame excuses for Clovertown's underperformance!

Didn't you call yourself a software developer earlier?
CPU cycles don't eat themselves.
You probably misunderstood my post though. If you take a single threaded application like a game and don't set affinity, take a look at the graph in Task Manager.
Now in the benches, the graph would have been similar. Unless all the CPU graphs are maxed out at 100%, the processor is not using all its power.

The programs that Clovertown lost to Woodcrest in were not multi-threaded. As we see in the application that is properly multi-threaded, Clovertown gains its advantage.
 

dobby

Distinguished
May 24, 2006
1,026
0
19,280
It's a Xeon; these are for servers and workstations. The advantage only comes when they're in a room on their own doing multiple mindless tasks. If there are 4 tasks, as servers have, then you see the increase, but if you're only doing 1 task then a twin chip with the same architecture but a higher clock will easily beat it. Don't mock it just because the results aren't so obvious. Also, the Athlon 4x4 should be compared to the Core 2 Extreme (you know, the quad one), as they are true rivals, not the Xeon.
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
Didn't you call yourself a software developer earlier?
CPU cycles don't eat themselves.
You probably misunderstood my post though. If you take a single threaded application like a game and don't set affinity, take a look at the graph in Task Manager.
Now in the benches, the graph would have been similar. Unless all the CPU graphs are maxed out at 100%, the processor is not using all its power.

The programs that Clovertown lost to Woodcrest in were not multi-threaded. As we see in the application that is properly multi-threaded, Clovertown gains its advantage.

No, that's not true. Windows usually keeps a thread on the same processor. Moving a thread from one processor to another is a very costly operation, more costly than thread synchronization or volatile memory reads/writes, even on SMP systems, and Windows avoids it to prevent cache misses and cache thrashing ... Four threads running flat out will occupy 4 out of 8 cores at 100%, not all 8 at 50%.

Affinity only specifies the processor on which Windows should start the thread. If no affinity is specified, Windows decides which processor to run the thread on and keeps running it there until it finishes, unless that processor is overloaded, in which case it will move the thread to a less busy processor.

So this means that if you only have 4 threads, they won't be randomly scheduled over 8 cores; they will use only 4 of them!
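
For readers who want to experiment with affinity themselves, here is a minimal sketch. The thread is about Windows, where the equivalent Win32 call is `SetThreadAffinityMask`; Python only exposes `os.sched_setaffinity` on Linux, so this is an illustration of the general idea, not of the Windows scheduler. The helper name `pin_current_process` is made up for this example. In general an affinity mask restricts the set of CPUs the scheduler may pick from; how often a thread migrates without one depends on the operating system.

```python
import os

def pin_current_process(cpus):
    """Restrict the calling process to the given CPU ids; return the resulting mask.

    Uses os.sched_setaffinity, which Python exposes on Linux only.
    Returns None on platforms without that call (for example Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)      # 0 means "the calling process"
    return os.sched_getaffinity(0)

if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)  # CPUs the scheduler may currently use
    first = min(allowed)
    print("before:", sorted(allowed))
    print("pinned:", sorted(pin_current_process({first})))
    pin_current_process(allowed)       # restore the original mask
```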
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
I'm not talking about how Windows does its context switching, I'm talking about Task Manager. Try it: 1 application, single threaded. Does 1 core max out at 100%? No.

Anyway, that's not the point that is being argued. The point is that not all the cores are being used fully, hence why I said open Task Manager.

If your theory is right, 4 cores would be sat there doing nothing.
If mine is right, there would be some unevenish balance over 8 cores.
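
The two theories can be written down as a toy model. This is a hypothetical sketch, not a description of the real Windows scheduler: `migrating=False` stands for threads that stay on one core, `migrating=True` for run time spread evenly across all cores.

```python
def per_core_load(busy_threads: int, cores: int, migrating: bool) -> list[float]:
    """Average per-core utilization (0.0 to 1.0) over a long sampling window.

    migrating=False: each busy thread stays on its own core.
    migrating=True: the scheduler spreads run time evenly over all cores.
    Both are simplified, hypothetical scheduler models.
    """
    if migrating:
        return [min(1.0, busy_threads / cores)] * cores
    return [1.0 if core < busy_threads else 0.0 for core in range(cores)]

sticky = per_core_load(4, 8, migrating=False)
spread = per_core_load(4, 8, migrating=True)
print(sticky, sum(sticky) / 8)   # four cores at 100%, four idle
print(spread, sum(spread) / 8)   # all eight cores at 50%
```

Both cases average out to the same 50% total CPU usage, so only the per-core graphs can tell them apart. Either way, half of the machine's capacity is unused.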

 

Pippero

Distinguished
May 26, 2006
594
0
18,980
The programs that Clovertown lost to Woodcrest in were not multi-threaded. As we see in the application that is properly multi-threaded, Clovertown gains its advantage.
You're kidding, right?
In all those programs, Kentsfield has proven to outperform an X6800 despite the big clock disadvantage, hence they are heavily multithreaded.
Now it could be that they don't scale well to 8 cores, yes.
But why does nobody even remotely take into account the possibility that the architecture itself doesn't scale as well to 8 cores as it does to 4?
I think it's interesting that in 2 of those benchmarks where the dual Clovertown excels, even the 2-socket FX-7x performs at its best (in Cinebench it even surpasses Intel's quad, and in 3DSMax it is almost neck and neck under Vista RC2).
Couldn't it also be that those applications have reduced inter-core traffic?
For example, applications which are designed to run on clusters have to reduce CPU-to-CPU traffic to the bare minimum, since the interconnections there are waaaaay slow.
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
I'm not talking about how Windows does its context switching, I'm talking about Task Manager. Try it: 1 application, single threaded. Does 1 core max out at 100%? No.

Anyway, that's not the point that is being argued. The point is that not all the cores are being used fully, hence why I said open Task Manager.

If your theory is right, 4 cores would be sat there doing nothing.
If mine is right, there would be some unevenish balance over 8 cores.


That's exactly how a single threaded application runs on a multicore!!! It maxes out the usage of one core!
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
I'm not talking about how Windows does its context switching, I'm talking about Task Manager. Try it: 1 application, single threaded. Does 1 core max out at 100%? No.

Anyway, that's not the point that is being argued. The point is that not all the cores are being used fully, hence why I said open Task Manager.

If your theory is right, 4 cores would be sat there doing nothing.
If mine is right, there would be some unevenish balance over 8 cores.


That's exactly how a single threaded application runs on a multicore!!! It maxes out the usage of one core!

Excellent! I'm glad you agree that when I set the affinity to a single core in Task Manager, it maxed out that single core.
Now look at the first half of the graph where the affinity was set to both cores.

PS: Congratulations on proving me right :D
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
The programs that Clovertown lost to Woodcrest in were not multi-threaded. As we see in the application that is properly multi-threaded, Clovertown gains its advantage.
You're kidding, right?
In all those programs, Kentsfield has proven to outperform an X6800 despite the big clock disadvantage, hence they are heavily multithreaded.
Now it could be that they don't scale well to 8 cores, yes.
But why does nobody even remotely take into account the possibility that the architecture itself doesn't scale as well to 8 cores as it does to 4?
I think it's interesting that in 2 of those benchmarks where the dual Clovertown excels, even the 2-socket FX-7x performs at its best (in Cinebench it even surpasses Intel's quad, and in 3DSMax it is almost neck and neck under Vista RC2).
Couldn't it also be that those applications have reduced inter-core traffic?
For example, applications which are designed to run on clusters have to reduce CPU-to-CPU traffic to the bare minimum, since the interconnections there are waaaaay slow.

Well okay, you could be right, but a lot of them aren't even multi threaded.
LAME isn't, Xvid isn't.
DivX has been shown to scale well to 4 cores, but maybe it can't take advantage of more?
I'm not sure that most of these benchmarks are even relevant to multi core processing.
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
The programs that Clovertown lost to Woodcrest in were not multi-threaded. As we see in the application that is properly multi-threaded, Clovertown gains its advantage.
You're kidding, right?
In all those programs, Kentsfield has proven to outperform an X6800 despite the big clock disadvantage, hence they are heavily multithreaded.
Now it could be that they don't scale well to 8 cores, yes.
But why does nobody even remotely take into account the possibility that the architecture itself doesn't scale as well to 8 cores as it does to 4?
I think it's interesting that in 2 of those benchmarks where the dual Clovertown excels, even the 2-socket FX-7x performs at its best (in Cinebench it even surpasses Intel's quad, and in 3DSMax it is almost neck and neck under Vista RC2).
Couldn't it also be that those applications have reduced inter-core traffic?
For example, applications which are designed to run on clusters have to reduce CPU-to-CPU traffic to the bare minimum, since the interconnections there are waaaaay slow.

3D Studio Max does very well even on two different PCs over a network [:)] It simply splits its work into tasks that have their own set of data and don't need to communicate with each other! I know this because I've worked with Max's SDK.
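
The kind of decomposition described here can be sketched as follows. This is not the Max SDK; `split_into_tiles`, `render_tile` and `render` are hypothetical names, and the "shading" is a placeholder formula. The point is only that each tile is computed from its own inputs, so no worker ever waits on another. In Python the thread pool shows the structure rather than a real speedup; separate processes or machines could be swapped in for the same design.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_tiles(width: int, height: int, tile: int):
    """Yield (x0, y0, x1, y1) rectangles that cover the image exactly once."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield (x, y, min(x + tile, width), min(y + tile, height))

def render_tile(bounds):
    """Stand-in for a renderer: shade each pixel from its own coordinates only."""
    x0, y0, x1, y1 = bounds
    return bounds, [[(x * 31 + y * 17) % 256 for x in range(x0, x1)]
                    for y in range(y0, y1)]

def render(width: int, height: int, tile: int = 16, workers: int = 4):
    """Render all tiles on a pool and stitch them into one image."""
    image = [[0] * width for _ in range(height)]
    tiles = split_into_tiles(width, height, tile)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tiles share no state, so workers never need to talk to each other.
        for (x0, y0, x1, y1), pixels in pool.map(render_tile, tiles):
            for row, y in zip(pixels, range(y0, y1)):
                image[y][x0:x1] = row
    return image
```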
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
I'm not sure that most of these benchmarks are even relevant to multi core processing.

Strangely, a month ago when C2Q appeared, they were among the only ones relevant for multithreading! And they scaled well from 2 to 4 cores without recompilation, which means they were made to spawn as many threads as there are CPU cores available. I doubt they were made to recognize C2Q as a processor, because it didn't exist at the time ...
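
Sizing a worker pool from the core count at run time is why such software needs no recompilation for a new CPU. A minimal sketch, with made-up helper names (`default_worker_count`, `run_jobs`), and not taken from any of the benchmarked applications:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def default_worker_count(requested=None):
    """Use the caller's setting if given, otherwise one worker per logical CPU."""
    if requested is not None:
        return max(1, requested)
    return os.cpu_count() or 1   # cpu_count() can return None

def run_jobs(jobs, requested_workers=None):
    """Run zero-argument callables on a pool sized at run time."""
    workers = default_worker_count(requested_workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(job) for job in jobs]
        return workers, [f.result() for f in futures]

workers, results = run_jobs([lambda i=i: i * i for i in range(8)])
print(workers, results)
```

The same binary would report 2, 4 or 8 workers depending on the machine it runs on. Whether those workers actually speed things up is a separate question.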
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
I'm not sure that most of these benchmarks are even relevant to multi core processing.

Strangely, a month ago when C2Q appeared, they were among the only ones relevant for multithreading! And they scaled well from 2 to 4 cores without recompilation, which means they were made to spawn as many threads as there are CPU cores available. I doubt they were made to recognize C2Q as a processor, because it didn't exist at the time ...

The only one that scaled well was 3D Studio Max.
http://www.tomshardware.co.uk/2006/09/10/four_cores_on_the_rampage_uk/page9.html

Take a look.
Also take a look at an Athlon X2 vs a single-core Athlon of the same generation. You'll then see which applications can take advantage of multi-threading.
Then look at the performance increase gained.
If an application only gains 40% from having 2 cores, it's not going to gain a lot more with 4, is it?
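
That intuition matches Amdahl's law, which the post doesn't name but which gives a quick estimate. The sketch assumes the 40% gain on 2 cores comes purely from a fixed serial fraction, ignoring memory bandwidth and other effects.

```python
def parallel_fraction(speedup: float, cores: int) -> float:
    """Solve Amdahl's law, S = 1 / ((1 - p) + p / n), for p."""
    return (1 - 1 / speedup) / (1 - 1 / cores)

def amdahl_speedup(p: float, cores: int) -> float:
    """Predicted speedup over one core for parallel fraction p."""
    return 1 / ((1 - p) + p / cores)

p = parallel_fraction(1.4, 2)          # 40% gain observed on 2 cores
for n in (2, 4, 8):
    print(n, round(amdahl_speedup(p, n), 2))
```

With these assumptions about 57% of the work is parallel, and the predicted speedups are 1.4x on 2 cores, 1.75x on 4 and 2.0x on 8. Going from 4 to 8 cores would add only about 14%.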
 

Pippero

Distinguished
May 26, 2006
594
0
18,980
Well okay, you could be right, but a lot of them aren't even multi threaded.
LAME isn't, Xvid isn't.
DivX has been shown to scale well to 4 cores, but maybe it can't take advantage of more?
I'm not sure that most of these benchmarks are even relevant to multi core processing.
Agreed.
In fact, when that article came up, I posted in another thread that I didn't feel this was the proper way to test this architecture.
And, I might add, I never meant to endorse the "negative scaling theory" ;)
My point is that this architecture hasn't been tested well enough yet, and that some phenomena have emerged which might be the result of several concurring causes (one is obviously that most common applications, even when multithreaded, are not designed to take advantage of many cores) and which deserve better testing and further investigation.
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
I'm not sure that most of these benchmarks are even relevant to multi core processing.

Strangely, a month ago when C2Q appeared, they were among the only ones relevant for multithreading! And they scaled well from 2 to 4 cores without recompilation, which means they were made to spawn as many threads as there are CPU cores available. I doubt they were made to recognize C2Q as a processor, because it didn't exist at the time ...

The only one that scaled well was 3D Studio Max.
http://www.tomshardware.co.uk/2006/09/10/four_cores_on_the_rampage_uk/page9.html

Take a look.
Also take a look at an Athlon X2 vs a single-core Athlon of the same generation. You'll then see which applications can take advantage of multi-threading.
Then look at the performance increase gained.
If an application only gains 40% from having 2 cores, it's not going to gain a lot more with 4, is it?

The thing is that these applications don't scale because of bandwidth limitations. 3D Max 8 scales well because it is CPU constrained, not bandwidth constrained: it needs far more CPU SSE/FP power than bandwidth, as it does huge (make that huge squared) amounts of number crunching on relatively small sets of data! It's also optimized to do all the calculations at once on a small piece of data, rather than iterating over and over across all the data to do the necessary calculations. This kind of optimization allows it to scale well even on Clovertown. But for the rest of the applications, which don't or can't follow this programming model, or which are bandwidth intensive rather than CPU intensive, Clovertown won't give any advantage no matter how multithreaded they are!

Clovertown definitely has a weakness: memory bandwidth. Although it's a number crunching monster, it's of limited use for a number of applications that are memory intensive ... and most server applications are very memory intensive!
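
The compute-bound versus bandwidth-bound distinction can be shown with a simple ceiling model. All the numbers below are hypothetical and chosen only to show the shape of the curve; they are not measurements of Clovertown or of any benchmark in the review.

```python
def throughput(cores: int, ops_per_core: float,
               bus_bytes_per_s: float, bytes_per_op: float) -> float:
    """Operations per second: the lower of the compute and memory ceilings."""
    compute_ceiling = cores * ops_per_core
    memory_ceiling = bus_bytes_per_s / bytes_per_op
    return min(compute_ceiling, memory_ceiling)

# Hypothetical numbers chosen only to show the shape of the curve.
OPS_PER_CORE = 1e9        # operations per second per core
BUS = 8e9                 # shared memory bandwidth, bytes per second

for name, bytes_per_op in (("compute-heavy", 0.5), ("memory-heavy", 4.0)):
    scaling = [throughput(n, OPS_PER_CORE, BUS, bytes_per_op) / 1e9
               for n in (1, 2, 4, 8)]
    print(name, scaling)
```

In this model the compute-heavy workload keeps doubling up to 8 cores, while the memory-heavy one stops improving after 2 cores because every core shares the same bus.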
 

belvdr

Distinguished
Mar 26, 2006
380
0
18,780
Completely agreed. Civil and informed discourse is the only way to coherently exchange information. And if you weren't so ugly and your mother didn't dress you so funny and if you didn't have cooties you'd know that.

:lol:

Ha! You only got 2 out of 3 right! :lol:
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
Just for the record:

Very good multi-threading:
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=188
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=185

Pretty good:
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=182
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=184
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=183

Some improvements:
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=186
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=175

Very Little:
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=187

Little to None:
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=181
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=178
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=179
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=180
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=176