Clovertown?? ... I don't get it!


cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
Just for the record:

Very good multi-threading:
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=188
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=185

Pretty good:
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=188
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=182
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=184
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=183

Some improvements:
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=186
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=175

Very Little:
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=187

Little to None:
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=181
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=178
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=179
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=180
http://www.tomshardware.co.uk/cpu/charts.html?modelx=33&model1=472&model2=467&chart=176

The multithreading of 3D Studio Max is as good as the multithreading of encoding applications. Max is CPU intensive rather than memory intensive, and it scales well. Encoders are memory intensive, so they can't scale as well without proportional bandwidth scaling!! Capisci?
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
The only codec that is MT is the H.264 one.
The result is strange, but that's not due to memory bandwidth. That's like saying driving your car at top speed is slower than driving it at 80%. The CPU would just have longer wait times on memory requests. But if the bus is saturated, that means it's processing more data, so it should be working faster than the Woodcrest.

EDIT: And DivX, sorry. But a quick bit of maths makes me believe DivX only scales well to three cores.
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
The only codec that is MT is the H.264 one.
The result is strange, but that's not due to memory bandwidth. That's like saying driving your car at top speed is slower than driving it at 80%. The CPU would just have longer wait times on memory requests. But if the bus is saturated, that means it's processing more data, so it should be working faster than the Woodcrest.

EDIT: And DivX, sorry. But a quick bit of maths makes me believe DivX only scales well to three cores.

Congrats! You have chosen a codec that is 10x more CPU intensive than the others as representative!!! H.264 is notorious for its hunger for floating-point power... H.264 is as threaded as the others; it just does tenfold more processing on the data, which means more processing and less memory access, which means it scales better on Clovertown! So this is how you define good multithreading, I see: by how little memory access it does!
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
The only codec that is MT is the H.264 one.
The result is strange, but that's not due to memory bandwidth. That's like saying driving your car at top speed is slower than driving it at 80%. The CPU would just have longer wait times on memory requests. But if the bus is saturated, that means it's processing more data, so it should be working faster than the Woodcrest.

EDIT: And DivX, sorry. But a quick bit of maths makes me believe DivX only scales well to three cores.

Congrats! You have chosen a codec that is 10x more CPU intensive than the others as representative!!! H.264 is notorious for its hunger for floating-point power... H.264 is as threaded as the others; it just does tenfold more processing on the data, which means more processing and less memory access, which means it scales better on Clovertown! So this is how you define good multithreading, I see: by how little memory access it does!


Oh dear, I think you just proved what your word is worth. You haven't even looked at these charts, have you?

The only reason I picked that codec is that it's the only one that shows any decent sign of multi-threading capability in the CPU charts. As we can see, it does worse than a dual-core Woodcrest.

Please pick a memory-intensive test from the list. You keep going on about them, but I don't see them. C'mon, give us some figures.

In fact, take a look at 4x4. Since Intel is so starved for bandwidth, how come AMD couldn't pass them in these "memory intensive" tests that you have going on in your mind?
 

cryogenic

Distinguished
Jul 10, 2006
449
1
18,780
The only codec that is MT is the H.264 one.
The result is strange, but that's not due to memory bandwidth. That's like saying driving your car at top speed is slower than driving it at 80%. The CPU would just have longer wait times on memory requests. But if the bus is saturated, that means it's processing more data, so it should be working faster than the Woodcrest.

EDIT: And DivX, sorry. But a quick bit of maths makes me believe DivX only scales well to three cores.

Congrats! You have chosen a codec that is 10x more CPU intensive than the others as representative!!! H.264 is notorious for its hunger for floating-point power... H.264 is as threaded as the others; it just does tenfold more processing on the data, which means more processing and less memory access, which means it scales better on Clovertown! So this is how you define good multithreading, I see: by how little memory access it does!

Oh dear, I think you just proved what your word is worth. You haven't even looked at these charts, have you?

The only reason I picked that codec is that it's the only one that shows any decent sign of multi-threading capability in the CPU charts. As we can see, it does worse than a dual-core Woodcrest.

Please pick a memory-intensive test from the list. You keep going on about them, but I don't see them. C'mon, give us some figures.

In fact, take a look at 4x4. Since Intel is so starved for bandwidth, how come AMD couldn't pass them in these "memory intensive" tests that you have going on in your mind?

I've said it scales better than the other codecs because it's not as memory intensive as the others, not that it actually has better performance! It's still too bandwidth starved to scale positively on 8 cores... but not as heavily as the others!

AMD has enough bandwidth, C2 does too, C2Q almost has enough, and Clovertown is bandwidth starved, as can be seen from the fact that it's the only platform that doesn't offer any positive performance increase for bandwidth-intensive (and properly threaded) applications.
 

accord99

Distinguished
Jan 31, 2004
325
0
18,780
Xeon MPs and DPs dominate the most important TPC benchmarks in the x86 world. Well-designed chipsets, multiple FSBs and decent-sized caches scale better than HyperTransport.
I cannot agree with you.
Xeons have better performance in key enterprise benchmarks like TPC-C and SAP SD. The current implementation of HyperTransport is well known to suffer from scalability problems past 4S. IBM and Unisys both produce more advanced chipsets that allow scaling up to 32S for Xeon MPs with reasonable efficiency.
 

Action_Man

Splendid
Jan 7, 2004
3,857
0
22,780
AMD has enough bandwidth, C2 does too, C2Q almost has enough, and Clovertown is bandwidth starved, as can be seen from the fact that it's the only platform that doesn't offer any positive performance increase for bandwidth-intensive (and properly threaded) applications.

Ahahahaha, what a load of sh!t, ahahahahah!

I should point out that your memory bandwidth/FSB saturation theory is complete BS, because it is running on a 1333 MHz FSB rather than the 1066 MHz on the desktop, and it has dual independent FSBs.

From the article:

It's good to see that Intel went after the faster 333-MHz system bus speed for its quad-core processors, since this pushes the bottleneck threat back for the time being.

And:

The main difference between the E7500 series and the latest 5000 chipset series is the latter's Dual Independent Bus (DIB) architecture. Processors on older Xeon platforms had to share the Front Side Bus, which can represent a bottleneck. Now, each processor socket gets its own interface. Thanks to the clock speed increase to 333 MHz, each processor has a total bandwidth of 10.66 GB/s.

Owned.
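The article's 10.66 GB/s figure does check out from the bus arithmetic. A quick sanity check, assuming the standard 64-bit-wide, quad-pumped front-side bus:

```python
# Back-of-envelope check of the quoted per-socket FSB bandwidth.
# Assumes the standard 64-bit-wide, quad-pumped front-side bus.

BASE_CLOCK_MHZ = 333          # base bus clock
TRANSFERS_PER_CLOCK = 4       # "quad-pumped": 4 transfers per clock -> 1333 MT/s
BUS_WIDTH_BYTES = 8           # 64-bit data bus

transfers_per_sec = BASE_CLOCK_MHZ * 1e6 * TRANSFERS_PER_CLOCK
bandwidth_gb_s = transfers_per_sec * BUS_WIDTH_BYTES / 1e9

print(f"{bandwidth_gb_s:.2f} GB/s")  # -> 10.66 GB/s, matching the article
```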
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
Heh, yep.
A saturated FSB wouldn't produce slower results anyway; I don't know where this idea is coming from... it just means the memory is the bottleneck.

Each test on each machine has the same amount of data to process. If saturating the bus made things slower, you'd be saying that the less bandwidth you use, the faster you go.

That's like saying that if you drive two cars 10,000 miles along a road that can support speeds of 1333 mph, with car A travelling at 1066 mph and car B at 1333 mph, car A will get there first!

EDIT: Another point. I really don't think memory is fast enough to bottleneck the FSB anyway, except perhaps when every single instruction in a CPU loop requires direct access to memory.
The only time I can see that happening intensively is with memcpy and memset operations on large amounts of data. Even then you'd need to use SSE copy instructions to saturate the bus, plus you'd need a lot of memory to copy from and to!
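The memcpy point can be sketched with a very rough bandwidth probe. Slice-assignment between bytearrays does a bulk copy under the hood, so the figure is only a loose lower bound on what hand-tuned SSE copies would reach; buffer size and repeat count are arbitrary choices:

```python
import time

# Rough sketch: measure sustained memory-copy throughput, the workload
# most likely to saturate a front-side bus. Results depend heavily on
# caches and the allocator; treat the number as indicative only.

N = 64 * 1024 * 1024          # 64 MiB source buffer, far larger than any L2
src = bytearray(N)
dst = bytearray(N)

reps = 8
start = time.perf_counter()
for _ in range(reps):
    dst[:] = src              # bulk copy, roughly a memcpy under the hood
elapsed = time.perf_counter() - start

# Each rep reads N bytes and writes N bytes.
gb_moved = 2 * N * reps / 1e9
print(f"~{gb_moved / elapsed:.1f} GB/s sustained copy bandwidth")
```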
 

Action_Man

Splendid
Jan 7, 2004
3,857
0
22,780
And if you look at AnandTech's review of Quad FX, it scales fine from 2 to 4 cores for the most part. So *gasp*, maybe, just maybe, it's the applications they're using. Also note that in the H.264 test the higher-clocked Core 2 is ahead of the quad core, and with AMD the dual cores are faster than the quad cores at the same clock. Well sh!t, how about that.
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
with AMD the dual cores are faster than the quad cores at the same clock. Well sh!t, how about that.

:D

Exactly why I dug out those links from the CPU charts.
They show the percentage scaling for AMD. AMD has so much bandwidth they could power the world, so I thought they'd be a great benchmark for showing what can be gained from going single to dual core without any so-called memory restrictions...

The only benchmark that shows a really good increase across all platforms is 3D Studio Max.
DivX scales well to 2 cores; then at 4 it seems like it only uses 3. We can tell this because a MHz increase gains a lot more performance than a core increase.
MainConcept H.264 does pretty much the same.

We could say this is a bandwidth issue; the only problem with that is that the 4x4 should break that trend. It doesn't.
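The "quick bit of maths" behind this kind of inference might look like the sketch below, which inverts Amdahl's law from a measured speedup and compares what extra cores buy versus a clock bump. The 2.7x figure is an assumption for illustration, not a number taken from the charts:

```python
# Hedged sketch: invert Amdahl's law from a hypothetical 4-core speedup,
# then compare the gain from more cores against a simple clock increase.

def parallel_fraction(speedup, cores):
    """Invert Amdahl's law: speedup = 1 / ((1 - p) + p / cores)."""
    return (1 - 1 / speedup) / (1 - 1 / cores)

def amdahl_speedup(p, cores):
    return 1 / ((1 - p) + p / cores)

p = parallel_fraction(2.7, 4)        # suppose 4 cores only gave 2.7x
print(f"parallel fraction ~{p:.2f}")

# Going 4 -> 8 cores versus a 10% clock bump on the same 4 cores:
gain_cores = amdahl_speedup(p, 8) / amdahl_speedup(p, 4)
gain_clock = 1.10                    # a clock bump speeds up the serial part too
print(f"8 cores vs 4: {gain_cores:.2f}x ; +10% MHz: {gain_clock:.2f}x")
```

The per-MHz efficiency argument in the post falls out of the same arithmetic: the serial fraction caps what additional cores can deliver, while clock speed scales the whole workload.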
 

bga

Distinguished
Mar 20, 2006
272
0
18,780
Intel just managed to get a negative performance increase by doubling the number of cores from 4 to 8 on 'multi-threaded software'. Can someone please explain to me how this is possible??? Someone must have screwed up big time, but that just doesn't quite say it.

That's because most of today's applications, and certainly every game out there today, do not load balance in any meaningful way, so 2 cores at 2.93 GHz are faster than 4 cores at 2.66 GHz: the two extra cores are just sitting idle.

At this point I'm quite sure Intel's upcoming 45 nm chips won't make much difference in 2007; the FSB is killing them.
That is utter nonsense. AnandTech has some good benchmarks where they slow down the FSB to see what effect that has on performance. The result is almost none, so no, the FSB is nowhere near the limiting factor. The limiting factor is application load balancing, or benchmarking in a multitasking environment (try burning a DVD and encoding two HD video streams while playing a high-end game).

A few threads??? Yahoo Messenger has around 30 threads, the winlogon process can have 10, 20 or more... Firefox with 5 open pages has 17... The System process in Windows can have hundreds of threads...

They are communication threads, waiting for input or for other processes. It is not enough to be multi-threaded to take full advantage of multiple CPUs; you have to multithread the critical path of your application's execution. That is much more difficult to code than creating a lot of communication threads.

A typical instance of SQL Server running on my dev machine has close to 300 threads...
As a software developer who works with databases every day, I sure would like to see some tests of database performance.

That's more interesting. Databases are good at load balancing. The only two benchmarks in the article won clearly by dual Clovertowns (8 cores) are the database tests. So you will love dual Clovertowns in your database server; that's their proper home. For gamers, the fastest machine is a single-CPU system built on the X6800 (overclocked, preferably).
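The waiting-threads-versus-critical-path distinction can be sketched in a toy example: 30 threads park on an event (like idle communication or I/O threads) while one thread does all the work, and the process's CPU time tracks only the worker:

```python
import threading
import time

# Sketch: a process can carry dozens of threads while only one does real
# work. Threads blocked in a wait consume essentially no CPU, so a high
# thread count says nothing about whether the critical path is parallel.

stop = threading.Event()

def waiter():
    stop.wait()                     # parked, like an idle communication thread

waiters = [threading.Thread(target=waiter) for _ in range(30)]
for t in waiters:
    t.start()

wall0, cpu0 = time.perf_counter(), time.process_time()
total = sum(i * i for i in range(2_000_000))   # the lone "critical path"
wall = time.perf_counter() - wall0
cpu = time.process_time() - cpu0

n_threads = threading.active_count()           # main + 30 idle waiters
stop.set()
for t in waiters:
    t.join()

print(f"{n_threads} threads alive, but CPU time ({cpu:.2f}s) "
      f"tracks the single worker ({wall:.2f}s wall)")
```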
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
That's more interesting. Databases are good at load balancing. The only two benchmarks in the article won clearly by dual Clovertowns (8 cores) are the database tests. So you will love dual Clovertowns in your database server; that's their proper home. For gamers, the fastest machine is a single-CPU system built on the X6800 (overclocked, preferably).

I tell you what, I didn't even see this page... this is what people who are buying Clovertowns will care about too.
 

belvdr

Distinguished
Mar 26, 2006
380
0
18,780
That's more interesting. Databases are good at load balancing. The only two benchmarks in the article won clearly by dual Clovertowns (8 cores) are the database tests. So you will love dual Clovertowns in your database server; that's their proper home.

Not necessarily. Databases are very I/O intensive, and with today's modern processors I see that disk I/O is the bottleneck more often than not. For example, I have a cluster of four nodes with dual-core Opteron 280s and 3 TB of Fibre Channel disks connected via 2 Gb fibre. The CPU is idle about 50% of the time while it waits on disk, so adding more CPU does nothing for the system at all.

Now, if you are upgrading from older systems where you are CPU bound, then I'd advise determining which CPU gives you the most bang for the buck. TPC reports are very generic and not representative of one's particular load, so I'd shy away from them.
 

djgandy

Distinguished
Jul 14, 2006
661
0
18,980
That's more interesting. Databases are good at load balancing. The only two benchmarks in the article won clearly by dual Clovertowns (8 cores) are the database tests. So you will love dual Clovertowns in your database server; that's their proper home.

Not necessarily. Databases are very I/O intensive, and with today's modern processors I see that disk I/O is the bottleneck more often than not. For example, I have a cluster of four nodes with dual-core Opteron 280s and 3 TB of Fibre Channel disks connected via 2 Gb fibre. The CPU is idle about 50% of the time while it waits on disk, so adding more CPU does nothing for the system at all.

Now, if you are upgrading from older systems where you are CPU bound, then I'd advise determining which CPU gives you the most bang for the buck. TPC reports are very generic and not representative of one's particular load, so I'd shy away from them.

You have a good point too :) I'm not familiar with databases and memory-intensive applications, but I do know that the information has to get into memory somehow. And I have disassembled plenty of programs in my time, and I'll let everyone in on a secret: they don't spend all their time doing memory transfers.

To see 5 GB/s+ slowing things down, I'd assume you'd have to have some ridiculous amount of information loaded into memory ready for use.

This kind of leads on to PCI-E and graphics cards... Back in the day, when all the graphics buffers were stored in system memory, a fast bus was required. Nowadays all the textures and other data are stored on the graphics card. So what is actually being sent across this PCI-E 16x slot while you are playing? Okay, on game load all the information has to get to the graphics card, so you may see a substantial speed increase there... but do you seriously think 8 GB/s is being sent across your PCI-E bus to your graphics card every single frame? Of course it's not!! There have been tests showing that 4x is fast enough for graphics requirements...

...anyway, blah blah... bored... The moral of the story is: use your brain. Just because something is marketed doesn't make it good. Data is useless until it is processed; when you do systems analysis you find this out. And what processes data? The processor.
 

bga

Distinguished
Mar 20, 2006
272
0
18,780
Not necessarily. Databases are very I/O intensive, and with today's modern processors I see that disk I/O is the bottleneck more often than not. For example, I have a cluster of four nodes with dual-core Opteron 280s and 3 TB of Fibre Channel disks connected via 2 Gb fibre. The CPU is idle about 50% of the time while it waits on disk.

Sounds like you could use some more RAM. :D

Databases can stress systems in a number of ways. Very large databases, where the queries are not too complex, are disk bound.
Small databases with complex queries on systems with massive amounts of RAM are CPU bound.
As you wrote, the limit in the I/O system is almost always the disks. Even with a disk system as fast as yours, there is no FSB or memory bottleneck anywhere; the disks are way slower than anything else in a modern server.
 

belvdr

Distinguished
Mar 26, 2006
380
0
18,780
Sounds like you could use some more RAM. :D

Databases can stress systems in a number of ways. Very large databases, where the queries are not too complex, are disk bound.
Small databases with complex queries on systems with massive amounts of RAM are CPU bound.

Well, we have 32 GB of RAM supporting that database, and that's been plenty so far.

And you cannot generalize a system's bottlenecks like that. Even on small databases, if you have either:

1) Poorly written queries
2) Poorly designed indexes and tables

you can still be disk bound. If you have a CPU that is constantly being hammered even after you properly tune the queries and objects, then yes, you are definitely CPU bound.

I see many people who get the mindset of "throw hardware at the problem and let that fix it", and I love to watch their faces when the problem gets worse. Deploying an Oracle RAC database comes to mind.

As you wrote, the limit in the I/O system is almost always the disks. Even with a disk system as fast as yours, there is no FSB or memory bottleneck anywhere; the disks are way slower than anything else in a modern server.

Totally agreed.
 

bga

Distinguished
Mar 20, 2006
272
0
18,780
And you cannot generalize a system's bottlenecks like that. Even on small databases, if you have either:
1) Poorly written queries
2) Poorly designed indexes and tables
you can still be disk bound. I see many people who get the mindset of "throw hardware at the problem and let that fix it"

In my posts I certainly assume a well-written application with well-thought-out indexes.

And I agree that for 99% of performance problems the solution is to check and optimize the software, not to throw hardware at the problem.
But returning to benchmarks: if you're testing CPUs, the benchmarks should be written to stay in RAM and be optimized, so they test the newest and greatest CPU rather than defective software.
Then you will see that databases are among the applications that scale best with increasing core count. That's why Sun is making CPUs with lots of simple cores.
 

belvdr

Distinguished
Mar 26, 2006
380
0
18,780
Then you will see that databases are among the applications that scale best with increasing core count. That's why Sun is making CPUs with lots of simple cores.

Well, then that's a specially configured database, and I see what you're saying. I didn't know Sun still made CPUs; I'm not sure how long they can hold out making their own proprietary hardware. I know of no other manufacturers still producing proprietary hardware except IBM. I know the field has been dwindling for some time, and I don't think that will stop.
 

bga

Distinguished
Mar 20, 2006
272
0
18,780
I didn't know Sun still made CPUs; I'm not sure how long they can hold out making their own proprietary hardware.

Yes, it will be interesting to see how long they can hold out. At least they are not trying to compete in the general CPU space but are going for CPUs developed specifically for servers. Here are some interesting links for your information:

New 16 core CPU from Sun.
http://news.com.com/Sun+puts+16+cores+on+its+Rock+chip/2100-1006_3-6141961.html?tag=nefd.top

Opteron, Woodcrest and Sun T1 test:
http://www.anandtech.com/IT/showdoc.aspx?i=2772
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
Encoders are memory intensive, so they can't scale as well without proportional bandwidth scaling!! Capisci?

Encoders are neither memory intensive nor L2 cache intensive.

The best encoder systems have huge core counts, very little L2 or L3 cache per core (a waste of die space), and nowhere near the memory throughput some people think. (Bear in mind they all use the same core algorithms and codecs.)
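A rough back-of-envelope supports this: even uncompressed frame traffic is tiny next to a ~10 GB/s FSB. The resolution, frame rate and re-read factor below are assumed figures for illustration, not measurements:

```python
# Rough numbers: how much memory traffic does raw video actually generate?
# All constants here are assumptions for illustration.

WIDTH, HEIGHT = 1920, 1080     # generous for a 2006-era encode
BYTES_PER_PIXEL = 1.5          # 4:2:0 chroma subsampling (YV12)
FPS = 30

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
stream_mb_s = frame_bytes * FPS / 1e6
print(f"raw 1080p stream: ~{stream_mb_s:.0f} MB/s")

# Even if motion search re-reads each reference frame several times,
# the traffic stays well under 1 GB/s.
REREADS = 5                    # assumed
print(f"with {REREADS}x re-reads: ~{stream_mb_s * REREADS / 1000:.2f} GB/s")
```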
 

bga

Distinguished
Mar 20, 2006
272
0
18,780
Excellent-o, does it have 2- or 4-way SMT per core and only use 72 watts like the last one?

It has 2-way SMT (per core; 32 threads total), like the current T1 processor.
I don't know about power; the processor hasn't taped out yet, but more data should come in January.
 

bga

Distinguished
Mar 20, 2006
272
0
18,780
I thought the current ones were 8 cores with 4-way SMT each, for 32 threads at 72 watts.
So this halves SMT per core but doubles core count?

Sorry, error in my last post: the current processor is 4-way SMT with 32 threads. The article mentions that the new 16-core CPU is also 32 threads per CPU. Does that make it only 2-way SMT?