AMD "Interlagos" Bulldozer Benchmarks Leaked

No one is talking about the MFLOPS figure here; it seems impossible for the processor to achieve only 88 MFLOPS! Even if it's just a typo that switched GFLOPS to MFLOPS, 88 GFLOPS would still be horribly slow.

If you have 32 cores at 1.8GHz, each capable of executing 4 FLOPs per cycle, you should achieve a theoretical peak of 230.4GFLOPS on this system. Even assuming a poorly tuned setup, you can expect at least 85% of that, since there's no network bottleneck and there is enough memory to hold all the data in RAM, so this should be at least a 195GFLOPS system.

Given the fact that this Himeno test is not "multithread friendly", let's assume it used just one core of the system; it is still at only about 1.22% of that core's peak with a result of 88MFLOPS (out of 7.2GFLOPS).
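
Rough numbers, in case anyone wants to check my arithmetic (just the same figures from above plugged into a few lines of Python):

cores = 32
clock_ghz = 1.8
flops_per_cycle = 4                            # per core, as stated above

peak = cores * clock_ghz * flops_per_cycle     # 230.4 GFLOPS for the whole system
tuned = 0.85 * peak                            # ~195 GFLOPS at 85% efficiency
single_core = clock_ghz * flops_per_cycle      # 7.2 GFLOPS for one core
print(0.088 / single_core * 100)               # 88 MFLOPS is ~1.2% of one core's peak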

All in all, I don't trust these results; they look like a straight run of the benchmarks without any effort to tune them for the system.
 
[citation][nom]jamessneed[/nom]I have no idea how anyone can compare this server CPU to a desktop CPU. The IPC may scale up at higher clock speeds and im pretty sure the desktop version will be at least twice the clock speed of this 1.8Ghz CPU. Lets all just chill out and wait until we get real benchmarks of desktop CPU versions. Until we get the real scoop it is just total speculation.[/citation]
How can it? Changing the IPC requires changes to the architecture.
 
@PreferLinux

You are indeed correct, I was thinking workstation. I guess I saw their benchies and thought workstation, but they used the term server, so that kind of stuck in my head. Guess I got myself all confused, lol

Now, if we're talking integer crunching, then a GPGPU is going to run circles around CPU cores any day of the week, considering that each stream processor can be used to crunch numbers and a standard GPU has well over a hundred of them. Of course, for that to happen, OpenCL or the like needs to take off.
 
Exactly, the IPC is just that, instructions per cycle. Increasing the clock speed not only doesn't increase the instructions per cycle, it can actually lower it, because the gap between CPU clock speed and memory speed gets wider, increasing the pipeline stall imposed by a cache miss. Increased clock speeds can increase aggregate throughput just like increases in core count, but there is a point at which the law of diminishing returns comes into play. Fundamentally the Sandy Bridge CPUs have better IPC, and when 6, 8, maybe even 10 core Sandy Bridge-E processors show up, Bulldozer could have problems in all but the most highly threaded workloads.
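
To put some illustrative numbers on that (the base CPI, miss rate and memory latency below are made-up values, just to show the shape of the effect):

def effective_ipc(base_cpi, miss_rate, mem_latency_ns, clock_ghz):
    # memory latency is roughly fixed in nanoseconds, so the stall measured
    # in CPU cycles grows as the clock is raised
    stall_cycles = mem_latency_ns * clock_ghz
    return 1.0 / (base_cpi + miss_rate * stall_cycles)

for ghz in (1.8, 3.6):
    ipc = effective_ipc(base_cpi=1.0, miss_rate=0.01, mem_latency_ns=60, clock_ghz=ghz)
    print(ghz, round(ipc, 2), round(ipc * ghz, 2))   # IPC drops, throughput still rises

Doubling the clock in that toy example only buys about 30% more throughput, which is the diminishing-returns point exactly.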
 
Incorrect assumptions being made:

1. That any of those tests scale perfectly by core count
2. For those FP tests, a 16 module Bulldozer setup is essentially a 16 core, not a 32 core. If it was primarily an integer workload, you could consider it a 32 core.
3. That the 2.6.37 kernel properly supports Bulldozer. Proper Bulldozer support is likely to come in the yet-to-be-released 2.6.39 kernel, or even 2.6.40.
4. That a dual-socket motherboard scales perfectly
5. That a multi-chip packaged CPU scales perfectly.

Of course, the fact that we're comparing a server CPU designed for massive parallelism to a desktop CPU is a problem, and the fact that those benchmarks are not geared toward the purpose of said server CPU is bound to cause lots of confusion. I'm sure a 4-module BD @ 4.0GHz in a single-socket desktop mobo with desktop RAM would yield very different results.
 
This isn't going to run at 3GHz, it's not running at 1.8GHz because it's an "engineering sample". It's running at 1.8GHz because they won't be running 16 cores in a single package at 3GHz+. They will be running it at 1.8GHz, possibly 2.0GHz, _maybe_ 2.2GHz. At 3GHz the thing will burn like Thermite.
 
@dragonsqrrl I wrote very clearly that I was referring to the desktop version of Bulldozer. I do not know much about servers, so I won't even bother comparing BD server chips against Intel server chips. You can't really go comparing cores to cores because this just isn't an apples-to-apples comparison (the two have different architectures). The only really legit comparison, like I said earlier, is price/performance.
 
[citation][nom]jojoshabadoo[/nom]This isn't going to run at 3GHz, it's not running at 1.8GHz because it's an "engineering sample". It's running at 1.8GHz because they won't be running 16 cores in a single package at 3GHz+. They will be running it at 1.8GHz, possibly 2.0GHz, _maybe_ 2.2GHz. At 3GHz the thing will burn like Thermite.[/citation]

No it won't. Remember that each extra "core" is a tiny 12% size increase per module, which in itself translates into a 5% increase per die, all on a 32nm process. Now, think about the fact that Magny-Cours is a 12-core part utilising two 45nm 6-core processors in a multi-chip module... and it doesn't burn out nor consume untold amounts of power even at 2.3GHz.

I'll be very surprised if Zambezi is as impressive on the desktop as it will be in the server world, but we don't really know yet, and won't for a while.
 
Bulldozer_apologist writes:
> 1. That any of those tests scale perfectly by core count

C-Ray does, that's why I use it on my site for testing SGIs (eg. I have a 36-CPU
Onyx3800), and it's also why - according to Michael Larabel - it's become so popular
with vendors of large multi-CPU/core systems. Indeed, C-ray's thread limit is merely
the vertical resolution of the output image, so the test used here could also run
perfectly on very large systems such as a 200-CPU Altix UV (ie. 1200 cores total,
1200 threads, 1 per core, ie. 1 scan line per core; perfect scaling).
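
For anyone curious why the scaling is so clean, the decomposition is about as simple
as it gets. A toy Python sketch of the idea (the real c-ray is plain C, and Python
threads wouldn't genuinely scale like this because of the GIL, but it shows why the
thread limit is the image height):

from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, NUM_WORKERS = 640, 480, 32   # placeholder values

def trace_pixel(x, y):
    # stand-in for the real per-pixel ray tracing work
    return (x * y) % 255

def render_scanline(y):
    # every scanline is independent of the others, so rows can be farmed out freely
    return [trace_pixel(x, y) for x in range(WIDTH)]

with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
    image = list(pool.map(render_scanline, range(HEIGHT)))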

Can't comment on the other tests though.

Ian.

 
Hi guys!

If you look carefully at the test you will see that there is a comparison between a 2×16-core Bulldozer setup with 32 effective threads, which scored 25.97 in C-Ray 1.1, and a 4×8-core Xeon X7550 setup with 32 cores and 64 threads, which scored 13.47, running at 2-2.4GHz.

So the Intel CPUs have a higher frequency and double the number of threads.

Theoretically these two configurations come out roughly equal, with actually around a 10-20% advantage on the AMD side.
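
Normalising per hardware thread and per GHz (taking the X7550's 2.0GHz base clock and remembering that lower C-Ray times are better), the arithmetic looks roughly like this:

bd_time, bd_threads, bd_ghz = 25.97, 32, 1.8
xeon_time, xeon_threads, xeon_ghz = 13.47, 64, 2.0   # base clock, ignoring turbo

bd_per_thread_ghz = 1 / (bd_time * bd_threads * bd_ghz)
xeon_per_thread_ghz = 1 / (xeon_time * xeon_threads * xeon_ghz)
print(bd_per_thread_ghz / xeon_per_thread_ghz)       # ~1.15, i.e. roughly 15% in AMD's favour

Counting the Xeon's Hyper-Threading threads the same as real cores flatters the AMD figure, though, so take it with a pinch of salt.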

I hope it is a fairly correct assumption!

Thanks

 
So, here's a random fun idea. How about we stop with the "fair testing" arguments and take the numbers as a good indication that BD isn't too far around the corner? I never care what engineering samples can do cuz that's all they are. Samples. Meaning there could be tweaks, mods, changes in frequency, architecture, etc.

From a desktop point of view, even being an AMD fanboy, I'm just hoping that BD is in good league with SB. If it fits between SB and IB, I'll be ecstatic. Competition is good for both sides. Whether you want bragging rights or not, it's important to keep things moving along. Look at what XP hanging around has done for the gaming industry: DX9 is still top dog out there with regards to popularity. I guess when it comes down to it, I'm more of a tech fan than a camp supporter in the end, as I usually root for the underdog. But, at the end of the day, if prices have no reason to come down, they won't.
 
[citation][nom]jprahman[/nom]but somewhat weak IPC, which doesn't bode well for the type of gaming performance we could see.[/citation]
Most high-end games are getting efficient at using multiple cores, and the lower-end games don't exactly need more power. Battlefield: Bad Company 2 uses up to SIX cores! I only have a quad core at the moment but I can tell you they're all at 100% when playing. It leaves no room for streaming or frapsing properly.
 
1.8GHz * 2 cores = approx. 1 Bulldozer @ 3.6GHz

The benchmarks probably are not scaling well or optimized for BD either.

GET READY TO STOMP
 
schmich writes:
> Most high-end games are getting efficient at using multiple cores ...

How does one know they're efficient?


> ... I only have a quad core at the moment but I can tell you they're all at
> 100% when playing. It leaves no room for streaming or frapsing properly.

To me that sounds more like it being inefficient, like FSX causes heavy gfx
loading because its texture management is so bad.

Without knowing how it's coded or what's going on, there's no way of knowing
the efficiency of an application, certainly not in a manner that allows one
to say it's efficient just because it's maxing multiple cores. Could easily
be entirely the opposite.

Ian.

 


I agree and I should have elaborated.

The "Bulldozer" server parts and desktop parts have some minor differences. The desktop "Bulldozer" part will support PC3-15000 and the server part will have a PC-12800. The faster bus clock will compensate for faster CPU clock to some degree as far as IPC is concered. The branch predictor from all reports looks fairly deep so Im just guessing that scaling the clock to close to 4Ghz wont start to have much of an impact on the cash miss penalty.

My main point in my initial post was that we can't judge the desktop performance by this test, as the bus speed will be higher and the clock speed will be double or more that of the CPU tested. I'll grant you the IPC may go down and not up, but nobody really knows until we see some real tests.
 
I believe most models of Llano will also support 1866MHz DDR3. Let's hope that AMD allow that with all slots populated.

That's a theoretical 29GB/s bandwidth for a dual-channel setup but, as we all know, it's never that simple.
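
That figure falls straight out of the transfer rate; a rough sketch (real-world numbers will of course be lower):

transfers_per_sec = 1866e6    # DDR3-1866
bytes_per_transfer = 8        # 64-bit memory channel
channels = 2                  # dual-channel

print(transfers_per_sec * bytes_per_transfer * channels / 1e9)   # ~29.9 GB/s theoretical peak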
 
Very true, GeekApproved; back in the elder days, when SGI held crazy numbers of
server/HPC records, their CPUs had unremarkable MHz ratings compared to x86 at the
time. As John Mashey once said (I think it was), it's not how fast one can process,
it's how much one can process. Look at the old Onyx2, CPUs at only 300MHz, but it
can load and display a 67GB image file in less than 2 seconds (Onyx2 Group Station
for Defense Imaging). I/O is still critical for servers, not MHz; even when it comes
to rendering, I/O is still very important. A guy at Sony Pictures told me, "For
large environments, 500GB definitely gets hit. Hundreds of millions of polys spread
among thousands of objects rendered in a raytracer where you have layered shaders,
each calling 10-20 4096x4096 textures? Definitely. In production on all shows I've
been on, it's a constant bottleneck that has to be managed carefully, 'cause every
show will have up to several hundred shots with that high of a demand for textures."

In that regard, C-ray is not a very good test as it doesn't stress I/O that much
at all. Rather, it measures pure peak fp potential. Still interesting though.

Ian.

 
So, based on the C-Ray results, would you say Interlagos impresses or disappoints? The i5-2500K is a 4C/4T design at 3.3GHz so it's clocked at nearly twice the speed of the Interlagos samples, and Interlagos only features one module-level FP so, using a rough-as-a-bear's-arse comparison, 16 FP units compared to 4 clocked at about double the speed results in just over twice the speed. Assuming the immaturity of the platform plays a factor along with the fact that I/O isn't really an issue here (as you touched upon), Bulldozer may actually be similar in FP capability to Sandy Bridge in this test clock-for-clock.
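
Spelling out that back-of-the-envelope sum (ignoring IPC, SIMD width and everything else that actually matters):

interlagos_fp_units, interlagos_ghz = 16, 1.8
i5_2500k_fp_units, i5_2500k_ghz = 4, 3.3

print((interlagos_fp_units * interlagos_ghz) / (i5_2500k_fp_units * i5_2500k_ghz))   # ~2.2x on paper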

Amusingly, Phoronix classes the i5-2500K as a "quad-core + Hyper Threading" which we know to be wrong.
 


Yeah, I agree that's one point we should take away from this. While it's nice to get some more numbers from which we can extrapolate and debate the possible performance of the actual chips, we will still be extrapolating and debating. So we'll just have to wait until Bulldozer's release in June before we have final, definitive figures rather than potentially flawed extrapolations.
 
silverblue writes:
> So, based on the C-Ray results, would you say Interlagos impresses or
> disappoints? The i5-2500K is a 4C/4T design at 3.3GHz so it's clocked
> at nearly twice the speed of the Interlagos samples, and Interlagos
> only features one module-level FP so, using a rough-as-a-bear's-arse
> comparison, 16 FP units compared to 4 clocked at about double the speed
> results in just over twice the speed. ...

Kinda hard to say with a simple test like this, but assuming AMD is
planning a proper version with 2 FPs then it kinda looks like they're
on a par re performance per clock tick. But if the final version is
still only one module-level FP then perhaps not so good, unless the
clock rate is a lot higher. However, as others have said, only final
proper desktop tests will show for sure.

I'll be interested to see whether SGI supports it in the Altix line.
They're already moving towards the 1st-gen goal of a quarter of a
million cores in a single system image; Interlagos (where do they get
these names??) would double that.


> issue here (as you touched upon), Bulldozer may actually be similar in
> FP capability to Sandy Bridge in this test clock-for-clock.

I suppose it would then boil down to what Intel can do above and beyond
SB for the Z68 release and server/XEON equivalents. SB is good, but I
have a feeling it's just the beginning. AMD has been playing catch-up for
a while now at the performance level - could be it just happens all over
again. Would certainly be good if AMD can get competitive again, the
market needs it to be healthy.

So far I've never been able to test how a Nehalem XEON compares to a
normal i7 for 'general' tasks, but a friend is going to loan me a X5570
4-core 2.93GHz, at which point I'll be able to test it in a bare-bones
Dell R7500 I've obtained; I'll compare to an i7 950 I'm planning on
buying, see what happens (for desktop stuff I mean). Once done, the
X5570 will be up for sale.


> Amusingly, Phoronix classes the i5-2500K as a "quad-core + Hyper
> Threading" which we know to be wrong.

😀

I kinda skipped SB. I didn't like the low number of PCIe slots on
typical boards, or the price points. Plus I'm still having fun messing
about with i3/i5/i7s (I have an i3 540, i5 670, i5 760, 2 x i7 870,
soon getting an i7 950 I hope).


jprahman writes:
> Core i5-760 @ 3.4GHz|EVGA P55 FTW|4GB GSkill DDR3 1600MHz|2 X EVGA
> GTX 460 1GB Superclocked SLI| ...

I'm intrigued by your sig. 8) I've just obtained a P55 FTW, probably
going to fit it with an i5 670. Meanwhile, I'll be testing a 760 with
two EVGA GTX 460 FTWs soon (already done tests with an i7 870).
I'll test the 670 with the same cards eventually as well, along with a
whole bunch of other cards.

Ian.

 