Interlagos and Valencia Discussion

A little bit of technical background here courtesy of Chris Angelini:

http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-10.html

Big hello to Sylvie B at eetimes ... whose fine stories many of us once enjoyed at "The Inq" ... with a decent dash of humour I might add!!

http://www.eetimes.com/electronics-news/4230565/AMD-s-Interlagos-and-Valencia-finally-emerge

This is a sticky for you to post and discuss benchmarks and the architecture for the new Interlagos and Valencia server CPUs released today, based on the Bulldozer modular design.

I have put it under the server subsection here.

Enjoy.

http://www.amd.com/us/aboutamd/newsroom/Pages/newsroom.aspx



:)
 
Anand's benches favor Intel ... who would have thought it.

Anyhow, I'd have to see what compile options were used and do a code analysis first. It's why we do our own benching with in-house tools; it's entirely too easy to *tweak* a bench to favor one product over another.

We were looking at BD when we were considering going from SPARC to x86, but decided to stay with the T2/T3 architecture for now.

Also, stop thinking about server benches the same as desktop ones. It's not about a single big number from a single instance of a program. Even a suite that *cough* "simulates" a server environment isn't good enough. You need to actually set up a database server (Oracle in our case), set up the web front end (BEA WLS) with all the components and connectors, configure a few Java application servers and connect them to your primary J2SE instance, and deploy a bunch of webapps to these webservers.

Then you benchmark the whole suite. Some applications favor certain architectures over others, especially if the developers spent some time hand coding optimizations into certain functions.
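The "benchmark the whole suite" idea can be sketched in a few lines. This is a hypothetical harness, not our in-house tooling; `transaction` stands in for one end-to-end request through the web tier, app servers, and database, and the numbers it reports are total transactions over a window rather than a single-program score:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_suite_benchmark(transaction, n_workers=16, duration_s=2.0):
    """Drive the deployed stack with concurrent workers for a fixed window
    and report total completed transactions plus transactions/second --
    a whole-suite number, not a single-program score."""
    deadline = time.monotonic() + duration_s

    def worker():
        done = 0
        while time.monotonic() < deadline:
            transaction()  # one end-to-end request: web tier -> app server -> DB
            done += 1
        return done

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        counts = list(pool.map(lambda _: worker(), range(n_workers)))
    total = sum(counts)
    return total, total / duration_s
```

Aggregating across concurrent workers is the point: it surfaces thread-switching and contention behaviour that a single-instance score never shows.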

I've personally seen this make a difference, specifically when we were looking at IBM POWER vs Sun SPARC a while back. The IBM box posted higher virtualized transaction numbers, but when we did a suite test, the Sun box got more total transactions. This was due to Sun's CMT architecture handling thread switching better than IBM's, something that would never be seen in a virtualized test but shows up in the real world.

I'm not a fanboy, but I absolutely hate seeing people pull numbers out of a report, or an article based on a review of a report, and abuse it.
 
In regards to per-core licensing, I will say this: Microsoft could have ridden the money train VERY early on by licensing Windows by the core, like some other products (i.e. anything by National Instruments) did. They didn't, and I give MSFT a LOT of credit for that.

That being said, pure money grab by MSFT on the server side. Can't blame them though.
 
I wish AMD had just silently admitted to the OS that the module concept was really an optimised version of hyperthreading. It would save them a bag of hurt.

Where I think AMD will succeed is the world of HPC applications dominated by Linux. There we can live with not paying anything for server software, mitigating Intel's per-core advantage.
 


I don't think Microsoft could have afforded to do it. Most of the per-core licensed software (i.e. most VMware, Oracle, and IBM products) also has Linux counterparts, which would probably have pushed a lot of companies to seriously consider Linux to keep overhead down. For a lot of these companies, the software running on top of the OS is more important, and they don't have much of a choice about running anything else.

edit: I did hear something about SQL Server 2012 having some type of core-based licensing, but I'm not sure.
 



Yes, but most of those applications running on top of Linux are per-core licensed; for example, Oracle DB Enterprise is per-core licensed. For people running these types of applications, more cores isn't always better. The reason a lot of these companies run Linux is performance, overhead cost, and flexibility or specific needs (e.g. ZFS).
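The "more cores isn't always better" point is easy to show with back-of-envelope arithmetic. All figures below are made up for illustration, not real Oracle or IBM pricing; `core_factor` mirrors schemes that discount licenses on certain architectures:

```python
def total_cost(cores, per_core_license, hw_cost, core_factor=1.0):
    """Rough total cost of one box under per-core licensing.
    Illustrative numbers only -- not any vendor's real price list."""
    return hw_cost + cores * core_factor * per_core_license

# Hypothetical: a 16-core box vs. an 8-core box assumed to deliver
# similar suite throughput. The cheaper hardware loses once you
# multiply the license fee by the core count.
many_slow_cores = total_cost(cores=16, per_core_license=10_000, hw_cost=5_000)
few_fast_cores = total_cost(cores=8, per_core_license=10_000, hw_cost=8_000)
```

Under these assumptions the 16-core box costs nearly twice as much overall, which is why per-core-licensed shops care about per-core performance, not core count.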

At the end of the day, to me at least, it seems like AMD is going the way of NetBurst, except that instead of clock speed they are pushing core count rather than making each core efficient per clock.
 


IIRC the LGA1567 Xeons can be run in up to 8-way operation without any additional "glue" chips, like the previous Opteron 800/8000 series CPUs. That gives you 64 cores and 32 memory channels using Nehalem-EXes and up to 80 cores/32 memory channels with Westmere-EXes. The Opteron 6000s only support up to 4P operation. Probably the most important distinctions are that you can put 64 Opteron cores on a 4P board in a 1U server and buy CPUs + board for $3000-5000, while 8P Xeon setups are 4U+ only due to the CPU daughter cards and each 8-way-capable Xeon MP by itself costs about what four 6272s and a 4P board cost.
 
http://www.anandtech.com/show/5553/the-xeon-e52600-dual-sandybridge-for-servers

Conclusions

Our conclusion about the Xeon E5-2690 2.9 GHz is short and simple: it is the fastest server CPU you can get in a reasonably priced server, and it blows the competition and the previous Xeon generation away. If performance is your first and foremost priority, this is the CPU to get. It consumes a lot of power if you push it to its limits, but make no mistake: this beast sips little energy when running at low and medium loads. The price tag is the only real disadvantage, and in many cases this price tag will be dwarfed by other IT costs. It is simply a top-notch processor, no doubt about it.

For those who are more price sensitive, the Xeon E5-2630 costs less than the Opteron 6276 and performs (very likely) better in every real world situation we could test.

And what about the Opteron? Unless the actual Xeon-E5 servers are much more expensive than expected, it looks like it will be hard to recommend the current Opteron 6200. However if Xeon E5 servers end up being quite a bit more expensive than similar Xeon 5600 servers, the Opteron 6200 might still have a chance as a low end virtualization server. After all, quite a few virtualization servers are bottlenecked by memory capacity and not by raw processing power. The Opteron can then leverage the fact that it can offer the same memory capacity at a lower price point.

The Opteron might also have a role in the low-end, price-sensitive HPC market, where it still performs very well. It won't have much of a chance in the high-end clustered one, as Intel has the faster and more power-efficient PCIe interface.

And it looks like Johan added an HPC test to his benchmark suite:

This is one of the few benchmarks (besides SAP) where the Opteron 6276 outperforms the older Opteron 6174 by a tangible margin (about 17% faster) and is significantly faster than the Xeon 5600, by 29% to be more precise. However, the direct competitor of the 6276, the Xeon E5-2630, will do a bit better (see the E5-2660 6C score). When you are aiming for the best performance, it is impossible to beat the best Xeons: the Xeon E5-2660 offers 20% better performance, the 2690 is 31% faster. It is interesting to note that LS-Dyna does not scale well with clockspeed: the 32% higher clockspeed of the Xeon E5-2690 results in only a 14% speed increase.

A few other interesting things to note: we saw only a very small performance increase (+5%) due to Hyper-Threading. Memory bandwidth does not seem to be critical either, as performance increased by only 6% when we replaced DDR3-1333 with DDR3-1600. If LS-Dyna were bottlenecked severely by memory speed, we should have seen a performance increase close to 20% (1600 vs 1333).
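The reasoning in that last sentence is simple proportionality, and it can be checked in a couple of lines (a back-of-envelope sketch using the numbers quoted above):

```python
# If LS-DYNA were purely memory-bandwidth-bound, the speedup ceiling from
# DDR3-1333 -> DDR3-1600 would track the memory clock ratio:
ceiling = 1600 / 1333 - 1    # ~0.20, i.e. close to the 20% the article cites
observed = 0.06              # the measured gain quoted above

# Observed gain is far below the ceiling, so bandwidth isn't the bottleneck.
bandwidth_bound = observed >= 0.8 * ceiling
```

With the observed 6% sitting well under the ~20% ceiling, the conclusion that LS-DYNA is not severely bandwidth-limited follows directly.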

CMT boosted the Opteron 6276's performance by up to 33%, which seems weird at first since LS-DYNA is a typical floating point intensive application. As the shared floating point "outsources" load and stores to the integer cores, the most logical explanation is that LS-DYNA is limited by the load/store bandwidth. This is in sharp contrast with for example 3DS Max where the additional overhead of 16 extra threads slowed the shared FP down instead of speeding it up.

Also, both CPUs seem to have made good use of their turbo capabilities. The AMD Opteron was running at 2.6 GHz most of the time, the Xeon 2690 at 3.3 GHz and the Xeon 2660 at 2.6 GHz.

The three-vehicle collision test does not change the benchmarking picture; it confirms our earlier findings. The Opteron Interlagos does well, but the Xeon E5 is the new HPC champion.
 


At least quote Tom's own Xeon E5 article. 😉

Intel has won this round pretty clearly, even if you use the best-case scenario of a lot of benchmarks on Linux using GCC instead of Intel's crippling-anything-that's-not-GenuineIntel compiler on Windows. Anand is a big-time Intel shill and misses no opportunity to prop up Intel or trash AMD in a sensationalist manner. I read some of their articles, but it's like watching MSNBC: there is occasionally some good info in there, you just have to dig it out of the pile of spin. However, Tom's, which is much more fair and actually tried to give AMD a fair shot, still arrived at the same conclusion.

Intel knows it too, since they have actually *raised* prices on CPUs. The top Xeon DP E5 costs over $2000, compared to around $1500 in the past several generations. AMD really does need to get back in the fight, as a decent server CPU generally makes a decent workstation CPU and vice versa. Tom's correctly said that AMD has a decent platform with the AMD 890FX-based SR5690/SP5100 chipset. Their problem is that once you get beyond the severely kneecapped "basic" Intel SKUs, Intel pretty well dominates the market and can charge whatever they want. That's why we are seeing >$2000 DP CPUs again.

Bulldozer isn't a bad design on paper; it's more of an execution problem than a design problem, or so it appears. Fix the caches and the FPU scheduler in Piledriver and AMD ought to become at least decently competitive again. Lord knows we don't want to go back to the mid-90s, with Intel charging around six grand in today's dollars for their top-bin CPUs. (Those of you old enough remember what the top Pentiums and P2s cost? The top original PII cost over $2000! 😱 )

I appreciate AMD's willingness to give customers a fair shake by not changing sockets like a teenage girl changes her wardrobe, not severely crippling chips that aren't considered "high end," and not charging outlandish prices for multi-socket-capable CPUs, but it becomes pretty darned hard to buy AMD CPUs if they are quite a bit behind Intel's. Here's to Piledriver at least being an Istanbul on Linux to Intel's Nehalem, instead of a B2 Barcelona on Windows...
 


IIRC AT was first with their review, which is why I quoted them.

Sorry MU, but that just strikes me as yet more AMDZone propaganda. It was Johan de Gelas's article - not Anand Lal Shimpi's - and he is pretty respected most places, with the exception of AMDZ. I have read some server threads over there where Johan attempted to explain his testing methodology and got nothing but insults and flames, with little substantive or constructive feedback for the most part. Even the supposedly knowledgeable mods there couldn't manage to articulate any real flaws and just spewed crap instead. Most of their objections and rhetoric stem from when Johan allegedly over-mentioned the Xeon competition during an Opteron review a few years back, IIRC.

Bulldozer isn't a bad design on paper and is more of an execution problem than a design problem, or so it appears. Fix the caches and the FPU scheduler in Piledriver and AMD ought to become at least decently competitive again.

Maybe. But then AMD had something like 5 years to work on BD from when it first appeared on their roadmap. The common perception after BD appeared, and the benchies were disappointing, was that BD was optimized for server workloads and so Interlagos should really shine. However, those Interlagos reviews were also a pretty mixed bag, and AMD had to compete on price and not much else. Now that the E5s have a model similarly priced and better performing, I guess AMD will have to drop prices once the HPC and server market has finished upgrading from Magny-Cours.

Here's to Piledriver at least being at least an Istanbul on Linux to Intel's Nehalem instead of a B2 Barcelona on Windows...

Yes, competition is always good as it drives R&D and thus improvements. However I just wonder how much attention AMD is going to pay to server if their revenues from it remain low and marketshare < 5%. With Read's statements amounting to "let's move on from competing with Intel", looks like AMD is perhaps prioritizing new markets where Intel doesn't compete much.
 
Not to advocate conspiracy theories, but they didn't publish the internal numbers for the AMD bench tool and didn't delve deeper into the "bad" numbers. Which is exactly where we really wanted them to delve deeper, lol.

Not a bad analysis, to be honest, but I feel they missed some critical points. Also, they never talk about compiling code to take advantage of the new instructions in BD.

I'm eager for the continuation on their "fourth" finding.

Cheers!