Stress Test MK II

>The setup will be very much like any SUN/HP/IBM multi CPU
>systems

Ahem.. sorry, but not many Sun, HP or IBM setups really look a lot like each other from an architectural POV, assuming you mean UltraSparc, PA-RISC, Itanium and Power5 machines, not these companies' x86 boxen (and even then there is considerable difference, e.g. IBM's Hurricane).

>where multiple CPU cards are connected via a backplane.

That's not what I care about; physical layout is not important.

>Obviously efficiencies go down (look at sun servers that
>need hundreds of CPU to keep up with the likes of IBM with
>only 64 CPUs)

Obviously, that has nothing to do with backplanes or anything, other than that UltraSparc CPUs can't keep up with x86/Power/IPF.

>Remember though that on this type of big iron, cache
>coherency is not that important

I suggest you rethink that. It is *extremely* important in the small iron market (big iron is mainframe class in my book, not 'cheap' 32-way servers). Where it plays a much smaller role is HPC. If cache coherency and chip-to-chip communication were not an issue, clustering cheap 2-way x86 boxes would be a far more obvious solution, and it's no wonder that is exactly what they often do for HPC. But not for commercial workloads a la SAP.

>I.e. if two people are entering a Sales Order there is no
>information they need to share that needs to reside in
>memory all the locking type stuff is handles by the
>database.

First, on which CPU (or rather, in which cache) do you think the database engine resides? Hmm? Surely not all of them!

Secondly, it seems you don't quite understand how cache coherency works with MESI (or even MOESI for K8). On a glueless Opteron system, every time you need to access memory, you have to check the caches of all other CPUs to see if that data is not already somewhere else, and potentially changed. This is cache snooping, and it's required for every memory read. This puts a strain on the HT bus regardless of the app, and the strain grows roughly quadratically with the number of CPUs. Indeed the type of app also plays a role; if it's a frequent occurrence that several CPUs share the same data, it gets a LOT worse, as with every modification the owner of the data has to invalidate the cache lines in the other CPUs, transmit the modified data, etc. But even with perfectly separated threads, cache coherency is an issue.
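To put a rough number on that, here's a toy Python model of broadcast snooping (my own back-of-the-envelope sketch, nothing from AMD's documentation): every read probes every other socket, so total probe traffic grows with the *square* of the socket count.

[code]
# Toy model of broadcast snooping: every memory read by any CPU
# triggers a probe to every other CPU's cache, so total snoop
# messages per "round" of reads grows quadratically with sockets.
def snoop_messages(sockets, reads_per_cpu=1000):
    # each CPU issues its reads, each read probes (sockets - 1) peers
    return sockets * reads_per_cpu * (sockets - 1)

for n in (2, 4, 8, 16, 32):
    print(n, "sockets ->", snoop_messages(n), "snoop probes")
# 2 sockets -> 2,000 probes ... 32 sockets -> 992,000 probes:
# 16x the sockets buys you ~500x the snoop traffic.
[/code]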

And I'm not even mentioning the fact that the Opteron 8xx only has 3 coherent HT links, which would mean that a 32S system would need between 1 and perhaps 8 hops to reach RAM located on another CPU. If you have 32 CPUs all "hopping" like that, it will kill latency and crush bandwidth, as the HT links will simply be saturated.
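For illustration, here's a BFS hop count over one *hypothetical* degree-3 topology (a 32-node ring plus a cross-link to the opposite socket; I made this layout up, the real board wiring would differ). The worst case comes out at 8 hops, in line with the guess above.

[code]
# Hedged sketch: worst-case hop count on an invented 32-socket fabric
# where each Opteron spends its 3 coherent HT links as ring neighbours
# (+1/-1) plus one cross-link to the opposite node (+16).
from collections import deque

N = 32
def neighbors(n):
    return [(n + 1) % N, (n - 1) % N, (n + 16) % N]  # the 3 HT links

def worst_case_hops(start=0):
    dist = {start: 0}
    q = deque([start])
    while q:                      # plain breadth-first search
        u = q.popleft()
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

print(worst_case_hops())  # -> 8 hops to the farthest socket's RAM
[/code]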

Now all this works fine with 4 chips, as HT is pretty fast, and a cache snoop is much smaller than the actual data. But it's already resulting in noticeably worse scaling with 8 sockets, and it will render a 32-socket Opteron system completely useless without some better solution. Maybe it's not that bad for HPC, but for any commercial app a glueless 32-socket Opteron will crawl to a halt. I think the Iwill scaling from 4 to 8 sockets is already something like 40% at best, running SPECint_rate, which scales about as well as anything. A 32S system might well be *slower*.

Now I'm sure they are working on a solution for this; it's not like the problem isn't known and widely documented. I'm just curious how they expect to solve it. Implementing a directory protocol? Seems like that would need support from the CPU. Maybe an HT switch? Maybe both, a chipset that acts as a switch and uses a directory-based protocol between nodes of 4 Opterons that work as usual? I don't know.. but it will be something, for sure. Opteron as it is just isn't suited for >4/8 sockets without "glue". And I'm curious about that glue :)
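For what a directory protocol buys you, a minimal Python sketch (illustrative only, not any actual Horus/AMD design): the "glue" tracks which nodes hold each cache line, so a write invalidates only the actual sharers instead of broadcasting to all 32 sockets.

[code]
# Minimal directory-based coherence sketch: targeted invalidations
# replace the all-sockets broadcast of plain snooping.
directory = {}  # cache-line address -> set of node ids holding it

def read(node, addr):
    directory.setdefault(addr, set()).add(node)

def write(node, addr):
    sharers = directory.get(addr, set()) - {node}
    for s in sharers:            # invalidate only real sharers
        print(f"invalidate line {addr:#x} in node {s}")
    directory[addr] = {node}     # writer is now the sole owner

read(0, 0x1000); read(5, 0x1000)
write(0, 0x1000)   # -> one invalidate (node 5), not 31 broadcasts
[/code]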

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
It is the info from the Inquirer. The way it sounds, they will be using 8-way boards and a proprietary repeater or switch to connect them all together. They will then stack 2-4 of these in a rack with the repeaters. That is currently how they do it with the Xeon for that many CPUs. It is not as efficient, but the Xeons use 4-way and the Opterons can use 8-way systems to start. Plus they are more efficient, they can use the faster HT bus, and they can also use dual-core Opterons. They also draw less power and generate less heat... Hmm. I wonder why HP would be so excited about these things.
 
Friday, June 10, 2005: Both systems have been running without any problems so far, AMD since June 7, 8 pm, Intel since June 8, 3 pm.

I thought that AMD had been running without any problems since the beginning of the test..

:lol:
 
The interesting thing is that they finally listed the reason the heat went ballistic on the P-EE...

"Replacement of boxed coolers, 05B Reason: new thermal pad"

They tried to reuse the thermal pad after swapping the boards. That was a big "rookie" mistake. The dumb thing is they say they had the wrong cooler right from the beginning, but the temps did not jump to 90°C until they hosed the thermal pad. The temps looked OK as long as the thermal pad was intact. They did have stability issues when cheap thermal grease was used during a test with the P4 a while back. AS5 worked and so did the thermal pad.

They only listed using one Intel board instead of two, but here is the reason for replacing it:

"Replacement of Intel D955XBK Reason: boot impossible"

They could not even get the boards to boot at all. I wonder if they would boot with another CPU (a P-D or even a P4)?
 
I just went through the original article. It seems the EE came with a boxed cooler. It is the same as the one they are running now.
So, the guy takes out the boxed HSF and has 2 to choose from. One looks identical to the boxed fan. The other looks close, but has fewer fins and a smaller copper plate on the bottom. Which one would you pick?
My guess is that Tom's has decided to hire the mentally handicapped.
Oh, btw, the fan is rated @ 3500rpm; wonder why it was running @ 4500+rpm?

 
A 6-foot pole with spikes will do just fine.


This quote is a comment from the AnandTech forums; I think it's quite good.

Having a cheap cooler is one thing; sure, the CPU will get hot. But a cheap cooler should not be responsible for frying a motherboard! I have never seen a processor fry a motherboard, never. Well, that's not true, I did see a Xeon bake a motherboard, but it was not the fault of the processor; something happened to the voltage and it skyrocketed and cooked the CPU, which in turn burned the mobo.

Personally I see the PEE 840 as a desperation product that is not ready for the mainstream. It is no wonder Intel does not offer any dual-core chips in servers. They would do serious harm to their reputation.
Make sure it's a thick pole.
 
>>The setup will be very much like any SUN/HP/IBM multi CPU
>>systems

>Ahem.. sorry, but not many Sun, HP or IBM setups really look a lot like each other from an architectural POV, assuming you mean UltraSparc, PA-RISC, Itanium and Power5 machines, not these companies' x86 boxen (and even then there is considerable difference, e.g. IBM's Hurricane).

>>where multiple CPU cards are connected via a backplane.

>That's not what I care about; physical layout is not important.
My guess is, HP would be using the Newisys Horus chipset or something similar--basically a group of Opteron CPUs is divided into cells of four, with each cell having its own L3 cache arbitrated between all four CPUs and the cells connected to each other by yet more HT fabric. The four CPUs in each cell would probably be set up about like a quad-socket Opteron system.

Of course there would be some latency introduced, and the logic for arbitrating the per-cell L3 cache would be pretty complex. But it meshes perfectly with the nice little 4-socket daughterboard Iwill's solution is using, and the Newisys guys were boasting about some pretty nice latency numbers.
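Just to show why that per-cell cache would matter, a quick weighted-latency calculation (the numbers here are invented placeholders, not Newisys figures):

[code]
# Rough sketch of why a per-cell L3 "remote data cache" helps.
# local_ns / remote_ns are made-up in-cell vs cross-cell latencies.
local_ns, remote_ns = 80, 300

def avg_latency(l3_hit_rate, remote_fraction=0.5):
    # a cross-cell request that hits the cell's L3 stays local
    remote = remote_fraction * (1 - l3_hit_rate)
    return (1 - remote) * local_ns + remote * remote_ns

print(avg_latency(0.0))   # 190.0 ns average without the L3
print(avg_latency(0.7))   # 113.0 ns with a 70% L3 hit rate
[/code]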

No inside info on my part, mind you, just an edjumacated guess. :wink:

<i>"Intel's ICH6R SouthBridge, now featuring RAID -1"

"RAID-minus-one?"

"Yeah. You have two hard drives, neither of which can actually boot."</i>
 
Just did some reading on Horus.. Looks mighty impressive. I will be very curious to see who picks this up to use in their systems. Sun seems like a very probable candidate (already using Newisys designs, and they announced >>4-way systems). But HP too? Dunno, this would really start competing head-on with Integrity; I'm not sure if they are confident enough to allow that just yet. In other words, if they do release a 32-socket Horus-based Opteron, I could well see them porting HP-UX to it (which is rumoured), which would pretty much kill IPF I'd say. One thing I didn't read/find: what's the ETA of Horus?

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
>But HP too? Dunno, this would really start competing head-on with Integrity; I'm not sure if they are confident enough to allow that just yet. In other words, if they do release a 32-socket Horus-based Opteron, I could well see them porting HP-UX to it (which is rumoured), which would pretty much kill IPF I'd say. One thing I didn't read/find: what's the ETA of Horus?
I'm not able to find out about the ETA either, but from what I heard several months ago, working prototypes were already up and running, and formal design verification was either done or well underway.

As for HP using it, I suppose it comes down to meeting customer demand at minimal cost. If HP wants a 32-way (or 16x2-way) Opteron, they'll have to come up with some solution--either create their own interconnect fabric, or use someone else's. It's quite possible that HP <i>won't</i> be using Horus due to cost issues, but something similar might be feasible, perhaps with the gargantuan 64MB cache cut back.

As far as threatening IPF, I don't think HP cares much about that at this point. HP can always figure to relegate IPF to 64-way and higher Superdome systems, or just forget about it and let SGI carry it. AFAIK HP hasn't put a whole lot into it aside from designing the CPU itself.

<i>"Intel's ICH6R SouthBridge, now featuring RAID -1"

"RAID-minus-one?"

"Yeah. You have two hard drives, neither of which can actually boot."</i>
 
>As far as threatening IPF, I don't think HP cares much about
>that at this point.

Of course they do! They are betting their entire $5+ billion Unix/RISC market on IPF, by killing PA-RISC and Alpha in favor of IPF. And replacing those sales with (comparatively) cheap, low-cost x86 boxes is not something they will want to do if they can avoid it. Not to mention how their customers would react, after being forced onto IPF.

>HP can always figure to relegate IPF to 64-way and higher
>Superdome systems,

But just think how little incentive Intel would still have to keep pouring money into IPF development if it only sold a couple of tens of thousands of chips per year for Superdomes.. and two trays for SGI.

>or just forget about it and let SGI carry it.

SGI represents less than 0.5% of the high end server market..
They are only big in HPC, a market that is fairly CPU agnostic.

>AFAIK HP hasn't put a whole lot into it aside from designing
> the CPU itself.

Just <A HREF="http://www.hp.com/hpinfo/newsroom/press/2004/041216a.html" target="_new">a few billions left and right </A>


= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
>Of course they do! They are betting their entire $5+ billion Unix/RISC market on IPF, by killing PA-RISC and Alpha in favor of IPF.
Well they *were* betting on it. But it seems a losing bet. At the very least, HP is mishandling IPF pretty much the same way Compaq/HP ran Alpha into the ground. Workstation support's already slipping away, and you're not going to get much developer support without workstations.

>And replacing those sales with (comparatively) cheap, low-cost x86 boxes is not something they will want to do if they can avoid it. Not to mention how their customers would react, after being forced onto IPF.
I'd have to wonder how much of HP's RISC customer base has actually gritted their teeth and knuckled under to IPF. There may not be enough to really matter. RISC systems are very seldom replaced, and then only if most or all applications can be migrated feasibly (a very tricky proposition from PA-RISC to IA64, HP-UX port notwithstanding).

>SGI represents less than 0.5% of the high end server market..
>They are only big in HPC, a market that is fairly CPU agnostic.
That's also one of the very few markets where IPF is a winning proposition. IPF isn't really cut out for traditional server roles and certainly isn't up for desktops. I think HP has been slowly realizing that.

>AFAIK HP hasn't put a whole lot into it aside from designing
> the CPU itself.

>Just a few billions left and right
How much of that might apply equally well to other platforms? Chipset development, at the very least, might apply to Xeons as well, since I recall HP designing IA64 CPUs to be somewhat interchangeable with Xeons. ISTR similar interchangeability with PA-RISC as well.

Also, I note that the linked press release is from the Fiorina era. Carly Fiorina stepping down (or getting thrown down the steps) might have signaled some pretty big changes at HP...

<i>"Intel's ICH6R SouthBridge, now featuring RAID -1"

"RAID-minus-one?"

"Yeah. You have two hard drives, neither of which can actually boot."</i>
 
Looks like Intel's PR staff will have to find more FUD to spread.

<A HREF="http://www.theinquirer.net/?article=23883" target="_new">http://www.theinquirer.net/?article=23883</A>

Yeah, it's the Inquirer, but it still gets lots of readers.
 
My take is that this "test" has shown HT for what it is.
If you try to use HT and have two apps running, you'd better want them both to have the same priority.
Most often, we run one app up front, and others in the background. If one of those background apps can use a lot of chip time, it will. That would really hurt the primary app.
For example, if you are playing a game, and encoding a movie, your encoding is going to seriously slow the game.
 
Big deal. Just reduce thread priority for the background app, if even that is needed, as most encoders will give themselves low priority anyway. This is no different from a non-HT chip; the only difference is that the HT chip (assuming both are single core) will actually get some encoding done, where the non-HT chip will do pretty much none. I'm sorry, but I can't see how this somehow constitutes a disadvantage of HT.
If your point was that two cores are better than one core + HT, then.. duh, of course.
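For the record, "reduce thread priority" is a one-liner. Here's a sketch with Python's psutil (assuming psutil is installed; the pid is a placeholder, and the priority constant is Windows-only):

[code]
# Drop a background encoder to low priority so the foreground
# app wins any contention for the CPU.
import psutil

encoder = psutil.Process(1234)   # placeholder pid of the encoder
encoder.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)  # Windows priority class
[/code]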

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
[quote]most encoders will give themselves low priority anyway[/quote]
Wrong, at least on an HT chip. The prog will see a processor that is doing nothing, so it will use as much of it as it can.
Setting priorities in Windows will have the same effect. The only way around it is to disable HT.
 
When discussing the SAP architecture, the current thinking and recommendation from SAP is in fact to have more, smaller application servers (not discussing the database here), i.e. lots of small 2-way boxes. SAP was designed for sideways scaling (add more separate boxes for more users); that was back in 1992, when gigabit ethernet was not available.
The database usually resides elsewhere (yes, cache coherency is important there; that is why the old Oracle Parallel Server was crap for transactional systems).
The SAP enqueue (application-level locking) process relies on a common file, not a lock somewhere in memory (see the sketch at the end of this post).
SAP does not require instant response times; one or two seconds is good enough. What matters with SAP is being able to support as many concurrent users as possible.
Besides, the likelihood of any data residing in cache is going to be very small anyway, as it would all be swapped out by the time a user requires access to previous data.
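Here's the kind of file-based locking I mean, sketched in Python (illustrative only, not SAP's actual enqueue implementation; the path is a placeholder): two app servers can serialize on a shared lock file, with no coherent shared memory involved.

[code]
# File-based application locking: exclusive lock on a common file.
import fcntl  # POSIX only

with open("/shared/enqueue.lock", "w") as f:   # placeholder path
    fcntl.flock(f, fcntl.LOCK_EX)   # blocks until the lock is free
    # ... update the sales order here ...
    fcntl.flock(f, fcntl.LOCK_UN)   # release for the next app server
[/code]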
 
>Wrong, or at least on an HT chip. The prog will see a
>processor that is doing nothing, so it will use as much of it
> as it can.

>Setting priorties in windows will have the same effect. The
>only way around it is to disable HT.

Really.. show me one test where a P4 with HT disabled performs significantly better on the foreground app while running a low priority background app.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
How about if I show you where having HT enabled reduces the foreground app? See the current "tests" and compare the P4 to the PD 840 in Tom's last comparison for the Fart Cry "benchmark". In single-threaded operation, the 840 lost to the 4800. Now it seems, in multi-threaded, the A64 beats a better gaming chip.
 
>Isn't the X2 based on San Diego? If so, I'd say they're extremely lucky, because normally the single core can barely reach 2.85GHz on air.
I think it's a Toledo, a dual-core San Diego.

<font color=red>"We can't solve problems by using the same kind of thinking we used when we created them."
- Albert Einstein</font color=red>
 
>Really.. show me one test where a P4 with HT disabled performs significantly better on the foreground app while running a low priority background app.
Actually, if I'm understanding this debate correctly, I can vouch for HT being a hindrance to gamers. I've got two F@H instances running on my P4, set to the lowest priority. If I start up a game, the first F@H stalls (like it should) to give way to the higher-priority process. The second F@H process, however, continues to take up 50% of the CPU's resources, directly competing with the game, because there's no process maxing out the "second" CPU for it to give way to.

It's because of this that I've had to set up batch files (and desktop icons linked to them) to start and stop both F@H services, because some games run like crap while F@H is running. If I didn't have HT set up, I wouldn't have to mess with that.

(Actually, I'm also wondering: my F@H instances are assigned specifically to CPU0 and CPU1 respectively, so if I ran only one F@H instance, pinned to CPU0, would this problem be solved, at least for F@H?)
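If anyone wants to try that affinity idea, here's a sketch with Python's psutil (the pid is a placeholder). One caveat: on a single-core HT P4, both logical CPUs share the same physical core's execution resources, so pinning may not fully fix the contention.

[code]
# Pin a single F@H instance to logical CPU 0 only.
import psutil

fah = psutil.Process(4321)      # placeholder F@H process id
fah.cpu_affinity([0])           # run only on logical CPU 0
print(fah.cpu_affinity())       # -> [0]
[/code]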

<pre><font color=purple>The silence is golden, even if the PC is olden. Fanless P4C2.6 rocks.</font color=purple></pre><p>@ 190K -> 200,000 miles or bust!
 
>My guess is that Tom's has decided to hire the mentally handicapped.
That'd pretty much sum up my observation of David Strom after a brief email conversation. He seemed like he couldn't even hold a coherent thought. And to think that he's the Editor in Chief!

Also, after looking at THG's list of editors, there are some people that I actually respected who aren't THG editors anymore. In fact, almost none of the names look familiar. :\ I'm now wondering if there wasn't some big tiff at THG that may (or may not) be related to Tom's departure.

Maybe THG has gone to hell because most (all?) of the intelligent editors left? I mean, it's weird enough having a <b>Tom's</b> Hardware Guide without Tom, but it seems to have gotten much worse now that so many good editors are gone too.

<pre><font color=purple>The silence is golden, even if the PC is olden. Fanless P4C2.6 rocks.</font color=purple></pre><p>@ 190K -> 200,000 miles or bust!
 
I am not hired or paid by Intel. Geesh.

He said something against AMD, whaaa.... What a bunch of crybabies.

You are better off attacking me, as your replies to anything else are useless, as always.

<A HREF="http://www.xtremesystems.org" target="_new">www.xtremesystems.org</A>