Raid 5 and Core2Duo Performance

MadHacker · Nov 22, 2006

I heard that there were some issues with RAID 5 and Core2Duo.
I'm interested in moving my RAID setup to my machine, but I'm not sure how well it will perform.

I am planning to buy a HighPoint RocketRAID 2320 PCI Express x4 SATA II RAID card and run my existing 4x320 GB hard drives as RAID 5, as well as get four 500 GB drives and run those as RAID 5 too.
What I want to know is whether there will be a significant hit on my CPU usage.
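
For reference, the usable space of the two arrays described above follows from the usual RAID 5 rule that one drive's worth of capacity goes to parity. A minimal sketch of that arithmetic (the function name is made up for illustration, and sizes are the nominal figures from the post):

```python
def raid5_usable_gb(num_drives: int, drive_gb: int) -> int:
    """Usable capacity of a RAID 5 array: one drive's worth of space holds parity."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (num_drives - 1) * drive_gb


print(raid5_usable_gb(4, 320))  # existing array: 960
print(raid5_usable_gb(4, 500))  # planned array: 1500
```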

I currently have a HighPoint RocketRAID 2224 PCI-X SATA II controller card running on my P4 2.4 GHz and don't see much of a CPU hit at all.

When the Core2Duo was first released I read an article from the Inquirer (yes, I know most stuff they publish is rubbish) stating that Conroe shows dodgy RAID performance anomalies. I was wondering if anyone else is running RAID 5 on their Core2Duo and can confirm or deny this rumor.

Thanks..

bberson · Nov 22, 2006

When the Core2Duo was first released I read an article from the Inquirer (yes, I know most stuff they publish is rubbish) stating that Conroe shows dodgy RAID performance anomalies. I was wondering if anyone else is running RAID 5 on their Core2Duo and can confirm or deny this rumor.

That article sounds like a variation on this:

US government unit throws Intel out over RAID problems

I'm not running the combo yet but have been doing some reading, since that's the hardware I'm currently considering.

At the time of the article discussing ServeRAID I did a little digging around. First, read up on what RAID 1E is. Interesting. Anyway, no matter how much digging I did I could not find out which of IBM's ServeRAID solutions was being used for that test scenario. Without saying anything accusatory, I'm just going to point out that not all of IBM's ServeRAID solutions use dedicated processors. I personally don't think it's wise to expect no performance hit when doing anything beyond RAID-0 or -1 without a coprocessed RAID card.

Of course if the C2D platform barfs when handling hardware tasks like that and AMD doesn't, then there's obviously a problem. But I haven't seen any benchmarks that actually prove this out. All I've seen is a couple of invariably sensational Inquirer articles based on parts and platforms that haven't been explained in any detail at all.

Could there be a problem? Maybe. But I'm not seeing anything beyond rumors. Worse yet, Inquirer rumors.

Show us the benchmarks, someone!

-Brad

TrueTenacity · Nov 22, 2006

I'm also thinking of running a RAID 5 array on my nForce 590 and I'd like to see if there is a performance hit to the C2D...

bberson · Nov 22, 2006

Here are a few interesting, somewhat relevant tidbits...

First, see the third "Brad.B" post here, paying special attention to the ICH7R readings:

http://forums.storagereview.net/index.php?showtopic=23633&hl=ich7

Those numbers come from vbin's post here:

http://forums.storagereview.net/index.php?showtopic=22660&hl=matrix&st=125

I PM'd that poster asking for more details on the system he was using. If and when he replies I'll put that info here.

-Brad

MadHacker · Nov 22, 2006

Well, if 12% on one core is the biggest hit I will get, that would be acceptable...
Unfortunately the 12% was for RAID 0; I don't have a clue what RAID 5 will be yet...

bberson · Nov 23, 2006

Well, if 12% on one core is the biggest hit I will get, that would be acceptable... Unfortunately the 12% was for RAID 0; I don't have a clue what RAID 5 will be yet...

There's an interesting nuance there which may affect your outcome, perhaps improve it. He was using Matrix RAID, which means he had both a RAID 0 volume and a RAID 1 volume on the two disks.

I personally think 12% is a lot to lose but I'm one of those folks who beats the drum long and hard about using hardware coprocessed, battery backed RAID.

-Brad

casewhite · Nov 23, 2006

The issue of incompatibility lies in the 64-bit operating system and is no rumor. Intel has deviated significantly from the JEDEC/AMD64 standard to the point that a separate compiler is really necessary. If you are using the 32-bit IA32 architecture everything will function as normal. But 64-bit is another standard.

"The divergence of AMD and Intel x86 implementations has created certain "challenges" for compiler vendors. Application developers would like to deliver a single binary that can execute optimally on both architectures. PGI's solution allows users to create separate versions of code for both chips, but enables them to be built into a single PGI Unified Binary. The Portland Group's Michael Wolfe describes the rationale and implementation of this technology, which he presented in an Exhibitor Forum this week at SCO6." http://www.hpcwire.com/hpc/1096734.html

In a nutshell, RAID 5 will only work if the vendor has written separate drivers for EM64T. Otherwise you might as well be trying to run ATI drivers on an nVidia video card. The high-performance computing and IEEE web sites will have the answers to these issues long before the general press has the story. Linux Beacon broke the story on this issue some 8 months ago and IEEE confirmed it in May. Apparently someone at the Inquirer reads the HPC sites when no one else seems to.

croc · Nov 23, 2006

If you use an external RAID solution, then it would be independent of hardware / OS issues. Problem solved. Tom's has reviewed several.

bberson · Nov 23, 2006

The issue of incompatibility lies in the 64-bit operating system and is no rumor. Intel has deviated significantly from the JEDEC/AMD64 standard to the point that a separate compiler is really necessary. If you are using the 32-bit IA32 architecture everything will function as normal. But 64-bit is another standard.

"The divergence of AMD and Intel x86 implementations has created certain "challenges" for compiler vendors. Application developers would like to deliver a single binary that can execute optimally on both architectures. PGI's solution allows users to create separate versions of code for both chips, but enables them to be built into a single PGI Unified Binary. The Portland Group's Michael Wolfe describes the rationale and implementation of this technology, which he presented in an Exhibitor Forum this week at SCO6." http://www.hpcwire.com/hpc/1096734.html

In a nutshell, RAID 5 will only work if the vendor has written separate drivers for EM64T. Otherwise you might as well be trying to run ATI drivers on an nVidia video card. The high-performance computing and IEEE web sites will have the answers to these issues long before the general press has the story. Linux Beacon broke the story on this issue some 8 months ago and IEEE confirmed it in May. Apparently someone at the Inquirer reads the HPC sites when no one else seems to.

Wow, I see a lot wrong here.

Start by reading http://en.wikipedia.org/wiki/X86-64

That having been done, you'll discover there simply is no "standard" for X64...

...Which also should serve as a sign that JEDEC has nothing to say about the instruction sets used by these CPUs.

And I can't see why a separate compiler should be necessary when a sufficiently developed compiler should be able to make do with a simple directive. But I haven't written Assembler in a long time so I suppose anything is possible.

-Brad

evilr00t · Nov 23, 2006

Why don't you list some hard facts (not niche articles) to support your claim? Last I checked, I didn't need to buy Windows XP Intel EM64T edition or Windows XP AMD x86-64 edition.

compiler vendors

Ohes noesW!!1oneone!!1

MadHacker · Nov 23, 2006

Thanks for all the info...
I guess the next people I will ask are HighPoint, about how good their drivers are and how efficiently they run on 64-bit Windows.

casewhite · Nov 23, 2006

The issue of incompatibility lies in the 64-bit operating system and is no rumor. Intel has deviated significantly from the JEDEC/AMD64 standard to the point that a separate compiler is really necessary. If you are using the 32-bit IA32 architecture everything will function as normal. But 64-bit is another standard.

"The divergence of AMD and Intel x86 implementations has created certain "challenges" for compiler vendors. Application developers would like to deliver a single binary that can execute optimally on both architectures. PGI's solution allows users to create separate versions of code for both chips, but enables them to be built into a single PGI Unified Binary. The Portland Group's Michael Wolfe describes the rationale and implementation of this technology, which he presented in an Exhibitor Forum this week at SCO6." http://www.hpcwire.com/hpc/1096734.html

In a nutshell, RAID 5 will only work if the vendor has written separate drivers for EM64T. Otherwise you might as well be trying to run ATI drivers on an nVidia video card. The high-performance computing and IEEE web sites will have the answers to these issues long before the general press has the story. Linux Beacon broke the story on this issue some 8 months ago and IEEE confirmed it in May. Apparently someone at the Inquirer reads the HPC sites when no one else seems to.

Wow, I see a lot wrong here.

Start by reading http://en.wikipedia.org/wiki/X86-64

That having been done, you'll discover there simply is no "standard" for X64...

...Which also should serve as a sign that JEDEC has nothing to say about the instruction sets used by these CPUs.

And I can't see why a separate compiler should be necessary when a sufficiently developed compiler should be able to make do with a simple directive. But I haven't written Assembler in a long time so I suppose anything is possible.

-Brad

Wikipedia barely scratched the surface of the differences. If you had read the HPC article or bothered to look up Tom's paper on the web, you would have learned the following, in layman's terms:

HPCwire: Today what are the main differences between EM64T and AMD64 that a compiler writer needs to be aware of? Are there still ISA differences or is it all micro-architecture?

Wolfe: The main differences are in the chip implementations, the detailed micro-architectures of the processor cores. For instance, Intel EM64T processors have typically been implemented with deeper instruction pipelines and higher clock rates. This increases the importance of good scheduling by the compiler in order to avoid pipeline stalls and extract maximum performance from the chip. With respect to streaming SIMD extensions (SSE) instructions, Intel EM64T chips use parallel floating point pipelines, which provide higher performance for packed arithmetic but no advantage for scalar code.

The AMD64 implementation uses separate pipelined floating point units. This allows for faster double-precision scalar performance, but essentially means that AMD64 has the same peak performance for double-precision scalar or packed SSE instructions. There are also a wide variety of cache sizes and configurations, which are significant to how and when a compiler should generate parallel code on a multi-core processor.

There are also temporal ISA incompatibilities. Intel introduced SSE2 instructions to the Pentium 4, and these were later adopted by AMD as part of AMD64. AMD introduced 64-bit extensions and extended register sets to the x86 architecture with AMD64, and these were eventually adopted by Intel. Intel introduced SSE3 instructions with EM64T, which AMD adopted soon thereafter, and Supplemental SSE3 instructions with Core 2 which create binary incompatibilities between Core 2 processors and current generation AMD64 processors. The PGI Unified Binary allows users to leverage these innovations as they occur, but without generating code that is sub-optimal or simply does not work on competing processors.

For the compiler, we see five distinct categories. First, as mentioned, some instructions are introduced by one vendor, so there is a time period where the ISA is different. Second, the scheduling rules for instructions differ among the processors; typically, the schedule is more critical for the Intel processors, with its deeper pipeline. Third, instruction selection can also be important. We have cases where there are two or more instructions or instruction sequences to give the same result; due to the micro-architectural differences, different instructions or sequences will be faster for the two vendors. Fourth, the various choices for vectorizing for the packed SSE arithmetic can be very specific to the chip; this includes instruction selection and scheduling, but also involves tradeoffs in the breakpoint between scalar and vector code, whether to optimize for aligned operands, and so on. Lastly and also related to vectorization, cache optimizations can depend on the cache size, which differs between the chips and even between different revisions of the same processor"
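
The "unified binary" idea in the interview above amounts to shipping more than one version of a routine and choosing between them at run time from what the CPU reports. A rough sketch of that selection step, not PGI's actual mechanism: the kernel names and the variant table are invented for illustration, the CPU flags are passed in as a plain set rather than read from the hardware, and the "tuned" path is only a stand-in that returns the same result as the portable one.

```python
from typing import Callable, List, Sequence, Set, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_baseline(a: Sequence[float], b: Sequence[float]) -> float:
    """Portable path: assumes nothing beyond the common x86-64 baseline."""
    return sum(x * y for x, y in zip(a, b))


def dot_ssse3(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for a path tuned for CPUs that report SSSE3 (same result)."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical variant table, most specialised first.
# Each entry pairs the CPU flags a variant requires with the variant itself.
VARIANTS: List[Tuple[Set[str], Kernel]] = [
    ({"ssse3"}, dot_ssse3),
    (set(), dot_baseline),  # requires nothing, so it always matches
]


def select_kernel(cpu_flags: Set[str]) -> Kernel:
    """Return the first variant whose required flags are all present."""
    for required, kernel in VARIANTS:
        if required <= cpu_flags:
            return kernel
    raise RuntimeError("no usable variant")  # unreachable with a baseline entry


# A CPU without SSSE3 gets the baseline path; one with it gets the tuned path.
older_cpu = {"sse", "sse2", "sse3"}
newer_cpu = {"sse", "sse2", "sse3", "ssse3"}
print(select_kernel(older_cpu).__name__)
print(select_kernel(newer_cpu).__name__)
```

Either way the caller gets the same answer; only the code path differs, which is why one binary can still run correctly on both vendors' chips.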

The Wikipedia article ignores the deeper issues that belong in the high-level technical conferences, like the temporal ISA differences. I am afraid that the detailed mathematical and engineering terms of Tom Wolfe's paper are beyond the grasp of all but one or two readers in Tom's forums and obviously beyond the other posters here. Who can post a clear explanation of temporal ISA or why it matters, and who can calculate zero sequence numbers, or who has even heard of them? Croc is correct: the external solutions are the easiest route. They are not processor dependent.

As to evilr00t's FUD, that is FUD. Here is the link to the IEEE paper presented in March 2005 that provides the mathematical documentation of the feasibility and proper use of the multi-graphics card for a display.
http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/ipdps/2005/2312/04/2312toc.xml&DOI=10.1109/IPDPS.2005.121#search=%22ron%20sass%20reconfigured%22
Yes, for low-level operations (gaming) in Vista you will not need a different version because the code is merely compromised. See Tom Wolfe's discussion above. Certain features like RAID 5 are very high level and in the case of IBM RAID 10 controllers (based on Fortran 95) flat won't work with EM64T. So if you are doing very basic things the differences will simply show up as inefficient use of the CPU. Again, see Tom's discussion above. If you do high-level floating point apps like SolidWorks or are operating at about 200 gflops or above, then you would gain significantly by getting a recompiled version of Vista. With the use of GPU and chip accelerators like IBM's Cell or Cell+ or ATI's Stream, 500 gflops will be available on the desktop by Christmas 2007. In today's slow-paced world of 20 gflops the design differences really don't matter. This time next year they will. Recompiling the Linux kernel to accommodate EM64T resulted in a 40% increase in performance for Thunderbird at Sandia. An increase from 38 to 53 teraflops. http://www.top500.org/system/ranking/8114

evilr00t, if you are going to leave your FUD display up, why don't you wander over to Lawrence Livermore and see Gauss? It is 256 Quadro 4500s interconnected. Granted, the interconnect is an InfiniBand network rather than the PCI or PCI-e bus. The display is 4.5 m by 7 m. Resolution would be equal to 3270x2400. http://www.llnl.gov/pao/news/news_releases/2005/NR-05-11-04p.html

evilr00t · Nov 23, 2006

Vezzini: He didn't fall? Inconceivable!
Inigo: You keep using that word. I do not think it means what you think it means.

Replace Inconceivable with FUD.

bberson · Nov 23, 2006

Wikipedia barely scratched the surface of the differences. If you had read the HPC article or bothered to look up Tom's paper on the web, you would have learned the following, in layman's terms

While your post was fascinating, it still leaves us with the fact that JEDEC (and IEEE for that matter) have absolutely no say over the implementation of AMD's or Intel's CPU instruction sets (read: "JEDEC/AMD64 standard" is a non-sequitur), there is no defined standard for X64 processors (read: how can you diverge from a standard that does not exist?), and the hassle of optimizing low-level and high-level language compilers for different target architectures is absolutely nothing new. At least not in the 25 years I've been playing with Zilog, Intel and DEC equipment.

And since I hate to post without saying something new and useful, I'll add that the software and driver situation for x64 platforms is indeed interesting, with some vendors supplying different binaries for the varying platforms and others supplying unified binaries. While I have not tried Vista-64 yet, I can also tell you that 64-bit Windows Server 2003 will not permit you to install 32-bit drivers, and if you somehow install 32-bit drivers anyway, they will not execute. Whoopeedoo.

-Brad

evilr00t · Nov 23, 2006

As to evilr00t's FUD, that is FUD. Here is the link to the IEEE paper presented in March 2005 that provides the mathematical documentation of the feasibility and proper use of the multi-graphics card for a display.
http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proce...ngs/&am
Yes, for low-level operations (gaming) in Vista you will not need a different version because the code is merely compromised. See Tom Wolfe's discussion above. Certain features like RAID 5 are very high level and in the case of IBM RAID 10 controllers (based on Fortran 95) flat won't work with EM64T. So if you are doing very basic things the differences will simply show up as inefficient use of the CPU. Again, see Tom's discussion above. If you do high-level floating point apps like SolidWorks or are operating at about 200 gflops or above, then you would gain significantly by getting a recompiled version of Vista. With the use of GPU and chip accelerators like IBM's Cell or Cell+ or ATI's Stream, 500 gflops will be available on the desktop by Christmas 2007. In today's slow-paced world of 20 gflops the design differences really don't matter. This time next year they will. Recompiling the Linux kernel to accommodate EM64T resulted in a 40% increase in performance for Thunderbird at Sandia. An increase from 38 to 53 teraflops. http://www.top500.org/system/ranking/8114

That's a pretty impressive reply, as I was expecting mere FUD. But you have brought truth to the table, so I will reply accordingly.

Certain features like RAID 5 are very high level and in the case of IBM RAID 10 controllers (based on Fortran 95) flat won't work with EM64T.

I would not call "RAID-5" high level. The algorithm for RAID-5 is merely XOR, which is a purely integer operation. And no one running high-end computers uses SOFTWARE RAID-5 -- the controller executes the XOR operations.
In the case of IBM not supporting their hardware with suitable drivers, it's their loss to their competitors, in the market for RAID controllers. There are plenty of customers with EM64T processors who will skip right over IBM's product for a competitor's EM64T-supported product.
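
To make the XOR point concrete, here is a small sketch of how RAID-5 style parity is computed and how a lost block is rebuilt from the survivors. It is an illustration only: the block contents and helper name are made up, and a real controller or driver works on whole stripes and rotates the parity position across drives.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("blocks must be non-empty and all the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data blocks, as if striped across three drives of a four-drive array.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block stored on the fourth drive is just the XOR of the data.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Nothing here involves floating point; it is the same integer operation whether the host CPU or a controller's own processor carries it out.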

If you do high-level floating point apps like SolidWorks or are operating at about 200 gflops or above, then you would gain significantly by getting a recompiled version of Vista.

People who use bloated and inefficient Vista while doing computational work of 200+ gigaflops are insane. Linux is the way to go for such work.

I doubt it. OS overhead is not quite as much as you think; look at Linux. You can recompile the kernel, optimized for a particular -march target, for a marginal performance boost on every Linux distro. However, Gentoo Linux demonstrates that optimized APPLICATIONS are more important than optimized KERNELS.

evilr00t · Nov 23, 2006

Let me answer your question right away:

The INQ article you read does not apply to you. I'm sure they're right that RAID-5 on the ICH7R shows craptastic performance. For once, the INQ is right, since the ICH7R does NOT do XOR operations itself, instead offloading that to the system processor.

Summary:
HighPoint RocketRAID 2320 PCI Express x4 SATA II RAID Card
This product seems to have hardware XOR offloading. Heatsink on chip pretty much gives it away.

HighPoint ROCKETRAID 2224 PCI-X SATA II Controller Card
This product may have hardware XOR offloading.

ICH7R
This product does not have hardware XOR offloading. The CPU is used to calculate parity.

In other words:
You have nothing to worry about.

bberson · Nov 23, 2006

I got a reply back from that fellow on StorageReview's forum. Turns out he's running a Pentium D 805. So it's a dual core, but not a Core 2 Duo.

Actually there are performance improvements to be had, far over and above the numbers I posted earlier and what was shown on the original link I provided. Here's a link to a very long thread about this...

http://www.ocforums.com/showthread.php?t=467848

Try not to get too distracted by the silly people making uninformed assumptions, and insisting that the results and even the matrix configuration itself are impossible. There are lots of really interesting benchmarks and comparisons to sift through there, and there are some good links to follow too.

[edited to be a little nicer to the folks in that thread at OCF]

-Brad

bberson · Nov 23, 2006

Summary: ...

Indeed. And unless the OP (or anyone else of course) buys a high-end server motherboard or a very high-end workstation motherboard, any on-board RAID is going to be host CPU dependent.

-Brad
