RAID 5 Configuration writes at 6 to 8 MB/s

avs_usmc_2047

Jun 16, 2011
Hi there,
I have been trying to figure out why my Promise VTrak M610p RAID 5 configuration writes at only 6 to 8 MB/s. I have tried many things to increase the write speed, but I've failed. Can someone please provide some help?

Here is what I have:
Storage unit: Promise VTrak M610p
RAID configurations:
- Four 2TB HDDs in RAID 5
- Four 2TB HDDs in RAID 10 (1+0)
- Four 1TB HDDs in RAID 1
OS: Windows Server 2008 R2 (x64)
Server: HP ProLiant DL360 G5 with 32GB RAM
PCI Express SCSI controller card: Adaptec SCSI Card 29320LPE Ultra320 SCSI

The PCI Express card is installed in the HP ProLiant server and is used to connect to the Promise VTrak M610p storage unit. I use the VTrak's built-in software to configure the three RAID configurations listed above. All three logical drives have the same settings:
Stripe size: 128KB
Sector size: 512 bytes
Read policy: ReadAhead
Write policy: WriteThru

All three RAID configurations have finished synchronizing. The write speeds for RAID 10 and RAID 1 average between 30 MB/s and 40 MB/s, but RAID 5 averages 6 to 8 MB/s. I have tried disabling the write-caching policy under Device Manager\Disk drives\[individual drive]\Properties\Policies, but that just decreases the RAID 5 write speed to 2 to 4 MB/s and does not affect the write speed of RAID 10 or RAID 1.
All the hard drives are brand new. I have used the same drives configured as RAID 10 and RAID 1 and the write speeds are 30 to 40 MB/s. I'm not sure what else to do. Can someone please help?
Thank you in advance
V
 
30 to 40 MB/s for RAID 1 or 10 is not particularly fast, but if that's the case then 6 to 8 MB/s is probably about right for RAID 5. Write performance of RAID 5 is very poor because each write requires reading the old data sector and the corresponding old parity sector, recalculating the parity, and then writing the new data sector and the updated parity sector.
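To see why that read-modify-write cycle hurts, here is a minimal sketch of the parity update a RAID 5 controller has to perform for every small write. This is illustrative Python only, not the VTrak's actual firmware; the function name and block sizes are my own for the example.

```python
# Minimal sketch: the RAID 5 small-write penalty.
# For every small write, the controller must read the old data block and the
# old parity block, XOR both against the new data, then write two blocks back.
def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Return the updated parity block for a single-block update."""
    assert len(old_data) == len(old_parity) == len(new_data)
    # new_parity = old_parity XOR old_data XOR new_data
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))

# One logical write therefore costs two reads plus two writes (plus rotational
# latency), which is why write-through RAID 5 crawls compared to RAID 1/10.
```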
 


Did you inherit this box, or did you just purchase it? If you just purchased it... exchange it, return it...

The architecture of your RAID setup is completely out of whack!

Sixteen drives at 3Gb/s each funnel down to a 320MB/s SCSI bus, and then down to a 2.5Gb/s single-lane PCI Express link, with four RAID volumes?!

That is TOO MUCH I/O for this PCIe link to handle... On top of that, there is no dedicated I/O processor.

Try this: remove all the other volumes and leave only the RAID 5 in the box. You will see an improvement, but it will not go over 150MB/s in sequential read/write.

Your application needs a hardware RAID system like this:

Then your RAID 5 with 4 drives would transfer about 300MB/s, and with 16 drives you would get above 800MB/s.


 





I agree. Even with a hardware RAID controller under optimal conditions, you aren't going to write 800MB/s on RAID across 16 HDDs.
 
Well! I was wrong. I underestimated it.

It's over 1000MB/s, not 800MB/s.

Here is the test we did (async):

This test was conducted while the RAID was in production (this morning).
- RM16_SAS6-R with 16x Hitachi 3.0TB Deskstar drives

On Win2008 we got an even better result.
 
This AJA test runs in less than 5 minutes on the Mac Pro 2010.

As long as there is 25% empty space, this is the speed you will get...

At 90% or more full, the RAID volume gets up to 20% less transfer speed.

Note: this utility from AJA determines the transfer rate for HD video editing; as you can see, the test was conducted with a 2K video stream (double the bit rate of 1080p).


Edit: for more clarity
 
The numbers don't add up for filling the disk vs. transfer rate. Most hard drives have only half the transfer rate on their innermost cylinders, not just 20% slower.

I suspect that transfer rates that high are a result of cache in the RAID controller, which couldn't be sustained for very long before the caches fill and the system has to slow down to the write rate of the actual drives.

A bigger cause for concern is putting that much capacity into a RAID-5 set. Many drives have an unrecoverable read error rate of one error per 10^14 bits read. Since there are about 10^14 bits in a 10TB RAID volume, that can mean you have as poor as even odds of NOT being able to recover from a disk failure, which makes RAID-5 basically pointless in the first place. Even if you use higher-spec'd disks with one error per 10^15 bits read, you're still looking at around a 10% chance of losing all your data when a drive dies, which I don't find particularly inspiring.
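As a rough back-of-the-envelope for those odds, here is a small sketch of my own. It assumes errors are independent and uniformly distributed at the quoted rate, which real drives don't strictly follow, so treat the numbers as order-of-magnitude only.

```python
# Rough sketch: probability of hitting at least one unrecoverable read error (URE)
# while reading back an entire array, e.g. during a RAID 5 rebuild.
def rebuild_failure_probability(capacity_tb: float, ber_exponent: int) -> float:
    """P(at least one URE) when reading capacity_tb terabytes from drives rated
    at one unrecoverable error per 10**ber_exponent bits read."""
    bits_to_read = capacity_tb * 1e12 * 8          # decimal TB -> bits
    p_per_bit = 10.0 ** -ber_exponent
    return 1.0 - (1.0 - p_per_bit) ** bits_to_read

print(rebuild_failure_probability(10, 14))  # ~0.55 for 10TB at 1 per 10^14 bits
print(rebuild_failure_probability(10, 15))  # ~0.08 for 10TB at 1 per 10^15 bits
```

Those two results line up with the "roughly even odds" and "around 10%" figures above.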
 
I know what you're talking about, but I don't base this on just theory. My clients want actual results, and that is what I deliver 🙂

Here is the speed report, based on IOMeter, for the RM16-SAS6-R from its manufacturer's web site.

The OdieRAID is the second such system, in service for about 4 months and based on SAS 6Gb/s; the 1st generation, based on SAS 3Gb/s, has been in service since Sept 2008.

That is FOUR years ago! We have had a couple of HDDs die; we just replaced them and kept on running.

Here is my note on the 10^14 figure:
Reading from an HDD is just like reading data from an optical disc... If it encounters an error,
it will retry, that's all. Remember, the read function WON'T STOP at 10^14.
On top of that, ECC memory also helps.
 


Wiki shouldn't always be taken verbatim.

I keep seeing people talking about BER and not understanding what it means. You don't "lose" all your data if you hit an unrecoverable bit error and a disk dies in a RAID-5. BER is a very misunderstood term; like MTBF, it's a metric for judging the quality of your disks, not an actual prediction. The HDD has built-in error detection and correction; being a physically rotating device, there can and will be read/write errors. The HDD corrects those and keeps trucking. Being physical, there is a chance that sometimes data can't be read properly and that the drive's own recovery mechanisms won't catch it. The probability of this happening is known as the BER. This is the error rate for a SINGLE read or write operation, single as in one, not many. This single error won't stop your data from being rebuilt, and depending on the type of HBA/OS you're using, they will often detect the error and recommit the write operation. As a home user, BER is meaningless. And by the way, UBE/BER is calculated on the number of reads/writes, not on the volume of data accessed. You can get unrecoverable errors reading a 10MB disk a few million times.

Where that comes into play is when the data in question is worth millions of USD. Then a single bit of incorrect data can cost the company money. Storage solution providers use these statistics to scare the pants off management into oversubscribing to data redundancy. This is why backups are important to maintain, but then again the backups themselves may have experienced a BER, so you should do two simultaneous backups. But then those could have experienced a BER, so better to do four backups plus in-place data duplication (basically a second set of equipment that only maintains an exact copy of your data and is constantly synchronized). And that itself may experience an error, so now we're at four backups and two in-place duplications. Yeah, oversubscribe is the word of the day...
 
A drive's "unrecoverable read error rate" means exactly what it says - every so often the drive will have an unrecoverable read error. "Unrecoverable" means that the drive is not able to successfully read the data, even after attempting to correct it using the ECC information. It's not a case where the data is OK on the media and the head just didn't happen to pick it up this time - it's because the recorded data and ECC bits don't yield a valid result even when the head reads them properly. A drive that is spec'd at one unrecoverable read error per 10^14 bits read means that you have as much as even odds of not being able to successfully read a sector after having read that many bits. Period. It's a fact of life.

Have a look at this WD spec sheet under "Reliability/Data Integrity": the metric is "Non-recoverable read errors per bits read". Non-recoverable means just that - the data cannot be recovered. It is toast. Gone. ECC has done its best and fallen short. Your data has gone to Valhalla. The drive's SMART data will register a "pending" sector, and every attempt to read the sector returns an error status. It really does happen that way.

That kind of error rate was long acceptable for hard drives because they contained a few orders of magnitude fewer bits, and therefore the chance of running into the problem on any one given drive was pretty small. But with today's huge capacities it's becoming a serious issue, especially when you combine multiple drives into a larger composite volume. More and more drives are being spec'd at one error per 10^15 bits read, which is a significant improvement. But with 10TB RAID volumes holding roughly 10^14 bits, that's still as much as a 1 in 10 chance of running into an unrecoverable read error.

The problem is that if you lose a drive with RAID-5, the controller must be able to read EVERY remaining block on EVERY remaining disk in order to rebuild the array and return it to "healthy" status. If it can't, then the array remains degraded, attempts to read the stripe that involves the unrecoverable block will fail, and any additional drive failure will kill the entire volume and make ALL your data irretrievable.

If you need redundancy for such large RAID volumes you should be using RAID 6, which stores a second, independent parity block per stripe, so the array can survive an unrecoverable read error during a rebuild (or a second drive failure).
 
Umm, you obviously don't fully understand what you're talking about. An unrecoverable read error is just the drive's error detection failing to catch the error and thus passing invalid data to the system. That data is most certainly there, or rather the BIT is there, either a 1 or a 0. Drive errors do happen, typically a head misses or some other anomaly occurs; the drive detects this, rereads the data, and passes the corrected data to the OS. Basically, a recoverable read error is one that the drive detected and fixed; an unrecoverable read error is one that the drive wasn't able to detect and thus couldn't fix. The bit is STILL THERE, it didn't explode or vanish into some black hole. Execute another read command and it'll show up, as the chance of two UREs happening back to back on a healthy drive is so remote that a meteor will crash into the earth and obliterate us before it happens. 10^14 * 10^14 is a pretty big number.

Like I said, many people just quote off wiki and don't actually know what they're talking about. You really should stop quoting wiki like that, it's obvious.

"The problem is that if you loose a drive with RAID-5, controller must be able to read EVERY remaining block on EVERY remaining disk in order to rebuild the array and return it to "healthy" status. "

This is patently false. Having recovered failed RAIDs, I know this personally. RAID will repair and recover as much as possible, and then tell you (the sysadmin) what failed and isn't recoverable. The thing is, RAID 5 can lose one disk (you could literally pull the disk out of the enclosure and light it on fire) and your data is still accessible. The parity bits on the other member disks are used to synthesize the missing data bits. If one of those other parity bits experiences a URE, then that bit of data will be invalid, but the other trillion-plus bits are still valid and accessible. A URE will not invalidate your array, nor will it reduce the protection of your data.

A URE is nothing but an extrapolation of a drive's BER, which is itself based on the number of operations, not on size. You can have a 10MB volume and still generate UREs if there is a high enough volume of traffic. Read from a 10MB volume a few trillion times and somewhere in there you generated a URE.

Now here is the thing: modern filesystems have their own built-in error checking. Meaning, when that URE happens and the drive returns an invalid bit, the data sector will fail CRC checks and another read command will be issued. If it fails again, the OS will report a read error, inform you that your disk is about to go south, and manually mark the sector as bad. We used to call those "bad sectors" years ago; they happen. Modern HDDs usually do their own bad-sector detection and remap around them, which is why it's rare for a modern HDD to report a bad sector. But it's not impossible for it to happen.

But please, continue pasting VERBATIM from a wiki article. Really, it's that obvious. And BTW, RAID 6 has its own issues, namely that it suffers horrible write speed and still succumbs to UREs. NOTHING protects you from UREs except the host OS's file system doing CRC checks.
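To make the file-system CRC idea concrete, here is a minimal sketch. It is my own illustration, assuming a simple CRC32 per block; real checksumming filesystems like ZFS use stronger checksums and their own on-disk layout, and the function names here are invented for the example.

```python
# Minimal sketch: how a checksumming filesystem can catch silent corruption
# that the drive reports as a successful read.
import zlib

def write_block(data: bytes) -> tuple[bytes, int]:
    """Store the block together with a checksum computed at write time."""
    return data, zlib.crc32(data)

def read_block(data: bytes, stored_checksum: int) -> bytes:
    """Verify the block against its stored checksum before trusting it."""
    if zlib.crc32(data) != stored_checksum:
        # A real filesystem would now read a redundant copy or rebuild from
        # parity; here we just refuse to hand back suspect data.
        raise IOError("checksum mismatch: silent corruption detected")
    return data

block, csum = write_block(b"important payload")
corrupted = b"imp0rtant payload"          # one flipped character; the drive says "OK"
try:
    read_block(corrupted, csum)
except IOError as e:
    print(e)                              # the bad data is caught instead of returned
```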
 
I don't know where you're getting the idea that this information comes from Wikipedia. Did you read the Western Digital specification sheet I linked to? Do you not understand the meaning of "non-recoverable read error"? "Non recoverable" means the data cannot be recovered.

You seem to be under the misconception that if you let the drive retry the read often enough it will somehow be able to recover the data. This is often the case, and when that happens the drive will remap the recovered data to a spare sector. But it doesn't always happen that way. Remember that errors can occur on writing the data as well as on reading them, and if the recorded bits plus their checksum don't form a valid codeword then the data really is unrecoverable. It doesn't matter how many times you try to read it, you won't be able to figure out what it's supposed to be.

The symptom of this is a drive which returns a read error every time you try to access a sector. Checking the drive's SMART data shows one or more "Pending Sectors" - that's the worst kind of SMART error because in most cases it means your data is gone. The drive doesn't mark sectors as "Pending" unless it's already retried the read many times - it's very unusual for a drive to somehow suddenly be able to recover that data. A sector marked "pending" does NOT have a 1 in 10^14 chance of getting an error the next time you try to read it, it's almost certain that you WON'T be able to read it. The chance of an unrecoverable error is not (once per 10^14) squared, it's once per 10^14 bits read, PERIOD (for drives so specified).

I've worked with literally thousands of disk drives in industry and at home for over 35 years. I've personally had drives with unrecoverable read errors, and I've seen numerous posts here from people who've reported "Pending Sectors" in their drive's SMART data. It's a real problem, and wishing won't make it go away. Drives are not perfect, they REALLY CAN LOSE DATA, even if the drive itself doesn't fail completely. And the more drives you string together, the greater the chance of it happening to you.

My recommendation to use RAID-6 for large arrays doesn't come from Wikipedia, it's a well-understood guideline in the industry. Here's another non-Wikipedia reference for you: a White Paper from Hewlett Packard. Note the graph that shows the probability of "logical drive failure". A logical drive failure means you can no longer access any of the data on your RAID volume.
 


Same here. We manage very large arrays with lots of data moving around and have never ~ever~ seen a problem with bit creep.

He has switched the definitions of his terms in his last post. In the first set of posts he was referring to UREs/BERs, where a disk read returns bad data, doesn't recognize it, and thus doesn't tell the host OS that the data is suspect. Then in his last post he's talking about bad sectors and sector remapping, where the disk returns a read error to the host OS. The first can affect a RAID-5 repair, as it presents an illogical situation to the controller, while the second won't cause any issues, as the controller will be able to regenerate the data from the bad sector. Two different things, and he's switching them up and thus spreading FUD.

RAID-6 cannot protect against UREs, as the logical situation doesn't change with more parity info.

Take a five-member RAID-5. When you remove one member, the parity data is used to regenerate the missing bit. A read of

10X1 p1 (parity bit set to 1, odd) will be interpreted as 1011. 11X1 p1 will be interpreted as 1101.

If a URE takes place, then the parity data won't make sense and the controller will have to guess; chances are good that the bit will be incorrect. This is a single bit out of 80 trillion (10TB worth of data), not a big deal, and it can be corrected if your file system supports data integrity checking. Throw in another parity bit and you still get the same problem, an illogical bit pattern; the extra encoded bit won't help, as the disk error was undetected and thus uncorrected. If a disk can detect the error, then it is by definition not a URE, as the disk did detect it and remap to recover it. If the data sector is physically damaged, then you had better have backup media, a RAID setup, or a modern journaling copy-on-write file system like ZFS.
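For what it's worth, the parity regeneration in that example can be sketched in a few lines. This is my own illustration of plain XOR parity, not any particular controller's code; the function and variable names are invented for the example.

```python
# Minimal sketch of XOR parity reconstruction for one RAID 5 stripe.
# The block from the failed drive is rebuilt by XORing the surviving data
# blocks with the parity block.
from functools import reduce

def rebuild_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Regenerate the block from the failed drive for one stripe."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                  surviving_blocks, parity)

d0, d1, d3 = b"\x01", b"\x00", b"\x01"          # surviving data bits 1, 0, 1
parity = b"\x01"                                # parity bit p1 from the example
print(rebuild_missing([d0, d1, d3], parity))    # b'\x01' -> the stripe reads 1011
```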

Of the two kinds of data errors he's talked about, the first is by FAR the more dangerous. Even if a bad sector happens on a non-redundant array and a small amount of data is lost, at least the drive tells the host OS about it, and the host OS can take action to remedy it, including recovery from a journal or a last known good copy. Then the host OS will inform the administrator, and the sysadmin can swap out the disks and repair the array. If an undetected URE happens and the disk reports data that it thinks is correct, then the host OS will act on the assumption that the data is correct and you will get corruption. We call this silent corruption, otherwise known as bit creep. It's the bane of all administrators of large arrays, as you never know whether your data has been slightly corrupted until you access it. Since all older file systems assume that the disk is 100% correct, they won't know to conduct CRC checks on the read data. Newer, modern file systems now provide their own CRC checking and redundancy facilities. They don't assume that what the disk says is 100% correct and will check the read data against stored CRC values to determine whether it is true or not. Sun did a huge push to get ZFS out into the enterprise to protect against this.

smin, yes, you were quoting from wiki. It's very obvious because what you said was practically verbatim what is posted there. More specifically, the 2008 article that was used as a source for the current wiki entry had the exact same numbers in the same grammatical structure using the same words as you posted. Even under the rather huge assumption that you wrote the original article, the odds of you using the same words and grammar twice that way are very low. Hence the difference between your first two posts and your last post: very different layouts and style, as you deliberately try to avoid looking like your argument came from there. You even switched terminology and started talking about something completely different while pretending it was the same.

Anyhow, this is all getting way off on a tangent.

To the OP: looking over that device, I would hazard that there is either a misconfiguration, a dead drive (with the array synthesizing the reads/writes), or you just created the array and are trying to test it before it's done initializing.

From what I understand, you have
Storage chassis => U320 LVD => Server as your configuration?

RAID configurations:
- Four 2TB HDDs in RAID 5
- Four 2TB HDDs in RAID 10 (1+0)
- Four 1TB HDDs in RAID 1

The last one has me confused: you have four 1TB drives in RAID 1? Or is it RAID 0 as some sort of scratch media? RAID 1 is almost always two disks, although some enclosures allow you to add additional mirrors for performance reasons.

Getting only 30~40 MB/s from the other arrays isn't good; they should be spitting out 80MB/s. My home eSATA four-disk RAID 0 array can do 120MB/s and cap the bandwidth on a PCI bus (long story), so your enterprise-class equipment should be doing much better. I would definitely check your cable termination scheme and ensure that the enclosure is properly terminating at its end. Also, are you using some sort of multipathing with both U320 channels connected? That might not work depending on the OS/HBA you're using. And finally, is this a new build? Are you at a point where you can destroy the RAID 5 and recreate it? It sounds like it didn't initialize properly.
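If it helps to get a consistent baseline before and after recreating the array, here is a quick sequential-write check. It is a rough sketch of my own: the target path is just a placeholder for a file on the RAID 5 volume, and a dedicated tool like IOMeter will give far more detailed numbers.

```python
# Rough sequential-write throughput check for a mounted volume.
# Writes a large file with 1MB blocks and reports MB/s.
import os, time

TARGET = r"E:\throughput_test.bin"   # placeholder path on the RAID 5 volume
BLOCK = b"\x00" * (1024 * 1024)      # 1MB write blocks
TOTAL_MB = 2048                      # write 2GB to get past any small caches

start = time.time()
with open(TARGET, "wb", buffering=0) as f:
    for _ in range(TOTAL_MB):
        f.write(BLOCK)
    f.flush()
    os.fsync(f.fileno())             # make sure data actually reaches the array
elapsed = time.time() - start
print(f"{TOTAL_MB / elapsed:.1f} MB/s sequential write")
os.remove(TARGET)
```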
 
You're the one who first threw those terms around. My first post on error rates simply used the term "unrecoverable read error rate". All the rest of my posts used that same term (or Western Digital's "non-recoverable") while trying to convince you that "unrecoverable" meant that the data couldn't be recovered. Geez...

The ECC codes on hard drives are very sophisticated, and it's very, very unlikely that the drive will return bad data with no indication of an error. It's much more likely that the data is simply unrecoverable. The most frequent cause of data being silently corrupted is errors that occur in non-ECC RAM as the data passes through the protocol stack (including the RAID controller).
 


A URE ~is~ when the drive does exactly that: it returns data that it believes is valid but isn't. That is exactly what you were referring to in your first few posts. That is the 10^14 and 10^15 number and the 10TB number that you keep using. It's referenced inside the wiki article on it; the article cites a 2008 paper written up after tests showed that, with extremely large amounts of data being transacted, the chances of that happening are pretty good. This is a number extrapolated from the Bit Error Rate, which itself is the odds that a drive read will return an incorrect value. When a bit error happens, the drive's own CRC checking mechanisms detect it and reread the area of the disk without informing the OS; the error was recoverable. If the drive rereads the sector and finds that the data is either unreadable or continuously fails CRC checking, it then reports a disk failure to the host OS and remaps the sector while trying to salvage whatever data it can. The host OS, or RAID controller (whichever is controlling the disk), then uses parity information to rebuild the lost data; thus RAID-5 protects against a damaged sector. Regardless of the array size or the data transacted, a RAID-5 will always protect against a sector read error, as parity information exists to rebuild it.

Now, CRC isn't perfect, and there is an extremely small chance (on the order of 1 in 10^14 to 10^15) that a single-bit read error will go undetected and be reported as correct even though it's not. RAID-5 does not protect against this, nor does RAID-6 or RAID-7 (triple parity). The controller and host OS are being told by the drive that the data is good, and thus the data is stored and treated as valid even though a single bit is wrong. This is known as an Unrecoverable Read Error (URE), as the disk was unable to recover from the error because it didn't even know it happened; all other read errors are recoverable, as the disk reports them even if data was lost. You're getting bad data from the disk and not knowing about it; over long periods with many transactions this results in several bits being "wrong", and thus bit creep happens. The only way to protect against it is for the host OS to use a file system designed around the idea that the drives can be ~wrong~ and that maintains its own CRC values. ZFS is such a file system; so are Ceph and AFS. I don't know whether the NTFS implementation of the W2K8 era supports file-system CRC checking; MS seems to allude that it does, but I haven't seen hard info about it.

And you're right, it's very VERY unlikely that a drive will report data that is bad. About 1 time in 100,000,000,000,000 (10^-14). 10TB is 8 * 10,000,000,000,000 = 80,000,000,000,000 bits. I'll leave you to figure out the odds of 10TB or 100TB having a URE, but it's likely. And that's 10TB of data read/written, not 10TB of storage. You can read 10MB a trillion times and get the same result.

In the end, the only solution is a file system that is aware of the possibility of silent data corruption.
 
Which wikipedia article are you referring to exactly, this one? That's the page I get when I search Wikipedia for "URE" and choose the link referring to disks.

I can't find the 2008 white paper you're referring to; could you please provide a link?