RAID 5 May Be Doomed in 2009

Status
Not open for further replies.
[citation][nom]mtyermom[/nom]At the rate drive capacity is growing... it'll be sooner than you think.[/citation]

I agree. I remember when I got my Dell P4 system in early 2004. It came with a 120GB hard drive, which was fairly impressive for its time, and now we're already up to 1.5TB, more than ten times the capacity in only 4 years. Getting up to 12TB may happen within the next couple of years.
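As a back-of-the-envelope check on that guess, here is a small sketch. It assumes capacity keeps multiplying at the same yearly rate as the 120GB to 1.5TB jump over roughly 4 years; the constant-growth assumption and the function name are illustrative, not something from the article.

```python
import math

def years_to_reach(start_gb, end_gb, target_gb, elapsed_years):
    """Years past the end point needed to hit target_gb at constant exponential growth."""
    annual_factor = (end_gb / start_gb) ** (1.0 / elapsed_years)
    return math.log(target_gb / end_gb) / math.log(annual_factor)

# 120 GB (early 2004) -> 1500 GB (about 4 years later) -> 12000 GB target
years = years_to_reach(120, 1500, 12000, 4)
print(f"about {years:.1f} more years")
```

Under that assumption the answer comes out a little over three years, so "a couple of years" is in the right ballpark if the trend holds.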
 
I can see RAID 5 running into this issue, and maybe RAID 6, but by the time we have 12TB drives they are no longer going to be spindles. SSDs are far from high capacity now, but given time that will happen.
 
I still don't understand how RAID with parity can be less reliable than 1 drive with no parity. I never in my life had more than 1 drive fail at the same time (with big & small RAID5 arrays).
i.e. If one 12TB drive fails in RAID5, it's POSSIBLE (or likely) a reconstruction can fail. If one 12TB drive fails with no RAID, all data is 100% gone for sure.
Where can I find more info on this?
Thanks!
 
@enewmen

The point of this article was that the increase in drive size will increase the chance of read errors, thus increasing the chance of an error happening while you are rebuilding your array. According to this article, if there is a read error during the reconstruction then the whole array will be lost. If you'll remember from earlier, the read error will be more likely due to the size hard drives will be in the near future. So, in conclusion, your experience with hard drives in the present has no relevance to the issues of larger hard drives of the future, which is the subject of this article.
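To put rough numbers on that argument, here is a minimal sketch. It assumes the commonly quoted consumer-drive spec of one unrecoverable read error per 10^14 bits read, that errors are independent of each other, and that a rebuild has to read the full amount shown; the function name is made up for illustration.

```python
import math

def p_read_error(bytes_read, bits_per_error=1e14):
    """Chance of at least one unrecoverable read error over bytes_read,
    assuming independent errors at a rate of 1 per bits_per_error bits."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way for tiny p
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

TB = 1e12  # decimal terabyte, as drive vendors count it
for size_tb in (1, 2, 6, 12):
    print(f"{size_tb:>2} TB read: {p_read_error(size_tb * TB):.1%}")
```

Under those assumptions, reading 12 TB comes out at roughly a 60% chance of hitting at least one error, versus under 10% for 1 TB, which is the size effect the article is pointing at.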
 
[citation][nom]enewmen[/nom]I still don't understand how RAID with parity can be less reliable than 1 drive with no parity. I never in my life had more than 1 drive fail at the same time (with big & small RAID5 arrays). i.e. If one 12TB drive fails in RAID5, it's POSSIBLE (or likely) a reconstruction can fail. If one 12TB drive fails with no RAID, all data is 100% gone for sure. Where can I find more info on this? Thanks![/citation]

I couldn't give you an explanation that is 100% technically sound on this topic (or a terribly elegant one), at least compared to some hardware junkies... but I'll take a stab at it.

Consider that with one drive you are dealing with the probability of one drive failing. That is, the probability that one drive is defective, or has an off 'moment' inside, or is just too old, or whatever. This is the chance that the one drive will 'become another statistic', as I'll call it.

The simple explanation is that the more drives you have, the better the chance of a single drive failure. So instead of having 1 chance for one drive to fail... you have 5 chances for one drive to fail. The more drives you add, the better the probability that one of them will 'become another statistic.'

The problem enters when you consider that (after a drive failure) rebuilding an array of a certain size takes a quantifiable amount of time and number of disk operations. I have not personally done the math, but the idea is that each drive may (for argument's sake) have a 1 in 10,000 operations error rate, and rebuilding an array of the specified size or disk count may require 10,001 disk operations from EACH disk; and this is assuming that each drive operates within its specified tolerances. Another drive really could fail altogether.
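For what it's worth, here is that math done with the made-up figures above. This is only a sketch: it treats every disk operation as an independent trial, and the count of four surviving disks is an assumption for illustration.

```python
def p_rebuild_hits_error(ops_per_disk, error_rate, surviving_disks):
    """Chance that at least one operation errors on any surviving disk,
    treating every operation as an independent trial."""
    p_disk_clean = (1.0 - error_rate) ** ops_per_disk
    return 1.0 - p_disk_clean ** surviving_disks

# Illustrative figures from the post: 1-in-10,000 error rate, 10,001 ops per disk.
one_disk = p_rebuild_hits_error(10_001, 1 / 10_000, 1)
four_disks = p_rebuild_hits_error(10_001, 1 / 10_000, 4)  # 4 survivors is an assumption
print(f"one disk: {one_disk:.0%}, four disks: {four_disks:.0%}")
```

With those numbers a single disk has a bit under a two-in-three chance of hitting an error during the rebuild, and with four disks that all have to read cleanly an error is close to certain.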

The basic point is that the numbers can catch up to you.
 
thanks for explaining..
It still seems like RAID6 is the better choice over no RAID. (The title of the article could be clearer.)
Let's hope the rebuilding time also decreases as capacity increases.
 
What I hate about such types of articles is that the guy basically says: "RAID 5 sucks, RAID 6 sucks, you're screwed. HA! Told ya!"

It's a bit like announcing the end of the world because X is going to happen, yet not giving any practical solution for how X can be solved. The negative tone of the whole article is just like saying to us that we're doomed to lose all our data every once in a while and that, oh, it's life... I'm sure research is already being conducted to circumvent such an issue, I'm sure technologies will show up in due time, and I'm pretty darn sure the drives will get larger and larger without stopping. There have always been seemingly insurmountable issues, yet here we are today, through all those impossible obstacles and still living to talk about it!
 
Keyword: unrecoverable.

Error correction routines must become more robust, otherwise all data will be corrupted, by their logic.

CRC errors happen constantly with I/O. You never notice them, or maybe you do... that weird hiccup or glitch. RAM has ECC as well, and so does your CPU.

Our favorite hobby is awash in bit flipping and transients. If it wasn't for error correction algorithms, absolutely nothing electrical would work... hell, all MP3s are just one big approximation of what you might be listening to.
 
Makes me wonder why in hell you want a single 12TB drive volume? Ever heard of the expression, "Don't put all your eggs in one basket?"

Smaller drives are better.
 
OK guys, they are talking about corporations with NAS or SAN devices that hold 12 drives and 6TB of data. This is in RAID 5. Just 3 weeks ago we had 2 drives fail: 1 was part of the parity and the other was a spare. Heck, one of our servers holds 6 drives, 2 used in RAID 1 and 4 in RAID 5.

The more drives you have, the higher the likelihood that you have multiple failures at the same time.
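A quick sketch of how that scales, using a binomial tail. The per-drive failure chance here is a hypothetical placeholder, not a measured figure, and real drives from the same batch may not fail independently, so treat the output as a trend rather than a prediction.

```python
from math import comb

def p_at_least_k_failures(n_drives, p_single, k):
    """Binomial tail: chance that k or more of n independent drives fail in the same window."""
    return sum(
        comb(n_drives, i) * p_single**i * (1 - p_single) ** (n_drives - i)
        for i in range(k, n_drives + 1)
    )

P_WINDOW = 0.01  # hypothetical per-drive failure chance in one window
for n in (4, 6, 12):
    print(n, "drives:", f"{p_at_least_k_failures(n, P_WINDOW, 2):.4%}")
```

Whatever value is plugged in for `P_WINDOW`, the chance of two or more failures in the same window climbs as the drive count goes up.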

RAID 6 is expensive and requires hardware that is expensive. If you've ever bought a server you will understand. One of our PowerEdge servers cost over 4 grand and that just had SATA drives. Throw in 15k SAS drives and you're looking at 7-8 grand. Now you need Windows Server...

This is old news anyway. Companies have and use redundant systems now. Companies don't rely on RAID all that much. A lot of companies also invest in imaging software like Acronis Echo or Symantec Ghost as well. As soon as your company experiences multiple drive failures in a RAID 5 they will switch to redundant systems faster than you can ship the server. You'll be buying the server that same day.
 
@Darkk

Exactly. Keep each RAID-5 volume under the size where read errors are statistically likely, and it's a non-issue. Assuming the scenario being suggested does occur, then it will simply become an industry standard that all RAID-5 arrays are built under the safe size limit. Either the drive designers will figure out how to resolve the problem, or another technology will replace hard drives, but no corporation or responsible individual is going to risk data corruption in exchange for capacity. Especially when media distribution is becoming increasingly an online enterprise...does anyone think that you'll be able to re-download your multi-terabyte movie collection for free (legitimately, I mean) when your array craps out?
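One way to estimate where that "safe size" sits is sketched below. It assumes the commonly quoted consumer-drive rate of one unrecoverable read error per 10^14 bits and independent errors; the risk budgets and the function name are illustrative, and the size is the total amount a rebuild has to read across all surviving drives.

```python
import math

def max_bytes_for_risk(max_risk, bits_per_error=1e14):
    """Largest read size (bytes) whose chance of at least one read error stays under max_risk."""
    # Solve 1 - (1 - 1/bits_per_error)^bits = max_risk for bits
    bits = math.log1p(-max_risk) / math.log1p(-1.0 / bits_per_error)
    return bits / 8

for risk in (0.01, 0.05, 0.10):
    print(f"risk {risk:.0%}: about {max_bytes_for_risk(risk) / 1e12:.2f} TB")
```

Under those assumptions a 10% risk budget works out to a little over 1.3 TB read in total, and tighter budgets shrink the limit further. Passing `bits_per_error=1e15` shows how much a better-rated drive moves it.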

Also, let's not forget that RAID-5 is not a backup solution. It's a high-availability solution.
 
What they're talking about is a 12TB RAID array, where all the drives in your system, SAN, or NAS box add up to 12TB. Then they slice it up for different purposes.

It's a little bit more complicated as it requires planning of how much space this particular section will need. If you use roaming profiles will you need 1TB or 2?

[citation][nom]Darkk[/nom]Makes me wonder why in hell you want a single 12TB drive volume? Ever heard of the expression, "Don't put all your eggs in one basket?" Smaller drives are better.[/citation]
 
RAID 5 is a nonsense solution: it is slower for writes than RAID 0 and slower for recoveries than RAID. You can lose an entire business day to recover a RAID 5; imagine the time consumed when 2TB+ disks are on the market. And as stated in the article, RAID 5 is not a failproof solution.

The whole idea behind RAID is that it is an inexpensive/independent array of disks that is safer or faster than JBOD. So with 3 drives in a RAID 5 I have safety and speed, but the safety implies hours of recovery time, during which not a single bit on the entire array can go wrong, and the speed is just for reads; writes on RAID 5 take longer. With just one more disk I can make a RAID 10 solution, for the best of both worlds, or I can go cheap and use the very same 3 disks for a RAID 0 plus an external daily backup, an elegant way to have lightning-fast performance and fast system recovery.
In business solutions, the cost of taking too long to recover the array far exceeds the price gap between RAID 5 and RAID 10/RAID 6, and with the increased likelihood that the array rebuild will be unsuccessful, this cost gap will be reversed.
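For reference, the capacity side of that comparison can be sketched as follows. The usable-capacity and fault-tolerance rules are the standard ones for each level; the 1 TB disk size and the function name are picked purely for illustration, and rebuild time and write speed are not modeled.

```python
def usable_capacity(level, n_disks, disk_tb):
    """Return (usable TB, disk failures guaranteed survivable) for common RAID levels."""
    if level == "RAID 0":
        return n_disks * disk_tb, 0
    if level == "RAID 5":
        return (n_disks - 1) * disk_tb, 1
    if level == "RAID 6":
        return (n_disks - 2) * disk_tb, 2
    if level == "RAID 10":
        return (n_disks // 2) * disk_tb, 1
    raise ValueError(f"unknown level: {level}")

for level, n in (("RAID 0", 3), ("RAID 5", 3), ("RAID 10", 4), ("RAID 6", 4)):
    tb, tolerated = usable_capacity(level, n, 1.0)  # 1 TB disks, chosen for illustration
    print(f"{level} x{n}: {tb:.0f} TB usable, survives {tolerated} failure(s)")
```

A three-disk RAID 5 and a four-disk RAID 10 end up with the same usable space, so the extra disk buys rebuild and write behavior rather than capacity.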

Let's all say no to RAID 5!!
 
RAID5 saved my butt before. Even with the long recovery time.
I then quickly got the external drive to be double sure the data is safe.
This works well for me for home-use.
2 cents worth..
 
Seriously, how many servers out there are running 2TB drives (or even 1TB) in RAID 5??

As for saying no to RAID 5??? Pull your head out of your @ss and get with the industry. Yes, RAID 0 is faster, but try explaining to a user why they can't get to their files until you do a system restore from backups because you wanted to have faster read/write times... idiot....
jmtc
 
If they can't find a solution some time soon then it will be just another excuse to go to SSD. Not that SSDs won't fail, but they obviously are boasting a very long shelf life for them.

Server-wise, sure, 12TB isn't that much. Consumer-wise it's a butt load and will take a few years. I'm not saying it won't happen any time soon, it's just that most people don't need that much space yet. Once HDTV picks up and 25GB Blu-ray rips become the norm, then I can see 12TB filling up a lot faster.

Anyone who has 12TB of movies or music on their PC right now might just have the RIAA or Obama/Biden internet police knocking at their door in the near future, making sure they're legit.
 
Sure, you could use high capacity 3.5" platter SATA drives for your enterprise RAID 5, but that may not be the most intelligent decision you ever made. SCSI / SAS and the WD Raptor have been using more reliable (lower capacity) 2.5" platters for over 5 years (I love the reviews talking about 2.5" drives being new, pop open an old 3.5" SCSI). That is why SCSI/SAS drives tend to be smaller in size. This greatly reduces platter vibration, which improves reliability.

..and a 7 disk RAID 5?! This article should be entitled "stupid people get upset when they do stupid things". There is nothing worse than being frustrated with a bad decision that you made for yourself, but there is one catch: YOU MADE THE DECISION!!!! Don't get angry with WD or Seagate when your 12 TB SATA RAID 5 double-faults. If you read the specs and ran the numbers, these companies practically say in bold print: DON'T DO THIS!

Whew! Felt good to get that off my chest. I get tired of storage articles that just don't get it. The only reasonable source I ever found was storagereview, but they don't update like they used to. If you love your data, back it up.
 
Rarely can such bullshit be read...
The error rate referred to in the article is that of an undetectable error (one missed by the drive's internal ECC algorithms), which will just slip by undetected, not that of an uncorrectable one. The practical result is corrupted read data (1 sector corrupted in 10^14 sector reads). The figure for enterprise-level drives is much lower (1 in 10^15 - 10^16).
This is not a catastrophic failure, as it is presented. A reread will, with the utmost probability, reproduce the correct data.

In normal RAID read operation, the parity bit is usually not verified. Higher-class controllers will do an array consistency check at predetermined intervals. Such an undetected HDD read error will generate a consistency failure and the corresponding warning in the logs. It will not mark the array as failed, and the controller will try to correct it by writing a correct (in its view) parity bit. The result is corrupted data, corresponding to that particular bit. In a reconstruction operation, such an error will slip by undetected, the result also being corrupted data.
If the corrupted bit is used in some critical file system area, the damage could be (much) higher than if it was just part of some file.
Statistically, the probability of causing irreversible damage is very low; the alarming conclusion of doom is far-fetched.
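The parity mechanics described in this post can be sketched with a toy stripe. This is only an illustration of XOR parity, not how any particular controller is implemented; the block contents and the helper name are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks; RAID 5 parity is this XOR over the data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

d0, d1, d2 = b"disk0data", b"disk1data", b"disk2data"  # toy stripe, one block per disk
parity = xor_blocks([d0, d1, d2])

# Normal rebuild: disk 1 is lost, XOR of the survivors and parity restores it.
assert xor_blocks([d0, d2, parity]) == d1

# Same rebuild, but disk 2 silently returns one flipped bit.
d2_bad = bytes([d2[0] ^ 0x01]) + d2[1:]
rebuilt = xor_blocks([d0, d2_bad, parity])
print(rebuilt == d1)  # False: the rebuilt block is wrong and nothing flagged it

# A consistency check with all disks present sees a mismatch,
# but the parity alone cannot say which block is the bad one.
print(xor_blocks([d0, d1, d2_bad]) == parity)  # False
```

In the rebuild case the damage is limited to the byte that lined up with the flipped bit, which matches the point that the outcome is localized corruption rather than a failed array.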
 