Number of disks in a software RAID-5 array

Archived from groups: comp.sys.ibm.pc.hardware.storage

What are the trade-offs regarding the number of disks in a software
RAID-5 array ? My understanding is that, the more disks there are,

1. the more storage for the euro,

2. the worse the performance (assuming the bus is the
bottleneck, which is not unlikely in the case of software
RAID),

Does the reliability of the array increase with the number of
disks ? I'm aware that more disks means failures occur more
often but is it not offset by the fact that each disk contains a
smaller portion of the data ? I'm not sure about that because it
seems to contradict #1.

--
André Majorel <URL:http://www.teaser.fr/~amajorel/>
Todos, todos me miran mal
Salvo los ciegos, es natural
 

Previously Andre Majorel <amajorel@teezer.fr> wrote:
> What are the trade-offs regarding the number of disks in a software
> RAID-5 array ? My understanding is that, the more disks there are,

> 1. the more storage for the euro,

Loss: 1/n with n partitions or drives in the RAID-5.
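In other words, with n equal-size disks you lose one disk's worth of space to parity, and the overhead fraction 1/n shrinks as you add disks. A toy sketch (plain Python, disk sizes made up for illustration):

```python
def raid5_usable(n_disks: int, disk_size_gb: float) -> float:
    """Usable capacity of an n-disk RAID-5: one disk's worth goes to parity."""
    if n_disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (n_disks - 1) * disk_size_gb

# Overhead fraction is 1/n, so more disks = more storage for the euro:
for n in (3, 4, 8):
    print(n, "disks:", raid5_usable(n, 250.0), "GB usable, overhead", 1 / n)
```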

> 2. the worse the performance (assuming the bus is the
> bottleneck, which is not unlikely in the case of software
> RAID),

For reads, only if you have some other magic method to circumvent that
bottleneck. Writes get slower, since they also involve reads on RAID-5.

Personal experience with Linux 2.6.x: reads get faster up to the
hardware limit, like an (n-1)-disk stripe set, so no performance loss
here. Writes are about the same speed on a 3-disk RAID-5 as on an
8-disk RAID-5. Since my application is dominated by reads, I never
tried much tuning. However I have noticed that linear writes can
get faster with larger block sizes, e.g. 32k or 128k, depending on
the hardware.

One thing that kills both read and write performance is putting
two disks on one IDE channel of a Promise Ultra133 TX2 controller.
The effect also seems to be present with HighPoint HPT374-based
controllers.

> Does the reliability of the array increase with the number of
> disks ?

The overall loss risk over time of course increases, since the more
disks, the higher the risk of a double failure. However, you normally
have a replacement procedure in place that keeps the risk relatively
low: if you replace a failed disk within 24 hours, it is just the
risk of losing 2 disks in 24 hours. For this reason it is advisable
to have a cold spare handy, or maybe even a hot one. In practice,
people do not put more than 8 disks or so into one RAID-5 array.
Determining when such an array is no longer more reliable than a
single disk is difficult, since it depends e.g. on the speed of
replacement and other concrete operational factors. For large
numbers of disks, the reliability of a RAID-5 array will be
significantly lower than that of an individual disk.
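That "losing 2 disks in 24 hours" risk can be ball-parked with a toy model. The MTBF figure, the 24-hour window, and the assumption of independent exponential failures are all illustration choices, not anything measured:

```python
import math

def p_double_failure(n_disks: int, mtbf_hours: float, repair_hours: float) -> float:
    """Rough chance that, after one disk dies, any of the remaining n-1
    disks also dies before the replacement is rebuilt, assuming
    independent exponentially distributed failures."""
    rate = 1.0 / mtbf_hours                       # per-disk failure rate
    p_one = 1.0 - math.exp(-rate * repair_hours)  # one given disk dying in the window
    return 1.0 - (1.0 - p_one) ** (n_disks - 1)   # any survivor dying in the window

# Example: 8-disk array, 500,000 h MTBF per disk, 24 h to replace and rebuild
print(p_double_failure(8, 500_000.0, 24.0))
```

The point is only the scaling: the risk grows roughly linearly with the number of disks, which is why fast replacement (or a hot spare) matters more as arrays get bigger.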

Reliability per byte stored also decreases with the number of
disks, but it will never get worse than for one individual disk.
Here you can think of the parity info being used for more and
more data and having less and less benefit.

> I'm aware that more disks means failures occur more
> often but is it not offset by the fact that each disk contains a
> smaller portion of the data ? I'm not sure about that because it
> seems to contradict #1.

Your reasoning is flawed: a one-disk loss means no data loss.
A two-disk loss means a catastrophic loss of _all_ data, no
matter how large the individual pieces were.

For more redundancy, you can use RAID-6, which can tolerate the
loss of up to two disks/partitions. However, it gets slow when two
disks/partitions are missing, and in Linux 2.6.x it is still
experimental.

Arno
--
For email address: lastname AT tik DOT ee DOT ethz DOT ch
GnuPG: ID:1E25338F FP:0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
"The more corrupt the state, the more numerous the laws" - Tacitus
 

On 2004-12-31, Arno Wagner <me@privacy.net> wrote:
> Previously Andre Majorel <amajorel@teezer.fr> wrote:
>> What are the trade-offs regarding the number of disks in a software
>> RAID-5 array ? My understanding is that, the more disks there are,
>
>> 1. the more storage for the euro,
>
> Loss: 1/n with n partitions or drives in the RAID-5.
>
>> 2. the worse the performance (assuming the bus is the
>> bottleneck, which is not unlikely in the case of software
>> RAID),
>
> For reads, only if you have some other magic method to circumvent that
> bottleneck. Writes get slower, since they also involve reads on RAID-5.
>
> Personal experience with Linux 2.6.x: reads get faster up to the
> hardware limit, like an (n-1)-disk stripe set, so no performance loss here.
>
> Writes are about the same speed on a 3-disk RAID-5 as on an
> 8-disk RAID-5.

OK.

> Since my application is dominated by reads, I never
> tried much tuning. However I have noticed that linear writes can
> get faster with larger block sizes, e.g. 32k or 128k, depending on
> the hardware.

By block size, do you mean fs-level block (mke2fs -b), or
fs-level stride (mke2fs -R stride=), or stripe size, or
something else ?

> One thing that kills both read and write performance is putting
> two disks on one IDE channel of a Promise Ultra133 TX2 controller.
> The effect also seems to be present with HighPoint HPT374-based
> controllers.

Yes, I've been warned about that. I understand it's true
regardless of the controller (i.e. it's a fundamental
shortcoming of IDE).

>> I'm aware that more disks means failures occur more
>> often but is it not offset by the fact that each disk contains a
>> smaller portion of the data ? I'm not sure about that because it
>> seems to contradict #1.
>
> Your reasoning is flawed: a one-disk loss means no data loss.
> A two-disk loss means a catastrophic loss of _all_ data, no
> matter how large the individual pieces were.

OK. As I don't understand how RAID-5 distributes the data across
the disks, I'm in the dark.

Thank you for the explanations. There doesn't seem to be any
technical reason for increasing the disk count. 4 disks seems
like a good compromise, yes ?

--
André Majorel <URL:http://www.teaser.fr/~amajorel/>
Todos, todos me miran mal
Salvo los ciegos, es natural
 

Previously Andre Majorel <amajorel@teezer.fr> wrote:
> On 2004-12-31, Arno Wagner <me@privacy.net> wrote:
>> Previously Andre Majorel <amajorel@teezer.fr> wrote:
>>> What are the trade-offs regarding the number of disks in a software
>>> RAID-5 array ? My understanding is that, the more disks there are,
>>
>>> 1. the more storage for the euro,
>>
>> Loss: 1/n with n partitions or drives in the RAID-5.
>>
>>> 2. the worse the performance (assuming the bus is the
>>> bottleneck, which is not unlikely in the case of software
>>> RAID),
>>
>> For reads, only if you have some other magic method to circumvent that
>> bottleneck. Writes get slower, since they also involve reads on RAID-5.
>>
>> Personal experience with Linux 2.6.x: reads get faster up to the
>> hardware limit, like an (n-1)-disk stripe set, so no performance loss here.
>>
>> Writes are about the same speed on a 3-disk RAID-5 as on an
>> 8-disk RAID-5.

> OK.

>> Since my application is dominated by reads, I never
>> tried much tuning. However I have noticed that linear writes can
>> get faster with larger block sizes, e.g. 32k or 128k, depending on
>> the hardware.

> By block size, do you mean fs-level block (mke2fs -b), or
> fs-level stride (mke2fs -R stride=), or stripe size, or
> something else ?

Sorry, that would be "chunk size", i.e. stripe size. There might
be additional gains from matching the fs-level stride to it,
but I have not tried that so far.
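For what it's worth, the ext2/ext3 stride is just the RAID chunk size divided by the filesystem block size. A tiny helper (the 64k/128k chunk and 4k block figures are only examples):

```python
def mke2fs_stride(chunk_kb: int, fs_block_kb: int = 4) -> int:
    """Value for `mke2fs -R stride=N`: filesystem blocks per RAID chunk."""
    if chunk_kb % fs_block_kb != 0:
        raise ValueError("chunk size should be a multiple of the fs block size")
    return chunk_kb // fs_block_kb

print(mke2fs_stride(64))   # 64k chunks, 4k fs blocks -> stride=16
print(mke2fs_stride(128))  # 128k chunks, 4k fs blocks -> stride=32
```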

>> One thing that kills both read and write performance is putting
>> two disks on one IDE channel of a Promise Ultra133 TX2 controller.
>> The effect also seems to be present with HighPoint HPT374-based
>> controllers.

> Yes, I've been warned about that. I understand it's true
> regardless of the controller (i.e. it's a fundamental
> shortcoming of IDE).

From my experience the effect is small with the VIA onboard
ATA controllers on my mainboard. It is massive with Promise
PCI controllers and strong with the HPT on-board controllers
I have.

>>> I'm aware that more disks means failures occur more
>>> often but is it not offset by the fact that each disk contains a
>>> smaller portion of the data ? I'm not sure about that because it
>>> seems to contradict #1.
>>
>> Your reasoning is flawed: a one-disk loss means no data loss.
>> A two-disk loss means a catastrophic loss of _all_ data, no
>> matter how large the individual pieces were.

> OK. As I don't understand how RAID-5 distributes the data across
> the disks, I'm in the dark.

Simple: split the data into n-1 sets of pieces, as if it were a
RAID-0. A set of pieces (each the size of one chunk) then goes onto
each disk in turn. For the missing piece, store a bitwise XOR of
all the other pieces. That way, one missing piece can be
reconstructed from the others and the XOR piece. The XOR piece is
rotated around the disks, so the loss of any one disk has the same
performance impact.

If you lose two disks, regardless of which ones, you irrecoverably
miss at least one chunk-sized piece out of every n-1 data chunks,
e.g. 4 kB out of each 28 kB slice in an 8-disk array.
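The XOR trick above can be sketched in a few lines of Python (the chunk contents are made up, and the real md driver of course does this at the block layer, not on byte strings):

```python
from functools import reduce

def xor_chunks(chunks):
    """Bitwise XOR of equal-length byte chunks."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks))

# One stripe of a 4-disk RAID-5: n-1 = 3 data chunks plus one parity chunk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_chunks(data)        # stored on whichever disk holds parity this stripe

# Lose any one chunk: XOR the surviving chunks with the parity to get it back.
lost = data[1]
recovered = xor_chunks([data[0], data[2], parity])
print(recovered == lost)  # True
```

This is also why two missing chunks in the same stripe are fatal: the XOR equation then has two unknowns and cannot be solved.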

> Thank you for the explanations. There doesn't seem to be any
> technical reason for increasing the disk count. 4 disks seems
> like a good compromise, yes ?

The one reason to use more disks is that you get a larger array.
My experience is that 4-8 disks make sense; 3 is wasteful, and
more than 8 gets difficult to manage.

Arno
--
For email address: lastname AT tik DOT ee DOT ethz DOT ch
GnuPG: ID:1E25338F FP:0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
"The more corrupt the state, the more numerous the laws" - Tacitus