RAID 5: Add another hard drive?


elcold

Distinguished
Mar 20, 2007
I currently have 3 hard drives in a RAID 5 config, but I have heard you can get a nice speed boost by adding another. Can someone tell me how much I can gain from adding a new HD?
 
Theoretically you can use 2 out of 3 drives for write performance; adding a fourth means you can use 3 out of 4, which works out to 50% higher write performance.

Also theoretically, you can use all 3 drives for read operations, so adding another one would increase read throughput by about 33%.

Without knowing which RAID implementation you are talking about and your I/O access pattern, it's meaningless to speculate beyond theory.
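
To put rough numbers on that theory, here is a minimal back-of-the-envelope sketch in Python. The 80 MB/s per-disk streaming rate is an assumed figure, not anything from this thread; the point is only that ideal RAID 5 reads can stream from all N disks while writes stream data to N-1, since one disk's worth of each stripe is parity.

```python
# Back-of-the-envelope sketch only; 80 MB/s per disk is an assumed figure,
# and a real array will not hit these ideal numbers.

def raid5_ideal_throughput(n_disks, per_disk_mb_s):
    """Ideal (read, write) streaming rates in MB/s for an N-disk RAID 5."""
    read = n_disks * per_disk_mb_s         # reads can stream from every disk
    write = (n_disks - 1) * per_disk_mb_s  # one disk's worth per stripe is parity
    return read, write

for n in (3, 4):
    r, w = raid5_ideal_throughput(n, per_disk_mb_s=80)
    print(f"{n} disks: read ~{r} MB/s, write ~{w} MB/s")
# 3 -> 4 disks: writes go from 160 to 240 MB/s (+50%), reads from 240 to 320 MB/s (+33%)
```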
 
Unless you are using a good separate hardware controller, you will not see any speed gains from adding another drive. If you do have a controller card you should see a small performance increase, but the main reason for adding more HDDs is extra space. Do you need the extra space? Also keep in mind that the array runs at the speed of its slowest drive.
 

Could you elaborate on your statement that you need a hardware solution in order for performance to scale with the number of disks? That's ridiculous; it just does not make any sense.

Also, your statement that the slowest drive in the pack determines the speed of the rest is untrue. With non-sequential I/O, that disk might not even be used, so at the very least you are describing a specific I/O access pattern in which one drive could potentially slow down the rest.
 
In a RAID 5 array, data is written across the drives sequentially according to the block size of the array, with a parity block included for redundancy. Depending on its size, each file read comes from every disk in the array, i.e. the first part of the file is read off disk 1, the second part off disk 2, and so on until the file is read in its entirety.

In a 3-disk array you can have 3 simultaneous reads, one from each disk, piecing together the file; in a 4-disk array you can have 4 simultaneous reads. So increasing the number of disks should theoretically increase read speed, since each disk you add makes another simultaneous read available, although adding more and more disks creates more overhead. This is where a dedicated controller card (with its own processor and RAM) can pull ahead of an onboard controller (which uses other machine resources): the overhead may overwhelm the onboard controller sooner than a dedicated one. Obviously it still depends directly on how fast the controllers are, but dedicated hardware will always reduce overhead on the system.
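
To make the striping concrete, here is a small sketch that prints which disk serves each data or parity block per stripe. It assumes a left-asymmetric-style rotating parity layout; real controllers may rotate parity differently, so treat the exact placement as illustrative only.

```python
# Sketch of a rotating-parity RAID 5 layout (left-asymmetric style, assumed).
# Each stripe has one parity block; the rest are data blocks that can be
# read simultaneously, which is where the extra disk's read comes from.

def raid5_layout(n_disks, n_stripes):
    """Map each stripe to {disk index: 'P' or data block label}."""
    layout = []
    data_block = 0
    for stripe in range(n_stripes):
        parity_disk = (n_disks - 1 - stripe) % n_disks  # parity rotates one disk per stripe
        row = {}
        for disk in range(n_disks):
            if disk == parity_disk:
                row[disk] = "P"
            else:
                row[disk] = f"D{data_block}"
                data_block += 1
        layout.append(row)
    return layout

for stripe in raid5_layout(n_disks=4, n_stripes=3):
    print(stripe)
# A 4-disk array gives 3 data blocks per stripe to read in parallel,
# versus 2 per stripe on a 3-disk array.
```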

If in a 3-disk array we have 3 drives whose interfaces support 300 MB/s (the SATA II theoretical limit), each drive should read and write data at the same speed, so each block from each drive will be read and available in about the same time. However, if we take a 3-drive array with two drives at SATA II and one at SATA I (150 MB/s), we run into an issue where drives one and two should theoretically read 2 blocks in the time the third drive reads 1. This is where the slowest drive in the array slows the array down. If we need 6 blocks to piece together the file, 2 blocks on each of the 3 drives, then drives one and two will finish reading their 2 blocks while the third is still reading its final block, and the file cannot be assembled until it does. Therefore any sequential reads or writes on the array are limited by the read/write speed of the slowest disk.
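
Here is a toy timing model of that 6-block example. It uses assumed sustained transfer rates and an assumed 64 KB block size rather than interface limits, since sustained speed is what actually gates a streaming read; the numbers are made up for illustration.

```python
# Toy model with assumed numbers: a striped read finishes only when the
# slowest member disk has delivered its share of blocks.

def striped_read_time_s(blocks_per_disk, disk_speeds_mb_s, block_mb=0.064):
    """Time to read blocks_per_disk blocks from every disk, gated by the slowest."""
    return max(blocks_per_disk * block_mb / s for s in disk_speeds_mb_s)

matched    = [100, 100, 100]  # MB/s sustained, assumed identical drives
mismatched = [100, 100, 50]   # same array but one drive half as fast

for speeds in (matched, mismatched):
    t = striped_read_time_s(blocks_per_disk=2, disk_speeds_mb_s=speeds)
    print(f"{speeds}: {t * 1000:.2f} ms for the 6-block read")
# The mismatched array takes twice as long: the two faster drives finish their
# two blocks and then wait for the slow one before the file can be assembled.
```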

If I'm incorrect in any way, I will gladly take into account anybody's further explanation of a RAID 5 array.
 
Clockman is more or less right, but a few tweaks...

In RAID 5, let's look at a 3-disk array...

When you read, you read off two disks for a certain block of data, as the third disk holds the parity data for that block.

Your access time until the read starts is thus that of the slower of the two drives, and your maximum throughput is the sum of the two drives.

In a 4-disk RAID 5 you have three disks holding a block of data and the 4th holding the parity data. Since you are waiting for the slowest of the three data-holding drives before the read can start, you will have slightly worse latency (access time) than a three-disk array, but on average better sustained throughput, as you read off three drives versus two for any block of data. (You actually read off all the drives in an array over time; this is per individual block, since the parity data is allocated to a different disk with each block.)

The same logic applies when writing. In general, the more disks in an array the worse the latency, but the better the sustained throughput.
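
A minimal sketch of that latency/throughput trade-off, assuming seek times uniformly distributed between 4 and 12 ms and 80 MB/s per disk (both made-up figures), with a full-stripe read gated by the slowest of the N-1 data disks:

```python
# Monte Carlo sketch: access time is the max seek among the data disks,
# streaming throughput is roughly the sum of the data disks. All figures assumed.

import random

def expected_stripe_latency_ms(n_data_disks, trials=100_000):
    """Estimate the expected worst seek time (ms) across n data disks."""
    total = 0.0
    for _ in range(trials):
        total += max(random.uniform(4.0, 12.0) for _ in range(n_data_disks))
    return total / trials

for n_disks in (3, 4, 6):
    data_disks = n_disks - 1                 # one disk's worth per stripe is parity
    latency = expected_stripe_latency_ms(data_disks)
    throughput = data_disks * 80             # 80 MB/s per disk, assumed
    print(f"{n_disks}-disk RAID 5: ~{latency:.1f} ms stripe latency, ~{throughput} MB/s sequential")
# More disks: the worst-case seek creeps up (worse latency) while the summed
# streaming rate keeps rising (better throughput), as described above.
```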

In theory, as Clockman suggests, you could have one drive that is slower than the others (in rotational speed, interface, design, platter density, whatever), but you almost always use identical (or at the very least very similar) drives in an array. Most SCSI controllers, for example, will spin all the drives down to the slowest drive's speed if you use mismatched drives.

Dedicated controllers historically were almost always faster than software RAID, but with all these quad-cores coming out at $270 a pop, it would be interesting to see an ICH9 versus a dedicated array controller...

 

Sure, but it's entirely possible to have 40% CPU utilization with much faster storage, versus 2% CPU utilization with hardware RAID but a slower storage backend. Personally I prefer the first option, since storage is more often than not a bottleneck in your system. Idle CPU cycles are no good if your system is waiting for data from the disks.

If in a 3-disk array we have 3 drives whose interfaces support 300 MB/s (the SATA II theoretical limit), each drive should read and write data at the same speed, so each block from each drive will be read and available in about the same time. However, if we take a 3-drive array with two drives at SATA II and one at SATA I (150 MB/s), we run into an issue where drives one and two should theoretically read 2 blocks in the time the third drive reads 1. This is where the slowest drive in the array slows the array down.
The added latency from propagation delay on the interface might give a 1% difference or so in performance, sure. Set against 33% higher throughput, I'd say that's a good trade-off. :)

The real question is whether your RAID implementation scales well. This is highly dependent on the specific implementation and really does not matter whether you use software RAID or hardware RAID. Software RAID is able to beat the crap out of hardware RAID when dealing with sequential throughput. Even with RAID 5 sequential writes, geom_raid5 managed to get higher speeds than Areca RAID 5 with 8 disks, which is pretty awesome, I'd say. Of course this does put more load on the system, and I've yet to see software RAID that is good at request reordering - another feature controllers like Areca are adept at. But any optimization possible in hardware RAID can be implemented in software, and with software you have much faster hardware at your disposal.

@the_vorlon:
ICH9R offers no intelligent implementation of RAID 5; you should look at geom_raid5 TNG, which can do 400 MB/s sequential write - not bad for software RAID 5 on an AMD dual-core system.
 
Great, I now know a little more about RAID 5. But one question: is the "spare disk" in RAID 5 always the biggest disk in the array? I currently have 3x 1.5TB and one 2TB disk, and the 2TB has gone in as the spare. Is this always the case? So if I add a 3TB disk, will the array automatically use the 3TB disk as the spare and "open up" the 2TB disk again? Or?

Or, if I delete the array now, build the RAID 5 only from the 1.5TB disks with one 1.5TB as the "spare", and then add the 2.0TB, will the array get bigger?

I'm a bit lost on how RAID 5 chooses disks for the different jobs...


Thank you all, and sorry for bringing up a thread that's MANY years old...
 