Previously Andre Majorel said:
What are the trade-offs regarding the number of disks in a software
RAID-5 array? My understanding is that the more disks there are,
1. the more storage for the euro,
Capacity lost to parity: 1/n, with n partitions or drives in the RAID-5.
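As a quick illustration of the 1/n figure, here is a minimal Python sketch of usable capacity for n equal-sized members, assuming one member's worth of space goes to parity (RAID-5). The function name and the 250 GB disk size are made up for illustration.

```python
def raid5_usable(n_disks: int, disk_size_gb: float) -> float:
    """Usable capacity of a RAID-5 array of n equal members.

    One member's worth of space holds parity, so the fraction
    lost is 1/n and the usable fraction is (n - 1)/n.
    """
    if n_disks < 3:
        # This sketch assumes the usual minimum of 3 members.
        raise ValueError("sketch assumes at least 3 members")
    return (n_disks - 1) * disk_size_gb


if __name__ == "__main__":
    # Hypothetical 250 GB disks; compare small and large arrays.
    for n in (3, 4, 8):
        usable = raid5_usable(n, 250)
        print(f"{n} disks: {usable:.0f} GB usable, {1 / n:.1%} lost to parity")
```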
2. the worse the performance (assuming the bus is the
bottleneck, which is not unlikely in the case of software
RAID),
For reads, only if you have some other, magic way to circumvent that
bottleneck. Writes get slower, since on RAID-5 they also involve reads.
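The extra reads come from keeping parity up to date: to overwrite one data block without touching the rest of the stripe, the array reads the old data and the old parity, then writes new_parity = old_parity XOR old_data XOR new_data. Below is a toy Python model of that read-modify-write cycle with one short byte string per disk; the helper names and stripe layout are assumptions for illustration, not the Linux md driver's code.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_parity(data_blocks: list) -> bytes:
    """Parity of a stripe: XOR of all its data blocks."""
    return xor_blocks(*data_blocks)


def small_write(data_blocks: list, parity: bytes, index: int, new: bytes) -> bytes:
    """Replace one data block and return the updated parity.

    Read-modify-write: read the old data block and the old parity
    (2 reads), then write the new data block and new parity (2 writes).
    """
    old = data_blocks[index]                    # read 1: old data
    new_parity = xor_blocks(parity, old, new)   # read 2 was the old parity
    data_blocks[index] = new                    # write 1: new data
    return new_parity                           # write 2: caller stores it


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]  # 3 data disks, 2-byte blocks
    parity = full_parity(stripe)
    parity = small_write(stripe, parity, 1, b"\xff\x00")
    # The incrementally updated parity matches a full recomputation.
    assert parity == full_parity(stripe)
    print("parity consistent:", parity.hex())
```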
Personal experience with Linux 2.6.x: reads get faster up to the
hardware limit, like an (n-1)-disk RAID-1 set, so no performance loss
here. Writes are about the same speed on a 3-disk RAID5 as on an
8-disk RAID5. Since my application is dominated by reads, I never
tried much tuning. However, I have noticed that linear writes can
get faster with larger block sizes, e.g. 32k or 128k, depending on
the hardware.
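When experimenting with block sizes (assuming "block size" here means the md chunk size), it helps to know how much data one full stripe holds: with chunk size c and n members it is c × (n − 1). A write that covers whole, aligned stripes lets the array compute parity from the new data alone, without the read-modify-write cycle. This sketch is plain arithmetic offered as a reference point for such tuning, not an explanation of the speedup observed above; the function name is hypothetical.

```python
def full_stripe_kib(chunk_kib: int, n_disks: int) -> int:
    """Data carried by one full RAID-5 stripe: one chunk per data disk.

    Of the n members, one chunk per stripe holds parity,
    so a stripe carries chunk_kib * (n - 1) KiB of data.
    """
    return chunk_kib * (n_disks - 1)


if __name__ == "__main__":
    # Chunk sizes and disk counts taken from the text above.
    for chunk in (32, 128):
        for n in (3, 8):
            print(f"chunk {chunk:3d}k, {n} disks: full stripe = {full_stripe_kib(chunk, n)}k")
```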
One thing that kills both read and write performance is putting
two disks on one IDE channel of a Promise 133TX2 controller.
The effect also seems to be present with HighPoint HPT374-based
controllers.
Does the reliability of the array increase with the number of
disks?
Overall "loss-risk vs. time" of course increases, since the more
disks, the higher the risk of a double loss. However, you normally
have some replacement procedure in place that keeps the risk
relatively low: e.g. if you replace a failed disk within 24 hours,
the relevant risk is that of a second disk failing within those
24 hours. For this reason it is advisable to have a cold spare
handy, or maybe even a hot one. In practice, people do not put
more than 8 or so disks into one RAID5 array.
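To put a rough number on the risk of a second failure during the replacement window, here is a minimal Python sketch. It assumes independent failures at a constant rate 1/MTBF (exponential lifetimes), and the 500,000-hour MTBF is a hypothetical figure, not a measured one. The window should cover both replacement and rebuild time, since the array stays degraded until the rebuild has finished.

```python
import math


def p_second_failure(n_disks: int, window_hours: float, mtbf_hours: float) -> float:
    """Probability that another member fails while the array is degraded.

    Assumes independent failures with a constant rate 1/MTBF
    (exponential lifetimes). After the first failure, n - 1 disks
    remain exposed for window_hours (replacement plus rebuild).
    """
    survivors = n_disks - 1
    return 1.0 - math.exp(-survivors * window_hours / mtbf_hours)


if __name__ == "__main__":
    MTBF = 500_000  # hypothetical per-disk MTBF in hours
    for n in (3, 8, 16):
        for window in (24, 24 * 7):  # one day vs. one week degraded
            p = p_second_failure(n, window, MTBF)
            print(f"n={n:2d} window={window:4d} h  P(second failure)={p:.2e}")
```

The risk grows roughly linearly with both the number of surviving disks and the length of the window, which is why a spare on hand shortens the window and helps.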
Determining when such an array is no longer more reliable than
a single disk is difficult, since it depends e.g. on the speed
of replacement and other concrete operational factors. For large
numbers of disks, the reliability of a RAID5 array will be
significantly lower than that of an individual disk.
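One common way to see where that crossover lies is the classic textbook approximation for RAID-5 mean time to data loss, MTTDL ≈ MTTF² / (n(n − 1) · MTTR), compared against a single disk's MTTF. The Python sketch below assumes independent, exponentially distributed failures and MTTR much smaller than MTTF; the MTTF and MTTR values are hypothetical.

```python
def mttdl_raid5(n_disks: int, mttf_hours: float, mttr_hours: float) -> float:
    """Classic approximation of RAID-5 mean time to data loss.

    MTTDL ~= MTTF^2 / (n * (n - 1) * MTTR), valid when MTTR << MTTF.
    Assumes independent, exponentially distributed failures; ignores
    correlated failures and read errors hit during the rebuild.
    """
    return mttf_hours ** 2 / (n_disks * (n_disks - 1) * mttr_hours)


def crossover_disks(mttf_hours: float, mttr_hours: float) -> int:
    """Smallest n for which the array is expected to be less reliable
    than a single disk, i.e. MTTDL(n) < MTTF."""
    n = 2
    # MTTDL shrinks as n grows, so this loop terminates.
    while mttdl_raid5(n, mttf_hours, mttr_hours) >= mttf_hours:
        n += 1
    return n


if __name__ == "__main__":
    MTTF = 500_000  # hypothetical per-disk MTTF in hours
    for mttr in (24, 24 * 7):  # degraded for a day vs. a week
        print(f"MTTR {mttr:3d} h: worse than one disk from n = {crossover_disks(MTTF, mttr)}")
    for n in (3, 8):
        print(f"n={n}: MTTDL ~ {mttdl_raid5(n, MTTF, 24):.3g} h")
```

The crossover moves sharply with MTTR, which matches the point that it depends on the speed of replacement. The model is optimistic, though: it ignores correlated failures (same batch, same enclosure, same power supply) and read errors hit during the rebuild, so real arrays cross over earlier.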
Reliability per byte stored also decreases with the number of
disks, but it will never get worse than for one individual disk.
You can think of it as the parity information being used to protect
more and more data, and so providing less and less benefit.
I'm aware that more disks means failures occur more
often, but is that not offset by the fact that each disk contains a
smaller portion of the data? I'm not sure about that because it
seems to contradict #1.
Your reasoning is flawed: a one-disk loss means no data loss.
A two-disk loss means a catastrophic loss of _all_ data, no
matter how large the individual pieces were.
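Why one lost disk is harmless and two are fatal shows up in a toy model of a single stripe: the parity block is the XOR of the data blocks, so any one missing member equals the XOR of all the others, but with two members missing the single parity equation has two unknowns. Since every stripe has a block on every member, a second failure hits every stripe at once. A minimal Python sketch; the helper names and block contents are illustrative assumptions.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(members: list, missing: set) -> list:
    """Rebuild one RAID-5 stripe (data blocks plus one parity block).

    With one member missing, it equals the XOR of all the others.
    With two or more missing, there is not enough information left.
    """
    if len(missing) > 1:
        raise ValueError("double failure: stripe cannot be reconstructed")
    restored = list(members)
    for i in missing:
        others = [b for j, b in enumerate(members) if j != i]
        restored[i] = xor_blocks(*others)
    return restored


if __name__ == "__main__":
    data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
    stripe = data + [xor_blocks(*data)]  # last member holds the parity
    damaged = list(stripe)
    damaged[1] = None  # disk 1 lost
    assert rebuild(damaged, {1}) == stripe  # single loss: fully recovered
    try:
        rebuild(damaged, {0, 1})  # double loss
    except ValueError as err:
        print(err)
```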
For more redundancy, you can use RAID6, which can tolerate the
loss of up to two disks/partitions. However, it gets slow when two
disks/partitions are missing. And in Linux 2.6.x it is still
experimental.
Arno