Andrew said:
Well..... that's where you lost me. That sentence is false. If he has
one drive and it goes bad, he loses all his data. If he stripes with
two and loses one, he loses all his data. Almost a complete wash. Your
statement is almost like saying that since your neighbor owns two cars
he's twice as likely to crash. A quick look at his insurance policy
will show that they don't believe that.
No RAID setup (0, 1, or 5) is a substitute for backing up anyway.
Actually he is quite correct: the probability that a stripe set
will fail is essentially the sum of the individual drives'
failure probabilities.
Suppose that the probability that a drive will fail in a certain
time interval "T" is "P", with P being a very small number.
Then if you have a single drive, the probability that the drive
fails during an interval "T" is obviously P.
If you have two striped drives, then the probability of losing
everything on the stripe set is:
P(1-P) (Drive 0 fails, Drive 1 survives)
+(1-P)P (Drive 0 survives, Drive 1 fails)
+ P^2 (Both drives fail)
---------------------------------------------
2P - P^2     TOTAL probability of losing the stripe set
Since P is very small, P^2 is negligible relative to 2P
and is typically ignored.
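To make that arithmetic concrete, here is a quick Python sketch
(my own illustration - the P of 0.01 is a made-up number, not a
real drive figure):

    # Probability of losing a 2-drive stripe set in one interval,
    # assuming each drive independently fails with probability P.
    P = 0.01  # hypothetical per-interval failure probability

    exact = P * (1 - P) + (1 - P) * P + P**2   # the three cases above
    approx = 2 * P                             # first-order approximation

    print(f"{exact:.4f}")    # 0.0199  (= 2P - P^2)
    print(f"{approx:.4f}")   # 0.0200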
Digital Media, Mike Newcomb writes:
"Point 2: It is my understanding that A RAID 0 system does not
"double" your failure rate or risk of failure. The impact from a
statistical stand point is something less than "double" the risk. In
order to ascertain what the actual risk increase is, a number of
factors, such as MTBF (Mean Time Between Failure) rate, must be
considered.
See, link for forumla/light
reading:
http://www.itl.nist.gov/div898/handbook/apr/section1/apr182.htm
That page does *not* support your contention.
S'matter of fact, the formula on that page for Fs(t) is
essentially just another route to what I have done above:
Fs(t) = 1 - (1-P)(1-P) = 2P - P^2
As well, there is the statement along the left hand side
of that web page "Add failure rates and multiply reliabilities
in the series model." That statement is just a first-order
approximation of the formulas on that page for Fs(t) and Rs(t).
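In Python, that series-model formula looks like this (my own
sketch, not anything taken from the NIST page itself; the 0.01
figures are made up):

    # Series model: the set fails if ANY component fails, so
    # multiply the reliabilities and subtract from 1.
    def series_failure(probs):
        reliability = 1.0
        for p in probs:
            reliability *= (1.0 - p)
        return 1.0 - reliability

    drives = [0.01, 0.01]   # two striped drives, hypothetical P each

    print(f"{series_failure(drives):.4f}")   # 0.0199 (exact: 2P - P^2)
    print(f"{sum(drives):.4f}")              # 0.0200 ("add failure rates")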
In reality, the additional risk is insignificant when virtually all
drives have an MTBF rate of more than 100,000 hours and you will
probably have your system for less than 5 years. (Hint: there are
8,760 hours in a year)".
There is (was?) a nice article about MTBF, as applicable to
hard drives and RAIDs at IBM's site. Home of many of the
world's best scientists and engineers, in case you have
forgotten.
They cautioned against RAID 0 because ...
If a single drive has an MTBF of MTBF(single), then
(as a first order approximation)
     MTBF(RAID 0, n drives) = MTBF(single) / n
In other words, if you stripe 2 drives, then the MTBF
of the stripe set is ONE-HALF the MTBF of a single
drive.
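To put numbers on that, here is a small sketch assuming a constant
failure rate (the usual exponential model - that assumption is mine,
and the 100,000 hours is just the figure from the quote above, not a
measured value):

    import math

    # Striping n drives multiplies the failure rate by n, so the
    # stripe set's MTBF is MTBF(single) / n under a constant-rate model.
    def failure_prob(mtbf_hours, hours):
        # Probability of at least one failure within `hours`.
        return 1.0 - math.exp(-hours / mtbf_hours)

    single_mtbf = 100000.0    # hours, hypothetical single drive
    year = 8760.0             # hours in a year

    print(f"{failure_prob(single_mtbf, year):.3f}")      # ~0.084/year, one drive
    print(f"{failure_prob(single_mtbf / 2, year):.3f}")  # ~0.161/year, 2-drive stripe

The second number comes out just short of double the first, which is
the same 2P - P^2 story as above.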
As well, MTBFs can be extremely misleading.
As you said, there are only a few thousand hours in a year.
Nobody tests a batch of drives for 20 years in order to
get real empirical evidence of how long the drive can be
expected to last. Instead they test a lot of drives for
one or two years and extrapolate from that.
Statistical modelling like that can be extremely accurate
if all of the underlying assumptions are valid - but it
can be wildly inaccurate if just one assumption turns out
to be invalid. IBM found that out the hard way with their
deathstar hard drives, Maxtor found that out with their
DiamondMax 9 drives, and Fujitsu found that out ...
And MTBFs can be extremely misleading in other ways.
If half of a batch of drives will fail in 10,000 hours
and the other half will fail in 190,000 hours, then the
MTBF is 100,000 hours but I sure as heck wouldn't want
one of those drives.
In other words, those huge MTBF numbers mean squat - I
would much rather know the probability that a drive will
fail during some more useful time interval.
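A toy version of that batch in Python, just to show the arithmetic
(a hypothetical batch of 100 drives, nothing measured):

    # Half the batch dies at 10,000 hours, half at 190,000 hours.
    lifetimes = [10000] * 50 + [190000] * 50

    mtbf = sum(lifetimes) / len(lifetimes)
    print(mtbf)                                # 100000.0 hours

    five_years = 5 * 8760                      # 43,800 hours
    dead = sum(1 for t in lifetimes if t <= five_years)
    print(dead / len(lifetimes))               # 0.5 - half the batch gone in 5 years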