RAID-1 reliability


marcodeo

I'm going to set up a small departmental server for a customer of ours.
I'm considering setting up a RAID-1 system.
This isn't my first RAID, but I don't have wide experience with these
solutions, and I would like to hear the opinion of others on a couple of
points.

I am rather suspicious about the real reliability of RAID-1.
It's often said that it's best if the disks are the same model & same size.
However, using two drives with the same manufacturing standards, the same
tolerances, the same MTBF, etc. etc. increases the chance that both fail
simultaneously.

Has any of you experienced such failures?
To what extent may I deviate from the above rule (same model, same size
disks)?

_I apologize for my English_
Thanks
Marco.
 
I'm going to set up a small departmental server for a
customer of ours. I'm considering setting up a RAID-1 system.
This isn't my first RAID, but I don't have wide
experience with these solutions, and I would like to
hear the opinion of others on a couple of points.

RAID-5 is there for a reason.
I am rather suspicious about the real reliability of RAID-1.
It's often said that it's best if the disks are the same model & same size.

Not for that reason. That's for performance reasons.
However, using two drives with the same manufacturing
standards, the same tolerances, the same MTBF, etc.
etc. increases the chance that both fail simultaneously.

Nope, it does not.
Has any of you experienced such failures?
To what extent may I deviate from the
above rule (same model, same size disks)?

You may not get as good performance.
 
Previously marcodeo said:
I'm going to set up a small departmental server for a customer of ours.
I'm considering setting up a RAID-1 system.
This isn't my first RAID, but I don't have wide experience with these
solutions, and I would like to hear the opinion of others on a couple of
points.
I am rather suspicious about the real reliability of RAID-1.
It's often said that it's best if the disks are the same model & same size.
However, using two drives with the same manufacturing standards, the same
tolerances, the same MTBF, etc. etc. increases the chance that both fail
simultaneously.

Not really. What happens is that nonrandom failures can happen
in the same way on both disks, if there are design problems with
the disks. For random failures RAID-1 is just as reliable with
similar disks as with different ones. But even design errors
will in many cases not lead to simultaneous failures. Personally
I think it is important to be able to replace a failed disk fast.

The only real risk IMO is something that really leads to simultaneous
failure, such as high sensitivity to overtemperature, overvoltage or
physical shock.
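
To put a rough number on the random-failure case, here is a minimal Python
sketch, assuming independent drives with a constant (exponential) failure
rate; the annual failure rate and the replacement window below are
illustrative values, not measurements:

    import math

    # Illustrative assumptions, not measured values.
    afr = 0.03          # assumed annual failure rate per drive (3%)
    repair_days = 3     # assumed time to obtain a replacement and resync the mirror

    rate_per_day = afr / 365.0

    # With a constant failure rate, the chance that the surviving drive also
    # dies before the rebuild finishes is 1 - exp(-rate * window).
    p_second_failure = 1 - math.exp(-rate_per_day * repair_days)

    print(f"P(second drive dies during the {repair_days}-day window): {p_second_failure:.6f}")

Which is why replacing the failed disk fast matters: the double-failure
risk scales with the length of that exposure window.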

Has any of you experienced such failures?
To what extent may I deviate from the above rule (same model, same size
disks)?

Usually you can combine arbitrary disks. You will only get
the capacity of the smaller one though.

Arno
_I apologize for my English_

Why? Seems fine to me.
 
You're wrong.


You'd get the same result if you received
shipments of mixed drives instead.

What I mean is that we usually receive hard drives from the same
manufacturer, same model from customers all over Europe to be repaired
the same day. When it happens once you can call it coincidence; when it
happens once a week it's a bit too often, especially when it's exactly the
same failure on the drives.

Nick
 
What I mean is that we usually receive hard drives from the same
manufacturer, same model from customers all over Europe to be repaired
the same day. When it happens once you can call it coincidence; when it
happens once a week it's a bit too often, especially when it's exactly the
same failure on the drives.

Were the manufacturing dates on the drives the same?
 
What I mean is that we usually receive hard
drives from the same manufacturer, same model from
customers all over Europe to be repaired the same day.

Don't believe it.
When it happens once you can call it coincidence; when
it happens once a week it's a bit too often, especially
when it's exactly the same failure on the drives.

Don't believe you get that effect with every
drive model from every hard drive manufacturer.

You may get that result with just one drive model,
like say the Fujitsu MPGs, but that's just a design
problem, and even you must have noticed that all the
MPG drives in Europe don't die on a single day.
 
"Rod Speed" baited:
Don't believe it.

Don't believe you get that effect with every
drive model from every hard drive manufacturer.

You may get that result with just one drive model,
like say the Fujitsu MPGs, but that's just a design
problem, and even you must have noticed that all the
MPG drives in Europe don't die on a single day.


Don't even bother to answer that bait, Nick.
Let one of his fake "minions" answer it.


*TimDaniels*
 
Some pathetic little ****wit claiming to be TimDaniels posted in its
message just the puerile silly shit that's all it can ever manage.
 
Sorry, I only have an MSc in Pattern Analysis, and I have a very
strong Bayesian background.

You should have a look at Renewal Theory (Cox, 1962); you would
learn one or two things about times to failure and so on.
If a model has a flaw in its design, it's very, very likely that
most of the drives will fail within a very short period. The distribution
of times to failure is an Erlang (gamma) distribution, with a peak
where most of the drives fail. The pdf of the distribution is
(rho^alpha * t^(alpha-1) * exp(-rho * t)) / Gamma(alpha)

where alpha and rho are parameters and t is the time. One can see that
there is one failure time t0 that is much more likely than any other.
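
For what it's worth, a small Python sketch of that density; the shape and
rate parameters are made-up values, not fitted to any real drive population:

    import math

    def failure_time_pdf(t, alpha, rho):
        """Gamma/Erlang density: rho^alpha * t^(alpha-1) * exp(-rho*t) / Gamma(alpha)."""
        return (rho ** alpha) * (t ** (alpha - 1)) * math.exp(-rho * t) / math.gamma(alpha)

    alpha, rho = 4.0, 0.01        # assumed shape and rate (per day)
    t0 = (alpha - 1) / rho        # mode: the single most likely time of failure

    for t in (100.0, t0, 600.0, 1000.0):
        print(f"t = {t:7.1f} days   f(t) = {failure_time_pdf(t, alpha, rho):.6f}")

The density peaks at t0 = (alpha - 1) / rho, which is the "much more likely
time of failure" referred to above.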


Add Murphy's law to this, and you'll likely have the first
disk crashing when nobody's there, and the second one when the guy in
charge arrives to replace the first one.

What you're describing sounds like infant mortality, not the failures
that occur in normal long-term operation.
 
Don't believe you get that effect with every
drive model from every hard drive manufacturer.

You may get that result with just one drive model,
like say the Fujitsu MPGs, but that's just a design
problem, and even you must have noticed that all the
MPG drives in Europe don't die on a single day.

Obviously, but when you discover too late that there's been a flaw in
the design of the drive, you're in trouble.
The best example I have is the Quantum Fireball AS Plus. We never
receive only one of them, but rather pairs or triples, same model, same
date, and same failure (a burned chip, or one platter no longer
recognized).

Nick
 
Sorry, I only have an MSc in Pattern Analysis,

Just goes to show how pathetically inadequate that sort of
purported 'qualification' is when you clearly haven't managed to
grasp even the most basic statistical concepts and make
that completely stupid claim about RAID1 with identical drives.
and I have a very strong Bayesian background.

You clearly haven't managed to grasp
even the most basic statistical concepts.
You should have a look at Renewal Theory (Cox, 1962);

Doesn't have a damned thing to say about your terminally stupid,
pig ignorant claim about RAID1 and identical drive models.
you would learn one or two things about times to failure and so on.

Wrong again.
If a model has a flaw in its design, it's very, very likely
that most of the drives will fail within a very short period.

Mindlessly silly as far as both drives in a RAID1
array failing simultaneously is concerned.

Even someone as stupid as you should have noticed that
even with the worst of the particular drive models currently dying like
flies, the Fujitsu MPGs, the Quantum Fireballs and
the IBM GXPs, almost no one ever gets a pair used in a RAID1
array failing simultaneously without that being due to the
power supply powering them killing them both as it dies, etc.

Basically because the failure rate is nothing
like high enough to produce that effect.
The distribution of times to failure is an Erlang (gamma) distribution,
with a peak where most of the drives fail. The pdf of the
distribution is (rho^alpha * t^(alpha-1) * exp(-rho * t)) / Gamma(alpha),
where alpha and rho are parameters and t is the time.

That sort of waffle might get you a formal qualification
from some pathetically inadequate academic institution,
boy, but it cuts no mustard with anyone who has a clue
about the most basic statistical concepts.
One can see that there is a much more
likely time t0 of failure than any others

You clearly haven't managed to grasp
even the most basic statistical concepts.
Add Murphy's law to this, and you'll likely have
the first disk crashing when nobody's there, and the second
one when the guy in charge arrives to replace the first one.

You clearly haven't managed to grasp
even the most basic statistical concepts.

Hint: Murphy's law is only ever cited by those who haven't
managed to grasp even the most basic statistical concepts.
 
[snip]
Not really. What happens is that nonrandom failures can happen
in the same way on both disks, if there are design problems with
the disks...

However, chances are that you will only find out that your drives have a design
problem after you have already installed them.
The only real risk IMO is something that really leads to simultaneous
failure, such as high sensibility to overtemperature, overvoltage or
physical shock.


Usually you can combine arbitrary disks. You will only get
the capacity of the smaller one though.

Thus I may use two different disks that have similar specifications (size,
rotational speed, etc.).
This is what I was going to do.

Grazie mille :-)

Marco
 
Previously marcodeo said:
[snip] [...]
Thus I may use two different disks that have similar specifications (size,
rotational speed, etc.).

Same size is almost a requirement, but matching the rest is a good
idea, too.
This is what I was going to do.
Grazie mille :-)

You are welcome.

Arno
 
However, chances are that you will only find out that your drives
have a design problem after you have already installed them.

Sure, but the whole point of RAID1 is to have redundancy
so a single drive failure will still see the system usable.

His original pig ignorant stupid claim that using identical drives
increases the chance of simultaneous failure is terminally
pig ignorant and flaunts the fact that he clearly doesn't have
a clue about even the most basic statistical concepts.

Yes, if the drives do have a high failure rate, that obviously does
increase the risk of them both failing, but it's completely pig ignorant
silliness to claim that there is an increased risk of SIMULTANEOUS
failure just because the two drives have 'the same manufacturing
standards, the same tolerances, the same MTBF, etc. etc.'

Even with a fundamental design flaw, the chance of
both drives failing SIMULTANEOUSLY is microscopic.

And RAID1 just increases the chance of being able to
carry on regardless when one drive fails. It was never
intended to replace adequate backups, because those
are still needed for virus infection and for something external
to the drives killing both drives at once, like power supply
failure, the room the drives are in going up in flames, etc.
Thus I may use two different disks, but having similar
specifications (size, rotational speed, etc.).
This is what I was going to do.

Pointless, because you will usually get
worse performance that way.

Just do what you should be doing even with a single
drive: don't buy drives which have a known high failure
rate, have full backups of what is on the drive(s), and
for maximum uptime, replace a failed drive as soon as
the system reports that one of the pair has failed.
 
You clearly havent managed to grasp
even the most basic statistical concepts.

What I like about you is that you are so much cleverer than
everyone else. It must be really hard living in a world full of morons and
people saying utterly clueless stuff.

Mindlessly silly as far as both drives in a RAID1
array failing simultaneously is concerned.

Even someone as stupid as you should have noticed that
even with the worst of the particular drive models currently dying like
flies, the Fujitsu MPGs, the Quantum Fireballs and
the IBM GXPs, almost no one ever gets a pair used in a RAID1
array failing simultaneously without that being due to the
power supply powering them killing them both as it dies, etc.

Yes, but the probability of both drives failing is nevertheless
increased when the two drives are identical. It's likely to be only
marginal, but one should do everything possible to lower the probability
of both drives failing.

That sort of waffle might get you a formal qualification
from some pathetically inadequate academic institution,
boy, but it cuts no mustard with anyone who has a clue
about the most basic statistical concepts.

Great. I love it when some dirty little boy in his garage starts saying
Cox's theory is bullshit. It was developed specifically for
the failure of electronic devices.
Hint: Murphy's law is only ever cited by those who haven't
managed to grasp even the most basic statistical concepts.

Go to any statistics department and you'll find papers about it.
It's half funny, half serious.

Someone who hasn't managed to grasp even the most basic statistical
concepts.
 
What I like about you is that you are so much cleverer than
everyone else. It must be really hard living in a world full
of morons and people saying utterly clueless stuff.

Even you should be able to bullshit your way out of your
predicament better than that pathetic effort, child.

EVERYONE has rubbed your nose in the FACT that you
clearly don't have a clue about even the most basic statistical
concepts with that original terminally pig ignorant claim about
the use of identical drives in a RAID1 pair.
Yes, but the probability of both drives failing is nevertheless
increased when the two drives are identical.

Complete and utter drivel. You clearly haven't managed
to grasp even the most basic statistical concepts.

The ONLY thing that matters as far as the risk of
simultaneous failure in a RAID1 array is concerned
IS THE FAILURE RATE OF THAT MODEL.

The fact that they are IDENTICAL has no effect
whatsoever on the risk of SIMULTANEOUS FAILURE.
It's likely to be only marginal,

It's PRECISELY the same risk as would be
seen if two non-identical drive models were
used which have the same failure rate.
but one should do everything possible to lower
the probability of both drives failing.

Terminally pig ignorant all over again.

You should always have full backups even for a RAID1 array
because the spectacular death of the power supply can kill
both drives at once, the room that houses the pair of drives can
burn down, fill up with water, collapse, be robbed, etc etc etc.

The only time that simultaneous failure of both drives
matters is when 100% uptime is absolutely essential, and
RAID1 isn't used in that situation by anyone with a clue.

The only thing that makes any sense at all if you
must have 100% uptime is to have both drives
physically located in different buildings etc so that
even if the entire building burns down or someone
does a Sept 11 on one of the buildings, the system
will carry on regardless using the spare etc.
Great. I love it when some dirty little boy in his garage

I'm likely old enough to be your grandfather, child.

And I don't bother with garages. Don't even own one.
starts saying Cox's theory is bullshit.

More of your childish lying. I never ever said anything
even remotely resembling anything like saying that the
'theory of Cox' is bullshit, JUST that you haven't got a
clue about even the most basic statistical concepts as
far as what matters on the question being discussed is
concerned, the use of identical drives in a RAID1 array.
It was developed specifically
for the failure of electronic devices.

But it doesn't say anything like what you so stupidly
pig ignorantly claimed ABOUT THE USE OF
IDENTICAL DRIVES IN A RAID1 ARRAY.
Go to any statistics department and you'll find papers about it.

More childish lying.
It's half funny, half serious.

Zero serious, in fact.
 
What I like about you is that you are so much cleverer than
everyone else. It must be really hard living in a world full of morons and
people saying utterly clueless stuff.



Yes, but the probability of both drives failing is nevertheless
increased when the two drives are identical. It's likely to be only
marginal, but one should do everything possible to lower the probability
of both drives failing.

If the application is so critical that that slight increase in
probability of failure creates a significant problem then you shouldn't
be using RAID 1 alone to begin with--RAID 5+1 with clustered servers
would be more appropriate.
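
To make that "slight increase" concrete, here is a back-of-the-envelope
Python sketch: the double-failure chance for one degraded window, once with
the drives treated as independent and once with a small common-mode
(shared design flaw) term added. All the probabilities are invented for
illustration, not measured.

    # Illustrative, invented probabilities -- not measurements.
    p_single = 3e-4        # P(a given drive fails during one rebuild window)
    p_common_mode = 5e-5   # P(a shared flaw / common event takes out both drives)

    p_both_independent = p_single ** 2
    p_both_with_common_mode = p_both_independent + p_common_mode

    print(f"independent drives    : {p_both_independent:.2e}")
    print(f"plus common-mode term : {p_both_with_common_mode:.2e}")

    # When the common-mode term dominates, adding more identical redundancy
    # buys little; that is the argument for RAID 5+1 / clustering above.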

Now, how do you handle a hot-spare using your model? Do you use a third
brand of drive?
Great. I love it when some dirty little boy in his garage starts saying
Cox's theory is bullshit. It was developed specifically for
the failure of electronic devices.


Go to any statistics department and you'll find papers about it.
It's half funny, half serious.

Someone who hasn't managed to grasp even the most basic statistical
concepts.

Statistics, schmatistics, anybody who has any significant amount of
experience as a technician knows that Murphy was an optimist.
 
RAID (except RAID-0) is about availability, not backup.
Viz., maintaining availability of data past a drive failure.

RAID does not protect against:
o Controller failure

Depends on implementation. RAID1 can be implemented in software using
two separate host adapters--Novell used to call this mode "duplexing" as
opposed to "mirroring", which is RAID1 with both drives on the same host
adapter.
o Operator error in recovering a failure in a RAID system

The latter is an important point:
o IDE RAID-1 by s/w or h/w suffers a failure
---- recovery is not just reboot & automatic rebuild

Depends on the host adapter. The 3wares and the four-channel Adaptecs
recover by reboot and automatic rebuild, if they go down at all. The
cheap Promise boards are a different story.
---- often it can be time-consuming (downtime)
---- often it can expose data to risk (human error)
o SCSI RAID-5 by h/w suffers a failure
---- recovery IS just reboot & automatic rebuild

So one needs to consider not just how RAID maintains
availability, but the time for that system to recover it :-)

RAID also varies in performance:
o IDE RAID 0/1
---- surprisingly both s/w & h/w solutions perform similar
---- h/w should be faster, but the controllers are low-spec
o Adaptec RAID-5
---- RAID-5 it is, but performance is not great
---- solutions from Mylex & others can be far faster

For business use, RAID-1 is just the minimal requirement if
applied to IDE disks which have shorter warranties than SCSI.
IDE RAID-1 is good for co-located static web-page servers;
the machine stays up through a drive failure rather than going down.

For serious business use, I'd really want a SCSI RAID-5.
Mylex offers good solutions starting with the very cheap 170LP: LVD SCSI
and RAID-5 capability in a low-profile format if you are rack mounting.

RAID-5 does not offer improved reliability when compared to RAID 1.
There is no reason to use SCSI RAID-5 instead of SCSI RAID-1 unless one
is trying to minimize drive costs.
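
The usual back-of-the-envelope comparison supports that. A small Python
sketch using the standard textbook mean-time-to-data-loss approximations
and assuming independent drive failures; the MTBF, repair time and array
size are assumed values, not vendor figures:

    # Textbook MTTDL approximations, assuming independent drive failures.
    mtbf_hours = 500_000      # assumed per-drive MTBF
    mttr_hours = 24           # assumed time to replace and rebuild a drive
    n_raid5 = 5               # drives in the RAID-5 set

    mttdl_raid1 = mtbf_hours ** 2 / (2 * mttr_hours)
    mttdl_raid5 = mtbf_hours ** 2 / (n_raid5 * (n_raid5 - 1) * mttr_hours)

    print(f"RAID-1 (2 drives) MTTDL ~ {mttdl_raid1 / 8760:,.0f} years")
    print(f"RAID-5 ({n_raid5} drives) MTTDL ~ {mttdl_raid5 / 8760:,.0f} years")

Both numbers come out absurdly large because the model ignores common-mode
failures, which is exactly why backups still matter.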
 
[snip]

RAID is not a backup medium.
RAID can be more important depending on the source medium
reliability - bum IDE designs have existed & warranties are poor.

However, backups remain the backup; RAID is for availability.

SCSI is often kicked for cost - and yes, 136GB 10/15k rpm SCSI
disks do cost plenty. However, if your data set is a
lot smaller - so that 18GB disks will do - the price falls drastically.


Re the other sub-thread :-) about disks failing:
o Some designed-in degradation failures can cluster at similar power-on hours (POH)
o Whether such is representative of the risk is moot
---- were such designed-in faults expected re risk assessment?
---- are such designed-in faults likely to re-occur?

The tail-end of the event distribution is small & far out.
However, from LTCM to Mar00, it isn't as small as we'd like
and certainly not as small as we'd like to assume re costs.

The IT environment is one of cost pressure, minimal margins, and a lousy
sales-$:R&D-$ ratio. Thus we may see more peculiar failures in the future,
so assuming past reliability is a predictor of the future may not be a sound idea.
 