RAID 1 vs RAID 5: getting to the bottom of it


John

Dear All,

I have a question regarding hardware RAID 1 (R1) and RAID 5 (R5).

Please leave the following issues aside, because they do not play a part in
this question:
a) Minimum disk setup and setting up hot spares
b) Write-intensive or sequential-read situations
c) Dynamically resizing a RAID
d) Same-manufacturer and same-lot drive issues (this applies to both setups)
e) Performance issues when one HD goes down

See the following diagrams for a better understanding:

RAID 1 :
--------
HD1          HD2          HD3          HD4
------------------------------------------------
Strip1       Strip2       Copy1        Copy2
Strip3       Strip4       Copy3        Copy4
Strip5       Strip6       Copy5        Copy6
Strip7       Strip8       Copy7        Copy8

HD1 & HD2 are duplicated on HD3 & HD4. If HD1 crashes, the copy is simply
used instead. Recovery is simple: replace HD1 and copy the entire HD3 to it.

RAID 5 :
--------
HD1          HD2          HD3          HD4
------------------------------------------------
Strip1       Strip2       Strip3       Parity1-3
Strip4       Strip5       Parity4-6    Strip6
Strip7       Parity7-9    Strip8       Strip9
Parity10-12  Strip10      Strip11      Strip12

Instead of actually duplicating disks like R1, R5 computes a parity (XOR) for
each row of strips and distributes it round-robin across all disks.
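
To make the XOR idea concrete, here is a minimal Python sketch of one row of
the RAID 5 diagram above. The strip contents and the helper name xor_blocks
are made up for illustration; a real controller works on whole disk sectors,
not 4-byte strings.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One row of a 4-disk RAID 5: three data strips plus one parity strip.
strip1, strip2, strip3 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([strip1, strip2, strip3])

# If the disk holding strip2 is lost, XOR-ing the survivors rebuilds it.
rebuilt = xor_blocks([strip1, strip3, parity])
assert rebuilt == strip2
```

The same call rebuilds any single missing strip of the row, including the
parity strip itself.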


Question :
----------
I feel more secure when data is written to an R5 than to an R1. Why?

Scenario RAID 1: Suppose that strip 1 is written to HD1 and duplicated on
HD3, but there is a bad sector on HD3, so a true synchronous copy never
succeeds. When HD1 fails after a year, is replaced, and I restore a copy of
HD3 onto it, my guess is that "original HD1" and "restored HD1" are never
identical. Either that, or the bad sectors from HD3 have to be marked on the
newly restored HD1, and even then "original HD1" and "restored HD1" differ.

Scenario RAID 5: None of this happens, because there is no identical copy of
the data. If a sector goes bad on HD1, it is marked and the data is written
to another sector on HD1. When HD1 fails, removing it and inserting a new
disk automatically regenerates the data from HD2, HD3 & HD4. This is very
straightforward.


Why this question :
-------------------
I feel that RAID 1 was intended as a fast real-time backup: when HD1 is
giving huge problems, you can boot from the backup HD. I don't feel that
this system was made for "keeping data online without a second of
interruption".

I feel that RAID 5 was made for "keeping data online without a second of
interruption" and must be seen in that way. So RAID 5 could be seen as a
successor of RAID 1. (Please do not use points a, b, c, d & e as the "BUT"
story.)

Also, a hard disk can go bad in a heartbeat, but it can also slowly give
hints that something is wrong (sectors going bad in a certain place). Most
of the time a slow death is what it will do. Am I right that RAID 1 offers
no solution for a slow death but RAID 5 does?

Am I right? Please do not take points a, b, c, d & e into consideration,
because they are not the basis of this question. I want to get to the basics
of RAID 1 and RAID 5 with respect to online system interruption.

I can go one step further and say that RAID 0 was a great solution for
gaining bandwidth, that with little effort a backup system could be built on
it, and that they named it RAID 1. But not much thought was given to the
backup solution if you take into account online data that cannot be
interrupted.

When I read articles on RAID 1, I see a lot of "Then you can restart from
the backup disk". This is what started me thinking and led me to do a lot of
research on it.

Please, do give me your opinion.

Kind regards,
John.
 
Previously John said:

Question :
scenario RAID 1 : Suppose that strip 1 is written on HD1 and duplicated on
HD3, but there was a bad sector on HD3, so a real sync copy would never
work. When HD1 fails after 1 year, and it's replaced and I restore a copy of
HD3 on it, then my guess is that "Original HD1" and "restored HD1" are never
identical or you have to mark bad sectors that came from HD3 to the new
restored HD1 and still then there is a difference with "Original HD1" and
"restored HD1".
scenario RAID 5 : All this will never happen because there is no identical
copy of data. If a sector is going bad on HD1 then this will be marked and
data will be written on an other sector on HD1. When HD1 is failing then
removing and inserting a new one will generate automatic new data recovered
from HD2, HD3 & HD4. This is very strait forward.

Forget about defect management. It is a very, very rarely used mechanism;
most sectors are fine.


Actually RAID1 and RAID5 are the same if the RAID1 uses just 2 disks.
If you do RAID1 with 3 disks and RAID5 with 3 disks, you get different
redundancy. An n-disk RAID1 can tolerate n-1 disk failures. An n-disk
RAID5 can tolerate a 1-disk failure.
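
A small Python sketch of both claims, under stated assumptions: the function
names are hypothetical, the "RAID 5" here is plain XOR parity, and a 2-disk
RAID 5 is a degenerate case that real controllers may not offer. With only
one data strip per row, the XOR parity of that strip is the strip itself, so
the parity disk holds a mirror.

```python
from functools import reduce

def xor_parity(data_blocks):
    """XOR parity over the data strips of one row."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_blocks))

def tolerated_failures(level, n_disks):
    """Disk failures an n-disk array survives (idealized model)."""
    if n_disks < 2:
        raise ValueError("need at least two disks")
    if level == "raid1":
        return n_disks - 1   # every disk holds a full copy
    if level == "raid5":
        return 1             # one parity strip per row
    raise ValueError("unknown level: " + level)

# 2-disk case: one data strip per row, so parity == data (a mirror).
data = b"DATA"
assert xor_parity([data]) == data

# 3-disk case: the two levels differ in redundancy.
assert tolerated_failures("raid1", 3) == 2
assert tolerated_failures("raid5", 3) == 1
```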

Arno
 
Actually RAID1 and RAID5 are the same if the RAID1 uses just 2 disks.
Dear Arno,

I do not agree with your statement that an n-disk RAID1 can tolerate n-1
disk failures.

Look at the chart:
RAID 1 :
--------
HD1          HD2          HD3          HD4
------------------------------------------------
Strip1       Strip2       Copy1        Copy2
Strip3       Strip4       Copy3        Copy4
Strip5       Strip6       Copy5        Copy6
Strip7       Strip8       Copy7        Copy8

If HD1 and HD3 (which is a strip copy of HD1) go down, then your RAID 1 is
broken! This is not the case when HD1 and HD2 go down in a RAID 1 setup.

Further, RAID1 is based on strip duplication, whereas RAID5 is based on
calculating the parity of the strips.

Finally, in most cases I can predict an HD crash by looking at the system
log (Win2K) and seeing the read/write disk errors showing up on a non-RAID
system. At that moment I replace disks. And in most cases this is what is
going to happen.

When the OS writes data to a hardware RAID it never sees bad sectors:
whenever the OS writes a sector, the RAID controller writes and verifies
more sectors in the background, so error handling sits on the RAID
controller and must be transparent to the OS. (In a RAID 5 setup other
sectors are also read to recalculate the parity, and the new parity value is
written; it is this calculation that makes RAID 5 slower at writing data.
This is also why RAID 1 and RAID 5 are totally different and are NOT the
same!)
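
The extra reads on a RAID 5 write can be sketched in Python as follows. This
is only an illustration of the usual read-modify-write parity update, not
the behaviour of any particular controller; the function names and strip
contents are made up.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def updated_parity(old_data, old_parity, new_data):
    """Parity after overwriting one strip: P' = P xor D_old xor D_new."""
    return xor(xor(old_parity, old_data), new_data)

# One row with three data strips and its parity.
d1, d2, d3 = b"\x0f\x0f", b"\xf0\x00", b"\x33\x33"
parity = xor(xor(d1, d2), d3)

# Overwrite d2: read old d2 and old parity, then write new d2 and new parity.
new_d2 = b"\xaa\x55"
new_parity = updated_parity(d2, parity, new_d2)

# The shortcut matches a full recomputation over the whole row.
assert new_parity == xor(xor(d1, new_d2), d3)
```

In this model a small write costs two reads plus two writes on RAID 5,
whereas a mirror only has to write the same sector to both disks.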

So, in my opinion the original post and questions are unchanged.

Please do give me your opinions.

Kind regards,
John.
 
Dear Arno,
I do not agree when you said that an n disk RAID1 can tolerate n-1 disk
failures.
Look to the chart :
If HD1 and HD3 (which is a strip copy of HD1) goes down then your RAID 1 is
broken! This is not the case when HD1 and HD2 goes down in a RAID 1 setup.
Further, RAID1 is based on strip duplication where RAID5 is based on strip's
parity calculation.

Oh, but that is not a 4 disk RAID1. It is a 2+2 disk RAID10 or
RAID0+1. A 4 disk RAID1 looks like this:

HD1          HD2          HD3          HD4
Data1        Copy1        Copy2        Copy3


RAID10 only makes sense if you need more than the speed of a
RAID1. But whether you get a real speed improvement is very dependent
on the actual system configuration, because you have to transport
twice the data and switch between more disks than with RAID1.

Arno
 
John said:
RAID 1 :
--------
HD1 HD2 HD3 HD4
------------------------------------------------
Strip1 Strip2 Copy1 Copy2
Strip3 Strip4 Copy3 Copy4
Strip5 Strip6 Copy5 Copy6
Strip7 Strip8 Copy7 Copy8

HD1 & HD2 are duplicated on HD3 & HD4. If HD1 crashes then the copy is
simply used instead. Recovery is simply; replace HD1 and copy the entire HD3
to it.

This is not RAID 1. I'd call it RAID 10
 
Dear All,

I've got a question regarding hardware RAID 1 (R1) and RAID 5 (R5)

Don't consider following issues, because they are not playing in this
question:
a) Minimum disk setup and setting up hot spares
b) Write intensive situations or sequential reads situations
c) Dynamic resizing a RAID
d) Same manufacturer and same lot drive issues (this count for both setups)
e) Performance issue when 1 HD goes down.

Look at following diagram for better understanding :

RAID 1 :
--------
HD1 HD2 HD3 HD4
------------------------------------------------
Strip1 Strip2 Copy1 Copy2
Strip3 Strip4 Copy3 Copy4
Strip5 Strip6 Copy5 Copy6
Strip7 Strip8 Copy7 Copy8

HD1 & HD2 are duplicated on HD3 & HD4. If HD1 crashes then the copy is
simply used instead. Recovery is simply; replace HD1 and copy the entire HD3
to it.

As others pointed out that isn't RAID1. RAID1 involves exactly 2
mirrored disks. This is RAID 0+1, a mirror or stripes.

RAID 5 :
--------
HD1 HD2 HD3 HD4
------------------------------------------------
Strip1 Strip2 Strip3 Parity1-3
Strip4 Strip5 Parity4-6 Strip6
Strip7 Parity7-9 Strip8 Strip9
Parity10-12 Strip10 Strip11 Strip12

Instead of real duplicating disks like R1, R5 is creating a parity (XOR) and
distributes this round robin wise on all disks.


Question :
----------
I feel more secure when data is written to a R5 then it is on a R1. Why?

scenario RAID 1 : Suppose that strip 1 is written on HD1 and duplicated on
HD3, but there was a bad sector on HD3, so a real sync copy would never
work. When HD1 fails after 1 year, and it's replaced and I restore a copy of
HD3 on it, then my guess is that "Original HD1" and "restored HD1" are never
identical or you have to mark bad sectors that came from HD3 to the new
restored HD1 and still then there is a difference with "Original HD1" and
"restored HD1".

That's why better implementations employ protection mechanisms which
should be used. The simplest is to regularly read the disk and
replace data in bad sectors with a good mirrored copy in a different
part of the disk. There are various trade names for this strategy.
IBM, for example, recommends that this be scheduled by their
management software to be done once a week. Some controllers can do
this continuously in the background.
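
As a rough Python sketch of such a scrub pass over a two-disk mirror: each
disk is modelled as a list of sectors, with None standing in for a sector
the drive reports as unreadable. The function name and data layout are
assumptions for illustration only; real controllers do this at the firmware
level and rely on the drive to remap the rewritten sector.

```python
def scrub_mirror(disk_a, disk_b):
    """Read every sector pair and repair an unreadable copy from its mirror.

    Returns (repaired, lost): sector numbers fixed, and sector numbers
    unreadable on both disks (unrecoverable at this level).
    """
    repaired, lost = [], []
    for i, (a, b) in enumerate(zip(disk_a, disk_b)):
        if a is None and b is None:
            lost.append(i)
        elif a is None:
            disk_a[i] = b          # rewrite from the good mirror copy
            repaired.append(i)
        elif b is None:
            disk_b[i] = a
            repaired.append(i)
    return repaired, lost

# HD3 has silently developed a bad sector while HD1 is still healthy.
hd1 = [b"s0", b"s1", b"s2"]
hd3 = [b"s0", None, b"s2"]
repaired, lost = scrub_mirror(hd1, hd3)
assert repaired == [1] and lost == []
assert hd3 == hd1                  # the mirror is whole again
```

Run regularly, this finds the bad sector on HD3 while HD1 can still supply
the data, instead of a year later during a rebuild.
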
scenario RAID 5 : All this will never happen because there is no identical
copy of data. If a sector is going bad on HD1 then this will be marked and
data will be written on an other sector on HD1. When HD1 is failing then
removing and inserting a new one will generate automatic new data recovered
from HD2, HD3 & HD4. This is very strait forward.

That is a skewed comparison.

With RAID 0+1 if one of the copies is bad it is simply ignored and
regenerated from the good mirror. With RAID 5 bad data can also be
regenerated by reading the parity block when a CRC error occurs.

But when either 0+1 or 5 is in a degraded state it loses the ability
to recover from such errors. In the case of 0+1 it is because it lost
its mirror. In the case of RAID 5 it is because although the location
of the parity blocks is distributed, each one only protects a
specific stripe. It is not "distributed" protection in the sense of
Usenet par files because the system doesn't and can't wait for the
entire volume to be written before it starts generating parity.

Parity is not inherently safer than mirroring. If there is a system
or log failure, unsynchronized or stale parity is possible.
Furthermore sometimes an uncorrectable I/O error during a write to the
disk just prior to failure can result in lost/inaccessible data.
Finally if there is a write error, the presence of parity alone will
not necessarily tell you later on whether the data or ecc data is
suspect.
Why this question :
-------------------
I feel that RAID 1 was intended for fast realtime backup, and when HD1 is
giving huge problems, you can boot from the backup HD. I don't feel that
this system was made for "keeping data online without a second of
interruption"

That may be your feeling but it isn't the reality. RAID 1 isn't a
backup. For one thing as soon as the original file is deleted,
corrupted, or infected the mirror is altered simultaneously. "Backup"
provides a means of recovery from these problems, RAID cannot.

In addition mirrored RAID levels are true availability/fault tolerant
solutions. In fact many people see mirrored RAID levels as better for
uptime as they can sustain failure of a maximum of 50% of the disks in
the array (if they are the right disks). Mirroring is also simpler, which
has its pluses. RAID 5 can only afford to lose one disk at a time.
It simply cannot "keep data online without a second of interruption"
if a second disk fails before a rebuild completes. RAID 0+1 sometimes
can.
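
The "right disks" point can be checked by enumerating every two-disk failure
for the four-disk layouts in the diagrams. This Python sketch assumes
per-disk mirroring exactly as drawn (HD1 mirrored by HD3, HD2 by HD4); an
implementation that drops a whole stripe set when one of its disks fails
would survive fewer combinations. All names are illustrative.

```python
from itertools import combinations

DISKS = ["HD1", "HD2", "HD3", "HD4"]
MIRROR_PAIRS = [("HD1", "HD3"), ("HD2", "HD4")]  # as in the diagram

def mirrored_survives(failed):
    """Data survives unless both halves of some mirror pair are gone."""
    return not any(a in failed and b in failed for a, b in MIRROR_PAIRS)

def raid5_survives(failed):
    """Single-parity RAID 5 survives at most one failed disk."""
    return len(failed) <= 1

double_failures = [set(pair) for pair in combinations(DISKS, 2)]
mirrored_ok = [f for f in double_failures if mirrored_survives(f)]
raid5_ok = [f for f in double_failures if raid5_survives(f)]

print(len(mirrored_ok), "of", len(double_failures))  # 4 of 6
print(len(raid5_ok), "of", len(double_failures))     # 0 of 6
```
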
If feel that RAID 5 was made for "keeping data online without a second of
interruption" and must be seen in that way. So RAID 5 could be seen as a
successor of RAID 1. (Please do not use points a,b,c,d & e as the BUT story)

No. ALL mirrored & parity RAID levels are true availability/fault
tolerant solutions

Also remember there is a difference between theoretical RAID 5 and
the reality of the engineering difficulties of a very complex level.
Furthermore, even though you think it isn't relevant, generating
parity can often be slower than mirroring. Longer rebuild = longer
degraded state. When only 1 disk can be down at a time, this makes
RAID 5 worse than RAID 10 or 0+1 for availability. Although it is not
as straightforward as that because you need more of the same sized
disks with RAID 0+1 to create the same usable volume size as RAID 5.
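
The disk-count trade-off is simple arithmetic. A Python sketch, assuming
equal-sized disks, single-parity RAID 5 (one disk's worth of parity) and
two-way mirroring; the function names and the 600 GB / 200 GB figures are
just an example:

```python
import math

def raid5_disks_needed(usable_gb, disk_gb):
    """Data disks plus one disk's worth of parity (minimum 3 disks)."""
    return max(3, math.ceil(usable_gb / disk_gb) + 1)

def mirrored_disks_needed(usable_gb, disk_gb):
    """Every data disk needs a mirror partner (RAID 10 / 0+1)."""
    return 2 * math.ceil(usable_gb / disk_gb)

# A 600 GB usable volume built from 200 GB disks:
assert raid5_disks_needed(600, 200) == 4
assert mirrored_disks_needed(600, 200) == 6
```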

Basically you need a very advanced RAID5 implementation to perform
competitively with a more modest RAID 0+1. But you often need more
disks with 0+1. So the reality is that the decision of which one to
go with has to look at factors a-e, disk size, etc.
Also, a hard-disk can go bad in a heartbeat, but can also slowly give some
hints that there is something wrong (sectors going bad on a certain place).
And most of the time a slowly death is what he will do.

Correct. Often the problem is flakiness i.e. uncorrected IO errors.
That is much harder to deal with. Specific RAID levels are not in
themselves the solution.
Am I right that RAID
1 will not give a solution for slowly death but RAID 5 will.

Not really. Parity levels are not enough for better DAS controllers &
especially higher-end SAN or NAS. Other advanced protection
mechanisms are employed.
Am I right? Please do not take a,b,c,d & e points into consideration because
they are not the basics for this questions. I want to go to the basics of
RAID 1 and RAID 5 in online system interruption?

Not really. You are instead identifying the need for protection
mechanisms that assist different RAID levels in better products.
I can go 1 step further and say that RAID 0 was a great solution for gaining
bandwidth and with no much effort a backup system could be made and they
named it RAID 1.

No. That's named RAID 0+1
But no much thinking was done for the backup solution if
you take online data in account that cannot be interrupted.

No. The whole point of RAID 1, 10, & 0+1 is that the storage volume
is still seamlessly available during disk failure(s). The difference
between them and RAID 3, 4, 5, 6 is that one group mirrors data, the
other utilizes parity. Both have their plusses and minuses. Neither
are foolproof. The reliability and performance of any of these levels
in practice depends much more on the manufacturer, its engineers, and
the sector a product is being designed for than which specific level
is chosen.
If I read some articles on RAID 1 then I read a lot of "Then you can restart
from the backup disk". And this is what start me thinking and did a lot of
research on it.

Sounds like poor wording to me. A RAID 1, 0+1 or 10 system is not
supposed to need to be brought down or restarted in order to recover
from a disk failure/degraded state. Of course they do generally
automatically rebuild on startup if there is a dropped disk.

That being said some people do incorporate RAID 1 disks into their backup
strategy, i.e. perform a backup on a RAID 1 array, remove the mirror
disk & take it offsite. Rotate with other disks for subsequent
backups. In the event of disaster simply insert the mirror disk to restore
the system/rebuild the array. But, as I said before, the system need
not be taken down to move the mirror disk.
Please, do give me your opinion.

Try a more succinct post next time ;)
Kind regards,
John.

Oh how polite :)
 
As others pointed out that isn't RAID1. RAID1 involves exactly 2
mirrored disks. This is RAID 0+1, a mirror or stripes.

Oops. I mean a mirror _of_ stripes
 
George, You did it! This goes really far, and I know now that I made some
huge mistakes. Give me a day to review some of your writings and I will
certainly come back to this.

Thanks a lot!

John.
 
George, You did it! This goes really far, and I know now that I made some
huge mistakes. Give me a day to review some of your writings and I will
certainly come back to this.

A day! It's been 20 minutes since you posted. That's enough. Come on
ricky tick!

Just kidding. It's better if you take your time to look up things,
digest the info, and formulate succinct questions.
Thanks a lot!

No prob.
 
But when either 0+1 or 5 are in degraded state they loose the ability
to recover from such errors. In the case of 0+1 it is because it lost
its mirror. In the case of RAID 5 it is because although the location
of the parity blocks are distributed, each one only protects a
specific stripe. It is not "distributed" protection in the sense of
Usenet par files because the system doesn't and can't wait for the
entire volume to be written before it starts generating parity.

That is a little overstated. There are situations where for either
level a degraded array can recover bad blocks. It just is no longer
100%.

For raid 10 or 0+1 whether it can recover a bad block depends on
whether or not the bad sector occurs on a disk whose mirror has
failed. Only RAID 1 or multidisk (perfect 50%) failure in raid 10 or
0+1 yields a total loss of redundancy.

For raid 5 it depends on whether or not the parity block wasn't on
either the failed disk or sector. If there is a loss on 2 separate
disks, only a parity block can save the day

Take another look at the RAID diagrams and this will all become clearer.
 
As others pointed out that isn't RAID1. RAID1 involves exactly 2
mirrored disks. This is RAID 0+1, a mirror or stripes.

Untrue. RAID1 involves any number of two or more mirrored disks.

That many implementations can only support two disks is another
matter. Using more than two disks makes sense if you want to
have disks from several different manufacturers to avoid common
weaknesses. In almost all cases this is not needed, hence
the common restriction to two disks.

Arno
 
Untrue. RAID1 involves any number of two or more mirrored disks.

That many implementations can only support two disks, is another
matter. Using more than two disks makes sense if you want to
have disks from several different manufacturers to avoid common
weaknesses. In allmost all cases this is not needed, hence
the common restriction to two disks.

Arno

Sort of. The traditional, standard, most widely accepted definition
of RAID1 involves exactly 2 mirrored disks. However there are a
number of vendors that have mirrored RAID 1 levels which involve more
than 2 disks yet they still call them RAID1. Calling these levels
RAID1 is essentially marketing terminology which attempts to
characterize the sole criterion for RAID1 as mirroring without standard
striping.

That's why, for example, the IBM RAID 1 mutant 1E (1 enhanced) is
called 1E and not, for example 0+1E or something else. The data is
offset (what makes it "enhanced") rather than striped. IIRC There's a
Sun version of this also.

If one accepts the traditional definition of RAID 1, the four-disk
RAID 1 you were describing is not truly a 4-disk RAID 1. It is a
RAID1 with 2 online spares or failover drives. If you want to call it
simply RAID 1 without mention of online spares because that's how it's
described in the manual, that's generally acceptable also.
 