*eguy said:
I'm still not clear on your explanation above. The system is able to write
data to the remaining two disks. Is the parity chunk included? It used to
write A1 to disk 1, A2 to disk 2, and Ap to disk 3. Now there is no disk 3
any more. What will the system do in this situation? I think I'm still
thinking about this the wrong way.
Your description is RAID 4. Both RAID 4 and RAID 5 work similarly (splitting
data into chunks and calculating a parity chunk), but RAID 5 intersperses the
parity information across all of the drives while RAID 4 has a dedicated
parity drive. RAID 5 rotates the placement in "round robin" fashion; RAID 4
does not. Using your nomenclature, the first chunk of a stripe would go to
disk 1 some of the time, to disk 2 some of the time, and to disk 3 some of
the time. In a 3-drive RAID 5 array, it would look something like:
A1 A2 Ap
B2 Bp B1
Cp C1 C2
D1 D2 Dp
Compare that to RAID 4, which would look something like:
A1 A2 Ap
B1 B2 Bp
C1 C2 Cp
D1 D2 Dp
After a drive fails, the parity information still gets calculated, and RAID 5
still intersperses the chunks, but it drops the chunk that would have landed
on the failed drive (sometimes chunk 1, sometimes chunk 2, and sometimes the
parity chunk p).
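If it helps to see the rotation concretely, here's a toy Python sketch (my
own model, not code from any real RAID implementation) that reproduces the
3-drive layout above:

def raid5_stripe(stripe, disks=3):
    # Parity rotates one disk to the left each stripe: disk 3, 2, 1, ...
    parity_disk = (disks - 1 - stripe) % disks
    layout = [""] * disks
    layout[parity_disk] = "p"
    # Data chunks fill in starting just after the parity disk, wrapping around.
    for i in range(1, disks):
        layout[(parity_disk + i) % disks] = str(i)
    return layout

for stripe, name in enumerate("ABCD"):
    print(" ".join(name + c for c in raid5_stripe(stripe)))

Running it prints the same four rows as the RAID 5 table above.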
The parity calculation uses XOR (exclusive OR). In binary math, comparing XOR
to OR:
1 XOR 1 = 0
1 XOR 0 = 1
0 XOR 1 = 1
0 XOR 0 = 0
1 OR 1 = 1
1 OR 0 = 1
0 OR 1 = 1
0 OR 0 = 0
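You can check those tables in any language with bitwise operators; in Python,
^ is XOR and | is OR:

for a in (0, 1):
    for b in (0, 1):
        print(f"{a} XOR {b} = {a ^ b}    {a} OR {b} = {a | b}")

The only input where the two disagree is 1 and 1, and that difference is what
makes XOR reversible.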
Given the data split into 2 chunks:
100 010
The parity chunk would be:
110
If you lose the 2nd chunk (010 in this example), XORing the surviving data
chunk against the parity chunk recalculates the missing information.
100 XOR 110 = 010
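Here's that exact example in a few lines of Python, writing the chunks as
binary literals:

chunk1 = 0b100
chunk2 = 0b010
parity = chunk1 ^ chunk2     # 110, the parity chunk
rebuilt = chunk1 ^ parity    # 010, the lost 2nd chunk
print(f"{parity:03b} {rebuilt:03b}")

XOR is its own inverse, so XORing the parity with either chunk hands back the
other one.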
This is why RAID 5 can survive a single disk failure. If RAID 5 were to
suffer a multi-disk failure, not enough information would survive to
recalculate what was lost. The RAID recovery procedure ends up being
relatively simple math. Users often aren't even aware that a drive failure
took place, because the system keeps running and the information stays
accessible. With hot-swap or hot-spare drives, the SysAdmin can even perform
the recovery while the system is in use (with a noticeable, and
understandable, performance degradation).
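The same math scales past 3 drives: the parity chunk is the XOR of all the
data chunks in a stripe, and any single lost chunk is the XOR of everything
that survived. A toy sketch (my own illustration, with made-up chunk values):

from functools import reduce
from operator import xor

data = [0b1011, 0b0110, 0b1100]   # data chunks in one stripe (made-up values)
parity = reduce(xor, data)        # 0001 -- XOR of every data chunk

# Say the disk holding the 2nd chunk fails; XOR the survivors back together.
survivors = [data[0], data[2], parity]
rebuilt = reduce(xor, survivors)
print(f"{rebuilt:04b}")           # 0110 -- the lost chunk, recovered

With two chunks missing, the XOR of the survivors no longer pins down either
one, which is why a second simultaneous failure loses data.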