I have a RAID 5 setup with four SCSI disks. Two have failed. Am I right
Me again. Just so it's clear, operator failure among myself and others
in not monitoring the logs was to blame for the initial drive failure
not being spotted immediately. So yes, I'm a muppet, worthy of scorn.
I removed all four drives from the computer in question, attached them
to another Linux PC with a known working SCSI card, and ran SeaTools on
them (they are Seagate drives). One drive is toast: it's not detected
at boot by the SCSI BIOS. The other three pass all the SeaTools
advanced tests (which take ~2 hours to run) without a single failure.
Now when I put the four drives back in the original PC, three of the
drives (sda sdb sdc below) are detected. Can't RAID 5 on four drives
operate in degraded mode with three working drives?
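To spell out the rule I'm assuming here, this is a minimal sketch of my understanding that RAID 5 survives the loss of at most one member. The function name `raid5_can_start` is made up for illustration and is not anything from the md driver.

```python
def raid5_can_start(total_devices, operational_devices):
    """RAID 5 keeps one device's worth of parity, so it can tolerate
    at most one missing member (assumption stated in the text above)."""
    return operational_devices >= total_devices - 1


# Three of four members present: degraded, but should be startable.
print(raid5_can_start(4, 3))  # True
# Only one of four members accepted, which is what md0 reports in the log.
print(raid5_can_start(4, 1))  # False
```

By that rule three good drives ought to be enough, so the question is why md0 ends up accepting only one of them.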
Something seems to be going wrong with the superblocks, but I'm not
sure what.
Finally, one more tidbit. This system is weird in that there is no
command line interface: everything is done through a proprietary web
interface, and thus some instructions I'm reading about on Google (e.g.
mdadm) are not an option. Is it possible to remove the three working
drives and attach them to another Linux PC with a hardware RAID card
while keeping any data in the (degraded) RAID 5 intact?
Startup messages follow. It is md0 I am concerned with.
I'm sorry for the length, but I want to include everything in case
something is important.
<4> Oct 30 16:52:52 kernel: scsi : 1 host.
<3> Oct 30 16:52:52 kernel: Vendor: SEAGATE Model: ST1181677LCV
Rev: 0002
<4> Oct 30 16:52:52 kernel: Type: Direct-Access
ANSI SCSI revision: 03
<3> Oct 30 16:52:52 kernel: Detected scsi disk sda at scsi0, channel 0,
id 0, lun 0
<3> Oct 30 16:52:52 kernel: Vendor: SEAGATE Model: ST1181677LCV
Rev: 0002
<4> Oct 30 16:52:52 kernel: Type: Direct-Access
ANSI SCSI revision: 03
<3> Oct 30 16:52:52 kernel: Detected scsi disk sdb at scsi0, channel 0,
id 1, lun 0
<3> Oct 30 16:52:52 kernel: Vendor: SEAGATE Model: ST1181677LCV
Rev: 0002
<4> Oct 30 16:52:52 kernel: Type: Direct-Access
ANSI SCSI revision: 03
<3> Oct 30 16:52:52 kernel: Detected scsi disk sdc at scsi0, channel 0,
id 2, lun 0
<3> Oct 30 16:52:52 kernel: Vendor: SEAGATE Model: ST1181677LCV
Rev: 0002
<4> Oct 30 16:52:52 kernel: Type: Direct-Access
ANSI SCSI revision: 03
<3> Oct 30 16:52:52 kernel: Detected scsi disk sdd at scsi0, channel 0,
id 4, lun 0
<3> Oct 30 16:52:52 kernel: Vendor: SEAGATE Model: ST1181677LCV
Rev: 0002
<4> Oct 30 16:52:52 kernel: Type: Direct-Access
ANSI SCSI revision: 03
<3> Oct 30 16:52:52 kernel: Detected scsi disk sde at scsi0, channel 0,
id 5, lun 0
<3> Oct 30 16:52:52 kernel: Vendor: SEAGATE Model: ST1181677LCV
Rev: 0002
<4> Oct 30 16:52:52 kernel: Type: Direct-Access
ANSI SCSI revision: 03
<3> Oct 30 16:52:52 kernel: Detected scsi disk sdf at scsi0, channel 0,
id 6, lun 0
<3> Oct 30 16:52:52 kernel: Vendor: SEAGATE Model: ST1181677LCV
Rev: 0002
<4> Oct 30 16:52:52 kernel: Type: Direct-Access
ANSI SCSI revision: 03
<3> Oct 30 16:52:52 kernel: Detected scsi disk sdg at scsi0, channel 0,
id 15, lun 0
<4> Oct 30 16:52:52 kernel: scsi : detected 7 SCSI disks total.
<6> Oct 30 16:52:52 kernel: sym53c875E-0-<0,*>: FAST-20 WIDE SCSI 40.0
MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 kernel: SCSI device sda: hdwr sector= 512 bytes.
Sectors= 354600001 [173144 MB] [173.1 GB]
<4> Oct 30 16:52:52 kernel: sdb: Spinning up
disk...<6>sym53c875E-0-<1,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns,
offset 16)
<4> Oct 30 16:52:52 kernel: ..<6>sym53c875E-0-<1,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 kernel: .<6>sym53c875E-0-<1,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 kernel: ready
<4> Oct 30 16:52:52 kernel: SCSI device sdb: hdwr sector= 512 bytes.
Sectors= 354600001 [173144 MB] [173.1 GB]
<4> Oct 30 16:52:52 kernel: sdc: Spinning up
disk...<6>sym53c875E-0-<2,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns,
offset 16)
<4> Oct 30 16:52:52 kernel: ..<6>sym53c875E-0-<2,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 kernel: .<6>sym53c875E-0-<2,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 last message repeated 15 times
<4> Oct 30 16:52:52 kernel: ready
<4> Oct 30 16:52:52 kernel: SCSI device sdc: hdwr sector= 512 bytes.
Sectors= 354600001 [173144 MB] [173.1 GB]
<4> Oct 30 16:52:52 kernel: sdd: Spinning up
disk...<6>sym53c875E-0-<4,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns,
offset 16)
<4> Oct 30 16:52:52 kernel: ..<6>sym53c875E-0-<4,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 kernel: .<6>sym53c875E-0-<4,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 last message repeated 16 times
<4> Oct 30 16:52:52 kernel: ready
<4> Oct 30 16:52:52 kernel: SCSI device sdd: hdwr sector= 512 bytes.
Sectors= 354600001 [173144 MB] [173.1 GB]
<4> Oct 30 16:52:52 kernel: sde: Spinning up
disk...<6>sym53c875E-0-<5,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns,
offset 16)
<4> Oct 30 16:52:52 kernel: ..<6>sym53c875E-0-<5,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 kernel: .<6>sym53c875E-0-<5,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 last message repeated 15 times
<4> Oct 30 16:52:52 kernel: ready
<4> Oct 30 16:52:52 kernel: SCSI device sde: hdwr sector= 512 bytes.
Sectors= 354600001 [173144 MB] [173.1 GB]
<4> Oct 30 16:52:52 kernel: sdf: Spinning up
disk...<6>sym53c875E-0-<6,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns,
offset 16)
<4> Oct 30 16:52:52 kernel: ..<6>sym53c875E-0-<6,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 kernel: .<6>sym53c875E-0-<6,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 last message repeated 15 times
<4> Oct 30 16:52:52 kernel: ready
<4> Oct 30 16:52:52 kernel: SCSI device sdf: hdwr sector= 512 bytes.
Sectors= 354600001 [173144 MB] [173.1 GB]
<4> Oct 30 16:52:52 kernel: sdg: Spinning up
disk...<6>sym53c875E-0-<15,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns,
offset 16)
<4> Oct 30 16:52:52 kernel: ..<6>sym53c875E-0-<15,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 kernel: .<6>sym53c875E-0-<15,*>: FAST-20 WIDE SCSI
40.0 MB/s (50 ns, offset 16)
<4> Oct 30 16:52:52 last message repeated 16 times
<4> Oct 30 16:52:52 kernel: ready
<4> Oct 30 16:52:52 kernel: SCSI device sdg: hdwr sector= 512 bytes.
Sectors= 354600001 [173144 MB] [173.1 GB]
<6> Oct 30 16:52:52 kernel: Intel(R) PRO/1000 Network Driver - version
4.3.15
<6> Oct 30 16:52:52 kernel: Copyright (c) 1999-2002 Intel Corporation.
<4> Oct 30 16:52:52 kernel: NIC: Adding device 12098086
<4> Oct 30 16:52:52 kernel: PCI latency timer (CFLT) is unreasonably
low at 0. Setting to 32 clocks.
<4> Oct 30 16:52:52 kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker
http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
<4> Oct 30 16:52:52 kernel: eepro100.c: $Revision: 1.26 $ 2000/05/31
Modified by Andrey V. Savochkin and others (cb,rk)
<6> Oct 30 16:52:52 kernel: eth0: Intel PCI EtherExpress Pro100
82559ER, 00:80:A1:42:F1:E3, IRQ 10.
<6> Oct 30 16:52:52 kernel: Board assembly 000000-000, Physical
connectors present: RJ45
<6> Oct 30 16:52:52 kernel: Primary interface chip i82555 PHY #1.
<6> Oct 30 16:52:52 kernel: General self-test: passed.
<6> Oct 30 16:52:52 kernel: Serial sub-system self-test: passed.
<6> Oct 30 16:52:52 kernel: Internal registers self-test: passed.
<6> Oct 30 16:52:52 kernel: ROM checksum self-test: passed
(0xdbd8681d).
<6> Oct 30 16:52:52 kernel: Receiver lock-up workaround activated.
<4> Oct 30 16:52:52 kernel: eepro100.c:v1.09j-t 9/29/99 Donald Becker
http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
<4> Oct 30 16:52:52 kernel: eepro100.c: $Revision: 1.26 $ 2000/05/31
Modified by Andrey V. Savochkin and others (cb,rk)
<6> Oct 30 16:52:52 kernel: Partition check:
<6> Oct 30 16:52:52 kernel: sda:msdos_partition magic chk 55 aa
<4> Oct 30 16:52:52 kernel: sda1 sda2
<6> Oct 30 16:52:52 kernel: sdb:SCSI disk error : host 0 channel 0 id
1 lun 0 return code = 28000002
<4> Oct 30 16:52:52 kernel: Info fld=0x0, Current sd08:10: sns = f0 4
<4> Oct 30 16:52:52 kernel: ASC=15 ASCQ= 1
<4> Oct 30 16:52:52 kernel: Raw sense data:0xf0 0x00 0x04 0x00 0x00
0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x15 0x01 0x01 0x00
<4> Oct 30 16:52:52 kernel: scsidisk I/O error: dev 08:10, sector 0
<4> Oct 30 16:52:52 kernel: unable to read partition table
<6> Oct 30 16:52:52 kernel: sdc:msdos_partition magic chk 55 aa
<4> Oct 30 16:52:52 kernel: sdc1 sdc2
<6> Oct 30 16:52:52 kernel: sdd:msdos_partition magic chk 55 aa
<4> Oct 30 16:52:52 kernel: sdd1 sdd2
<6> Oct 30 16:52:52 kernel: sde:msdos_partition magic chk 55 aa
<4> Oct 30 16:52:52 kernel: sde1 sde2
<6> Oct 30 16:52:52 kernel: sdf:msdos_partition magic chk 55 aa
<4> Oct 30 16:52:52 kernel: sdf1 sdf2
<6> Oct 30 16:52:52 kernel: sdg:msdos_partition magic chk 55 aa
<4> Oct 30 16:52:52 kernel: sdg1 sdg2
<4> Oct 30 16:52:52 kernel: md.c: sizeof(mdp_super_t) = 4096
<5> Oct 30 16:52:52 kernel: RAMDISK: Compressed image found at block 0
<6> Oct 30 16:52:52 kernel: autodetecting RAID arrays
<4> Oct 30 16:52:52 kernel: (read) sda2's sb offset: 177220928 [events:
00000019]
<4> Oct 30 16:52:52 kernel: (read) sdc2's sb offset: 177220928 [events:
0000001a]
<4> Oct 30 16:52:52 kernel: (read) sdd2's sb offset: 177220928 [events:
0000001f]
<4> Oct 30 16:52:52 kernel: (read) sde2's sb offset: 177220928 [events:
0000001f]
<4> Oct 30 16:52:52 kernel: (read) sdf2's sb offset: 177220928 [events:
0000001f]
<4> Oct 30 16:52:52 kernel: (read) sdg2's sb offset: 177220928 [events:
0000001f]
<4> Oct 30 16:52:52 kernel: autorun ...
<4> Oct 30 16:52:52 kernel: considering sdg2 ...
<4> Oct 30 16:52:52 kernel: adding sdg2 ...
<4> Oct 30 16:52:52 kernel: adding sdf2 ...
<4> Oct 30 16:52:52 kernel: adding sde2 ...
<4> Oct 30 16:52:52 kernel: adding sdd2 ...
<4> Oct 30 16:52:52 kernel: created md1
<4> Oct 30 16:52:52 kernel: bind
<4> Oct 30 16:52:52 kernel: bind
<4> Oct 30 16:52:52 kernel: bind
<4> Oct 30 16:52:52 kernel: bind
<4> Oct 30 16:52:52 kernel: running:
<4> Oct 30 16:52:52 kernel: now!
<4> Oct 30 16:52:52 kernel: sdg2's event counter: 0000001f
<4> Oct 30 16:52:52 kernel: sdf2's event counter: 0000001f
<4> Oct 30 16:52:52 kernel: sde2's event counter: 0000001f
<4> Oct 30 16:52:52 kernel: sdd2's event counter: 0000001f
<4> Oct 30 16:52:52 kernel: md: device name has changed from sdf2 to
sdg2 since last import!
<4> Oct 30 16:52:52 kernel: md: device name has changed from sde2 to
sdf2 since last import!
<4> Oct 30 16:52:52 kernel: md: device name has changed from sdd2 to
sde2 since last import!
<4> Oct 30 16:52:52 kernel: md: device name has changed from sdc2 to
sdd2 since last import!
<6> Oct 30 16:52:52 kernel: md1: max total readahead window set to 384k
<6> Oct 30 16:52:52 kernel: md1: 3 data-disks, max readahead per
data-disk: 128k
<6> Oct 30 16:52:52 kernel: raid5: device sdg2 operational as raid disk
3
<6> Oct 30 16:52:59 /bin/cron[65]: (CRON) STARTUP (fork ok)
<6> Oct 30 16:52:52 kernel: raid5: device sdf2 operational as raid disk
2
<6> Oct 30 16:52:52 kernel: raid5: device sde2 operational as raid disk
1
<6> Oct 30 16:52:52 kernel: raid5: device sdd2 operational as raid disk
0
<6> Oct 30 16:52:52 kernel: raid5: allocated 4293kB for md1
<4> Oct 30 16:52:52 kernel: raid5: raid level 5 set md1 active with 4
out of 4 devices, algorithm 0
<4> Oct 30 16:52:52 kernel: RAID5 conf printout:
<4> Oct 30 16:52:52 kernel: --- rd:4 wd:4 fd:0
<4> Oct 30 16:52:52 kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdd2
<4> Oct 30 16:52:52 kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sde2
<4> Oct 30 16:52:52 kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdf2
<4> Oct 30 16:52:52 kernel: disk 3, s:0, o:1, n:3 rd:3 us:1 dev:sdg2
<4> Oct 30 16:52:52 kernel: RAID5 conf printout:
<4> Oct 30 16:52:52 kernel: --- rd:4 wd:4 fd:0
<4> Oct 30 16:52:52 kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdd2
<4> Oct 30 16:52:52 kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sde2
<4> Oct 30 16:52:52 kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdf2
<4> Oct 30 16:52:52 kernel: disk 3, s:0, o:1, n:3 rd:3 us:1 dev:sdg2
<6> Oct 30 16:52:52 kernel: md: updating md1 RAID superblock on device
<4> Oct 30 16:52:52 kernel: sdg2 [events: 00000020](write) sdg2's sb
offset: 177220928
<4> Oct 30 16:52:52 kernel: sdf2 [events: 00000020](write) sdf2's sb
offset: 177220928
<4> Oct 30 16:52:52 kernel: sde2 [events: 00000020](write) sde2's sb
offset: 177220928
<4> Oct 30 16:52:52 kernel: sdd2 [events: 00000020](write) sdd2's sb
offset: 177220928
<4> Oct 30 16:52:52 kernel: .
<4> Oct 30 16:52:52 kernel: considering sdc2 ...
<4> Oct 30 16:52:52 kernel: adding sdc2 ...
<4> Oct 30 16:52:52 kernel: adding sda2 ...
<4> Oct 30 16:52:52 kernel: created md0
<4> Oct 30 16:52:52 kernel: bind
<4> Oct 30 16:52:52 kernel: bind
<4> Oct 30 16:52:52 kernel: running:
<4> Oct 30 16:52:52 kernel: now!
<4> Oct 30 16:52:52 kernel: sdc2's event counter: 0000001a
<4> Oct 30 16:52:52 kernel: sda2's event counter: 00000019
<3> Oct 30 16:52:52 kernel: md: superblock update time inconsistency --
using the most recent one
<4> Oct 30 16:52:52 kernel: freshest: sdc2
<4> Oct 30 16:52:52 kernel: md0: kicking faulty sda2!
<4> Oct 30 16:52:52 kernel: unbind
<4> Oct 30 16:52:52 kernel: export_rdev(sda2)
<4> Oct 30 16:52:52 kernel: md0: former device sdb2 is unavailable,
removing from array!
<3> Oct 30 16:52:52 kernel: md: md0: raid array is not clean --
starting background reconstruction
<6> Oct 30 16:52:52 kernel: md0: max total readahead window set to 384k
<6> Oct 30 16:52:52 kernel: md0: 3 data-disks, max readahead per
data-disk: 128k
<6> Oct 30 16:52:52 kernel: raid5: device sdc2 operational as raid disk
2
<3> Oct 30 16:52:52 kernel: raid5: not enough operational devices for
md0 (3/4 failed)
<4> Oct 30 16:52:52 kernel: RAID5 conf printout:
<4> Oct 30 16:52:52 kernel: --- rd:4 wd:1 fd:3
<4> Oct 30 16:52:52 kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdc2
<1> Oct 30 16:52:52 kernel: raid5: failed to run raid set md0
<4> Oct 30 16:52:52 kernel: pers->run() failed ...
<4> Oct 30 16:52:52 kernel: do_md_run() returned -22
<4> Oct 30 16:52:52 kernel: unbind
<4> Oct 30 16:52:52 kernel: export_rdev(sdc2)
<6> Oct 30 16:52:52 kernel: md0 stopped.
<4> Oct 30 16:52:52 kernel: ... autorun DONE.
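
To make the superblock mismatch in the md0 part of the log easier to see, here is a minimal sketch that pulls the per-device event counters out of lines like the ones above and lists which members lag behind the freshest one. The two sample lines are copied from the log (with the syslog prefix trimmed); the helper names `event_counters` and `stale_members` are my own, and the comparison is only a simplified model of what the log appears to show, not the kernel's actual logic.

```python
import re

# Sample lines taken from the md0 section of the log above.
LOG = """\
kernel: sdc2's event counter: 0000001a
kernel: sda2's event counter: 00000019
"""

# Matches e.g. "sdc2's event counter: 0000001a"; the counter is printed in hex.
EVENT_RE = re.compile(r"(\w+)'s event counter: ([0-9a-fA-F]+)")


def event_counters(text):
    """Return {device: event count} parsed from 'event counter' lines."""
    return {dev: int(count, 16) for dev, count in EVENT_RE.findall(text)}


def stale_members(counters):
    """Return the devices whose event count is behind the highest one."""
    freshest = max(counters.values())
    return sorted(dev for dev, n in counters.items() if n < freshest)


counters = event_counters(LOG)
print(counters)                 # {'sdc2': 26, 'sda2': 25}
print(stale_members(counters))  # ['sda2']
```

This lines up with the "freshest: sdc2" and "kicking faulty sda2" messages: sda2 is one event behind sdc2, and sdb2 never shows up at all because its partition table could not be read.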