Recovering from disconnected raid0 volume on P5GDC-V deluxe

Don

Hi,

I have a Windows 2003 Server set up on this mobo using 2 Maxtor 300GB
drives. The drives are partitioned (using Intel Matrix RAID) into 2
volumes: volume 1 is RAID 1 and volume 2 is RAID 0. The OS is on the
RAID 1 volume.

I accidentally disconnected one of the SATA cables of a Matrix RAID
volume when I was swapping cases last night. I didn't even know until
a couple of boots into W2K3 Server (I knew something was wrong, I just
didn't realize a cable was disconnected). I reconnected the cable and
pressed Ctrl-I to get into the Intel Matrix RAID BIOS setup. It found the
disconnected drive and I answered "Y" to re-add the drive to the array.
The status of the RAID 0 was Normal and the RAID 1 "needs to be rebuilt
in OS". I then booted into W2K3 (it took about 30 min instead of 2 or 3,
btw), and the RAID 0 (which was assigned as drive G) had disappeared!
In Disk Management the RAID 0 volume is there, but "not initialized",
and the disk initialization wizard automatically fires up. At this point
I am very reluctant to go ahead with the initialization as I am afraid
of damaging the data.

Does anybody know what this "disk initialization wizard" does to the
volume? Would it wipe out the data on the drive? It almost sounds
as if the couple of boots into the OS while the RAID volume was
disconnected fooled the OS into thinking that the volume is gone. Is
there a way to copy the RAID 0 data somewhere else?

Thanks
 
Hmmm. Smells like the answer to this problem is going to cost
you money.

RAID disks use a "reserved sector". On that sector is written info
about which array(s) the disk is part of, and whether the array
is operational or damaged.

If you boot a system, and the RAID firmware detects the array is
broken, the firmware can write the array status into any "reserved
sectors" it still has access to. It is all downhill from there.

RAID 0 is the "striped" one. When you "make" a RAID 0 array, there
is no work for the OS to do, as the information on the disks is not
redundant. The stripe size determines how the data alternates
between drives, and knowing the stripe size would be necessary to
extract that data in a recovery operation.

If you alternately "delete" and "make" a RAID 0 array, there is
no reason for your data to be damaged. As long as the same stripe
size is used, everything should be in the same relative position
it was in before the "delete" and "make". If the partitions are
no longer readable/visible, it implies something has happened
to make the info at the beginning of the disk unreadable. Getting
the stripe size wrong will do that.
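
To make the stripe-size point concrete, here is a minimal Python sketch of
striping across two disks. It is only a toy model: the function names
`stripe` and `destripe` are made up for illustration, it ignores the
reserved sector entirely, and it is not claimed to match Intel's actual
on-disk layout.

```python
from __future__ import annotations


def stripe(data: bytes, stripe_size: int, n_disks: int = 2) -> list[bytes]:
    """Split a logical volume into per-disk images, RAID 0 style."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), stripe_size):
        # Stripe number k goes to disk k mod n_disks.
        disks[(i // stripe_size) % n_disks] += data[i:i + stripe_size]
    return [bytes(d) for d in disks]


def destripe(disks: list[bytes], stripe_size: int) -> bytes:
    """Reassemble the logical volume by reading stripes round-robin."""
    out = bytearray()
    offset = 0
    while any(offset < len(d) for d in disks):
        for d in disks:
            out += d[offset:offset + stripe_size]
        offset += stripe_size
    return bytes(out)


volume = bytes(range(64))            # stand-in for the data on the array
disks = stripe(volume, stripe_size=8)

same_size = destripe(disks, stripe_size=8)
wrong_size = destripe(disks, stripe_size=16)

print(same_size == volume)    # reassembled with the original stripe size
print(wrong_size == volume)   # reassembled with a different stripe size
```

In this toy model nothing on either disk is modified by "deleting" and
"making" the array; only the order in which the stripes are read back
changes, which is why the same stripe size gives the same volume.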

RAID 1 is a bit different, as there is a need to maintain a
redundancy relationship between the disks. If you start a RAID 1
array, with one disk disconnected, the RAID firmware/software is
going to immediately write the reserved sector, to record the
fact that the two disks are no longer identical copies. When
the RAID 1 array is reestablished, one disk will be used as the
master copy, and it will be copied to the other disk. Obviously,
getting that detail right would be pretty important. And since
the reserved sector records which disk went missing, the disk
that was always available to the system is a known quantity.

So, the question is, is there a software program that knows
what an MBR is, and how partition info is stored, that can
figure it out even if a RAID stripe is involved, and then fix it?
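
For reference, the classic MBR layout itself is simple enough to read by
hand: the partition table is four 16-byte entries starting at byte 446 of
sector 0, followed by the 0x55 0xAA signature at bytes 510-511. The sketch
below only parses a 512-byte sector that has already been read from
somewhere (for example from an image file); `parse_mbr` is a made-up name,
and where sector 0 of the RAID 0 volume actually lives on the member disks
is an assumption you would have to verify.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446      # start of the 4-entry partition table
ENTRY_SIZE = 16


def parse_mbr(sector: bytes):
    """Return the non-empty partition entries, or None if no MBR signature."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    if sector[510:512] != b"\x55\xaa":
        return None     # no boot signature, so no valid MBR here
    entries = []
    for i in range(4):
        start = TABLE_OFFSET + ENTRY_SIZE * i
        raw = sector[start:start + ENTRY_SIZE]
        # status, 3 CHS bytes (skipped), type, 3 CHS bytes (skipped),
        # first LBA sector, number of sectors (little-endian)
        status, ptype, lba_start, n_sectors = struct.unpack("<B3xB3xII", raw)
        if ptype != 0:  # type 0 means the slot is unused
            entries.append({
                "index": i,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": n_sectors,
            })
    return entries


# Synthetic sector for demonstration: one bootable NTFS-type (0x07) entry.
demo = bytearray(SECTOR_SIZE)
demo[446:462] = struct.pack("<B3xB3xII", 0x80, 0x07, 63, 1_000_000)
demo[510:512] = b"\x55\xaa"

print(parse_mbr(bytes(demo)))
print(parse_mbr(bytes(SECTOR_SIZE)))  # all zeros: no signature
```

This only reads and reports; it does not attempt any repair.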

To start with, do a bit-by-bit copy of each drive in the
array, to a backup drive. Now, the problem is, there will be
a ton of tools mentioned in any web search, but you'll have
no way of knowing how good they are. Try to find people who
don't have an interest in selling you something, to find the
best tool for the job. The purpose of making an image of
each of the two drives is to have a fallback in case any other
tools or methods fail to work properly. The tool should copy the
"reserved sector" as well as the rest of the data (I don't
know where the reserved sector is located!).

http://einstein.cs.uri.edu/courses/fall2003/hpr108Bs1/diskimaging.html
http://www.cftt.nist.gov/disk_imaging.htm
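
As a rough illustration of what a bit-by-bit copy amounts to, here is a
minimal Python sketch that reads a source in chunks, writes every byte to
an image file, and returns a SHA-256 checksum so the copy can be verified
later. It is not a substitute for a proper imaging tool: `image_device` is
a made-up name, there is no handling of read errors or bad sectors, and
the demo runs on an ordinary temp file standing in for a drive.

```python
import hashlib
import os
import tempfile


def image_device(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path byte for byte; return (bytes copied, SHA-256)."""
    digest = hashlib.sha256()
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:           # end of device or file
                break
            dst.write(chunk)
            digest.update(chunk)
            total += len(chunk)
    return total, digest.hexdigest()


# Demo on an ordinary temp file standing in for a drive.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "fake_drive.bin")
    dst = os.path.join(tmp, "fake_drive.img")
    payload = os.urandom(3 * 1024 * 1024 + 123)
    with open(src, "wb") as f:
        f.write(payload)

    copied, checksum = image_device(src, dst)

    with open(dst, "rb") as f:
        image_matches = f.read() == payload

print(copied, image_matches)
```

On a real system the source would be the raw disk device rather than a
file (for instance something like `/dev/sdb` from a Linux rescue CD, which
is an assumption, not something tested on this board), and because the
whole device is read from the first byte to the last, whatever the RAID
firmware stores on the disk is included in the image.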

I'll assume you didn't mess with the stripe size in either the
original setup or when you glued the array back together. I
really don't see a reason for the data to be unreadable on a
RAID 0, unless some detail like that has changed.

Good luck,
Paul
 
Well, I would have just plugged the cable back in.
Sounds like you have to do a restore.

With 2 HDD, personally I think the config you have was ummmm suicide in the
first place.

Logic:

No one posts about Matrix (1 that I recall) ==> it's either perfect, or no
one is using it. Somehow I think it is more probable that people are going
for simple configs rather than using Matrix capabilities (i.e. 1 x RAID, plain
SATA).

The combination of a RAID 0 volume *and* a RAID 1 volume on only 2 HDD's
increases complexity and so increases probability of s/w or firmware
failures.

RAID 0 is not recommended, as loss of either drive = loss of all that is on
the RAID volume. There are limited uses for RAID 0 where it actually
benefits the user.

It is very important when using RAID to know in advance what to do in
failure scenarios. If you have to rebuild and decide to still use RAID 0,
then you can expect only 1 thing with RAID 0 ==> loss of everything on the
volume if you don't handle the failure correctly (e.g. plugging the drive
straight back in without doing anything else). So try the failures: unplug
one or other drive (this is fine for RAID 1) and learn how the rebuild
process actually works.

I hope you have a good backup system...

As many have said, RAID is no substitute for backups.

HTH
 
First of all, thank you both Paul and Mercury for taking time to
respond to my post.

I got everything back last night, and learned a whole lot more about this
Matrix RAID stuff.

I can now say that in a "normal" situation, if you have fired up the OS
after one drive was disconnected (i.e. making your RAID 1 out of sync),
then when you fire up the OS after reconnecting the cable, the system
will appear to hang with the HDD light on constantly. As it is basically
software RAID, the utility is hard at work trying to re-sync the RAID 1
disk. The confusing part of this process is that the screen will stay at
the Windows XP logo for up to an hour without any message, depending on
the size of your RAID 1. In my case one of the SATA cables was loose even
after I "re-connected" it. So at some point during the recovery process
the OS decided that one of the HDs, although present, was bad, making it
show up in Disk Management but "not initialized". So what I did was
change both SATA cables and let it sit for an hour. Finally the logon
screen appeared and everything was back to "healthy" again.

I was trying to save a few bucks and use RAID 0 as an interim solution for
a couple of months. Never again!!!
 
I'd keep an eye out for BIOS (and driver) updates for the mobo, as the
symptoms you describe indicate there is work for the Intel people to do in
getting their firmware to a desirable state: it should give sensible
feedback and let you know it is rebuilding with an "I'm still alive and
working" indicator.

This may take a while, but Intel is usually good on this front. The RAID on
the ICH5R for example is quite mature now, and the biggest issue I suspect is
the SATA cables. My RAID 1 has failed twice now, and both times the "failing"
HDD (the same one) has checked out 100%. The second time the supplier refused
to replace the drive, stating it had no flaws in any way. I had checked the
drive with SeaTools (Seagate drive diagnostics) and it was OK, but the SMART
data indicated some "error" records which I assumed would mean something...

Does anyone have a handy hint for better securing SATA connectors? I am
loath to use sticky tape, but anything that won't affect the motherboard,
is detachable and will improve the connection is surely worth looking at.
Mind you, this is only a suspicion - I wish I had evidence that the
connectors are intermittent.

- Tim (Mercury on my laptop).
 
So, with respect to the RAID 0, when it found the other half of
the array, nothing was required to get it to run again? Maybe
that is what section 23.1.1 of the first manual is referring to,
when it mentions a "repairable failure detected" dialog?

The ICH6R RAID manual is here. Troubleshooting starts on Pg.86
ftp://download.intel.com/support/chipsets/imst/sb/manual45_oem.pdf

(ICH7R RAID manual is here, for those with the later boards.)
ftp://download.intel.com/support/chipsets/imsm/sb/manual50_oem.pdf

Paul
 
Actually, after I connected a "good" cable to each drive and booted up
the PC the first time, when I pressed Ctrl-I to get into the Intel BIOS,
it popped up a message box saying something like "the missing drive is
found, is it the original drive (y/n/i)". I answered y and my RAID 1 then
showed as rebuild (or degraded, I can't remember exactly, but the footnote
on it said I needed to boot into the OS to rebuild/resync it), while the
RAID 0 part showed as perfectly normal and, sure enough, everything was
there when the recovery was done. I guess it makes sense, because no
changes to the RAID 0 volume were ever attempted (by the OS), as the
operating system treated it as a missing drive. The RAID 1 volume, on the
other hand, kept on going, because that's the whole reason to get RAID 1.
In fact, before I reconnected the bad cable, the OS was booting up without
any problem; it only showed the RAID 0 volume as missing.

All this time I don't recall seeing a "repairable failure detected"
dialog, from either the BIOS or the OS. The re-sync of the RAID 1 happened
without any dialog box or anything (and that's why I thought the system
had hung, as it just stayed at the boot-up Windows logo for an hour or so!).

Btw, I am not using the Intel RAID driver from the CD. Instead, I
downloaded a fairly recent version of it from the Intel site a couple of
months ago when I started setting up the server. Maybe there is a change
of behaviour between those two drivers?

Also, I restarted the PC today just to be sure everything is OK. It
booted up within a couple of minutes, as it should.
 
Hi Tim:

I use the red SATA cables that come with Asus mobos (I have a few Asus
mobos, so I also have a few SATA cables). It appears in my case that it
was always the HD end that was coming off. Anyway, I know there is a
more secure version of the cable out there, like the one below, but I've
never tried one since I already got a bunch of freebies from the
mobo packages. Maybe I should get one and see how it works.

http://www.zipzoomfly.com/jsp/ProductDetail.jsp?ProductCode=316604
 
The SecureConnect documentation is here. It looks like it fits in
two holes that are only on their disk drives.

http://www.wdc.com/en/library/sata/2579-001075.pdf

I think I saw some other brand of disk drive (Hitachi) advertising
a "latching" connector at the disk drive end.

For an excellent collection of photos of SATA connectors, try
this page. There is a latching cable shown for the motherboard
end, but I don't know if there is a requirement for the motherboard
connector to have a full plastic "box" around it, for such a
connector to work or not.

http://store.yahoo.com/cooldrives/saiandsaiiin.html

OK. Go to http://www.molex.com , enter "SATA" in the left hand
search box, click "GO", then click "Serial ATA Overview" at the
bottom of the returned page.

"Locking Latches

Locking Latch options available.

Locking latches for signal cables have recently been approved
by the Serial ATA Working Group. Positive locking latches ensure
cables will stay connected... 

Locking PCB signal plugs are also available in both vertical
and right angle versions and are backwards compatible. The extra
plastic shroud used by the cable locking mechanism ensures the
signal plugs are more robust than the original SATA design."

If a motherboard is modern enough, the newer SATA connectors with
the plastic "box" around the connector tab might be what is needed
to make a "latching" cable work well. I'm not really sure whether
the latch would work with the old style motherboard connector or
not.

Paul
 
Thanks for the links = no current solution for me...
sticky tape, blue tack, chocolate fudge, superglue, drill a hole thru the
mobo & tie it on maybe :)

I use the red cables too. What a blunder...

I'll keep an eye out for the latching sockets.

- tim
 