RAID 5 - Is Dell BSing me?

Grandma Wilkerson

Hi,

I'm running Windows 2000 on a Dell 2450. I have 4 identical hard drives
configured for RAID 5. It was shipped this way from the factory. Today the
amber warning light on drive #2 started blinking, indicating a problem with
that drive. The server, however, is still running just fine. The data is
intact. This is what I would have expected with RAID 5: one disk goes down
and the system stays up. That's the whole point of RAID, or so I thought.
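Here is a minimal sketch of why the server keeps running on three drives. It assumes a simplified 4-drive RAID 5 stripe holding three data blocks plus one parity block computed with XOR. Real controllers use much larger blocks and rotate the parity across drives. The function names are illustrative only:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return one stripe as stored on the drives: the data blocks plus their XOR parity."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def read_degraded(stripe, failed):
    """Recompute the block that lived on the failed drive from the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed]
    return xor_blocks(survivors)


stripe = make_stripe([b"ABCD", b"EFGH", b"IJKL"])
# Pretend the second drive (index 1) has failed; its data can still be served.
print(read_degraded(stripe, failed=1))  # b'EFGH'
```

Because XOR is its own inverse, any one missing block (data or parity) equals the XOR of the other three. A second failure in the same stripe leaves two unknowns, which is why RAID 5 survives exactly one dead drive.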

I called Dell, hoping that they'd send out a replacement drive [I have a
4hr service contract with them]. The tech wanted me to jump through a series
of hoops [take the drive out and "re-seat" it, run a rebuild via a
SCSISelect utility, etc.]. The SCSISelect utility showed a problem with
drive #2, which I expected. The tech had me do a rebuild anyway.
The rebuild appeared to work at first, with drive #2 happily writing data,
but then at 20% the rebuild failed. This came as no surprise to me because I
figured the drive was bad... but here's what the tech said that confused
me:

He says that because the rebuild failed at 20%, it would be pointless to
send me a new drive. He says that I need to delete the container and
re-create it which means I'd lose ALL of the data. I should then reinstall
Windows, restore my data from tape, etc. This seems crazy. I thought the
whole point of RAID-5 is that my data is safe if only one drive fails --
and in fact my data IS safe! The server is running just FINE on only 3
drives. All I want to do is replace drive #2 so that I have my redundancy
again... but he's telling me that replacing drive #2 won't help and that I
need to back up everything, recreate the container, rebuild, and then restore
everything... and I didn't understand his explanation as to WHY I had to do
this. Something to do with "parity" errors? He says he's seen this problem
"100 times." I don't understand. This seems crazy. The data is still OK. It
seems like I should be able to just stick a new drive in there, and the RAID
5 array should automatically rebuild itself. Help!
 
Ask to speak to a level 2 technician. That tech is full of it.
 
I have spoken to a colleague of mine who is an MCP, MCSE
and MCT, and he tells me that the Dell tech is talking
out of his backside.

If you replace the drive and give it the rebuild command,
the controller uses the parity information to regenerate the data!

So this crap about parity errors is total BS!

It's the parity data that is keeping your data alive!!!
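Here is a rough illustration of what the rebuild command does. It is not how any particular Dell or Adaptec controller is implemented. The sketch lays data out across four simulated drives with rotating parity, removes one drive, and regenerates every block on a blank replacement by XORing the surviving drives stripe by stripe. The layout and all function names are assumptions for illustration:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_array(data_blocks, n_drives=4):
    """Stripe data across n_drives, rotating which drive holds each stripe's parity.

    Assumes len(data_blocks) is a multiple of n_drives - 1.
    """
    drives = [[] for _ in range(n_drives)]
    per_stripe = n_drives - 1
    for stripe_no, start in enumerate(range(0, len(data_blocks), per_stripe)):
        chunk = data_blocks[start:start + per_stripe]
        parity_drive = (n_drives - 1 - stripe_no) % n_drives
        data_iter = iter(chunk)
        for d in range(n_drives):
            drives[d].append(xor_blocks(chunk) if d == parity_drive else next(data_iter))
    return drives


def rebuild(drives, failed):
    """Regenerate every block of the failed drive, one stripe at a time."""
    survivors = [drive for i, drive in enumerate(drives) if i != failed]
    # zip(*survivors) yields, for each stripe, the blocks held by the surviving drives.
    return [xor_blocks(stripe_blocks) for stripe_blocks in zip(*survivors)]


data = [bytes([i]) * 4 for i in range(6)]  # six 4-byte data blocks -> two stripes
drives = build_array(data)
original = list(drives[1])

drives[1] = None                       # the failed drive is pulled...
drives[1] = rebuild(drives, failed=1)  # ...and a blank replacement is rebuilt
print(drives[1] == original)  # True
```

The replacement starts blank. Nothing from the old drive is needed, because every block it held can be recomputed from the surviving three.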
 
Granny:

Wow, that is/was a bad experience!
I would have asked to speak with a supervisor and escalated the issue right
away.

I know what it is like trying to PROVE a drive is bad. I have, over the
years, come up with a methodology to deal with this kind of problem.

Number one, you should have spare hard disks on hand for servers. You can't
rely on Dell or any other vendor to tell you what to do when what matters most
is the data and the users of that server.

When Under Vendor Warranty -- At any notification of a problem with the hard
disk, I install the spare ASAP! Then I can deal with the suspect drive in an
off-line fashion. I install the suspect drive on a test jig and
run the hard drive manufacturer's diagnostic. Here are a few...
IBM/Hitachi - Drive Fitness Test (DFT)
Western Digital - DLGDIAG
Seagate - SeaTools
Once I have an error code from the manufacturer's diagnostic, I call the vendor
(i.e., Dell) and provide the technician with the error code. This is
irrefutable proof of a hard disk problem, and I *always* get the replacement
disk.

When No Longer Under Vendor Warranty -- At any notification of a problem with
the hard disk, I install the spare. I then call a vendor/contractor I work with,
and they replace the drive, no questions asked. Of course this is a
"paid service", but the customers (the server's client accounts) deserve this
level of service.

I also keep a MS Access database of all repair/service issues. I can then spot
trends more easily and keep a record of what was done, shipping & tracking
numbers, etc.
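Dave keeps his log in MS Access. As a hedged stand-in, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name, column names, and example rows are hypothetical. They were chosen only to mirror the fields he mentions: what was done, shipping and tracking numbers, and a query for spotting trends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE service_log (
        id          INTEGER PRIMARY KEY,
        logged_on   TEXT NOT NULL,  -- ISO date
        server      TEXT NOT NULL,
        component   TEXT NOT NULL,  -- e.g. 'disk'
        action      TEXT NOT NULL,  -- what was done
        tracking_no TEXT            -- shipping/tracking number, if any
    )
""")

rows = [  # made-up example entries, for illustration only
    ("2004-01-12", "server-a", "disk", "installed spare; suspect drive failed vendor diagnostic", "TRK-1"),
    ("2004-02-03", "server-b", "power supply", "replaced under warranty", "TRK-2"),
    ("2004-03-20", "server-a", "disk", "installed spare; replacement shipped", "TRK-3"),
]
conn.executemany(
    "INSERT INTO service_log (logged_on, server, component, action, tracking_no) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Trend spotting: which server/component pairs keep coming back?
trend = conn.execute("""
    SELECT server, component, COUNT(*) AS incidents
    FROM service_log
    GROUP BY server, component
    ORDER BY incidents DESC, server
""").fetchall()
print(trend)  # [('server-a', 'disk', 2), ('server-b', 'power supply', 1)]
```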

When it comes to notebooks and desktops under warranty, I follow the same rule,
only I test the drive in situ. In the case of Dell, I run Dell diagnostics
and the manufacturer's diagnostics. Again, irrefutable proof of a failure.

Remember that blond-haired young man in the commercials? The one saying
"Dude, you're getting a Dell!"? Well, he was BUSTED for marijuana possession in NY,
and that ended his Dell commercial career. After speaking with many a
technician at Dell, I have come to the conclusion that many of their techs are
smoking rope!

{ Pet Peeve & whine }
I hate it when I tell the Dell tech I want an email confirmation, they tell
me yes, they will send me one, and an email never arrives.


I hope this helps, maybe not now, but for the future....

Dave
 
Hi David,

Thanks so much for the helpful tips! I'll keep that information in my
permanent archive for the next tech support incident. I finally talked Dell
into sending a new drive. I received it within an hour, installed it,
rebuilt the array, and everything is working fine now. I'm so glad I didn't
follow the tech's recommendation!

Thanks again,

Granny
 
You guys were so right! Dell finally agreed to send me a new drive and I
rebuilt the array successfully. Everything is working fine now. The amber
warning light is no longer blinking and the array status is now "OK" instead
of "CRITICAL". Thank you so much. Rebuilding this machine would have taken
me a day and I can't afford the downtime!

You know it really bums me out that Dell gave me such bad advice. I've
been getting bad information from tech support people recently. Here's
another example:

I have a 1.5 x 1.5 Mbps "business-class" Internet connection through
Roadrunner, my ISP. I had a connectivity problem and gave the tech my static
IP address block in case he wanted to run some tests. He absolutely INSISTED
that Roadrunner did not support static IP addresses, despite the
fact that my cable bill has a special line item billing me for a static IP
block. I couldn't convince the guy that I had a static address. Finally he
got mad and said to me, "look, I work here, I know for a fact that all
customers have dynamic IP addresses!"... I think big companies should really
spend more time training their tech people.

Anyway, I'm just happy my RAID array is working again. Thanks again!

David
 
The tech is useless as are most techs at Dell. Could be worse, could be
Gateway. At least your call wasn't transferred to India - the stories I
could tell. The drive has to be replaced *regardless* of whether you do a
rebuild or not, right? So just call back and demand a replacement drive and
what you do with it is your own business. Opt for the advance ship so you
don't have to send the bad drive back first. FWIW, you are correct. The
array will rebuild from the beginning with a new drive. A word to the wise,
get a spare drive and have it on hand for just such emergencies.

Paul
 
Grandma Wilkerson said:
I have 1.5x1.5 mbps "business-class" internet connection through
Roadrunner, my ISP. I had a connectivity problem and gave the tech my static
IP address block in case he wanted to run some tests. He absolutely INSISTED
that Roadrunner did not support static IP addresses -- this, despite the
fact that my cable bill has a special line item billing me for a static IP
block. I couldn't convince the guy that I had a static address. Finally he
gets mad and says to me "look, I work here, I know for a fact that all
customers have dynamic IP addresses!"... I think big companies should really
spend more time training their tech people.


Perhaps you got their "home-class" support line by mistake ;-)
 
Paul M. Cook©® said:
The tech is useless as are most techs at Dell. Could be worse, could
be Gateway.

Actually, the BUSINESS support techs at both Gateway and Dell have
been very, very good in my experience. And I've called both on at
least 2 dozen occasions over the past few years.

Now if you want tech support that is bad, try Sony.
I have a professor who has a *new* Sony Vaio. He wanted a 2nd
port replicator so that he didn't have to disconnect everything in his
office (just unplug the port replicator, travel, and plug into the other replicator
when he arrived at the new destination). Sounds like a reasonable request,
right?
No deal. You can't buy a port replicator, at any price. There's not
even a part number for it!

I'll never buy another Sony computer product. Ever.

And don't even get me started about Polaroid, for a slide maker we had in
a previous department.
 
Anything but a Sony! It took us almost a year to get our money back after
returning it 3 times! They kept telling us we needed a new $250 battery,
which would stop holding a charge after about 1 month of use. But they did wipe
the hard drive clean, free of charge, while they were fixing the battery.
Louis
 
Bob I said:
Maybe asking for a docking station would have produced better
results?

Nope. They knew exactly what it was I wanted - so the name of the item
was not the issue. It was the fact that they had a policy to not sell
them! The only helpful person I talked to there was getting PO'd about
it too (at his own company). He suggested buying a 2nd Vaio and
returning it, keeping the port replicator. Seriously; I'm not joking.
 