Maxtor and RMA hassles?

Ryan Underwood

Is it just me, or does Maxtor suddenly have a huge stick up their ass with
respect to warranty RMA's? Specifically, having requested replacements
for ATA drives as recently as last year, and SCSI drives as recently as
a month ago, I can never remember them insisting on a diagnostic code to
process a warranty. This is a problem because they have no Linux version
of their diagnostic tool, so I would have to take the entire server down
to run the diagnostic on a single failed drive out of an array.

I tried to do the RMA on their website, but got hung up with the
diagnostic code requirement. Remembering that WDC used to require a
diagnostic code for web RMA's but didn't bother you if you did a phone
RMA, I called up. Well, the lackey I got insisted on the diagnostic code
giving no particular reason why, even when I explained the situation, so I
asked to speak to his manager. After 10 minutes on hold, the
conversation went mostly like this:

--------------------------------------

"Sir, we need a diagnostic code to process the RMA."

"Well, I just returned a drive not more than a month ago, and they didn't
hassle me about any diagnostic code."

"Well, that means someone didn't do their job."

.... "The drive went down out of an array for bad blocks. It needs to be
replaced."

"Well sir, if you run the diagnostic utility, it will fix those bad
blocks, and then you won't need to return the drive."

.... (having heard this line before) "Please explain to me how a software
utility will correct a media defect."

"Sir, be advised that media problems are not covered under warranty and
you will need to contact your media software vendor."

.... (incredulously) "No, that's not what I mean. I mean there is a
problem with the physical medium that the drive is storing information on.
I want you to explain to me how a software utility would 'fix' a defect
rather than just mask an underlying problem."

"Sir, if you just run that utility, it will fix the problem."

"Even if it were able to do so, I would have to take the entire server
down to do that, which defeats the point of a hot spare, and causes me to
lose money."

"Well then you're going to have to do that. Your account is now flagged
for a bad blocks complaint, and any time you call back for any drive in
the future, we will have to insist on a diagnostic code before proceeding."

.... (thinking fast) "Well, the drive just died. I guess I won't be able
to run the diagnostic now."

"Ok sir, now if you would just verify your address and telephone number,
I'll have someone prepare the RMA for you and give you a call back."

----------------------------------------------------

Is this ridiculous or what? Not only the quick change of tune, but the
insistence on the diagnostic utility's ability to fix a problem. I have
never seen a drive be magically repaired by a manufacturer's utility.
Usually the problem just ends up coming back later, so you get to lose
whatever was stored on that block twice. I knew people who used IBM
75GXP drives in their servers, and when they eventually crapped out, they
would run DFT, it would "fix" the drive, and then they would go on their
merry way, only to have the drive crap out again a few weeks later.
Thank $DEITY for RAID-1! Of course, that doesn't help you if all the
drives in the set are full of holes.

My theory is that the "repair" in any manufacturer's utility simply writes
data to that block that exercises the magnetic field in that area, in an
attempt to shake out problems related to magnetic drift. But what's the
likelihood of normal magnetic drift being a problem on a drive less than 2
years old, compared to an actual media defect which causes the block
mapped at that location not to be able to hold onto its data?

Usually modern drives are doing block remapping behind the scenes too;
when a block has a correctable read error, the drive transparently moves
the data to a spare block and marks the original one bad. You can tell
when this has happened too, because reads that would previously have been
linear end up having intermittent seeks in the middle of the file, where
the drive seeks to the spare block(s) that now hold parts of that
file, and then back to where the rest of the file resides. So what would
the software utility magically accomplish that the drive's defect
management firmware would not?

The manager as well as the person that called back and actually wrote up
the RMA insisted that this policy has been in place for several years now.
I know that's either misinformed or a flat-out lie, because I had an
ATA drive replaced last year without any diagnostic, and the most recent
SCSI replacement was about a month ago, and they didn't want any
diagnostic, and the drive was even out of warranty when they replaced it.
I also fairly clearly remember submitting their web RMA form in the past
without needing a diagnostic code, so the "Diagnostic Code Required" step
in there is a new one on me.

This really wouldn't be a problem if they offered versions of their
diagnostic software that worked natively under Linux or whatever server OS
you are using. Asking to take your busy server down to run a diagnostic
(and then possibly again to install the new drive, without a hotplug
array) is just plain unreasonable in my opinion.

Thoughts? Am I just plain wrong and they have always required this, or
are they really tightening their belts at the expense of their customers'
patience and businesses?

Seagate is looking awful good these days.
 
Ryan Underwood said:
a month ago, I can never remember them insisting on a diagnostic code to
process a warranty. This is a problem because they have no Linux version
of their diagnostic tool, so I would have to take the entire server down
to run the diagnostic on a single failed drive out of an array.
Did you run smartctl? It can self-test modern IDE/SCSI drives.
 
Previously Ryan Underwood said:
Is it just me, or does Maxtor suddenly have a huge stick up their ass with
respect to warranty RMA's? Specifically, having requested replacements
for ATA drives as recently as last year, and SCSI drives as recently as
a month ago, I can never remember them insisting on a diagnostic code to
process a warranty. This is a problem because they have no Linux version
of their diagnostic tool, so I would have to take the entire server down
to run the diagnostic on a single failed drive out of an array.

They requested it earlier too, but you can get around this by claiming
"loud noise" or one of the other reasons. However, there is a risk of
them sending your defective drive back.

Presently I have a drive with failed SMART and >2500 reallocated
sectors, which was a pain to blank for privacy. Still, the Maxtor
utility (it uses Windows only to create a bootable DOS floppy; email
me if you want a current sector image that you can put on a floppy
with Linux) needs to do a full scan to give me a code. The best way to
do this IMO is to remove the disk, put it as the only disk in another PC
and let the tests run there. If it offers to fix bad blocks, let it
and run the test again. At least I have had success with this and got
the code on the second run. If this fails, you can let it run a
burn-in test until it finally decides the disk is dead.

However I have to add that sometimes (have seen this now 1 time in
maybe 50 disks total), the defects go away and the disk is still
perfectly fine months later.
I tried to do the RMA on their website, but got hung up with the
diagnostic code requirement. Remembering that WDC used to require a
diagnostic code for web RMA's but didn't bother you if you did a phone
RMA, I called up. Well, the lackey I got insisted on the diagnostic code
giving no particular reason why, even when I explained the situation, so I
asked to speak to his manager. After 10 minutes on hold, the
conversation went mostly like this:

[conversation snipped]
Is this ridiculous or what? Not only the quick change of tune, but the
insistence on the diagnostic utility's ability to fix a problem. I have
never seen a drive be magically repaired by a manufacturer's utility.
Usually the problem just ends up coming back later, so you get to lose
whatever was stored on that block twice. I knew people who used IBM
75GXP drives in their servers, and when they eventually crapped out, they
would run DFT, it would "fix" the drive, and then they would go on their
merry way, only to have the drive crap out again a few weeks later.
Thank $DEITY for RAID-1! Of course, that doesn't help you if all the
drives in the set are full of holes.
My theory is that the "repair" in any manufacturer's utility simply writes
data to that block that exercises the magnetic field in that area, in an
attempt to shake out problems related to magnetic drift.

No. It just writes data, the drive remembers the sector as bad and
reallocates it. You can get the same effect by just blanking the drive,
e.g. with dd_rescue /dev/zero /dev/hd<x>. You can also overwrite the
files the defects are in or overwrite the sectors directly with
dd or dd_rescue, if you are brave. The defects will be gone
afterwards. Still, more may turn up if the drive has a problem.
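
For anyone who wants to try the single-sector overwrite, here is a rough sketch with plain dd. It assumes 512-byte sectors and an LBA taken from something like the SMART error log. zero_sector_cmd is a made-up helper name, and /dev/hdX is a placeholder. The helper only prints the command so you can check it before running it yourself as root; the overwrite destroys whatever was in that sector.

```shell
# Hypothetical helper: print (do not run) a dd command that overwrites one
# 512-byte sector with zeros. Assumes the drive uses 512-byte sectors.
zero_sector_cmd() {
    dev=$1   # target device, e.g. /dev/hdX (placeholder)
    lba=$2   # sector number, decimal or 0x-prefixed hex
    # Normalise a hex LBA (as printed in SMART error logs) to decimal.
    lba=$(printf '%d' "$lba")
    echo "dd if=/dev/zero of=$dev bs=512 seek=$lba count=1"
}

# Example with an illustrative hex LBA:
zero_sector_cmd /dev/hdX 0x000110f0
# prints: dd if=/dev/zero of=/dev/hdX bs=512 seek=69872 count=1
```

Printing instead of executing is deliberate: a wrong device name or LBA here silently wipes good data.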
But what's the
likelihood of normal magnetic drift being a problem on a drive less than 2
years old, compared to an actual media defect which causes the block
mapped at that location not to be able to hold onto its data?
Usually modern drives are doing block remapping behind the scenes too;
when a block has a correctable read error, the drive transparently moves
the data to a spare block and marks the original one bad. You can tell
when this has happened too, because reads that would previously have been
linear end up having intermittent seeks in the middle of the file, where
the drive seeks to the spare block(s) that now hold parts of that
file, and then back to where the rest of the file resides. So what would
the software utility magically accomplish that the drive's defect
management firmware would not?

It is just a front-end that lets you write to the sectors. Until
you do, the sectors show up as defective on reads.
The manager as well as the person that called back and actually wrote up
the RMA insisted that this policy has been in place for several years now.

It has been, AFAIK. At least for ATA. I never had a Maxtor SCSI drive.
I know that's either misinformed or a flat-out lie, because I had an
ATA drive replaced last year without any diagnostic, and the most recent
SCSI replacement was about a month ago, and they didn't want any
diagnostic, and the drive was even out of warranty when they replaced it.
I also fairly clearly remember submitting their web RMA form in the past
without needing a diagnostic code, so the "Diagnostic Code Required" step
in there is a new one on me.
This really wouldn't be a problem if they offered versions of their
diagnostic software that worked natively under Linux or whatever server OS
you are using. Asking to take your busy server down to run a diagnostic
(and then possibly again to install the new drive, without a hotplug
array) is just plain unreasonable in my opinion.

As I said, the tool is native DOS. You can get good diagnostics by
doing a surface scan (dd_rescue /dev/hdx /dev/null), by querying
the SMART status (ATA only on Linux, I think) and by looking at the SCSI
defects map and count. On ATA you can also do short and long
SMART self-tests from Linux and look at reallocated sectors
and other stats. I use "smartctl" for this. Works well.
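
The smartctl side of that might look roughly like the sketch below. The commands in the comments use standard smartctl options with /dev/hdX as a placeholder. smart_health_failed is a made-up helper that assumes the ATA-style wording of the health line; SCSI drives report health differently, so it would need adjusting there.

```shell
# Commands to run as root against the suspect drive (/dev/hdX is a placeholder):
#   smartctl -H /dev/hdX            # overall health verdict
#   smartctl -A /dev/hdX            # attribute table (reallocated sectors etc.)
#   smartctl -t short /dev/hdX      # start a short self-test
#   smartctl -t long /dev/hdX       # start a long self-test
#   smartctl -l selftest /dev/hdX   # read the self-test log afterwards

# Hypothetical helper: read `smartctl -H` output on stdin and succeed (exit 0)
# only if an ATA drive reports a failed health assessment.
smart_health_failed() {
    grep -q 'test result: FAILED'
}

# Usage (as root):
#   smartctl -H /dev/hdX | smart_health_failed && echo "hdX is failing"
```

The self-tests run inside the drive while the system stays up, which is the whole point when the box cannot be taken down.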
Thoughts? Am I just plain wrong and they have always required this, or
are they really tightening their belts at the expense of their customers'
patience and businesses?

The code thing is just a show-stopper for people who complain even though
their drives are fine. As far as I know there are a lot of them. And,
as I said, I had one disk with 100 defects that refused to get more
defects despite a week of hourly surface scans. It still works well
and it did not get noticeably slower. My theory is that the problem was
caused by vibration or a power fluctuation.

If the drive is dying, you will get that code. Just remove it, put in
your cold spare and run the PowerMax from its bootable floppy on some
unused computer to get that code. If the drive survives several days
of "burn-in test", it is probably fine. Otherwise you get the
code.

Side note: Maxtor advance RMA is a good idea, since you get nice
packaging and it is pretty fast these days (at least in Europe).
Seagate is looking awful good these days.

Not really better IMO. Not until they have the longer warranties in
effect. At the moment it is still one year for all those ATA disks, I
believe. However when they put the announced five years (I think that
was it) in effect, I need to think about it again.

Regards,
Arno
 
Is it just me, or does Maxtor suddenly have a huge stick up their ass with
respect to warranty RMA's? Specifically, having requested replacements
for ATA drives as recently as last year, and SCSI drives as recently as
a month ago, I can never remember them insisting on a diagnostic code to
process a warranty. This is a problem because they have no Linux version
of their diagnostic tool, so I would have to take the entire server down
to run the diagnostic on a single failed drive out of an array.

I tried to do the RMA on their website, but got hung up with the
diagnostic code requirement. Remembering that WDC used to require a
diagnostic code for web RMA's but didn't bother you if you did a phone
RMA, I called up. Well, the lackey I got insisted on the diagnostic code
giving no particular reason why, even when I explained the situation, so I
asked to speak to his manager. After 10 minutes on hold, the
conversation went mostly like this:

It's easier for them to repair the drive if you have previously run
the diagnostic tool. They know what happened and what is to be changed.

Nick
 
Did you run smartctl? It can self-test modern IDE/SCSI drives.

Yes, and the SMART self-test showed a failure. This was what alerted me
to the problem via email (then confirmed with uncorrectable read errors
from dmesg when the RAID was syncing). Unfortunately, the first tech I
talked to had no idea what I was talking about when I mentioned this, so I
didn't bother pursuing it. Pretty sad for a storage company, don't you
think?
 
They requested it earlier too, but you can get around this by claiming
"loud noise" or one of the other reasons. However, there is a risk of
them sending your defective drive back.

Has this actually happened to anyone?
Presently I have a drive with failed SMART and >2500 reallocated
sectors, which was a pain to blank for privacy.

Yes, this is another problem with transparent block remapping.
No. It just writes data, the drive remembers the sector as bad and
reallocates it.

But how would it do the remapping when writing? Aren't blocks remapped
when errors occur while they are _read_, not when written?
You can get the same effect by just blanking the drive,
e.g. with dd_rescue /dev/zero /dev/hd<x>. You can also overwrite the
files the defects are in or overwrite the sectors directly with
dd or dd_rescue, if you are brave. The defects will be gone
afterwards. Still, more may turn up if the drive has a problem.

Wow. So those utilities really don't do anything special at all. That is
excellent information.
It has been, AFAIK. At least for ATA. I never had a Maxtor SCSI drive.

It definitely wasn't for the Quantum SCSI drive I returned. I returned an
ATA drive last year which was randomly doing something odd which would
cause it to whine while the transfer rate went very low, and then return
to normal. When this became common, I had a RMA issued, and I don't
remember having to get a diagnostic code. It's a good thing, because the
drive never actually failed, so I would not have gotten any code from the
utility probably.
As I said, the tool is native DOS. You can get good diagnostics by doing
a surface scan (dd_rescue /dev/hdx /dev/null), by querying the SMART
status (ATA only on Linux, I think)

no, some SCSI drives support SMART too (probably newer ones):
# smartctl -a /dev/sda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: IBM DDYS-T18350N Version: S96H
Serial number: 4EGD8217
Device type: disk
Transport protocol: Fibre channel (FCP-2)
Local Time is: Fri Sep 3 01:51:21 2004 CDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 42 C
Drive Trip Temperature: 85 C
Manufactured in week 06 of year 2001
Current start stop count: 180 times
Recommended maximum start stop count: 10000 times
[...]
On ATA you can also do short and long SMART self-tests
from linux and look at reallocated sectors and other stats. I use
"smartctl" for this. Works well.

Yes, I forgot to mention:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
[...]
7 Seek_Error_Rate 0x000b 001 001 023 Pre-fail Always
FAILING_NOW 23
[...]
04 59 26 f0 10 01 e0 Error: ABRT 38 sectors at LBA = 0x000110f0 = 69872
40 59 26 f0 10 01 e0 Error: UNC 38 sectors at LBA = 0x000110f0 = 69872
....etc .

I didn't mention this because the first person I talked to hadn't had any
clue what SMART or Linux was or why a SMART failure meant anything, only
insisting on their own diagnostic utility. I probably should have pursued
this point further, but oh well. I was notified (by smartd email) that
the disk was failing, which was quite useful.
If the drive is dying, you will get that code. Just remove it, put in
your cold spare

Ha! I'm supposed to have one of those? :)

Not having a spare on hand was part of the problem and why it was
necessary that they address the problem soon.
Side note: Maxtor advance RMA is a good idea, since you get nice
packaging and it is pretty fast these days (at least in Europe).

Yes, it is convenient because you can ship the failed drive back in the
same packaging.
Not really better IMO. Not until they have the longer warranties in
effect. At the moment it is still one year for all those ATA disks, I
believe. However when they put the announced five years (I think that
was it) in effect, I need to think about it again.

That is what my comment was intended to imply. If they have a no hassle
warranty and high performance disks, they will be within my consideration.

Thanks for the reply!
 
Yes, and the SMART self-test showed a failure. This was what alerted me
to the problem via email (then confirmed with uncorrectable read errors
from dmesg when the RAID was syncing). Unfortunately, the first tech I
talked to had no idea what I was talking about when I mentioned this, so I
didn't bother pursuing it. Pretty sad for a storage company, don't you
think?

Rather pretty sad for the average customer. I think they match
1st-level support staff to the average customer. If you call
Cisco router support, e.g., you have an engineer on the
phone within minutes. But most of their customers
have some level of technical competence.

The sad thing is that as a customer with technical competence,
you have trouble getting through the "idiot-filter".

Arno
 
Has this actually happened to anyone?

I did this successfully twice. But that was more than a year ago.
Yes, this is another problem with transparent block remapping.
Indeed.
But how would it do the remapping when writing? Aren't blocks remapped
when errors occur while they are _read_, not when written?

There are two cases where the drive can detect errors on writing:
1. It already knows the sector to be bad from past reads. (It
only reallocates on ECC-recovered reads, not on ones that
failed. Good strategy, since different conditions or more retries
may still recover that sector.)
2. It has trouble finding the sector or reading the sector header. (Even
"no-id" drives have a small sector header for synchronisation.)
Wow. So those utilities really don't do anything special at all. That is
excellent information.

Yes, I was surprised as well. And the "full read scan" just does a
"long" SMART self test and produces this additional code Maxtor
support seems to be so fond of.
It definitely wasn't for the Quantum SCSI drive I returned. I returned an
ATA drive last year which was randomly doing something odd which would
cause it to whine while the transfer rate went very low, and then return
to normal. When this became common, I had a RMA issued, and I don't
remember having to get a diagnostic code. It's a good thing, because the
drive never actually failed, so I would not have gotten any code from the
utility probably.

Possibly different procedure for Quantum. I never had Quantum after I
saw several of those fail in Suns during my diploma thesis work.
no, some SCSI drives support SMART too (probably newer ones):
# smartctl -a /dev/sda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: IBM DDYS-T18350N Version: S96H
Serial number: 4EGD8217
Device type: disk
Transport protocol: Fibre channel (FCP-2)
Local Time is: Fri Sep 3 01:51:21 2004 CDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature: 42 C
Drive Trip Temperature: 85 C
Manufactured in week 06 of year 2001
Current start stop count: 180 times
Recommended maximum start stop count: 10000 times
[...]


Nice. I will remember that.

Yes, I forgot to mention:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
[...]
7 Seek_Error_Rate 0x000b 001 001 023 Pre-fail Always
FAILING_NOW 23
[...]
04 59 26 f0 10 01 e0 Error: ABRT 38 sectors at LBA = 0x000110f0 = 69872
40 59 26 f0 10 01 e0 Error: UNC 38 sectors at LBA = 0x000110f0 = 69872
...etc .
I didn't mention this because the first person I talked to hadn't had any
clue what SMART or Linux was or why a SMART failure meant anything, only
insisting on their own diagnostic utility. I probably should have pursued
this point further, but oh well. I was notified (by smartd email) that
the disk was failing, which was quite useful.

I also have smartd running on most of my disks. _Very_ useful indeed.
I also have monitoring on "reallocated sectors" which is one of
the early warning signs.
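
One possible way to watch that counter from a script is sketched below. It assumes the usual ten-column ATA attribute table that `smartctl -A` prints, with the raw value in the last column. reallocated_count is a made-up helper name and /dev/hdX is a placeholder.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# value of the Reallocated_Sector_Ct attribute (tenth column of its row).
reallocated_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Usage (as root), e.g. from a cron job:
#   count=$(smartctl -A /dev/hdX | reallocated_count)
#   [ "${count:-0}" -gt 0 ] && echo "hdX: $count reallocated sectors"
```

A growing count between runs is the thing to look for; a small, stable number can be harmless.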
Ha! I'm supposed to have one of those? :)

What, you mean you don't? I feel for you, I truly do.
Not having a spare on hand was part of the problem and why it was
necessary that they address the problem soon.
Yes, it is convenient because you can ship the failed drive back in the
same packaging.
That is what my comment was intended to imply. If they have a no hassle
warranty and high performance disks, they will be within my consideration.
Thanks for the reply!

You are welcome.

Arno
 
Ryan Underwood said:
Has this actually happened to anyone?


Yes, this is another problem with transparent block remapping.

Oh? What problem would that be?

[snip]
 