Failure of brand new drive... possibly due to staggered spinup?

Dan Lenski

Hi all,
I've just experienced a mystifying failure of a hard disk that was
literally only one day old. It is a Hitachi Travelstar 5K160 (80 GB,
5400 RPM, SATA) that came with my new Dell laptop. I had installed
Ubuntu Feisty Linux, and everything seemed to be working fine, and I
even checked the S.M.A.R.T. data for the drive and it looked great.

I was playing around with drive power settings using hdparm under
Linux, and I enabled "power-on in standby mode", which is supposed to
enable staggered spin-up. No particular reason, I was just trying it
out. I assumed the effect would be harmless in a single-drive
system. Everything continued to work fine, until I powered off the
computer an hour or so later...

I tried to turn it back on, and BIOS reported failure of the first
disk drive. I tried a variety of rescue CDs and boot disks, to no
avail... I could not get the drive to respond. I then removed the
drive from the laptop and put it in my desktop tower. Again, the
computer was unable to communicate with it, ruling out the possibility
of a drive controller issue. I tried holding the drive in my hand as
it powered up, and I could not feel the characteristic hum of the
motor!

So I'm quite mystified. The coincidence is uncanny, and I've never
had a brand-spanking-new drive fail like this. Is it possible that
enabling "power-on in standby mode" destroyed this drive?? In my
experience, drives in standby mode are still capable of communicating
with the host, so I don't understand what the problem with this drive
could be. Anyone have any advice/anecdotes/explanation?

Dan Lenski
 
Your BIOS can fail to send the START STOP UNIT command to the boot drive.

Or the drive firmware can have a bug in this path, in which case yes, the
drive is killed. But this is unlikely.

Solution:
- attach the disk to a second Linux machine as a _non-primary_ disk.
- boot Linux
- play with "hdparm"

Or:
- find a USB/1394 enclosure for the disk
- install it inside
- attach the box to a Windows machine. What will Windows say?
 
Dan Lenski said:
Hi all, [SNIP]

So I'm quite mystified. The coincidence is uncanny, and I've never
had a brand-spanking-new drive fail like this. Is it possible that
enabling "power-on in standby mode" destroyed this drive?? In my
experience, drives in standby mode are still capable of communicating
with the host, so I don't understand what the problem with this drive
could be. Anyone have any advice/anecdotes/explanation?

Dan Lenski

You've just experienced an early failure. Nothing special, just a fact of
life. The frequency of failures usually follows the 'bathtub' curve.
Relatively many drives fail early in their life due to one weak component
that barely made it through the manufacturing tests. The failure rate drops
over time and remains low for a number of years. Then it goes up again when
the drive components start to wear.
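
To make the shape concrete, here is a small Python sketch of a bathtub-style failure rate built from three terms: a decreasing Weibull hazard for early failures, a constant term for the useful-life period, and an increasing Weibull hazard for wear-out. Every parameter below is made up purely for illustration; none of them is measured data for any real drive.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape-1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years):
    """Sum of three hazard terms; all parameters are illustrative assumptions."""
    infant = weibull_hazard(t_years, shape=0.5, scale=20.0)   # shape < 1: rate falls over time
    useful_life = 0.02                                        # constant background rate
    wear_out = weibull_hazard(t_years, shape=5.0, scale=8.0)  # shape > 1: rate rises over time
    return infant + useful_life + wear_out


if __name__ == "__main__":
    for t in (0.01, 0.5, 3.0, 7.0):
        print(f"{t:5.2f} years: hazard {bathtub_hazard(t):.3f} per year")
```

With these invented numbers the rate is high in the first days, lowest in the middle years, and climbs again late in life, which is all the curve is meant to say.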

Just call Dell and have the drive replaced under warranty.

Rob
 
You've just experienced an early failure. Nothing special, just a fact of
life. The frequency of failures usually follows the 'bathtub' curve.
Relatively many drives fail early in their life due to one weak component
that barely made it through the manufacturing tests. The failure rate drops
over time and remains low for a number of years. Then it goes up again when
the drive components start to wear.

I guess so. It's just... spooky! Is it typical for such early
failures to occur when the drive is power-cycled? I'm going to live
in fear of the "hdparm -s1" option in the future :-)
Just call Dell and have the drive replaced under warranty.

I've done that, after assuaging my conscience that this wasn't my
fault. Well, actually I used their Internet chat tech support...
which was a pleasant surprise since it turns out to be less annoying
than speaking on the phone.

Dan
 
Didn't Google, or some other large company, recently discredit the
bathtub curve failure rate theory?
 
Your BIOS can fail to send the START STOP UNIT command to the boot drive.

Or the drive firmware can have a bug in this path, in which case yes, the
drive is killed. But this is unlikely.

Solution:
- attach the disk to a second Linux machine as a _non-primary_ disk.
- boot Linux
- play with "hdparm"

Tried this. (I actually booted off a USB-drive containing some Linux
utilities since I only had one SATA cable.) When Linux boots, it
complains of an inability to communicate with SATA disk 1. So no /dev/
sd* node ever gets allocated for the disk.

I can see the possibility that BIOS fails to send the appropriate
initialization commands to the drive, knowing how buggy BIOS can be.
But it seems unlikely that *both* BIOS and the Linux kernel would fail
to do so! And from other mailing list posts, I've read that SATA
drives should not have any problem identifying themselves to the host
in standby mode, before spin-up.
Or:
- find a USB/1394 enclosure for the disk
- install it inside
- attach the box to a Windows machine. What will Windows say?

An interesting idea. Though I don't have a SATA enclosure handy, only
an IDE enclosure.

I guess the drive really is just plain dead. I really wish I could
confirm or refute the notion that standby mode did it, though!

Dan
 
Dan said:
I guess the drive really is just plain dead. I really wish I could
confirm or refute the notion that standby mode did it, though!


So get another one and try it again, repeat until you have a
statistically valid sample :-)
 
Dan Lenski said:
Hi all,
I've just experienced a mystifying failure of a hard disk that was
literally only one day old. It is a Hitachi Travelstar 5K160 (80 GB,
5400 RPM, SATA) that came with my new Dell laptop. I had installed
Ubuntu Feisty Linux, and everything seemed to be working fine, and I
even checked the S.M.A.R.T. data for the drive and it looked great.

I was playing around with drive power settings using hdparm under
Linux, and I enabled "power-on in standby mode", which is supposed to
enable staggered spin-up. No particular reason, I was just trying it
out. I assumed the effect would be harmless in a single-drive system.
Everything continued to work fine, until I powered off the
computer an hour or so later...

I tried to turn it back on, and BIOS reported failure of the first
disk drive. I tried a variety of rescue CDs and boot disks, to no
avail... I could not get the drive to respond. I then removed the
drive from the laptop and put it in my desktop tower. Again, the
computer was unable to communicate with it, ruling out the possibility
of a drive controller issue. I tried holding the drive in my hand as
it powered up, and I could not feel the characteristic hum of the motor!

And why should you? You set it up to "power-on in standby mode".
So it does.
So I'm quite mystified. The coincidence is uncanny, and I've never
had a brand-spanking-new drive fail like this. Is it possible that
enabling "power-on in standby mode" destroyed this drive??

Nope, it is just doing what you told it to do: it's in standby until
you tell it to come out of it.
In my experience, drives in standby mode are still capable of communicating
with the host,

And it probably does.
Problem is likely that host doesn't understand why it is in standby mode,
so it fails it.
so I don't understand what the problem with this drive could be.

Most likely none.
Anyone have any advice/anecdotes/explanation?

Most likely your host isn't compatible with power-on in standby mode.
Set the drive back to normal. That may be easier said than done, apparently.
 
So get another one and try it again, repeat until you have a
statistically valid sample :-)

I hadn't planned to get into the hard disk testing business anytime
soon :-)

I'm just worried that there could be some issue with standby mode on
this brand of drive. Having my drive die after 2 days is bad
enough... having it die after 2 months when I have all my work on
there would be a lot worse.

Dan
 
This is as expected. You need to send a

Power-Up In Standby feature set device spin-up.

command to spin up the disk, or a

Disable Power-Up In Standby feature set.

to disable the feature.
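
For reference, both of these are subcommands of the ATA SET FEATURES command (opcode 0xEF): 0x07 for the device spin-up and 0x86 for disabling the feature, with 0x06 being the enable. Below is a minimal Python sketch of how they might be sent through the Linux HDIO_DRIVE_CMD ioctl. The function names are made up for illustration. It assumes the kernel has already created a device node for the drive and that the driver passes the command through, which may well not be true for a drive stuck in this state.

```python
import fcntl
import os

HDIO_DRIVE_CMD = 0x031F          # Linux ioctl number from <linux/hdreg.h>
ATA_OP_SETFEATURES = 0xEF        # ATA SET FEATURES command
SETFEATURES_PUIS_ENABLE = 0x06   # enable Power-Up In Standby
SETFEATURES_PUIS_SPINUP = 0x07   # Power-Up In Standby device spin-up
SETFEATURES_PUIS_DISABLE = 0x86  # disable Power-Up In Standby


def set_features_args(subcommand):
    """Build the 4-byte HDIO_DRIVE_CMD buffer.

    Byte 0 is the command and byte 2 the feature register;
    bytes 1 and 3 are not needed for these subcommands.
    """
    return bytearray([ATA_OP_SETFEATURES, 0, subcommand, 0])


def send_set_features(device_path, subcommand):
    """Hypothetical helper: needs root and an existing device node."""
    args = set_features_args(subcommand)
    fd = os.open(device_path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        fcntl.ioctl(fd, HDIO_DRIVE_CMD, args)  # the kernel writes status back into args
    finally:
        os.close(fd)
    return args
```

Something like `send_set_features("/dev/sda", SETFEATURES_PUIS_DISABLE)` would then be the software equivalent of putting the jumper back, assuming the drive can be reached at all.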
 
And why should you? You set it up to "power-on in standby mode".
So it does.

Indeed. However, I would expect it to come out of standby mode when
addressed by the host :-) For example, under Linux I can put a drive
temporarily into standby with "hdparm -y /dev/sda". However, the
Linux IDE/SATA drivers will bring it out of standby as soon as I try
to access it.
And it probably does.
Problem is likely that host doesn't understand why it is in standby mode,
so it fails it.

Okay. I would believe this if it was only the laptop BIOS that didn't
know what to do. But it's not only the laptop BIOS that can't
initialize it: the BIOS on my desktop can't either, and neither can
the Linux kernel when booting from an external disk.

I certainly think a recent Linux 2.6.20 kernel must know how to deal
with this situation... I've never met another hard drive feature that
the Linux kernel couldn't handle with ease.

Of course, now that I dig around a little more, I find this patch on
the linux-ide mailing list: http://www.mail-archive.com/[email protected]/msg04323.html
Maybe with this patch my kernel will figure out what to do? I'll try
it tonight...
Most likely your host isn't compatible with power-on in standby mode.
Set the drive back to normal. That may be easier said than done, apparently.

Indeed. Is there any utility to do this??

Dan Lenski
 
Indeed. However, I would expect it to come out of standby mode when
addressed by the host :-)

Nope.
It wants/needs to be specifically told. Else any access would wake it up.
For example, under Linux I can put a drive
temporarily into standby with "hdparm -y /dev/sda". However, the
Linux IDE/SATA drivers will bring it out of standby as soon as I try
to access it.

Power-on in standby mode is an altogether different feature.
It's similar to the start unit command of SCSI that is required
if a SCSI drive has been jumpered for autospin disabled.
The difference here is that the jumper has been executed in software
so you have a jumper command and a spinup command.
Svend has mentioned them both already.
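
For comparison, the SCSI side of this analogy is a single 6-byte command descriptor block. A minimal Python sketch of building it follows; the function name is made up for illustration, and only the START and IMMED bits are handled (the eject and power-condition fields are left at zero).

```python
START_STOP_UNIT = 0x1B  # SCSI operation code


def start_stop_unit_cdb(start=True, immediate=False):
    """Build the 6-byte START STOP UNIT CDB."""
    cdb = bytearray(6)
    cdb[0] = START_STOP_UNIT
    cdb[1] = 0x01 if immediate else 0x00  # IMMED: return before the spin-up completes
    cdb[4] = 0x01 if start else 0x00      # START: 1 = spin up, 0 = spin down
    return bytes(cdb)


if __name__ == "__main__":
    print(start_stop_unit_cdb(start=True).hex())
```

The ATA feature splits this into two pieces: a persistent setting (the "jumper") and a separate spin-up command that the host has to know to send.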
Okay. I would believe this if it was only the laptop BIOS that didn't
know what to do. But it's not only the laptop BIOS that can't
initialize it: the BIOS on my desktop can't either, and neither can
the Linux kernel when booting from an external disk.

That's not so surprising at all. Even IBM/Hitachi, who are normally well
equipped (with their Drive Fitness Test and Feature Tool), don't have it in
their toolkits.
I certainly think a recent Linux 2.6.20 kernel must know how to deal
with this situation... I've never met another hard drive feature that
the Linux kernel couldn't handle with ease.

Of course, now that I dig around a little more, I find this patch on
the linux-ide mailing list: http://www.mail-archive.com/[email protected]/msg04323.html
Maybe with this patch my kernel will figure out what to do? I'll try
it tonight...
Indeed. Is there any utility to do this??

Now that you mention it, Svend was experimenting with it.
http://www.partitionsupport.com/advancednotes.htm
 
This is as expected. You need to send a

Power-Up In Standby feature set device spin-up.

command to spin up the disk, or a

Disable Power-Up In Standby feature set.

to disable the feature.

Wow. Just wow. I can hardly believe it, but that worked. Thanks
Svend and Folkert for helping me figure out that the drive wasn't
actually dead.

Issuing those commands to the drive wasn't so easy: I had to apply
Mark Lord's patch (http://www.mail-archive.com/linux-
(e-mail address removed)/msg04323.html) to the 2.6.20 kernel. But lo and
behold, when I booted with that patch, the SETFEATURE_SPINUP command
was sent to the drive, and it began to operate again.

The whole thing is kind of amazing: toggling the "power up in standby"
feature caused the BIOS of *three* desktop computers to pronounce the
drive dead, and to freeze when booting. In order to get past the
BIOS, I had to hotplug the drive at the GRUB boot menu. And the
default 2.6.20 Linux kernel of Ubuntu failed to spin the drive up as
well. Probably the Linux kernel doesn't support this since it expects
the BIOS to have spun the drive up already.
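
For anyone curious how a host can tell that a drive is waiting for this command: as far as I understand the ATA specification, word 2 of the IDENTIFY DEVICE data carries a "specific configuration" value that says so. The Python sketch below decodes it. The function name and the fake IDENTIFY data are made up for illustration, and this is not a copy of the patch's logic, only the kind of check I would expect it to need.

```python
# Word 2 of IDENTIFY DEVICE data, as I read the ATA specification:
# value -> (needs SET FEATURES spin-up, IDENTIFY data is complete)
SPECIFIC_CONFIG = {
    0x37C8: (True, False),
    0x738C: (True, True),
    0x8C73: (False, False),
    0xC837: (False, True),
}


def needs_spinup(identify_words):
    """Return True if the drive reports that it is waiting for the spin-up subcommand."""
    needs, _complete = SPECIFIC_CONFIG.get(identify_words[2], (False, True))
    return needs


if __name__ == "__main__":
    fake_identify = [0] * 256   # stand-in for the 256 words a real drive returns
    fake_identify[2] = 0x37C8   # drive powered up in standby
    print(needs_spinup(fake_identify))
```

A host that sees one of the first two values would send the spin-up subcommand and then, if the data was incomplete, read the IDENTIFY data again.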

So I still have some questions...
* does anyone know of a BIOS that actually *does* know how to spin up
drives that boot in standby?
* why isn't this feature marked as DANGEROUS in the hdparm
manual :-) ?
* is there a way to issue raw commands to a drive from Linux (maybe
via /sys) without recompiling the kernel?

I'd like to make a standalone boot disk to help out other folks who've
bricked their drive in a similar fashion. It'd be great to figure out
a way to do it without a custom kernel.

Wow. This is definitely the strangest hardware/firmware quirk I've
ever encountered... and one of the most time-consuming.

Dan
 
The whole thing is kind of amazing: toggling the "power up in standby"
feature caused the BIOS of *three* desktop computers to pronounce the
drive dead, and to freeze when booting.

A clear sign of bad industry support of this (S)ATA feature, especially for
laptop drives.

For SCSI drives, the SCSI BIOSes have long been able to send START STOP UNIT
(the equivalent SCSI command) at boot, and the drive can be mechanically
jumpered to "no spin at powerup".

This is because spinning up a SCSI drive imposes a significant load on the PSU,
so it is a good idea to delay its spin-up until after the BIOS self-tests,
which reduces the peak PSU load. (S)ATA drives, by contrast, are normally
spun up as soon as they are powered on.

But this is relevant for "heavy" SCSI drives only, not for a laptop drive.
That's why, IMHO, industry support for the feature is poor on (S)ATA.
* why isn't this feature marked as DANGEROUS in the hdparm
manual :-) ?

Hey, it's open source, mark yourself and tell the maintainer :-)
* is there a way to issue raw commands to a drive from Linux (maybe
via /sys) without recompiling the kernel?

Try FreeBSD and "camcontrol".
 
Maxim said:
A clear sign of bad industry support of this (S)ATA feature, especially for
laptop drives.

For SCSI drives, the SCSI BIOSes have long been able to send START STOP UNIT
(the equivalent SCSI command) at boot, and the drive can be mechanically
jumpered to "no spin at powerup".

This is because spinning up a SCSI drive imposes a significant load on the PSU,
so it is a good idea to delay its spin-up until after the BIOS self-tests,
which reduces the peak PSU load. (S)ATA drives, by contrast, are normally
spun up as soon as they are powered on.

But this is relevant for "heavy" SCSI drives only, not for a laptop drive.
That's why, IMHO, industry support for the feature is poor on (S)ATA.

I don't agree on that. Don't forget that SATA drives are also used in
big (and very expensive) storage arrays for low performance high
capacity disk storage.
 
A clear sign of bad industry support of this (S)ATA feature, especially for
laptop drives.

Right. It's about a 3-line addition to the BIOS code, as can be seen
from Mark Lord's libata patch which I linked to. In my opinion, it
*is* a feature which would benefit desktop computers and embedded
systems, since you could save significant load on the PSU by not
spinning up the HD at boot time. For example, my friend has built an
automotive PC and he had problems with it crashing at boot, due to
excessive drain on the car's 12V supply.

Also, I don't think the distinction between 2.5" and 3.5" drives is
relevant here, since they all use the same (S)ATA command set.
Hey, it's open source, mark yourself and tell the maintainer :-)

Oh, believe me, I plan to :-) In my opinion, it is MUCH more
dangerous than the other features marked dangerous. Most of them can
simply crash the OS or lock up the drive until the next reboot.

This one can make the drive appear dead *and* freeze the BIOS.
Try FreeBSD and "camcontrol".

Cool. That's a neat utility. I feel like it ought to be possible to
send some commands via /sys/bus/scsi/devices or something like that...
but it's just a hunch. I'm going to email Mark Lord about his patch
and maybe he'll have a suggestion for that!

I'd also like to poke the freakin' BIOS vendors with a clue stick and
tell them to support this feature... but that's probably a lost cause,
right?

Dan
 
I don't agree on that. Don't forget that SATA drives are also used in
big (and very expensive) storage arrays for low performance high
capacity disk storage.

Right. I assume that's why this drive has the feature. I have heard
that some data centers use arrays of 2.5" disks since they consume
significantly less power, and I'm assuming that's why this feature is
implemented for SATA disks.

Dan
 
Try FreeBSD and "camcontrol".
Cool. That's a neat utility. I feel like it ought to be possible to
send some commands via /sys/bus/scsi/devices or something like that...
but it's just a hunch.

"camcontrol" IIRC can do this.

But, to send SCSI commands to (S)ATA drive in FreeBSD, you need a properly
built kernel - no direct ATA disk driver, but the SCSI-to-ATA bridge driver.
 
Right. I assume that's why this drive has the feature. I have heard
that some data centers use arrays of 2.5" disks since they consume
significantly less power, and I'm assuming that's why this feature is
implemented for SATA disks.

Actually 2.5" SATA drives are used as local disks in blade servers,
where space and power are at a premium. There are also high performance
2.5" disks that are unsuitable for laptops, but AFAIK they are
not available to ordinary customers, just to OEMs.

And yes, I believe you are correct that this is the reason
the feature is present. Another one is that 2.5" disks are
far better at starting fast than 3.5" disks, since on laptops this
is a typical way to save power.

Still, basically the BIOS manufacturers or customizers messed
up badly here.

Arno
 
In comp.sys.ibm.pc.hardware.storage Dan Lenski said:
I'd also like to poke the freakin' BIOS vendors with a clue stick and
tell them to support this feature... but that's probably a lost cause,
right?

Very likely. These people believe they know what they are doing, which
is the worst kind of incompetence.

Arno
 