Frantisek.Rysanek
Dear all,
this is just a short note of something that resembles a UFO sighting.
In our daily practice at an industrial/embedded HW assembly shop,
we're using a Linux live CD (or NFS boot) to test outgoing hardware.
It's based on the Fedora 5 user space (selected bits and pieces)
combined with a more recent kernel, such as 2.6.22.6 with some
light patches. The distro contains some simple in-house utils and
scripts to generate load. It's been a fairly solid test suite for the
past few years.
In the last month or so, I've come across three 2.5" notebook disk
drives that report IDNF (ID Not Found) at an LBA of slightly over
268 000 000 (just over 130 GB). The first two drives were a Hitachi
500GB model, bought from different distributors, with quite different
serial numbers, likely from different manufacturing batches, both
throwing IDNF at about 268 000 200. The last one, which failed just
tonight, is a 160GB Seagate drive - it gave an IDNF at about
268 335 000. All the drives are SATA, tested in different motherboards
with Intel chipsets (ICH on-chip SATA HBA). The Hitachi drives would at
least report the error in their SMART log (visible via smartctl). The
Seagate drive doesn't show the error in the SMART log; it was only
returned via the host interface at runtime.
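For scale, here is a quick back-of-the-envelope conversion of those
LBAs to byte offsets. It is only a sketch: the 512-byte sector size is
an assumption (not stated above), and the 2^28 figure is simply the
28-bit LBA addressing limit, printed for comparison without claiming it
has anything to do with the errors.

```python
SECTOR_BYTES = 512       # assumption: 512-byte logical sectors
LBA28_LIMIT = 2 ** 28    # 268 435 456 sectors addressable with 28-bit LBA


def lba_to_bytes(lba, sector_bytes=SECTOR_BYTES):
    """Return the byte offset of the first byte of the given LBA."""
    return lba * sector_bytes


# Approximate failing LBAs as quoted in the post.
observations = [
    ("Hitachi 500GB (two drives)", 268_000_200),
    ("Seagate 160GB", 268_335_000),
]

for label, lba in observations:
    offset = lba_to_bytes(lba)
    print(
        f"{label}: LBA {lba} -> {offset / 1e9:.1f} GB "
        f"({offset / 2**30:.1f} GiB), "
        f"{LBA28_LIMIT - lba} sectors below 2^28"
    )
```

The LBAs are the rounded values quoted above, so the distances to 2^28
are approximate as well.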
The particular test where this failed is a read-only continuous
sequential pass across the whole surface of the drive, followed by a
10-minute random seek test, with the two tests looped ad infinitum.
This in-house util has been tested on RAID volumes of up to 20 TB on
various i386 machines, so there's no reason for it to fail on a 160GB
disk drive!
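To make the test pattern concrete, here is a minimal sketch of such a
loop. It is not the in-house util itself: the function names, block
sizes and the use of plain buffered-by-the-kernel reads are all
assumptions for illustration. On Linux, a drive-reported read error
would typically surface here as an OSError raised from read().

```python
import random
import time

SEQ_BLOCK = 1024 * 1024   # sequential read size (arbitrary for this sketch)
SEEK_BLOCK = 4096         # random read size (arbitrary for this sketch)


def sequential_pass(path, block=SEQ_BLOCK):
    """Read the whole device or file front to back; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            total += len(chunk)
    return total


def random_seek_pass(path, size, seconds, block=SEEK_BLOCK):
    """Read blocks at random aligned offsets for `seconds`; return read count."""
    reads = 0
    blocks = max(size // block, 1)
    deadline = time.monotonic() + seconds
    with open(path, "rb", buffering=0) as f:
        while time.monotonic() < deadline:
            f.seek(random.randrange(blocks) * block)
            f.read(block)
            reads += 1
    return reads


def run(path, loops=None, seek_seconds=600):
    """Alternate the two passes; loops=None means loop forever."""
    done = 0
    while loops is None or done < loops:
        size = sequential_pass(path)
        random_seek_pass(path, size, seek_seconds)
        done += 1
    return done
```

Calling run() on a block device path (a hypothetical /dev/sdX, say)
would loop until an error is raised. Note that without O_DIRECT some
reads may be served from the page cache rather than the drive, which a
real surface test would want to avoid.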
Note that the IDNF error is clearly reported by the disk drive, with
the sector's LBA number reproduced in the error response along with
the error code (IDNF) - so this doesn't look like a parity error on
the SATA cable, a garbled LBA address coming from the driver, or
something like that.
Makes me wonder if I've just discovered a pattern. Yeah, too few
observations to draw statistical conclusions, I know. Various crackpot
conspiracy theories spring to my mind... I'm writing this just in case
someone has inside knowledge of some common problem in this area, and
isn't gagged by an NDA.
Frank Rysanek