IDNF at just over 130 GB, multiple drives (spooky stuff)


Frantisek.Rysanek

Dear all,

this is just a short note of something that resembles a UFO sighting.

In our daily practice at an industrial/embedded HW assembly shop,
we use a Linux live CD (or NFS boot) to test outgoing hardware.
It's based on a Fedora 5 user space (selected bits and pieces)
combined with a more recent kernel, such as 2.6.22.6 with some
light patches. The distro contains some simple in-house utils and
scripts to generate load. It's been a fairly solid test suite for
the past few years.

In the last month or so, I've met three notebook 2.5" disk drives that
exhibit IDNF at LBA address of slightly over 268 000 000 (just over
130 GB). The first two drives were a Hitachi 500GB model, bought from
different distributors, with quite different serial numbers, likely
from different manufacturing batches, both throwing IDNF at about 268
000 200. The last one that failed just tonight, is a 160GB Seagate
drive - gave an IDNF at about 268 335 000. All the drives are SATA.
Tested in different motherboards with Intel chipsets (ICH on-chip SATA
HBA). The Hitachi drives would at least report the error in their
SMART log (visible via smartctl). The Seagate drive doesn't show the
error in the SMART log, it was only returned via the host interface at
runtime.

The particular test where this failed is a read-only continuous
sequential test across the whole surface of the drive, followed by a
10 minute random seeks test, the two tests looped ad infinitum. This
in-house util has been tested up to 20 TB RAID volumes on various i386
machines, so there's no reason for it to fail on a 160GB disk drive!

Note that the IDNF error is clearly reported by the disk drive, with
the sector's LBA number reproduced in the error response along with
the error code (IDNF) - so this doesn't seem like a parity error on
the SATA cable, garbled LBA address coming from the driver or
something like that.

Makes me wonder if I've just discovered a pattern. Yeah, too few
observations to draw statistical conclusions, I know. Various crackpot
conspiracy theories spring to my mind... I'm writing this just in case
someone had inside knowledge of some common problem in this area, and
wasn't gagged by an NDA :-)

Frank Rysanek
 

Here is a thought: the place is just when you need to go from
20 bit sector numbers to 21 bit. Possibly there is a limitation
in the size of block transfers you can do or it is some sort
of "synergy" between multi-block transfers and the offset.
I seem to remember there were some drives on the market that
had a firmware bug on very large reads that resulted in
ID not found.

Things you can try:
- Disable DMA (hdparm -d 0)
- Disable readahead
- Change other basic parameters, as far as possible
- Change parameters for the SATA controller
- Use a different kernel, this may also be a libata issue.

Arno
 

If the OP would like to pursue the theory that the behaviour may be
related to a 20-bit to 21-bit transition, then he could try reading
sequentially in the opposite direction, ie from the topmost LBA down
to LBA 0.
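
A minimal sketch of such a top-down pass, in Python rather than in the OP's own util. The function names `chunks_descending` and `read_descending` are made up for illustration, and a 512-byte sector size is assumed. Note that plain reads like these still go through the page cache, so the kernel may merge or extend them before they reach the drive.

```python
import os

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def chunks_descending(total_sectors, chunk_sectors=128):
    """Yield (start_lba, count) pairs from the top of the drive down to LBA 0.

    Chunks stay aligned to chunk_sectors; only the topmost one may be short.
    """
    end = total_sectors
    while end > 0:
        start = ((end - 1) // chunk_sectors) * chunk_sectors
        yield start, end - start
        end = start


def read_descending(path, chunk_sectors=128):
    """Read a device (or file) backwards, one chunk at a time."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        for start, count in chunks_descending(size // SECTOR_BYTES, chunk_sectors):
            os.pread(fd, count * SECTOR_BYTES, start * SECTOR_BYTES)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Tiny 300-sector "drive" just to show the order of the requests.
    for start, count in chunks_descending(300):
        print(f"read {count:3d} sectors at LBA {start}")
```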

- Franc Zabkar
 
Thanks for your responses, both of you :-)

Before reading your postings, I've received a first replacement drive
from RMA today - and ran it through my test.
And guess what, same crash. Here are some data captures:

http://www.fccps.cz/download/adv/frr/drive1.dmesg.txt
http://www.fccps.cz/download/adv/frr/drive1.smartctl.txt

The hex interpretation of the LBA address seems interesting:
IDNF at LBA = 0x0fffff00 = 268435200

To me that's a 28bit boundary, rather than 20bit - but either way
the "overflow" explanation would seem plausible, though I seem to
recall a slightly different address from one past occasion.
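
The arithmetic behind that observation can be checked with a few lines of Python. This is only a sanity check of the numbers quoted above; the helper name `describe_lba` is made up, and 512-byte sectors are assumed.

```python
SECTOR_BYTES = 512        # assumption: 512-byte logical sectors
LBA28_SECTORS = 1 << 28   # number of sectors addressable with 28 bits


def describe_lba(lba):
    """Return a few derived numbers for a sector address."""
    return {
        "hex": f"0x{lba:08x}",
        "sectors_below_28bit_limit": LBA28_SECTORS - lba,
        "byte_offset": lba * SECTOR_BYTES,
        "offset_gib": lba * SECTOR_BYTES / 2**30,
    }


if __name__ == "__main__":
    for key, value in describe_lba(268435200).items():
        print(key, value)
```

The failing address sits exactly 256 sectors below 2^28, i.e. one maximum-length 28-bit transfer short of the 128 GiB (about 137 GB) mark.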

My test util defaults to a transaction size of 64 kB = 128 sectors.
That's not an extremely long transaction, and it's power of 2 aligned
to the whole physical drive's starting address (sector 0).
So that failing LBA address is the starting sector of my
transaction :-)

This is the util that I'm using:

http://www.fccps.cz/download/adv/frr/hddtest-1.0.tgz

It's not a particularly clean piece of code,
but it doesn't seem to have obvious overflow bugs.
Moreover, the offending offset is acknowledged by the disk drive,
and the hitachi drives even report it in the SMART log :-)

The three Hitachi drives do this fairly reliably during the first pass
of the test. The one odd Seagate drive only did it once,
after many passes of the test, and didn't log that in its SMART log
(which is certainly not a proof it didn't happen).

I may indeed modify the code to actually read the disk
"inside out" - when I have some time...

Frank Rysanek
 
This is the portion of the SMART log that caught my eye:

=================================================================
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 51 00 00 ff ff ef Error: IDNF at LBA = 0x0fffff00 = 268435200

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 00 00 ff ff ef 08 00:33:46.600 READ DMA
c8 00 70 90 fe ff ef 08 00:33:46.600 READ DMA
c8 00 90 00 fe ff ef 08 00:33:46.600 READ DMA
c8 00 f8 08 fd ff ef 08 00:33:46.600 READ DMA
c8 00 08 00 fd ff ef 08 00:33:46.600 READ DMA
=================================================================

It seems to me that the sequence of operations is ...

(1) seek to LBA 0x0ffffd00 and read 0x08 sectors
(2) seek to LBA 0x0ffffd08 and read 0xf8 sectors
(3) seek to LBA 0x0ffffe00 and read 0x90 sectors
(4) seek to LBA 0x0ffffe90 and read 0x70 sectors
(5) seek to LBA 0x0fffff00 and read 0x00 sectors
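
For anyone who wants to repeat this decoding on their own smartctl output, here is a small sketch. It assumes the column layout shown in the log above (CR FR SC SN CL CH DH) and 28-bit addressing, where the low nibble of DH holds LBA bits 27..24. The function names are made up for illustration.

```python
def decode_lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the taskfile registers."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def decode_row(row):
    """Parse one 'CR FR SC SN CL CH DH ...' log row into (lba, raw_sc)."""
    cr, fr, sc, sn, cl, ch, dh = (int(x, 16) for x in row.split()[:7])
    return decode_lba28(sn, cl, ch, dh), sc


# Rows copied from the SMART log above, newest first.
LOG_ROWS = [
    "c8 00 00 00 ff ff ef 08 00:33:46.600 READ DMA",
    "c8 00 70 90 fe ff ef 08 00:33:46.600 READ DMA",
    "c8 00 90 00 fe ff ef 08 00:33:46.600 READ DMA",
    "c8 00 f8 08 fd ff ef 08 00:33:46.600 READ DMA",
    "c8 00 08 00 fd ff ef 08 00:33:46.600 READ DMA",
]

if __name__ == "__main__":
    # Print oldest first, i.e. in the order the commands were issued.
    for row in reversed(LOG_ROWS):
        lba, sc = decode_row(row)
        print(f"LBA 0x{lba:08x}, SC field 0x{sc:02x}")
```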

If I'm correct, then why would the sector count vary for each
operation, and why would the last operation have an SC of 0?

- Franc Zabkar
 
> My test util defaults to a transaction size of 64 kB = 128 sectors.
> That's not an extremely long transaction, and it's power of 2 aligned
> to the whole physical drive's starting address (sector 0).
> So that failing LBA address is the starting sector of my
> transaction :-)

It seems to me that the Hitachi SMART log shows events happening on
256 sector boundaries. Perhaps you could try "aligning" or
"synchronising" your software and hardware using a block size of 0x100
sectors. Perhaps this behaviour is a result of a caching bug in the
Linux kernel or some kind of buffering issue???

- Franc Zabkar
 
Previously Franc Zabkar said:
> On 4 Dec 2008 09:47:18 GMT, Arno Wagner put finger to
> keyboard and composed:
> If the OP would like to pursue the theory that the behaviour may be
> related to a 20-bit to 21-bit transition, then he could try reading
> sequentially in the opposite direction, ie from the topmost LBA down
> to LBA 0.

Good idea. Linux dd_rescue has an option for that.

Arno
 
Previously said:
> Thanks for your responses, both of you :-)
> Before reading your postings, I've received a first replacement drive
> from RMA today - and ran it through my test.
> And guess what, same crash. Here are some data captures:
>
> The hex interpretation of the LBA address seems interesting:
> IDNF at LBA = 0x0fffff00 = 268435200
> To me that's a 28bit boundary, rather than 20bit - but either way
> the "overflow" explanation would seem plausible, though I seem to
> recall a slightly different address from one past occasion.

Ah, yes, if these are sector numbers. And old LBA incidentally is
limited to 28 bit. That was exactly the issue I read about, some
LBA48 capable drives having problems when you read over the 28 bit
mark while using 28 bit addresses right before.

> My test util defaults to a transaction size of 64 kB = 128 sectors.
> That's not an extremely long transaction, and it's power of 2 aligned
> to the whole physical drive's starting address (sector 0).
> So that failing LBA address is the starting sector of my
> transaction :-)

Yes, but I suspect the read-ahead as the issue, which will
make your reads longer.

> This is the util that I'm using:
>
> It's not a particularly clean piece of code,
> but it doesn't seem to have obvious overflow bugs.
> Moreover, the offending offset is acknowledged by the disk drive,
> and the hitachi drives even report it in the SMART log :-)
> The three Hitachi drives do this fairly reliably during the first pass
> of the test. The one odd Seagate drive only did it once,
> after many passes of the test, and didn't log that in its SMART log
> (which is certainly not a proof it didn't happen).
> I may indeed modify the code to actually read the disk
> "inside out" - when I have some time...

Possibly you just need to go to single sector reads or the like around
the 28 bit mark.

Arno
 
> It seems to me that the Hitachi SMART log shows events happening on
> 256 sector boundaries. Perhaps you could try "aligning" or
> "synchronising" your software and hardware using a block size of 0x100
> sectors. Perhaps this behaviour is a result of a caching bug in the
> Linux kernel or some kind of buffering issue???

I suspect the read-ahead.

Arno
 
> seek to LBA 0x0fffff00 and read 0x00 sectors
>
> ... why would the last operation have an SC of 0?

According to the ATA spec, a sector count of zero indicates that 256
sectors are to be transferred.
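
Applied to the failing command, that rule gives the following. This is a small sketch for the 28-bit READ DMA case with its 8-bit sector count field; the function names are made up for illustration.

```python
def sectors_requested(sc):
    """8-bit sector count field of a 28-bit command: 0 stands for 256."""
    return 256 if sc == 0 else sc


def last_lba(start_lba, sc):
    """Address of the last sector touched by the request."""
    return start_lba + sectors_requested(sc) - 1


if __name__ == "__main__":
    end = last_lba(0x0FFFFF00, 0x00)
    print(f"last sector of the failing read: 0x{end:08x}")
```

So the failing READ DMA asked for 256 sectors, and its last sector is the very top of the 28-bit address space.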

- Franc Zabkar
 
> I suspect the read-ahead.
>
> Arno

I don't pretend to understand much about the inner workings of the ATA
command set, but it appears that the Hitachi drive's SMART log is
indicating that it was being accessed using the READ DMA (C8h) command
which is limited to 28-bit LBAs. The corresponding command for 48-bit
LBA mode is READ DMA EXT (25h). It appears that Linux is defaulting to
28-bit LBA mode and switching to 48-bit mode as the need arises.
Perhaps reading the drive "from inside out" would force Linux to start
and finish in 48-bit mode and thus avoid any transitioning problems???

- Franc Zabkar
 
I have searched a bit, and this may indeed be a libata problem.
I think disks _should_ be able to read over the 28 bit boundary
even when the start of the read was specified in 28 bit and
the length of the read (number of sectors) takes them over,
but some are not. This may be standards-conforming behaviour,
but it certainly is non-robust.

It seems that in some 2.6.26 kernels, people observed a similar
problem:

http://www.gossamer-threads.com/lists/linux/kernel/985985?page=last

If this is the issue, the fix would be to go at least to 2.6.26.7
or upgrade to 2.6.27 (which runs fine on several machines here).

Arno
 
Wow, thanks for the incredibly detailed analysis, everybody :-)

Maybe if I open the device with O_DIRECT, the libata + SCSI
+ block layer + VM in Linux will be prevented from reorganizing
the reads. Thanks for pointing out those size rearrangements.

Other than that, I have one more reason to upgrade my kernel
- I'm not up to hacking it to use READ DMA EXT *always*.

BTW, it seems to me that 0x0fffff00 + 256 is exactly within the 28bit
address range, but obviously in principle I can't rule out any "off by
one" misbehavior, or a clash with drive's internal read-ahead size,
as Mr. Wagner has pointed out.
So I should at least add an option to my tests to avoid
the "troublesome behavior" on my part...

Thanks again for your help :-)

Frank Rysanek
 
> http://www.gossamer-threads.com/lists/linux/kernel/985985?page=last
>
> If this is the issue, the fix would be to go at least to 2.6.26.7
> or upgrade to 2.6.27 (which runs fine on several machines here).
>
> Arno

ahaa, should've read that before responding :-)
That's just wonderful... and there's even a one-liner patch
that fixes all the occurrences of lba_28_ok(). Great :-)
I could come up with a local fix to drivers/ata/libata-core.c,
consisting in hard-wiring LBA48 for drives greater than 130 GB,
but this is a more systematic solution.

And it's not Linux-specific. Your link even points to the minimum
Intel ICH driver version for Windows (8.2.0.1001). Exactly what I needed.

Thank you :-)

Frank Rysanek
 
Maybe a summary would be appropriate:

The problem is not related to drive-internal read-ahead.
It's simply that sector 0x0fffffff cannot be read using
LBA28. The last legal LBA28 offset is 0x0ffffffe.
So maybe this should be considered an "off-by-one" error in the ATA
spec :-)

The problem is not Linux-specific. In Windows, the minimum okay
Intel ICH driver version is 8.2.0.1001, its iastor.sys has a timestamp
of 7 May 2008 (so in Windows, this is also HW-specific to Intel
chipsets).
In FreeBSD, reportedly this has been fixed sometime back in 2004.

Based on the Hitachi note, this behavior is inherent to many,
if not all, recent Hitachi Deskstar/Travelstar drive models.
Hitachi refuse to do anything about this behavior in their firmware,
stating that it's standards-based.

My one odd IDNF error with a Seagate drive may be some random
glitch. Based on all my past observations, Seagate drives generally
don't exhibit this problem. Makes me wonder how to collect more
information about this :-) Write a specific selective stress test
maybe...
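
One possible shape for such a selective test, as a rough Python sketch rather than anything from hddtest: generate read requests clustered around the 28-bit mark and record which ones fail. The names `boundary_requests` and `run_requests`, the window size and the 512-byte sector size are all assumptions. Reads issued this way pass through the page cache and block layer, so the sizes seen by the drive may differ from the ones requested.

```python
import os

LBA28_END = 1 << 28  # first sector address that no longer fits in 28 bits


def boundary_requests(max_lba, window=1024, counts=(1, 8, 128, 256)):
    """Return (start_lba, count) reads within +/- window sectors of 2^28."""
    lo = max(0, LBA28_END - window)
    hi = min(max_lba + 1, LBA28_END + window)
    reqs = []
    for count in counts:
        for start in range(lo, hi, count):
            if start + count <= max_lba + 1:
                reqs.append((start, count))
    return reqs


def run_requests(path, reqs, sector_bytes=512):
    """Issue the reads and collect (start_lba, count, errno) for failures."""
    failures = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for start, count in reqs:
            try:
                os.pread(fd, count * sector_bytes, start * sector_bytes)
            except OSError as exc:
                failures.append((start, count, exc.errno))
    finally:
        os.close(fd)
    return failures
```

A drive smaller than the window's lower edge simply yields an empty request list, so the same script can be run on every outgoing unit.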

Frank Rysanek
 
You are welcome. Report back whether it worked out!

Arno
 
It seems so. The first pass of my test is well after the 130GB mark
and cranking away happily :-) I've just patched my old 2.6.22.6 with
the lba_28_ok() patch (remove -1) and that seems to have made all the
difference :-) It seems that this whole LBA28 gotcha is complete news
to our local Hitachi distributors. Oh well... nice to have people like
you still active in the USENET News :-)
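
For readers who have not seen the patch, here is a rough Python model of what "remove -1" changes. This is not the kernel source: the two function names are made up, and the real lba_28_ok() in the kernel may differ in detail. It only illustrates the effect described in this thread, namely that a request ending exactly at sector 0x0fffffff stops being sent as a 28-bit command.

```python
LBA28_END = 1 << 28  # 0x10000000


def lba28_ok_before(block, n_block):
    """Model of the older check: the last sector may be 0x0fffffff."""
    return (block + n_block - 1) < LBA28_END and n_block <= 256


def lba28_ok_after(block, n_block):
    """Model with the '- 1' removed: the last sector may be at most 0x0ffffffe."""
    return (block + n_block) < LBA28_END and n_block <= 256


if __name__ == "__main__":
    # The request from the SMART log: 256 sectors starting at 0x0fffff00.
    for name, check in (("before", lba28_ok_before), ("after", lba28_ok_after)):
        verdict = "28-bit command" if check(0x0FFFFF00, 256) else "48-bit command"
        print(f"{name}: {verdict}")
```

Requests that end lower down, such as 256 sectors at 0x0ffffe00, still qualify for the 28-bit command under both variants.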

Frank Rysanek
 
Previously said:
> It seems so. The first pass of my test is well after the 130GB mark
> and cranking away happily :-) I've just patched my old 2.6.22.6 with
> the lba_28_ok() patch (remove -1) and that seems to have made all the
> difference :-)

Good news. Also good for my ego ;-)

> It seems that this whole LBA28 gotcha is complete news
> to our local Hitachi distributors.

Well, computer equipment is a mess today. High time
the progress slows...

> Oh well... nice to have people like
> you still active in the USENET News :-)

:-)

Arno
 
> BTW, it seems to me that 0x0fffff00 + 256 is exactly within the 28bit
> address range, but obviously in principle I can't rule out any "off by
> one" misbehavior,

If that's what the spec says then it's not misbehavior.

> or a clash with drive's internal read-ahead size, as Mr. Wagner has pointed out.

Which was utter bullshit.
Obviously you would never be able to read the last blocks on a hard drive if that were the case.
 