USB disk silent read errors

  • Thread starter: Svend Olaf Mikkelsen

Svend Olaf Mikkelsen

Previously "bit errors" were discussed. That was when wrong content
was read from a disk without error messages or warnings. Typically
there was a pattern. An example might be that one bit position in
each 16-bit word was always returned as 0.


Currently I have an 8 GB USB disk (USB flash drive) with "silent read
errors".

The disk has 15753215 sectors, each 512 bytes.

In Linux using dd, I retrieved the first 262,144,000 bytes:


dd if=/dev/sda of=usbdisk_bs_4096.bin bs=4096 count=64000

dd if=/dev/sda of=usbdisk_bs_65536.bin bs=65536 count=4000


And afterwards in Windows:

fc /b usbdisk_bs_4096.bin usbdisk_bs_65536.bin

gives:

Comparing files usbdisk_linux_bs_4096.bin and
USBDISK_LINUX_BS_65536.BIN

0ED1C000: C1 F8
0ED1C200: 20 1B
0ED1C400: 0B 32
0ED1C600: 6C 53
0ED1C800: 7C 45
0ED1CA00: 20 1B
0ED1CC00: 2E 17
0ED1CE00: 38 3F
0ED1D000: 95 9C
0ED1D200: 00 0B
0ED1D400: 00 09
0ED1D600: E8 E7
0ED1D800: 00 09
0ED1DA00: 80 8B
0ED1DC00: 0F 06
0ED1DE00: E4 E3
0ED1E000: 35 2C
0ED1E200: 0A 11
0ED1E400: 21 38
0ED1E600: 15 0A
0ED1E800: 06 1F
0ED1EA00: 08 13
0ED1EC00: 01 18
0ED1EE00: CF C8
0ED1F000: FF F6
0ED1F200: 10 1B
0ED1F400: 00 09
0ED1F600: DE D1
0ED1F800: C0 C9
0ED1FA00: E8 E3
0ED1FC00: C0 C9
0ED1FE00: 6B 6C

The differing byte is the first byte of each sector, in an area
starting at disk offset 0x0ED1C000. That is 248,627,200 bytes, or
about 237 MiB into the disk.
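
The offset arithmetic above can be checked with a short Python sketch. The 512-byte sector size is taken from the post; the function name describe_offset is only illustrative.

```python
SECTOR_SIZE = 512  # bytes per sector, as stated in the post


def describe_offset(offset: int, sector_size: int = SECTOR_SIZE) -> dict:
    """Break a byte offset into sector number and position inside the sector."""
    sector, byte_in_sector = divmod(offset, sector_size)
    return {
        "bytes": offset,
        "mib": offset / (1024 * 1024),
        "sector": sector,
        "byte_in_sector": byte_in_sector,
    }


info = describe_offset(0x0ED1C000)
print(info)
```

A byte_in_sector value of 0 means the offset is the first byte of its sector.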


Some bytes are wrong but identical in both usbdisk_bs_4096.bin and
usbdisk_bs_65536.bin, so this file compare did not reveal them.
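
A byte-wise compare in the style of fc /b can be sketched in Python as below. This is not how fc is implemented, only an illustration; diff_streams is a hypothetical helper, and the demo uses in-memory data instead of the real image files. Comparing each dd image against a known-good reference rather than against the other image is what would expose errors common to both.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def diff_streams(a: BinaryIO, b: BinaryIO,
                 chunk_size: int = 1 << 20) -> Iterator[Tuple[int, int, int]]:
    """Yield (offset, byte_a, byte_b) for each differing byte.

    Stops at the end of the shorter stream.
    """
    offset = 0
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if not chunk_a or not chunk_b:
            break
        if chunk_a != chunk_b:
            for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                if x != y:
                    yield offset + i, x, y
        offset += min(len(chunk_a), len(chunk_b))


# Demo with synthetic data: two 1024-byte buffers differing at offset 512.
reference = bytes(1024)
damaged = bytearray(reference)
damaged[512] = 0x0B
differences = list(diff_streams(io.BytesIO(reference),
                                io.BytesIO(bytes(damaged)),
                                chunk_size=256))
for off, x, y in differences:
    print(f"{off:08X}: {x:02X} {y:02X}")
```

For real files, the two streams would be opened with open(path, "rb").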

The silent read errors are stable: the results are the same in both
Linux and Windows, on two different machines, and at different times.
However, I did not verify an exact Linux/Windows match, since I do not
know the exact read behavior of Linux/dd.

Sector 0 of the disk was zeroed to make certain that the operating
system would not change the disk content during examination.


I made a file expected.bin with the expected disk content for the
first 262,144,000 bytes, and have:

Comparing files expected.bin and \USBDISK_LINUX_BS_4096.BIN

0ED14800: 69 6C
0ED15800: 4D 44
0ED15A00: 00 0B
0ED15C00: 8B 82
0ED15E00: 0C 0B
0ED16000: C2 DB
0ED16200: 1C 07
0ED16400: C0 D9
0ED16600: 92 8D
0ED16800: 19 00
0ED16A00: 83 98
0ED16C00: 42 5B
0ED16E00: 30 37
0ED17000: 38 31
0ED17200: 5B 50
0ED17400: 00 09
0ED17600: 10 1F
0ED17800: FF F6
0ED17A00: EA E1
0ED17C00: 10 19
0ED17E00: 08 0F
0ED18000: 85 FC
0ED18200: 4C 37
0ED18400: 57 2E
0ED18600: 14 6B
0ED18800: 00 79
0ED18A00: 00 7B
0ED18C00: 20 59
0ED18E00: E9 EE
0ED19000: 10 19
0ED19200: 55 5E
0ED19400: AC A5
0ED19600: A8 A7
0ED19800: 4C 45
0ED19A00: 0F 04
0ED19C00: 75 7C
0ED19E00: 50 57
0ED1A000: 24 3D
0ED1A200: 00 1B
0ED1A400: 48 51
0ED1A600: 00 1F
0ED1A800: FD E4
0ED1AA00: 66 7D
0ED1AC00: 00 19
0ED1AE00: 24 23
0ED1B000: E8 E1
0ED1B200: 00 0B
0ED1B400: 8B 82
0ED1B600: 02 0D
0ED1B800: 24 2D
0ED1BA00: 0B 00
0ED1BC00: 03 0A
0ED1BE00: 7C 7B

Comparing files expected.bin and \USBDISK_LINUX_BS_65536.BIN

0ED14800: 69 6C
0ED15800: 4D 44
0ED15A00: 00 0B
0ED15C00: 8B 82
0ED15E00: 0C 0B
0ED16000: C2 DB
0ED16200: 1C 07
0ED16400: C0 D9
0ED16600: 92 8D
0ED16800: 19 00
0ED16A00: 83 98
0ED16C00: 42 5B
0ED16E00: 30 37
0ED17000: 38 31
0ED17200: 5B 50
0ED17400: 00 09
0ED17600: 10 1F
0ED17800: FF F6
0ED17A00: EA E1
0ED17C00: 10 19
0ED17E00: 08 0F
0ED18000: 85 FC
0ED18200: 4C 37
0ED18400: 57 2E
0ED18600: 14 6B
0ED18800: 00 79
0ED18A00: 00 7B
0ED18C00: 20 59
0ED18E00: E9 EE
0ED19000: 10 19
0ED19200: 55 5E
0ED19400: AC A5
0ED19600: A8 A7
0ED19800: 4C 45
0ED19A00: 0F 04
0ED19C00: 75 7C
0ED19E00: 50 57
0ED1A000: 24 3D
0ED1A200: 00 1B
0ED1A400: 48 51
0ED1A600: 00 1F
0ED1A800: FD E4
0ED1AA00: 66 7D
0ED1AC00: 00 19
0ED1AE00: 24 23
0ED1B000: E8 E1
0ED1B200: 00 0B
0ED1B400: 8B 82
0ED1B600: 02 0D
0ED1B800: 24 2D
0ED1BA00: 0B 00
0ED1BC00: 03 0A
0ED1BE00: 7C 7B
0ED1C000: C1 F8
0ED1C200: 20 1B
0ED1C400: 0B 32
0ED1C600: 6C 53
0ED1C800: 7C 45
0ED1CA00: 20 1B
0ED1CC00: 2E 17
0ED1CE00: 38 3F
0ED1D000: 95 9C
0ED1D200: 00 0B
0ED1D400: 00 09
0ED1D600: E8 E7
0ED1D800: 00 09
0ED1DA00: 80 8B
0ED1DC00: 0F 06
0ED1DE00: E4 E3
0ED1E000: 35 2C
0ED1E200: 0A 11
0ED1E400: 21 38
0ED1E600: 15 0A
0ED1E800: 06 1F
0ED1EA00: 08 13
0ED1EC00: 01 18
0ED1EE00: CF C8
0ED1F000: FF F6
0ED1F200: 10 1B
0ED1F400: 00 09
0ED1F600: DE D1
0ED1F800: C0 C9
0ED1FA00: E8 E3
0ED1FC00: C0 C9
0ED1FE00: 6B 6C
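
One way to look for a bit pattern in such a listing is to parse each line and XOR the expected byte with the byte that was read. The sketch below does that for a few lines copied from the listing above; parse_fc_line is a hypothetical helper, and the 512-byte sector size is taken from the post.

```python
from typing import Tuple

SECTOR_SIZE = 512  # bytes per sector, as stated in the post


def parse_fc_line(line: str) -> Tuple[int, int, int]:
    """Parse one 'OFFSET: AA BB' line of fc /b output into integers."""
    offset_text, bytes_text = line.split(":")
    first, second = bytes_text.split()
    return int(offset_text, 16), int(first, 16), int(second, 16)


# A few lines copied from the expected.bin comparison above.
sample = """\
0ED14800: 69 6C
0ED15800: 4D 44
0ED15A00: 00 0B
0ED18000: 85 FC
0ED1FE00: 6B 6C
"""

rows = [parse_fc_line(line) for line in sample.splitlines()]
for offset, expected, actual in rows:
    print(f"{offset:08X} sector {offset // SECTOR_SIZE} "
          f"byte-in-sector {offset % SECTOR_SIZE} "
          f"xor {expected ^ actual:02X}")
```

Every sampled offset is a multiple of 512, which matches the observation that only the first byte of a sector is affected.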


Before sector 0 was zeroed: Known files were copied correctly in
Windows to a local disk using file system calls.

I tried again: Sector 0 restored, the files copied: All known files
correct.

And in Linux on another PC: All known files copied correctly.

Sector 0 zeroed again.

"Findpart getsect" directly from the location of a file that was
wrong inside the files made with dd: the content was correct.

Findpart getsect for the first 512000 sectors still matched the wrong
content that made me discover the problem, except for differences
which seem to be directory file access dates.

The file system is FAT32. The sector number of cluster 2 is 30772.

My current theory is that the errors are not seen with this disk if
the sector address at which a read starts, modulo 8, is 4. Since the
cluster 2 address 30772 mod 8 is 4, and the cluster size is 8 sectors,
reads that start at cluster addresses will not give errors.
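
The alignment argument can be illustrated with a small Python sketch. The constants come from the post; the helper names are hypothetical. The dd helper only gives the nominal start of each dd block, and the requests that actually reach the device may differ, for example because of kernel caching or read-ahead.

```python
SECTOR_SIZE = 512          # bytes per sector, from the post
FIRST_DATA_SECTOR = 30772  # sector number of cluster 2, from the post
SECTORS_PER_CLUSTER = 8    # cluster size in sectors, from the post


def cluster_to_sector(cluster: int) -> int:
    """First sector of a FAT32 data cluster (cluster numbering starts at 2)."""
    if cluster < 2:
        raise ValueError("data clusters are numbered from 2")
    return FIRST_DATA_SECTOR + (cluster - 2) * SECTORS_PER_CLUSTER


def dd_block_start_sector(block_index: int, bs: int) -> int:
    """Nominal first sector of the given dd block when reading from offset 0."""
    return block_index * bs // SECTOR_SIZE


cluster_residues = {cluster_to_sector(c) % 8 for c in range(2, 1002)}
dd_residues = {dd_block_start_sector(i, bs) % 8
               for bs in (4096, 65536) for i in range(1000)}
print("cluster start sectors mod 8:", cluster_residues)
print("dd block start sectors mod 8:", dd_residues)
```

Every cluster starts at a sector with residue 4, while the nominal dd blocks for both block sizes start at residue 0, which is consistent with file copies being correct and raw dd reads showing errors.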
 
Does it matter whether the disk is connected to the USB or the internal
IDE/SATA interface on that particular PC?

I was wondering whether this is a version of the old problem on some
USB-equipped mobos, where very large files could be written out OK but
not read back correctly, due to clock drift.

/Rolf
 
That is quite an interesting observation you have there.
Let's see:
- It is not a RAM Error in DRAM, since DRAM is organized as
single bits per chip (typically).
- It may be an SRAM error, SRAM is organized in Bytes. In this
case it would be the SRAM in the HDD itself.
- The errors are stable, and different when reading different
block sizes. Sounds rather unlikely for a RAM error.

This could be a software error either in the disk or in the
OS. This theory would also be supported by the repeatability.
If I understand you correctly, the errors are not present
on a different PC. That would indicate the disk itself is
fine and the HDD<->USB adapter as well.

It could also be a hardware error in the mainboard, either
in the USB host controller or in the DMA unit that transfers
the data to main memory.

Arno
 