J
Joe Pfeiffer
I've got a disk drive which seems to have turned up a bad block, but for
which auto reallocation is failing; so I'm looking for advice....
The disk is:
babs:525$ sudo smartctl -i /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green family
Device Model: WDC WD10EACS-00D6B0
Here's what I was seeing in my logs:
Feb 18 09:26:20 babs smartd[4207]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Feb 18 09:26:20 babs smartd[4207]: Device: /dev/sda [SAT], 1 Offline uncorrectable sectors
Took a look at the attributes:
babs:524$ sudo smartctl -A /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 199 051 Pre-fail Always - 37
3 Spin_Up_Time 0x0027 180 160 021 Pre-fail Always - 5966
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 92
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 051 Old_age Always - 0
9 Power_On_Hours 0x0032 075 075 000 Old_age Always - 18712
10 Spin_Retry_Count 0x0032 100 253 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 051 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 91
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 17
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 92
194 Temperature_Celsius 0x0022 115 100 000 Old_age Always - 35
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 051 Old_age Offline - 1
It's my understanding that attribute 5 is the actual count of
reallocated sectors on the disk -- in this case, that no sectors had
been reallocated, so all the spare sectors on the disk are available.
Went to find the bad sector:
babs:530$ sudo smartctl -l selftest /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 18711 387454402
Translated the LBA to the block in my filesystem and verified I couldn't
read it:
babs:537$ sudo dd if=/dev/sda1 of=/dev/null ibs=4096 skip=48431792
dd: reading `/dev/sda1': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 18.5663 s, 0.0 kB/s
(found the file it was in and deleted the file)
Now comes the part that's got me puzzled: wrote over the block to force
a reallocation:
babs:545$ sudo dd if=/dev/zero of=/dev/sda1 obs=4096 seek=48431792 count=1
dd: writing `/dev/sda1': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 19.1174 s, 0.0 kB/s
and here's what turned up in my log when I tried it:
Feb 18 10:55:07 babs kernel: [583184.036616] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:07 babs kernel: [583184.036622] ata1.00: BMDMA stat 0x24
Feb 18 10:55:07 babs kernel: [583184.036627] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:07 babs kernel: [583184.036636] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:07 babs kernel: [583184.036639] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:07 babs kernel: [583184.036644] ata1.00: status: { DRDY ERR }
Feb 18 10:55:07 babs kernel: [583184.036647] ata1.00: error: { UNC }
Feb 18 10:55:07 babs kernel: [583184.052596] ata1.00: configured for UDMA/133
Feb 18 10:55:07 babs kernel: [583184.052607] ata1: EH complete
Feb 18 10:55:10 babs kernel: [583186.992593] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:10 babs kernel: [583186.992598] ata1.00: BMDMA stat 0x24
Feb 18 10:55:10 babs kernel: [583186.992603] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:10 babs kernel: [583186.992612] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:10 babs kernel: [583186.992615] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:10 babs kernel: [583186.992619] ata1.00: status: { DRDY ERR }
Feb 18 10:55:10 babs kernel: [583186.992623] ata1.00: error: { UNC }
Feb 18 10:55:10 babs kernel: [583187.008573] ata1.00: configured for UDMA/133
Feb 18 10:55:10 babs kernel: [583187.008588] ata1: EH complete
Feb 18 10:55:13 babs kernel: [583189.947567] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:13 babs kernel: [583189.947573] ata1.00: BMDMA stat 0x24
Feb 18 10:55:13 babs kernel: [583189.947578] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:13 babs kernel: [583189.947586] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:13 babs kernel: [583189.947589] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:13 babs kernel: [583189.947594] ata1.00: status: { DRDY ERR }
Feb 18 10:55:13 babs kernel: [583189.947597] ata1.00: error: { UNC }
Feb 18 10:55:13 babs kernel: [583189.960532] ata1.00: configured for UDMA/133
Feb 18 10:55:13 babs kernel: [583189.960546] ata1: EH complete
Feb 18 10:55:16 babs kernel: [583192.903546] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:16 babs kernel: [583192.903552] ata1.00: BMDMA stat 0x24
Feb 18 10:55:16 babs kernel: [583192.903557] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:16 babs kernel: [583192.903565] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:16 babs kernel: [583192.903568] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:16 babs kernel: [583192.903573] ata1.00: status: { DRDY ERR }
Feb 18 10:55:16 babs kernel: [583192.903576] ata1.00: error: { UNC }
Feb 18 10:55:16 babs kernel: [583192.916509] ata1.00: configured for UDMA/133
Feb 18 10:55:16 babs kernel: [583192.916524] ata1: EH complete
Feb 18 10:55:19 babs kernel: [583196.002650] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:19 babs kernel: [583196.002656] ata1.00: BMDMA stat 0x24
Feb 18 10:55:19 babs kernel: [583196.002660] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:19 babs kernel: [583196.002669] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:19 babs kernel: [583196.002672] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:19 babs kernel: [583196.002676] ata1.00: status: { DRDY ERR }
Feb 18 10:55:19 babs kernel: [583196.002680] ata1.00: error: { UNC }
Feb 18 10:55:19 babs kernel: [583196.016642] ata1.00: configured for UDMA/133
Feb 18 10:55:19 babs kernel: [583196.016657] ata1: EH complete
Feb 18 10:55:22 babs kernel: [583198.958629] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:22 babs kernel: [583198.958634] ata1.00: BMDMA stat 0x24
Feb 18 10:55:22 babs kernel: [583198.958639] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:22 babs kernel: [583198.958648] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:22 babs kernel: [583198.958651] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:22 babs kernel: [583198.958656] ata1.00: status: { DRDY ERR }
Feb 18 10:55:22 babs kernel: [583198.958659] ata1.00: error: { UNC }
Feb 18 10:55:22 babs kernel: [583198.972618] ata1.00: configured for UDMA/133
Feb 18 10:55:22 babs kernel: [583198.972632] sd 0:0:0:0: [sda] Unhandled sense code
Feb 18 10:55:22 babs kernel: [583198.972636] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Feb 18 10:55:22 babs kernel: [583198.972640] sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Feb 18 10:55:22 babs kernel: [583198.972646] Descriptor sense data with sense descriptors (in hex):
Feb 18 10:55:22 babs kernel: [583198.972649] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Feb 18 10:55:22 babs kernel: [583198.972659] 17 18 15 c2
Feb 18 10:55:22 babs kernel: [583198.972663] sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Feb 18 10:55:22 babs kernel: [583198.972669] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 17 18 15 bf 00 00 08 00
Feb 18 10:55:22 babs kernel: [583198.972679] end_request: I/O error, dev sda, sector 387454402
Feb 18 10:55:22 babs kernel: [583198.972697] ata1: EH complete
Feb 18 10:55:32 babs kernel: [583209.040031] powernow-k8: failing targ, change pending bit set
The line saying "Add. Sense: Unrecovered read error - auto reallocate
failed" is, of course, the most interesting....
And the -A attributes are unchanged. So... any thoughts?
which auto reallocation is failing; so I'm looking for advice....
The disk is:
babs:525$ sudo smartctl -i /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green family
Device Model: WDC WD10EACS-00D6B0
Here's what I was seeing in my logs:
Feb 18 09:26:20 babs smartd[4207]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Feb 18 09:26:20 babs smartd[4207]: Device: /dev/sda [SAT], 1 Offline uncorrectable sectors
Took a look at the attributes:
babs:524$ sudo smartctl -A /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 199 051 Pre-fail Always - 37
3 Spin_Up_Time 0x0027 180 160 021 Pre-fail Always - 5966
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 92
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 051 Old_age Always - 0
9 Power_On_Hours 0x0032 075 075 000 Old_age Always - 18712
10 Spin_Retry_Count 0x0032 100 253 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 051 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 91
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 17
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 92
194 Temperature_Celsius 0x0022 115 100 000 Old_age Always - 35
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 051 Old_age Offline - 1
It's my understanding that attribute 5 is the actual count of
reallocated sectors on the disk -- in this case, that no sectors had
been reallocated, so all the spare sectors on the disk are available.
Went to find the bad sector:
babs:530$ sudo smartctl -l selftest /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 18711 387454402
Translated the LBA to the block in my filesystem and verified I couldn't
read it:
babs:537$ sudo dd if=/dev/sda1 of=/dev/null ibs=4096 skip=48431792
dd: reading `/dev/sda1': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 18.5663 s, 0.0 kB/s
(found the file it was in and deleted the file)
Now comes the part that's got me puzzled: wrote over the block to force
a reallocation:
babs:545$ sudo dd if=/dev/zero of=/dev/sda1 obs=4096 seek=48431792 count=1
dd: writing `/dev/sda1': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 19.1174 s, 0.0 kB/s
and here's what turned up in my log when I tried it:
Feb 18 10:55:07 babs kernel: [583184.036616] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:07 babs kernel: [583184.036622] ata1.00: BMDMA stat 0x24
Feb 18 10:55:07 babs kernel: [583184.036627] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:07 babs kernel: [583184.036636] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:07 babs kernel: [583184.036639] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:07 babs kernel: [583184.036644] ata1.00: status: { DRDY ERR }
Feb 18 10:55:07 babs kernel: [583184.036647] ata1.00: error: { UNC }
Feb 18 10:55:07 babs kernel: [583184.052596] ata1.00: configured for UDMA/133
Feb 18 10:55:07 babs kernel: [583184.052607] ata1: EH complete
Feb 18 10:55:10 babs kernel: [583186.992593] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:10 babs kernel: [583186.992598] ata1.00: BMDMA stat 0x24
Feb 18 10:55:10 babs kernel: [583186.992603] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:10 babs kernel: [583186.992612] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:10 babs kernel: [583186.992615] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:10 babs kernel: [583186.992619] ata1.00: status: { DRDY ERR }
Feb 18 10:55:10 babs kernel: [583186.992623] ata1.00: error: { UNC }
Feb 18 10:55:10 babs kernel: [583187.008573] ata1.00: configured for UDMA/133
Feb 18 10:55:10 babs kernel: [583187.008588] ata1: EH complete
Feb 18 10:55:13 babs kernel: [583189.947567] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:13 babs kernel: [583189.947573] ata1.00: BMDMA stat 0x24
Feb 18 10:55:13 babs kernel: [583189.947578] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:13 babs kernel: [583189.947586] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:13 babs kernel: [583189.947589] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:13 babs kernel: [583189.947594] ata1.00: status: { DRDY ERR }
Feb 18 10:55:13 babs kernel: [583189.947597] ata1.00: error: { UNC }
Feb 18 10:55:13 babs kernel: [583189.960532] ata1.00: configured for UDMA/133
Feb 18 10:55:13 babs kernel: [583189.960546] ata1: EH complete
Feb 18 10:55:16 babs kernel: [583192.903546] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:16 babs kernel: [583192.903552] ata1.00: BMDMA stat 0x24
Feb 18 10:55:16 babs kernel: [583192.903557] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:16 babs kernel: [583192.903565] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:16 babs kernel: [583192.903568] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:16 babs kernel: [583192.903573] ata1.00: status: { DRDY ERR }
Feb 18 10:55:16 babs kernel: [583192.903576] ata1.00: error: { UNC }
Feb 18 10:55:16 babs kernel: [583192.916509] ata1.00: configured for UDMA/133
Feb 18 10:55:16 babs kernel: [583192.916524] ata1: EH complete
Feb 18 10:55:19 babs kernel: [583196.002650] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:19 babs kernel: [583196.002656] ata1.00: BMDMA stat 0x24
Feb 18 10:55:19 babs kernel: [583196.002660] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:19 babs kernel: [583196.002669] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:19 babs kernel: [583196.002672] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:19 babs kernel: [583196.002676] ata1.00: status: { DRDY ERR }
Feb 18 10:55:19 babs kernel: [583196.002680] ata1.00: error: { UNC }
Feb 18 10:55:19 babs kernel: [583196.016642] ata1.00: configured for UDMA/133
Feb 18 10:55:19 babs kernel: [583196.016657] ata1: EH complete
Feb 18 10:55:22 babs kernel: [583198.958629] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 18 10:55:22 babs kernel: [583198.958634] ata1.00: BMDMA stat 0x24
Feb 18 10:55:22 babs kernel: [583198.958639] ata1.00: failed command: READ DMA EXT
Feb 18 10:55:22 babs kernel: [583198.958648] ata1.00: cmd 25/00:08:bf:15:18/00:00:17:00:00/e0 tag 0 dma 4096 in
Feb 18 10:55:22 babs kernel: [583198.958651] res 51/40:00:c2:15:18/40:00:17:00:00/e0 Emask 0x9 (media error)
Feb 18 10:55:22 babs kernel: [583198.958656] ata1.00: status: { DRDY ERR }
Feb 18 10:55:22 babs kernel: [583198.958659] ata1.00: error: { UNC }
Feb 18 10:55:22 babs kernel: [583198.972618] ata1.00: configured for UDMA/133
Feb 18 10:55:22 babs kernel: [583198.972632] sd 0:0:0:0: [sda] Unhandled sense code
Feb 18 10:55:22 babs kernel: [583198.972636] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Feb 18 10:55:22 babs kernel: [583198.972640] sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Feb 18 10:55:22 babs kernel: [583198.972646] Descriptor sense data with sense descriptors (in hex):
Feb 18 10:55:22 babs kernel: [583198.972649] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Feb 18 10:55:22 babs kernel: [583198.972659] 17 18 15 c2
Feb 18 10:55:22 babs kernel: [583198.972663] sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Feb 18 10:55:22 babs kernel: [583198.972669] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 17 18 15 bf 00 00 08 00
Feb 18 10:55:22 babs kernel: [583198.972679] end_request: I/O error, dev sda, sector 387454402
Feb 18 10:55:22 babs kernel: [583198.972697] ata1: EH complete
Feb 18 10:55:32 babs kernel: [583209.040031] powernow-k8: failing targ, change pending bit set
The line saying "Add. Sense: Unrecovered read error - auto reallocate
failed" is, of course, the most interesting....
And the -A attributes are unchanged. So... any thoughts?