Seagate - SMART Raw Read Error Rate test

Franc Zabkar

I've been trying to make sense of the SMART Raw Read Error Rate
attribute reported by my Seagate drive, model ST3120026A.

To this end I have conducted an experiment where I've booted to a
FreeDOS diskette which creates a RAM drive containing the following
programs:

debug.exe (from MS-DOS)
smartudm.exe (a DOS SMART utility)

http://www.sysinfolab.com/files/smartudm.zip (37KB)

I have used Smartudm to record the values of the HDD's SMART
attributes before and after each operation.

To establish a baseline, I have executed the following commands:

smartudm 0 /r before.rpt
smartudm 0 /r after.rpt

The before and after reports show that the Seek Error Rate (SER)
increases by 8 counts and the Raw Read Error Rate (RRER) increases by
3. This appears to be the overhead for Smartudm.

The following commands also produce the same result:

smartudm 0 /r before.rpt
debug
-q
smartudm 0 /r after.rpt

I now use Debug to read a certain number of sectors from the C: drive
(HDD) as follows:

debug
-L 100 2 0 nnn (where nnn = number of sectors in hex)
-Q

The following table shows how the values for SER and RRER are
affected:

Sectors   SER         RRER                Increase
--------------------------------------------------
0x001     +9  (+1)    +0x341 (+0x33E)
0x200     +9  (+1)    +0x341 (+0x33E)
0x210     +9  (+1)    +0x342 (+0x33F)     +0x001
0x280     +9  (+1)    +0x3B2 (+0x3AF)     +0x070
0x300     +9  (+1)    +0x432 (+0x42F)     +0x080
0x400     +10 (+2)    +0x532 (+0x52F)     +0x100

The figures in brackets are adjusted to account for Smartudm's
overhead. The last column indicates the increase in the RRER count.

It appears that the HDD reads a minimum of 0x33E (=830) sectors, after
which each increment in the RRER value corresponds to one additional
sector. AFAICT, the HDD always reads 0x132 sectors more than
requested, which probably corresponds to a look-ahead buffer of around
150KB.
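
The arithmetic behind the table can be checked with a short Python sketch. The figures are copied from the table; the names and the 512-byte sector size are assumptions made for illustration only. The excess over the request comes out as 0x132 when the unadjusted RRER increases are used, and as 0x12F once Smartudm's 3-count overhead is subtracted.

```python
# RRER increases observed for each requested sector count (from the table).
SMARTUDM_OVERHEAD = 3   # RRER counts added by Smartudm itself
SECTOR_BYTES = 512      # assumed sector size

observed = {
    0x210: 0x342,
    0x280: 0x3B2,
    0x300: 0x432,
    0x400: 0x532,
}


def excess_sectors(requested, rrer_delta, overhead=0):
    """Sectors counted beyond those requested, optionally minus overhead."""
    return rrer_delta - overhead - requested


for requested, delta in observed.items():
    raw = excess_sectors(requested, delta)
    adjusted = excess_sectors(requested, delta, SMARTUDM_OVERHEAD)
    kib = raw * SECTOR_BYTES / 1024
    print(f"{requested:#05x}: excess {raw:#x} (adjusted {adjusted:#x}), "
          f"about {kib:.0f} KiB")
```

Every row above the 0x33E floor gives the same excess, which is what suggests a fixed-size look-ahead rather than a coincidence.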

- Franc Zabkar
 
To establish a baseline, I have executed the following commands:

smartudm 0 /r before.rpt
smartudm 0 /r after.rpt

The before and after reports show that the Seek Error Rate (SER)
increases by 8 counts and the Raw Read Error Rate (RRER) increases by
3. This appears to be the overhead for Smartudm.

Why do 8 seeks result in only 3 reads?

I now use Debug to read a certain number of sectors from the C: drive
(HDD) ...
The following table shows how the values for SER and RRER are
affected:

Sectors   SER         RRER                Increase
--------------------------------------------------
0x001     +9  (+1)    +0x341 (+0x33E)
0x200     +9  (+1)    +0x341 (+0x33E)
0x210     +9  (+1)    +0x342 (+0x33F)     +0x001
0x280     +9  (+1)    +0x3B2 (+0x3AF)     +0x070
0x300     +9  (+1)    +0x432 (+0x42F)     +0x080
0x400     +10 (+2)    +0x532 (+0x52F)     +0x100

The figures in brackets are adjusted to account for Smartudm's
overhead. The last column indicates the increase in the RRER count.

It appears that the HDD reads a minimum of 0x33E (=830) sectors, after
which each increment in the RRER value corresponds to one additional
sector. AFAICT, the HDD always reads 0x132 sectors more than
requested, which probably corresponds to a look-ahead buffer of around
150KB.

It would appear that the drive doesn't cache reads. For example, I
would have thought that when reading 0x400 sectors, the first 0x300
sectors could be fetched from the drive's read cache.

Is this behaviour by design, or does the read cache need to be
explicitly enabled? Does Seagate expect the OS to handle read caching
rather than the drive?

BTW, I am aware that the drive's write caching can be enabled or
disabled.

- Franc Zabkar
 
I've been trying to make sense of the SMART Raw Read Error Rate
attribute reported by my Seagate drive, model ST3120026A.

I know it's bad practice to reply to one's own posts, but here is an
illuminating message from Seagate's forums:
http://forums.seagate.com/stx/board/message?board.id=ata_drives&message.id=8843#M8843

The OP states that the Raw Read Error Rate counts to 250,000,000 and
then switches back to 0.

I suspect that the lower 28 bits may reflect a sector count, allowing
for 268,435,456 reads. The uppermost bits may hold an error count. The
cycle is probably repeated for the next block of 256M reads, and the
normalised value is probably incremented or decremented depending on
the new error count.
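
To make the suspected layout concrete, here is a minimal Python sketch that splits a raw value at bit 28. The 28-bit boundary is only the hypothesis stated above, and the function name is made up for illustration; nothing here comes from Seagate documentation.

```python
SECTOR_BITS = 28  # hypothesised width of the sector counter, not confirmed


def split_rrer(raw):
    """Split a raw RRER value into (upper_bits, sector_count)."""
    sector_count = raw & ((1 << SECTOR_BITS) - 1)
    upper_bits = raw >> SECTOR_BITS
    return upper_bits, sector_count


# Capacity of a 28-bit counter, i.e. the 256M reads mentioned above.
print(1 << SECTOR_BITS)

# A raw value of 250,000,000 still fits entirely in the lower 28 bits.
print(split_rrer(250_000_000))
```

If the hypothesis holds, a raw value would wrap back to a small number shortly after 268,435,456 reads, which is roughly consistent with the rollover reported on the Seagate forum.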

- Franc Zabkar
 
I know it's bad practice to reply to one's own posts, but here is an

Nah, reply to any post you feel like ;)

Interesting thread, I've got a 'bad' 250GB drive that has a huge seek
error rate, right from day one, yet it passes the Seagate warranty
return test. Had it for a few years now, never lost data, though it
is slower on test than a similar 250GB drive. The processed numbers?
Worst 38, currently 53 (report threshold is 30).

I checked five more Seagates, from 80GB to 500GB, the 80GB had 5 in
the top 16bits raw value (currently 87), but it's been running over
20000 hours, the rest of the drives have zero on the top 16bits.

Grant.
 
Nah, reply to any post you feel like ;)

Interesting thread, I've got a 'bad' 250GB drive that has a huge seek
error rate, right from day one, yet it passes the Seagate warranty
return test. Had it for a few years now, never lost data, though it
is slower on test than a similar 250GB drive. The processed numbers?
Worst 38, currently 53 (report threshold is 30).

I checked five more Seagates, from 80GB to 500GB, the 80GB had 5 in
the top 16bits raw value (currently 87), but it's been running over
20000 hours, the rest of the drives have zero on the top 16bits.

Grant.

I repeated my Debug/Smartudm test for a 20GB Fujitsu MPF3024AT HDD.

Sectors   SER    RRER       Increase
------------------------------------
0x001     +1     +0x1DE
0x200            +0x3DD     +0x1FF
0x300            +0x4DD     +0x100
0x400     +3     +0x5DD     +0x100

It appears that Fujitsu also counts reads and seeks in the lower bytes
of the raw attribute value. The look-ahead read buffer appears to be
0x1DD sectors, ie 244KB, and the read cache appears to be disabled.

Fujitsu differs from Seagate in that repeatedly retrieving SMART data
does not increment the SER or RRER counts. This may be because Fujitsu
does not count any SMART related disc activity toward the SMART
attributes, or perhaps the SMART data are stored in EEPROM rather than
on the platters.

After examining more than a year of daily Fujitsu SMART reports, it
appears that the maximum raw values for the RRER and SER attributes
are 0x3FFFF (=256K) and 0xFFF (=4K), respectively.

Therefore it looks like Seagate's raw numbers are much higher than
those of other manufacturers merely because Seagate uses a much larger
number of counts for averaging purposes.

- Franc Zabkar
 
Grant wrote:
Interesting thread, I've got a 'bad' 250GB drive that has a huge seek
error rate, right from day one, yet it passes the Seagate warranty
return test. Had it for a few years now, never lost data, though it
is slower on test than a similar 250GB drive.

Why didn't you / don't you warranty it?
 
Grant wrote:
You cannot read? Drive passes the Seagate RMA warranty test.

Unimportant, if you're seeing huge seek error rates, or significantly
slower performance than matching (model+firmware) drives, Seagate
absolutely will accept the RMA request.

A failure in SeaTools is one reason to send in an RMA request, but it's
not the only reason that will be accepted.
 
Unimportant, if you're seeing huge seek error rates, or significantly
slower performance than matching (model+firmware) drives, Seagate
absolutely will accept the RMA request.

AFAICT, huge raw values for Seagate's Seek Error Rate SMART attribute
are nearly always a very good sign.

For example, these are the data for my 120GB ST3120026A HDD:
http://www.users.on.net/~fzabkar/SmartUDM/120GB.RPT

Attribute ID Threshold Value Worst Raw
----------------------------------------------------------------
Seek Error Rate 7 30 79 60 00000580A6ACh

The seek error rate appears to be 0 errors in 92 million seeks.
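
As a rough illustration of that reading, the Python sketch below decodes the raw value on the assumption that the upper 16 bits hold an error count and the lower 32 bits hold a seek count. That split is an inference from the observed numbers, not a documented format, and the function name is hypothetical.

```python
def decode_ser(raw_hex):
    """Return (error_count, seek_count) from a 12-digit hex raw value."""
    raw = int(raw_hex.rstrip("hH"), 16)   # drop the trailing 'h' suffix
    errors = raw >> 32                    # assumed: upper 16 bits = errors
    seeks = raw & 0xFFFFFFFF              # assumed: lower 32 bits = seeks
    return errors, seeks


errors, seeks = decode_ser("00000580A6ACh")
print(f"{errors} errors in {seeks:,} seeks")
```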

If by "huge seek error rate" you mean low numbers for the normalised
value, then that's a different matter ...

- Franc Zabkar
 
Previously DevilsPGD said:
Unimportant, if you're seeing huge seek error rates, or significantly
slower performance than matching (model+firmware) drives, Seagate
absolutely will accept the RMA request.
A failure in SeaTools is one reason to send in an RMA request, but it's
not the only reason that will be accepted.

Seagate says something different on their website, but it is
possible to trick them by claiming SeaTools would not even run and get
an RMA number that way. As drives are very likely not tested when
they are received on an RMA, you will get a replacement anyway. I
have used this approach for Seagate and Maxtor successfully on
not-quite-dead-yet drives.

So, while a huge raw seek error rate may not mean anything, you
will very likely get a replacement even for a drive that is
completely fine. Significantly slower performance, however,
is a clear warning sign. Of course you need to measure this
without filesystem, as the filesystem can also cause slowdowns
in disks that are fine.

Arno
 
Unimportant, if you're seeing huge seek error rates, or significantly
slower performance than matching (model+firmware) drives, Seagate
absolutely will accept the RMA request.

Okay, sorry for sarcasm :) Actually, seems it was slower than an older
model, same capacity.

A failure in SeaTools is one reason to send in an RMA request, but it's
not the only reason that will be accepted.

I can't use SeaTools 'cos it flips out when it sees a Linux LILO
boot manager :( So I used 'dd' to write zeroes to the entire drive
and the drive improved in speed (this was a couple of years ago).
An extended self-test also reports the drive is okay.

OTOH I have another Seagate which scrambled itself around 10000
hours (recovered with 'dd' zeroing) but the drive goofed up again
at 19000 hours and now has a large error history and a nonzero
reallocated sector count. Got an RMA number for that last week
but not yet sent it off.

'magpie' has two 250GB Seagates, here's the good one:

root@magpie:~# smartctl -s on -a /dev/hda
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.8 family
Device Model: ST3250823A
Serial Number: 4ND1BLNH
Firmware Version: 3.03
User Capacity: 250,059,350,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Thu Feb 5 10:27:50 2009 EST
SMART support is: Available - device has SMART capability.
SMART support is: Disabled

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 84) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 047 045 006 Pre-fail Always - 175696614
3 Spin_Up_Time 0x0003 098 098 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 999
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 079 060 030 Pre-fail Always - 93857194
9 Power_On_Hours 0x0032 082 082 000 Old_age Always - 16609
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 563
194 Temperature_Celsius 0x0022 039 053 000 Old_age Always - 39 (0 14 0 0)
195 Hardware_ECC_Recovered 0x001a 047 045 000 Old_age Always - 175696614
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 10595 -
# 2 Short offline Completed without error 00% 10594 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

- - -

And the 'bad' drive with the huge seek error rate value:

root@magpie:~# smartctl -s on -a /dev/hdc
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.9 family
Device Model: ST3250624A
Serial Number: 5ND3HF83
Firmware Version: 3.AAE
User Capacity: 250,059,350,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Thu Feb 5 10:29:39 2009 EST
SMART support is: Available - device has SMART capability.
SMART support is: Disabled

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 100) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 119 090 006 Pre-fail Always - 201484312
3 Spin_Up_Time 0x0003 099 098 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 099 099 020 Old_age Always - 1464
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 053 038 030 Pre-fail Always - 7413479801581
9 Power_On_Hours 0x0032 089 089 000 Old_age Always - 10399
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 593
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 061 049 045 Old_age Always - 39 (Lifetime Min/Max 39/39)
194 Temperature_Celsius 0x0022 039 051 000 Old_age Always - 39 (0 15 0 0)
195 Hardware_ECC_Recovered 0x001a 082 051 000 Old_age Always - 193196196
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 4395 -
# 2 Short offline Completed without error 00% 4393 -
# 3 Extended offline Completed without error 00% 2099 -
# 4 Extended offline Completed without error 00% 1971 -
# 5 Conveyance offline Completed without error 00% 1970 -
# 6 Extended offline Interrupted (host reset) 20% 1950 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

- - -

And today, there's not much difference in speed accessing the first,
same sized 8GB partition on each drive:

root@magpie:~# time dd if=/dev/hda1 bs=4k of=/dev/null
2056312+1 records in
2056312+1 records out
8422654464 bytes (8.4 GB) copied, 120.697 s, 69.8 MB/s

real 2m0.765s
user 0m1.008s
sys 0m13.025s
root@magpie:~# time dd if=/dev/hdc1 bs=4k of=/dev/null
2056312+1 records in
2056312+1 records out
8422654464 bytes (8.4 GB) copied, 123.494 s, 68.2 MB/s

real 2m3.531s
user 0m1.024s
sys 0m13.085s

- - -
So, apart from the very high seek error rate, /dev/hdc seems okay?
That's why I never RMA'd it.

Thanks,
Grant.
 
If by "huge seek error rate" you mean low numbers for the normalised
value, then that's a different matter ...

7 Seek_Error_Rate 0x000f 053 038 030 Pre-fail Always - 7413479801581

See other post with full dump ;)

Grant.
 
....
Seagate says something different on their website, but it is
possible to trick them by claiming SeaTools would not even run and get
an RMA number that way. As drives are very likely not tested when
they are received on an RMA, you will get a replacement anyway. I
have used this approach for Seagate and Maxtor successfully on
not-quite-dead-yet drives.

So, while a huge raw seek error rate may not mean anything, you
will very likely get a replacement even for a drive that is
completely fine. Significantly slower performance, however,
is a clear warning sign. Of course you need to measure this
without filesystem, as the filesystem can also cause slowdowns
in disks that are fine.

Oh yes, measured by reading unmounted partition/s to /dev/null.

The slower performance fixed itself after I wrote zeroes to the
entire drive. But then, I have a Seagate 13GB drive that came
out of a box where it was not bolted down, had a huge bad sectors
area when I first checked it -- and writing zeroes to the drive
recovered it too! But that drive is sitting on a cupboard, a
uselessly small spare for some old machine I'm unlikely to rebuild.

Grant.
 
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 047 045 006 Pre-fail Always - 175696614
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 079 060 030 Pre-fail Always - 93857194
9 Power_On_Hours 0x0032 082 082 000 Old_age Always - 16609

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 119 090 006 Pre-fail Always - 201484312
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 053 038 030 Pre-fail Always - 7413479801581
9 Power_On_Hours 0x0032 089 089 000 Old_age Always - 10399

AFAICT, the first drive has an SER of 0 errors in 94 million seeks,
whereas the second has recorded 4.7 errors per million (0x6be /
0x15d482ed). However, the second drive has a much better RRER (119 vs
47).
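
The two figures can be reproduced from the decimal raw values that smartctl printed. This is only a sketch under the same unconfirmed assumption as before (errors in the upper 16 bits, seeks in the lower 32 bits); the helper name is invented for the example.

```python
def seek_error_stats(raw):
    """Return (errors, seeks, errors_per_million) for a raw SER value."""
    errors = raw >> 32           # assumed error counter
    seeks = raw & 0xFFFFFFFF     # assumed seek counter
    rate = errors / seeks * 1e6 if seeks else float("nan")
    return errors, seeks, rate


# Raw Seek_Error_Rate values taken from the two smartctl reports.
for name, raw in (("hda", 93857194), ("hdc", 7413479801581)):
    errors, seeks, rate = seek_error_stats(raw)
    print(f"{name}: {errors:#x} errors / {seeks:#x} seeks "
          f"= {rate:.1f} per million")
```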

Based on that analysis, would there be any reason to be seriously
concerned about either drive?

- Franc Zabkar
 
AFAICT, the first drive has an SER of 0 errors in 94 million seeks,
whereas the second has recorded 4.7 errors per million (0x6be /
0x15d482ed). However, the second drive has a much better RRER (119 vs
47).

Based on that analysis, would there be any reason to be seriously
concerned about either drive?

Well, that's why I haven't RMAd the thing, it got better after the
write zeroes to entire surface part of the RMA check. The sheer
width of the SER number (four digits wider) compared to other drives
was the concern.

Grant.
 
Well, that's why I haven't RMAd the thing, it got better after the
write zeroes to entire surface part of the RMA check. The sheer
width of the SER number (four digits wider) compared to other drives
was the concern.

Grant.

Seagate and Hitachi appear to specify a seek error rate of 1 error in
10^7 seeks. Your "bad" drive has 47 times that number.

See page 11 of this document:
http://www.hitachigst.com/tech/tech...723EE2D5C186256D75004DC973/$file/Sp32ej03.PDF

Having said that, the SMART threshold appears to correspond to an SER
of around 1 error per 1000 seeks.

As for the RRER, the specs allow for 10 recoverable errors in 10^12
bits read. I assume this includes the ECC bits, ID bits, sync bits,
etc. However, if for simplicity's sake we just allow for 4096 bits per
sector, then the allowable RRER is 1 error in 24 million sectors.
Since Seagate's sector count occupies the lower 28 bits of the raw
RRER attribute (ie a max count of 256 million sectors), then it seems
reasonable to assume that one could expect around 10 errors in the
uppermost bits, assuming that is where the error count is stored.
However, I've yet to see any SMART report with anything other than 0
in those bits.
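
For anyone who wants to check the RRER estimate, here is the same back-of-envelope calculation as a small Python sketch. It uses the simplifications stated above (4096 bits per sector, a 28-bit sector counter), both of which are assumptions rather than published figures.

```python
SPEC_ERRORS = 10             # recoverable errors allowed ...
SPEC_BITS = 10**12           # ... per this many bits read
BITS_PER_SECTOR = 4096       # 512 data bytes, ignoring ECC/ID/sync bits
SECTOR_COUNTER_MAX = 1 << 28 # assumed size of the sector counter

# Sectors read per allowable error.
sectors_per_error = SPEC_BITS / SPEC_ERRORS / BITS_PER_SECTOR

# Errors one could expect over a full cycle of the sector counter.
expected_errors = SECTOR_COUNTER_MAX / sectors_per_error

print(f"{sectors_per_error / 1e6:.1f} million sectors per error")
print(f"{expected_errors:.1f} errors per 2**28 sectors")
```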

- Franc Zabkar
 
Interesting thread, I've got a 'bad' 250GB drive that has a huge seek
error rate, right from day one, yet it passes the Seagate warranty
return test. Had it for a few years now, never lost data, though it
is slower on test than a similar 250GB drive. The processed numbers?
Worst 38, currently 53 (report threshold is 30).

I checked five more Seagates, from 80GB to 500GB, the 80GB had 5 in
the top 16bits raw value (currently 87), but it's been running over
20000 hours, the rest of the drives have zero on the top 16bits.

Grant.

Here is a bad Fujitsu drive with data in the uppermost bits of the
RRER attribute:

http://www.diydatarecovery.com/forum/index.php?topic=60.0

I don't know what to make of the 0x011D value (see below), or maybe it
is 0x011D0.

The reallocated sector count appears to be 0x003F (=63), but its two
other 16-bit values (0x003B and 0x0795) are a mystery. Or maybe 0x003B
(=59) is the actual number of reallocated sectors and 0x0795 (=1941)
could be the number of spare sectors remaining, making a total of 2000
spares ???
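
Splitting the 48-bit raw value into three 16-bit words makes the guess easier to test. The Python sketch below does only that; the interpretation of each word is speculation, as noted above, and the function name is hypothetical.

```python
def words16(raw_hex):
    """Split a 12-digit hex raw value into three 16-bit words, high first."""
    raw = int(raw_hex, 16)
    return [(raw >> shift) & 0xFFFF for shift in (32, 16, 0)]


# Raw data of attribute 5 (Reallocated Sectors Count) from the report.
high, mid, low = words16("0795003B003F")
print(high, mid, low)

# Under the "spares remaining" guess, the two upper words sum to the pool size.
print(high + mid)
```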

The reallocation event count also appears to consist of 2 (or 3?)
components.

I notice also that, unlike my 6GB and 20GB Fujitsu drives, the unit
used for the Power-On Time Count appears to be hours rather than
seconds. Yet the Diskpatch software assumes the latter.

It appears that the moral of the story is not to blindly interpret raw
SMART data as absolute values, even in the case of Reallocated
Sectors, and don't rely on the author of the SMART software to
interpret the data correctly.

=====================================================================
1 : Raw Read Error Rate Threshold=46 Value=100 Worst=60
Data=011D000330FA Status=OK
2 : Throughput Performance Threshold=0 Value=100 Worst=100
Data=000000D40000 Status=OK
4 : Start/Stop Count Threshold=0 Value=99 Worst=99
Data=0000000000C9 Status=OK
5 : Reallocated Sectors Count Threshold=24 Value=98 Worst=98
Data=0795003B003F Status=OK
7 : Seek Error Rate Threshold=0 Value=100 Worst=100
Data=0000000002F9 Status=OK
9 : Power-On Time Count Threshold=0 Value=98 Worst=98
Data=0000000005A6 Status=OK
12 : Drive Power Cycle Count Threshold=0 Value=100 Worst=100
Data=0000000000C8 Status=OK
194: HDD Temperature Threshold=0 Value=100 Worst=100
Data=003400100028 Status=OK
195: Hardware ECC Recovered Threshold=0 Value=100 Worst=100
Data=000000000088 Status=OK
196: Reallocation Event Count Threshold=0 Value=98 Worst=98
Data=000219D6003F Status=OK
197: Current Pending Sector Count Threshold=0 Value=91 Worst=91
Data=00000000000A Status=OK
198: Off-Line Uncorrectable Sect... Threshold=0 Value=94 Worst=94
Data=00000000000D Status=OK
200: Write Error Rate Threshold=0 Value=98 Worst=98
Data=0000006C44E4 Status=OK
201: Soft Read Error Rate Threshold=0 Value=100 Worst=100
Data=000000000000 Status=OK
203: Run Out Cancel Threshold=0 Value=100 Worst=100
Data=0164014F0318 Status=OK
=====================================================================

- Franc Zabkar
 
Here is a bad Fujitsu drive with data in the uppermost bits of the
RRER attribute:

I don't know what to make of the 0x011D value (see below), or maybe it
is 0x011D0.

The reallocated sector count appears to be 0x003F (=63), but its two
other 16-bit values (0x003B and 0x0795) are a mystery. Or maybe 0x003B
(=59) is the actual number of reallocated sectors and 0x0795 (=1941)
could be the number of spare sectors remaining, making a total of 2000
spares ???

The reallocation event count also appears to consist of 2 (or 3?)
components.

I notice also that, unlike my 6GB and 20GB Fujitsu drives, the unit
used for the Power-On Time Count appears to be hours rather than
seconds. Yet the Diskpatch software assumes the latter.

It appears that the moral of the story is not to blindly interpret raw
SMART data as absolute values, even in the case of Reallocated
Sectors, and don't rely on the author of the SMART software to
interpret the data correctly.

Definitely. I also have seen reallocated counts that seem to count
backwards from some number. Might have been a Fujitsu 40GB notebook
drive.

Incidentally, I have several disks with misinterpreted lifetime
attributes and I know that the temperature attribute is not
standardized. The smartmontools use a plain text configuration file
for it, and I have contributed rules for several new drive models.
But temperature interpretation has the advantage of being easy to
validate.

The right way to look is to look at both raw and cooked values
and then to apply common sense. What also helps is to look at
the dynamics (if available). For example, I had one 200GB disk in
a server that suddenly (in less than 1 hour) had about 200
reallocated sectors. It never got any additional ones and performed
fine for 3 more years. Dying disks typically get more of them
over time or after full scans (long SMART self-test).

If nothing makes sense in the raw values, look at the cooked values
only.

Arno
 