smartctl when disks are in standby (Linux)

  • Thread starter: David Brown

David Brown

I've been playing around a little with standby for disks on a Linux
system - mostly it works well, with disks waking up automatically as needed.

If I try a smartctl test ("smartctl -t short /dev/sda") on a standby
drive, however, it fails - it will wake up the drive, but smartctl gives
up waiting and returns an error message before the drive is ready. Once
the drive is up to speed, the test runs fine.

Has anyone come across this before? Is it just that my disks (Samsung
1TB drives) are particularly slow to start up? Or is there a way to
make smartctl wait a little longer before giving up?

Kind regards,

David
 
David Brown said:
I've been playing around a little with standby for disks on a Linux
system - mostly it works well, with disks waking up automatically as needed.
If I try a smartctl test ("smartctl -t short /dev/sda") on a standby
drive, however, it fails - it will wake up the drive, but smartctl gives
up waiting and returns an error message before the drive is ready. Once
the drive is up to speed, the test runs fine.
Has anyone come across this before? Is it just that my disks (Samsung
1TB drives) are particularly slow to start up? Or is there a way to
make smartctl wait a little longer before giving up?

Can you post the error message from the syslog? It is possible
that it is the kernel, not smartctl, that gives up.

In any case, a simple solution could be to wrap smartctl into
something like

head -c 512 device > /dev/null; smartctl command device

If the wakeup by reading the first 512 bytes fails, you
could also add a "sleep 60" or the like.
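Arno's one-liner can be packaged as a small shell function, which also makes the optional extra wait explicit. This is a minimal sketch, not part of smartmontools: the function name smart_after_wakeup and the WAKE_DELAY variable are hypothetical, and it keeps the limitation that a read served from the page cache never reaches the disk.

```shell
# Wrapper along the lines suggested above: read the first 512 bytes so the
# kernel spins the disk up, optionally wait, then run smartctl.
# smart_after_wakeup and WAKE_DELAY are hypothetical names.
# Note: if sector 0 is already in the page cache, the read never reaches
# the disk, so it will not wake it.
smart_after_wakeup() {
    dev=$1
    shift                                # remaining args go to smartctl
    head -c 512 "$dev" > /dev/null || return 1
    sleep "${WAKE_DELAY:-0}"             # e.g. WAKE_DELAY=60 as an extra margin
    smartctl "$@" "$dev"
}

# Example: smart_after_wakeup /dev/sda -t short
```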

For something more sophisticated, you can check
the power mode with "hdparm -C <device>".

Arno
 
Arno said:
Can you post the error message from the syslog? It is possible
that it is the kernel, not smartctl, that gives up.

I'll have a look when I next have the machine switched on - I didn't
think about checking the syslog. Anything else that I've tried that
needs the disk (such as "ls" if the relevant data is not in the cache)
simply blocks until the disk is up to speed, so I've assumed the issue
is specific to smartctl.
In any case, a simple solution could be to wrap smartctl into
something like

head -c 512 device > /dev/null; smartctl command device

Yes, that's my thought. It might involve slightly more work if I make
use of smartd, such as replacing the original smartctl binary with a
script doing something like your suggestion.
If the wakeup by reading the first 512 bytes fails, you
could also add a "sleep 60" or the like.

For something more sophisticated, you can check
the power mode with "hdparm -C <device>".

I know about "hdparm -C" - but in this case, I don't want to check if
the disk is awake, I want to awaken it!
 
I'll have a look when I next have the machine switched on - I didn't
think about checking the syslog. Anything else that I've tried that
needs the disk (such as "ls" if the relevant data is not in the cache)
simply blocks until the disk is up to speed, so I've assumed the issue
is specific to smartctl.

It may be specific to smartctl or to sending disk commands.
If there is nothing in the syslog, then it is smartctl; otherwise,
not necessarily. Because I did not find a command-line setting
for a timeout in smartctl, I think it may well be the kernel.
If it is indeed a timeout in smartctl, a timeout option would be
something to propose to the smartctl maintainer.
Yes, that's my thought. It might involve slightly more work if I make
use of smartd, such as replacing the original smartctl binary with a
script doing something like your suggestion.
I know about "hdparm -C" - but in this case, I don't want to check if
the disk is awake, I want to awaken it!

First know - then act! ;-)

This would let you skip the wakeup step, and the wait, if the HDD
is already up.
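A sketch of "first know, then act": parse the state line that hdparm -C prints (of the form "drive state is: standby", as in the transcript later in this thread) and only issue the wake-up read when the drive reports standby. The function names are hypothetical, and the parsing assumes that hdparm output format.

```shell
# Print the power state reported by "hdparm -C", e.g. "standby" or
# "active/idle". Assumes hdparm's "drive state is: <state>" output line.
drive_state() {
    hdparm -C "$1" 2>/dev/null | sed -n 's/^ *drive state is: *//p'
}

# Only pay for a wake-up read (and the wait) when the drive is in standby.
# Both function names are hypothetical.
wake_if_standby() {
    if [ "$(drive_state "$1")" = standby ]; then
        head -c 512 "$1" > /dev/null
    fi
}

# Example: wake_if_standby /dev/sda && smartctl -t short /dev/sda
```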

Arno
 
Arno said:
It may be specific to smartctl or to sending disk commands.
If there is nothing in the syslog, then it is smartctl; otherwise,
not necessarily. Because I did not find a command-line setting
for a timeout in smartctl, I think it may well be the kernel.
If it is indeed a timeout in smartctl, a timeout option would be
something to propose to the smartctl maintainer.

Here's a transcript:


host:~# hdparm -y /dev/sda

/dev/sda:
issuing standby command
host:~# hdparm -C /dev/sda

/dev/sda:
drive state is: standby


host:~# smartctl -t short /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Command "Execute SMART Short self-test routine immediately in off-line mode" failed


host:~# smartctl -c /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 41) The self-test routine was interrupted by the host with a hard or soft reset.




syslog gives the following:

Sep 26 20:50:23 host kernel: [ 9865.466082] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 26 20:50:23 host kernel: [ 9865.466110] ata1.00: cmd b0/d4:00:01:4f:c2/00:00:00:00:00/00 tag 0
Sep 26 20:50:23 host kernel: [ 9865.466111] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 26 20:50:23 host kernel: [ 9865.466154] ata1.00: status: { DRDY }
Sep 26 20:50:25 host kernel: [ 9868.758069] ata1: soft resetting link
Sep 26 20:50:26 host kernel: [ 9869.402326] ata1.00: configured for UDMA/133
Sep 26 20:50:26 host kernel: [ 9869.410326] ata1.01: configured for UDMA/133
Sep 26 20:50:26 host kernel: [ 9869.410349] ata1: EH complete
Sep 26 20:50:26 host kernel: [ 9869.410531] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors (1000205 MB)
Sep 26 20:50:26 host kernel: [ 9869.410531] sd 0:0:0:0: [sda] Write Protect is off
Sep 26 20:50:26 host kernel: [ 9869.410531] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 26 20:50:26 host kernel: [ 9869.410531] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 26 20:50:26 host kernel: [ 9869.418556] sd 0:0:1:0: [sdb] 1953525168 512-byte hardware sectors (10
Sep 26 20:50:26 offlinebackup kernel: [ 9869.418692] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors (1000205 MB)
Sep 26 20:50:26 host kernel: [ 9869.418726] sd 0:0:0:0: [sda] Write Protect is off
Sep 26 20:50:26 host kernel: [ 9869.418744] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 26 20:50:26 host kernel: [ 9869.418759] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 26 20:50:26 host kernel: [ 9869.418803] sd 0:0:1:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
Sep 26 20:50:26 host kernel: [ 9869.418837] sd 0:0:1:0: [sdb] Write Protect is off
Sep 26 20:50:26 host kernel: [ 9869.418854] sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
Sep 26 20:50:26 host kernel: [ 9869.418869] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA


It seems that both disks (sda and sdb) have been woken up and reset.
Note that I don't get any syslog messages for a normal wakeup.
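The "cmd" field in the libata error line is the ATA taskfile of the command that timed out. As far as I know, libata prints it as command/feature:count:lba-low:lba-mid:lba-high, then the high-order bytes, then the device register. Decoding it shows that the command that timed out is the self-test request itself: SMART command 0xB0, feature 0xD4 (EXECUTE OFF-LINE IMMEDIATE), the SMART signature 0x4F/0xC2 in LBA mid/high, and LBA low 0x01 selecting the short self-test in off-line mode, which matches smartctl's message. A sketch; decode_ata_cmd is a hypothetical helper that names only a few SMART sub-commands.

```shell
# Decode the "cmd" taskfile printed by libata, e.g. "b0/d4:00:01:4f:c2/...".
# Assumed field order: command/feature:nsect:lbal:lbam:lbah/hob.../device.
# Only a few SMART (0xB0) sub-commands are named; this is not exhaustive.
decode_ata_cmd() {
    old_ifs=$IFS
    IFS='/:'
    set -- $1                # deliberate word splitting into byte fields
    IFS=$old_ifs
    cmd=$1 feat=$2 lbal=$4 lbam=$5 lbah=$6
    if [ "$cmd" != b0 ]; then
        echo "ATA command 0x$cmd (not SMART)"
        return
    fi
    # SMART commands carry the signature 0x4F/0xC2 in LBA mid/high
    [ "$lbam$lbah" = 4fc2 ] || echo "warning: missing SMART signature"
    case $feat in
        d0) echo "SMART READ DATA" ;;
        d4) case $lbal in
                01) echo "SMART EXECUTE OFF-LINE IMMEDIATE: short self-test, off-line mode" ;;
                02) echo "SMART EXECUTE OFF-LINE IMMEDIATE: extended self-test, off-line mode" ;;
                *)  echo "SMART EXECUTE OFF-LINE IMMEDIATE: subcommand 0x$lbal" ;;
            esac ;;
        da) echo "SMART RETURN STATUS" ;;
        *)  echo "SMART feature 0x$feat" ;;
    esac
}

# Example: decode_ata_cmd "b0/d4:00:01:4f:c2/00:00:00:00:00/00"
```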

Excerpt from "smartctl -a /dev/sda" after this failed test:

Self-test execution status: ( 41) The self-test routine was interrupted by the host with a hard or soft reset.


Of course, the "head" command will only work if the first block is not
already in the cache - otherwise it will not wake up the disk.

I haven't yet found any equivalent to "hdparm -y" to force a wakeup -
that would be useful.
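One way to force a real read even when the first block is cached is to open the device with O_DIRECT, which GNU dd exposes as iflag=direct. A minimal sketch, assuming GNU coreutils; force_wakeup is a hypothetical name, and the 4096-byte read size is chosen so the request stays aligned whether the drive uses 512-byte or 4K logical sectors.

```shell
# Force a real read from the disk, bypassing the page cache (O_DIRECT),
# so a drive in standby is woken even if sector 0 is already cached.
# Assumes GNU dd for iflag=direct; force_wakeup is a hypothetical name.
force_wakeup() {
    # 4096 bytes is a multiple of both 512-byte and 4K logical sectors,
    # as O_DIRECT requires.
    dd if="$1" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null
}

# Example: force_wakeup /dev/sda && smartctl -t short /dev/sda
```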
First know - then act! ;-)

This would allow you to skip a wakeup-step and waiting if the HDD
is already up.

A command that forces a wakeup does no harm, and takes very little time,
if the disk is already awake, so there is little to gain by checking first.
 
David Brown said:
syslog gives the following:
Sep 26 20:50:23 host kernel: [ 9865.466082] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 26 20:50:23 host kernel: [ 9865.466110] ata1.00: cmd b0/d4:00:01:4f:c2/00:00:00:00:00/00 tag 0
Sep 26 20:50:23 host kernel: [ 9865.466111] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)

OK, so it is the kernel. From Christian's post, it seems smartctl passes
the command to the kernel with its own timeout parameter. I would deduce
that normal access either uses a larger timeout value or waits longer
when the drive is not active.
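Arno's deduction can be checked on a live system: ordinary block I/O goes through the sd driver, whose per-command timeout is exposed in sysfs (30 seconds by default on typical kernels), whereas an SG_IO pass-through request such as smartctl's carries a timeout chosen by the caller. A sketch that reads the sysfs value; sd_timeout is a hypothetical name, and SYSFS is a hypothetical override used only so the function can be pointed at a test directory.

```shell
# Read the timeout (seconds) the kernel's sd driver applies to ordinary
# block I/O on a disk. SYSFS is a hypothetical override for testing.
sd_timeout() {
    dev=${1#/dev/}                       # accept "sda" or "/dev/sda"
    cat "${SYSFS:-/sys}/block/$dev/device/timeout"
}

# Example: sd_timeout /dev/sda
```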

Anyway, Christian seems to have fixed this.

Arno
 
Christian said:
This likely occurs because of the SCSI timeout value of 6 seconds
used by smartctl for the SAT ATA PASS-THROUGH command.

This is too short to spin up a disk. For example, a 1TB Samsung drive
(HD103UJ) spins up in ~9 seconds.

This sounds like a very likely explanation.
I fixed this in smartmontools r2924; the timeout is now 20 seconds. Please
try current code from SVN repository:
http://sourceforge.net/apps/trac/smartmontools/wiki/Download

Thanks for the problem report.

Christian

Many thanks! I'll give this a try this evening, if I get the chance.

Kind regards,

David
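To check whether a particular drive needs longer than the old 6-second timeout (Christian quotes ~9 seconds for an HD103UJ), its spin-up time can be measured roughly: put it in standby with "hdparm -y", as in the transcript above, then time a cache-bypassing read. A sketch only, needing root, hdparm and GNU dd; spinup_seconds and the 5-second settle delay are hypothetical choices, and whole-second resolution is coarse but enough to compare against 6 and 20 seconds.

```shell
# Rough spin-up timer: put the drive into standby, then time one read that
# bypasses the page cache. Needs root, hdparm and GNU dd (iflag=direct).
# spinup_seconds and the 5-second settle delay are hypothetical choices.
spinup_seconds() {
    hdparm -y "$1" > /dev/null || return 1   # standby command, as in the transcript
    sleep 5                                  # assumed time for the platters to stop
    start=$(date +%s)
    dd if="$1" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null || return 1
    end=$(date +%s)
    echo $((end - start))                    # whole seconds only
}

# Example: spinup_seconds /dev/sda
```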
 