kony said:
It also doesn't show excessive temperature (though we can't be sure of
that), and (at least based on manufacturers' claims) a large percentage
of drives are damaged in handling, with no tracking of the source,
delivery, or installation. It also lacks any conclusion about whether
the drives measuring temperature are doing so at the same spot on the
drive, or whether that reading is accurate relative to the other
drives. Suppose, for example, that all the WD drives (picking on WD for
no particular reason) that ran at 50C failed at a far higher rate
because their reported temperature was significantly lower than some
areas on the drive, so the drive itself was on average significantly
hotter than another brand reporting the same temperature.
Paul said:
(e-mail address removed) wrote
One thing I don't see in the study is the inclusion of temperature
and humidity factors at the same time. A couple of the drive
manufacturers include curves which show acceptable combinations of
temperature and humidity. That may be why their results show so
little impact from high drive operating temperature, if the air
was bone dry.
Still, if we assume 40% R.H. in their datacenters, seeing so little
effect from temperature is surprising.
I would have liked to see brand names for the drives too.
It would end a lot of arguments.
"Before being put into production, all disk drives go
through a short burn-in process, which consists of a
combination of read/write stress tests designed to catch
many of the most common assembly, configuration, or
component-level problems. The data shown here do not
include the fall-out from this phase, but instead begin
when the systems are officially commissioned for use.
Therefore our data should be consistent with what a regular
end-user should see, since most equipment manufacturers
put their systems through similar tests before shipment."
This should address the handling issue.
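For anyone wanting a poor man's version of that burn-in on a new drive at home, the same idea (write a known pattern, read it back, count mismatches) is easy to script. The sketch below works against a mounted filesystem rather than the raw device, and the path, file size, and block size are arbitrary choices of mine. Note the caveat in the comments: unless the page cache is dropped between the write and read phases, much of the read-back may come from RAM rather than the platters.

#!/usr/bin/env python3
# Rough burn-in sketch: write a deterministic pattern to a scratch file,
# read it back, and count mismatched blocks.  Non-destructive (it uses a
# regular file and deletes it afterwards).
# Caveat: without dropping the OS page cache between phases, part of the
# read-back may be served from memory instead of the disk.
import os

TARGET = "/mnt/newdrive/burnin.tmp"   # somewhere on the drive under test
BLOCK = 1 << 20                       # 1 MiB per block
BLOCKS = 1024                         # ~1 GiB total

# Write phase: each block is filled with a byte derived from its index.
with open(TARGET, "wb") as f:
    for i in range(BLOCKS):
        f.write(bytes([i % 256]) * BLOCK)
    f.flush()
    os.fsync(f.fileno())              # push the data out to the drive

# Read phase: verify every block matches what was written.
errors = 0
with open(TARGET, "rb") as f:
    for i in range(BLOCKS):
        if f.read(BLOCK) != bytes([i % 256]) * BLOCK:
            errors += 1
            print("mismatch in block", i)

os.remove(TARGET)
print("done:", errors, "bad blocks out of", BLOCKS)

It's obviously far cruder than the burn-in described above, but it can shake out gross dead-on-arrival problems before a drive is trusted with real data.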
I get the sense that they left out drive models
in the report, but DID track them internally. But
dammit, I wish they'd publish a lemon list at least.
"3.2 Manufacturers, Models, and Vintages
Failure rates are known to be highly correlated with drive
models, manufacturers and vintages [18]. Our results do
not contradict this fact. For example, Figure 2 changes
significantly when we normalize failure rates per each
drive model. Most age-related results are impacted by
drive vintages. However, in this paper, we do not show a
breakdown of drives per manufacturer, model, or vintage
due to the proprietary nature of these data."
"Before being put into production, all disk drives go
through a short burn-in process, which consists of a
combination of read/write stress tests designed to catch
many of the most common assembly, configuration, or
component-level problems. The data shown here do not
include the fall-out from this phase, but instead begin
when the systems are officially commissioned for use.
Therefore our data should be consistent with what a regular
end-user should see, since most equipment manufacturers
put their systems through similar tests before
shipment."
This should address the handling issue.
"Synapse Syndrome" <[email protected]> in
(e-mail address removed):
The report also looked at the impact of scan errors - problems found
on the surface of a disc - on hard drive failure.
"We find that the group of drives with scan errors are 10 times more
likely to fail than the group with no errors," said the authors.
They added: "After the first scan error, drives are 39 times more
likely to fail within 60 days than drives without scan errors."
This suggests the value of an error-scan utility, with scheduled scans,
to report susceptible drives?
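Something along those lines can already be scripted around smartmontools, or simply left to smartd, which ships with it and does scheduled checking for you. Below is a minimal sketch of a cron-able check that flags drives reporting reallocated or pending sectors; the device list, the watched attributes, and the zero threshold are my own illustrative choices, not anything taken from the study.

#!/usr/bin/env python3
# Sketch: cron-able check that flags drives reporting reallocated or
# pending sectors.  Assumes smartmontools and root privileges; device
# list, watched attributes, and the zero threshold are illustrative.
# For real monitoring, smartd (part of smartmontools) already does this.
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]           # adjust to your system
WATCH = {"Reallocated_Sector_Ct",            # SMART ID 5
         "Current_Pending_Sector",           # SMART ID 197
         "Offline_Uncorrectable"}            # SMART ID 198

def suspicious(dev):
    """Return {attribute_name: raw_value} for watched attributes above zero."""
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    hits = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] in WATCH and fields[9].isdigit():
            if int(fields[9]) > 0:
                hits[fields[1]] = int(fields[9])
    return hits

for dev in DEVICES:
    hits = suspicious(dev)
    if hits:
        # A real monitor would mail or syslog this; printing is enough here.
        print("WARNING", dev, hits)
    else:
        print(dev, "reports no reallocated or pending sectors")

Run weekly from cron, alongside a scheduled drive self-test (smartctl -t long), that is roughly the "error-scan utility with scheduled scans" being suggested.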