Bill Todd
Paul said: It doesn't help the latency for a completely serial workload, but it
lets you do seeks in parallel when the workload allows it.

You seem to have a better handle on this subject than most of the people
who are presuming to advise you on it.

1. Conventional PATA/SATA drives are indeed less reliable for 24/7
server-style workloads than higher-end drives (which indeed do use more
robust internals) - today, as well as in the past. Microsoft and
Seagate representatives held a joint presentation at last year's WinHEC
describing stress tests on three groups of 300 desktop drives which
found that failure rates went up significantly when the drives were used
24/7 rather than at their rated duty cycles (typically 8 hrs/day).
After 1000 hours of continuous operation the failure rate under a
typical desktop workload (lots of idle time) was 1.7% - about 10x what
one would expect after 1000 hours from a desktop drive rated at 600,000
hours MTBF in 'normal' use.  (And with especially considerate handling
drives can do even better than that rating: Copan Systems found that
18,000 disks used in its 'MAID' products over a two-year period starting
in April 2004 had a real-world failure rate nearly 5x lower still,
corresponding to nearly a 3,000,000-hour MTBF.)
The failure rate after 1000 hours under constant streaming access (i.e.,
no significant seeks) was more than twice as high (4%), and the failure
rate after 1000 hours for a constant seek-intensive workload was higher
still (6.3% and still rising steeply, reaching 7.4% after 1200 hours,
while the other two workload failure curves were flattening out).

It's
too bad they didn't test Raptors (well, Seagate *was* the joint
presenter...) - I've wondered how well they lived up to WD's marketing
in that area, which certainly suggests that they're full-fledged
high-end-FC/SCSI replacements in terms of reliability. SATA failure
rates for somewhat less demanding (but still 24/7 and fairly heavy)
workloads in Microsoft's TerraServer application were stated a year ago
to be 7.2% annually (about 5x the predicted failure rate for a
600,000-hour MTBF disk in its specified 'desktop' duty cycle).

The
Copan paper ("Disk Failure Rates and Implications of Enhanced MAID
Storage Systems") is a good general starting point for this topic, and
contains the interesting tidbits that more platters (because of
increased loading on the motor and head actuators?) and more power-on
hours (apparently even if the disk is spun down) both increase failure
rates.

'Near-line enterprise' drives are specced for 24/7 operation but
not for the kind of intense workloads that the high-end enterprise
drives are: it would be nice to see some head-to-head stress testing
comparing them to conventional desktop SATA drives under both light and
heavy workloads, since they cost very little more (though conventional
drives may be more deeply discounted during sales). And as you already
know, keeping the disk cool is a significant win (I vaguely remember
reading somewhere that each 10 degree C. rise in operating temperature
doubled the failure rate).
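
To put the '10x' and '5x' comparisons above into back-of-envelope form
(a minimal sketch of my own, not from the WinHEC or Copan material; it
assumes the expected fraction of drives failing over t hours is roughly
t / MTBF when t is much smaller than the MTBF, and a full 8760-hour year
for the annual figure):

  # Rough failure-fraction arithmetic: expected fraction failed ~= hours / MTBF.
  def expected_failure_fraction(hours, mtbf_hours):
      return hours / mtbf_hours

  rated  = expected_failure_fraction(1000, 600_000)     # ~0.17% in 1000 hours
  copan  = expected_failure_fraction(1000, 3_000_000)   # ~0.03% in 1000 hours
  annual = expected_failure_fraction(8760, 600_000)     # ~1.46% per 24/7 year

  print(f"600,000-hr MTBF, 1000 hrs:   {rated:.2%}")    # vs. 1.7% observed -> ~10x worse
  print(f"3,000,000-hr MTBF, 1000 hrs: {copan:.2%}")    # ~5x below the 600,000-hr rate
  print(f"600,000-hr MTBF, 24/7 year:  {annual:.2%}")   # vs. TerraServer's 7.2% -> ~5x worse
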
2. So yes, your inclination toward using a Raptor purely for
reliability is entirely reasonable - though if you can tolerate a disk
failure during your usage then just figuring you'd replace a
conventional SATA drive if it failed (or even if two of them failed)
might be more economical.

3. The assertion that USB access significantly increases latency
appears to be false - at least from a quick look at some tests at Tom's
Hardware, where the listed external USB drive access times are
comparable to internal drive access times.  This makes sense, since
there's no overhead in the purely hardware/software USB processing
that's remotely comparable to a disk's seek and rotational latency.
While, as you claimed, the Raptor's higher rotational speed would
decrease average access times (by about 1 ms.), its much faster average
seek times would likely be more important to random-access performance
(all as Folkert observed: he may seldom be civil, but that doesn't mean
he's always wrong).
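
A quick sketch of why the seek improvement matters more than the extra
spindle speed (the RPM arithmetic is exact; the seek figures are just
ballpark values for a 7200 RPM desktop drive and a 10,000 RPM Raptor,
not measurements from any particular review):

  # Average rotational latency is half a revolution: 0.5 * (60 / RPM) seconds.
  def avg_rotational_latency_ms(rpm):
      return 0.5 * 60.0 / rpm * 1000.0

  rot_7200  = avg_rotational_latency_ms(7200)    # ~4.2 ms
  rot_10000 = avg_rotational_latency_ms(10000)   # ~3.0 ms -> roughly a 1 ms gain
  seek_7200, seek_10000 = 8.9, 4.6               # ms; illustrative averages only

  print(f"rotational-latency gain: {rot_7200 - rot_10000:.1f} ms")
  print(f"seek-time gain:          {seek_7200 - seek_10000:.1f} ms")
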
4. You were correct (and Folkert was wrong) in your suggestion that
using multiple slower drives can be a cost-effective way to improve
throughput (even random-access throughput) for parallel workloads. The
secret is not to use traditional small stripe-segment sizes. Storing
only 64 KB (or even less) per disk before moving on to the next *never*
made much sense: with today's disks, 1 - 4 MB segments allow you to
maximize streaming throughput (it takes a good-sized client buffer to
inhale multiple multi-MB segments in parallel, but RAM's cheap these
days) while still placing a reasonable upper limit on smaller-access
latency (so the worst case may rise from 20 ms. to 80 ms. - BFD, since
for the kind of parallel workload you described this will be balanced by
reduced queuing delays; in fact, these stripe-segment sizes ensure 70% -
90% disk utilization for *any* workload, unlike smaller stripe segments
which can reduce utilization by a factor of N = the number of disks that
a modest access spans).
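
To put rough numbers on that 20 ms vs. 80 ms worst case (a sketch
assuming about 60 MB/s of sustained transfer and about 13 ms of combined
seek plus rotational latency; both figures are illustrative, not taken
from any particular drive):

  # Rough per-request service time: positioning (seek + rotation) plus transfer.
  def service_time_ms(transfer_kb, positioning_ms=13.0, throughput_mb_s=60.0):
      return positioning_ms + transfer_kb / 1024.0 / throughput_mb_s * 1000.0

  print(f"64 KB segment: {service_time_ms(64):.0f} ms")     # ~14 ms
  print(f"512 KB access: {service_time_ms(512):.0f} ms")    # ~21 ms - the old worst case
  print(f"4 MB segment:  {service_time_ms(4096):.0f} ms")   # ~80 ms - the new worst case

The transfer portion grows with segment size while the positioning cost
stays fixed, which is why the larger segments keep each disk spending
most of its time moving data rather than seeking.
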
- bill