On Wed, 06 Apr 2005 10:31:43 -0400, Carlos Moreno wrote:
> Now, let's call my idea RAID-M (M for Moreno), so that we
> avoid the mix-up in the explanations...
Good idea.
> Let's compare the performance of RAID-0 and RAID-M (assuming
> that we disregard any issues with the complexity of an actual
> implementation) when we have two disks of different speeds
> (and again, we assume a rather constant value for the STR as
> a measure of "speed").
Which it isn't. So, for example, on the outer tracks the array requires
one stripe:disk ratio, while that ratio would have to change
incrementally somehow until you reach the inner tracks, which would
likely require a very different stripe ratio.
This would be even more evident with drives of different sizes (or
rather drives based on different platter sizes/densities). Even with
drives of the same size but different speeds this would be a problem,
because one drive is assigned more chunks than the other, so the
logical disk ends in different places on the physical disks -- places
which have no relation to each other performance-wise.
> Think of it as a "load balancing" process -- by striping the
> data into lengths proportional to the relative speeds, you
> avoid the condition of having one disk idle because it has to
> wait for the other one to finish. The "optimal" point, where
> load balancing occurs, is the point at which both hard disks
> take the exact same amount of time writing their stripes
> of data (which is why in RAID-M you would stripe the data
> into blocks of lengths proportional to the relative speeds
> of the disks).
Right. I got that point from the beginning, but it would take a
different kind of storage technology to be able to do this even in
theory.
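To be clear about the arithmetic we're both describing, here's a minimal sketch of the proportional-striping idea under the thread's idealized assumption that each disk has one constant STR (the speed numbers are hypothetical):

```python
def raid_m_stripes(total_bytes, speeds):
    """Split a write into per-disk stripes proportional to each disk's speed.

    Idealized model: one constant sustained transfer rate per disk.
    """
    total_speed = sum(speeds)
    return [total_bytes * s / total_speed for s in speeds]

# Two hypothetical disks: 60 MB/s and 30 MB/s, writing 90 MB.
speeds = [60, 30]
stripes = raid_m_stripes(90, speeds)
print(stripes)  # [60.0, 30.0]

# Each disk's time = stripe / speed; equal by construction, so neither waits:
times = [stripe / speed for stripe, speed in zip(stripes, speeds)]
print(times)  # [1.0, 1.0]
```

The balancing is exact only because the model pretends each drive has a single fixed speed, which is precisely the assumption in dispute here.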
> Of course, the load balancing happens automatically in RAID-0,
> provided that the two disks have the same speed! The alleged
> advantage of RAID-M would be the increased flexibility: it
> allows you to use disks with different speeds, and it makes
> the most of them (where "the most" in this case means that
> you're not "held back" by a slow disk).
In theory. But also in theory, I think there would need to be many
restrictions on what could be used this way. "Flexibility" and
"optimization" are marginal at best in this context.
> Notice the interesting detail:
> In RAID-0, the effective speed of the array is always twice
> the speed of the slower disk -- it doesn't help if the faster
> disk is 20 times faster; the speed is always
> determined/limited by the slower disk.
Which is what I was saying all along.
> In RAID-M, the effective speed of the array is necessarily
> greater than or equal to the speed of the faster disk (the
> "equal" part is only achieved if the slower disk has speed
> 0, in which case its stripe has length zero, and thus the
> speed is simply the speed of the faster drive).
Now I understand what you meant.
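Both of those claims can be checked with a quick sketch under the same idealized constant-STR model (the 100 and 5 MB/s figures are made up for illustration):

```python
def raid0_speed(s1, s2):
    # RAID-0 with equal-length stripes: both disks write the same amount,
    # so the array finishes only when the slower one does -> 2 * min.
    return 2 * min(s1, s2)

def raid_m_speed(s1, s2):
    # RAID-M: stripes proportional to speed, so the throughputs simply add.
    return s1 + s2

print(raid0_speed(100, 5))   # 10  -- held back by the slow disk
print(raid_m_speed(100, 5))  # 105 -- always >= the faster disk alone
print(raid_m_speed(100, 0))  # 100 -- the "equal" boundary case
```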
<snip>
Thanks for the clarification. It took me a few posts to catch on, and
it really took this last one to fully appreciate your argument. I'm
used to the more typical RAID questions, which this sounded like at
first, but it ended up being a whole lot more interesting. Even though
I understand your argument better, I don't think it really changes my
assessment of RAID-M. I'm not just concerned about the practicality
and cost of RAID-M but about its theoretical basis as well.
The main problem with even the theoretical basis of RAID-M is that
disk performance isn't a single variable, i.e. it doesn't distill
down to a single measurement very well. Serial or "sustained"
transfer rate decays from the outside to the inside of the platter at
very different rates among different drive models. Different drives
also handle random I/O very differently.
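To put a number on that objection: if two drive models' STR curves decay at different rates across the platter, the "correct" stripe ratio drifts with position. A sketch with hypothetical linear decay curves (made-up numbers, real drives decay in zoned steps):

```python
def str_at(position, outer, inner):
    """Sustained transfer rate at platter position in [0, 1] (0 = outer edge).

    Hypothetical linear decay; real drives step down in recording zones.
    """
    return outer + (inner - outer) * position

for pos in (0.0, 0.5, 1.0):
    a = str_at(pos, 60, 30)   # hypothetical drive A: 60 -> 30 MB/s
    b = str_at(pos, 50, 35)   # hypothetical drive B: 50 -> 35 MB/s
    print(f"pos={pos:.1f}  ratio A:B = {a / b:.2f}")
# The ratio drifts from 1.20 at the outer edge to 0.86 at the inner edge,
# so no single fixed stripe ratio can be optimal across the whole platter.
```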
Your numbers work so nicely ONLY because you are restricting the
formula to an idealized version of "sustained transfer rate"
exclusively and hand-picking nice performance relationships between
drives. How would RAID-M work with, say, three different drive
models? RAID-M with two drives of the same size but different speeds
would also mean wasting space on one of the drives (not very
desirable). If you're going to plan the matching of sets of drives,
or restrict RAID-M to two drives, then there is little incentive to
overcome the design hurdles when normal RAID-0 performs better and is
ultimately more cost-effective and mature. More importantly, it would
significantly inhibit the flexibility you are looking for, which is
the whole point of RAID-M.
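The wasted-space point follows directly from the proportional striping: the faster disk consumes its capacity faster, and once it is full, the remainder of the slower disk is unusable by the array. A back-of-envelope sketch with hypothetical figures:

```python
def raid_m_capacity(cap, s_fast, s_slow):
    """Usable array capacity and wasted space for two equal-size disks
    striped proportionally to speed (idealized constant-STR model)."""
    # When the fast disk is full, the slow disk has only used this much:
    used_on_slow = cap * s_slow / s_fast
    usable = cap + used_on_slow
    wasted = cap - used_on_slow   # stranded on the slower disk
    return usable, wasted

# Two hypothetical 80 GB disks, one 60 MB/s and one 40 MB/s:
usable, wasted = raid_m_capacity(80, 60, 40)
print(round(usable, 1), round(wasted, 1))  # 133.3 26.7
```

So of the 160 GB of raw disk, roughly a sixth is thrown away just to keep the stripe ratio consistent.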
The other problem is that even if drive selection is optimal and
balanced, you would have to measure and match the drives' performance
attributes, as well as space utilization and stripe distribution, with
a tremendous level of precision. Precise performance measurement and
matching is not possible because of the first point I mentioned.
Also, because the point of RAID-M is to optimize the use of spare
parts you happen to have on hand, you wouldn't likely be using
products with nice performance relationships.
I also would think setup would be the opposite of what you propose:
the user would select the stripe ratio based on a benchmark, and the
firmware would determine the usable disk space. That would be easier
on both the end user and the designer than having the end user
calculate relative space (remember, you're not likely to have nice
numbers like 2:1 or 1.25:1, and there is often confusion about space
measurements), having the controller check that the user's selection
is viable, and then orchestrating the stripes accordingly -- the
latter would leave more room for error. Finally, the idea that you
could do RAID-M with a single stripe ratio is not viable with disk
technology even in theory. The idea might be better suited to other
technologies.
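The setup flow I have in mind could be sketched like this: the user supplies a benchmarked ratio, and the "firmware" works out how much of each disk the array can actually use (capacities and ratio are hypothetical):

```python
def usable_space(cap1, cap2, ratio):
    """Given a benchmarked stripe ratio (disk1 : disk2), compute the total
    array size and how much of each disk it can actually use.
    Minimal sketch of the setup flow described above."""
    # Per unit of array data, disk1 stores ratio/(ratio+1), disk2 stores 1/(ratio+1).
    f1 = ratio / (ratio + 1)
    f2 = 1 / (ratio + 1)
    # Whichever disk would fill up first limits the whole array.
    total = min(cap1 / f1, cap2 / f2)
    return total, total * f1, total * f2

# Hypothetical 120 GB and 80 GB disks; benchmark says disk1 is 1.6x faster:
total, use1, use2 = usable_space(120, 80, 1.6)
print(round(total, 1), round(use1, 1), round(use2, 1))  # 195.0 120.0 75.0
```

Note that even here the "benchmark" has to pretend a single number describes each drive, which is the weak link in the whole scheme.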
RAID-M is an interesting concept. While RAID-0 is simple relative to
other levels, it is not that simple from a performance or performance
optimization perspective. That is because we are trying to make the
disks work in concert and conduct truly simultaneous I/O. Latency
and spindle sync are issues because they eat away at the ability of
I/O to be simultaneous, and therefore at the expected gains. Spindle
synchronization is obsolete at this point, partly due to faster
spindle speeds and partly due to masking techniques in drive and
controller firmware, like caching and coalescing. But I can
specifically recall an array of Quantum Atlas V's (7200 RPM) that I
resurrected from basically a dog to really quite peppy with spindle
sync. I know some companies were also doing this a few years back:
taking older, cheaper, slower drives and using spindle sync to make
the array appear better than it was, to increase profits.
Even two similar-performing but not identical drives reveal
out-of-sync problems, in part because specs like 7200 or 10K or 15K
RPM are not really exact figures (just like disk space), and also due
to different firmware approaches to handling I/O. So two very
different drives have an even more noticeable problem working
together, which would eat away at performance even further.
It would be nice if disk performance could be reduced to a single,
easily measurable number, and if there could be the kind of
flexibility RAID-M requires. But there is another component your
model does not take into account: firmware compatibility. While
firmware can optimize performance in both single-disk and normal RAID
configurations, there are limitations to the ability of some products
to perform in RAID that have historically required matching both
model and firmware in arrays. In fact, disk firmware might even
require tuning to work properly for each RAID-M proportional scenario
(just as it does for normal RAID as opposed to vanilla disks), or
might simply perform atrociously in real-world RAID-M.
For example, the first WD 80 GB PATA drives, which were originally
not intended for RAID, performed horribly in RAID, including simple
striping. This required firmware corrections both by WD (upgrades of
course not available to users) and avoidance of many controllers.
Back in the days of Mylex, you had to be even more careful with
firmware versions: conflicting versions, or some combinations of
controller and disk firmware, could actually damage the drives. I
recall an array of Quantum/Maxtor 10K3's that were getting PFA errors
(i.e., SMART failures) because of this, even though the drives were
basically good. Between those problems, high infant mortality rates,
and a batch of bad motors, the array ended up not being viable.
I'm glad you've told me more about where you're coming from. I hope I
have done the same and not simply been repetitive. It's truly an
interesting concept. Unfortunately, for performance, the simplest
solution usually wins out, IMHO. Maybe you could try designing
something like this for flash or RAM?
Cheers.