On Wed, 06 Apr 2005 10:31:43 -0400, Carlos Moreno wrote:
> Now, let's call my idea RAID-M (M for Moreno), so that we
> avoid the mix-up in the explanations...
Good idea.
> Let's compare the performance of RAID-0 and RAID-M (assuming
> that we disregard any issues with the complexity of an actual
> implementation) when we have two disks of different speeds
> (and again, we assume a rather constant value for the STR as
> a measure of "speed").
Which it isn't. So, for example, on the outer tracks the array requires
one stripe:disk ratio, while that ratio would have to change
incrementally somehow until you reach the inner tracks, which would
likely require a very different stripe ratio.
This would be even more evident with drives of different sizes (or
rather drives based on different platter sizes/densities). Even with
drives of the same size but different speeds this would be a problem,
because one drive is assigned more chunks than the other, so the
logical disk ends in different places on the physical disks -- places
which have no relation to each other performance-wise.
> Think of it as a "load balancing" process -- by striping the
> data into lengths proportional to the relative speeds, you
> avoid the condition of having one disk idle because it has to
> wait for the other one to finish. The "optimal" point, where
> load balancing occurs, is the point at which both hard disks
> take the exact same amount of time writing their stripes
> of data (which is why in RAID-M you would stripe the data
> into blocks of lengths proportional to the relative speeds
> of the disks).
Right. I got that point from the beginning, but it would take a
different kind of storage technology to be able to do this even in
theory.
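To be clear about the arithmetic we're both describing, here's a minimal sketch of the proportional-striping idea under the thread's idealized assumption that each disk has one constant STR (the speed numbers are hypothetical):

```python
def raid_m_stripes(total_bytes, speeds):
    """Split a write into per-disk stripes proportional to each disk's speed.

    Idealized model: one constant sustained transfer rate per disk.
    """
    total_speed = sum(speeds)
    return [total_bytes * s / total_speed for s in speeds]

# Two hypothetical disks: 60 MB/s and 30 MB/s, writing 90 MB.
speeds = [60, 30]
stripes = raid_m_stripes(90, speeds)
print(stripes)  # [60.0, 30.0]

# Each disk's time = stripe / speed; equal by construction, so neither waits:
times = [stripe / speed for stripe, speed in zip(stripes, speeds)]
print(times)  # [1.0, 1.0]
```

The balancing is exact only because the model pretends each drive has a single fixed speed, which is precisely the assumption in dispute here.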
> Of course, the load balancing happens automatically in RAID-0,
> provided that the two disks have the same speed! The alleged
> advantage of RAID-M would be the increased flexibility: it
> allows you to use disks with different speeds, and it makes
> the most of them (where "the most" in this case means that
> you're not "held back" by a slow disk).
In theory. But also in theory, I think there would need to be many
restrictions on what could be used this way. "Flexibility" and
"optimization" are marginal at best in this context.
> Notice the interesting detail:
> In RAID-0, the effective speed of the array is always twice
> the speed of the slower disk -- it doesn't help if the faster
> disk is 20 times faster; the speed is always
> determined/limited by the slower disk.
Which is what I was saying all along.
> In RAID-M, the effective speed of the array is necessarily
> greater than or equal to the speed of the faster disk (the
> "equal" part is only achieved if the slower disk has speed
> 0, in which case its stripe has length zero, and thus the
> speed is simply the speed of the faster drive).
Now I understand what you meant.
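Both of those claims can be checked with a quick sketch under the same idealized constant-STR model (the 100 and 5 MB/s figures are made up for illustration):

```python
def raid0_speed(s1, s2):
    # RAID-0 with equal-length stripes: both disks write the same amount,
    # so the array finishes only when the slower one does -> 2 * min.
    return 2 * min(s1, s2)

def raid_m_speed(s1, s2):
    # RAID-M: stripes proportional to speed, so the throughputs simply add.
    return s1 + s2

print(raid0_speed(100, 5))   # 10  -- held back by the slow disk
print(raid_m_speed(100, 5))  # 105 -- always >= the faster disk alone
print(raid_m_speed(100, 0))  # 100 -- the "equal" boundary case
```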
<snip>
Thanks for the clarification. It took me a few posts to catch on, and
it really took this last one to fully appreciate your argument. I'm
used to the more typical RAID questions, which this sounded like at
first, but it ended up being a whole lot more interesting. Even though
I understand your argument better, I don't think it really changes my
assessment of RAID-M. I'm not just concerned about the practicality
and cost of RAID-M but about its theoretical basis as well.
The main problem with even the theoretical basis of RAID-M is that
disk performance isn't a single variable, i.e. it doesn't distill
down to a single measurement very well. Serial or "sustained"
transfer rate decays from the outside to the inside of the platter at
very different rates among different drive models. Different drives
also handle random I/O very differently.
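To put a number on that objection: if two drive models' STR curves decay at different rates across the platter, the "correct" stripe ratio drifts with position. A sketch with hypothetical linear decay curves (made-up numbers, real drives decay in zoned steps):

```python
def str_at(position, outer, inner):
    """Sustained transfer rate at platter position in [0, 1] (0 = outer edge).

    Hypothetical linear decay; real drives step down in recording zones.
    """
    return outer + (inner - outer) * position

for pos in (0.0, 0.5, 1.0):
    a = str_at(pos, 60, 30)   # hypothetical drive A: 60 -> 30 MB/s
    b = str_at(pos, 50, 35)   # hypothetical drive B: 50 -> 35 MB/s
    print(f"pos={pos:.1f}  ratio A:B = {a / b:.2f}")
# The ratio drifts from 1.20 at the outer edge to 0.86 at the inner edge,
# so no single fixed stripe ratio can be optimal across the whole platter.
```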
Your numbers work so nicely ONLY because you are restricting the
formula to an idealized version of "sustained transfer rate"
exclusively and hand-picking nice performance relationships between
drives. How would RAID-M work with, say, three different drive
models? RAID-M with two drives of the same size but different speeds
would also mean wasting space on one of the drives (not very
desirable). If you're going to plan the matching of sets of drives,
or restrict RAID-M to two drives, then there is little incentive to
overcome the design hurdles when normal RAID-0 performs better and is
ultimately more cost-effective and mature. More importantly, it would
significantly inhibit the flexibility you are looking for, which is
the whole point of RAID-M.
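The wasted-space point follows directly from the proportional striping: the faster disk consumes its capacity faster, and once it is full, the remainder of the slower disk is unusable by the array. A back-of-envelope sketch with hypothetical figures:

```python
def raid_m_capacity(cap, s_fast, s_slow):
    """Usable array capacity and wasted space for two equal-size disks
    striped proportionally to speed (idealized constant-STR model)."""
    # When the fast disk is full, the slow disk has only used this much:
    used_on_slow = cap * s_slow / s_fast
    usable = cap + used_on_slow
    wasted = cap - used_on_slow   # stranded on the slower disk
    return usable, wasted

# Two hypothetical 80 GB disks, one 60 MB/s and one 40 MB/s:
usable, wasted = raid_m_capacity(80, 60, 40)
print(round(usable, 1), round(wasted, 1))  # 133.3 26.7
```

So of the 160 GB of raw disk, roughly a sixth is thrown away just to keep the stripe ratio consistent.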
The other problem is that even if drive selection is optimal and
balanced, you would have to measure and match the drives' performance
attributes, as well as space utilization and stripe distribution, with
a tremendous level of precision. Precise performance measurement and
matching is not possible because of the first point I mentioned.
Also, because the point of RAID-M is to optimize the use of spare
parts you happen to have on hand, you wouldn't likely be using
products with nice performance relationships.
I also would think setup would be the opposite of what you propose:
the user would select the stripe ratio based on a benchmark, and the
firmware would determine the usable disk space. That would be easier
on both the end user and the designer than having the end user
calculate relative space (remember, you're not likely to have nice
numbers like 2:1 or 1.25:1, and there is often confusion about space
measurements), having the controller check that the user's selection
is viable, and then orchestrating the stripes accordingly -- the
latter would leave more room for error. Finally, the idea that you
could do RAID-M with a single stripe ratio is not viable with disk
technology even in theory. The idea might be better suited to other
technologies.
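The setup flow I have in mind could be sketched like this: the user supplies a benchmarked ratio, and the "firmware" works out how much of each disk the array can actually use (capacities and ratio are hypothetical):

```python
def usable_space(cap1, cap2, ratio):
    """Given a benchmarked stripe ratio (disk1 : disk2), compute the total
    array size and how much of each disk it can actually use.
    Minimal sketch of the setup flow described above."""
    # Per unit of array data, disk1 stores ratio/(ratio+1), disk2 stores 1/(ratio+1).
    f1 = ratio / (ratio + 1)
    f2 = 1 / (ratio + 1)
    # Whichever disk would fill up first limits the whole array.
    total = min(cap1 / f1, cap2 / f2)
    return total, total * f1, total * f2

# Hypothetical 120 GB and 80 GB disks; benchmark says disk1 is 1.6x faster:
total, use1, use2 = usable_space(120, 80, 1.6)
print(round(total, 1), round(use1, 1), round(use2, 1))  # 195.0 120.0 75.0
```

Note that even here the "benchmark" has to pretend a single number describes each drive, which is the weak link in the whole scheme.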
RAID-M is an interesting concept. While RAID-0 is simple relative to
other levels, it is not that simple from a performance or performance
optimization perspective. That is because we are trying to make the
disks work in concert and conduct truly simultaneous I/O. Latency
and spindle sync are issues because they eat away at the ability of
I/O to be simultaneous, and therefore at the expected gains. Spindle
synchronization is obsolete at this point, partly due to faster
spindle speeds and partly due to masking techniques in drive and
controller firmware, like caching and coalescing. But I can
specifically recall an array of Quantum Atlas V's (7200 RPM) that I
resurrected from basically a dog to really quite peppy with spindle
sync. I know some companies were also doing this a few years back:
taking older, cheaper, slower drives and using spindle sync to make
the array appear better than it was, to increase profits.
Even two similar-performing but not identical drives reveal
out-of-sync problems, in part because specs like 7200 or 10K or 15K
RPM are not really exact figures (just like disk space), and also due
to different firmware approaches to handling I/O. So two very
different drives have an even more noticeable problem working
together, which would eat away at performance even further.
It would be nice if disk performance could be reduced to a single,
easily measurable number, and if there could be the kind of
flexibility RAID-M requires. But there is another component your
model does not take into account: firmware compatibility. While
firmware can optimize performance in both single-disk and normal RAID
configurations, there are limitations to the ability of some products
to perform in RAID that have historically required matching both
model and firmware in arrays. In fact, disk firmware might even
require tuning to work properly for each RAID-M proportional scenario
(just as it does for normal RAID as opposed to vanilla disks), or
might simply perform atrociously in real-world RAID-M.
For example, the first WD 80 GB PATA drives, which were originally
not intended for RAID, performed horribly in RAID, including simple
striping. This required firmware corrections both by WD (upgrades of
course not available to users) and avoidance of many controllers.
Back in the days of Mylex, you had to be even more careful with
firmware versions: conflicting versions, or some combinations of
controller and disk firmware, could actually damage the drives. I
recall an array of Quantum/Maxtor 10K3's that were getting PFA errors
(i.e., SMART failures) because of this, even though the drives were
basically good. Between those problems, high infant mortality rates,
and a batch of bad motors, the array ended up not being viable.
I'm glad you've told me more about where you're coming from. I hope I
have done the same and not simply been repetitive. It's truly an
interesting concept. Unfortunately, for performance, the simplest
solution usually wins out, IMHO. Maybe you could try designing
something like this for flash or RAM?
Cheers.