RAID 5 Question

  • Thread starter: Eli
Marc de Vries said:
[snip]

PATA and SATA controllers don't have this issue since you have a
dedicated ide channel for every disk.

Not with PATA.
It is still *your* choice to not use a second drive on same channel.

I have never seen a PATA Raid5 controller where each drive didn't have
a dedicated ide channel.

Which obviously is entirely firmware limited unless the IDE chip used is
self-designed and doesn't support M/S.

Sure, but that doesn't matter. When the firmware in the Raid5 cards
doesn't give you that option, you have little choice but to accept that.

Adding a second drive will simply not work. So it's NOT your choice to
use a second drive on the same channel with Raid5 cards.
What has that got to do with hotplugging?

Because you can't hotplug with two devices connected on IDE.

Actually hotplugging is not supported at all on IDE, but as long as
there is only one device present some companies can work around that
limitation.
Yes, and where did I say different?

You are responding to my statement. Now I have to assume that you
wanted to make a point with that reply. Apparently I was wrong in
assuming that.

Your remark suggests that there was something wrong with what I said.
(especially when someone who doesn't know how raid works reads it). So
I had to make it clear for everyone that it does apply to Raid5.
That is why I said "That is for striped arrays" and not "Raid0".
Get it now?

I get it that you are making pointless remarks that have absolutely
nothing to do with the thread.

If you had said that the transfer rate increase doesn't apply to Raid1
you would have added something sensible. Although it still doesn't
apply to what we are talking about here: Raid5.
Supports what? What is "it"?

Support the necessary functions to make this work. For instance in a
Raid1 array the function that makes it possible to read a file from
disk2 while another file is being read from disk1. But there is no
need to dive into details here.
I've explained it to you in the past, but you didn't listen then, so I
will not explain it to you again.

A more detailed explanation than I have already given in the thread
here isn't interesting for the discussion. Although I might give it if
the op asks for it.
Right, "perceived".



Which obviously is false when it is "perceived" like that.
Perceived, as the word means, is a false assumption.

Pffff. You obviously have no idea what you are talking about.

The seektime for reading a single file from the array stays the same
as on a single disk. But total time spent on seeks when you read many
files on the array is much lower than when you read many files on a
single disk.

I've explained in simple terms how that works. In my opinion perceived
seektime is a good way to talk about this lower total seektime. But if
you know a better word I welcome you to tell me what word I should
use.

The facts are that the total seektime is reduced so I have a
performance gain. So there are no false assumptions.
If transfer rate consists of STR divided by (seektime plus actual
transfer time) per time unit and you then increase the STR, you could
also say that that is perceived as if the seektime was decreased
compared to the old STR, just because the total transfer time decreased.
It's not so.
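
For anyone who wants to put numbers on the relation being argued here, a
minimal Python sketch, assuming the simple model that every read costs one
access (seek plus latency) plus the data divided by the STR. The function
name and the figures (64 kB read, 12 ms access, 52 MB/s STR) are only
illustrative, not from either poster:

# Assumed model: effective rate = size / (access_time + size / STR)
def effective_rate_mb_s(size_mb, access_ms, str_mb_s):
    transfer_ms = size_mb / str_mb_s * 1000.0   # time actually transferring
    total_ms = access_ms + transfer_ms          # plus one access per read
    return size_mb / (total_ms / 1000.0)

print(effective_rate_mb_s(0.064, 12.0, 52.0))    # ~4.8 MB/s, access-bound
print(effective_rate_mb_s(0.064, 12.0, 208.0))   # ~5.2 MB/s: 4x STR barely helps
print(effective_rate_mb_s(0.064, 6.0, 52.0))     # ~8.8 MB/s: halved access helps far more

Under that model a higher STR and a lower access time both show up as a
higher effective MB/s, which is what the two of you keep calling by
different names.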

If you don't know how raid arrays work, you could make that
conclusion, but you would be completely wrong and you would be saying
something completely different from what I am saying.
Thanks for confirming that this is
"Just another way of how transfer rate increases".

Wrong again. You just don't get it.

When I open small files the total time to read the file is determined
for 90% by the seektime. Increasing the transferrates can therefore
never give a large performance gain.

But because you can read multiple files at the same time you are
effectively reducing the seektime and are creating a huge performance
gain.

Again a simple example. I can read two small files at once from the
array.
90% of the time T to read that small file is spent on the seektime.
Now I read two of those files from a single disk.
The total time spent is: 2 x (0.9 x T + 0.1 x T) = 2T

Now I quadruple the transferrate of that disk by putting them in a
raid array. Now for 10% of time T I need only a quarter of the time.

Now the total time needed is 2 x (0.9 x T + 0.025 x T) = 1.85 T

So although I made a huge transferrate increase I got only a very
small performance gain.

Now I don't do anything about the transferrate of the disk when I put
it in a raid array. I just read both files at the same time, as I can
do with larger arrays. (or with a raid1 array)

Now the total time to read both files is:
1 x (0.9 x T + 0.1 x T) = 1T

Now I have made a 100% performance gain and it is not because of the
transferrate. We all know (or should know) that seektime is the
limiting factor when you are reading small files.

So we have a huge performance gain. It is the sort of performance gain
we would get when we halve the seektime.

It's not an actual seektime decrease, but real-life performance
figures of the array make it look like the seektime has been halved.
So the array behaves as if the seektime has halved, and thus I state
that the "perceived" seektime has decreased.
And btw, just as well you can say that when seektime (total time spent
in seeks) decreases that that is "perceived" as increased transfer rate.

No you can't.

Simply because a perceived seektime decrease corresponds well with the
actual performance figures and an increased transferrate does not.

Even when I increase the transferrate a million times the minimum time
to read those files will still be 1.8T
And for large files we also know that the transferrate isn't that big.

A perceived seektime does work well with the real-life data:
For large files it has little impact and for little files it has a big
impact.
Exactly what we see on our array:
For large files the performance increase is determined by the
transferrate increase and not the perceived seektime. And for small
files it's just the other way around.
You have no say in that whatsoever.

Yeah, I forgot that I was talking to someone who doesn't WANT to
understand.
So no (hint: stripesize on mirrors?).

So yes.
Hint: We are talking about Raid5 arrays and not mirrors.

My statement was about Raid5 arrays. Of course such a statement cannot
automatically be applied to all other types of arrays. I've never
stated that it could be applied to a two disk mirrored array.

Marc
 
Which of course is one and the same if you look at speed only

Absolutely not!!

Just look at the difference between SCSI disks and IDE disks.

Modern 7200 rpm IDE disks have nearly the same transferrates as 10.000
and 15.000 SCSI disks. But SCSI disks reach much higher IO/s. For this
reason SCSI is preferred for busy servers and desktops.

The reason for this is very simple. SCSI disks have lower seektimes
because of the higher rpm. The lower seektimes give higher IO/s which
most applications benefit from.
(IO/s is MB/s, more IO/s is more MB/s).

This is nonsense!
What IO/s doesn't say is how big the IOs are and how much time is
spent in seeks relative to larger IO and how that affects transfer
rate. So more IO isn't necessarily faster IO when the same amount
of IO is broken into more and thus smaller pieces.
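
For what it's worth, the arithmetic both statements rest on can be sketched
in a few lines of Python, assuming a random-access model where every IO pays
a full access. The 8 ms access time and 60 MB/s STR are made-up illustrative
figures, not from either poster:

# MB/s = IO/s * IO size, but IO/s itself depends on the IO size.
def iops(io_kb, access_ms=8.0, str_mb_s=60.0):
    ms_per_io = access_ms + (io_kb / 1024.0) / str_mb_s * 1000.0
    return 1000.0 / ms_per_io

for io_kb in (4, 64, 1024):
    rate = iops(io_kb)
    print(io_kb, "kB:", round(rate), "IO/s,", round(rate * io_kb / 1024.0, 1), "MB/s")
# 4 kB   : ~124 IO/s,  ~0.5 MB/s   (access-bound: many IO/s, few MB/s)
# 64 kB  : ~111 IO/s,  ~6.9 MB/s
# 1024 kB: ~41  IO/s, ~40.5 MB/s   (transfer-bound: few IO/s, many MB/s)

So a quoted IO/s number only translates into MB/s once you know the IO size,
which is exactly what the two sides here keep talking past each other about.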

I have the impression that you are trying to confuse the discussion
on purpose.

When you only have very large files, you can have low IO/s and high
MB/s. You can even saturate the bus with high MB/s and still have very
low IO/s. And in that situation transferrate is important.

But in REAL LIFE that is almost never the case.

Just what I already said before. None of my applications needs high
transferrate. Most benefit from high IO/s (thus low seektimes)
IOs, not IO/s, is important to servers to get the most out of the

Wrong again. IO/s is what is important. That is why IO/s is always
measured when people benchmark a disk or array controller for server
usage.

Hint: IBM, Compaq/HP, Dell etc all look at IO/s for their servers and
treat MB/s separately for applications.

Now why would they do that if MB/s and IO/s are the same, and why would
they measure IO/s when you think that IOs is more important?

Hint: They are not wrong. What does that tell you about your opinion?

Marc
 
Absolutely not!!

Just look at the difference between SCSI disks and IDE disks.

Modern 7200 rpm IDE disks have nearly the same transferrates as 10.000
and 15.000 SCSI disks. But SCSI disks reach much higher IO/s. For this
reason SCSI is preferred for busy servers and desktops.

The reason for this is very simple. SCSI disks have lower seektimes
because of the higher rpm. The lower seektimes give higher IO/s which
most applications benefit from.

Isn't there more intelligence in SCSI? Command queueing or something?


 
Peter Hucker said:
Well I've generally found the bigger the faster.
Probably because when you get a bigger drive it's usually newer technology.

Even in the same model range a bigger drive (x2) is faster in roughly the middle
half of its capacity, where it nears the end of the smaller drive's capacity: the
smaller drive will be at its slowest zone while the bigger drive is only halfway,
at faster zones. The difference may be as much as 50%.

E.g., on an 80GB and a 160GB drive of the same model range that both can do ~60MB/s
at start and 30MB/s at end, a file that is at around the 80GB mark on both drives
transfers at 30MB/s on the smaller drive but transfers at 45MB/s on the bigger one.
 
Even in the same model range a bigger drive (x2) is faster in roughly the middle
half of its capacity, where it nears the end of the smaller drive's capacity: the
smaller drive will be at its slowest zone while the bigger drive is only halfway,
at faster zones. The difference may be as much as 50%.

E.g., on an 80GB and a 160GB drive of the same model range that both can do ~60MB/s
at start and 30MB/s at end, a file that is at around the 80GB mark on both drives
transfers at 30MB/s on the smaller drive but transfers at 45MB/s on the bigger one.

Plus if you put 80GB on an 80GB drive, seeks are up to full distance. 80GB on a 160GB drive (if defragged), seeks are up to half distance.
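
A quick Python sketch of that zoning argument, assuming (purely for
illustration) that the sustained rate falls off linearly from the outer to
the inner tracks; real drives step through discrete zones, so treat this as
an approximation, and the function name rate_at is made up here:

# Assumed linear fall-off from outer (fastest) to inner (slowest) tracks.
def rate_at(offset_gb, capacity_gb, outer_mb_s=60.0, inner_mb_s=30.0):
    frac = offset_gb / capacity_gb            # 0.0 = outer edge, 1.0 = inner edge
    return outer_mb_s - (outer_mb_s - inner_mb_s) * frac

print(rate_at(80, 80))    # 30.0 MB/s -> end of the 80GB drive
print(rate_at(80, 160))   # 45.0 MB/s -> same data only halfway into the 160GB drive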


 
Isn't there more intelligence in SCSI? Command queueing or something?

That also comes into play for really long queues. But the main factor
is the lower seektime.

You can also substitute a 10.000 rpm SATA Raptor for the 10.000 rpm
SCSI disks. That raptor doesn't have extra command queueing
capabilities compared to a 7200 rpm SATA disk, but you will still see
much higher IO/s.

Command queueing can actually hurt performance if the queues are not
long (which is usually the case on the desktop). Storagereview had an
interesting article on that.

Marc
 
Both the 7K250 and the 180GXP are 7200 rpm disks!

Bugger.


 
That also comes into play for really long queues. But the main factor
is the lower seektime.

You can also substitute a 10.000 rpm SATA Raptor for the 10.000 rpm
SCSI disks. That raptor doesn't have extra command queueing
capabilities compared to a 7200 rpm SATA disk, but you will still see
much higher IO/s.

Command queueing can actually hurt performance if the queues are not
long (which is usually the case on the desktop). Storagereview had an
interesting article on that.

Hmmmmmm, didn't know that. Anyway SCSI is a STUPID price - there is no way they cost that much more to make.



 
Hmmmmmm, didn't know that. Anyway SCSI is a STUPID price - there is no way they cost that much more to make.

IMO the reason for that price difference is very simple.
SCSI disks are used in servers. People buying servers using SCSI buy
them using the company's money instead of their own.

People are always more willing to spend the company's money than they
are willing to spend their own money.

The same with Xeon processors. Especially the VERY expensive PIII
Xeons that had 2MB cache. In a best case scenario it gave you only a 10%
performance gain, while the CPU was twice the price of the 1MB
variant. But the 2MB version was the default in our organization.

Also look at all 19" rack components. A simple tapedrive enclosure
consisting of nothing more than a power supply, a single scsi
connector and cable internally and a sturdy steel casing costs a
fortune.

Still a SCSI disk can be cost effective in some situations. Especially
with high end database servers you really need the extra performance
from scsi disks, and then it can make such a big difference that you
want to spend the extra money.

But for fileservers scsi is really a waste of money. Sure, a SCSI disk
has a better MTBF, but having a few hotspares for your raid arrays
will solve that too, and it will still be cheaper.

But if you buy your servers from the big brands (HP/Compaq, IBM, Dell)
you really don't have a choice. They don't offer hotpluggable PATA or
SATA in their servers.

Marc
 
Marc said:
Absolutely not!!

Just look at the difference between SCSI disks and IDE disks.

Modern 7200 rpm IDE disks have nearly the same transferrates as 10.000
and 15.000 SCSI disks. But SCSI disks reach much higher IO/s. For this
reason SCSI is preferred for busy servers and desktops.

The reason for this is very simple. SCSI disks have lower seektimes
because of the higher rpm. The lower seektimes give higher IO/s which
most applications benefit from.

I have a minor nit with this statement--what's shorter on the high-RPM
drives is the _access_ time. Seek time is only a part of that and what the
high RPM decreases directly is "latency", which is the time required for
the location of the data you are attempting to access to rotate from
wherever it is when the head has stabilized to the position of the head.

High-RPM drives also often have slightly reduced seek times, but this is not
directly due to the high RPM and is instead a secondary effect--many 10,000
and 15,000 RPM drives have reduced-diameter platters so the seek distance
from innermost to outermost track is shorter than for a 7200 RPM drive,
which results in a slightly faster seek time.
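
The latency part is easy to quantify: on average the platter has to turn
half a revolution, so average rotational latency is 30000/RPM milliseconds.
A tiny Python sketch, just to put numbers on it:

# Average rotational latency = half a revolution = 30000 / RPM milliseconds.
for rpm in (5400, 7200, 10000, 15000):
    print(rpm, "rpm:", round(30000.0 / rpm, 2), "ms average rotational latency")
# 5400 rpm: 5.56 ms, 7200 rpm: 4.17 ms, 10000 rpm: 3.0 ms, 15000 rpm: 2.0 ms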

<snip>
 
You can also substitute a 10.000 rpm SATA Raptor for the 10.000 rpm
SCSI disks. That raptor doesn't have extra command queueing
capabilities compared to a 7200 rpm SATA disk,

Are you sure about that?

Thanks,
--Dan
 
Marc de Vries said:
Marc de Vries said:
[snip]
I use multiple spindles to get higher IO/s, not to get higher transfer
rates. So I have never checked the performance gain on transferrates
on my servers.

But I have a Promise SATA Raid5 card in my desktop at home, and I
definitely get a much higher transferrate from that array of 4 disks
than from a single drive.

There are also lots of reviews and people on newsgroups where two disks
are striped with onboard el-cheapo raid cards and the transferrate is
much higher than with a single disk.

So I'm surprised that you see about the same transfer rate with two
striped disks.

But very few applications benefit from higher transferrates.
When applications benefit from raid arrays it is usually because of
higher IO/s.

Which of course is one and the same if you look at speed only

Absolutely not!!

Try to read and apprehend.
Just look at the difference between SCSI disks and IDE disks.

Modern 7200 rpm IDE disks have nearly the same transferrates as 10.000
and 15.000 SCSI disks.

No, they don't. That is STR, Sustained Transfer Rate.
I said transfer rates. Or to put it simply: MB/s.
But SCSI disks reach much higher IO/s.

That is 2-fold: access time and bus use efficiency.
For this reason SCSI is preferred for busy servers

Exactly, multiuser environments with small record parallel IO.
and desktops.

Nope. Unless they run with server oriented OSes, OSes that
do Parallel IO and make a mess of basically serialized IO.
The reason for this is very simple. SCSI disks have lower seektimes
because of the higher rpm.

Because of smaller diameter. The rpm reduces latency.
The lower seektimes give higher IO/s

On random IO, not sequential IO.
which most applications benefit from.

Applications (or rather OSes) that do random parallel small record IO.
This is nonsense!

Never went to prep school? It is very simple arithmetic, you see.
What's the point of IO/s being better if it is not higher MB/s.
I have the impression that you are trying to confuse the discussion
on purpose.

Now wouldn't that suit you to get out of your predicament?
You already strayed from the subject, which is RAID, not SCSI.
When you only have very large files, you can have low IO/s and high
MB/s.

Now you've just explained what I said.
You've also just explained that low IO/s isn't necessarily bad.
Good or bad depends on the size of the IO AND the number of it.
You can even saturate the bus with high MB/s and still have very
low IO/s. And in that situation transferrate is important.

Again you confuse transfer rate with STR.
Transfer rate is transfer rate, MB/s, be it sequential (STR) or random (average).
STR is strictly sequential. In sequential access STR is important.
In random data access, access time is (also) important.
Transfer rate is a function of both STR and access time.
But in REAL LIFE that is almost never the case.

Just what I already said before. None of my applications needs high
transferrate. Most benefit from high IO/s (thus low seektimes)

Which translates into relatively high random access transfer rate.

Users aren't the least bit interested in access time or IO/s.
They are interested in how long it takes for their data
to transfer and that is decided by transfer rate, i.e. MB/s.
Access time and IO/s are incorporated in the transfer rate.
Wrong again. IO/s is what is important.

Only for comparison. For a multiuser environment IO size defines
the largest possible IO command for a given transfer that can be
round-robin distributed so that any user gets a fair chance of his
IO being handled within a limited time. IOs are just units of
measurement used for comparison as far as performance is concerned.
That is why IO/s is always measured when people benchmark a disk
or array controller for server usage.

Thank you for confirming what I just said: for comparison.
And then only if the IO size is the same in the comparison.
Hint: IBM, Compaq/HP, Dell etc all look at IO/s for their servers and
treat MB/s separately for applications.

Now why would they do that if MB/s and IO/s are the same,

I never said it's the same, all I meant to say is that IO/s is also MB/s.
MB/s is absolute, IO/s is dependent on the size of the IO and how that
affects the distribution of IO and how that affects the number of seeks.
and why would they measure IO/s ?

I have no idea.
The number is bloody useless if you don't know what the IO size is
and how many seeks are involved depending on the size of the IO.
Perhaps because benchmarking in a multitasking/multiuser environment
is already a somewhat risky affair, so you focus on other things and
other units of measurement? You tell me.

Random IO benchmarks do a random seek for every IO, so IO size
changes the number of seeks for a given number of MB read,
and IO/s is therefore very much dependent on the IO size.
There is no linear dependence.
However, if you change IO size on a system that does mainly sequential
IO, changing the IO size increases the number of IOs but doesn't
increase the number of seeks, and dependence is therefore mostly linear.
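
A small Python sketch of that difference, under the stated assumption that a
random benchmark pays one seek per IO while a sequential workload pays
roughly one seek per file no matter how the transfer is chopped into IOs;
the numbers and the seeks() helper are only illustrative:

# Reading 100 MB, comparing how IO size affects the seek count in the two cases.
def seeks(total_mb, io_kb, random_io):
    n_ios = total_mb * 1024 // io_kb
    return n_ios if random_io else 1   # random: one seek per IO; sequential: ~one seek total

for io_kb in (4, 64, 1024):
    print(io_kb, "kB IOs -> random:", seeks(100, io_kb, True),
          "seeks; sequential: about", seeks(100, io_kb, False), "seek")
# Shrinking the IO size multiplies the seeks only in the random case;
# in the sequential case it just multiplies the number of IOs.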

It's a very crude way of comparing IO performance, but for all I
know it is the least useless one and may actually give you some room to
fiddle with, instead of some absolute numbers that only apply to a single
way of measurement.
when you think that IOs is more important.

Again that is not at all what I said. You took that totally out of context.
IOs, not IO/s, is important to servers to get the most out of the
bus (Queueing, reordering) and to distribute IO fairly among users
(cutting up potentially larger IO into smaller IO) for response
continuity, at the same time.
Hint: They are not wrong. What does that tell you about your opinion?

Now I suddenly have an opinion? Bother to tell what my opinion is?
 
Geez, took me a whole day to respond. This is very tricky stuff.

Marc de Vries said:
Marc de Vries said:
[snip]

PATA and SATA controllers don't have this issue since you have a
dedicated ide channel for every disk.

Not with PATA.
It is still *your* choice to not use a second drive on same channel.

I have never seen a PATA Raid5 controller where each drive didn't have
a dedicated ide channel.

Which obviously is entirely firmware limited unless the IDE chip used is
selfdesigned and doesn't support M/S.

Sure, but that doesn't matter. When the firmware in the Raid5 cards
doesn't give you that option, you have little choice but to accept that.

Adding a second drive will simply not work. So it's NOT your choice to
use a second drive on the same channel with Raid5 cards.
What has that got to do with hotplugging?

Because you can't hotplug with two devices connected on IDE.

Yes, you just repeated what you said without explanation.
Actually hotplugging is not supported at all on IDE, but as long as
there is only one device present some companies can work around that
limitation.

So what is that 'limitation'?
You are responding to my statement.

You obviously got that wrong as it was *you* who responded to *my* statement.
Now I have to assume that you wanted to make a point with that reply.
Apparently I was wrong in assuming that.

Yes, as pointless as it is now. You seem to enjoy repeating yourself.
Your remark suggests that there was something wrong with what I said.

Nope, *your* remark suggested that there was something wrong with what *I* said.
(especially when someone who doesn't know how raid works reads it).

I second that.
So I had to make it clear for everyone that it does apply to Raid5.

Which is a striped array, like I said.
I get it that you are making pointless remarks that have absolutely
nothing to do with the thread.

Ah yeah, and now it is my fault that this thread suddenly grew from
a 4kB post to a 10kB post to an 18kB post. Who are you trying to kid?
If you had said that the transfer rate increase doesn't apply to Raid1
you would have added something sensible.

Well, apparently it would have been clearer for you.

Most people that read the replies would just see it and conclude that the one excludes the other.
Although it still doesn't apply to what we are talking about here: Raid5.

And now you contradicted yourself when you earlier said that it does.
Support the necessary functions to make this work. For instance in a
Raid1 array the function that makes it possible to read a file from
disk2 while another file is being read from disk1. But there is no
need to dive into details here.

Not for mirrors, no. That one is crystal clear.
I've explained it to you in the past, but you didn't listen then,

It is not uncommon that people run away from a conversation when
someone maintains that the earth is flat and the posts grow exponentially.
so I will not explain it to you again.

Well you did it anyway, at the end.
A more detailed explanation than I have already given in the thread
here isn't interesting for the discussion.

Yes it is when the assumptions are wrong.
Although I might give it if the op asks for it.


Pffff. You obviously have no idea what you are talking about.

That is the standard troll response when they have been caught.

Actually, I do. You use official words that have a specific meaning and
then you (mis)use them for something else, changing that official meaning.
The seektime for reading a single file from the array stays the same
as on a single disk.

There you go.
Seektime is the time spent to seek from a specific or random position to
a specific or random block, not time spent seeking in the total retrieval
time of a file.
Minimum seektime, maximum seektime and average seektime are specific
to a (=one) block, not a file. It's a measurement range for a single seek.
And apart from that you mean access time, not seektime.

You misuse the terms. If you want to convey something, use the proper
terms or at least refrain from using improper terms.
But total time *spent on seeks*

Now you got it.
when you read many files on the array is much lower than when you
read many files on a single disk.

Though you got that wrong. When you read from a striped array you
read in parallel (and at higher combined transfer rate).
When you read from a single disk you read in a serial manner.
The fact that you can only read the second file after the first one
does not constitute a seek.

Now if you read in parallel fashion from a single disk by issuing parallel
IOs for several files at once, that then get serialized and scattered, you
may have a point.
Reading several sequential files at once may result in random(ish) retrieval
of said files if they are bigger than the maximum IO size, and the smaller
the IO size or the bigger the files, the more accesses (seeks+latencies) are
possibly involved.

On the other hand, if the disk supports tagged queueing (or the OS does
some reordering in the system queue) the queue will get reordered and
the files read one after the other without that overdose of extra seeks.
I've explained in simple terms how that works.

Far too simple when you leave out some essentials.
Then it is not really an explanation.
In my opinion perceived seektime is a good way to talk about this
lower total seektime.

It's confusing and on top of that, it's wrong.
But if you know a better word I welcome you to tell me what
word I should use.

Increased transfer rate, because that is what lower seektimes or the
absence of unnecessary accesses is leading to.
And the performance gain is twofold as parallel transfer also increases
performance. What contributes the most depends on the IO size.
The facts are

Are they? See later. *
that the total seektime

I would call that 'aggregated seektime' if you must use the word "seektime".
is reduced
so I have a performance gain.

That's not disputed, now is it?
But performance constitutes the amount of data transferred per time
unit and that's MB/s. Both STR and access time play a role in that.
So there are no false assumptions.

Then it is not perceived.
If you don't know how raid arrays work, you could make that conclusion,
but you would be completely wrong and you would be saying
something completely different from what I am saying.


Wrong again. You just don't get it.

Users get it.
They just notice how their files transfer faster and that is what
increased performance means to them, more MB/s aka better transfer rate.
They don't give a damn about milliseconds access time or IO/s.
When I open small files the total time to read the file is determined
for 90% by the seektime.
Increasing the transferrates can therefore never give a large
performance gain.

You can't increase transfer rate, transfer rate is what you get. You can
influence it by better access time and better sustained transfer rate.
But because you can read multiple files at the same time you are
effectively reducing the seektime

*
No, you are reintroducing parallel reading of files that by themselves
wouldn't have been read in parallel because they are too small to fill a stripe.
and are creating a huge performance gain.

Actually you are thumbing your nose at the operating system that
introduced all those seeks in the first place by cutting files up into
small IO and then parallelizing them.
Actually you are taking back the performance that was trashed
by the OS in the first place.
Again a simple example.

Oh, where is the first one?
I can read two small files at once from the array.

That depends.
90% of the time T to read that small file is spend on the seektime.

So we are talking 52MB/s, 12 ms access, 13 ms total transfer time
amounts to 4kB files. So we are talking *VERY* small file.
Now I read two of those files from a single disk.
The total time spent is: 2 x (0.9 x T + 0.1 x T) = 2T

The transfer rate is 1/13 * 52MB/s = 4MB/s
Now I quadruple the transferrate of that disk by putting them in a
raid array.

So let's assume 4 drive Raid0 then.
Now for 10% of time T I need only a quarter of the time.

If your stripe size is 4K or less.
Now the total time needed is 2 x (0.9 x T + 0.025 x T) = 1.85 T

I believe that that is not entirely correct as you have now two
different values of T in the same equation, but that aside.
So although I made a huge transferrate increase

No, you didn't, you made a huge STR increase.
I got only a very small performance gain.

Yes, you got only a small *transferrate* gain.
Now I don't do anything about the transferrate of the disk when I put
it in a raid array.

Of course, you do. But let's go with it for the moment.
And let's assume 2 drive Raid0.
I just read both files at the same time, as I can do with larger arrays.

Or with any (striped) array. Let's assume you meant larger number of
drives in the array instead of larger size array. It's not the size of
the array but the size of the stripe that makes it possible or not.
And it depends on where in a stripe these files sit.
With that 4KB they may well sit on a single drive but there's a 50-50
chance with a 2 drive array that they sit on the same one.
(or with a raid1 array)

Now that makes sense.
Now the total time to read both files is:
1 x (0,9 x T + 0,1 x T) = 1T

If you are lucky.
The transfer rate per file is still 4MB/s though.
Now I have made a 100% performance gain

Oh? Where is it when the transfer rate per file is still 4MB/s?
and it is not because of the transferrate.

Yes, because of the transferrate. Just not because of the STR.
If both files were destined for you then you have doubled the
transfer rate for the total of them (8MB/s). *
We all know (or should know) that seektime is the
limiting factor when you are reading small files.

So we have a huge performance gain. It is the sort of performance gain
we would get when we halve the seektime.
(Or halve the transfer time, i.e. double the transfer rate).

Yes, but only 50% of the time and for files smaller than or equal to half
the stripe size, or more generally the stripe size divided by the number of drives.
It's not an actual seektime decrease, but real-life performance
figures of the array make it look like the seektime has been halved.

No, small files aren't read in parallel and transfer at single drive speeds.
Combining small files restores parallel reading and brings back the
aggregated transfer rate.
So the array behaves as if the seektime has halved,

No, it behaves like the files are bigger and behave as if striped instead
of some form of JBOD, and as if the transfer time has halved.
and thus I state that the "perceived" seektime has decreased.

No, combined transfer rate has increased because transfer time was decreased.
No you can't.

Yes, you can. When you allow your "perceived", then you have to allow
my "perceived" as well when both are wrong.
Simply because a perceived seektime decrease corresponds well with the
actual performance figures and an increased transferrate does not.

And that is not what I said.
Transfer rate (rate of transfer, MB/s) IS performance.
Even when I increase the transferrate a million times the minimum time
to read those files will still be 1.8T
And for large files we also know that the transferrate isn't that big.

Again, I know what you mean to say but that's not what you say.
A perceived seektime does work well with the real-life data:
For large files it has little impact and for little files it has a big impact.
Exactly what we see on our array:
For large files the performance increase is determined by the
transferrate increase and not the perceived seektime and for small
files it's just the other way around.

Nope.
What you see is what you see but it is not what you attribute it to.
Combining files brings back striping (parallel reading) to small files.
Yeah, I forgot that I was talking to someone who doesn't WANT to
understand.

Well, then we must have something in common because I am under the
exact same impression that you don't want to understand what I am
saying. Part of the confusion I believe is just in the language and
definitions of words.
So yes.
Hint: We are talking about Raid5 arrays and not mirrors.

You were and I changed it to mirrors.
Generally mirrors have better access time because one drive is always
closer to the data than the other drive so average latency on a mirror
is less (less than 1/2 rev) than on a single drive.

That's why I said "on mirrors".
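
That claim can be sanity-checked with a quick simulation, assuming the two
mirror copies sit at independent random rotational positions and the
controller can read from whichever copy happens to be rotationally closer
(an assumption of this sketch, not something every controller actually does).
The expected minimum of two uniform latencies is 1/3 of a revolution versus
1/2 for a single disk:

import random

# One revolution = 1.0; a single disk waits on average 0.5 revolutions,
# a mirror that picks the rotationally closer copy waits min(U1, U2) ~ 1/3.
single = sum(random.random() for _ in range(100000)) / 100000
mirror = sum(min(random.random(), random.random()) for _ in range(100000)) / 100000
print(round(single, 3), round(mirror, 3))   # ~0.5 vs ~0.333 revolutions
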
My statement was about Raid5 arrays. Of course such a statement cannot
automatically be applied to all other type of arrays.
I've never stated that it could be applied to a two disk mirrored array.

Now you've lost me again. Assuming that the implementation does allow it,
that is where it works all the time irrespective of number of drives or stripe
sizes or whatever.
 
Are you sure about that?

HOLY SHIT! Are those raptors really 3 times the price of 7200s or am I looking in the wrong places?

I can't believe they are 3 times faster.


 