Marc de Vries said:
Hi,
I'm trying to decide between these two options, for a server
running Linux (RedHat 9, most likely), which will be used as a
database server, running PostgreSQL (version 7.4, most likely).
[snip]
I've always been reluctant to use SCSI; but I keep reading
that for server use it's almost like it's not even a question:
it must be SCSI, period, by definition.
IDE was not suited for servers in the past. This is the main reason
SCSI is the de facto standard in servers. The server market is very
conservative, and people spend the company's money more easily than
their own, which is another reason SCSI has stayed the standard.
Still, there are good reasons to choose SCSI if you are willing to pay
premium prices.
- SCSI has much longer cables than IDE.
Which works for JBOD but not RAID.
SCSI is at its limit at 4 drives per channel, bandwidth-wise.
Actually it is somewhere between 7 and 4, as it depends on the
individual drives' STR whether the combined STR will exceed the
channel bandwidth. Four is the limit for the fastest drives in that
range.
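To put rough numbers on that (both figures below are assumptions for
illustration, not measured specs): a U160 channel moves 160 MB/s and a
fast drive might sustain around 40 MB/s on its outer zones, so about
four such drives already fill the channel.

  # Back-of-the-envelope: how many drives of a given STR fill one channel.
  # Both numbers are assumptions for illustration, not measured specs.
  channel_bw_mb_s = 160.0   # U160 SCSI channel bandwidth
  drive_str_mb_s = 40.0     # assumed outer-zone STR of one fast drive
  print(channel_bw_mb_s / drive_str_mb_s)   # -> 4.0 drives saturate the bus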
Combined STR isn't as interesting as people think it is.
It is with RAID.
No.
But lots of people think that it is.
Ever considered that they might be smarter than you are?
Yeah, just like billions of flies can't be wrong. Eat shit!
Lots of people talk about this without understanding what is going on.
So many people, in fact, that StorageReview puts specific warnings on
their website that STR is NOT interesting at all for hard disk
performance.
Unfortunately you are one of those people.
Have you ever considered that someone who has hands-on experience with
high-end array controllers in demanding servers might know more about
this subject than you do?
There is lots of good info on the net that confirms what I told you.
Nope. Resultant access time is the slowest time of the set.
I am of course talking about effective access time when reading
multiple files (which should have been clear from my earlier posts).
It is EXACTLY the same reason why RAID1 is faster.
When reading a single file, RAID1 is not faster either. In that case
the access time is also the slowest time of the set.
It's when you read multiple files that RAID1 becomes faster. Exactly
the same as with RAID5 and RAID0.
Nope, access is to all drives at once for the same IO.
Otherwise, what's the point of RAID0?
The point of RAID0 is that it does BOTH.
When a file gets written to RAID0 it is spread over multiple disks.
But it is not necessarily spread over ALL the disks. That depends
on the stripe size you specify.
Take a large stripe size. This will mean that a very small file is not
spread over multiple disks at all, but written to only 1 disk!
A file that is a bit larger might be spread over just 2 disks.
That means that the other disks are idle, and a good array controller
can read/write to those disks at the same time.
You can see it graphically here:
http://www.pcguide.com/ref/hdd/perf/raid/concepts/perfStripe-c.html
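If it helps to make that concrete, here is a tiny sketch (the stripe
size, file sizes and disk count are made-up examples, not a
recommendation):

  # Rough sketch: which RAID0 member disks a file touches, assuming the
  # file starts on a stripe boundary (a simplification) and using
  # made-up stripe and file sizes.
  def disks_touched(file_kb, stripe_kb, n_disks):
      stripes = -(-file_kb // stripe_kb)   # ceiling division
      return min(stripes, n_disks)

  # 64 kB stripe on a 4-disk array (assumed numbers):
  print(disks_touched(4, 64, 4))     # 4 kB file   -> 1 disk, rest stay free
  print(disks_touched(100, 64, 4))   # 100 kB file -> 2 disks
  print(disks_touched(512, 64, 4))   # 512 kB file -> all 4 disks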
Now, if your only aim is to improve transfer rates then you want small
stripe sizes.
In general, a large stripe size gives you better performance, even
with affordable IDE array controllers.
In theory there is an optimal stripe size and going higher will
decrease performance. But the maximum stripe size the array controller
offers you is usually well below the point where performance starts to
decrease again.
Unfortunately the stripe size is often not tested when sites test
array controllers. But as you can see from a thorough test by
Anandtech, the stripe size has a significant impact on performance.
http://www.anandtech.com/storage/showdoc.html?i=1491&p=18
I'm sorry that you didn't feel the need to think about my advice
before you ignored it.
I hope you will now understand from my post (or from the numerous
articles about this that you can find if you google with the terms
raid and stripe size) that you were wrong.
Too ridiculous to even comment on.
I understand why it might seem ridiculous to someone who doesn't have
in-depth knowledge of RAID.
But you can see clear examples of it here:
http://www6.tomshardware.com/storage/20011023/raid-05.html
Would you care to comment on those results, with three different RAID
controllers (cheap ones even), that you didn't think were possible?
But it nevertheless is a limiting factor.
No. It is something which CAN be a limiting factor, depending on the
configuration you have.
It's something you should be aware of, but it's not as big a limiting
factor as you make it out to be.
Oh good.
Nope, what is true on the macro level is also true on the micro level.
When you do enough random access transfers you will eventually
see it in the numbers there too.
Often enough.
STR is STR, it is one number, it cannot be "low"(er) or "high"(er).
(There are different STRs per different zones though).
STR is nothing other than sustained transfer rate. You refer to the
theoretical maximum STR.
But that's just nitpicking about names and doesn't add to the
discussion.
What is important is the fact that the bandwidth needed by the disks
is far below the maximum STR they can achieve.
If you had read it back, yes. But you are not that kind of guy, are
you?
If I had read your replies more thoroughly, I would indeed have
realised that you don't have in-depth knowledge of how RAID works, and
that you wouldn't understand my first post, which was aimed at someone
with more experience with RAID.
It's not, nor did I ever say that it was, so, speaking of false assumptions,
I really don't know where you got that from.
You are contradicting yourself.
If STR is not all-important for disk performance, it can't be
all-important for arrays either.
Assuming U160, presumably.
Try to think before you type the next time.
We are talking here about the rate at which a hard disk can read bytes
from a platter. This has NOTHING to do with the interface.
Right, and when you now have 5 of those drives in RAID0 it could
theoretically transfer that same amount of data in 1 ms instead of 5,
except that the SCSI U160 channel can't transfer that data (200 MB/s)
in 1 ms.
Nice try to confuse the situation, but I won't fall into that trap.
As was perfectly clear, I was talking about 1 drive and not an array.
If you use 5 drives that each transfer 40 kB and the array does
nothing else at all other than transferring that single file, then it
will indeed saturate the SCSI bus for a short amount of time.
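To put a number on "a short amount of time" (taking the 5 x 40 kB from
that example and 160 MB/s as the U160 limit; just a sketch, the rest
of the time the bus is free for other work):

  # Sketch of how short that saturation window actually is, using the
  # 5 x 40 kB from the example above and 160 MB/s as the U160 limit.
  chunk_kb = 40          # per-drive chunk from the example
  n_drives = 5
  bus_mb_s = 160.0       # U160 channel limit
  total_kb = chunk_kb * n_drives                 # 200 kB in total
  print(total_kb / 1024.0 / bus_mb_s * 1000.0)   # roughly 1.2 ms on the bus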
But now comes the part where you go wrong:
We are talking about arrays that do more than just transferring a
single file. We are talking about arrays in servers that are busy
with all kinds of different tasks.
Again it is the same as the RAID1 scenario.
When you read a single file a RAID1 array is just as fast as a single
hard disk. But when you read lots of files, a RAID1 array can be
nearly twice as fast as a single disk.
This happens with RAID0 and RAID5 as well.
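A toy illustration of that point (the timings are made up; it only
shows the queueing effect, not real controller behaviour):

  # Toy illustration: each read is assumed to take 10 ms, and a RAID1
  # pair is assumed to serve two independent reads at the same time.
  read_ms = 10
  reads = 8
  single_disk_ms = reads * read_ms       # all reads queue on one disk
  raid1_ms = -(-reads // 2) * read_ms    # reads split over two mirrors
  print(single_disk_ms, raid1_ms)        # 80 ms vs 40 ms
  # For one single read both cases take the same 10 ms; the gain only
  # shows up when there are many reads in flight.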
You have to look at more than just the STR. I point again to the
stripe size benchmarks.
As you can clearly see, the larger stripe sizes are faster, even
though your single-file example would predict that they would be
slower.
Again, this is why I constantly tell you that you have to look at the
configuration in which you want to use RAID.
Your examples work fine for video editing, where people are
transferring single large files. But they don't work for (database)
servers where you have lots of random read/write actions
simultaneously.
So although the 200 kB could be transferred theoretically in 9 ms
from the start of the command, giving an average transfer rate of
22 MB/s, which is way below U160's bandwidth as you put it, it is
actually transferred in around 9.5 ms, averaging 21 MB/s instead.
So, your reasoning about used and available bandwidth is flawed.
Like I said before, you don't get it.
Again you don't get that there is a large difference between using
RAID for video editing and using it for large servers.
I have never denied that large transfer rates are important for
video editing scenarios.
You are referring to situations that apply to the video editing
scenario and have nothing in common with servers.
What you have done is prove that you don't want 8 disks on a single
SCSI channel when you do video editing. You are right there. But I
have NEVER suggested otherwise.
But since this sort of usage doesn't happen on servers, you have not
proven that you don't want that on servers.
STR is measured by pure sequential access.
When access is not sequential, you obviously can't speak of STR.
Whatever you want to call it, STR or maximum STR, does not matter
here. You just use that as a tactic to avoid the real issue, which you
didn't answer.
And with lots of other applications on non-fragmented volumes.
You are forgetting that we are not talking about desktops here.
This discussion is about databases! It does not apply to databases!
Really? What exactly did you think that
"That is just other words for difficulty to read higher densities."
line was all about?
Oh god. You are really utterly stupid.
WHAT DID YOU THINK MY POST BEFORE THAT WAS ABOUT?
I brought up the difficulty of reading high densities at high rpm
values. Then you reply to that with "that is just ....", which doesn't
add anything at all!
And then you start acting silly.
Is this really the level of discussion that you like to have? Then
you'd better find someone else, because I'm not planning on lowering
myself to your level.
Then another remark before I let you live in your fantasy world again:
You don't give any proof of your claims. I have given you several
references that confirm my claims and reasoning. I also have hands-on
experience with the kind of arrays we are discussing. You can tell me
a thousand times that my array can't perform well, but I can see every
day that it performs great.
If you want to be taken seriously by people I strongly suggest you
learn how to substantiate your claims.
BTW, don't bother replying to this, because I won't read it.
I've gotten tired of discussing with people who just make empty
claims. The only reason I have replied so far is so that the OP gets
the correct information, and has resources on the internet where he
can verify it. I don't care anymore what you think about it.
Marc