Hi,
I'm trying to decide between these two options for a server
running Linux (Red Hat 9, most likely), which will be used as
a database server running PostgreSQL (version 7.4, most
likely).
I don't have experience with Linux and PostgreSQL, but I do have some
experience with MS SQL and lots with Exchange, which are also
database-backed and should have similar hardware demands.
We are talking about peak load in the order of 20 or 30 inserts
per second, with clients connecting from three different
places.
The machine is a dual Athlon 2GHz with 1GB of memory.
Now, the big question is: what hard disk configuration should
I choose for this server?
The first question is whether the hard disks will be a bottleneck at
all. Do you have experience with other machines running a similar
configuration?
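One rough way to find out yourself is the pgbench tool from the
PostgreSQL contrib directory; its default transaction mix does a few
updates and an insert, so it gives a feel for how many small writes
the disks can sustain. A minimal sketch (the database name and scale
factor are just examples):

    createdb bench
    # populate the test tables; scale 10 gives ~1,000,000 accounts rows
    pgbench -i -s 10 bench
    # 10 concurrent clients, 1000 transactions each; reports TPS at the end
    pgbench -c 10 -t 1000 bench

If the reported transactions per second are comfortably above your
expected 20 or 30 inserts per second, the disks are unlikely to be
your first bottleneck.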
Often a lot of memory will make the load on the hard disk a lot
lower. But this depends a lot on the size of the database and the way
it is used. If the database easily fits in memory, the hard disk is
not used all that much.
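You do have to tell PostgreSQL to actually use that memory, though.
As a minimal sketch for postgresql.conf, assuming ~1GB of RAM and the
unit conventions of 7.4 (shared_buffers and effective_cache_size in
8kB pages, sort_mem in kB; the values are illustrative, not a tuning
recommendation):

    shared_buffers = 16384          # 128MB of shared buffer cache
    effective_cache_size = 65536    # tell the planner ~512MB of OS cache is likely
    sort_mem = 8192                 # 8MB per sort operation
    checkpoint_segments = 8         # fewer, larger checkpoints for write bursts

You may have to raise the kernel's SHMMAX setting before the
postmaster will start with a larger shared_buffers.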
I've always been reluctant to use SCSI; but I keep reading
that for server use, it's almost like it's not even a question,
it must be SCSI, period, by definition.
IDE was not suited for servers in the past. This is the main reason
SCSI is the de facto standard in servers. The server market is very
conservative, and people spend the company's money more easily than
their own, which is another reason SCSI has stayed the standard.
Still, there are good reasons to choose SCSI if you are willing to
pay premium prices.
- SCSI has much longer cables than IDE, which can be a practical
issue in larger servers, especially when you have 19" racks with
external storage enclosures. (SATA2 will change all that)
- Until recently you only had RAID controllers for SCSI, which is
also considered necessary for servers. There are good RAID
controllers for IDE for small servers nowadays, in ranges of 4, 8 or
even 12 disks. But if you want to have more disks on your array
controller you still need SCSI.
- High-end SCSI array controllers (which are VERY expensive) have
more reliability features than IDE controllers, for example external
power supplies in case the motherboard fails, and onboard batteries
for the RAM on the array controller. All designed so that you can
enable write-back cache on the controller and not lose data when
power fails in your server.
- Because SCSI is the de facto standard for servers, manufacturers
have made the more reliable hard disks mainly for SCSI. There is no
technical reason why they couldn't have made IDE hard disks with the
same MTBF, but there simply wasn't a market for it.
This has changed recently with the WD Raptor 10,000 rpm IDE hard disk
and the Maxtor MaXLine II 7200 rpm hard disk. The WD Raptor is pretty
expensive though. (but SCSI is still more expensive)
Then again, the reliability of a single hard disk is not all that
important anymore when you use RAID arrays. Even if an IDE disk fails
twice as often, it is still a lot cheaper.
But I have the impression that these kinds of features of SCSI are not
really what you are interested in.
Still, the most important reason to choose SCSI is the very low
access times. This again is not caused by the SCSI protocol itself,
but by the high rpm values. A 15,000 rpm SCSI disk has a much better
access time than 10,000 rpm SCSI or 7200 rpm IDE disks.
This access time is important for random read/write operations, which
don't happen a lot on your desktop, or even on fileservers and
webservers.
SCSI drives have simple cache management which is optimized for
random read/write operations. IDE drives' cache management is
optimized for typical desktop use.
This is why 7200 rpm IDE drives will regularly beat 10,000 rpm SCSI
(and sometimes even 15,000 rpm SCSI) drives in desktop situations.
But a database often has lots of random read/write operations.
This is what makes a database the best-case scenario for a SCSI disk.
(I'm talking about comparing single disks here, without looking at
the huge price difference.)
Considering the option of 3 IDE drives with RAID 5, how do
things compare? 3 IDEs with RAID 5 are a bit more expensive
than a single SCSI (we're talking about renting a dedicated
server, so we're talking monthly fees).
For this kind of use, should I be expecting better performance
from the IDE-based RAID 5 configuration?
It depends.
RAID5 is mainly designed to have affordable redundancy.
It is fast when you read data, but the parity updates make it slow
for writing: every small write means reading the old data block and
the old parity block, recomputing the XOR, and writing both back, so
one logical write turns into four disk operations.
This is a feature of RAID5 itself, and thus applies to both IDE and
SCSI RAID5 controllers.
If you want ultimate performance as well as redundancy, you should
use RAID10 (or RAID0+1).
You would need 4 disks of which two hold data, instead of 3 disks of
which two hold data with RAID5, but it is a lot faster when writing
and usually also a bit faster when reading. The controller itself can
be a lot cheaper.
4 IDE disks with a RAID10 controller are probably cheaper than 3 IDE
disks with a RAID5 controller.
4 IDE disks in RAID10 will be faster than a single SCSI disk in most
situations, and have the added benefit of redundancy. There have been
some review sites that have tested this, but I can't remember them
just now. It was probably AnandTech and/or X-bit labs.
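If the rented machine only gives you plain IDE channels, Linux
software RAID can do RAID1+0 as well. A minimal sketch with mdadm,
assuming four disks on four separate IDE channels (the device names
are just examples, and each mirror pair should sit on its own channel
so the two halves don't share a cable):

    # build two RAID1 mirror pairs
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda1 /dev/hdc1
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hde1 /dev/hdg1
    # stripe the two mirrors together into one RAID0 volume
    mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1
    # put a filesystem on the striped volume
    mkfs.ext3 /dev/md2

Software RAID costs some CPU, but a dual Athlon has cycles to spare
for it.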
What about two or three SCSI drives without RAID? Will this
be better than the IDE RAID 5 option? (2 SCSI drives of
36G each cost almost the same as (a little bit more than) the
three IDEs in the RAID5 configuration).
Regardless of SCSI or IDE you should also consider whether you want
RAID or multiple separate disk volumes.
In lots of databases you have a separate disk for data and a separate
disk for logfiles. When a lot of data is written to the logfiles, it
gives a huge performance boost to place those logfiles on a separate
disk, so that the database disk can do other tasks at the same time.
Being able to do that may well be the most important reason to choose
multiple IDE disks over a single SCSI disk.
Of course you might also opt for a single IDE disk for the logfiles
and an IDE RAID0+1 volume for the database files.
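In PostgreSQL 7.4 the write-ahead log lives in the pg_xlog directory,
and the usual way to put it on its own disk is a symlink. A minimal
sketch, assuming the default Red Hat data directory and a log disk
mounted at /mnt/logdisk (both paths are just examples):

    # stop the postmaster before touching the data directory
    pg_ctl -D /var/lib/pgsql/data stop
    # move the WAL directory to the dedicated disk
    mv /var/lib/pgsql/data/pg_xlog /mnt/logdisk/pg_xlog
    # leave a symlink behind so PostgreSQL finds it again
    ln -s /mnt/logdisk/pg_xlog /var/lib/pgsql/data/pg_xlog
    pg_ctl -D /var/lib/pgsql/data start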
For smaller servers there will be lots of situations where you can
configure a server with IDE that will perform just as well as SCSI
for a lower price, or configure a server that will perform better
than SCSI for the same price.
But when the database gets more demanding, there will come a point
where a single SCSI disk, or a few IDE disks in a RAID volume, can't
deliver the performance your database wants.
At that time you have to consider buying SCSI RAID controllers.
So, as you see, SCSI still has its place in servers, but it's not
that easy anymore to determine when it is the best solution.
I hope this will give you some more ideas on how to configure your
server.
Also you might want to check some reviews:
http://www.tech-report.com/reviews/2003q2/10k-comparo/index.x?pg=1
http://www.tech-report.com/reviews/2003q4/cheetah-15k/index.x?pg=1
http://www6.tomshardware.com/storage/20031114/index.html
And there are lots more that are interesting.
Marc