building on your own a large data storage ...

  • Thread starter: lbrtchx

lbrtchx

Hi,
~
I need to store a really large number of texts and I (could) have a
number of ATA100 S.M.A.R.T.-compliant hard drives, which I would like
to use to somehow build a large and safe (RAID-5?) data store
~
Now I am definitely more of a software person (at least
occupationally) and this is what I have in mind:
~
* I will have to use standard (and commercially available (meaning
cheap ;-))) x86-based hardware and open source software
~
* AFAIK you could maximally use 4 hard drives in such boxes
~
* heat dissipation could become a problem with so many hard drives
~
* I need a reliable and stable power supply
~
Should I go for ATA or SATA drives, and why?
~
You could use FireWire and/or USB cards to plug in that many
hard drives. Wouldn't it be faster/better to use extra ATA PCI cards?
What else would it entail? How many such cards could Linux take?
~
People in the know use software based RAID. Could you give me links
to these kinds of discussions?
~
What would be my weak/hotspot points in my kind of design?
~
Any suggestions of the type of boxes/racks I should use?
~
Is this more or less feasible? What am I missing here? Any other
suggestions? or intelligent posts in which people have discussed these
issues before? I found two in which some people have said a few right
and some other questionable things:
~
comp.sys.ibm.pc.hardware.storage: "2 TB storage solution"
comp.arch.storage: "Homebuilt server (NAS/SAN) vs the prefab ones?
Peformance"
~
Do you know of any such "do-it-yourself" projects out there?
~
thanks
lbrtchx
 
Have a look at OpenFiler. It's a linux based iSCSI, NFS and SMB appliance.
Takes about 10 minutes to install and configure. I'm using it as iSCSI
shared storage (pseudo-SAN) between a couple of ESX Servers and it works
fine.
 
["Followup-To:" header set to comp.sys.ibm.pc.hardware.storage.]
I need to store a really large number of texts and I (could) have a
number of ATA100 S.M.A.R.T.-compliant hard drives, which I would like
to use to somehow build a large and safe (RAID-5?) data store
~
* I will have to use standard (and commercially available (meaning
cheap ;-))) x86-based hardware and open source software
~
* AFAIK you could maximally use 4 hard drives in such boxes

On a motherboard with 2 IDE ports, you cannot make a 4-disk
RAID-5 array because doing I/O on two devices on the same IDE
port gives poor performance.

You could make two RAID-1 arrays each having one disk on the
primary IDE port and one on the secondary IDE port. Performance
will still suck when you do I/O on both arrays at the same time
but when one array is idle, the other will work OK.

This is of course not as good as RAID-5 from a disk space/euro
POV.
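
To put rough numbers on that space trade-off, here is a minimal Python sketch. The 250 GB disk size is only an assumed example, and the helper names are made up for illustration; the formulas are the usual ones (RAID-5 loses one disk's worth to parity, two-way RAID-1 loses half).

```python
def raid5_usable(disks: int, size_gb: float) -> float:
    """Usable capacity of one RAID-5 array: one disk's worth goes to parity."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (disks - 1) * size_gb


def raid1_pairs_usable(disks: int, size_gb: float) -> float:
    """Usable capacity when the disks are grouped into two-way mirrors."""
    if disks < 2 or disks % 2:
        raise ValueError("mirrored pairs need an even number of disks")
    return (disks // 2) * size_gb


DISK_GB = 250  # assumed disk size, purely for illustration
RAW_GB = 4 * DISK_GB

r5 = raid5_usable(4, DISK_GB)
r1 = raid1_pairs_usable(4, DISK_GB)
print(f"4-disk RAID-5:    {r5:.0f} GB usable ({r5 / RAW_GB:.0%} of raw)")
print(f"2 x RAID-1 pairs: {r1:.0f} GB usable ({r1 / RAW_GB:.0%} of raw)")
```

With four equal disks, RAID-5 keeps three quarters of the raw space and the two mirrors keep half, whatever the disk size.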
Should I go for ATA or SATA drives, and why?

SATA is better because 1) it doesn't have the master/slave
issues of IDE, i.e. if you have 4 SATA ports on your
motherboard, you *can* do a 4-disk RAID-5 array and 2)
motherboards with 8 SATA ports are easy to find.
* heat dissipation could become a problem with so many hard drives

I would not want to do it without adequate ventilation.
* I need a reliable and stable power supply

Fortron FSP-400-60GLN works for me. We have had issues with
Antec.
People in the know use software based RAID. Could you give me links
to these kinds of discussions?

The archives of the linux-raid mailing list (the administration
tool is called mdadm).
What would be my weak/hotspot points in my kind of design?

For me, the time was spent on
- understanding mdadm,
- understanding the trade-offs (partitioning an array of disks vs.
making an array of partitions, using LVM or not, optimum
granularity) and
- hardware (how to fit 8 or more disks in a PC case with decent
ventilation).
Any suggestions of the type of boxes/racks I should use?

3ware make 3-disks-in-2-5.25"-spaces trays. They are expensive
and the fans they use die after about a year. When a fan goes
bad, the tray helpfully warns you about it by beeping loudly and
constantly. The fans are not the easiest to find (60 mm or some
such). Be prepared to hear a lot of beeping.

I made my own trays. It was a lot of work and they look ugly but
it was cheap and they do the job. The ventilation is superior to
commercial trays (120 mm fan that moves a lot of air quietly and
reliably).

One very important thing about RAID that too many people
overlook : don't make an N-disk array from N disks of the same
make and model bought the same day. Our sysadmin at work did and
both drives on a RAID-1 array failed within days of each
other...
 
In comp.sys.ibm.pc.hardware.storage Andre Majorel said:
["Followup-To:" header set to comp.sys.ibm.pc.hardware.storage.]
I need to store a really large number of texts and I (could) have a
number of ATA100 S.M.A.R.T.-compliant hard drives, which I would like
to use to somehow build a large and safe (RAID-5?) data store
~
* I will have to use standard (and commercially available (meaning
cheap ;-))) x86-based hardware and open source software
~
* AFAIK you could maximally use 4 hard drives in such boxes
On a motherboard with 2 IDE ports, you cannot make a 4-disk
RAID-5 array because doing I/O on two devices on the same IDE
port gives poor performance.

Well, you can, but expect no more than, say, 10MB/s.
You could make two RAID-1 arrays each having one disk on the
primary IDE port and one on the secondary IDE port. Performance
will still suck when you do I/O on both arrays at the same time
but when one array is idle, the other will work OK.
This is of course not as good as RAID-5 from a disk space/euro
POV.

I added a Promise IDE PCI (Ultra 100, I believe) controller and used
one disk per IDE channel. That works fine.
SATA is better because 1) it doesn't have the master/slave
issues of IDE, i.e. if you have 4 SATA ports on your
motherboard, you *can* do a 4-disk RAID-5 array and 2)
motherboards with 8 SATA ports are easy to find.

And the cables are better and you can get 4 port and 8 port
SATA controller cards.
I would not want to do it without adequate ventilation.

Airflow from the outside to each disk is needed, unless
you do a very careful cooling design.
Fortron FSP-400-60GLN works for me. We have had issues with
Antec.

Fortron, Antec, both not the best. I had a 500W Fortron die on me with
60% of the maximum load after a power outage, which 24 El-Cheapo PCs
on the same power-rail survived. I recommend Enermax. Very well
engineered and with good reserves.
The archives of the linux-raid mailing list (the administration
tool is called mdadm).

There is also a HOWTO, I believe. Anyway, Linux software RAID is very
reliable and relatively easy to administer (if you know what you are
doing). The HOWTO is a bit outdated, so you may also need the
mdadm man page, but it will basically tell you most things you need:

http://tldp.org/HOWTO/Software-RAID-HOWTO.html
For me, the time was spent on
- understanding mdadm,
- understanding the trade-offs (partitioning an array of disks vs.
making an array of partitions, using LVM or not, optimum
granularity) and

I strongly suggest using partitions of type 0xfd, because then the
kernel will auto-assemble the array on system start. For complete
disks you need some start-script or other, which also means you cannot
have the root-partition on the array. In addition, these start-scripts
are sometimes unreliable. I found that LVM just adds unnecessary
complexity.

Also, you may want to have different RAID-sets on your disks.
One thing I used for a long time was the following:

Disk 1: 10GB 1/2 RAID 1 for system, rest 1/4 RAID 5
Disk 2: 10GB 2/2 RAID 1 for system, rest 2/4 RAID 5
Disk 3: 9.9GB 1/2 RAID 1 for home, rest 3/4 RAID 5
Disk 4: 9.9GB 2/2 RAID 1 for home, rest 4/4 RAID 5

The RAID 5 is a shared data partition. I also put a
swap partition on disks 3 and 4 (100MB each). You
cannot do this when using full disks.
- hardware (how to fit 8 or more disks in a PC case with decent
ventilation).
3ware make 3-disks-in-2-5.25"-spaces trays. They are expensive
and the fans they use die after about a year. When a fan goes
bad, the tray helpfully warns you about it by beeping loudly and
constantly. The fans are not the easiest to find (60 mm or some
such). Be prepared to hear a lot of beeping.
I made my own trays. It was a lot of work and they look ugly but
it was cheap and they do the job. The ventilation is superior to
commercial trays (120 mm fan that moves a lot of air quietly and
reliably).

There is a 4-in-3 mounting with 120mm fan made by Coolermaster.
Quite cheap, about 22 Swiss francs, which is something like
15 USD/EUR. Ugly, but cools well and perfect for making disk-packs
that fit in standard PC cases. I have one of these in a server
running something like 3 years now and the fan is still fine.
It does fit in 3 standard 5 1/4" bays if you remove the plastic
side-rails. They are just screwed on. Mounting holes also match
any standard case. The fan should blow in outside air directly.

Here is a link (the coolermaster website is broken both
in Opera and Firefox....). Sorry, it is German:

http://www.pcp.ch/Cooler-Master-4-in-3-Device-Modul-1a12170083.htm

One very important thing about RAID that too many people
overlook : don't make an N-disk array from N disks of the same
make and model bought the same day. Our sysadmin at work did and
both drives on a RAID-1 array failed within days of each
other...

Hehe. For high-reliability applications that is very good advice.
You may even want a 3- or 4- way RAID 1 (which Linux can do)
for these with 3 or 4 different disks. For ordinary use, I
would say it is enough to have a cold spare ready and rebuild
the array immediately. You can also take the array down until
you have the spare. But don't continue to use it in degraded
state.

I also recommend running a full SMART selftest on the disks every 14
days or so and to have email or text-message alerting both on RAID and
SMART monitor events.

And RAID is no substitute for backup, of course. It just makes
the event when you need your backup less likely.

Arno
 
SATA is better because 1) it doesn't have the master/slave
issues of IDE, i.e. if you have 4 SATA ports on your
motherboard, you *can* do a 4-disk RAID-5 array and 2)
motherboards with 8 SATA ports are easy to find.

Mostly irrelevant: either what he has can do the RAID level
required, or he can buy what's needed. Also, it could be a
bad idea to rely on a RAID5 dependent on a particular
motherboard chipset, instead of an add-on card (which is far
less likely to fail).

It has very little to do with SATA vs PATA. Certainly
master/slave assignments are no issue at all, being able to
jumper a drive is fairly trivial in the grand scheme of
things.
 
Mostly irrelevant: either what he has can do the RAID level
required, or he can buy what's needed.

But at what price ? I haven't found any cheap 4-IDE-port cards.
Also, it could be a bad idea to rely on a RAID5 dependent on a
particular motherboard chipset, instead of an add-on card
(which is far less likely to fail).

Or not as easy to replace, anyway. But your warning applies to
hardware RAID-5 and we are discussing _software_ RAID-5. With
software RAID, whatever RAID-5 features the chipset may have
don't matter because we simply don't use them.
It has very little to do with SATA vs PATA. Certainly
master/slave assignments are no issue at all, being able to
jumper a drive is fairly trivial in the grand scheme of
things.

I was referring to another master/slave issue : that IDE devices
cannot share a port without fighting.

If you put 2 devices on the same IDE port and try to do I/O with
both, the total transfer rate is less than the transfer rate you
would get when using only one device at a time. Not a little
less ; reportedly so much less that it's unacceptable.
 
In comp.sys.ibm.pc.hardware.storage Andre Majorel
["Followup-To:" header set to comp.sys.ibm.pc.hardware.storage.]
Any suggestions of the type of boxes/racks I should use?
3ware make 3-disks-in-2-5.25"-spaces trays. They are expensive
and the fans they use die after about a year. When a fan goes
bad, the tray helpfully warns you about it by beeping loudly and
constantly. The fans are not the easiest to find (60 mm or some
such). Be prepared to hear a lot of beeping.
I made my own trays. It was a lot of work and they look ugly but
it was cheap and they do the job. The ventilation is superior to
commercial trays (120 mm fan that moves a lot of air quietly and
reliably).

There is a 4-in-3 mounting with 120mm fan made by Coolermaster.
Quite cheap, about 22 Swiss francs, which is something like
15 USD/EUR. Ugly, but cools well and perfect for making disk-packs
that fit in standard PC cases. I have one of these in a server
running something like 3 years now and the fan is still fine.
It does fit in 3 standard 5 1/4" bays if you remove the plastic
side-rails. They are just screwed on. Mounting holes also match
any standard case. The fan should blow in outside air directly.

Here is a link (the coolermaster website is broken both
in Opera and Firefox....). Sorry, it is German:

http://www.pcp.ch/Cooler-Master-4-in-3-Device-Modul-1a12170083.htm

It's the Cooler Master STB-3T4-E1 :

http://www1.coolermaster.com/index.php?url_place=product&p_serial=STB-3T4-E1

There's also the STB-3T4-E3-GP :

http://www1.coolermaster.com/index.php?url_place=product&p_serial=STB-3T4-E3-GP

Excellent. I paid that much for the fan alone. I wish I had
known about it one year ago. Well, maybe not.

My DIY ones hold 5 disks instead of 4. You can pop out the fan
and remove/add disks from the front. The disks are in vertical
position which will provide a little bit of natural convection
if the fan jams.

On the plus side, the STB-3T4-E3-GP is shock mounted. And it
looks like you can install it without flattening the tabs in the
PC case.

How is the STB-3T4-E1 laid out? Do you install the disks from
the back as with the STB-3T4-E3-GP? Are the disks secured, or
can they fall out the back if you tilt the PC backwards?
 
[snip]
But at what price ? I haven't found any cheap 4-IDE-port cards.

But there are cheap 2 port cards, so a motherboard with 3 pci slots could
have 8 drives, all as master, or 16 drives if you use master/slave.

A dedicated storage box wouldn't need the slots for any other cards.
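
The drive count works out as follows, as a minimal Python sketch. It assumes the motherboard itself contributes 2 IDE ports (as in the earlier post about boards with 2 IDE ports); the function name and parameters are made up for illustration.

```python
def max_ide_drives(onboard_ports: int, cards: int, ports_per_card: int,
                   use_slaves: bool) -> int:
    """Count attachable IDE drives: one per port as master, two with master/slave."""
    ports = onboard_ports + cards * ports_per_card
    return ports * (2 if use_slaves else 1)


# Assumed setup: 2 onboard IDE ports plus three 2-port PCI cards.
masters_only = max_ide_drives(2, 3, 2, use_slaves=False)
with_slaves = max_ide_drives(2, 3, 2, use_slaves=True)
print(f"all drives as master: {masters_only}")
print(f"master + slave:       {with_slaves}")
```

That gives the 8 and 16 quoted above. Whether sharing a port is worth it is a separate question discussed elsewhere in the thread.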
 
Previously Andre Majorel said:
In comp.sys.ibm.pc.hardware.storage Andre Majorel
["Followup-To:" header set to comp.sys.ibm.pc.hardware.storage.]

It's the Cooler Master STB-3T4-E1 :

There's also the STB-3T4-E3-GP :

Excellent. I paid that much for the fan alone. I wish I had
known about it one year ago. Well, maybe not.
My DIY ones hold 5 disks instead of 4. You can pop out the fan
and remove/add disks from the front. The disks are in vertical
position which will provide a little bit of natural convection
if the fan jams.

And building something yourself successfully is always
a source of satisfaction!
On the plus side, the STB-3T4-E3-GP is shock mounted. And it
looks like you can install it without flattening the tabs in the
PC case.

Only if you have three consecutive slots without them.
I did not have to flatten any, since it seems server cases
do not have the tabs anyways.
How is the STB-3T4-E1 laid out? Do you install the disks from
the back as with the STB-3T4-E3-GP? Are the disks secured, or
can they fall out the back if you tilt the PC backwards?

Actually both seem to be the same, i.e. disks are put in from the
backside and secured with 4 screws each. For this you need to remove
the whole units, remove the sides and then you have a 3.5" wide drive
cage in your hands with the fan at the front. The STB-3T4-E3-GP merely
seems to have a front-cover added.

Arno
 
Previously bealoid said:
But at what price ? I haven't found any cheap 4-IDE-port cards.
But there are cheap 2 port cards, so a motherboard with 3 pci slots could
have 8 drives, all as master, or 16 drives if you use master/slave.
A dedicated storage box wouldn't need the slots for any other cards.

You can still get a 4-port SATA controller for the price of a
2-port ATA controller and (at least here) SATA drives are not
more expensive. Plus the cabling is far superior, which becomes
an issue with several drives. Also note that SATA cables can be
up to 1m long, which may actually be needed.

Arno
 
Andre Majorel said:
But at what price ? I haven't found any cheap 4-IDE-port cards.


Or not as easy to replace, anyway. But your warning applies to
hardware RAID-5 and we are discussing _software_ RAID-5.
With software RAID,
whatever RAID-5 features the chipset may have don't matter
because we simply don't use them.

Then it better not have them if they can't be shut off.
I was referring to another master/slave issue : that IDE devices
cannot share a port without fighting.

They are not fighting. You simply can't send a command to both
the drives and then let them fight for who's first with the data.
That would allow the late drive to transfer from cache, without the
extra latency, doubling single-drive speed.
Instead you send a command to one drive, wait for it to finish and
then send a command to the other one, and wait for that to finish.
You only get single drive speed and have double the average latency
as opposed to a single drive. On the other hand you get twice the
transfer length to offset the double latency.
If you put 2 devices on the same IDE port and try to do I/O with both,
the total transfer rate is less than the transfer rate you would get

That is not supported by theory. *Total* transfer rate should be
the same: twice as much data in twice as much time.
Single (one) drive transfer rate though will obviously be cut in half,
but that's not what you said.
when using only one device at a time.

As if there is any other way.
Not a little less ; reportedly so much less that it's unacceptable.

Yes, if your transfer length is too small latency has a devastating
effect on the average transfer rate. The maximum transfer for a
single command (256kB) only takes ~3.5ms to transfer at 70MB/s.
Set against the total time of say 15ms that will slice your average
random read speed to 16MB/s, even for a single drive.

If you foolishly alternate drive requests on a one-command basis
then you will convert an initially sequential read into a random read.
Maybe that's the problem, though even then read ahead caching on
the drive should fix that.
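
The latency arithmetic above can be checked with a minimal Python sketch. It uses the figures from this post (256 kB per command, 70 MB/s media rate, about 15 ms total per random request) and assumes decimal units (1 MB = 1000 kB); the function names are made up for illustration.

```python
def transfer_time_ms(size_kb: float, rate_mb_s: float) -> float:
    """Pure transfer time in ms; kB divided by MB/s comes out in ms."""
    return size_kb / rate_mb_s


def effective_rate_mb_s(size_kb: float, total_ms: float) -> float:
    """Average rate when each request takes total_ms overall; kB/ms equals MB/s."""
    return size_kb / total_ms


SIZE_KB = 256        # maximum transfer for a single command, from the post
MEDIA_MB_S = 70      # sequential media rate, from the post
TOTAL_MS = 15        # assumed total time per random request, from the post

xfer = transfer_time_ms(SIZE_KB, MEDIA_MB_S)
rate = effective_rate_mb_s(SIZE_KB, TOTAL_MS)
print(f"transfer alone: {xfer:.2f} ms of {TOTAL_MS} ms per request")
print(f"effective random-read rate: {rate:.1f} MB/s vs {MEDIA_MB_S} MB/s sequential")
```

The transfer itself takes a bit under 4 ms, so most of each request is latency, and the effective rate lands at roughly 17 MB/s, in the same range as the 16 MB/s quoted above.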
 
Arno Wagner said:
In comp.sys.ibm.pc.hardware.storage Andre Majorel said:
["Followup-To:" header set to comp.sys.ibm.pc.hardware.storage.]
I need to store a really large number of texts and I (could) have a
number of ATA100 S.M.A.R.T.-compliant hard drives, which I would like
to use to somehow build a large and safe (RAID-5?) data store
~
* I will have to use standard (and commercially available (meaning
cheap ;-))) x86-based hardware and open source software
~
* AFAIK you could maximally use 4 hard drives in such boxes
On a motherboard with 2 IDE ports, you cannot make a 4-disk
RAID-5 array because doing I/O on two devices on the same IDE
port gives poor performance.

Well, you can, but expect no more than,
say, 10MB/s.

Utter nonsense, as always from the babblebot.
For writes there is no difference; for reads the commands are obviously
serialized, which results in just single-drive speed as opposed to 2x drive speed.
Combined with the 2 drives on the other channel you should still be able to
get a theoretical 1.5x single-drive speed. Which is a lot more than 10MB/s.
I added a Promise IDE PCI (Ultra 100, I believe) controller and used
one disk per IDE channel. That works fine.

[snip]
 
In comp.sys.ibm.pc.hardware.storage Andre Majorel


I strongly suggest using partitions of type 0xfd, because then the
kernel will auto-assemble the array on system start. For complete
disks you need some start-script or other, which also means you cannot
have the root-partition on the array. In addition, these start-scripts
sometimes are unreliable.

That's what I did too. Partition all the disks identically then
create array 1 from /dev/sd{a,b,c,d}1, array 2 from
/dev/sd{a,b,c,d}2, etc. The root and swap are on the first two
arrays, which are RAID-1 so as to eliminate the need for initrd.
The data is on RAID-5 arrays.
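
As a rough illustration of that arrays-of-partitions layout, here is a small Python sketch that only prints the mdadm commands such a setup might use; it runs nothing. The md device numbers, the single data array and the sda..sdd names are assumptions for illustration. `--create`, `--level` and `--raid-devices` are standard mdadm options, but check the man page for your version before running anything.

```python
def mdadm_create(md_number: int, level: int, partition: int,
                 disks: str = "abcd") -> str:
    """Build (not run) an mdadm command for one array of same-numbered partitions."""
    devices = [f"/dev/sd{d}{partition}" for d in disks]
    return (f"mdadm --create /dev/md{md_number} --level={level} "
            f"--raid-devices={len(devices)} " + " ".join(devices))


# Assumed layout: partitions 1 and 2 are RAID-1 (root, swap), partition 3 is RAID-5 data.
LAYOUT = [(1, 1), (2, 1), (3, 5)]  # (partition number, RAID level)

for partition, level in LAYOUT:
    print(mdadm_create(partition, level, partition))
```

Each array uses the same partition number on every disk, which is what makes the identical partitioning useful.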

Another reason for preferring arrays of partitions to partitions
of arrays is that, when mdadm detects an error anywhere in any
component device, the whole array is marked as degraded. It
takes a long time to resync a 750 GB array.

Also, when you add a disk to an array of disks, you are
dependent on mdadm's --grow feature. Whereas with arrays of
partitions, you have the option of copying the data from array 5
to array 6, stopping and re-creating array 5 and copying the
data from array 6 back to array 5. And so on until all you have
grown all the arrays. Primitive and slow but robust.
Hehe. For high-reliability applications that is very good advice.
You may even want a 3- or 4- way RAID 1 (which Linux can do)
for these with 3 or 4 different disks.

Definitely. People tend to assume they need identical disks.
They don't. I have a 4-disk RAID-5 array made of disks from 4
different manufacturers. There might be a cost in terms of
performance but when you get 125 MB/s on reads, you can't
complain too much. <g>

# time dd bs=10M count=1000 if=/dev/md105 of=/dev/null
1000+0 records in
1000+0 records out

real 1m18.976s
user 0m0.000s
sys 0m20.897s
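
For reference, the throughput implied by that dd run can be worked out with a minimal Python sketch. It assumes dd's `bs=10M` means 10 x 1024 x 1024 bytes (as GNU dd interprets it) and uses the "real" time from the output above; the function name is made up for illustration.

```python
def dd_throughput(block_bytes: int, count: int, seconds: float) -> tuple[float, float]:
    """Return (MiB/s, MB/s) for a dd run that moved block_bytes * count bytes."""
    total_bytes = block_bytes * count
    return total_bytes / seconds / 2**20, total_bytes / seconds / 10**6


BLOCK_BYTES = 10 * 2**20   # bs=10M, assuming binary megabytes
COUNT = 1000               # count=1000
REAL_SECONDS = 78.976      # "real 1m18.976s"

mib_s, mb_s = dd_throughput(BLOCK_BYTES, COUNT, REAL_SECONDS)
print(f"{mib_s:.1f} MiB/s ({mb_s:.1f} MB/s)")
```

That comes to about 127 MiB/s, or about 133 MB/s in decimal units, consistent with the roughly 125 MB/s mentioned above.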

Disk sizes are standardised nowadays. For Hitachi, Samsung,
Seagate and Western Digital, 200 GB = 24321 x 255 x 63 , 250 GB =
30401 x 255 x 63, 500 GB = 60801 x 255 x 63... Only Maxtor are
a little bit bigger. Partition them all to the figures above and
be done with it.
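
Those cylinder counts can be checked with a minimal Python sketch. It assumes 512-byte sectors and the 255-head, 63-sector geometry given above; the function name is made up for illustration.

```python
SECTOR_BYTES = 512  # assumed sector size


def chs_bytes(cylinders: int, heads: int = 255, sectors: int = 63) -> int:
    """Capacity in bytes of a disk described by a cylinders x heads x sectors geometry."""
    return cylinders * heads * sectors * SECTOR_BYTES


# Nominal size in GB -> cylinder count, as listed in the post.
SIZES = {200: 24321, 250: 30401, 500: 60801}

for nominal_gb, cylinders in SIZES.items():
    size_gb = chs_bytes(cylinders) / 10**9
    print(f"{nominal_gb} GB nominal: {cylinders} cylinders = {size_gb:.2f} GB")
```

Each figure lands just above its nominal decimal size, which is why partitioning to these cylinder counts works across manufacturers.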
For ordinary use, I would say it is enough to have a cold
spare ready and rebuild the array immediately. You can also
take the array down until you have the spare. But don't
continue to use it in degraded state.

The hapless sysadmin said he didn't get mdadm's alert email. No
idea whether that is a bug in mdadm or an error on his part.
 