A few questions before assembling a Linux 7.5TB RAID 5 array

Yeechang Lee

I'm shortly going to be setting up a Linux software RAID 5 array using
16 500GB SATA drives with one HighPoint RocketRAID 2240 PCI-X
controller (i.e., the controller will be used for its 16 SATA ports,
not its "hardware" fakeraid). The array will store large, mostly
high-definition video recordings and serve them both locally and over
gigabit Ethernet (as I envision it, up to six or eight files will be
written to and/or read from simultaneously). The smallest files will
be 175MB-700MB, the largest will be 25GB+, and most files will be from
4GB to 12GB with a median of about 7.5GB. I plan on using JFS as the
filesystem, without LVM.

A few performance-related questions:

* What chunk size should I use? In previous RAID 5 arrays I've built
for similar purposes I've used 512K. For the setup I'm describing,
should I go bigger? Smaller?
* Should I stick with the default of giving 0.4% of the array over
to the JFS journal? If I can safely go smaller without a
rebuilding-performance penalty, I'd like to. Conversely, if a larger
journal is recommended, I can do that.
* I'm wondering whether I should have ordered two RocketRAID 2220s
(each with eight SATA ports) instead of the 2240. Would two cards,
each in a PCI-X slot, perform better? I'll be using the Supermicro
X7DVL-E
(<URL:http://www.supermicro.com/products/motherboard/Xeon1333/5000V/X7DVL-E.cfm>)
as the motherboard.
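For the chunk-size question, it can help to see how large a full stripe becomes on a 16-disk RAID 5 set: one chunk per disk, with one chunk's worth of each stripe holding parity. Writes that cover a whole stripe let md compute parity from the new data alone, while smaller writes generally require reading existing blocks first. A minimal Python sketch of the arithmetic, assuming all 16 drives are active members with no hot spare (the function name is illustrative):

```python
# Full-stripe width of an md RAID 5 set for a few candidate chunk sizes.
# Assumption: 16 active member disks, no hot spare, so 15 hold data per stripe.

def full_stripe_kib(chunk_kib: int, disks: int, parity_disks: int = 1) -> int:
    """Data carried by one full stripe: chunk size times the number of data disks."""
    data_disks = disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return chunk_kib * data_disks


if __name__ == "__main__":
    for chunk in (64, 128, 256, 512, 1024):
        width = full_stripe_kib(chunk, disks=16)
        print(f"{chunk:>5} KiB chunk -> {width:>6} KiB ({width / 1024:.2f} MiB) per full stripe")
```

The same function with `parity_disks=2` gives the RAID 6 figure.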
 

For a system that large, wouldn't you be better off with a 3Ware controller,
which is a real RAID controller, rather than the HighPoints, which aren't?
 
* I'm wondering whether I should have ordered two RocketRAID 2220s
(each with eight SATA ports) instead of the 2240. Would two cards,
each in a PCI-X slot, perform better? I'll be using the Supermicro
X7DVL-E

I wouldn't think so. Unless the PCI-X slots you intend to use are on
separate busses (not likely), the two cards will contend for the same amount
of bandwidth. Whether data for the drives gets funnelled through one slot
or two shouldn't make a difference.

With PCI Express, each slot gets its own dedicated chunk of bandwidth to
the northbridge. The motherboard you're considering has a couple of PCI-E
slots (one with 8 lanes and another with 4 lanes). Since you were already
looking at HighPoint controllers, a couple of RocketRAID 2320s might've been
the better way to go (as long as you weren't planning on using those slots
for something else).
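Scott's point can be put in rough numbers. Assuming a 64-bit/133 MHz PCI-X bus (the board's actual PCI-X clock is not stated in the thread) and first-generation PCIe at 250 MB/s per lane per direction, a Python sketch of the theoretical peaks:

```python
# Rough theoretical peak bandwidth: a shared PCI-X bus vs. dedicated PCIe 1.x links.
# Assumptions: 64-bit/133 MHz PCI-X; first-generation PCIe (2.5 GT/s with 8b/10b
# encoding, i.e. 250 MB/s per lane per direction). Real throughput is lower.

PCIX_WIDTH_BYTES = 8          # 64-bit bus
PCIX_CLOCK_HZ = 133_333_333   # nominal 133 MHz
PCIE1_LANE_MB_S = 250         # per lane, per direction


def pcix_bus_mb_s() -> float:
    """Peak MB/s for the whole PCI-X bus; every card on that bus shares it."""
    return PCIX_WIDTH_BYTES * PCIX_CLOCK_HZ / 1e6


def pcie1_slot_mb_s(lanes: int) -> int:
    """Peak MB/s per direction for one PCIe 1.x slot; not shared with other slots."""
    return lanes * PCIE1_LANE_MB_S


if __name__ == "__main__":
    bus = pcix_bus_mb_s()
    print(f"PCI-X, one card:                  {bus:.0f} MB/s")
    # Two cards on the same bus, both busy, split the same peak.
    print(f"PCI-X, two busy cards (each):     {bus / 2:.0f} MB/s")
    x8, x4 = pcie1_slot_mb_s(8), pcie1_slot_mb_s(4)
    print(f"PCIe x8 + x4 slots (combined):    {x8 + x4} MB/s")
```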

Scott Alfter (remove the obvious to send mail)
http://alfter.us/
 
Yeechang said:
A few performance-related questions:

* What chunk size should I use? In previous RAID 5 arrays I've built
for similar purposes I've used 512K. For the setup I'm describing,
should I go bigger? Smaller?

It is best to try a number of different configurations and benchmark
each one to see how it works with your needs. For my needs I've mainly
used 64 KB stripes because they gave better performance than 128 KB or
larger. Make sure you match the file system chunk size to the RAID
stripe size too.
* Should I stick with the default of 0.4% of the array as given over
to the JFS journal? If I can safely go smaller without a
rebuilding-performance penalty, I'd like to. Conversely, if a larger
journal is recommended, I can do that.

I'd probably just keep it at the defaults.
* I'm wondering whether I should have ordered two RocketRAID 2220s
(each with eight SATA ports) instead of the 2240. Would two cards,
each in a PCI-X slot, perform better? I'll be using the Supermicro
X7DVL-E
(<URL:http://www.supermicro.com/products/motherboard/Xeon1333/5000V/X7DVL-E.cfm>)
as the motherboard.

As Scott mentioned, since it looks like this motherboard has both slots
sharing the PCI-X bus, it probably wouldn't help to separate them out
unless the architecture of the card has limitations. Even if they were
separate buses, though, I don't think it would help you to have two
cards, since the bandwidth of the bus exceeds the needs of the drives.
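Whether the bus really exceeds the drives' needs depends on each drive's sustained transfer rate, which nobody in the thread states. A back-of-the-envelope check in Python, with per-drive rates that are purely hypothetical and an assumed 64-bit/133 MHz PCI-X bus:

```python
# Compare aggregate sequential drive throughput with a shared bus's peak.
# Assumptions (hypothetical figures, not measurements): 16 drives, per-drive
# sustained rates of 40-80 MB/s, and a 64-bit/133 MHz PCI-X bus (~1067 MB/s peak).

def aggregate_mb_s(drives: int, per_drive_mb_s: float) -> float:
    """Total throughput if every drive streams at the same time."""
    return drives * per_drive_mb_s


def bus_headroom(bus_mb_s: float, drives: int, per_drive_mb_s: float) -> float:
    """Ratio of bus peak to drive demand; below 1.0 means the bus is the bottleneck."""
    return bus_mb_s / aggregate_mb_s(drives, per_drive_mb_s)


if __name__ == "__main__":
    BUS = 8 * 133_333_333 / 1e6
    for rate in (40, 60, 80):
        ratio = bus_headroom(BUS, 16, rate)
        verdict = "bus has headroom" if ratio >= 1 else "bus-limited"
        print(f"{rate} MB/s per drive: demand {aggregate_mb_s(16, rate):.0f} MB/s, "
              f"ratio {ratio:.2f} ({verdict})")
```

This only matters when every drive streams at once, as during a rebuild or a large sequential read across the whole array; six or eight video streams may need far less.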

For this many SATA drives I would hope that you are going with RAID 6 and
a hot spare.

Steve
 
Steve said:
It is best to try a number of different configurations and benchmark
each one to see how it works with your needs. For my needs I've mainly
used 64 KB stripes because they gave better performance than 128 KB or
larger.

I figured as much, but was hoping that someone else would say "Hey, in
my experience ___KB chunks are best for your situation, and I'd raise
the chunk size ___KB for every additional terabyte." I guess there's no
way around building and rebuilding the array by hand a few times,
unless the relative performance of the various chunk sizes is the same
while the array is still dirty and being built for the first time as it
is once the array is finished.
Make sure you match the file system chunk size to the RAID stripe
size too.

I don't think this is an issue with JFS; that is, mkfs.jfs doesn't
offer any such options in the first place.
For this many SATA drives I would hope that you are going with RAID 6
and a hot spare.

Undecided. While the recordings would be inconvenient to lose, it
would not be life-or-death. I suspect I'll end up doing RAID 6 but no
hot spare.
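The capacity side of the RAID 5 / RAID 6 / hot-spare decision is simple arithmetic: each parity disk and each hot spare removes one drive's worth of space. A small Python sketch for the sixteen 500GB drives, using decimal units and ignoring filesystem overhead (the function and table names are illustrative):

```python
# Usable capacity for a few layouts of sixteen 500 GB drives.
# Decimal GB/TB, as drive makers use; filesystem overhead is ignored.

def usable_gb(drives: int, size_gb: int, parity: int, spares: int = 0) -> int:
    """Capacity left for data after parity disks and hot spares are set aside."""
    data = drives - spares - parity
    if data < 1:
        raise ValueError("no data disks left")
    return data * size_gb


LAYOUTS = {
    "RAID 5": dict(parity=1),
    "RAID 5 + hot spare": dict(parity=1, spares=1),
    "RAID 6": dict(parity=2),
    "RAID 6 + hot spare": dict(parity=2, spares=1),
}

if __name__ == "__main__":
    for name, opts in LAYOUTS.items():
        print(f"{name:<20} {usable_gb(16, 500, **opts) / 1000:.1f} TB")
```

RAID 5 gives the 7.5TB in the thread title; RAID 6 without a spare gives 7.0TB, and RAID 6 with a spare 6.5TB.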

In my previous such array (see below) I went to the trouble of buying
an extra drive for cold swap which, naturally, hasn't ever been
needed. Given the enterprise-class Western Digital drives I'm using
this time I shouldn't have any trouble hunting down an exact spare or
two in three or five years' time; worst comes to worst I'd just buy a
750GB for whatever ridiculously-low price they sell for then and just
not use the extra space in the array.
 
I'm shortly going to be setting up a Linux software RAID 5 array using
16 500GB SATA drives with one HighPoint RocketRAID 2240 PCI-X
controller (i.e., the controller will be used for its 16 SATA ports,
not its "hardware" fakeraid).
[snip]

What kind of enclosure/cabinet do you use for this setup? Does it have
hot-swap drive bays?
 
I figured as much, but was hoping that someone else would say "Hey, in
my experience ___KB chunks are best for your situation, and I'd raise
the chunk size ___KB for every additional terabyte." I guess there's no
way around building and rebuilding the array by hand a few times,
unless the relative performance of the various chunk sizes is the same
while the array is still dirty and being built for the first time as it
is once the array is finished.
Sadly, no, since usage and typical file size play a large part in it,
and no two arrays are going to be used the same way.
 
Yeechang said:
I'm shortly going to be setting up a Linux software RAID 5 array using
16 500GB SATA drives with one HighPoint RocketRAID 2240 PCI-X
controller (i.e., the controller will be used for its 16 SATA ports,
not its "hardware" fakeraid).

How long are you expecting a rebuild to take in the event of a
disk failure? You may well be better off creating a bunch of smaller
five-disk RAID 5 arrays rather than one big one.
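A rough lower bound for Guy's question: an md rebuild has to write the whole replacement disk while reading every surviving member, all through the same controller. The Python sketch below assumes a hypothetical 60 MB/s sustained rate per drive and a 64-bit/133 MHz PCI-X bus (about 1067 MB/s); both are assumptions, not figures from the thread:

```python
# Rough lower bound on an md rebuild: the replacement disk is written end to end
# while every surviving member is read, all through the same controller.
# Assumptions (hypothetical): 60 MB/s sustained per drive, ~1067 MB/s shared bus.
# md also throttles resync via /proc/sys/dev/raid/speed_limit_min and
# speed_limit_max, and competing I/O slows it further, so real rebuilds take longer.

def rebuild_hours(disk_gb: float, members: int, disk_mb_s: float, bus_mb_s: float) -> float:
    """Hours to rebuild one disk if limited only by disk speed or its share of the bus."""
    # All members transfer at once: (members - 1) reads plus 1 write share the bus.
    per_disk = min(disk_mb_s, bus_mb_s / members)
    return disk_gb * 1000 / per_disk / 3600


if __name__ == "__main__":
    BUS = 8 * 133_333_333 / 1e6
    print(f"16-disk RAID 5: {rebuild_hours(500, 16, 60, BUS):.1f} h")
    print(f" 5-disk RAID 5: {rebuild_hours(500, 5, 60, BUS):.1f} h")
```

Under these assumptions both arrays come out at about 2.3 hours to rebuild one disk, because the disk size sets the floor unless the shared bus saturates; what smaller arrays change is how many disks a second failure can hit, and how much data it takes with it.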

An aside - we've just taken delivery of an EMC CX300 storage system.
We've configured a RAID 5 array with fifteen 146GB Fibre Channel disks and
a hot spare. We've just pulled one of the disks from the array and
are watching the rebuild take place. I'll let you know how long it
takes!

Guy
-- --------------------------------------------------------------------
Guy Dawson I.T. Manager Crossflight Ltd
(e-mail address removed)
 
Yeechang said:
Steve Cousins wrote:

I don't think this is an issue with JFS; that is, mkfs.jfs doesn't
offer any such options in the first place.

OK. I've never used JFS. XFS has worked really well for us. One nice
thing when testing different configurations is that the file system is
created very quickly. mkfs.xfs can also figure out the chunk size
automatically if you use Linux software RAID. If you do go with RAID 6
and a hot spare, though, make sure you use a very recent version of the
xfs tools, because I found a bug where it didn't use the correct chunk
size. The hot spare was throwing it off. They fixed it for me, and I
believe the fix is in the latest version.
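If automatic detection ever gets the geometry wrong, mkfs.xfs also accepts it explicitly through its `-d su=...,sw=...` options: `su` is the stripe unit (the md chunk size) and `sw` is the number of data-bearing disks. Hot spares hold no data, so they must not be counted. A Python sketch that computes those values, assuming the spare is excluded from the count of RAID devices (the function name is illustrative):

```python
# Stripe geometry for mkfs.xfs on an md array: su = md chunk size,
# sw = number of data-bearing disks. Hot spares hold no data and are
# not RAID devices, so they are excluded from raid_devices here.

def xfs_stripe_opts(chunk_kib: int, raid_devices: int, level: int) -> str:
    """Return a '-d su=...,sw=...' option string for mkfs.xfs."""
    parity = {5: 1, 6: 2}.get(level)
    if parity is None:
        raise ValueError("only RAID 5 and RAID 6 are handled here")
    data_disks = raid_devices - parity
    if data_disks < 1:
        raise ValueError("too few devices for this RAID level")
    return f"-d su={chunk_kib}k,sw={data_disks}"


if __name__ == "__main__":
    # 16 drives as RAID 6 with one hot spare: 15 active devices, 13 carry data.
    print(xfs_stripe_opts(64, raid_devices=15, level=6))
```

As noted elsewhere in the thread, mkfs.jfs offers no equivalent options, so this applies only if XFS is chosen.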

Another thing that I ran into is that if you ever want to run xfs_check
on a volume this big, it takes a lot of memory and/or swap space. On a
5 TB RAID array it kept crashing. I have 3 GB of RAM on that machine and
it wasn't enough. I ended up adding a 20 GB swap file to the 3 GB swap
partition, and that allowed xfs_check to work. I don't know if JFS has
the same memory needs, but it is worth checking before you need to run
it for real.

Good luck and Happy Holidays,

Steve
 
Steve said:
OK. I've never used JFS. XFS has worked really well for us. One nice
thing when testing different configurations is that the file system is
created very quickly.

mkfs.jfs works very quickly as well. What takes a long time, and is
of course filesystem-independent, is the RAID-creation process.
Benchmarking multiple chunk sizes is going to be extremely
time-consuming, alas.
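One way to cut the waiting when benchmarking chunk sizes is mdadm's `--assume-clean` option, which skips the initial resync. Parity is then not valid, so this is only acceptable for throwaway test arrays, but it also means the benchmark doesn't compete with a background resync. A Python sketch that only prints a test plan (it runs nothing); the member device names `/dev/sdb` through `/dev/sdq` and `/dev/md0` are hypothetical:

```python
import shlex

# Hypothetical member devices /dev/sdb ... /dev/sdq (16 drives); adjust to the real system.
MEMBERS = [f"/dev/sd{c}" for c in "bcdefghijklmnopq"]


def create_cmd(chunk_kib: int, md: str = "/dev/md0") -> str:
    """mdadm command line for a throwaway RAID 5 benchmark array with the given chunk size."""
    args = [
        "mdadm", "--create", md,
        "--level=5", f"--raid-devices={len(MEMBERS)}",
        f"--chunk={chunk_kib}",  # chunk size in KiB
        # Skip the initial resync so each test array is usable immediately.
        # Parity is NOT valid afterwards: never keep real data on such an array.
        "--assume-clean",
        *MEMBERS,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    for chunk in (64, 128, 256, 512, 1024):
        # mdadm may ask for confirmation because the members still carry
        # the previous test array's superblock.
        print(create_cmd(chunk))
        print("mkfs.jfs -q /dev/md0   # then run the benchmark")
        print("mdadm --stop /dev/md0")
```

The final array for real data should still be created normally, letting the initial resync run.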
Another thing that I ran into is that if you ever want to run
xfs_check on a volume this big, it takes a lot of memory and/or swap
space.

I appreciate the suggestion. The box will only have 2GB of RAM; it
doesn't need any more for my purposes, but I'll be sure to give it
lots of swap.
 
Guy said:
How long are you expecting a rebuild to take in the event of a
disk failure? You may well be better off creating a bunch of smaller
five-disk RAID 5 arrays rather than one big one.

An aside - we've just taken delivery of an EMC CX300 storage system.
We've configured a RAID 5 array with fifteen 146GB Fibre Channel disks and
a hot spare. We've just pulled one of the disks from the array and
are watching the rebuild take place. I'll let you know how long it
takes!

Well, the data was in long ago but then I went on holiday.

After pulling one 146GB disk from the 14-disk RAID 5 array, it took the
CX300 35 mins to bring the hot spare into the array.

When I replaced the pulled drive the CX300 took 10 mins to rebuild the
array so that the hot spare was spare again.

Guy
 
Guy said:
Well, the data was in long ago but then I went on holiday.

After pulling one 146GB disk from the 14-disk RAID 5 array, it took the
CX300 35 mins to bring the hot spare into the array.

When I replaced the pulled drive the CX300 took 10 mins to rebuild the
array so that the hot spare was spare again.

35 minutes sounds way too short to me. We have Clariions with 5-disk
RAID groups of 146GB drives and they take longer than that. Clariion
arrays do RAID rebuilds based on LUNs, so, for example, if you only had a
100GB LUN bound in that RAID group, that's all that would be rebuilt. The array
knows not to bother to rebuild dead space where no LUNs are bound. If
you had bound LUNs to fill that whole 14-disk RAID 5 array (~2TB) I
suspect your rebuild would take considerably longer.
 
Jon said:
35 minutes sounds way too short to me. We have Clariions with 5-disk
RAID groups of 146GB drives and they take longer than that. Clariion
arrays do RAID rebuilds based on LUNs, so, for example, if you only had a
100GB LUN bound in that RAID group, that's all that would be rebuilt. The array
knows not to bother to rebuild dead space where no LUNs are bound. If
you had bound LUNs to fill that whole 14-disk RAID 5 array (~2TB) I
suspect your rebuild would take considerably longer.

Ah. That does indeed change things. We had a 10GB LUN and a 250GB LUN
on the 14 disk array at the time of the test.

Guy
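Jon's explanation can be checked against Guy's figures. Assuming the 14-disk RAID 5 group has 13 disks' worth of data capacity and that the 10GB and 250GB LUNs are spread evenly across its members, a Python sketch of the implied numbers:

```python
# Why a LUN-aware rebuild can finish quickly: only bound LUN space is rebuilt.
# Assumptions: the 14-disk RAID 5 group has 13 data disks' worth of capacity,
# and the 260 GB of bound LUNs (10 GB + 250 GB) is spread evenly over the members.

def rebuilt_gb_per_disk(bound_lun_gb: float, data_disks: int) -> float:
    """Portion of each member disk that actually holds bound LUN data."""
    return bound_lun_gb / data_disks


def implied_rate_mb_s(gb: float, minutes: float) -> float:
    """Average rate needed to rebuild `gb` gigabytes in `minutes` minutes."""
    return gb * 1000 / (minutes * 60)


if __name__ == "__main__":
    per_disk = rebuilt_gb_per_disk(10 + 250, 13)
    rate = implied_rate_mb_s(per_disk, 35)
    full_minutes = 146 / per_disk * 35  # whole 146 GB disk at the same rate
    print(f"rebuilt per disk: {per_disk:.0f} GB, implied rate: {rate:.1f} MB/s")
    print(f"whole 146 GB disk at that rate: about {full_minutes / 60:.1f} hours")
```

On those assumptions only about 20GB per disk was rebuilt, at roughly 9.5 MB/s; a fully bound group rebuilt at the same rate would take over four hours rather than 35 minutes, which fits Jon's expectation.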
 