RAID 1 can read as fast as RAID 0


Peter Olcott

http://en.wikipedia.org/wiki/RAID_1#RAID_1
It is possible to make a hard-drive subsystem read as
quickly as a RAID 0 and yet have the much higher reliability
of RAID 1.
The above link mentions this. All that is needed is to
divide up the data access across all of the drives, reading
a portion of the file from each drive. All of the data is
redundantly stored on every drive.
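
To make the split-read idea concrete, here is a minimal sketch
(hypothetical code, not from the article; the file paths, the
four-drive count, and the single shared buffer are all placeholders)
of reading one large file by pulling a different contiguous slice
from each of several identical mirror copies in parallel:

// Hypothetical sketch: read one large file by fetching a different
// contiguous slice from each of N identical mirror copies in parallel.
// Paths, drive count, and buffer handling are made up for illustration.
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

int main() {
    // One path per drive; every drive is assumed to hold the same file.
    std::vector<std::string> copies = { "D:/data.bin", "E:/data.bin",
                                        "F:/data.bin", "G:/data.bin" };

    std::ifstream probe(copies[0], std::ios::binary | std::ios::ate);
    if (!probe) { std::cerr << "cannot open " << copies[0] << "\n"; return 1; }
    const std::uint64_t size = static_cast<std::uint64_t>(probe.tellg());

    std::vector<char> buffer(size);                  // whole file lands here
    const std::uint64_t n = copies.size();
    const std::uint64_t slice = (size + n - 1) / n;  // bytes handled per drive

    std::vector<std::thread> workers;
    for (std::uint64_t i = 0; i < n; ++i) {
        workers.emplace_back([&buffer, &copies, i, size, slice] {
            const std::uint64_t off = i * slice;
            if (off >= size) return;                 // nothing left for this drive
            const std::uint64_t len = std::min(slice, size - off);
            std::ifstream in(copies[i], std::ios::binary);
            in.seekg(static_cast<std::streamoff>(off));
            in.read(buffer.data() + off, static_cast<std::streamsize>(len));
        });
    }
    for (std::thread& w : workers) w.join();

    std::cout << "read " << size << " bytes using " << n << " mirrors\n";
    return 0;
}

Whether this actually scales to many drives depends on the controller,
the bus, and how well the OS overlaps the per-drive reads; it is only
meant to illustrate the split-seek idea, not to serve as a benchmark.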

The only case that I am looking at (because of the specific
needs of my application) is maximizing the sustained throughput
for reading files larger than 500 MB. I saw an Adaptec RAID
controller tonight that could handle 12 drives
simultaneously at their full speed. If there were a total of
24 drives on the system, then only two cards would be needed
to get my required 2400 MB per second sustained transfer
rate. These cards use PCI Express x8.
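
(A quick back-of-the-envelope check, using the roughly 100 MB per
second per-drive figure cited later in this thread:

  24 drives x 100 MB/s = 2400 MB/s total
  12 drives per card   = 1200 MB/s per card

A PCI Express 1.x x8 slot carries roughly 2000 MB/s in each direction
in theory, so 1200 MB/s per card is at least plausible on paper;
whether a particular controller sustains it is a separate question.)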

Assuming that these cards work as the Wikipedia article above
indicates, it would seem that the last step is to find the best
way to combine the output from these two controller cards. I
would guess that the typical recommendation would be to get
another controller card to create a RAID 0 from these two
RAID 1 subsystems.
 
Umm, what??

AFAIK, you can't just toss a third card in and have it do a
RAID 0 of two existing RAID 1 arrays.  The closest thing to your
goal would probably be RAID 1+0 rather than 0+1.  The cards you're
looking at can probably do one if not both, though I have no
idea whether they can build one array with either of these across
the drives on two separate cards (I doubt many, if any, can).
In theory, though, you might be able to do this at a software
level in the OS: do the hardware RAID 0 across each controller's
drives, then mirror the two arrays with the OS.

I doubt that you really need the incredibly high throughput
you're suggesting.  Sure, that would be nice, and who wouldn't
like more performance, but somehow the rest of the world
survives with less.


What I had in mind was some combination of RAID 1 and RAID 0
overlaying the base RAID 1 configuration. A key question here is
whether the RAID 1 controller allows for much faster reads (as the
link above indicates). If it does, then a software RAID scheme
based on simply reading half of the file from each of the two RAID 1
arrays may be the best approach, since the controller cards themselves
cost $700 each.

I need very fast read-only performance to create a virtual
memory system for my deterministic-finite-automaton-based technology.
What I really need is 1.0 TB of RAM; the virtual memory system is the
next best thing. It should be able to read at close to RAM speed:
2400 MB per second.
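
For the virtual-memory-style access described above, one option is to
let the operating system do the paging by memory-mapping the data
file. A minimal sketch, assuming Linux or another Unix-like system and
a made-up file path (Windows has the same idea via
CreateFileMapping/MapViewOfFile):

// Hypothetical sketch: expose a large read-only data file as memory and
// let the OS page it in on demand, so the disk array acts as the backing
// store. The path is a placeholder.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <iostream>

int main() {
    const char* path = "/data/dfa_tables.bin";       // placeholder path
    int fd = open(path, O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { std::perror("fstat"); return 1; }
    const size_t len = static_cast<size_t>(st.st_size);

    void* base = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { std::perror("mmap"); return 1; }

    // Hint that access will be sequential so the kernel reads ahead.
    madvise(base, len, MADV_SEQUENTIAL);

    const std::uint8_t* table = static_cast<const std::uint8_t*>(base);
    std::cout << "first byte: " << static_cast<int>(table[0]) << "\n";

    munmap(base, len);
    close(fd);
    return 0;
}

Whether page faults over the array ever approach the 2400 MB per
second target depends on the kernel's readahead and on the array
itself, so this only covers the software half of the problem.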
 
PeteOlcott said:
What I had in mind was some combination of RAID 1 and RAID 0
overlaying the base RAID 1 configuration. A key question here is
whether the RAID 1 controller allows for much faster reads (as the
link above indicates). If it does, then a software RAID scheme
based on simply reading half of the file from each of the two RAID 1
arrays may be the best approach, since the controller cards themselves
cost $700 each.

I need very fast read-only performance to create a virtual
memory system for my deterministic-finite-automaton-based technology.
What I really need is 1.0 TB of RAM; the virtual memory system is the
next best thing. It should be able to read at close to RAM speed:
2400 MB per second.

You might as well read at RAM speeds, since figuring out the actual
limitation on RAID controllers is going to be difficult. This
server product can hold 32 FBDIMMs, so is capable of 256GB total
at the moment.

http://download.intel.com/support/motherboards/server/s7000fc4ur/sb/s7000fc4ur_tps1_0.pdf

http://www.intel.com/Assets/Image/prodlarge/s7000fc4ur_lg.jpg

The memory fits on "riser cards", so four cards plug into the
two slots on either side of the motherboard. Each riser card
holds 8 FBDIMMs, in a confined space (presumably for forced
cooling). That may limit the usage of oversized FBDIMMs as
an expansion opportunity. So you'd need lower profile
modules to do the build with.

Some nicer pictures here.

http://www.intel.com/Assets/PDF/general/4-Processor-S7000FC4UR-Config_Guide.pdf

16 of these kits, at $3550 each, would give a total of 256GB memory.
With accessories, you can do it for less than $100K, easily :-)

http://www.valueram.com/datasheets/KVR667D2D4F5K2_16G.pdf
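
(For the memory alone, 16 kits x $3,550 = $56,800, before the
chassis, processors, and the rest of the system.)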

It is possible that a motherboard with the slots right on the board
could hold more RAM if taller FBDIMMs were used, but the price
of such modules might also be a consideration. There are some
16GB modules on the horizon (Elpida and one other company).
I'm not sure about the allowed module height for the Intel
riser cards.

http://www.elpida.com/en/news/2008/08-05.html

To make this question more fun, it might be nice to know what
your total budget is for a solution. It could be that virtually
any solution is out of your budget range.

Paul
 
Somewhere on teh intarweb "Paul" typed:
You might as well read at RAM speeds, since figuring out the actual
limitation on RAID controllers is going to be difficult. This
server product can hold 32 FBDIMMs, so is capable of 256GB total
at the moment.

http://download.intel.com/support/motherboards/server/s7000fc4ur/sb/s7000fc4ur_tps1_0.pdf

http://www.intel.com/Assets/Image/prodlarge/s7000fc4ur_lg.jpg

The memory fits on "riser cards", so four cards plug into the
two slots on either side of the motherboard. Each riser card
holds 8 FBDIMMs, in a confined space (presumably for forced
cooling). That may limit the usage of oversized FBDIMMs as
an expansion opportunity. So you'd need lower profile
modules to do the build with.

Some nicer pictures here.

http://www.intel.com/Assets/PDF/general/4-Processor-S7000FC4UR-Config_Guide.pdf

Geez, thanks Paul! Until I followed those links I was quite happy with my
computer!
--
Shaun.

DISCLAIMER: If you find a posting or message from me
offensive, inappropriate, or disruptive, please ignore it.
If you don't know how to ignore a posting, complain to
me and I will be only too happy to demonstrate... ;-)
 
~misfit~ said:
Geez, thanks Paul! Until I followed those links I was quite happy with my
computer!

One of the side benefits is that with that box in your room,
you'll never be cold in the winter.

Paul
 
Paul said:
You might as well read at RAM speeds, since figuring out the actual
limitation on RAID controllers is going to be difficult. This
server product can hold 32 FBDIMMs, so is capable of 256GB total
at the moment.

http://download.intel.com/support/motherboards/server/s7000fc4ur/sb/s7000fc4ur_tps1_0.pdf

http://www.intel.com/Assets/Image/prodlarge/s7000fc4ur_lg.jpg

The memory fits on "riser cards", so four cards plug into the
two slots on either side of the motherboard. Each riser card
holds 8 FBDIMMs, in a confined space (presumably for forced
cooling). That may limit the usage of oversized FBDIMMs as
an expansion opportunity. So you'd need lower profile
modules to do the build with.

Some nicer pictures here.

http://www.intel.com/Assets/PDF/general/4-Processor-S7000FC4UR-Config_Guide.pdf

16 of these kits, at $3550 each, would give a total of 256GB memory.
With accessories, you can do it for less than $100K, easily :-)

http://www.valueram.com/datasheets/KVR667D2D4F5K2_16G.pdf

It is possible that a motherboard with the slots right on the board
could hold more RAM if taller FBDIMMs were used, but the price
of such modules might also be a consideration. There are some
16GB modules on the horizon (Elpida and one other company).
I'm not sure about the allowed module height for the Intel
riser cards.

http://www.elpida.com/en/news/2008/08-05.html

To make this question more fun, it might be nice to know what
your total budget is for a solution. It could be that virtually
any solution is out of your budget range.

Paul

It is not my budget. It is for a device that will do
automated regression testing. It looks like such a device
can be constructed with 1.0 TB of space and 2400 MB per
second read speed for less than $10,000. I am looking for a
minimum-cost solution with 1.0 TB of space and 2400 MB per
second sustained throughput (read speed). The speed of
writing can be as slow as that of a single disk drive.

There is no sense paying $100K to achieve what can be
accomplished for $10K.
http://www.nextlevelhardware.com/storage/barracuda/
This drive will sustain 100 MB per second for about $140. A
$700 Adaptec SAS controller will sustain the maximum
throughput of twelve drives. If their RAID 1 does split
seeks to speed up reads, then I only need 24 drives and two
cards: (24 x $140) + (2 x $700) = $3,360 + $1,400 = $4,760.
 
Paul said:
PeteOlcott wrote:
.... snip ...


You might as well read at RAM speeds, since figuring out the
actual limitation on RAID controllers is going to be difficult.
This server product can hold 32 FBDIMMs, so is capable of 256GB
total at the moment.

All you need is a disk buffer of the same size as the disk
capacity. This also makes the caching software decisions extremely
simple.

Don't take this suggestion too seriously.
 
CBFalconer said:
All you need is a disk buffer of the same size as the disk
capacity. This also makes the caching software decisions
extremely simple.

Don't take this suggestion too seriously.

http://storageadvisors.adaptec.com/2007/03/20/bottlenecks-from-disk-to-backbone/

I already have a solution; all I need to do is find
a way to physically implement it.
http://www.nextlevelhardware.com/storage/barracuda/

All I have to do is find the best way to hook 24 of
these drives up to a single workstation. All of the data
will be mirrored across all of these drives, and every (>
100MB) read will be split up across all of the drives. I
can do the split reads and mirrored writes myself if I have
to, from drive letters C: through Z:
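
The "mirrored writes" half can be as simple as writing the same block
to every per-drive copy in turn, since the write speed only has to
match a single drive. A rough sketch, with made-up paths:

// Hypothetical sketch of mirrored writes: append the same block to every
// per-drive copy, one after another. Paths and block size are placeholders.
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

bool mirror_append(const std::vector<std::string>& copies,
                   const char* data, std::streamsize len) {
    for (const std::string& path : copies) {
        std::ofstream out(path, std::ios::binary | std::ios::app);
        if (!out || !out.write(data, len)) {
            std::cerr << "write failed on " << path << "\n";
            return false;   // a real system needs a resync/repair policy here
        }
    }
    return true;
}

int main() {
    const std::vector<std::string> copies = {
        "D:/mirror/data.bin", "E:/mirror/data.bin", "F:/mirror/data.bin" };
    const std::string block(1 << 20, 'x');            // 1 MB of dummy data
    return mirror_append(copies, block.data(),
                         static_cast<std::streamsize>(block.size())) ? 0 : 1;
}

The split reads would then pull different ranges of the same file from
different drive letters, along the lines sketched earlier in the thread.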
 
Peter said:
It is not my budget. It is for a device that will do
automated regression testing. It looks like such a device
can be constructed with 1.0 TB of space and 2400 MB per
second read speed for less than $10,000. I am looking for a
minimum-cost solution with 1.0 TB of space and 2400 MB per
second sustained throughput (read speed). The speed of
writing can be as slow as that of a single disk drive.

There is no sense paying $100K to achieve what can be
accomplished for $10K.
http://www.nextlevelhardware.com/storage/barracuda/
This drive will sustain 100 MB per second for about $140. A
$700 Adaptec SAS controller will sustain the maximum
throughput of twelve drives. If their RAID 1 does split
seeks to speed up reads, then I only need 24 drives and two
cards: (24 x $140) + (2 x $700) = $3,360 + $1,400 = $4,760.

In this plot, the 100MB/sec sustained rate is available for about
half the capacity of the disk, and it drops to close to 50MB/sec
near the end of the disk. Whatever scheme you use, you would not
want to be accessing the second half of the disk.

http://www.nextlevelhardware.com/storage/barracuda/hdtachoverlay.jpg

I think my main problems with your plan are:

1) Are you certain that the whole storage subsystem can operate
   as you describe? The RAID 1 reading different sectors (in a
   RAID 0-like way), and then being able to somehow combine the
   bandwidths of the two arrays. Usually, RAID 1 uses just two
   entities (arrays or individual disks) for redundancy. I would
   not expect to see 24 disks arranged in RAID 1 in a commercial
   offering. There is nothing stopping you from writing your
   own software.

2) Do you have a benchmark that shows the system operating as in
   (1) and achieving the results? If not, there is an element
   of risk to the project.

I went to the Adaptec site and had a look at their 5 series
controllers, and they have a single statement that a 1.2GB/sec
transfer rate can be achieved. Their white paper concentrates
on IOP rate (important for servers), but they don't give any
info for RAID 0 operation. Perhaps you could contact
Adaptec and ask them for a RAID 0 benchmark, with as many
disks as they can connect to the thing, just to see whether
it is capped at 1.2GB/sec or can deliver more.
RAID 0 is probably not something they get a lot of
questions about.

There is a report here of an older Areca card (PCI-X interface,
not PCI Express) getting 1200MB/sec from 16 drives, but without
any other details about it. The problem with PCI-X is finding
a server motherboard with multiple good PCI-X busses on it
(so you can combine the bandwidth of more than one controller).

http://www.avsforum.com/avs-vb/showthread.php?t=1045086&page=5

I guess my concern is the degree of risk of not making the
target of 2400MB/sec, due to the need to use more than one
controller and somehow combine them. Also, as the bandwidth
goes up, the issue of "zero copy" comes into play. If any
of the software is layered, and the software copies the data
from one part of memory to another, then that will diminish
the chances of getting a high effective transfer rate.

What I was trying to do is find a controller that doesn't
have an IOP on it, making each disk appear individually
to the OS, and then rely on an OS-level RAID 0 scheme to
achieve the bandwidth target. My thinking was that by removing
the IOP, all the actual data transfer is done via DMA, and
then the only issue is how much overhead is needed
to manage it. That should be pretty small if, say, a
single command to a disk could read a couple of megabytes per
command execution. (I think there is a limit to the
size of the command you can give to a drive at one time.)

But whether we're using your scheme or mine, in both
cases there is no proof they'll hit 2400MB/sec. We
can see that some of the essential elements are there, but
we can't know whether the whole thing holds together.

Paul
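
One cheap way to reduce that risk is to measure before buying: time a
large sequential read on a single drive, then on one controller's
array, and see how close the numbers come to the 100 MB/s and
1200 MB/s assumptions. A rough, hypothetical timing sketch (the file
path is a placeholder):

// Rough, hypothetical throughput check: time a large sequential read from
// a file (or raw device) and report MB/s. For an honest number, read a
// file much larger than RAM, or flush the OS file cache first.
#include <chrono>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    const char* path = (argc > 1) ? argv[1] : "testfile.bin";  // placeholder
    std::ifstream in(path, std::ios::binary);
    if (!in) { std::cerr << "cannot open " << path << "\n"; return 1; }

    std::vector<char> buf(8 << 20);                   // 8 MB per read call
    std::uint64_t total = 0;
    const auto start = std::chrono::steady_clock::now();
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
           in.gcount() > 0) {
        total += static_cast<std::uint64_t>(in.gcount());
        if (in.eof()) break;
    }
    const auto stop = std::chrono::steady_clock::now();
    const double secs = std::chrono::duration<double>(stop - start).count();
    std::cout << (total / 1.0e6) / secs << " MB/s over "
              << total / 1.0e6 << " MB\n";
    return 0;
}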
 
Paul said:
In this plot, the 100MB/sec sustained rate is available for about
half the capacity of the disk, and it drops to close to 50MB/sec
near the end of the disk. Whatever scheme you use, you would not
want to be accessing the second half of the disk.

http://www.nextlevelhardware.com/storage/barracuda/hdtachoverlay.jpg

I think my main problems with your plan are:

1) Are you certain that the whole storage subsystem can operate
   as you describe? The RAID 1 reading different sectors (in a
   RAID 0-like way), and then being able to somehow combine the
   bandwidths of the two arrays. Usually, RAID 1 uses just two
   entities (arrays or individual disks) for redundancy. I would
   not expect to see 24 disks arranged in RAID 1 in a commercial
   offering. There is nothing stopping you from writing your
   own software.

2) Do you have a benchmark that shows the system operating as in
   (1) and achieving the results? If not, there is an element
   of risk to the project.

I went to the Adaptec site and had a look at their 5 series
controllers, and they have a single statement that a 1.2GB/sec
transfer rate can be achieved. Their white paper concentrates
on IOP rate (important for servers), but they don't give any
info for RAID 0 operation. Perhaps you could contact
Adaptec and ask them for a RAID 0 benchmark, with as many
disks as they can connect to the thing, just to see whether
it is capped at 1.2GB/sec or can deliver more.
RAID 0 is probably not something they get a lot of
questions about.

There is a report here of an older Areca card (PCI-X interface,
not PCI Express) getting 1200MB/sec from 16 drives, but without
any other details about it. The problem with PCI-X is finding
a server motherboard with multiple good PCI-X busses on it
(so you can combine the bandwidth of more than one controller).

http://www.avsforum.com/avs-vb/showthread.php?t=1045086&page=5

I guess my concern is the degree of risk of not making the
target of 2400MB/sec, due to the need to use more than one
controller and somehow combine them. Also, as the bandwidth
goes up, the issue of "zero copy" comes into play. If any
of the software is layered, and the software copies the data
from one part of memory to another, then that will diminish
the chances of getting a high effective transfer rate.

What I was trying to do is find a controller that doesn't
have an IOP on it, making each disk appear individually
to the OS, and then rely on an OS-level RAID 0 scheme to
achieve the bandwidth target. My thinking was that by removing
the IOP, all the actual data transfer is done via DMA, and
then the only issue is how much overhead is needed
to manage it. That should be pretty small if, say, a
single command to a disk could read a couple of megabytes per
command execution. (I think there is a limit to the
size of the command you can give to a drive at one time.)

But whether we're using your scheme or mine, in both
cases there is no proof they'll hit 2400MB/sec. We
can see that some of the essential elements are there, but
we can't know whether the whole thing holds together.

Paul

The ideal case might be to connect 24 controllers to 24
drives in 24 slots, except there are never more than six
slots, and at least two of those always seem to be too slow.
If that were possible, making it into a 24-fold mirrored
RAID system with 24 times the read speed of a single drive
would be easy. How do we even solve this single aspect of
the problem?
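
(For what it's worth, slot count may matter less than per-slot
bandwidth. Using the thread's own numbers of roughly 100 MB/s per
drive and 12 drives per card:

  1 card:  12 x 100 MB/s = 1200 MB/s, within the ~2000 MB/s a
           PCI Express 1.x x8 slot can carry one way in theory
  2 cards: 24 x 100 MB/s = 2400 MB/s total

So on paper two fast x8 slots are enough; the open questions remain
whether one controller really sustains more than the quoted 1.2 GB/s
and whether the host can merge the two streams without extra memory
copies.)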
 
Somewhere on teh intarweb "Paul" typed:
One of the side benefits is that with that box in your room,
you'll never be cold in the winter.

LOL, for sure. You'd probably go deaf too.
--
Shaun.

DISCLAIMER: If you find a posting or message from me
offensive, inappropriate, or disruptive, please ignore it.
If you don't know how to ignore a posting, complain to
me and I will be only too happy to demonstrate... ;-)
 