Constructing a disk system with RAM read speed and RAID 1 reliability

  • Thread starter: Peter Olcott

Peter Olcott

I have an idea to construct a disk array that can provide
2400 MB / second read performance and provide the
reliability of RAID 1 mirroring. All that I need to know is
how to directly hook up 24 drives to a single workstation.

I need a system that can read 1.6 GB files directly into
memory in one second or less. What I need is a sort of
virtual memory system with a variable page size of at least
500 MB. I only need to read these files quickly; writing
them can be at single-disk write speed.

If I can connect 24 drives to a single workstation, each
drive can read a different 1/24th of the file into a single
memory buffer, at the matching 1/24th-of-the-file offset.
This would require only a single seek per drive. If each
drive provides 100 MB per second of sustained read
performance, then together the drives can read at about the
maximum speed that RAM can be written to.

The missing piece of this plan is knowing the best hardware
combination to use, and whether or not any existing hardware
combination will meet these requirements.
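
The read pattern described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a tested storage design: it assumes every drive holds a complete mirror copy of the file (one reading of "RAID 1 reliability"), that each copy is reachable as an ordinary file path, and the name read_from_mirrors is made up for illustration. One thread per drive seeks once to its own offset and reads its slice straight into the shared buffer.

```python
import threading


def read_from_mirrors(mirror_paths, file_size):
    """Read a file that is mirrored on every path, one slice per mirror.

    Assumption: each path holds a complete, identical copy of the file.
    """
    n = len(mirror_paths)
    buf = bytearray(file_size)           # the single destination buffer
    view = memoryview(buf)               # lets readinto() fill slices in place
    chunk = -(-file_size // n)           # ceiling division: bytes per drive
    errors = []

    def worker(index, path):
        start = index * chunk
        end = min(start + chunk, file_size)
        try:
            # buffering=0 returns a raw file object, so readinto() writes
            # directly into the slice of the shared buffer.
            with open(path, "rb", buffering=0) as f:
                f.seek(start)            # the one seek for this drive
                pos = start
                while pos < end:
                    got = f.readinto(view[pos:end])
                    if not got:
                        raise IOError(f"unexpected end of file in {path}")
                    pos += got
        except Exception as exc:         # hand failures back to the caller
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i, p))
               for i, p in enumerate(mirror_paths)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return buf
```

With 24 paths, one per drive, each thread would read roughly 1/24th of the file. Whether this comes anywhere near 100 MB per second per drive depends on the controllers, the OS, and the drives, which the sketch says nothing about.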
 
Have you considered solid-state disks?
 
Peter said:
I am guessing that my solution (at least for read only) will
beat their performance at a tiny fraction of their cost.
http://www.violin-memory.com/assets/techbrief_gen1.pdf

The best commodity device (in terms of pricing) is the Gigabyte
iRAM, which has a SATA interface.

http://www.anandtech.com/storage/showdoc.aspx?i=2480

It was available in several versions, but as far as I know,
the one that gets power (but not digital signals) from a PCI
slot, is the one most often "seen in the wild".

http://en.wikipedia.org/wiki/I-RAM

http://www.dailytech.com/article.aspx?newsid=7563

An experimenter on 2cpu.com tested the PCI-powered version on an Areca
card, and found that proper RAID cards (like an Areca) didn't like the
iRAM, because it doesn't emulate enough of a SATA disk.
The iRAM does work with things like the RAID interface on a
Southbridge chip. That limits the number of iRAMs that could
be used to six or so. The iRAM interface is SATA 150MB/sec,
and the actual transfer rate is lower than that.

(A sample thread. Areca 1160 has a status of POS or "piece of shit",
when used with the iRAM.)

http://forums.2cpu.com/archive/index.php/t-77526.html

Otherwise, you'd be getting RAM based storage, for the price
of DDR RAM. The last 1GB DDR I bought (good stuff), cost
$35 and you can get them for less than that.

This is another example of a product based on RAM. I think
the prices are without RAM installed, but I could be wrong.

http://www.hyperossystems.co.uk/07042003/hardware.htm#hyperosHDIIproduct

Dan answers the question, from Apr 2008.

http://www.dansdata.com/askdan00025.htm

Paul
 
I asked Intel if they have any boards that will meet my
requirements.
 
Peter said:
I asked Intel if they have any boards that will meet my
requirements.

Do you mean a motherboard like the Skulltrail ?

http://www.intel.com/products/desktop/motherboards/D5400XS/D5400XS-overview.htm

http://downloadcenter.intel.com/Product_Filter.aspx?ProductID=2864

You can also browse through here.

http://www.intel.com/products/motherboard/index.htm?iid=subhdr+prod_boards

*******
There is another possibility here, only this uses an Nvidia chipset.

MSI P7N Diamond. Four large PCI Express slots. It is hard to find a
block diagram. MSI claims 16,16,16,8 for lane wiring. The first two
x16 slots are done with the Nforce200 switch, leaving the 16,8 to be
done by the Southbridge, as I don't see any other chips for the
job. It is hard for me to believe there are 24 lanes on one chip,
which is why I'd prefer to find a block diagram.

http://www.newegg.com/Product/Product.aspx?Item=N82E16813130158

The basic premise behind the 780i is here. Even though the diagram
is labeled 780i for both chips, it is actually 780i and 570i.

http://www.anandtech.com/showdoc.aspx?i=3180&p=2

The 570 is listed as 16,8 here. So it does look like the MSI board
would give you at least x8 performance on each of four slots, but
in an ATX form factor 12"x9.6" motherboard.

http://www.nvidia.com/page/nforce5_specs_amd.html

More details on the board here. Check the CPU support chart, before
buying a processor.

http://global.msi.com.tw/index.php?func=proddesc&prod_no=1372&maincat_no=1

Paul
 
The key question is whether or not any of these alternatives
is feasible. It would seem that the key aspect of this key
question would be whether or not any of these boards can
provide enough simultaneous bandwidth from their expansion
slots. I really need at least 400 MB per second
simultaneously from each of six slots. That should
(hopefully) provide my required 1600 MB per second, even
from the slow part of the drive.
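
The numbers in this requirement can be checked with a little arithmetic. The sketch below only restates figures from the thread (24 drives, six slots, 100 MB per second per drive, a 1.6 GB file in one second); the function name read_budget and the even split of four drives per slot are assumptions made for illustration.

```python
def read_budget(drives=24, slots=6, file_mb=1600, deadline_s=1.0,
                drive_mb_s=100, slot_mb_s=400):
    """Return the bandwidth figures implied by the stated requirement."""
    required = file_mb / deadline_s
    return {
        "drives_per_slot": drives // slots,        # drives behind each slot
        "peak_total_mb_s": drives * drive_mb_s,    # all drives at full speed
        "slot_total_mb_s": slots * slot_mb_s,      # all slots at 400 MB/sec
        "required_mb_s": required,                 # 1.6 GB in one second
        "min_per_drive_mb_s": required / drives,   # slowest a drive may be
    }


for name, value in read_budget().items():
    print(f"{name}: {value:.1f}")
```

Six slots at 400 MB per second match the 2400 MB per second peak of 24 drives, and the 1600 MB per second target still holds if each drive slows to about 67 MB per second on its slow inner tracks.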
 
Peter said:
The key question is whether or not any of these alternatives
is feasible. It would seem that the key aspect of this key
question would be whether or not any of these boards can
provide enough simultaneous bandwidth from their expansion
slots. I really need at least 400 MB per second
simultaneously from each of six slots. That should
(hopefully) provide my required 1600 MB per second, even
from the slow part of the drive.

                              P7N Diamond

                                  +------------------------------- x1
                                  |  +---------------------------- x1
                                  |  |
                              +----------+  ???  +-----------+
12.8GB/sec  ___/ DDR2-800 ----|   780i   |-------| Nforce200 |---- x16  3.6GB/sec
Two sticks     \ DDR2-800 ----|          |       |  Switch   |---- x16  3.6GB/sec
                              +----------+       +-----------+
                                   |
                                   | Hypertransport
                                   | Likely 4GB/sec, a *guess*
                                   | Enough for x16 to spread around
                                   |
                              +-----------+
                              |   570i    |----------------------- x16  2GB/sec
                              |           |----------------------- x8   2GB/sec
                              |           |------- (PCI)
                              |           |------- (SATA)
                              |           |------- four x1, for onboard usage.
                              +-----------+

First of all, the output of Nforce200 is (2) x16 PCI Express revision 2.0,
which would be a total of 16GB/sec. The input to the Nforce200 cannot
sustain that. And in any case, your storage cards are going to be
running with revision 1.0 speeds, so when using storage cards, the
max bandwidth you could pull from the two slots, would be 8GB/sec total.

It could be that the input bus to the Nforce200 is 16 lanes at 4.5GT/sec.
A "normal" PCI Express revision 1.0 lane runs at 2.5GT/sec, which gives
4GB/sec for an x16 link. So there is 4GB/sec * (4.5/2.5), or 7.2GB/sec,
feeding into the Nforce200. That means you could have 3.6GB/sec on each
of the "x16" outputs. This is still plenty, with respect to your
requirement of 400MB/sec from each.
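
The link arithmetic above can be reproduced as a small calculation. This is a sketch under the assumption that all of these links use 8b/10b line coding (true for PCI Express revisions 1.x and 2.0); the 4.5GT/sec input rate is the guess from this post, not a confirmed figure, and the function name pcie_gb_per_s is made up for illustration.

```python
def pcie_gb_per_s(lanes, gt_per_s):
    """Payload bandwidth in GB/sec, one direction, with 8b/10b line coding."""
    # Each transfer moves one bit per lane; 10 bits on the wire carry 8 data bits.
    data_bits_per_s = lanes * gt_per_s * 1e9 * 8 / 10
    return data_bits_per_s / 8 / 1e9


rev1_x16 = pcie_gb_per_s(16, 2.5)      # one revision 1.0 x16 slot
rev2_x16 = pcie_gb_per_s(16, 5.0)      # one revision 2.0 x16 slot
guess_in = pcie_gb_per_s(16, 4.5)      # guessed Nforce200 input link
per_slot = guess_in / 2                # shared by the two x16 outputs
headroom = per_slot / 0.4              # versus the 400MB/sec requirement

print(f"rev 1.0 x16: {rev1_x16:.1f} GB/sec")
print(f"rev 2.0 x16: {rev2_x16:.1f} GB/sec")
print(f"guessed input: {guess_in:.1f} GB/sec, {per_slot:.1f} per slot")
print(f"headroom over 400MB/sec: {headroom:.0f}x")
```

Two revision 2.0 x16 slots give the 16GB/sec total quoted above, two revision 1.0 slots give 8GB/sec, and the guessed input link leaves each slot about nine times the 400MB/sec requirement.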

The Hypertransport leading to the 570i, could be a 4GB/sec one. This
is a guess based on the fact that the chipset is advertised as a
"3x16" platform. So the x8 and x16 likely share 4GB/sec of bandwidth,
making two solid x8 slots in practice. That would be 2GB/sec per slot.
Activities from other 570i interfaces, would cut into the bandwidth
slightly, such as a burst from the SATA ports. Maybe if you set up a
SATA four drive RAID0, a burst from that would provide the most
competition with the other slots. That still leaves enough bandwidth
to have more than 400MB/sec on the PCI Express slots.

So while there is some detail missing in the diagram, I'm not overly
concerned about the available bandwidth.

The memory supports up to DDR2-1200. You might be using DDR2-800
in there. That would be 6.4GB/sec per memory DIMM. Two DIMMs
operating in dual channel gives 12.8GB/sec, which is just enough
to match the 3x16 bandwidth. And memory does not actually
sustain that kind of bandwidth forever. But again, compared to your
total 1.6GB/sec requirement, there is likely more than enough
capacity in the memory subsystem. Your cards might use 25% of the
practical bandwidth.
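
The memory figures work out as follows. This sketch assumes a standard 64-bit (8-byte) memory channel; the 50% "practical" efficiency is purely a hypothetical value chosen to show how a 25% share could arise, not a measurement, and the function name ddr_gb_per_s is made up for illustration.

```python
def ddr_gb_per_s(mt_per_s, channels=1, bus_bytes=8):
    """Theoretical peak bandwidth in GB/sec for DDR memory."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


per_dimm = ddr_gb_per_s(800)               # one DDR2-800 channel
dual = ddr_gb_per_s(800, channels=2)       # two DIMMs in dual channel
requirement = 1.6                          # GB/sec needed by the disk reads

share_of_peak = requirement / dual
assumed_efficiency = 0.5                   # hypothetical, not measured
share_of_practical = requirement / (dual * assumed_efficiency)

print(f"per DIMM: {per_dimm:.1f} GB/sec, dual channel: {dual:.1f} GB/sec")
print(f"share of theoretical peak: {share_of_peak:.1%}")
print(f"share at assumed 50% efficiency: {share_of_practical:.1%}")
```

The 1.6GB/sec requirement is one eighth of the theoretical peak, so the 25% figure corresponds to sustaining about half of that peak in practice.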

If you can find a better diagram than this one, then that
would help fill in the details.

http://images.anandtech.com/reviews/chipsets/nvidia/nforce-780i/780i-block_lrg.png

Paul
 
So six slots could simultaneously provide at least 400 MB per second?
If the answer is yes, then the next question would be:
Can hard drive controller cards provide at least 400 MB per second:
100 MB per second each from simultaneously reading four different
drives?
 
PeteOlcott said:
So six slots could simultaneously provide at least 400 MB per second?
If the answer is yes, then the next question would be:
Can hard drive controller cards provide at least 400 MB per second:
100 MB per second each from simultaneously reading four different
drives?

The board has four worthwhile slots. The two PCI Express x1 slots aren't
going to be as capable as a x8 or x16 slot. (About 200MB/sec each.)

The answer about controllers, depends on what software is
added, between the controllers and the OS. One of my assumptions
was, that *perhaps* you could use the Tomshardware Windows RAID hack
to combine the bandwidth of 16 disks at 100MB/sec each. I selected
a non-RAID card in that case, so 16 separate disks are presented
to the OS, assuming that the Tomshardware RAID hack would allow
their bandwidth to be combined.

      (User sees combined bandwidth 1600MB/sec)
                          |
           Tomshardware_RAID_Hack_For_WinXP
           |         |         |         |
        -------   -------   -------   -------     Four separate cards
        | | | |   | | | |   | | | |   | | | |     Sixteen disks

If you use Areca cards, we know already from seeing manufacturer data,
that they can produce 800MB/sec limited by their IOP. But with the
Areca cards, you'd need another layer of software to combine
the bandwidth of two cards.

      (User sees combined bandwidth 1600MB/sec)
                          |
     (Need a way to RAID0 these two arrays ???)   <------ what product does this ?
                |                   |
                |                   |              (800MB/sec each)
        ---------------     ---------------        Two separate cards
        | | | | | | | |     | | | | | | | |        Sixteen disks (Velociraptor)

The Areca solution means you need fewer slots, or potentially
you could get more bandwidth etc., but it also means
identifying the software that allows the output of
the Areca disks to be combined. If you're writing
your own software, then you can do that part
yourself (assuming non-blocking I/O in Windows,
so two program threads could read to memory
buffers simultaneously). Alternately, the
software might be something commercial that
allows RAID0 combination of separate arrays.
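
If you do write that part yourself, the combining step might look like the sketch below. It is only an illustration under stated assumptions: each array appears to the program as an ordinary file path, the data was laid out in fixed-size stripes alternating between the arrays, and the names write_striped and read_striped are made up. It uses one blocking reader thread per array rather than true non-blocking (overlapped) I/O.

```python
import threading


def write_striped(data, volume_paths, stripe_size):
    """Lay data out RAID0-style: stripe i goes to volume i % n."""
    n = len(volume_paths)
    files = [open(p, "wb") for p in volume_paths]
    try:
        for i, start in enumerate(range(0, len(data), stripe_size)):
            files[i % n].write(data[start:start + stripe_size])
    finally:
        for f in files:
            f.close()


def read_striped(volume_paths, total_size, stripe_size):
    """Reassemble striped data into one buffer, one thread per volume."""
    n = len(volume_paths)
    buf = bytearray(total_size)
    view = memoryview(buf)
    errors = []

    def worker(vol_index, path):
        try:
            with open(path, "rb", buffering=0) as f:
                stripe = vol_index                     # first stripe on this volume
                while stripe * stripe_size < total_size:
                    start = stripe * stripe_size       # position in the result
                    end = min(start + stripe_size, total_size)
                    f.seek((stripe // n) * stripe_size)  # position on the volume
                    pos = start
                    while pos < end:
                        got = f.readinto(view[pos:end])
                        if not got:
                            raise IOError(f"unexpected end of file in {path}")
                        pos += got
                    stripe += n                        # next stripe on this volume
        except Exception as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i, p))
               for i, p in enumerate(volume_paths)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return buf
```

Each thread reads its own volume sequentially and writes into disjoint slices of the shared buffer, so no second copy is needed to merge the results. Whether two such threads keep two 800MB/sec arrays busy on real hardware is exactly what would need to be measured.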

In the second figure, maybe if the arrays appear
as volumes to the OS, you could use the Tomshardware
hack to combine them ? Perhaps you could test this
concept, using only four disks to start and a couple
of onboard controllers on an ordinary motherboard, to
see if the "striped" option here, would allow two arrays
to be combined.

http://www.tomshardware.com/reviews/windowsxp-make-raid-5-happen,925-3.html

         (User sees combined bandwidth)
                       |
   Tomshardware_RAID_Hack_For_WinXP ("striped")
              |                 |
              |                 |
             ---               ---          Two onboard controllers, RAID0 each
             | |               | |          Four disks

What you'd do for the experiment, is set up each array of
two disks individually. Use HDTach or HDTune to benchmark
each array. Then add in the Tomshardware hack, combining
the two arrays by using Windows to "stripe" the volumes.
Then run HDTach or HDTune on the resulting virtual array.
So you should be able to do a partial proof of concept
with simple ingredients. What cannot be known in
advance, is the degree to which it scales and everything
works to deliver more than 1000MB/sec when the
real hardware config is set up.
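
If HDTach or HDTune is not at hand, a rough sequential-read timing can be scripted. This is a crude sketch, not a substitute for those tools: it reads an ordinary file through the filesystem, so the OS file cache can inflate the result unless the test file is much larger than RAM or freshly written after a reboot. The function name sequential_read_mb_per_s is made up for illustration.

```python
import time


def sequential_read_mb_per_s(path, block_size=1 << 20):
    """Read a file front to back; return (bytes_read, MB per second)."""
    buf = bytearray(block_size)          # reused for every read, no reallocation
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            got = f.readinto(buf)
            if not got:                  # 0 means end of file
                break
            total += got
    elapsed = max(time.perf_counter() - start, 1e-9)   # avoid division by zero
    return total, total / 1e6 / elapsed
```

Run it once against a large file on each two-disk array, then against the same file on the combined "striped" volume, and compare the three rates.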

So whatever you do, this is still going to be an
expensive experiment -- unless you can find an article
where someone has tested a similar concept, you won't
know for sure about the scaling, or whether the thing
runs out of steam at such high bandwidths (CPU limit).
For example, any time software has to do memory-to-memory
copies of the data, performance suffers badly at these
bandwidths, so if any layer of the software stack does
that, it will limit what you see. The small experiment
with the four disks above may not be able to show you
that limitation.

That is why a single controller, capable of doing more
than 800MB/sec, is more attractive. With a single
controller, you're more likely to find a published
benchmark before buying equipment.

Paul
 
Peter said:
The key question is whether or not any of these alternatives
is feasible. It would seem that the key aspect of this key
question would be whether or not any of these boards can
provide enough simultaneous bandwidth from their expansion
slots. I really need at least 400 MB per second
simultaneously from each of six slots. That should
(hopefully) provide my required 1600 MB per second, even
from the slow part of the drive.

Please do not top-post. Your answer belongs after (or intermixed
with) the quoted material to which you reply, after snipping all
irrelevant material. See the following links:

<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/> (taming google)
<http://members.fortunecity.com/nnqweb/> (newusers)

F'ups set.
 