Fastest sustained throughput possible from a drive array???

  • Thread starter: Peter Olcott
Peter Olcott said:
http://www.wwpi.com/cables-a-connec...edented-flexibility-in-io-bandwidth-selection
It looks like this link indicates 1600 MB per second, yet it does not
specify that this is sustained throughput. Can someone please confirm
this?

This is not the speed of the drives in the setup; it's the speed of the bus.
There are no moving parts or seek times to consider, so throughput can be
sustained on the bus for as long as the hardware at each end can keep up.

Not sure where you will find hard disks (in any sort of array) that can
sustain those kinds of speeds, though.
 
GT said:
This is not the speed of the drives in the setup; it's the speed of the bus.
There are no moving parts or seek times to consider, so throughput can be
sustained on the bus for as long as the hardware at each end can keep up.

Not sure where you will find hard disks (in any sort of array) that can
sustain those kinds of speeds, though.

http://www.dulcesystems.com:80/html/pro_rx.html
This system will do 800 MB per second, which may be the limit
of current technology.
 
GT said:
This is not the speed of the drives in the setup; it's the speed of the bus.
There are no moving parts or seek times to consider, so throughput can be
sustained on the bus for as long as the hardware at each end can keep up.

Not sure where you will find hard disks (in any sort of array) that can
sustain those kinds of speeds, though.

http://www.violin-memory.com/assets/techbrief_gen1.pdf
Here is another option in case anyone cares: 1400 MB per second.
 
Peter said:
http://www.violin-memory.com/assets/techbrief_gen1.pdf
Here is another option in case anyone cares: 1400 MB per second.

The big win on an appliance like that is the seek time.

Seek is one of the irritating limits of something like
a 16 disk RAID array. Plenty of bandwidth, but relatively
long delays before the array is returning data. That SSD/Ramdisk
is one way to fix it, even if it "only" supports 504GB of RAM.

Paul
 
Paul said:
The big win on an appliance like that is the seek time.

Seek is one of the irritating limits of something like a 16 disk RAID
array. Plenty of bandwidth, but relatively long delays before the array
is returning data. That SSD/Ramdisk is one way to fix it, even if it
"only" supports 504GB of RAM.

Paul

I don't see why seek time cannot be amortized across
multiple disks.
 
Peter said:
I don't see why seek time cannot be amortized across
multiple disks.

If the seek time is 3 microseconds on that DRAM based SSD, and
11 milliseconds on a hard drive, it is pretty hard to make the
thousandfold difference go away. The SSD is just blindingly fast,
and perfect for things like databases. Too bad the purchase
price would be so high. They're priced for enterprise usage,
so the price is not even a reflection of the price of the
RAM used in them - 500GB of RAM would cost about $12500, and
the SSD would be priced at many times that number. It would be
a bargain at $50000, and depending on the patent situation, and
how many companies make them, it could be a lot more expensive.
Maybe closer to $100000. And that is why lowly hard drives
are so attractive :-)

Paul
 
If the seek time is 3 microseconds on that DRAM based SSD, and
11 milliseconds on a hard drive, it is pretty hard to make the
thousandfold difference go away. The SSD is just blindingly fast,
and perfect for things like databases. Too bad the purchase
price would be so high. They're priced for enterprise usage,
so the price is not even a reflection of the price of the
RAM used in them - 500GB of RAM would cost about $12500, and
the SSD would be priced at many times that number. It would be
a bargain at $50000, and depending on the patent situation, and
how many companies make them, it could be a lot more expensive.
Maybe closer to $100000. And that is why lowly hard drives
are so attractive :-)

Paul

http://www.nextlevelhardware.com/storage/barracuda/
Using these numbers for the Seagate Barracuda 7200.11:
(1) Access time 12.4 ms, or about 80 random accesses per second
(2) Sustained read rate 89.6 MB per second

Assuming that all of the data is contiguous (no file fragmentation),
why couldn't a 24-drive RAID 1 system provide this sustained read rate:
23/24 * 89.6 * 24 = 2060 MB per second?

The 23/24 factor is because each drive only has to spend 1/24 of its time
seeking, and the remaining time can be spent reading.
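A quick back-of-the-envelope check of that arithmetic in Python (a sketch;
the one-seek-per-drive model and the 24-drive count are assumptions taken
from this post, not measurements):

drives = 24                    # hypothetical RAID 1 set, one full copy per drive
seek_fraction = 1.0 / drives   # assume each drive spends 1/24 of its time seeking
sustained_read = 89.6          # MB/s, Barracuda 7200.11 figure from the review above

per_drive = (1 - seek_fraction) * sustained_read   # useful reading per drive
aggregate = per_drive * drives                     # each drive reads a disjoint 1/24
print(round(aggregate))        # ~2061 MB/s under these (optimistic) assumptions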
 
PeteOlcott said:
http://www.nextlevelhardware.com/storage/barracuda/
Using these numbers for the Seagate Barracuda 7200.11:
(1) Access time 12.4 ms, or about 80 random accesses per second
(2) Sustained read rate 89.6 MB per second

Assuming that all of the data is contiguous (no file fragmentation),
why couldn't a 24-drive RAID 1 system provide this sustained read rate:
23/24 * 89.6 * 24 = 2060 MB per second?

The 23/24 factor is because each drive only has to spend 1/24 of its time
seeking, and the remaining time can be spent reading.

I'm not arguing that you cannot sustain some large transfer rate
using hard drives (with the right access pattern). But once random
access enters the picture, and relatively small transfers are
involved, most of the time is dominated by head movement:
11 milliseconds moving the head, read 4KB of data, 11 milliseconds
moving the head, read another 4KB of data. Real-world applications
do some amount of head movement.
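To put numbers on that, the same arithmetic as a small sketch, assuming an
11 ms average seek and the 89.6 MB/sec media rate quoted earlier:

seek_s = 0.011             # 11 ms average seek (assumed)
block_mb = 4 / 1024.0      # one 4 KB transfer, expressed in MB
rate_mb_s = 89.6           # sustained media rate from the review above

time_per_io = seek_s + block_mb / rate_mb_s
effective = block_mb / time_per_io
print(round(effective, 2)) # ~0.35 MB/s -- the head movement dominates completely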

Take my machine as an example. If I use the Search command in Windows,
it might take 2 minutes to run and search all the files. The computer
is not reading a lot of data during that time, but there is a ton of
head movement. If I had that SSD, my search would complete in a couple
seconds. (The 2 minute figure, might be after a reboot, when the
OS file cache is empty.)

On RAID1 with two disks, in theory, on a read, both disks can return
data, and the controller can accept the data from the drive that
finishes first. That helps to reduce the average seek time, by a bit.
Whether that optimization is used on motherboard "soft RAID"
implementations, is unclear to me. That would be one way to reduce
the effects of the seek time, but only on read operations.

Paul
 
Paul said:
If the seek time is 3 microseconds on that DRAM based SSD, and
11 milliseconds on a hard drive, it is pretty hard to make the
thousandfold difference go away. The SSD is just blindingly fast,
and perfect for things like databases. Too bad the purchase
price would be so high. They're priced for enterprise usage,
so the price is not even a reflection of the price of the
RAM used in them - 500GB of RAM would cost about $12500, and
the SSD would be priced at many times that number. It would be
a bargain at $50000, and depending on the patent situation, and
how many companies make them, it could be a lot more expensive.
Maybe closer to $100000. And that is why lowly hard drives
are so attractive :-)

Paul

All that I would need is a PC that can handle 24 hard drives and run
Windows, and I would make my own RAID 1 system that would provide
2000 MB per second sustained read performance. I would seek only once
per drive: each drive would seek to an offset that is a multiple of
1/24 of the file size, and read a block that is 1/24 of the file size
into a different (1/24-of-the-file-sized) offset of a single contiguous
buffer. For writing I would have to write every block to every drive.
I may be able to get by with 12 hard-drive controller cards.
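A minimal Python sketch of that read pattern, assuming the mirror copies
are visible as separate files, one per drive; the paths, the thread pool,
and the even-division assumption are all hypothetical simplifications,
not a tested implementation:

import concurrent.futures

def striped_read(paths, file_size):
    """Read 1/N of the file from each of N mirror drives into one buffer.

    paths: hypothetical list of N copies of the same file, one per drive,
    e.g. ["D:/data.bin", "E:/data.bin", ...]. Assumes file_size divides
    evenly by len(paths), and ignores short reads for brevity.
    """
    n = len(paths)
    chunk = file_size // n
    buf = bytearray(file_size)
    view = memoryview(buf)

    def read_chunk(i):
        # One seek per drive: drive i supplies bytes [i*chunk, (i+1)*chunk).
        with open(paths[i], "rb", buffering=0) as f:
            f.seek(i * chunk)
            f.readinto(view[i * chunk:(i + 1) * chunk])

    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(read_chunk, range(n)))   # propagate any I/O errors
    return bytes(buf)

Writes, as noted above, would still have to go to every copy.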
 
Paul said:
I'm not arguing that you cannot sustain some large transfer rate
using hard drives (with the right access pattern). But once random
access enters the picture, and relatively small transfers are
involved, most of the time is dominated by head movement:
11 milliseconds moving the head, read 4KB of data, 11 milliseconds
moving the head, read another 4KB of data. Real-world applications
do some amount of head movement.

Take my machine as an example. If I use the Search command in Windows,
it might take 2 minutes to run and search all the files. The computer
is not reading a lot of data during that time, but there is a ton of
head movement. If I had that SSD, my search would complete in a couple
seconds. (The 2 minute figure, might be after a reboot, when the
OS file cache is empty.)

On RAID1 with two disks, in theory, on a read, both disks can return
data, and the controller can accept the data from the drive that
finishes first. That helps to reduce the average seek time, by a bit.
Whether that optimization is used on motherboard "soft RAID"
implementations, is unclear to me. That would be one way to reduce
the effects of the seek time, but only on read operations.

Paul

I need to read a lot of 1.6 GB files, switching between
different data sets very quickly. I am aiming for one second
response time. I might have fifty different 1.6 GB files
that are needed at different times during execution of a
single process.
 
Peter said:
I need to read a lot of 1.6 GB files, switching between
different data sets very quickly. I am aiming for one second
response time. I might have fifty different 1.6 GB files
that are needed at different times during execution of a
single process.

If we go back to your 24 drive RAID example, then we could
use RAID0. In a one second interval, all the drives could
take 11 milliseconds to reach the desired starting location
on the disks, and then they could "sustained read" a total
of 1.6GB of data. So the seek time, overall, is a small
component of the total transfer time.

A question would be, whether the entire 1.6GB has to be
resident while this happens. Certainly, if you're accessing
bytes randomly, to do that directly from the disk, is
going to invoke lots of seeks. If the 1.6GB data set is
hardly accessed at all, then reading the entire thing
doesn't make sense. If the dataset is processed sequentially,
then it could be done in smaller sections.

When you say "switching between datasets very quickly", that
makes it sound like you actually want to access 80GB of data,
and your decision to chop it into a 1.6GB quantity, was an
artificial attempt to subdivide it. You don't have to do that.
Here are two implementations of low seek time storage systems.

*******

There is another device you might consider. SSDs based on
flash have decent seek performance. USB flash is 1 millisecond,
while something like SATA flash might be 0.1 millisecond.
(What that means is: don't try this with USB flash :-)
SATA or IDE based flash should give a lower seek time.)

This 32GB MTRON with SATA interface gives sustained 110MB/sec reads.
16 of those connected to an Areca in RAID0 ought to be able to
read 1.6GB in one second. And with a seek of 0.1 millisecond,
if you want random access as well, you can get it. The
only downside is what happens when operations are tiny in size.
IOPS on flash can be pretty bad, depending on what you're trying
to do, so be careful with this idea. (The write transfer rate
can drop substantially if you're doing 4KB writes. Small write
transfers will suck.)
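A quick check of that claim under the numbers above (the 0.1 ms figure and
the ideal-striping assumption are taken from this post, not measured):

drives = 16
read_mb_s = 110.0          # per-drive sustained read (MTRON figure above)
seek_s = 0.0001            # ~0.1 ms flash "seek" (assumed)
payload_mb = 1600.0        # the 1.6 GB working set

aggregate = drives * read_mb_s              # 1760 MB/s with ideal RAID0 striping
total_time = seek_s + payload_mb / aggregate
print(round(total_time, 2))                 # ~0.91 s, ignoring controller overhead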

http://www.anandtech.com/showdoc.aspx?i=3167&p=2

( http://mtron.easyco.com/news/papers/07-12-01_mtron-benchmarks.pdf )

In a quick web search, the MTRON 32GB MSP 7000 is $670, so
16 of those is still a small fortune of $10720. And if you
try to go cheaper, there are some inferior flash implementations
out there. The reviews on Newegg, show how upset some customers
have been with the performance - if you're going with flash of
this type, buy one and test it first. Don't buy 16 and live to
regret it. A bargain flash may be a mistake.

*******

Another way to do it, is purchase a server motherboard with 16 FBDIMM
slots, then purchase 16 * 8GB FBDIMMS to install in it. Then
read the entire 80GB of data into memory.

16 FBDIMM slots

http://www.tyan.com/product_board_detail.aspx?pid=560

8 kits * $3456 to fill the motherboard with 128GB of RAM = $27648

http://www.crucial.com/store/mpartspecs.aspx?mtbpoid=1C645BDFA5CA7304

I expect somewhere out there, there is a cheaper kind of FBDIMM
module. There are also bigger modules on the horizon (16GB) but I don't
know if the Intel chipset has issues with something like that or not.
There is another company that is working on big modules as well, but
I cannot find the article now.

http://www.elpida.com/en/news/2008/08-05.html

Paul
 
Paul said:
If we go back to your 24 drive RAID example, then we could
use RAID0. In a one second interval, all the drives could
take 11 milliseconds to reach the desired starting location
on the disks, and then they could "sustained read" a total
of 1.6GB of data. So the seek time, overall, is a small
component of the total transfer time.

A question would be, whether the entire 1.6GB has to be
resident while this happens. Certainly, if you're accessing
bytes randomly, to do that directly from the disk, is
going to invoke lots of seeks. If the 1.6GB data set is
hardly accessed at all, then reading the entire thing
doesn't make sense. If the dataset is processed sequentially,
then it could be done in smaller sections.

When you say "switching between datasets very quickly", that
makes it sound like you actually want to access 80GB of data,
and your decision to chop it into a 1.6GB quantity, was an
artificial attempt to subdivide it. You don't have to do that.
Here are two implementations of low seek time storage systems.

*******

There is another device you might consider. SSDs based on
flash have decent seek performance. USB flash is 1 millisecond,
while something like SATA flash might be 0.1 millisecond.
(What that means is: don't try this with USB flash :-)
SATA or IDE based flash should give a lower seek time.)

This 32GB MTRON with SATA interface gives sustained 110MB/sec reads.
16 of those connected to an Areca in RAID0 ought to be able to
read 1.6GB in one second. And with a seek of 0.1 millisecond,
if you want random access as well, you can get it. The
only downside is what happens when operations are tiny in size.
IOPS on flash can be pretty bad, depending on what you're trying
to do, so be careful with this idea. (The write transfer rate
can drop substantially if you're doing 4KB writes. Small write
transfers will suck.)

http://www.anandtech.com/showdoc.aspx?i=3167&p=2

( http://mtron.easyco.com/news/papers/07-12-01_mtron-benchmarks.pdf )

In a quick web search, the MTRON 32GB MSP 7000 is $670, so
16 of those is still a small fortune of $10720. And if you
try to go cheaper, there are some inferior flash implementations
out there. The reviews on Newegg, show how upset some customers
have been with the performance - if you're going with flash of
this type, buy one and test it first. Don't buy 16 and live to
regret it. A bargain flash may be a mistake.

*******

Another way to do it, is purchase a server motherboard with 16 FBDIMM
slots, then purchase 16 * 8GB FBDIMMS to install in it. Then
read the entire 80GB of data into memory.

16 FBDIMM slots

http://www.tyan.com/product_board_detail.aspx?pid=560

8 kits * $3456 to fill the motherboard with 128GB of RAM = $27648

http://www.crucial.com/store/mpartspecs.aspx?mtbpoid=1C645BDFA5CA7304

I expect somewhere out there, there is a cheaper kind of FBDIMM
module. There are also bigger modules on the horizon (16GB) but I don't
know if the Intel chipset has issues with something like that or not.
There is another company that is working on big modules as well, but
I cannot find the article now.

http://www.elpida.com/en/news/2008/08-05.html

Paul

It may be the case that more memory is the best answer to some aspects
of the implementation of my technology; 32 GB of RAM may be entirely
sufficient in many cases. In other cases there may be a need for 10,000
different 1.6 GB files, each of which needs to be entirely loaded into
memory when it is used, so a very fast RAID system would be the only
feasible solution, quickly loading data in a way comparable to virtual
memory management. I am exploring this latter aspect because it provides
the most flexibility. In this case the VM page size is 1.6 GB.
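A minimal sketch of that "1.6 GB page" idea in Python, assuming whole files
are the paging unit and a fixed RAM budget; the class name and the plain
open()/read() loader are hypothetical placeholders, and the speed entirely
depends on how fast the underlying array can deliver a file:

from collections import OrderedDict

class DatasetCache:
    """Tiny LRU cache of whole data files ("1.6 GB pages")."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes       # e.g. 32 GB of RAM set aside for data
        self.used = 0
        self.pages = OrderedDict()       # path -> bytes, least recently used first

    def get(self, path):
        if path in self.pages:
            self.pages.move_to_end(path)          # mark as most recently used
            return self.pages[path]
        with open(path, "rb") as f:               # the fast array does the real work here
            data = f.read()
        self.used += len(data)
        self.pages[path] = data
        while self.used > self.budget and len(self.pages) > 1:
            _, evicted = self.pages.popitem(last=False)   # evict least recently used
            self.used -= len(evicted)
        return data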
 
Paul said:
If we go back to your 24 drive RAID example, then we could
use RAID0. In a one second interval, all the drives could
take 11 milliseconds to reach the desired starting location
on the disks, and then they could "sustained read" a total
of 1.6GB of data. So the seek time, overall, is a small
component of the total transfer time.

The key here is how to get a desktop (not server) machine
that can host 24 drives. If we make it a server, we now have
the problem of getting the data from the server to the
workstation at a rate of at least 13 Gb/sec (about 1.6 GB/sec).
Possibly server machines can host desktop operating systems?
If we buy a server OS, then the price must be based on a single
user, and it must be able to execute all of the desktop
application software.
 
Peter said:
The key here is how to get a desktop (not server) machine
that can host 24 drives. If we make it a server, we now have
the problem of getting the data from the server to the
workstation at a rate of at least 13 Gb/sec (about 1.6 GB/sec).
Possibly server machines can host desktop operating systems?
If we buy a server OS, then the price must be based on a single
user, and it must be able to execute all of the desktop
application software.

http://www.areca.us/support/downloa...l_Spec/ARC_1231_1261_1280ML_Specification.zip

You can get 24 drives on a single card, in the form of the
ARC-1280 or ARC-1280ML. But the specs say that while reading
from the disk, even in RAID0, the read performance is 885MB/sec,
which means the IOP could be the bottleneck. When the card
reads from internal cache, the speed rises to 1624MB/sec, so
the path from the cache, DMA transferring into main system
memory, works well. (It is possible that is all you can stuff
through an x8 connector in one direction. The 1624MB/sec could
represent the practical limits of PCI Express x8.)
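To see why 1624MB/sec lines up with the slot rather than the drives, here is
the PCI Express arithmetic as a rough sketch (the ~80% packet-overhead figure
is an assumed rule of thumb, not a measured number):

lanes = 8
line_rate_gbps = 2.5        # PCI Express 1.x raw line rate per lane, per direction
encoding = 8.0 / 10.0       # 8b/10b coding leaves 2.0 Gb/s of payload per lane

raw_mb_s = lanes * line_rate_gbps * encoding * 1000.0 / 8.0
print(raw_mb_s)             # 2000.0 MB/s before packet and protocol overhead
# At a rough ~80% efficiency after packet headers and flow control, that is
# ~1600 MB/s, which is in the same ballpark as the 1624 MB/s cache-read figure.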

You can stick a 2GB DDR2 DIMM in the cache socket, so that
is as much cache as it will use. But I don't see how upgrading
the cache is going to change the nature of the design. The
operations to and from the disks may be the rate limiting step.

You can get desktop computers with two or three PCI Express x16
slots, so the desktop does have room for cards. That might
not be a limitation. The problem is finding a better card
for the job, if you really want more than 800MB/sec from a
single card.

It is hard to say, if purchasing two cards would be the answer.
Then, you'd need some kind of software RAID solution to run
on top of it. I haven't looked for such software, and the
only thing I've seen in the past, is some software RAID
products for Macintosh. I don't know what PC people use
for software RAID (other than the Tomshardware hack).

The Highpoint RR3540 has the same limitation - 780MB/sec sustained read.
http://www.highpoint-tech.com/PDF/RR3540/RR3540_datasheet.pdf

This P5E64 WS motherboard, is a desktop with five PCI Express slots.
So there is room for cards in this one. Two slots have x16 bandwidth,
and three slots have x4 bandwidth.

http://images17.newegg.com/is/image/newegg/13-131-300-S03?$S640W$

http://www.asus.com/products.aspx?modelmenu=2&model=2131&l1=3&l2=11&l3=640&l4=0

4 x PCIe x16 (@ x16, x16, x4, x4) (Dual PCIe 2.0 x 16)
1 x PCIe x4

So a couple of the slots are limited to x4, which is fine
if you were using four cheaper storage cards. x4 gives room
for 1000MB/sec theoretical or 800MB/sec practical. If you
could find (4) four port PCI Express cards, they could
go in the four PCI Express x16 sized slots, leaving the
remaining x4 PCI Express slot for a graphics card.

The issue now, is to find a non-RAID card, put multiple of
them in the machine, and then use a soft RAID approach to
get the bandwidth.

OK, try four of these cards, hosting four disks each. When you're
in Windows, this would give you sixteen visible disks. Since
the connector on this is PCI Express x4, you can use three of the P5E64
x16 slots and the one x4 slot, to hold the cards. Leaving
an x16 slot for your video card, so your video won't suck.

http://www.newegg.com/Product/Product.aspx?Item=N82E16816132018

Then, perhaps you could use this hack, to combine the power of the
16 disks. Don't forget about the 2.2TB limit for a volume, so
select a "small" partition size and test it thoroughly to make
sure there isn't a problem, before the real data goes on there.

http://www.tomshardware.com/2004/11/19/using_windowsxp_to_make_raid_5_happen/index.html

Using (16) 120MB/sec disks, of which one disk would carry the
capacity needs of parity, that leaves 15 * 120MB/sec of potential
transfer rate or 1800MB/sec at the beginning of the disk. If
half of the total disk capacity is used (because of the 2.2TB
limit), then you'd stay in the "fast" section at the beginning.
(Disk bandwidth stays above 100MB/sec.) 16 Velociraptor disks
would cost a total of $4800. Plus four $100 storage cards.
$360 for motherboard. About $5600 or so of obvious stuff.
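Reproducing that estimate as a quick sketch (the ~$300 per Velociraptor is
inferred from the $4800 total; all figures come from this post, not from
measurements):

disks = 16
per_disk_mb_s = 120.0            # outer-zone rate assumed above
data_disks = disks - 1           # one disk's worth of throughput assigned to parity

print(data_disks * per_disk_mb_s)        # 1800.0 MB/s at the beginning of the disks

drive_cost = disks * 300                 # ~$300 per Velociraptor (inferred from $4800)
total = drive_cost + 4 * 100 + 360       # plus four storage cards and the motherboard
print(total)                             # 5560 -- the "about $5600" figure above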

http://www.xbitlabs.com/articles/storage/display/wd-velociraptor_4.html

This would be firmly in the "crazy ideas" section :-)

I would have been much happier, if I could find a card which
does not have the IOP341 limitation. If you can find a
software RAID product, that will accept two arrays as
"volumes", then maybe you can do the project with just
two Areca cards. They'd only need enough ports to support
the bandwidth, so each card would not need to be the
24 port version.

Paul
 
Paul said:
http://www.areca.us/support/downloa...l_Spec/ARC_1231_1261_1280ML_Specification.zip

You can get 24 drives on a single card, in the form of the
ARC-1280 or ARC-1280ML. But the specs say that while reading
from the disk, even in RAID0, the read performance is 885MB/sec,
which means the IOP could be the bottleneck. When the card
reads from internal cache, the speed rises to 1624MB/sec, so
the path from the cache, DMA transferring into main system
memory, works well. (It is possible that is all you can stuff
through an x8 connector in one direction. The 1624MB/sec could
represent the practical limits of PCI Express x8.)

You can stick a 2GB DDR2 DIMM in the cache socket, so that
is as much cache as it will use. But I don't see how upgrading
the cache is going to change the nature of the design. The
operations to and from the disks may be the rate limiting step.

You can get desktop computers with two or three PCI Express x16
slots, so the desktop does have room for cards. That might
not be a limitation. The problem is finding a better card
for the job, if you really want more than 800MB/sec from a
single card.

It is hard to say, if purchasing two cards would be the answer.
Then, you'd need some kind of software RAID solution to run
on top of it. I haven't looked for such software, and the
only thing I've seen in the past, is some software RAID
products for Macintosh. I don't know what PC people use
for software RAID (other than the Tomshardware hack).

The Highpoint RR3540 has the same limitation - 780MB/sec sustained read.
http://www.highpoint-tech.com/PDF/RR3540/RR3540_datasheet.pdf

This P5E64 WS motherboard, is a desktop with five PCI Express slots.
So there is room for cards in this one. Two slots have x16 bandwidth,
and three slots have x4 bandwidth.

http://images17.newegg.com/is/image/newegg/13-131-300-S03?$S640W$

http://www.asus.com/products.aspx?modelmenu=2&model=2131&l1=3&l2=11&l3=640&l4=0

4 x PCIe x16 (@ x16, x16, x4, x4) (Dual PCIe 2.0 x 16)
1 x PCIe x4

So a couple of the slots are limited to x4, which is fine
if you were using four cheaper storage cards. x4 gives room
for 1000MB/sec theoretical or 800MB/sec practical. If you
could find (4) four port PCI Express cards, they could
go in the four PCI Express x16 sized slots, leaving the
remaining x4 PCI Express slot for a graphics card.

The issue now, is to find a non-RAID card, put multiple of
them in the machine, and then use a soft RAID approach to
get the bandwidth.

OK, try four of these cards, hosting four disks each. When you're
in Windows, this would give you sixteen visible disks. Since
the connector on this is PCI Express x4, you can use three of the P5E64
x16 slots and the one x4 slot, to hold the cards. Leaving
an x16 slot for your video card, so your video won't suck.

Yes, and with this setup I could make my own software RAID 1 system
that would know that each drive is a mirror of the others; care would
be taken so that no fragmentation occurs. I could read a 1/16-sized
piece of the file from each drive into a single contiguous buffer and
get RAID 0 read performance with RAID 1 reliability. I would only need
one seek per drive. The one missing piece is where I would physically
put all of those sixteen drives.

The only thing that I can envision is having a bunch of drives stacked
up next to the workstation tower case, with sixteen sets of extra-long
ribbon cables coming out of the back of this workstation case. Is there
a solution that is less clumsy than this one? If I could find a less
clumsy solution that could handle 24 drives, this could provide
2 terabytes of virtual memory at RAM speed.
 
Peter said:
Yes, and with this setup I could make my own software RAID 1 system
that would know that each drive is a mirror of the others; care would
be taken so that no fragmentation occurs. I could read a 1/16-sized
piece of the file from each drive into a single contiguous buffer and
get RAID 0 read performance with RAID 1 reliability. I would only need
one seek per drive. The one missing piece is where I would physically
put all of those sixteen drives.

The only thing that I can envision is having a bunch of drives stacked
up next to the workstation tower case, with sixteen sets of extra-long
ribbon cables coming out of the back of this workstation case. Is there
a solution that is less clumsy than this one? If I could find a less
clumsy solution that could handle 24 drives, this could provide
2 terabytes of virtual memory at RAM speed.

For a computer case, it is all a matter of money. Proper
server cases, can be in the $1000+ range, but with a server
case, you can always find something big enough to house the
whole thing.

This is an example of a consumer style box, with room
for 19 drives. To fit the 3.5" drives in the 5.25" bays,
you'll need some adapter kits. So this might be a good case
for a 16 drive configuration.

http://www.newegg.com/Product/Product.aspx?Item=N82E16811112189

These U shaped pieces of metal, allow a 3.5" drive to be suspended
in a 5.25" space. You still use the sliders that come with the
5.25" mountings, as well as these U shaped things. At one time,
when you bought a retail (boxed) hard drive, these things and
four screws to install them, were included. Now, a lot of drives
are OEM and without accessories of any kind. Sometimes you can
get these in large quantities, and get a price break on them.

http://www.startech.com/item/BRACKET-Metal-35-to-525-Inch-Drive-Adapter-Bracket.aspx

*******

This looked promising at first.

http://www.servercase.com/miva/miva...MC4E2-QI-XPSS(ETA+5/28)&Category_Code=4UBKBLN

Apparently made by this company, but I cannot find the documents, or enough
pictures to understand it.

http://www.aicipc.com

Server cases are complicated, with choices as to the backplane behind the drive bays,
position and type of motherboard (some have an air cooling scheme intended for
certain motherboard styles). While it is pretty easy to stumble on a
nice looking box, it is hard to find the details needed to know if
it is worth buying. For this case, there are four models, and XPSS
is for SATA drives.

*******

This is another example. This time I found a manual, but it still left me confused.

Here is a Supermicro case - CSE-846TQ-R900B , prices around $1000 or so.

http://apd.lv/image/products/1088581_425_0_425NULL.jpg

http://www.boston.co.uk/products/components/chassis/4u/sc846/cse-846tq-r900b/default.aspx

User manual for chassis.

http://www.supermicro.com/manuals/chassis/tower/SC846.pdf

I cannot tell if the case is ready to work with SATA for sure, or not.

*******

One thing I can warn you about, in your quest for a desktop server,
is hard drive spinup current. Once a hard drive is spinning, the
power requirements are quite reasonable. But during the first ten
seconds, the motor accelerates the spindle up to speed, and during
that interval, 2.5A is drawn from the 12V rail. Now, imagine what
happens to the power supply - 24 drives times 2.5A is 60A from the
12V rail, or 720 watts. That is quite a jolt for ordinary
power supplies. The mammoth current only flows for the first ten
seconds, and then drops to 1/4 or less of that value.

Back in the SCSI era, SCSI had options for "staggered spin". That
allowed the drive spinup to be staged, with some drives starting to
spin after the others. This moderates the peak demand on the power
supply, and prevents it from "falling over". SCSI address was
used as part of the spinup scheme (and I expect the controller
played a part in the scheme as well).

For whatever brand of hard drive you select, try to get a manual
that describes the product in detail. So you can get an accurate
estimate of spinup current. It could be even more than the 2.5A
figure. (I could not get data for Velociraptor!) Being prepared
to handle spinup current would be important, if you are using
a "home brew" solution to packaging, and haven't got any sure
fired scheme for staggered spinup.

In this archived document, pin 11 on the SATA drive power
connector is discussed. It allows some control of spinup,
and I expect the SATA backplane wiring on some server
boxes, may be using the option.

http://web.archive.org/web/20051028...s_technical/staggerd-spin-detection-pin11.pdf

There is no problem finding big power supplies. But it would be
nice to match the supply, to the problem being solved. This
one can supply 12V at up to 100 amps, easily covering a 60 amp
spinup. With supplies like this, even the cord used to plug them
in can be important (use the cord that came with it).

http://www.pcpower.com/power-supply/turbo-cool-1200.html

Alternately, you could house all the drives in standalone racks,
but then you have to be careful of cable length. SATA has limits
for cabling. ESATA doubles those lengths, by changing the driver
and receiver levels on the data cables slightly. While you might run some
number of cables through holes in the chassis, keep those cable
length limits in mind. (The controller would have to state whether
it was ESATA ready or not, in order to use the doubled cable length.)

ESATA is up to 2 meters of wire. SATA is up to 1 meter.
http://en.wikipedia.org/wiki/SATA

So I know about a couple of the smaller issues, but I'm not a
"server chassis guy". Someone in a corporate IT operation may
have more experience specifying and assembling these things.

Paul
 
You know, I'm having second thoughts about that P5E64 WS Evolution
board. I've checked the reviews, and it seems to be a wonderful board
for the overclocking community. They're hitting more than
500MHz on the CPU input clock, for FSB2000 at the processor.
500MHz used to be about the limit, but that board has
gone past it.

What is wrong with the board, is I cannot get the number of
PCI Express lanes to balance properly. In other words,
that "x16, x16, x4, x4" line is wrong, and I cannot figure
out a way to correct it, based on what I know about the
hardware. (I.e. Can all those bandwidths be used simultaneously ?
I don't think so.)

The X48 has two x16 PCI Express interfaces. I don't see any
mention of a "bifurcate" feature in the datasheet, like there
is on the 975X chipset. It doesn't appear that the x16 lane
groups can be split into pieces. The Southbridge has a total
of six PCI Express x1 lanes, and sometimes a motherboard maker
will group four of them and make an x4 slot with them. So there
are a total of 38 lanes available, and limits as to how they can be
split.

Now, a simply horrid idea, is for Asus to do this.

North  ------ PCI Express x16
Bridge ------ PCI Express x16
  |
  |  DMI (only x4 bandwidth on this bus)
  |
South  ------ ("x4") --- PEX8518 ---- x4
Bridge                           ---- x4
  /  \                           ---- x4
 x1    x1
88SE6145    88E8056
SATA/PATA   GbE
    ?          ?

Why does this stink ? Because now the three x4 ports are *sharing*
a total of x4 bandwidth or 1GB/sec. That means if you were to stick
three RAID cards in there, the total bandwidth would be limited
to 1GB/sec. And any nuisance bandwidth usage on the Southbridge,
also comes out of that 1GB bandwidth (i.e. DMI bus x4 bandwidth
is used to supply the six x1 lanes on the Southbridge plus other busses).
I hope that isn't how they're doing it. The PEX8518 is here, if
you want a look at the info.

http://www.plxtech.com/pdf/product_briefs/ProductBrief_PEX8518.pdf

If all you want to do, is use the two x16 slots to their fullest
potential, then I think the P5E64 WS Evolution would be great.
But I now have my suspicions that the other slots are comparative
garbage. And so far, no review has attempted to test, or do any
real analysis of the wiring.

This article hints that the lane numbers don't match, but they did
it in a superficial way. Since I don't see any evidence in the X48
datasheet, that the x16 groups can be split, that is why I have to
conclude that the above diagram is how they're doing it.

http://news.softpedia.com/news/Asus...CI-Express-Lanes-Don-039-t-Add-Up-80705.shtml

This is an alternative. The Intel Skulltrail, which uses dual Xeon
LGA771 processors. It has four large PCI Express slots.

http://www.newegg.com/Product/ProductReview.aspx?Item=N82E16813121330

http://download.intel.com/support/motherboards/desktop/d5400xs/sb/e30088001us.pdf

There is a block diagram here. The Nforce100 PCI Express switch chips
are entirely unnecessary from a hardware perspective. They were put
there to allow Nvidia SLI and associated drivers to work. As far as
I know, the 5400 could have split its interfaces to make (4) x8 lane
interfaces. Using the Nforce100 means better sharing of the port, if
one x16 needs more bandwidth than the other, so that is good. But
the disadvantage is you're paying for some extra heat on the motherboard,
because those two chips are there. Considering the rest of your box and
its power consumption, I expect this doesn't matter that much. This
thing uses FBDIMMs, and as long as you don't go with the high density
ones, they are now only moderately expensive. The board is EATX, and
so you should check the dimensions before buying a computer case for
this. Check the Newegg reviews or other reviews, to see what kind of
computer case they're using. And what aftermarket coolers work with it.

http://www.xbitlabs.com/images/cpu/intel-skulltrail/sheme.png

http://www.xbitlabs.com/articles/cpu/display/intel-skulltrail.html

Asus has their own 5400 based board, the Z7S WS, but the slot
configuration is even less interesting. There are two full
x16 slots and a slot with x8 bandwidth. And then you'd have to
go through the whole analysis thing again, to figure out where
all the lanes are going :-) Why do I suddenly feel sleepy...

http://www.newegg.com/Product/Product.aspx?Item=N82E16813131272

HTH,
Paul
 
Paul said:
For a computer case, it is all a matter of money. Proper
server cases, can be in the $1000+ range, but with a server
case, you can always find something big enough to house the
whole thing.

This is an example of a consumer style box, with room
for 19 drives. To fit the 3.5" drives in the 5.25" bays,
you'll need some adapter kits. So this might be a good case
for a 16 drive configuration.

http://www.newegg.com/Product/Product.aspx?Item=N82E16811112189

These U shaped pieces of metal, allow a 3.5" drive to be suspended
in a 5.25" space. You still use the sliders that come with the
5.25" mountings, as well as these U shaped things. At one time,
when you bought a retail (boxed) hard drive, these things and
four screws to install them, were included. Now, a lot of drives
are OEM and without accessories of any kind. Sometimes you can
get these in large quantities, and get a price break on them.

http://www.startech.com/item/BRACKET-Metal-35-to-525-Inch-Drive-Adapter-Bracket.aspx

*******

This looked promising at first.

http://www.servercase.com/miva/miva...MC4E2-QI-XPSS(ETA+5/28)&Category_Code=4UBKBLN

Apparently made by this company, but I cannot find the documents, or enough
pictures to understand it.

http://www.aicipc.com

Server cases are complicated, with choices as to the backplane behind the drive bays,
position and type of motherboard (some have an air cooling scheme intended for
certain motherboard styles). While it is pretty easy to stumble on a
nice looking box, it is hard to find the details needed to know if
it is worth buying. For this case, there are four models, and XPSS
is for SATA drives.

*******

This is another example. This time I found a manual, but it still left me confused.

Here is a Supermicro case - CSE-846TQ-R900B , prices around $1000 or so.

http://apd.lv/image/products/1088581_425_0_425NULL.jpg

http://www.boston.co.uk/products/components/chassis/4u/sc846/cse-846tq-r900b/default.aspx

User manual for chassis.

http://www.supermicro.com/manuals/chassis/tower/SC846.pdf

I cannot tell if the case is ready to work with SATA for sure, or not.

*******

One thing I can warn you about, in your quest for a desktop server,
is hard drive spinup current. Once a hard drive is spinning, the
power requirements are quite reasonable. But during the first ten
seconds, the motor accelerates the spindle up to speed, and during
that interval, 2.5A is drawn from the 12V rail. Now, imagine what
happens to the power supply - 24 drives times 2.5A is 60A from the
12V rail, or 720 watts. That is quite a jolt for ordinary
power supplies. The mammoth current only flows for the first ten
seconds, and then drops to 1/4 or less of that value.

It says that it comes with a 900 Watt power supply, so I gather
that this is not an issue.
 
Paul said:
You know, I'm having second thoughts about that P5E64 WS Evolution
board. I've checked the reviews, and it seems to be a wonderful board
for the overclocking community. They're hitting more than
500MHz on the CPU input clock, for FSB2000 at the processor.
500MHz used to be about the limit, but that board has
gone past it.

What is wrong with the board, is I cannot get the number of
PCI Express lanes to balance properly. In other words,
that "x16, x16, x4, x4" line is wrong, and I cannot figure
out a way to correct it, based on what I know about the
hardware. (I.e. Can all those bandwidths be used simultaneously ?
I don't think so.)

The X48 has two x16 PCI Express interfaces. I don't see any
mention of a "bifurcate" feature in the datasheet, like there
is on the 975X chipset. It doesn't appear that the x16 lane
groups can be split into pieces. The Southbridge has a total
of six PCI Express x1 lanes, and sometimes a motherboard maker
will group four of them and make an x4 slot with them. So there
are a total of 38 lanes available, and limits as to how they can be
split.

Now, a simply horrid idea, is for Asus to do this.

North  ------ PCI Express x16
Bridge ------ PCI Express x16
  |
  |  DMI (only x4 bandwidth on this bus)
  |
South  ------ ("x4") --- PEX8518 ---- x4
Bridge                           ---- x4
  /  \                           ---- x4
 x1    x1
88SE6145    88E8056
SATA/PATA   GbE
    ?          ?

Why does this stink ? Because now the three x4 ports are *sharing*
a total of x4 bandwidth or 1GB/sec. That means if you were to stick
three RAID cards in there, the total bandwidth would be limited
to 1GB/sec. And any nuisance bandwidth usage on the Southbridge,
also comes out of that 1GB bandwidth (i.e. DMI bus x4 bandwidth
is used to supply the six x1 lanes on the Southbridge plus other busses).
I hope that isn't how they're doing it. The PEX8518 is here, if
you want a look at the info.

http://www.plxtech.com/pdf/product_briefs/ProductBrief_PEX8518.pdf

If all you want to do, is use the two x16 slots to their fullest
potential, then I think the P5E64 WS Evolution would be great.
But I now have my suspicions that the other slots are comparative
garbage. And so far, no review has attempted to test, or do any
real analysis of the wiring.

This article hints that the lane numbers don't match, but they did
it in a superficial way. Since I don't see any evidence in the X48
datasheet, that the x16 groups can be split, that is why I have to
conclude that the above diagram is how they're doing it.

http://news.softpedia.com/news/Asus...CI-Express-Lanes-Don-039-t-Add-Up-80705.shtml

This is an alternative. The Intel Skulltrail, which uses dual Xeon
LGA771 processors. It has four large PCI Express slots.

http://www.newegg.com/Product/ProductReview.aspx?Item=N82E16813121330

http://download.intel.com/support/motherboards/desktop/d5400xs/sb/e30088001us.pdf

There is a block diagram here. The Nforce100 PCI Express switch chips
are entirely unnecessary from a hardware perspective. They were put
there to allow Nvidia SLI and associated drivers to work. As far as
I know, the 5400 could have split its interfaces to make (4) x8 lane
interfaces. Using the Nforce100 means better sharing of the port, if
one x16 needs more bandwidth than the other, so that is good. But
the disadvantage is you're paying for some extra heat on the motherboard,
because those two chips are there. Considering the rest of your box and
its power consumption, I expect this doesn't matter that much. This
thing uses FBDIMMs, and as long as you don't go with the high density
ones, they are now only moderately expensive. The board is EATX, and
so you should check the dimensions before buying a computer case for
this. Check the Newegg reviews or other reviews, to see what kind of
computer case they're using. And what aftermarket coolers work with it.

http://www.xbitlabs.com/images/cpu/intel-skulltrail/sheme.png

http://www.xbitlabs.com/articles/cpu/display/intel-skulltrail.html

Asus has their own 5400 based board, the Z7S WS, but the slot
configuration is even less interesting. There are two full
x16 slots and a slot with x8 bandwidth. And then you'd have to
go through the whole analysis thing again, to figure out where
all the lanes are going :-) Why do I suddenly feel sleepy...

http://www.newegg.com/Product/Product.aspx?Item=N82E16813131272

HTH,
Paul

So basically there are no definitive conclusions about the feasibility
of the motherboard aspect of the proposal? In the ideal case there
would be a large set of identical slots that each have independent
bandwidth. One article that I read suggested that each drive have its
own slot, and thus its own controller card. In that case each slot
would not have to be very fast; 200 MB per second would be plenty.
With 24 drives, each with 100 MB/s sustained throughput, there would
be enough extra speed to provide the minimum 1.6 GB/s for the
whole array.
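As a final sanity check on those numbers (a rough sketch; the PCI Express x1
figure is the nominal gen-1 rate, and the other values are taken from this
post rather than measured):

drives = 24
per_drive_mb_s = 100.0       # sustained rate assumed for each drive
needed_mb_s = 1600.0         # one 1.6 GB file per second

aggregate = drives * per_drive_mb_s
print(aggregate)             # 2400.0 MB/s, a 50% margin over the 1.6 GB/s target
# With one controller card per drive, each slot only has to carry ~100 MB/s,
# so even a PCI Express x1 link (~250 MB/s nominal) per card would not be the
# bottleneck; the shared paths upstream of the slots would be.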
 