What is the best hardware setup (motherboard, processor, hard drives,
etc.) for a dedicated single process that is mostly compute-bound? It
appears that most of the high performance systems are targeted at
gamers and concentrate on providing fast graphic capabilities. I don't
need fast graphics, just lots of computational power. My app also eats
RAM, so it is a given that I need 4GB of RAM to limit swapping out to
disk, plus fast drives (SATA?). Mostly I would like to know what
processor and motherboard to get.
Go to the page linked below, and click one of the two "Dump all records"
links. The CSV (comma-separated) format takes less time to download;
the file was about 2MB when I saved it. I load the CSV file into
Excel to get a compact display on the screen, and sort the results
in descending order to make it easy to see the winners.
http://www.spec.org/cgi-bin/osgresults?conf=all
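If you would rather not use Excel, the same sort can be done with a short
script. This is only a rough sketch: the column names "System" and
"Result" are assumptions for illustration (check the header row of the
file you actually download), and the sample rows simply reuse three of
the scores listed further down.

```python
import csv
import io

def top_results(csv_text, score_column, limit=5):
    """Return up to `limit` rows with the highest numeric `score_column`."""
    scored = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            score = float(row[score_column])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows that lack a usable numeric score
        scored.append((score, row))
    # Sort in descending order so the winners come first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:limit]]

# Stand-in for the downloaded file; column names are hypothetical.
sample = """System,Result
Acer Altos G320,1834
ASUS A8N-SLI Deluxe,1970
Fujitsu CELSIUS H230,1839
"""

for row in top_results(sample, "Result"):
    print(row["Result"], row["System"])
```

For the real file, read its contents with open(...).read() and pass the
name of whichever column holds the score you care about.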
Why did I select this benchmark? Because I have absolutely no
idea what your application is like.
Now, the results are quite interesting, but because vendors submit
results for their own systems, all manner of corners could have been
cut. I'm listing only the single-core results here, on the assumption
that your application would not benefit from multiple cores.
System and processor                                    CINT2000
ASUS A8N-SLI Deluxe, AMD Athlon (TM) 64 FX-57           1970
TYAN Tomcat K8E (S2865), AMD Opteron (TM) 154           1956
Fujitsu CELSIUS H230, Intel Pentium M 780 (2.26 GHz)    1839
Acer Altos G320 (3.8 GHz/FSB800 Intel Pentium 4 670)    1834
Intel D955XBK (3.73 GHz/FSB1066 P4 Extreme Edition)     1830
Now, how is that for shocking? The Pentium M, a 35W processor,
beats the fire-breathing top-of-the-line P4 3.8GHz, and the
P4 EE 3.73 with FSB1066.
Now, you can write a program that will crush both the Pentium M
and Athlon 64 processors, and make the P4 processors the winner.
So on some other benchmark, the P4 might really "crush the competition".
That is the catch with the Athlon 64 and the Pentium M: they
aren't good at everything. If I had to gamble (buy without testing,
sight unseen), I'd have to buy a Pentium 4, but if you have the time
for characterization and testing, anything is possible.
Returning to the real world for a moment, you would be better
off finding, say, an ordinary P4 computer and a mere-mortal
Athlon64-based machine, and running your program on each. By
overclocking a trivial amount (like 5%, which is bound to work
without a problem), you should be able to collect enough "trend"
information to make a performance extrapolation. You can also play
with memory timings, such as changing from CAS3 to CAS2.5, or in
the case of the Athlon64, selecting a lower ratio (async)
relationship between the memory and CPU clock signals, to see the
impact that memory bandwidth has on the two platforms. Memtest86+
can be used to measure the memory bandwidth achieved with each set
of memory settings. By benchmarking your program at a constant CPU
core clock and different memory bandwidths, you'll get some idea of
whether improved memory bandwidth on a given platform will help.
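As a rough illustration of the "trend" extrapolation, here is a minimal
sketch. It assumes the speedup grows linearly with clock speed at
whatever rate you measured between the stock and overclocked runs, which
is only a first-order guess. The function names and the timings in the
usage lines are made up for the example, not measurements.

```python
def clock_scaling(base_mhz, base_seconds, oc_mhz, oc_seconds):
    """Fraction of the clock increase that showed up as speedup.

    1.0 means the program scaled perfectly with the core clock
    (compute-bound); values well below 1.0 suggest something else,
    such as memory, is holding it back.
    """
    clock_gain = oc_mhz / base_mhz - 1.0
    speed_gain = base_seconds / oc_seconds - 1.0
    return speed_gain / clock_gain

def extrapolate_runtime(base_mhz, base_seconds, scaling, target_mhz):
    """Estimate run time at target_mhz, assuming `scaling` stays constant."""
    speedup = 1.0 + scaling * (target_mhz / base_mhz - 1.0)
    return base_seconds / speedup

# Hypothetical numbers: 100 s at 2000 MHz, 96 s after a 5% overclock.
scaling = clock_scaling(2000, 100.0, 2100, 96.0)
estimate = extrapolate_runtime(2000, 100.0, scaling, 2400)
print(f"scaling factor {scaling:.2f}, estimated {estimate:.1f} s at 2400 MHz")
```

Repeating the same measurement at two memory settings with the core
clock held constant gives the equivalent trend for memory bandwidth.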
In terms of memory, the Intel 955 desktop chipset can have a
maximum of 8GB of memory. That will require finding some
decent 2GB DDR2-533 or faster memory modules. The P5WD2 Premium
is a board with the 955 chipset on it. (The BIOS should have
a function in it to make a hole just below the 4GB mark for
the bus I/O address space, but I don't remember the name
of the setting offhand.)
The A8N-SLI Deluxe, or the Premium, are Athlon64 boards.
You are more likely to be populating one of those with
4GB of memory (4x1GB), as I don't know if there are
any decent unbuffered 2GB DDR modules around. The manual only
mentions 1GB modules, and it is hard to say if the BIOS would
do the right thing if a 2GB module were presented to it. The BIOS
has a memory hoisting function that allows more of the
full 4GB to be used (you won't see the BIOS setting in
the downloadable manual, as it was added at about the fifth
revision of the BIOS).
(This AMD doc is one source of intricate detail on their processors.)
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF
In either case, you will need something more sophisticated
than WinXP to make use of all the memory. A server
type OS might make more of the 4GB available.
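To put numbers on the I/O hole and memory hoisting mentioned above, here
is a small arithmetic sketch. The 768MB hole size is purely an assumed
figure for illustration; the real size depends on the board and the
cards installed.

```python
FOUR_GB_MB = 4096  # the 4GB mark, in megabytes

def usable_below_4gb(installed_mb, io_hole_mb):
    """RAM still addressable below 4GB when io_hole_mb is reserved
    just under the 4GB mark for bus I/O address space."""
    return min(installed_mb, FOUR_GB_MB - io_hole_mb)

def hoisted_above_4gb(installed_mb, io_hole_mb):
    """RAM that has to be remapped above 4GB to stay reachable.
    The OS must be able to address memory above 4GB to use it."""
    return installed_mb - usable_below_4gb(installed_mb, io_hole_mb)

# Assumed example: 4GB installed, 768MB hole (hypothetical size).
print(usable_below_4gb(4096, 768), "MB below the hole")
print(hoisted_above_4gb(4096, 768), "MB hoisted above 4GB")
```

Without hoisting, or with an OS that cannot address memory above 4GB,
the second figure is simply lost.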
Another observation. On AMD Opteron boards that have multiple
processors, each processor has its own memory controller. Even
if your application will only run on one processor, a dual
processor board, like a Tyan K8WE, means two processors can
host twice as much memory for use by one of the processors.
Thus, all eight sticks shown in this picture are accessible
by one processor.
http://www.linuxhardware.org/images/articles/nfpro-012605/K8WE.jpg
If you purchase four processors, and a board like this, then
one processor can access all 16 DIMM slots.
http://www.tyan.com/products/html/thunderk8qw.html
The only limit then is the upper addressable limit
of the Opteron. The Tyan link above states that up to
64GB of registered memory can be used on that board.
Page 17 of the presentation linked below shows how "coherent"
HyperTransport links allow memory to be seen by all processors on a
multiprocessor board. Each processor has up to three buses for
interconnect to I/O or to other processors. The more buses the
processor ships with, the more expensive it gets. Thus, getting the
right model number of Opteron is important when building up
a multiprocessor Opteron motherboard.
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/OpteronCustomerPresentation.pdf
Have fun,
Paul