Yousuf said:
SGI Altix systems again. A cluster of 20 512-way Itanium2 Altixes connected
together with NUMAlink. There's also some talk of SGI re-architecting the
cluster so that it becomes 2048-way Altixes rather than 512-way Altixes.
http://www.theinquirer.net/?article=17491
Oh, BTW, it looks like NUMAlink is only used within individual Altixes. The
individual Altixes connect to each other via Infiniband.
http://news.zdnet.co.uk/hardware/0,39020351,39161899,00.htm
I'm not sure where the insightful discussion of what's really going on
here is taking place.
http://www.fcw.com/fcw/articles/2004/0726/web-nasa-07-28-04.asp
<quote>
"NASA has a long history in supercomputing dating back to the origins of
computational fluid dynamics in the 1980s," said G. Scott Hubbard,
director of the Ames Research Center. "It is exciting to join with an
industry team in this innovative venture that will change the very way
in which science and simulation are performed by providing researchers
with capabilities that, until now, they could only dream about."
</quote>
Like even the crudest meaningful estimate of the likelihood of a
catastrophic failure? To be fair, such an estimate may not be
obtainable by any existing methodology, and NASA has been performing
original field experiments in the area, albeit with the politically
explosive and ethically questionable admixture of live human subjects.
Mr. Hubbard had an opportunity to say something incisive about the real
role that computers might play and what the limitations, both real and
perceived, might be, and what, exactly, NASA is doing to make things
better, other than buying more and better boxes. Instead, he opted for
some bureaucratic blah-blah-blah that could have accompanied the roll-out
of just about anything, as long as it was a computer being used by NASA.
The stakes are high, and NASA is working under a
you'd-better-not-screw-up-again mandate. We had computers for the
previous screwups. A computer model played a direct role in the most
recent screwup. What's different now?
http://www.fcw.com/fcw/articles/2004/0726/web-nasa-07-28-04.asp
<quote>
Project Columbia, expected to give NASA's supercomputing capacity a
tenfold boost, will simulate future missions, project how humans affect
weather patterns and help design exploration vehicles.
</quote>
In other words:
Project Columbia will
1. Lead to more Top 500 press releases (more meaningless numbers).
2. Try to contribute to a methodology for failure-mode prediction that
might keep some future NASA Administrator from being hung out to dry.
3. Lead to more NASA web pages and color plot attributions on the
subject of global warming.
4. Do more detailed design work.
The one that matters is (2), which is a qualitatively different mission
from the other three. For (1), (3), and (4), more is better, almost by
definition. For (2), more is actually a trap. NASA _already_ has the
embarrassment of stupefyingly complicated failure analyses that have
proven, unfortunately only in retrospect, to have missed the critical
issues.
"Simulate future missions" means anticipating all the things that could
go wrong. That's a big space, and it's not filled with zillions of
identical grid points, all with the same state vector interacting
according to well-understood conservation laws. It's filled with
zillions of specialized widgets, each with its own specialized state
vector, and all those state vectors interact according to
poorly-understood ad hoc rules. I suspect that, if NASA could have put
all 10K processors under the same system image, they would have; as it is,
the meganodes at least minimize the artificial complications due to
computational (as opposed to real-world system) boundaries.
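
To make that contrast concrete, here is a toy Python sketch. Nothing in
it is real NASA or SGI code; every class name and "rule" below is
invented for illustration.

def step_grid(state, dt):
    # CFD-style problem: identical cells, one state layout, one
    # well-understood update rule applied uniformly everywhere.
    n = len(state)
    return [state[i] + dt * (state[(i - 1) % n] + state[(i + 1) % n]
                             - 2.0 * state[i])
            for i in range(n)]

class Valve:
    def __init__(self):
        self.cycles = 0
        self.stuck = False
    def interact(self, system, dt):
        # ad hoc rule: wear-out threshold guessed from test data
        self.cycles += 1
        if self.cycles > 10000:
            self.stuck = True

class FoamPanel:
    def __init__(self):
        self.attached = True
    def interact(self, system, dt):
        # the rule nobody wrote down until after the fact
        pass

def step_system(widgets, dt):
    # failure-mode problem: no uniform stencil, every widget runs its
    # own special-case logic against everything else
    for w in widgets:
        w.interact(widgets, dt)

The first function partitions trivially over as many processors as you
care to throw at it. The second is where a single system image and
shared memory actually earn their keep: the interaction graph is
irregular, and it doesn't split along clean computational boundaries.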
http://www.computerworld.com/hardwaretopics/hardware/story/0,10801,94862,00.html
<quote>
The reliance on supercomputers built on clusters, commodity products
that operate in parallel, has been criticized before congressional
panels by some corporate IT executives, including officials at Ford
Motor Co. They see the use of clusters as a setback for supercomputing
because the hardware doesn't push technological advances (see story).
[http://www.computerworld.com/hardwaretopics/hardware/story/0,10801,94607,00.html]
SGI officials said that criticism doesn't apply to their system, which
uses a single image of the Linux operating system in each of its large
nodes and has shared memory.
David Parry, senior vice president of SGI's server platform group, said
the system is "intended to solve very large complex problems that
require the sharing of capabilities of all of those processors on a
single problem at the same time."
</quote>
The criticism doesn't apply to their system? Say what? A single system
image sounds nice for a gigantic blob of a problem with so many unique
pieces that just getting the crudest screwups out of the simulation
(never mind how well it corresponds to reality) looks like a major
challenge. Make the problem big enough, though, and you still have to
cluster; a shuttle mission is about as big as it gets in terms of what
people are actually going to try.
RM