Del Cecchi wrote:
If I recall Red Storm correctly, it was a hypercube, so it had the same
problem as Blue Gene.

Robert Myers said:
Yes, if you want to do FFTs, or, indeed, any kind of non-local
differencing.
That's the problem with the architecture and why I howled so loudly
when it came out. Naturally, I was ridiculed by people whose entire
knowledge of computer architecture is nearest-neighbor clusters.
Someone in New Mexico (LANL or Sandia, I don't want to dredge up the
presentation again) understands the numbers as well as I do. The
bisection bandwidth is a problem for a place like NCAR, which uses
pseudospectral techniques, as do most global atmospheric simulations.
The projected efficiency of Red Storm for FFTs was 25%. The
efficiency of Japan's Earth Simulator is at least several times that
for FFTs. No big deal; it was designed for geophysical simulations.
Blue Gene at Livermore was bought to produce the plots the Lab needed
to justify its own existence (and not to do science). As you have
correctly inferred, the more processors you hang off the
nearest-neighbor network, the worse the situation becomes.
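To put numbers on that, here is a back-of-the-envelope sketch. Every
figure in it is invented for illustration (2^24 words per node, links
moving 10^9 words per second, a 3-D mesh whose bisection grows as
P^(2/3)); the point is the scaling, not the absolute times. It compares
the all-to-all transpose at the heart of a distributed FFT with the face
exchange that nearest-neighbor differencing needs:

def transpose_time(P, words_per_node, link_bw_words):
    """All-to-all transpose: roughly half of all the data in the
    machine must cross the bisection, and a 3-D mesh of P nodes
    has a bisection of only ~P**(2/3) links."""
    total_words = P * words_per_node
    bisection_links = P ** (2.0 / 3.0)
    return (total_words / 2) / (bisection_links * link_bw_words)

def halo_time(P, words_per_node, link_bw_words):
    """Nearest-neighbor differencing: each node swaps only its six
    faces with its neighbors, independent of P."""
    face_words = words_per_node ** (2.0 / 3.0)
    return 6 * face_words / link_bw_words

for P in (64, 4096, 262144):
    t_fft = transpose_time(P, 2**24, 1e9)
    t_halo = halo_time(P, 2**24, 1e9)
    print(f"P={P:7d}  transpose {t_fft:8.3f} s   halo {t_halo:8.5f} s")

The halo time stays flat no matter how many nodes you add; the
transpose time grows as the cube root of P, because the data crossing
the machine grows linearly in P while the bisection grows only as
P^(2/3).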
Unless you increase the aggregate bandwidth, you reach a point of
diminishing returns. The special nature of Linpack has allowed
unimaginative bureaucrats to make a career out of buying and touting
very limited machines that are the very opposite of scalable.
"Scalability" does not mean more processors or real estate. It means
the ability to use the millionth processor as effectively as you use
the 65th. Genuine scalability is hard, which is why no one is really
bothering with it.
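Run the same invented numbers as a weak-scaling efficiency (add a
nominal 10^11 flop/s per node and the standard ~5*N*log2(N) operation
count for an N-point FFT) and the diminishing returns fall right out:

import math

def fft_efficiency(P, n_per_node=2**24, node_flops=1e11, link_bw_words=1e9):
    n_total = P * n_per_node
    # per-node share of the ~5*N*log2(N) flops of an N-point FFT
    t_comp = 5 * n_per_node * math.log2(n_total) / node_flops
    # transpose time limited by the P**(2/3) bisection, as above
    t_comm = (n_total / 2) / (P ** (2.0 / 3.0) * link_bw_words)
    return t_comp / (t_comp + t_comm)

for P in (64, 1024, 16384, 262144):
    print(f"P={P:7d}  FFT efficiency ~{fft_efficiency(P):5.1%}")

Compute time per node barely moves; communication time keeps growing,
so efficiency decays without limit unless the aggregate bisection
bandwidth grows in step with the node count.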
The problems aren't as special as you think. In fact, the glaring
problem that I've pointed out with machines that rely on local
differencing isn't agenda- or marketing-driven; it's an unavoidable
mathematical fact. As things stand now, we will have ever more
transistors chuffing away at generating ever less reliable results.
The problem is this: if you use a sufficiently low-order differencing
scheme, you can do most of the problems of mathematical physics on a
box like Blue Gene. Low order schemes are easy to code, undemanding
with regard to non-local bandwidth, and usually much more stable than
very high-order schemes. If you want to figure out how to place an
air-conditioner, they're just fine. If you're trying to do physics,
the plots you produce will be plausible and beautiful, but very often
wrong.
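You don't have to take my word for the accuracy gap; it's a ten-line
experiment. This one (numpy, periodic grid, f(x) = sin(3x), nothing
tuned to favor either side) pits a second-order centered difference
against a spectral derivative:

import numpy as np

def max_error(n):
    x = np.linspace(0, 2 * np.pi, n, endpoint=False)
    f, exact = np.sin(3 * x), 3 * np.cos(3 * x)
    h = x[1] - x[0]
    fd = (np.roll(f, -1) - np.roll(f, 1)) / (2 * h)   # 2nd-order centered
    k = np.fft.fftfreq(n, d=h) * 2 * np.pi            # angular wavenumbers
    spec = np.fft.ifft(1j * k * np.fft.fft(f)).real   # spectral derivative
    return np.abs(fd - exact).max(), np.abs(spec - exact).max()

for n in (16, 32, 64, 128):
    e_fd, e_sp = max_error(n)
    print(f"n={n:4d}  centered diff {e_fd:.2e}   spectral {e_sp:.2e}")

The centered difference converges at O(h^2); the spectral derivative is
exact to rounding error the moment the mode is resolved. That gap is
what the resolution argument in the next paragraph is about.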
There is an out that, in fairness, I should mention. If you have
processors to burn, you can always overresolve the problem to the point
where the renormalization problem I've mentioned, while still there,
becomes unimportant. Early results by the biggest ego in the field at
the time suggested that it takes about ten times the resolution to do
fluid mechanics with local differencing as accurately as you can do it
with a pseudospectral scheme. In 3-D, that's a thousand times more
processors. For a fair comparison, the number of processors in the
Livermore box would be divided by 1000 to get performance equivalent
to that of a box that could do a decent FFT.
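Spelled out, with the per-axis figure left as the assumption it is:

per_axis = 10                    # the "about ten times" figure above
dims = 3
print(per_axis ** dims)          # 1000: the divisor on the processor count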
I should be posting this to comp.arch so people there can switch from
being experts on computer architecture to being experts on numerical
analysis and mathematical physics.
Robert.