Del Cecchi wrote:
If I recall Red Storm correctly, it was a hypercube, so it had the same
problem as Blue Gene.

Robert Myers said:
Yes, if you want to do FFTs, or, indeed, any kind of non-local
differencing.
That's the problem with the architecture and why I howled so loudly
when it came out. Naturally, I was ridiculed by people whose entire
knowledge of computer architecture is nearest-neighbor clusters.
Someone in New Mexico (LANL or Sandia, I don't want to dredge up the
presentation again) understands the numbers as well as I do. The
bisection bandwidth is a problem for a place like NCAR, which uses
pseudospectral techniques, as do most global atmospheric simulations.
The projected efficiency of Red Storm for FFTs was 25%. The
efficiency of Japan's Earth Simulator is at least several times that
for FFTs. No big deal; it was designed for geophysical simulations.
Blue Gene at Livermore was bought to produce the plots the Lab needed
to justify its own existence (and not to do science). As you have
correctly inferred, the more processors you hang off the
nearest-neighbor network, the worse the situation becomes.
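To put numbers on that, here is a back-of-the-envelope sketch. Every
figure in it is invented for illustration (2^24 words per node, links
moving 10^9 words per second, a 3-D mesh whose bisection grows as
P^(2/3)); the point is the scaling, not the absolute times. It compares
the all-to-all transpose at the heart of a distributed FFT with the face
exchange that nearest-neighbor differencing needs:

def transpose_time(P, words_per_node, link_bw_words):
    """All-to-all transpose: roughly half of all the data in the
    machine must cross the bisection, and a 3-D mesh of P nodes
    has a bisection of only ~P**(2/3) links."""
    total_words = P * words_per_node
    bisection_links = P ** (2.0 / 3.0)
    return (total_words / 2) / (bisection_links * link_bw_words)

def halo_time(P, words_per_node, link_bw_words):
    """Nearest-neighbor differencing: each node swaps only its six
    faces with its neighbors, independent of P."""
    face_words = words_per_node ** (2.0 / 3.0)
    return 6 * face_words / link_bw_words

for P in (64, 4096, 262144):
    t_fft = transpose_time(P, 2**24, 1e9)
    t_halo = halo_time(P, 2**24, 1e9)
    print(f"P={P:7d}  transpose {t_fft:8.3f} s   halo {t_halo:8.5f} s")

The halo time stays flat no matter how many nodes you add; the
transpose time grows as the cube root of P, because the data crossing
the machine grows linearly in P while the bisection grows only as
P^(2/3).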
Unless you increase the aggregate bandwidth, you reach a point of
diminishing returns. The special nature of Linpack has allowed
unimaginative bureaucrats to make a career out of buying and touting
very limited machines that are the very opposite of scalable.
"Scalability" does not mean more processors or real estate. It means
the ability to use the millionth processor as effectively as you use
the 65th. Genuine scalability is hard, which is why no one is really
bothering with it.
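Run the same invented numbers as a weak-scaling efficiency (add a
nominal 10^11 flop/s per node and the standard ~5*N*log2(N) operation
count for an N-point FFT) and the diminishing returns fall right out:

import math

def fft_efficiency(P, n_per_node=2**24, node_flops=1e11, link_bw_words=1e9):
    n_total = P * n_per_node
    # per-node share of the ~5*N*log2(N) flops of an N-point FFT
    t_comp = 5 * n_per_node * math.log2(n_total) / node_flops
    # transpose time limited by the P**(2/3) bisection, as above
    t_comm = (n_total / 2) / (P ** (2.0 / 3.0) * link_bw_words)
    return t_comp / (t_comp + t_comm)

for P in (64, 1024, 16384, 262144):
    print(f"P={P:7d}  FFT efficiency ~{fft_efficiency(P):5.1%}")

Compute time per node barely moves; communication time keeps growing,
so efficiency decays without limit unless the aggregate bisection
bandwidth grows in step with the node count.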
The problems aren't as special as you think. In fact, the glaring
problem that I've pointed out with machines that rely on local
differencing isn't agenda- or marketing-driven; it's an unavoidable
mathematical fact. As things stand now, we will have ever more
transistors chuffing away at generating ever less reliable results.
The problem is this: if you use a sufficiently low-order differencing
scheme, you can do most of the problems of mathematical physics on a
box like Blue Gene. Low order schemes are easy to code, undemanding
with regard to non-local bandwidth, and usually much more stable than
very high-order schemes. If you want to figure out how to place an
air-conditioner, they're just fine. If you're trying to do physics,
the plots you produce will be plausible and beautiful, but very often
wrong.
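You don't have to take my word for the accuracy gap; it's a ten-line
experiment. This one (numpy, periodic grid, f(x) = sin(3x), nothing
tuned to favor either side) pits a second-order centered difference
against a spectral derivative:

import numpy as np

def max_error(n):
    x = np.linspace(0, 2 * np.pi, n, endpoint=False)
    f, exact = np.sin(3 * x), 3 * np.cos(3 * x)
    h = x[1] - x[0]
    fd = (np.roll(f, -1) - np.roll(f, 1)) / (2 * h)   # 2nd-order centered
    k = np.fft.fftfreq(n, d=h) * 2 * np.pi            # angular wavenumbers
    spec = np.fft.ifft(1j * k * np.fft.fft(f)).real   # spectral derivative
    return np.abs(fd - exact).max(), np.abs(spec - exact).max()

for n in (16, 32, 64, 128):
    e_fd, e_sp = max_error(n)
    print(f"n={n:4d}  centered diff {e_fd:.2e}   spectral {e_sp:.2e}")

The centered difference converges at O(h^2); the spectral derivative is
exact to rounding error the moment the mode is resolved. That gap is
what the resolution argument in the next paragraph is about.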
There is an out that, in fairness, I should mention. If you have
processors to burn, you can always overresolve the problem to the point
where the renormalization problem I've mentioned, while still there,
becomes unimportant. Early results by the biggest ego in the field at
the time suggested that it takes about ten times the resolution to do
fluid mechanics with local differencing as accurately as you can do it
with a pseudospectral scheme. In 3-D, that's a thousand times more
processors. For a fair comparison, the number of processors in the
Livermore box would be divided by 1000 to get performance equivalent
to that of a box that could do a decent FFT.
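Spelled out, with the per-axis figure left as the assumption it is:

per_axis = 10                    # the "about ten times" figure above
dims = 3
print(per_axis ** dims)          # 1000: the divisor on the processor count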
I should be posting this to comp.arch so people there can switch from
being experts on computer architecture to being experts on numerical
analysis and mathematical physics.
Robert.