Dempsey vs Opty 280 vs Paxville

  • Thread starter: Rob Stow
Not so much dumb as HUGELY dated. The benchmark was first written in
the late 1970s and hasn't really changed in the past 20 years. I
doubt that it's used much in the real world of HPC; there are
better algorithms out there today. Here's a quick overview of it:

http://www.top500.org/lists/linpack.php

Yeah I'd seen that before... and it's not Linpack that's dumb in itself,
rather its use for evaluating CPU/system competence. The trouble
seems to be that it was originally intended for relatively small matrices:
100x100 and 1000x1000 are mentioned, which are both fractions of the sizes
being applied here.
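
For reference, the kernel Linpack actually times is essentially dense
Gaussian elimination - LU factorization with partial pivoting. A rough,
unoptimised sketch in C (the row-major layout and the names here are my
own, purely for illustration):

/* Factor the n x n matrix a[] in place into L and U, recording row swaps
 * in piv[].  The trailing (n-k)^2 update in the inner loops is where
 * nearly all the flops - and the memory traffic - go. */
#include <math.h>

void lu_factor(double *a, int *piv, int n)
{
    for (int k = 0; k < n; k++) {
        int p = k;                           /* partial pivoting: largest */
        for (int i = k + 1; i < n; i++)      /* |a[i][k]| on/below diag   */
            if (fabs(a[i*n + k]) > fabs(a[p*n + k]))
                p = i;
        piv[k] = p;
        if (p != k)                          /* swap rows k and p         */
            for (int j = 0; j < n; j++) {
                double t = a[k*n + j];
                a[k*n + j] = a[p*n + j];
                a[p*n + j] = t;
            }
        for (int i = k + 1; i < n; i++) {    /* eliminate below diagonal  */
            double m = a[i*n + k] / a[k*n + k];
            a[i*n + k] = m;
            for (int j = k + 1; j < n; j++)
                a[i*n + j] -= m * a[k*n + j];
        }
    }
}

At 100x100 the whole matrix (~80 KB of doubles) sits comfortably in L2; at
the sizes being run here it obviously doesn't.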
As you can see above, Linpack isn't really designed to be the fastest
way to solve the problem, but rather a standard way of comparing MANY
different computer architectures.

But it's only one - reminds me of academics who would propose turning off
the cache(s) to evaluate the "efficiency" of their algorithm... because the
cache was "interfering" with that measure.
That standard was also largely
chosen a LONG time ago. It has its uses and can provide a reasonable
guess as to how good a system will be at solving matrices, but it's
definitely not going to give you an exact indication of how your
system will perform on real-world code, even if that code is linear
algebra.

My own interest is in sparse matrices, so it's not a good "guess" for me. :-)
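
To make the contrast concrete - my own sketch, nothing to do with the
article - a sparse matrix-vector multiply in compressed-sparse-row form
looks roughly like this, and the x[col_idx[j]] gathers are exactly the
irregular accesses that dense Linpack never exercises:

/* y = A*x with A in CSR form: row_ptr[i]..row_ptr[i+1] index the nonzeros
 * of row i; col_idx[] and val[] hold their column numbers and values. */
void csr_spmv(int n, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i+1]; j++)
            sum += val[j] * x[col_idx[j]];   /* indirect, cache-unfriendly */
        y[i] = sum;
    }
}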
Not only does it not stress memory much, it also doesn't stress
internode communication much either, and yet it is used to determine
what the "fastest supercomputer" in the world is.

If it "doesn't touch the cache much" and you're moving huge amounts of data
in and out between memory & registers, it *has* to be stressing memory;
even where the access patterns can't always be arranged to benefit from
long sequential, contiguous address bursts... you may not be wringing the
maximum bandwidth from the memory channel but you're still stressing memory
with page switching from pseudo-random accesses.
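
To put that in concrete terms (a toy example of my own, nothing more):
both loops below push the same volume of data through the memory system;
the second just defeats long contiguous bursts, so it sees less of the
channel's peak bandwidth while still keeping memory plenty busy.

#include <stddef.h>

/* Contiguous walk: long sequential bursts, best case for the memory channel. */
double walk_sequential(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Same data volume via a permutation: pseudo-random addresses, lots of
 * page switching, well short of peak bandwidth, but memory is still the
 * thing being hammered. */
double walk_permuted(const double *a, const size_t *perm, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[perm[i]];
    return s;
}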
Linpack does have its uses, but it's hardly the end-all, be-all of
benchmarks.


Prefetch everything into your L1 data cache in blocks and run through
that entire block before moving on to the next chunk.
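
(A rough sketch of that blocking idea, purely for illustration - the tile
size B here is just a guess that keeps a few tiles of doubles inside a
decent-sized L1 data cache:)

#define B 32   /* 32x32 doubles = 8 KB per tile */

/* c += a*b, all n x n row-major, processed tile by tile so each tile is
 * finished while it's still resident in L1. */
void blocked_update(double *c, const double *a, const double *b, int n)
{
    for (int ii = 0; ii < n; ii += B)
        for (int kk = 0; kk < n; kk += B)
            for (int jj = 0; jj < n; jj += B)
                for (int i = ii; i < ii + B && i < n; i++)
                    for (int k = kk; k < kk + B && k < n; k++) {
                        double aik = a[i*n + k];
                        for (int j = jj; j < jj + B && j < n; j++)
                            c[i*n + j] += aik * b[k*n + j];
                    }
}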

It's not clear what Linpack allows officially in that respect, if you mean
software manipulation over and above hardware prefetch. So what you really
meant above was "doesn't touch the (*L2*) cache much". ;-) ... and it depends
on what you mean by "touch" - throw more cache at it and you increase the
size of the last part of the elimination step that's going to fly - with
4MB L2 that'd be, very approximately, the last 500-1000 rows.
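
FWIW the back-of-envelope behind that figure, assuming 8-byte doubles and
a square trailing block that just fits in 4MB of L2:

    8 * k^2 <= 4 * 1024 * 1024  =>  k <= sqrt(524288) ~= 724

i.e. somewhere around the last 700 rows and columns - same ballpark.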
Extremely suspicious given that the results they achieved themselves
on the same test (for both Paxville and Opteron, but especially
Opteron) are *SIGNIFICANTLY* lower than published results using the
same chips, OS, and compiler. When you compare the numbers that
AMD were able to achieve vs. what Intel achieved on the SPEC scores, it
tells a VERY different story, with the AMD chips coming out on top.

One does wonder what "AMDs Opteron processors with SSE3 co-operate with the
Intel compiled Linpack version likewise problem-free" means?... err,
"problem free"??... and would AMD agree with such a statement???:-) BTW I
don't see a mention of the specific compiler used in the Google translation
other than that hint.
 