Rob said:
I'm well aware of that. However, that conveys an advantage
that typically lets an Opty dualie beat out a Xeon dualie that
has a 50% higher cpu clock. This particular benchmark had
a single 2.2 GHz Opty beating - by a huge margin - a 3.2 GHz
Xeon dualie. I suspect the result was reported incorrectly -
it probably should have been a dualie vs dualie result.
Why not? A dual Xeon has a single shared bus to the chipset. If you run two
memory-latency-bound programs, one on each Xeon, they'll both go through the
same bottleneck; typically, you expect the same total throughput as with a
single Xeon running just one program. Now, on the Opty (nice nick ;-), you
have half the latency and no shared bus, so a single Opty should get
double the performance, and a dual Opty should get four times (almost;
the cache coherency round trip costs a bit).
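
To put rough numbers on that argument, here is a toy model in Python. It is only a sketch: the function name and parameters are made up for illustration, the "half the latency" figure and the fully serialized shared bus are the assumptions from the paragraph above, not measurements, and the cache coherency overhead is ignored.

```python
def relative_throughput(cpus, latency, shared_bus):
    """Toy model of total throughput for memory-latency-bound jobs.

    One job runs per CPU. `latency` is relative to the Xeon (Xeon = 1.0).
    Assumption: on a shared bus, memory accesses from all CPUs are fully
    serialized, so extra CPUs add nothing for latency-bound work.
    """
    per_cpu = 1.0 / latency
    return per_cpu if shared_bus else cpus * per_cpu


if __name__ == "__main__":
    print("single Xeon:", relative_throughput(1, 1.0, shared_bus=True))
    print("dual Xeon:  ", relative_throughput(2, 1.0, shared_bus=True))
    print("single Opty:", relative_throughput(1, 0.5, shared_bus=False))
    print("dual Opty:  ", relative_throughput(2, 0.5, shared_bus=False))
```

Under these assumptions a single Opty already comes out at twice a dual Xeon, which is why a single-vs-dual result is not implausible for this kind of workload.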
We got an Athlon 64 recently and tried some benchmarks. With my own
CPU-intensive microbenchmarks, the Athlon 64 is clock for clock as fast as
the old Athlon; nothing gained. However, with our applications (EDA CAD, e.g.
synthesis), there's a factor of two difference. The most stunning experience,
however, is KDE 3.1. It's really fast on the Athlon 64; you barely notice
program startup time (it definitely feels faster than KDE 3.2 on an Athlon
XP, though the KDE people did tune a lot there). Starting a KDE program is
definitely a memory-intensive, latency-bound job (linking lots of shared
C++ libraries together). Also, starting up the Cadence design framework
(exactly the same kind of workload) was a lot faster than anywhere else.
I think the latency problem is a real one. You won't see it in SPEC, since
very few SPEC programs are memory-latency bound (and if they are,
people will hack the compiler to remove that). Real bloatware (and we have
to use real bloatware every day, unfortunately), however, is.
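
For anyone who wants to see what "latency bound" means in isolation, the usual trick is a pointer chase: every load depends on the previous one, so the CPU can't overlap or prefetch them. Below is a minimal sketch in Python; all names are mine, not from any benchmark suite. Interpreter overhead hides a good part of the effect, so treat it as an illustration of the access pattern rather than a measurement tool; a serious version would be written in C.

```python
import random
import time


def single_cycle_permutation(n, seed=0):
    """Random permutation forming one cycle of length n (Sattolo's algorithm)."""
    rng = random.Random(seed)
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, which guarantees a single cycle
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def chase(next_index, steps):
    """Follow the chain from slot 0; each step depends on the previous load."""
    i = 0
    for _ in range(steps):
        i = next_index[i]
    return i


def time_chase(next_index, steps):
    """Return elapsed seconds for `steps` dependent loads."""
    start = time.perf_counter()
    chase(next_index, steps)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1 << 20  # arbitrary size; should exceed the caches to show anything
    sequential = [(i + 1) % n for i in range(n)]  # prefetch-friendly order
    scattered = single_cycle_permutation(n)       # cache-hostile order
    print("sequential:", time_chase(sequential, n))
    print("scattered: ", time_chase(scattered, n))
```

Both runs do exactly the same number of loads; only the order in memory differs. Any gap between the two timings is memory latency, which is the part a faster clock doesn't help with.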