If it's remotely accurate, then yes. Unfortunately they really didn't
provide any context for this. Is this in comparison to the previous
IBM chipset? And is this just straight latency to memory for a single
chip on a single access or some sort of average? If it's just
straight latency, then the original 265ns number was pretty weak to
begin with: Intel's latest desktop chipsets are down under 100ns, and
their servers should be somewhere around 130-150ns (though I haven't
seen many tests of the latter).
Since the quote didn't provide nearly enough information to interpret
the latency claims as absolute numbers, I was careful to characterize
it as a measure of relative latency. I'm assuming that IBM would have
the integrity to do an apples-to-apples comparison with their own
hardware, no matter what the absolute numbers may mean. Were the
project manager in marketing, he might have been shrewd enough to say
that they shaved over a hundred nanoseconds off the chipset latency.
I'd be reluctant to say that IBM server chipsets had high latency for
server chipsets based on that sound bite. You can do as you please,
but see another comparison to a previous Summit generation below.
64-bit support should offer about a 10% improvement all on its own.
Combine that with a 20% increase in clock speed and a 66% faster
system bus... Also, if I understand the whole "dual bus" idea
properly (i.e., two buses with two CPUs connected to each in a 4P
system vs. four processors sharing a single bus in a traditional Xeon
system), I think this could make up for a lot of the difference as
well. This is
exactly how Intel's new E8500 chipset's "dual bus" design operates as
well.
I'd be surprised to learn that server applications are driving
frontside bus bandwidth requirements. One of the reasons you can get
away with hanging so much hardware off a frontside bus in server
applications is that server CPUs spend so much of their time stalled
for memory--a latency, not a bandwidth, problem. Predictable,
computationally-intensive calculations are typically the most
demanding of bandwidth.
A nice chipset to be sure, but I think people are singing the praises
a bit too much and too soon. My guess is that it ends up no more than
5% faster than Intel's new E8500 chipset. Sure,
it's a hell of a lot faster than the previous generation of
chipset/processor combination, but it's how this compares to current
chipset/processor combos that matters. IBM just happened to be first
out the door with benchmarks this time around, but I expect others to
follow suit soon enough.
IBM may have taken a lesson from HP:
http://www.lostcircuits.com/tradeshow/idf_2002/4.shtml
<quote>
The server market with its higher longevity of equipment was hurt even
worse than the desktop market, in addition, the platforms available
for the IPF were designed for future scalability and expandability and
somewhat missed the current economic requirements. Examples are the
i870 and the IBM EXA (Summit) chipsets geared towards the very
high-end and comparable with 80,000 lbs trucks. To drop a bomb into
this scenario, Hewlett Packard showcased their zx1, comparable with a
high performance street bike to outrun the competition before they
even know what hit them.
The concept is fairly simple. Take the IPF 64 bit architecture, pare
it free of all excessive fat and provide a platform suitable for both
IA64 as well as for the IPF-compatible PA-RISC processor line.
Features trimmed off comprise the 32 MB L4 cache (IBM EXA), Memory
Mirroring to ensure hot-swapping of DIMMs and x-way scalability. The
result is an up to 4-way scalable platform with enhanced ECC or rather
memory protection to allow Chip Kill. Heart of the chipset is the zx1
Memory & I/O controller featuring eight I/O links to PCI and PCI-X as
well as AGP-4X (to be upgraded to AGP-8X). On the other side, the zx1
controller offers links to no less than 12 memory expander chips
capable of handling up to 64 DIMMs for 128 GB of system memory. Memory
bandwidth scales from 8.5 GB/s in direct-attached designs (without the
optional expanders) to 12.8 GB/s using the expander chips that further
act like registers to decrease the signal load on the memory bus.
This is, however, not the key advantage of the zx1. Because of the
high complexity and scalability, the i870 and EXA chipsets are
relatively slow. That is, in addition to the 32 ns latency intrinsic
to McKinley for each memory access, the arbitration within the complex
maze of superscalable interconnections cause another roughly 270 ns
latency until the requested data get back to the processor, so we are
talking about a total of 300 ns access time for a memory request. The
zx1 on the other hand manages to do the quarter mile in 11.2 seconds,
er, make that 112 ns for the memory access latency which is almost 3
times as fast (in direct-attached configurations). Adding the expander
chips costs another 25 ns but compensates with higher bandwidth and
the zx1 is still about twice as fast as the competition.
</quote>
That may also help to put the stated latencies into some perspective
(previous generation Summit compared to zx1 in almost the same way).
Notice the disappearance of the L4 cache (and X3 does away with L3, as
well). A three-year program from IBM? The timing is just about
right.
Intel can design a chip that will come close in performance? I'm sure
they can. Will they? Intel's track record on chipsets has been
spotty (to be charitable, at that).
The only real problem left in computation is getting the data where
you want it when you need it. The parts that do the computing are
almost afterthoughts compared to the machinery dedicated to getting
instructions and data to arrive on time and coping with what happens
when they don't. It's about time the memory subsystem got more
attention, and I hope this isn't the end of it.
.... Of course, you could rid yourself of most of these problems
entirely by changing the whole computing paradigm, but that's for
another thread.
RM