http://arstechnica.com/articles/paedia/cpu/cell-2.ars
Introducing the IBM/Sony/Toshiba Cell Processor -- Part II: The Cell
Architecture
By Jon "Hannibal" Stokes
In today's session, IBM introduced the overall architecture of the Cell
processor. Unfortunately, they didn't include many more microarchitectural
details in today's session than they did in yesterday's. Most of the session
covered issues like power management, clocking, the design process, and so
on. So today's article is going to be more along the lines of a follow-up to
yesterday's piece. I'll fill in the new information that I've picked up, and
I'll clarify some leftover questions from yesterday.
The Cell's basic architecture
The basic architecture of the Cell is described by IBM as a "system on a
chip" (SoC) design. This is a perfectly good characterization, but I'd take
it even further and call Cell a "network on a chip." As I described
yesterday, the Cell's eight SPUs are essentially full-blown vector
"computers," insofar as they are fairly simple CPUs with their own local
storage.
These small vector computers are connected to each other and to the 512KB L2
cache via an element interconnect bus (EIB) that consists of four sixteen-byte
data rings with 64-bit tags. This bus can transfer 96 bytes/cycle, and can
handle over 100 outstanding requests.
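To put the 96 bytes/cycle figure in perspective, here is a back-of-the-envelope sketch that converts it into bytes per second. The bus clock was not part of the figures above, so bus_clock_hz is a hypothetical parameter, and the 2 GHz value in the usage lines is purely illustrative, not a reported spec.

```python
# Back-of-the-envelope conversion from bytes/cycle to bytes/second.
# Only the three constants below come from the article; the clock rate
# passed in at the bottom is an ASSUMPTION chosen for illustration.

EIB_BYTES_PER_CYCLE = 96      # peak transfer rate quoted for the EIB
EIB_RING_WIDTH_BYTES = 16     # each of the four data rings is sixteen bytes wide
EIB_RING_COUNT = 4


def peak_bandwidth_bytes_per_sec(bytes_per_cycle: int, bus_clock_hz: float) -> float:
    """Peak bandwidth is simply bytes moved per cycle times cycles per second."""
    return bytes_per_cycle * bus_clock_hz


def ring_widths_per_cycle(bytes_per_cycle: int, ring_width_bytes: int) -> int:
    """How many ring-width (16-byte) chunks the quoted peak rate amounts to."""
    return bytes_per_cycle // ring_width_bytes


def format_gb_per_sec(bytes_per_sec: float) -> str:
    """Render a bytes/second figure in decimal gigabytes per second."""
    return f"{bytes_per_sec / 1e9:.1f} GB/s"


if __name__ == "__main__":
    assumed_bus_clock_hz = 2.0e9  # hypothetical clock, NOT from the session
    peak = peak_bandwidth_bytes_per_sec(EIB_BYTES_PER_CYCLE, assumed_bus_clock_hz)
    print("16-byte chunks per cycle:",
          ring_widths_per_cycle(EIB_BYTES_PER_CYCLE, EIB_RING_WIDTH_BYTES))
    print("Peak at the assumed clock:", format_gb_per_sec(peak))
```

Note that 96 bytes/cycle works out to six 16-byte chunks per cycle, which is more than the four rings would move if each carried only one transfer at a time. The sketch just does the arithmetic; it doesn't try to model how the rings are scheduled.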
The individual SPEs can use this bus to communicate with each other, and
this includes the transfer of data between SPEs acting as peers on the
network. The SPEs also communicate with the L2 cache, with main memory (via
the MIC), and with the rest of the system (via the BIC). The onboard memory
interface controller (MIC) supports the new Rambus XDR memory standard, and
the BIC (which I think stands for "bus interface controller" but I'm not
100% sure) has a coherent interface for SMP and a non-coherent interface for
I/O.
Unfortunately, today's session was severely lacking in information on the
64-bit PPC core that handles the Cell's general-purpose computing chores. We
do know that this core has a VMX/Altivec unit, at least one FPU, and
supports simultaneous multithreading (SMT). It's also in-order issue, like
the SPUs. So it appears that this core also lacks an instruction window,
presumably for the same reasons that the SPUs do (i.e. to save on die space
and cut down on control logic). I have in my notes that the core is
two-issue, like the SPUs, but I can't find this corroborated anywhere else.
So it's possible that the core only issues two instructions per cycle peak,
i.e. one from each currently-running thread. I'd imagine that if this is the
case, this core's pipeline is very short. This would fit with the SPUs, in
which the pipeline was also kept short and simple.
The entire Cell is produced on a 90nm SOI process with 8 layers of copper
interconnect. The Cell sports 234 million transistors, and its die size is
221 mm². (This is roughly the size of the Emotion Engine at its
introduction.) The PPC core's 32KB L1 cache is connected to the system L2
cache via a bus that can transfer 32 bytes/cycle between the two caches.
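The die and cache figures above lend themselves to some quick arithmetic. The sketch below uses only the numbers quoted in this article; the function names are my own, and the L1 calculation assumes that "32KB" means 32 x 1024 bytes and ignores latency and contention entirely, so treat it as a lower bound and not a measurement.

```python
import math

# Figures quoted in the article.
TRANSISTOR_COUNT = 234_000_000
DIE_AREA_MM2 = 221.0
L1_SIZE_BYTES = 32 * 1024        # ASSUMPTION: "32KB" read as 32 * 1024 bytes
L1_L2_BYTES_PER_CYCLE = 32       # L1 <-> L2 bus width per cycle


def density_per_mm2(transistors: int, area_mm2: float) -> float:
    """Average transistor density over the whole die."""
    return transistors / area_mm2


def equivalent_square_side_mm(area_mm2: float) -> float:
    """Side length of a square die with the same area (real dies need not be square)."""
    return math.sqrt(area_mm2)


def min_cycles_to_transfer(num_bytes: int, bytes_per_cycle: int) -> int:
    """Lower bound on the cycles needed to move num_bytes at the peak rate."""
    return math.ceil(num_bytes / bytes_per_cycle)


if __name__ == "__main__":
    density = density_per_mm2(TRANSISTOR_COUNT, DIE_AREA_MM2)
    print(f"Density: {density / 1e6:.2f} million transistors per mm^2")
    print(f"Equivalent square die: {equivalent_square_side_mm(DIE_AREA_MM2):.1f} mm per side")
    print("Cycles to stream an L1's worth of data:",
          min_cycles_to_transfer(L1_SIZE_BYTES, L1_L2_BYTES_PER_CYCLE))
```

In other words, the chip averages a little over a million transistors per square millimeter, and at 32 bytes/cycle the bus could in principle stream the L1's entire contents in about a thousand cycles.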
The Cell and Apple
Finally, before signing off, I should clarify my earlier remarks to the
effect that I don't think that Apple will use this CPU. I originally based
this assessment on the fact that I knew that the SPUs would not use
VMX/Altivec. However, the PPC core does have a VMX unit. Nonetheless, I
expect this VMX to be very simple, and roughly comparable to the Altivec
unit of the first G4. Everything on this processor is stripped down to the
bare minimum, so don't expect a ton of VMX performance out of it, and
definitely not anything comparable to the G5. Furthermore, any Altivec code
written for the new G4 or G5 would have to be completely reoptimized due to
the in-order nature of the PPC core's issue.
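To see why in-order issue forces that kind of reoptimization, consider a toy model of a two-issue, in-order machine. This is only a sketch under stated assumptions: the Instr type, the four-cycle latency, and the register names are all hypothetical, and nothing here reflects the real pipeline of the Cell's PPC core. The point is just that, without an instruction window, a stalled instruction blocks everything behind it, so the order in which the code is written determines how long it takes.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str                # register written by this instruction
    srcs: Tuple[str, ...]    # registers it reads
    latency: int             # cycles until dest is readable (hypothetical values)


def cycles_in_order(program: List[Instr], issue_width: int = 2) -> int:
    """Cycle at which the last result is ready on a toy in-order machine.

    Each cycle, up to issue_width instructions issue strictly in program
    order. If the next instruction's sources are not ready, issue stops for
    that cycle: nothing behind it may slip past.
    """
    ready_at: Dict[str, int] = {}   # register -> first cycle its value is readable
    cycle = 0
    pc = 0
    finish = 0
    while pc < len(program):
        issued = 0
        while pc < len(program) and issued < issue_width:
            ins = program[pc]
            if any(ready_at.get(src, 0) > cycle for src in ins.srcs):
                break               # head is stalled, so the whole machine waits
            ready_at[ins.dest] = cycle + ins.latency
            finish = max(finish, cycle + ins.latency)
            pc += 1
            issued += 1
        cycle += 1
    return finish


LAT = 4  # ASSUMED latency for every operation, purely for illustration

# Two independent three-step dependency chains, A and B.
chain_a = [Instr("a0", (), LAT), Instr("a1", ("a0",), LAT), Instr("a2", ("a1",), LAT)]
chain_b = [Instr("b0", (), LAT), Instr("b1", ("b0",), LAT), Instr("b2", ("b1",), LAT)]

# Same work, two orderings: chains back to back vs. interleaved by hand.
back_to_back = chain_a + chain_b
interleaved = [ins for pair in zip(chain_a, chain_b) for ins in pair]

if __name__ == "__main__":
    print("back to back:", cycles_in_order(back_to_back), "cycles")
    print("interleaved: ", cycles_in_order(interleaved), "cycles")
```

In this toy model the back-to-back ordering finishes at cycle 20 and the hand-interleaved one at cycle 12. A core with an instruction window could find the independent work in chain B by itself; on an in-order core, the programmer or compiler has to do that scheduling up front.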
So the short answer is, Apple's use of this chip is within the realm of
conceivability, but it's extremely unlikely in the short and medium term.
Apple is just too heavily invested in Altivec, and this processor is going
to be a relative weakling in that department. Sure, it'll pack a major SIMD
punch, but that will not be a double-precision Altivec-type punch.