Fast, Faster and IBM's PlayStation 3 Processor

R420 · Jun 19, 2004

http://www.linuxinsider.com/story/34548.html

Fast, Faster and IBM's PlayStation 3 Processor

By Paul Murphy
LinuxInsider
06/17/04 6:38 AM PT

Three years ago, IBM (NYSE: IBM), Sony (NYSE: SNE) and Toshiba
announced a partnership aimed at developing a new processor for use in
digital entertainment devices like the PlayStation. Since then, the
product has seen a billion dollars in development work. Two fabs, one
in Tokyo and one in East Fishkill, New York, have been custom-built to
make the new processor in large volumes. On May 12th, IBM announced
that the first commercial workstations based on this processor would
become available to game-industry developers late this year.

A lot is known about this processor as planned, but relatively little
real information about the product as built has yet leaked. To the
extent that performance information has become available, it is
characterized by numbers so high that most people simply dismissed the
reports. In November of last year, for example, a senior Sony
executive told an internal audience that implementations would scale
from uniprocessors to 64-way groupings that would deliver in excess of
two teraflops -- making it more than 10 times faster than Xeon.

Most of what we know about this machine comes from U.S. patent
#6,526,491 as issued to Sony in February 2003 for a "memory protection
system and method for computer architecture for broadband networks."

Here's the abstract:

A computer architecture and programming model for high speed
processing over broadband networks are provided. The architecture
employs a consistent modular structure, a common computing module and
uniform software cells. The common computing module includes a control
processor, a plurality of processing units, a plurality of local
memories from which the processing units process programs, a direct
memory access controller and a shared main memory.
A synchronized system and method for the coordinated reading and
writing of data to and from the shared main memory by the processing
units also are provided. A hardware sandbox structure is provided for
security against the corruption of data among the programs being
processed by the processing units. The uniform software cells contain
both data and applications and are structured for processing by any of
the processors of the network. Each software cell is uniquely
identified on the network. A system and method for creating a
dedicated pipeline for processing streaming data also are provided.

The machine is widely referred to as a cell processor, but the cells
involved are software, not hardware. Thus a cell is a kind of TCP
packet on steroids, containing both data and instructions and linked
back to the task of which it forms part via unique identifiers that
facilitate results assembly just as the TCP sequence number does.
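
The TCP analogy can be made concrete with a small sketch. Everything in it is hypothetical: the `Cell` fields (`task_id`, `seq`, `payload`) and the helpers `run_cell` and `assemble` are illustrative names, not anything taken from the patent. It shows only the general idea that cells carrying identifiers can be executed in any order and their results put back in order afterward.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Cell:
    task_id: str    # hypothetical: which task this cell belongs to
    seq: int        # hypothetical: position within the task, like a TCP sequence number
    payload: tuple  # the data the cell carries


def run_cell(cell):
    """Stand-in for the program a cell carries: here it just sums the payload."""
    return cell.task_id, cell.seq, sum(cell.payload)


def assemble(results, task_id):
    """Pick out one task's results and restore their order using seq."""
    mine = [(seq, value) for tid, seq, value in results if tid == task_id]
    return [value for _, value in sorted(mine)]


# Two tasks split into cells, then shuffled to mimic out-of-order completion.
cells = [Cell("task-a", i, (i, i + 1)) for i in range(4)]
cells += [Cell("task-b", i, (10 * i,)) for i in range(3)]
random.shuffle(cells)

results = [run_cell(c) for c in cells]
ordered_a = assemble(results, "task-a")
ordered_b = assemble(results, "task-b")
```

However the cells are shuffled, `ordered_a` and `ordered_b` come back in task order, which is the role the article attributes to the unique identifiers.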

Outrageous Performance Claims

The basic processor itself appears to be a PowerPC derivative with
high-speed built-in local communications, high-speed access to local
memory, and up to eight attached processing units broadly akin to the
Altivec short array processor used by Apple (Nasdaq: AAPL). The
actual product consists of one to eight of these on a chip -- a true
grid-on-a-chip approach in which a four-way assembly can, when fully
populated, consist of four core CPUs, 32 attached processing units and
512 MB of local memory.

The per-cycle performance of the core CPU is undocumented but may be
expected to be comparable to other PowerPC machines running at high
cache hit rates. Specifications for the four or eight attached
processors comprising the array are known; each is expected to turn
in one floating-point operation per cycle, or around 32 Gigaflops for
a fully populated eight-unit array at a nominal 4 GHz.

That's where the apparently outrageous performance claims come from; a
four-way assembly running at a planned 4 GHz offers 32 x 4 = 128
Gigaflops in potential floating-point execution. A 64-way supergrid
made by stacking eight eight-way assemblies would have a total of 512
attached processors and could, therefore, break 2 teraflops if data
transportation kept up with the processors.
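
The peak figures quoted above follow from simple multiplication, which the sketch below reproduces. The constants come from the article (4 GHz, one floating-point operation per cycle, eight attached units per core); the function name `peak_gflops` is made up for illustration, and the result is a theoretical ceiling that ignores data movement entirely.

```python
CLOCK_HZ = 4e9        # nominal 4 GHz clock, per the article
FLOPS_PER_CYCLE = 1   # one floating-point operation per cycle per attached unit
UNITS_PER_CORE = 8    # up to eight attached processing units per core


def peak_gflops(cores):
    """Theoretical peak in Gigaflops for a given number of core CPUs."""
    units = cores * UNITS_PER_CORE
    return units * FLOPS_PER_CYCLE * CLOCK_HZ / 1e9


single = peak_gflops(1)      # one core with eight attached units
four_way = peak_gflops(4)    # the four-way assembly
supergrid = peak_gflops(64)  # eight eight-way assemblies, 512 attached units
```

Under these assumptions the three values are 32, 128 and 2048 Gigaflops, matching the article's 32 x 4 = 128 and its "break 2 teraflops" claim.
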
In practice, however, Apple has never succeeded in getting the bulk of
its developers to make effective use of the Altivec, and Sun has had
essentially no success getting people outside the military and
intelligence communities to use the four-way SIMD capabilities built
into its Sparc processors. Grid computing is slowly entering the
commercial mainstream, but combining both local-array access with grid
computing requires a significant shift in programming paradigm that
will not appeal to the mainstream Wintel and IBM customer base.

Gains Outweigh the Pain

For games developers, however, the potential gains -- up to 50 times
the best x86-based processor and graphics board combinations can
deliver -- should outweigh the pain. Even minor software changes, the
kind of thing Adobe does to take advantage of the Altivec in
Photoshop, should offer significant advantages to a wider programming
community and enable floating-point-intensive applications to run a
full order of magnitude more quickly on this machine than on Intel's
(Nasdaq: INTC) best.

An important point to bear in mind is that this processor will be
inexpensive, and systems built around it even less expensive because
no external graphics or network boards will be needed. Both Sony and
IBM have been building fabs specifically to make this device. Volumes
will be high because Sony will use up to 20 million assemblies in the
PlayStation, while 10 million or more that don't quite make the
quality cut will get used in its digital televisions and other
products.

Very little has been publicly revealed about the operating system for
this thing, but it is quite obvious what it has to be and how it has
to work. Each core will have its own local Unix kernel, with most just
executing cells as they arrive from the dispatch manager and one
managing the traffic-coordination hardware. In all likelihood, the
kernel used will prove to be both Linux-derived and Linux-compatible
-- meaning that most Linux software will run out of the box on the
uniprocessor configuration while software adapted for the grid
environment will run unchanged on everything from the uniprocessor to
configurations with hundreds or even thousands of processor
assemblies.

As users of Sun's open-source grid software have found, performance
losses on single processes increase as you add processors because data
flow and timing control issues increase in complexity nonlinearly with
system growth. Fundamentally, what happens is that the larger you make
the total machine, whether on one piece of silicon or in a rack, the
more cell transit time dominates execution time and the greater the
performance cost imposed by the need to coordinate operations.
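
A toy model can illustrate why coordination cost eventually dominates. It is not a measurement of any real system: it assumes the compute work divides evenly across n processors and that coordination costs a fixed amount per pair of processors. The one-second job and the 0.1 ms per-pair cost are invented numbers, and `run_time` and `speedup` are hypothetical helper names.

```python
COMPUTE_SECONDS = 1.0        # assumed: total work for the job on one processor
COORD_SECONDS_PER_PAIR = 1e-4  # assumed: coordination cost per processor pair


def run_time(n):
    """Modelled wall-clock time on n processors."""
    pairs = n * (n - 1) / 2  # pairwise coordination grows roughly as n squared
    return COMPUTE_SECONDS / n + COORD_SECONDS_PER_PAIR * pairs


def speedup(n):
    """Speedup relative to a single processor under the same model."""
    return run_time(1) / run_time(n)


for n in (1, 4, 16, 64, 256):
    print(f"{n:4d} processors: speedup {speedup(n):.2f}")
```

With these made-up constants the speedup peaks at a few tens of processors and then falls, dropping below 1 by 256 processors, which is the shape of the problem the paragraph above describes.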

New Generation of Linux PCs

The patent mentions the use of no-ops (processor nulls) inserted into
cells to get around timing problems associated with having components
run at different speeds -- with processor coordination initially
enforced by setting TTL-like time budgets for cell execution. My
guess, however, is that advances in cell isolation and programming for
asynchronous event handling have since obsolesced those solutions.

I expect, therefore, that when the real thing appears, it will fully
support both the traditional grid format for on-chip work and an
asynchronous hypergrid for multi-assembly processes on the model
Thinking Machines hoped to achieve with the transputer-based hypercube
in 1985 -- and that NSA is rumored to actually have built on 1989's
Sparc-SIMD-based CM-5.

Either way, however, the OS for this machine is likely to offer both
Linux compatibility at the low end and enormous scalability for those
willing to modify their software -- which is why, as I discuss in next
week's column, I expect IBM and Toshiba soon to launch a new
generation of Linux PCs built around the combination of this CPU with
IBM software products like Lotus Workspace for Linux.

J.O. Aho · Jun 19, 2004

R420 said:
The basic processor itself appears to be a PowerPC derivative with
high-speed built-in local communications, high-speed access to local

Nice to see that more and more begins to use PPC: less need of a CPU fan, good
performance, and it keeps your power bills low. Sadly, the Linux support
for PPC is poor (drivers and applications are written only for x86 in some cases).

//Aho

Joe Seigh · Jun 19, 2004

R420 said:
Either way, however, the OS for this machine is likely to offer both
Linux compatibility at the low end and enormous scalability for those
willing to modify their software -- which is why, as I discuss in next
week's column, I expect IBM and Toshiba soon to launch a new
generation of Linux PCs built around the combination of this CPU with
IBM software products like Lotus Workspace for Linux.

Unless you were into HPC, why would you want a PC based on this? If it
was for games, a proprietary game console would likely be cheaper.

Joe Seigh

J.O. Aho · Jun 19, 2004

Joe said:
Unless you were into HPC, why would you want a PC based on this? If it
was for games, a proprietary game console would likely be cheaper.

For it generates a lot less heat, needs less cooling, even passive cooling can
be enough. CPU performance is good, even if the MHz isn't the same as on an x86
based system that eats power like wolves in a barn.

As it's today, the x86 has come more or less to its top performance due to heat
problems... so it's just time to use something better...

Joe Seigh · Jun 19, 2004

J.O. Aho said:
For it generates a lot less heat, needs less cooling, even passive cooling can
be enough. CPU performance is good, even if the MHz isn't the same as on an x86
based system that eats power like wolves in a barn.

As it's today, the x86 has come more or less to its top performance due to heat
problems... so it's just time to use something better...

That would be good. Fan noise from the cooling is a serious problem and the
air conditioning and power requirements are getting ridiculous. 1 KVA UPSs
are too big. Getting a larger UPS would be out of the question.

I was thinking from a general multiprocessing perspective. The article talks
about SIMD. Are these processors applicable to generic multithreading also?
ccNUMA is ok. That can be dealt with.

Joe Seigh

RusH · Jun 19, 2004

J.O. Aho said:
As it's today, the x86 has come more or less to its top
performance due to heat problems...

LOL? Definitely.

so it's just time to use
something better...

yes, let's use Mormons, they come cheap and work hard!

Regards.

Jan Vorbrüggen · Jun 21, 2004

I expect, therefore, that when the real thing appears, it will fully
support both the traditional grid format for on-chip work and an
asynchronous hypergrid for multi-assembly processes on the model
Thinking Machines hoped to achieve with the transputer-based hypercube
in 1985 -- and that NSA is rumored to actually have built on 1989's
Sparc-SIMD-based CM-5.

Thinking Machines and transputer? Sounds like bit rot on the part of the
author....

Jan

myren, lord · Jun 21, 2004

Thinking Machines and transputer? Sounds like bit rot on the part of the
author....

It does sound like it's moving vaguely back in the direction of
Transputer, albeit not quite the same massive scale.

It sounds like it'll be a tightly integrated hardware/software solution for
distributed processing and I/O. That's really quite interesting. Looks
like it'll allocate resources through some sort of Serializing Tokens
system, kind of like DragonFly BSD.

We're extrapolating a lot, but it's the closest thing to interesting
Transputer fans have seen in a long, long time.

Grumble · Jun 22, 2004

R420 said:
Fast, Faster and IBM's PlayStation 3 Processor

R420,

Why do you post PS3 articles to comp.sys.ibm.pc.hardware.chips?

[ Followup-To set to comp.sys.ibm.pc.hardware.chips ]

Tony Hill · Jun 23, 2004

For it generates a lot less heat, needs less cooling, even passive cooling can
be enough. CPU performance is good, even if the MHz isn't the same as on an x86
based system that eats power like wolves in a barn.

As it's today, the x86 has come more or less to its top performance due to heat
problems... so it's just time to use something better...

You really don't know much of anything about CPU design, do you?

There is nothing specific about x86 or PowerPC in terms of power
consumption. Some x86 chips consume a lot of power, some consume very
little (VIA has some C3 chips in the 1-5W range). When it comes to
CPU design you pretty much always get a bunch of trade-offs, one of
the most important these days being power consumption vs. performance.
Top-end x86 chips and top-end PowerPC chips consume similar amounts
of power for similar amounts of performance.

You also need to be VERY careful when comparing power consumption
figures, often people end up comparing Apples to oranges (no pun
intended of course! :> ). Where Intel is quite good about disclosing
the power consumption figures for their processors, IBM is absolutely
abysmal unless you're paying them big-$$$ and have signed extensive
NDAs. Dozens of times I've seen someone toss around IBM's "typical"
power consumption figure (read: "Power consumption number while
running whatever the hell we feel like running to get this number")
and compare that to the TDP or maximum power consumption figures
from AMD and Intel.

Eudes Malcor · Jun 29, 2004

"Nice to see that more and more begins to use PPC"
Is there any alternative to the use of PPC ? I mean, a realistic alternative...

Thx
Eudes

Torben Ægidius Mogensen · Jun 29, 2004

"Nice to see that more and more begins to use PPC"
Is there any alternative to the use of PPC ? I mean, a realistic
alternative...

Sure. MIPS derivatives as in PS2. Pentium derivatives as in XBox. And
the cell processor is apparently realistic enough to be used in PS3.

ARM derivatives could also be possible, though few current ARMs have
enough FP oomph. That is just a matter of will, though, as a vector
FP ISA is defined. Sparc could also work. Since the ISA is open, a
game console manufacturer could make his own version and exploit the fact that
compilers etc. already exist.

Most of the computing power in game consoles is in the graphical
processors anyway, so the main CPU needn't be high-end workstation
class (it shouldn't be cell-phone class either, though).

Torben

Tony Hill · Jun 30, 2004

"Nice to see that more and more begins to use PPC"
Is there any alternative to the use of PPC ? I mean, a realistic alternative...

MIPS and x86 jump to mind as recently used alternatives that could
easily be made to work well in future consoles. A sort of modified
ARM (with some sort of vector engine FPU type thing) could do the
trick quite well too, though that might be better suited to a handheld
console. Hell, even SPARC or Alpha could work ok, though they're both
rather odd-ball selections.

Honestly the instruction set is pretty much a non-issue. The insides
of AMD's AthlonXP chips look more like IBM's PowerPC 970 (aka the G5)
than Intel's P4. Sure the outside of the AthlonXP and the P4 are the
same ISA, but on the inside that isn't all that critical.

It's more a question of who has a core with the right specs,
performance and power consumption for the right price. IBM has
decided that the consoles are a market that they want to get into so
they're working on PPC-based cores for this market.
