Tilera to Introduce 64-Core Processor

AirRaid

Tilera to Introduce 64-Core Processor
By Andy Patrizio

An MIT-inspired startup will introduce a new multi-core chip today at
the annual Hot Chips conference at Stanford University. The TILE64
boasts a "clean sheet" design, unencumbered by any legacy
compatibility concerns, that Tilera says will provide a huge leap in
multithreaded performance.

Tilera was founded in 2004 to bring to market the multi-core processor
designs of MIT researcher Anant Agarwal. Agarwal created what he
called a "mesh" multi-core architecture, where the cores are all
interconnected rather than going through a frontside bus, as Intel's
multi-core chips do.

Agarwal first created this multi-core architecture in 1996, long
before Intel and AMD were anywhere close to doing it. The project
received funding from the Defense Advanced Research Projects Agency
(DARPA) and the National Science Foundation, the agency that managed
the Internet for decades.

Tilera holds 40-plus patents for its multi-core design. TILE64 will be
the first in a series of processors built around massively multi-core
chips. The TILE64 processor contains 64 full-featured, programmable
cores that Tilera claims can perform 500 billion operations per second
and deliver ten times the performance and thirty times the
performance-per-watt of the Intel dual-core Xeon.

Agarwal said the company can make these performance leaps because it
doesn't use any legacy technologies or designs.

"The real problem with scale is existing multi-core architectures use
a bus. In that architecture, the bus is a central switch and all the
cores are connected to the single central switch. A packet has to go
through it no matter what, which is fine for one, two or four cores,
but it does not scale," he told internetnews.com.

Tilera uses a mesh architecture, where the cores are laid out in a
checkerboard-like grid, all connected through high-speed
interconnects. "In architectures of this sort, you can keep growing
and you won't have any serious congestion," said Agarwal.

Intel has promised to dispense with the frontside bus with the Nehalem
architecture, due late next year. AMD does not have a frontside bus in
the Opteron, but it's also using four cores at the most, while Tilera
is at 64.

The TILE family can scale up to even more, or down to a two-core
design for the smallest of designs, such as a cell phone. Its power
consumption is a few hundred milliwatts per core, Agarwal said. Its
clock speed will range from 600MHz to 1GHz.

But there's a lot more on the chip than just cores. It has a pair of
10 gigabit Ethernet ports directly on the chip for high speed
networking, as well as on-board I/O and peripheral controllers. Its
integrated memory controllers allow for up to 200 gigabits of memory
bandwidth within the chip.

That's what made the TILE64 chip so appealing to Top Layer, a developer
of network security and intrusion detection appliances. The company had
built its own processors but now plans to switch to Tilera's chips,
according to Chief Strategy Officer Mike Paquette.

"Our software is a multi-core design, and we were able to map out
functionality almost 1 for 1 for each process to a core in a Tilera
chip," he said. "The performance we expect in our estimates exceeds
what we could have gotten from any silicon providers."

Top Layer decided to license processors for future products rather
than bear the expense of building any more, and no other processors had
the scalability. "Because the movement of data is so much of what we do,
we needed a multi-core chip that was optimized for what we were doing
rather than something optimized for general purpose computing. Tilera
has capabilities for network capabilities that are far ahead of what
you can get from [x86] processors," said Paquette.

Tilera will ship a full development toolkit, called the Multicore
Development Environment (MDE), for building applications. It's an
Eclipse-based Integrated Development Environment (IDE) with an ANSI
standard C compiler, an application level library and tools for
debugging and profiling multi-core processors.

Wisely, Tilera is not taking on Intel and AMD right out of the gate,
as Transmeta did. It's going for the embedded market.

"We're focused on embedded because we are a startup and want to go
into a space where there is massive demand for performance like ours.
We can focus on a couple of markets and do really well in those
markets by addressing customer demands squarely and don't have to go
up against a dominant competitor," said Agarwal.

Tilera expects to sell the TILE64 processor for $435 in lots of 10,000
units. The company is also planning a 36-core and 120-core processor
for the near future.


http://www.internetnews.com/ent-news/article.php/3695116
 
[[followups-to trimmed to only comp.arch]]

In comp.arch AirRaid said:
The TILE64 processor contains 64 full-featured, programmable
cores that Tilera claims can perform 500 billion operations per second
and delivers ten times the performance and thirty times the
performance-per-watt of the Intel dual-core Xeon. [[...]]
integrated memory controllers allow for up to 200 gigabits of memory
bandwidth within the chip.

What about off-chip bandwidth -- can this keep up with 64 cores' cache
misses? Is there a clever new door to get through the memory wall?

--
-- "Jonathan Thornburg -- remove -animal to reply" <[email protected]>
School of Mathematics, U of Southampton, England
"Washing one's hands of the conflict between the powerful and the
powerless means to side with the powerful, not to be neutral."
-- quote by Freire / poster by Oxfam
 
Tilera to Introduce 64-Core Processor
By Andy Patrizio

"The real problem with scale is existing multi-core architectures use
a bus. In that architecture, the bus is a central switch and all the
cores are connected to the single central switch. A packet has to go
through it no matter what, which is fine for one, two or four cores,
but it does not scale," he told internetnews.com.

Tilera uses a mesh architecture, where the cores are laid out in a
checkerboard-like grid, all connected through high-speed
interconnects. "In architectures of this sort, you can keep growing
and you won't have any serious congestion," said Agarwal.

I'm a bit puzzled by this. If the cores are laid out in a checkerboard
like grid, doesn't that mean each core is linked to the 8 cores around
it? So it would still come up to some kind of latency bottleneck
wouldn't it? What difference is it from AMD's ccHTT links except
they've got a few more?

Or does he mean the 64 cores are all directly connected to each
other... meaning there are some 63 connections coming out of each core
to every other core for some mindboggling number? (I think
63+62+61+... but my abysmal ability with maths fails me here) But
essentially becoming a nightmare if the number of cores goes up. So
unlikely to be the case, no?
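
The sum asked about here has a closed form: 63+62+...+1 = 64*63/2. A minimal Python sketch of that count follows; the helper name full_mesh_links is only illustrative and does not come from the article.

```python
def full_mesh_links(cores):
    """Point-to-point links needed if every core is wired directly to every other core."""
    return cores * (cores - 1) // 2


# Links leaving each core, and total links, for a few core counts.
for cores in (4, 16, 64, 128):
    print(cores, "cores:", cores - 1, "links per core,",
          full_mesh_links(cores), "links in total")
```

For 64 cores that is 63 links per core and 2016 links in total, and the total roughly quadruples each time the core count doubles.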
 
I'm a bit puzzled by this. If the cores are laid out in a checkerboard
like grid, doesn't that mean each core is linked to the 8 cores around
it? So it would still come up to some kind of latency bottleneck
wouldn't it? What difference is it from AMD's ccHTT links except
they've got a few more?
A cpu has more than one layer. I'm not sure how many it has but I think
AMD's is about 9 layers with the K8. I'd suspect the tile64 is a lot more.
The interconnect would be similar to AMD's HT interconnect bus.
Or does he mean the 64 cores are all directly connected to each other...
meaning there are some 63 connections coming out of each core to every
other core for some mindboggling number? (I think 63+62+61+... but my
abysmal ability with maths fails me here) But essentially becoming a
nightmare if the number of cores goes up. So unlikely to be the case, no?

Probably the reason the core speeds are kept a lot lower. I know I'd like
to have one of these on a small MB compatible with an ATX/BTX case, but
realistically, I have no need for so much power. But cutting electrical
use would be nice.
 
Tilera to Introduce 64-Core Processor
By Andy Patrizio

An MIT-inspired startup will introduce a new multi-core chip today at
the annual Hot Chips conference at Stanford University. The TILE64
boasts a "clean sheet" design,
unencumbered by any legacy
compatibility concerns,

Which means no software is available.
The TILE64 processor contains 64 full-featured, programmable
cores that Tilera claims can perform 500 billion operations per second
and delivers ten times the performance and thirty times the
performance-per-watt of the Intel dual-core Xeon.
The TILE family can scale up to even more, or down to a two-core
design for the smallest of designs, such as a cell phone. Its power
consumption is a few hundred milliwatts per core, Agarwal said. Its
clock speed will range from 600MHz to 1GHz.

64 processors at 1 GHz giving 500 GIPS means 8 IPC/core? or 1 IPC/core
with eight 8-16-32-64-bit sub-operations per cycle?
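
The division behind this question can be written out. This is a sketch in Python that takes the article's figures at face value (500 billion operations per second, 64 cores, 600MHz to 1GHz); the function name is illustrative.

```python
def ops_per_cycle_per_core(ops_per_sec, cores, clock_hz):
    """Operations each core must complete every clock cycle to reach the claimed total."""
    return ops_per_sec / (cores * clock_hz)


CLAIMED_OPS_PER_SEC = 500e9  # "500 billion operations per second" from the article
CORES = 64

for clock_hz in (600e6, 1e9):
    per_core = ops_per_cycle_per_core(CLAIMED_OPS_PER_SEC, CORES, clock_hz)
    print(f"{clock_hz / 1e9:.1f} GHz: {per_core:.2f} operations per cycle per core")
```

At 1GHz this comes to about 7.8 operations per cycle per core, and at 600MHz to about 13, which is why the claim only makes sense if narrow sub-word operations are being counted individually.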
But there's a lot more on the chip than just cores. It has a pair of
10 gigabit Ethernet ports directly on the chip for high speed
networking, as well as on-board I/O and peripheral controllers. Its
integrated memory controllers allow for up to 200 gigabits of memory
bandwidth within the chip.

200 Gbits per what unit of time?

500 GIPS should require something in the 100GBytes/sec to 500GBytes/sec
range of external memory bandwidth.

The only thing impressive, here, is the level of distortion.......
 
A cpu has more than one layer. I'm not sure how many it has but I think
AMD's is about 9 layers with the K8. I'd suspect the tile64 is a lot more.
The interconnect would be similar to AMD's HT interconnect bus.

The PowerPC 970MP was ten layers. The bad news is that you need them
for things other than interconnecting cores. The same thing that's
forcing more cores (because we have nothing better to do with
transistors) is forcing more layers of interconnect, just to wire
them all up.
Probably the reason the core speeds are kept a lot lower. I know I'd like
to have one of these on a small MB compatible with an ATX/BTX case, but
realistically, I have no need for so much power. But cutting electrical
use would be nice.

Core speeds are likely lower so it can be cooled.
 
Tilera expects to sell the TILE64 processor for $435 in lots of 10,000
units. The company is also planning a 36-core and 120-core processor for
the near future.

Shucks, soon we won't bother with putting memory in systems
anymore, we'll just have whole CPU cores where individual memory
bits used to be :-).
 
The PowerPC 970MP was ten layers. The bad news is that you need them
for things other than interconnecting cores. The same thing that's
forcing more cores (because we have nothing better to do with
transistors) is forcing more layers of interconnect, just to wire
them all up.

So it'll just be a real mess isn't it? And they can't just make more
cores using the same blue print can they? Since changing the design
from 64 core to say 128 core would require a whole new design with
like double the number of interconnects no?

Or are they actually just saying every core is identical and only
directly connected to their immediate neighbours? Since this seems to
be the logical way to me.
Core speeds are likely lower so it can be cooled.

Wouldn't this be like getting close to the same problem as whowasit's
parallel hz idea? Seems to me like this 64core thingy is only going to
be good for certain problems since most stuff just isn't going to be
really parallelizable to such an extent. While many things will just
suffer from the low clockspeed/high latency. In other words, not going
to be a mass market product?
 
I'm a bit puzzled by this. If the cores are laid out in a checkerboard
like grid, doesn't that mean each core is linked to the 8 cores around
it?

Reading between the lines, it's probably a torus rather than a mesh.
An 8x8 2D torus and a 4x4x4 3D torus would give a maximum of 8 hops
and 6 hops, respectively, between cores. That doesn't even take into
account the latency to RAM. This processor would probably work well
only on dataflow problems (which access RAM very little).
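
The hop counts above follow from the worst-case distance in each dimension: k/2 on a ring of k nodes (torus) and k-1 on a line of k nodes (mesh without wraparound). Below is a sketch with hypothetical function names; it assumes one hop per link and shortest-path routing, neither of which the article confirms.

```python
def torus_diameter(dims):
    """Worst-case hop count on a torus: each dimension is a ring, so at most k // 2 hops."""
    return sum(k // 2 for k in dims)


def mesh_diameter(dims):
    """Worst-case hop count on a mesh without wraparound: corner to opposite corner."""
    return sum(k - 1 for k in dims)


for name, dims in (("8x8", (8, 8)), ("4x4x4", (4, 4, 4))):
    print(name, "torus:", torus_diameter(dims), "hops; mesh:", mesh_diameter(dims), "hops")
```

This gives 8 hops for the 8x8 torus and 6 for the 4x4x4 torus, matching the figures in the post. A plain 8x8 mesh without wraparound links would need up to 14.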
So it would still come up to some kind of latency bottleneck
wouldn't it? What difference is it from AMD's ccHTT links except
they've got a few more?

A heck of a lot more hops and so, potentially, a much higher latency.
On the other hand, on chip links can be made wider - perhaps even 8
times the width of HTT links thereby cutting latency per link.
Or does he mean the 64 cores are all directly connected to each
other.

No, since a mesh is mentioned.
 
In comp.sys.ibm.pc.hardware.chips krw said:
The PowerPC 970MP was ten layers. The bad news is that you
need them for things other than interconnecting cores. The same
thing that's forcing more cores (because we have nothing better
to do with transistors) is forcing more layers of interconnect,
just to wire them all up.

Like L'Angel, I'm a bit puzzled, but in a different direction:
Lay the cores out in a flat checkerboard. Bus each row together
to a shared x8 section of L2 cache. One/two layers max. Those
cores better each have their own L1s!
Core speeds are likely lower so it can be cooled.

And power/gd current handled. Also to avoid overloading
L2 and main RAM busses.

-- Robert
 
I'm wondering about having 64 cores in one CPU. But which programming
languages can handle parallel problems? C++, Java or Fortress?
 
Tilera expects to sell the TILE64 processor for $435 in lots of 10,000
Shucks, soon we won't bother with putting memory in systems
anymore, we'll just have whole CPU cores where individual memory
bits used to be :-).

That's the point where it becomes interesting. Having only a few of some
resource makes it hard to manage. Having one is easy - there's no choice.
Having lots is easy - use what you can. Having a few means you have to
choose carefully.

For most programs, the best way to use two cores is to turn one off. With
64 you can start running pipelines and arrays, which is what the Tilera
looks like it was designed for.
 
Or are they actually just saying every core is identical and only
directly connected to their immediate neighbours? Since this seems to
be the logical way to me.

It's a grid, where each processor has 4 neighbors. See
http://www.tilera.com/pdf/ArchBrief_Arch_V1_Web.pdf
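
A small sketch of what a 4-neighbor grid implies for an 8x8 layout. Everything here is an assumption for illustration: the grid has no wraparound links (so edge and corner cores have fewer than 4 neighbors), and messages are routed along X first and then Y, a common scheme for meshes that this thread does not confirm for the TILE64.

```python
def neighbors(x, y, width=8, height=8):
    """Cores directly linked to (x, y) on a grid without wraparound."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < width and 0 <= cy < height]


def mesh_links(width=8, height=8):
    """Total links in the grid: vertical links plus horizontal links."""
    return width * (height - 1) + height * (width - 1)


def xy_route(src, dst):
    """Cores visited when travelling along X first, then along Y (assumed routing scheme)."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(len(neighbors(3, 3)), "neighbors for an interior core")
print(len(neighbors(0, 0)), "neighbors for a corner core")
print(mesh_links(), "links in total")
print(len(xy_route((0, 0), (7, 7))) - 1, "hops corner to corner")
```

Under these assumptions the grid needs 112 links in total, and adding cores only adds links at the new edge instead of rewiring every existing core.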
Wouldn't this be like getting close to the same problem as whowasit's
parallel hz idea? Seems to me like this 64core thingy is only going to
be good for certain problems since most stuff just isn't going to be
really parallelizable to such an extent. While many things will just
suffer from the low clockspeed/high latency. In other words, not going
to be a mass market product?

Of course not - it doesn't run Office 97, so it's not a general-purpose
CPU. They're going for the embedded market, which is much larger, but
populated largely by 8-bit CPUs, battery-power ARMs, DSPs, FPGAs.

Perhaps it'll find use as a game platform, set-top box, speech recognizer,
packet sniffer.
 
Alex Colvin wrote:

snip
For most programs, the best way to use two cores is to turn one off.

I think that is an overgeneralization. Of course it depends a lot upon
what you are doing, but two processors can frequently be used
productively fairly easily. For a few examples, most systems run an OS
that has a few background/intermittent tasks. These naturally can be
done by the second processor, which not only takes their processing
demands off the core you are using, but eliminates the overhead of the
task switches.

In addition, systems with a non-trivial user interface can fairly easily
be modified to offload that functionality to the second core, which
improves responsiveness.

And, some programs have fairly separable functionality that can
reasonably be done on a second processor. For example, though it almost
certainly isn't compute bound, a word processor could separate the
inline spell and grammar checks to a second core. And a spreadsheet can
often have multiple "threads" of cells that can be updated in parallel.

But this kind of stuff gets harder and harder to take advantage of with
an increasing number of cores. That is, while it is fairly easy to take
advantage of two cores, it is harder to use four and harder still to use
eight.
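
One standard way to put numbers on this is Amdahl's law, which nobody in the thread names; it is brought in here only as an illustrative model. It assumes a fixed fraction of the work parallelizes perfectly and the rest stays serial, and it ignores communication and memory costs.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only parallel_fraction of the work can be spread out."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# The parallel fractions below are made-up examples, not measurements.
for parallel_fraction in (0.5, 0.9, 0.99):
    row = ", ".join(
        f"{cores} cores: {amdahl_speedup(parallel_fraction, cores):.1f}x"
        for cores in (2, 4, 8, 64)
    )
    print(f"{parallel_fraction:.0%} parallel -> {row}")
```

Under this model a program that is half serial can never reach a 2x speedup, however many cores it gets, so 64 cores only pay off for workloads that are almost entirely parallel.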
With
64 you can start running pipelines and arrays, which is what the Tilera
looks like it was designed for.


Yes, agreed. But taking advantage of those techniques does require
substantial programming effort. It is probably worth the effort for
some applications, but how many is the big question.
 
For most programs, the best way to use two cores is to turn one off.
In addition, systems with a non-trivial user interface can fairly easily
be modified to offload that functionality to the second core, which
improves responsiveness.

True. Most systems are running more than one program. But most programs
are single-threaded. And the cost of thread initiation and switching
doesn't justify a second core.
Yes, agreed. But taking advantage of those techniques does require
substantial programming effort. It is probably worth the effort for
some applications, but how many is the big question.

As we say, the $64 question...
 
In comp.sys.ibm.pc.hardware.chips Alex Colvin said:
Of course not - it doesn't run Office 97, so it's not a general-purpose
CPU. They're going for the embedded market, which is much larger, but
populated largely by 8-bit CPUs, battery-power ARMs, DSPs, FPGAs.

Perhaps it'll find use as a game platform, set-top box, speech recognizer,
packet sniffer.

A packet sniffer would match the other articles I've seen - one of their
early customers is going to be using it for a "network security appliance."

I doubt it will work very well as a game platform CPU per se, although for
GPUs, NVidia is going in sort of that direction with their generalized
stream processors on the GF8 series.
 