Intel's glued-together dual-cores

YKhan · Dec 15, 2004

SiliconStrategies.com - Intel 'dual-core' could be two die glued
together, says report
http://www.siliconstrategies.com/article/showArticle.jhtml?articleId=55301654

George Macdonald · Dec 16, 2004

SiliconStrategies.com - Intel 'dual-core' could be two die glued
together, says report
http://www.siliconstrategies.com/article/showArticle.jhtml?articleId=55301654

LONDON - The planned dual-core processor from Intel Corp.
known as Smithfield could start out as two Pentium 4 chips in a single
package, according to a report that appeared online Tuesday (Dec. 14).

According to the report in The Register, which appeared as Intel held a
telephone press conference to discuss its dual-core processor which is
expected to ship mid-2005, a company executive did not deny the
suggestion that Smithfield would be based on two Pentium 4 processors
glued together in a single package.

Smithfield would initially be fabbed using a 90-nanometer manufacturing
process, but would migrate to a 65-nm process in 2006, the report
quoted Steve Smith, vice president for Intel's desktop platforms group,
as saying.

By the end of 2006 Intel expects over 70 per cent of its desktop CPU
production to be dual-core chips, Smith also said, according to the
report.

The report said Smith declined to comment on whether Smithfield is one
or more chips in a single package and would only say that Smithfield
contains two execution cores. Smithfield is expected to operate at a
lower clock frequency than a single P4.

Hmmm, VIA talked along similar lines a month or so ago... calling it "twin
core" IIRC.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??

Yousuf Khan · Dec 16, 2004

George said:
Hmmm, VIA talked along similar lines a month or so ago... calling it "twin
core" IIRC.

Yup, Intel is racing to keep up against VIA. :-)

Yousuf Khan

keith · Dec 17, 2004

Yup, Intel is racing to keep up against VIA.

Ouch! You're cruel! ;-)

Bill Davidsen · Dec 17, 2004

YKhan said:
SiliconStrategies.com - Intel 'dual-core' could be two die glued
together, says report
http://www.siliconstrategies.com/article/showArticle.jhtml?articleId=55301654

LONDON - The planned dual-core processor from Intel Corp.
known as Smithfield could start out as two Pentium 4 chips in a single
package, according to a report that appeared online Tuesday (Dec. 14).

I believe the original PentiumPro was two chips in a single die carrier,
the CPU and the cache.

According to the report in The Register, which appeared as Intel held a
telephone press conference to discuss its dual-core processor which is
expected to ship mid-2005, a company executive did not deny the
suggestion that Smithfield would be based on two Pentium 4 processors
glued together in a single package.

To the point, are these current production compatible P4 (ie. HT
enabled)? And do they share L2 (or L3) cache?

Smithfield would initially be fabbed using a 90-nanometer manufacturing
process, but would migrate to a 65-nm process in 2006, the report
quoted Steve Smith, vice president for Intel's desktop platforms group,
as saying.

By the end of 2006 Intel expects over 70 per cent of its desktop CPU
production to be dual-core chips, Smith also said, according to the
report.

The report said Smith declined to comment on whether Smithfield is one
or more chips in a single package and would only say that Smithfield
contains two execution cores. Smithfield is expected to operate at a
lower clock frequency than a single P4.

There are a lot of interesting questions about this coming technology,
it could be really neat or it could be a true cob job.

nobody · Dec 17, 2004

SiliconStrategies.com - Intel 'dual-core' could be two die glued
together, says report
http://www.siliconstrategies.com/article/showArticle.jhtml?articleId=55301654

LONDON - The planned dual-core processor from Intel Corp.
known as Smithfield could start out as two Pentium 4 chips in a single
package, according to a report that appeared online Tuesday (Dec. 14).

According to the report in The Register, which appeared as Intel held a
telephone press conference to discuss its dual-core processor which is
expected to ship mid-2005, a company executive did not deny the
suggestion that Smithfield would be based on two Pentium 4 processors
glued together in a single package.

....snip...
I already have an oil-filled electric heater that has dual (600 W and
900W) core. The cores can be turned on separately or together, thus
providing 3 heating levels. Is Intel-branded dual-core P4 space
heater going to have the same feature, i.e. could one of the cores be
turned off when it gets too hot in the room?
;-)

Yousuf Khan · Dec 18, 2004

Bill said:
I believe the original PentiumPro was two chips in a single die carrier,
the CPU and the cache.

Yes, and I would guess that the current production Xeons with L3 caches
are also similar, with the L3 being a separate chip?

To the point, are these current production compatible P4 (ie. HT
enabled)? And do they share L2 (or L3) cache?

No, they don't share any of their caches with each other. Actually, the
AMD dual-cores are going to be similar to this too, with no shared
cache. You lose a lot of cost savings at the very least, by not
integrating the L2 caches. But you might get slightly better performance
by having the dedicated L2's.

There are a lot of interesting questions about this coming technology,
it could be really neat or it could be a true cob job.

I think the main question is whether the internal CPU-CPU communications
mechanism is properly designed or just cobbled together. A properly
designed one would reduce if not eliminate entirely the amount of
cache-snoop traffic going over the FSB.

Yousuf Khan

Carlo Razzeto · Dec 18, 2004

Yousuf Khan said:
No, they don't share any of their caches with each other. Actually, the
AMD dual-cores are going to be similar to this too, with no shared cache.
You lose a lot of cost savings at the very least, by not integrating the
L2 caches. But you might get slightly better performance by having the
dedicated L2's.

Yousuf Khan

Interesting, I thought that the DC Opterons were going to share their L2. I
could have sworn I saw that in one of their presentations.

Carlo

Tony Hill · Dec 19, 2004

Yes, and I would guess that the current production Xeons with L3 caches
are also similar, with the L3 being a separate chip?

Actually no, all integrated on-die. The L3 just has a narrower
(64-bit vs. 256-bit) connection to the processor core and higher
latency when compared to the L2 cache. Same goes for Itaniums.

No, they don't share any of their caches with each other. Actually, the
AMD dual-cores are going to be similar to this too, with no shared
cache. You lose a lot of cost savings at the very least, by not
integrating the L2 caches. But you might get slightly better performance
by having the dedicated L2's.

It probably also simplifies design by a fair bit. A shared cache is
going to be trickier to design than a separate one. By no means an
insurmountable problem, but it would probably just add to the
performance hit, making it not worthwhile.

Besides which we seem to be quickly getting to a point where designers
have more transistors than they can figure out what to do with.

Yousuf Khan · Dec 19, 2004

Carlo said:
Interesting, I thought that the DC Opterons were going to share their L2. I
could have sworn I saw that in one of their presentations.

Nope, and you'll notice that DC Opterons are almost exactly twice the
size of their SC versions. That's cause they not only add an extra core,
they also added the whole L2 cache too.

From what I've heard, AMD did indeed make their Opterons DC-capable
right from the beginning, but what that actually meant was that they had
simply designed the core so that if they cut two cores side-to-side,
they would see communications channels directly aligned up on each die.
So they were never actually planning on sharing caches with each other.

Yousuf Khan

Carlo Razzeto · Dec 19, 2004

Yousuf Khan said:
Nope, and you'll notice that DC Opterons are almost exactly twice the size
of their SC versions. That's cause they not only add an extra core, they
also added the whole L2 cache too.

From what I've heard, AMD did indeed make their Opterons DC-capable right
from the beginning, but what that actually meant was that they had simply
designed the core so that if they cut two cores side-to-side, they would
see communications channels directly aligned up on each die. So they were
never actually planning on sharing caches with each other.

Yousuf Khan

Very interesting... I guess in the end it would make sense to have separate
caches for each core. Simpler to design, minimal tweaking required to fab
these chips v. single core, and presumably a small performance boost.

Carlo

keith · Dec 20, 2004

Nope, and you'll notice that DC Opterons are almost exactly twice the
size of their SC versions. That's cause they not only add an extra core,
they also added the whole L2 cache too.

Which isn't surprising, considering the architecture. The second/spare
port is into the HT controller, not the L2.

From what I've heard, AMD did indeed make their Opterons DC-capable
right from the beginning, but what that actually meant was that they had
simply designed the core so that if they cut two cores side-to-side,
they would see communications channels directly aligned up on each die.
So they were never actually planning on sharing caches with each other.

I heard the same, but I'd like to see some more detail. I'm quite sure
it's not all that "simple". There is a left-right issue and all sorts of
other trivia as well.

keith · Dec 20, 2004

Very interesting... I guess in the end it would make sense to have separate
caches for each core. Simpler to design, minimal tweaking required to fab
these chips v. single core, and presumably a small performance boost.

....or loss. Smaller caches and fewer ports might be faster, but
data duplication and cross-snooping might cause it to be slower. This
isn't so clear-cut.

Alex Johnson · Dec 20, 2004

Tony said:
The L3 just has a narrower
(64-bit vs. 256-bit) connection to the processor core and higher
latency when compared to the L2 cache. Same goes for Itaniums.

Am I misreading you? It sounds like you are saying Itanium's L3 has a
narrower connection to the core than the L2. This is absolutely untrue.
L3 sends data to L2 before L2 sends it on. At worst it is "the same"
because data must take the same path. At best it is "twice as wide"
since the L2 can be filled faster than it can be sent on to the core.
Of course I assume "Itanium" means Itanium 2 family chips since the
original Itanium was a joke and basing any arguments about design
choices of modern processors on it is insulting.

Alex

Yousuf Khan · Dec 21, 2004

Alex said:
Am I misreading you? It sounds like you are saying Itanium's L3 has a
narrower connection to the core than the L2. This is absolutely untrue.
L3 sends data to L2 before L2 sends it on. At worst it is "the same"
because data must take the same path. At best it is "twice as wide"
since the L2 can be filled faster than it can be sent on to the core. Of
course I assume "Itanium" means Itanium 2 family chips since the
original Itanium was a joke and basing any arguments about design
choices of modern processors on it is insulting.

Were you involved in the project when the Alpha guys designed Tukwila?
Why did the PA-RISC guys not like their design?

Yousuf Khan

Bill Davidsen · Dec 22, 2004

Yousuf said:
Yes, and I would guess that the current production Xeons with L3 caches
are also similar, with the L3 being a separate chip?

No, they don't share any of their caches with each other. Actually, the
AMD dual-cores are going to be similar to this too, with no shared
cache. You lose a lot of cost savings at the very least, by not
integrating the L2 caches. But you might get slightly better performance
by having the dedicated L2's.

One of those "it depends" cases, you have to do snooping if you do SMP,
the only question is where.

I think the main question is whether the internal CPU-CPU communications
mechanism is properly designed or just cobbled together. A properly
designed one would reduce if not eliminate entirely the amount of
cache-snoop traffic going over the FSB.

Totally agree.

Tony Hill · Dec 22, 2004

Am I misreading you?

Err.. I think you are.

It sounds like you are saying Itanium's L3 has a
narrower connection to the core than the L2.

No, I was saying the exact opposite.

This is absolutely untrue.
L3 sends data to L2 before L2 sends it on. At worst it is "the same"
because data must take the same path. At best it is "twice as wide"
since the L2 can be filled faster than it can be sent on to the core.
Of course I assume "Itanium" means Itanium 2 family chips since the
original Itanium was a joke and basing any arguments about design
choices of modern processors on it is insulting.

I don't have any numbers for Itanium, the bit I was quoting was for
the P4EE/Xeon (256-bit wide L2 cache port, 64-bit wide L3). I would
guess that the Itanium is at least a similar ratio if not the same
numbers.

Probably more important than the bandwidth is the latency. The
P4EE/Xeon chips have something like a 10 cycle L2 latency and about a
40 cycle L3 latency. With Itanium my guess is that the spread is even
wider (ie the very small 256K of L2 cache in the Itanium2 probably has
very low latency while the huge 3-9MB of L3 cache probably has rather
high latency).

Alex Johnson · Dec 22, 2004

Tony said:
No, I was saying the exact opposite.

256-bit wide L2 cache port, 64-bit wide L3

You just said you meant the opposite of what I thought you said, but
then provided numbers to back up what I thought you said. I find the
Xeon to be very strange if it has 256-bit width from L2 and 64-bit width
from L3. That's 32-bytes vs 8-bytes.

Itanium 2 returns data from the L2 256-bits at a time to either the L1D
or the L1I. It fills the L2 256-bits at a time.

P4EE/Xeon chips have something like a 10 cycle L2 latency and about a
40 cycle L3 latency. With Itanium my guess is that the spread is even
wider (ie the very small 256K of L2 cache in the Itanium2 probably has
very low latency while the huge 3-9MB of L3 cache probably has rather
high latency).

Itanium 2 latency is 5 cycles from L2 and 12 cycles from L3. Much
better than Xeon. Xeon has a ratio of 4:1 while Itanium 2 has a ratio
of 2.4:1. Those numbers are for McKinley (the 1GHz version). I believe
the Madison (1.5GHz version) raised the latency to L3 by 2 cycles, so 5
and 14 (2.8:1). Which corresponds to 3.33ns and 9.33ns total time for
the Itanium 2 at 1.5GHz vs (since I don't know what speed Xeon your
numbers are for I'll assume the 3.0GHz Xeon MP with 4M cache) 3.33ns and
13.33ns total times. So the L2 caches have the same access time, but
the Itanium 2 is faster to reach its larger cache. I'm curious to see
what the timings will be on the Montecito, which ups the L3 ante to 12MB.
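
For anyone who wants to check the arithmetic, here is a minimal sketch that turns the cycle counts quoted above into nanoseconds and L3:L2 ratios. The cycle counts and clock speeds are only the figures given in this thread (the Xeon ones are rough recollections and the 3.0GHz clock is an assumption), and the function names are made up for illustration.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds at the given clock."""
    # One cycle at f GHz lasts 1/f nanoseconds.
    return cycles / clock_ghz


def summarize(clock_ghz, l2_cycles, l3_cycles):
    """Return (L2 ns, L3 ns, L3:L2 cycle ratio), rounded for display."""
    return (
        round(cycles_to_ns(l2_cycles, clock_ghz), 2),
        round(cycles_to_ns(l3_cycles, clock_ghz), 2),
        round(l3_cycles / l2_cycles, 1),
    )


# (name, clock in GHz, L2 cycles, L3 cycles) as quoted in the thread.
# The Xeon row is an assumption: rough cycle counts at an assumed 3.0GHz.
chips = [
    ("Itanium 2 McKinley", 1.0, 5, 12),
    ("Itanium 2 Madison", 1.5, 5, 14),
    ("Xeon MP (assumed)", 3.0, 10, 40),
]

for name, clock, l2, l3 in chips:
    l2_ns, l3_ns, ratio = summarize(clock, l2, l3)
    print(f"{name}: L2 {l2_ns}ns, L3 {l3_ns}ns, ratio {ratio}:1")
```

The point of converting to nanoseconds is that cycle counts alone are not comparable across chips running at different clock speeds.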

Alex

Tony Hill · Dec 27, 2004

You just said you meant the opposite of what I thought you said, but
then provided numbers to back up what I thought you said. I find the
Xeon to be very strange if it has 256-bit width from L2 and 64-bit width
from L3. That's 32-bytes vs 8-bytes.

Itanium 2 returns data from the L2 256-bits at a time to either the L1D
or the L1I. It fills the L2 256-bits at a time.

I believe the same is true for the Xeon, it just takes 4 clock cycles
to do a fill from L3 cache.
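
The 4-cycle figure is just the width ratio, as this rough sketch shows. It assumes a 256-bit fill moved over a 64-bit path at one transfer per clock, which is my reading of the numbers in this thread and not a documented Xeon timing; the helper names are hypothetical.

```python
def bits_to_bytes(bits):
    """Convert a bus width in bits to bytes."""
    return bits // 8


def transfers_per_fill(fill_bits, path_bits):
    """Number of transfers needed to move fill_bits over a path_bits-wide path."""
    # Ceiling division, so a partial final transfer still counts as one.
    return -(-fill_bits // path_bits)


# Widths quoted in the thread: 256-bit L2 port, 64-bit L3 path.
print(bits_to_bytes(256), bits_to_bytes(64))   # 32 bytes vs 8 bytes
print(transfers_per_fill(256, 64))             # 4, assuming one transfer per clock
```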

Perhaps someone else in this newsgroup has a bit more precise
knowledge of how it works though, I know a while back there was some
big discussion going on here about cache lines vs. cache segments and
how they all fit into getting data into and out of the processor. In
the end all I took out of the discussion was that everyone seemed to
have a different definition for everything and none of it made much
sense to me! :>

Itanium 2 latency is 5 cycles from L2 and 12 cycles from L3. Much
better than Xeon. Xeon has a ratio of 4:1 while Itanium 2 has a ratio
of 2.4:1.

Don't quote me on those numbers being exact, just rough estimates of
what I remember them being. I'm not sure if Intel has documented the
exact latency timings for the Xeon, but if they have, I'm not sure
where to find it.
