The coming of the Pentium 4 600-series

Yousuf Khan · Jan 3, 2005

The new microprocessors will be based on the Prescott 2M core that brings 2MB L2 cache, Intel EM64T, Enhanced Intel SpeedStep Technology (EIST) as well as Execute Disable Bit (EDB) capability. The chips will be clocked at 3.20GHz, 3.40GHz, 3.60GHz and 3.80GHz and will be intended for infrastructure supporting 800MHz Quad Pumped Bus and TDP of up to 115W.

X-bit labs - Hardware news - Intel Preps Onslaught with New Pentium 4
Processors 600
http://www.xbitlabs.com/news/cpu/display/20050103132921.html

Never anonymous Bud · Jan 3, 2005

Trying to steal the thunder from Arnold said:
Bus and TDP of up to 115W.

X-bit labs - Hardware news - Intel Preps Onslaught with New Pentium 4
Processors 600
http://www.xbitlabs.com/news/cpu/display/20050103132921.html

I like THIS part....

Yousuf Khan · Jan 3, 2005

Never said:
I like THIS part....

Seems like adding more L2 cache is starting to reach its proverbial point
of diminishing returns.

Yousuf Khan

Felger Carbon · Jan 4, 2005

Yousuf Khan said:
Seems like adding more L2 cache is starting to reach its proverbial point
of diminishing returns.

Yousuf, the problem is that Prescott's L2 _doubled_ the L2 latency
over the previous 130nm generation. I've never heard an explanation
for this disaster. Yes, _doubled_. ;-(

keith · Jan 4, 2005

Yousuf, the problem is that Prescott's L2 _doubled_ the L2 latency
over the previous 130nm generation. I've never heard an explanation
for this disaster. Yes, _doubled_. ;-(

Did you ever find out with certainty whether or not they added back in the
FXU multiplier and barrel-shifter? That issue still seems to be up in the
air.

Yousuf Khan · Jan 4, 2005

Felger said:
Yousuf, the problem is that Prescott's L2 _doubled_ the L2 latency
over the previous 130nm generation. I've never heard an explanation
for this disaster. Yes, _doubled_. ;-(

I wonder if it's got something to do with the doubled transistor count?
Twice the transistors to go through, twice the distance to travel
through. Even with a die shrink, it's still twice the distance per
transistor.

Yousuf Khan

Grumble · Jan 4, 2005

Felger said:
Yousuf, the problem is that Prescott's L2 _doubled_ the L2 latency
over the previous 130nm generation. I've never heard an explanation
for this disaster. Yes, _doubled_. ;-(

Not quite.

Northwood = ~19 cycles
Prescott = ~28 cycles

L1 latency, however, I believe went from 2 to 4 cycles.
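
For what it's worth, the ratios implied by those approximate figures can
be checked in a couple of lines of Python. This is only a sketch: the
cycle counts are the rough numbers quoted above (the L1 pair is from
memory), and the variable names are made up for illustration.

```python
# Approximate load-to-use latencies in core clock cycles, as quoted above.
northwood = {"L1": 2, "L2": 19}
prescott = {"L1": 4, "L2": 28}

# Ratio of Prescott latency to Northwood latency for each cache level.
ratios = {level: prescott[level] / northwood[level] for level in northwood}

for level, ratio in ratios.items():
    print(f"{level}: {northwood[level]} -> {prescott[level]} cycles "
          f"({ratio:.2f}x)")
```

On these numbers it is the L1 latency that doubled; the L2 went up by
roughly half.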

chrisv · Jan 4, 2005

(Outhouse-induced mangling fixed)

Yousuf, the problem is that Prescott's L2 _doubled_ the L2 latency
over the previous 130nm generation. I've never heard an explanation
for this disaster. Yes, _doubled_. ;-(

Well, I think Yousuf is right about the diminishing returns of larger
cache size, as well. Seems to me the "paltry" 256k of a P3 serves
quite well for the job. The cost/performance trade-off of the huge
caches seems suspect, even if you don't factor-in things like latency
increases created by the larger cache size.

Felger Carbon · Jan 4, 2005

keith said:
Did you ever find out with certainty whether or not they added back in the
FXU multiplier and barrel-shifter? That issue still seems to be up in the
air.

Yes, I did find out (honest), but I quickly lost interest in Prescott
when I discovered it did not increase performance over the previous
generation. So I am no longer certain, but I seem to remember that it
did include those improvements. But the performance improvement
provided by those two items is completely swamped by the lousy L2
latency.

Keith, if you ever discover _why_ the lousy L2 latency, please ping
me?

Felger Carbon · Jan 4, 2005

Yousuf Khan said:
I wonder if it's got something to do with the doubled transistor count?
Twice the transistors to go through, twice the distance to travel
through. Even with a die shrink, it's still twice the distance per
transistor.

Yousuf, if the cache transistor count is doubled the shrink will
result in L2 cache _area_ that's exactly the same as the old
generation, and hence the same distances. Sorry.
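
A rough check of that scaling argument, assuming ideal area scaling with
the square of the feature size (an idealization; real SRAM arrays rarely
shrink perfectly) and a 130 nm to 90 nm shrink. The function name and
parameters are hypothetical, just for the arithmetic.

```python
def relative_area(transistor_ratio, old_nm, new_nm):
    """Area of the new block relative to the old one.

    Assumes every transistor shrinks by (new_nm / old_nm) in each
    dimension, so area per transistor scales with the square of that.
    """
    linear_shrink = new_nm / old_nm
    return transistor_ratio * linear_shrink ** 2


# Twice the cache transistors, moved from a 130 nm to a 90 nm process.
area = relative_area(2.0, old_nm=130, new_nm=90)
print(f"relative L2 area: {area:.2f}")
```

Under that assumption the doubled cache lands at roughly the same area
as the old one, so the wire lengths alone should not explain the extra
latency.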

Rob Stow · Jan 4, 2005

Felger said:
Yousuf, if the cache transistor count is doubled the shrink will
result in L2 cache _area_ that's exactly the same as the old
generation, and hence the same distances. Sorry.

And it must be mentioned that with a 130 nm process AMD went
from:  256 KB L2 (pre-Barton Athlon XP), L2 latency = 28 clocks
to:    512 KB L2 (Barton Athlon XP),     L2 latency = 23 clocks
to:   1024 KB L2 (AMD64),                L2 latency = 20 clocks

No idea what is going to happen to the L2 latency on the 90 nm AMD64 chips.
I've heard that both cache management and the memory controller have
been tweaked, but I haven't read any latency numbers.

keith · Jan 5, 2005

Yes, I did find out (honest), but I quickly lost interest in Prescott
when I discovered it did not increase performance over the previous
generation. So I am no longer certain, but I seem to remember that it
did include those improvements. But the performance improvement
provided by those two items is completely swamped by the lousy L2
latency.

Keith, if you ever discover _why_ the lousy L2 latency, please ping
me?

You keep your ear to the same rumors I do. I certainly am not likely to
see any such information "officially". Since Andy gave me a copy of his
book, we haven't talked much. ;-)

Bill Davidsen · Jan 5, 2005

chrisv said:
(Outhouse-induced mangling fixed)

Well, I think Yousuf is right about the diminishing returns of larger
cache size, as well. Seems to me the "paltry" 256k of a P3 serves
quite well for the job. The cost/performance trade-off of the huge
caches seems suspect, even if you don't factor-in things like latency
increases created by the larger cache size.

The large cache size really helps reduce bus contention in SMP
configurations. It feels as though lower latency would be better than
size at these low speeds, but I bet Intel did simulations and chose more
over faster. Why they had to choose I can't even guess!
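
The size-versus-latency trade-off can be framed with the textbook
average memory access time (AMAT) formula. A minimal sketch, assuming
made-up miss rates and a made-up 300-cycle memory latency; only the
L1/L2 cycle counts are the approximate figures quoted earlier in the
thread, and none of this reflects whatever simulations Intel ran.

```python
def amat(l1_cycles, l2_cycles, l1_miss_rate, l2_miss_rate, mem_cycles):
    """Average memory access time in cycles for a two-level cache.

    AMAT = L1 hit time + L1 miss rate * (L2 hit time
                                         + L2 miss rate * memory latency)
    """
    return l1_cycles + l1_miss_rate * (l2_cycles + l2_miss_rate * mem_cycles)


MEM_CYCLES = 300  # hypothetical main-memory latency

# Smaller, faster caches (Northwood-like latencies, hypothetical miss rates).
small_fast = amat(2, 19, l1_miss_rate=0.05, l2_miss_rate=0.20,
                  mem_cycles=MEM_CYCLES)

# Larger, slower caches (Prescott-like latencies); the halved L2 miss
# rate is an assumption, not a measurement.
large_slow = amat(4, 28, l1_miss_rate=0.05, l2_miss_rate=0.10,
                  mem_cycles=MEM_CYCLES)

print(f"small/fast: {small_fast:.2f} cycles")
print(f"large/slow: {large_slow:.2f} cycles")
```

With these particular guesses the larger cache does not win back its
added latency, but the result swings with the assumed miss rates and
memory latency, which is exactly what a real simulation would have to
pin down.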

Felger Carbon · Jan 5, 2005

Bill Davidsen said:
The large cache size really helps reduce bus contention in SMP
configurations. It feels as though lower latency would be better than
size at these low speeds, but I bet Intel did simulations and chose more
over faster. Why they had to choose I can't even guess!

Amen, bro! ;-)

chrisv · Jan 5, 2005

Yousuf, if the cache transistor count is doubled the shrink will
result in L2 cache _area_ that's exactly the same as the old
generation, and hence the same distances. Sorry.

But skinnier wires and thus inferior RC characteristics, possibly?

Keith R. Williams · Jan 5, 2005

But skinnier wires and thus inferior RC characteristics, possibly?

Or they took the old macros, plopped down twice as many and wired them
up with whatever glue they needed to get it to work. That would
account for perhaps two clocks (maybe more with the tags), but the
rest???

James Boswell · Jan 8, 2005

Felger said:
Yes, I did find out (honest), but I quickly lost interest in Prescott
when I discovered it did not increase performance over the previous
generation. So I am no longer certain, but I seem to remember that it
did include those improvements. But the performance improvement
provided by those two items is completely swamped by the lousy L2
latency.

Keith, if you ever discover _why_ the lousy L2 latency, please ping
me?

Remember that the L1 latency in Prescott is stupidly high as well.

If they could tweak the L1/L2 latencies back down to Northwood levels, it'd
probably start being a very, very impressive chip, even with its stupidly
high power needs.

-JB
