IBM vs. Intel 90nm power consumption

  • Thread starter: Tony Hill

Tony Hill

Hi all

Somewhat of a boring Friday evening in, so I figured I would crunch a
few numbers with regards to power consumption of Intel vs. IBM's 130nm
vs. 90nm shrink, using the PowerPC 970 -> 970FX, both at 2.0GHz, and
the P4 3.2GHz -> P4 3.2E GHz (ie Northwood -> Prescott).

First, the easy numbers, transistor count:

IBM PPC 970 = 55M -> PPC 970FX = 58M
Intel P4 "Northwood" = 58M -> P4 "Prescott" = 125M

But we should probably ignore L2 cache transistors in this, since
cache takes up a lot of transistors but relatively little die space
and much less power than logic gates... so new numbers are:

IBM PPC 970 = 25M -> PPC 970FX = 28M
Intel P4 "Northwood" = 28M -> P4 "Prescott" = 65M
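(For anyone who wants to sanity-check the ~30M-transistor cache subtraction above: a quick back-of-the-envelope script, using the standard 6T SRAM cell. The ~20% overhead for tags/ECC/decoders is my own guess, not from any datasheet. Both the PPC 970 and the Northwood P4 carry 512KB of L2.)

```python
# Rough estimate of the transistor count of an L2 cache array,
# assuming standard 6-transistor SRAM cells plus a guessed ~20%
# overhead for tags, ECC and decode logic (my assumption).

def l2_transistors(cache_kb, transistors_per_bit=6, overhead=1.2):
    """Estimate transistor count of an L2 cache of the given size."""
    bits = cache_kb * 1024 * 8
    return bits * transistors_per_bit * overhead

# 512KB of L2, as on both the PPC 970 and the Northwood P4:
print(l2_transistors(512) / 1e6)   # ~30M, matching the subtraction above
```

So knocking 30M transistors off both chips for the cache is at least self-consistent.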


Ok, and now for the power consumption. This was actually the tricky
part because IBM basically won't tell anyone what the power
consumption of their chips really is, and they like to use some
"typical" power consumption numbers that are MUCH less than what you
will encounter using the chip full-out. Intel, as has been discussed
a few times before, uses "Thermal Design Power", which is
kinda-sorta-almost the maximum power you'll see in the real world.
Intel's numbers should be fine for this exercise, but I had to dig a
bit to find some useable IBM numbers. Best I could come up with is:

IBM PPC 970 = 90W -> PPC 970FX = 55W
Intel P4 "Northwood" = 82W -> P4 "Prescott" = 103W


So, now it's just a matter of running the numbers:

IBM:
PPC 970 = 90W / 25M transistors = 3.60W/M transistors
PPC 970FX = 55W / 28M transistors = 1.96W/M transistors
Percentage improvement = 45.6% less power

Intel:
P4 130nm = 82W / 28M transistors = 2.93W/M transistors
P4 90nm = 103W / 65M transistors = 1.58W/M transistors
Percentage improvement = 46.1% less power
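(If you want to reproduce the arithmetic yourself, here's the whole thing as a short script. It rounds the W-per-million-transistor figures to two decimals first, the same way I did above; with full precision the improvements come out a shade lower.)

```python
# Reproducing the per-million-transistor power numbers above,
# rounding intermediates to two decimals as in the post.
def watts_per_m(watts, m_transistors):
    return round(watts / m_transistors, 2)

def improvement(old, new):
    return (old - new) / old * 100

ibm_old   = watts_per_m(90, 25)    # 3.60
ibm_new   = watts_per_m(55, 28)    # 1.96
intel_old = watts_per_m(82, 28)    # 2.93
intel_new = watts_per_m(103, 65)   # 1.58

print(f"IBM:   {improvement(ibm_old, ibm_new):.1f}% less power")     # 45.6%
print(f"Intel: {improvement(intel_old, intel_new):.1f}% less power") # 46.1%
# With unrounded intermediates the figures are ~45.4% and ~45.9%.
```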


Just a bit of food for thought, given all the noise about how well
IBM's 130nm -> 90nm process went vs. how "badly" Intel's 130nm -> 90nm
transition has gone.
 
IBM PPC 970 = 25M -> PPC 970FX = 28M
Intel P4 "Northwood" = 28M -> P4 "Prescott" = 65M

But since quite a bit of that extra transistor count is actually the
inactive 64bit extensions, shouldn't they be counted as non-active
similar to the cache for determining power consumption?

--
L.Angel: I'm looking for web design work.
If you need basic to med complexity webpages at affordable rates, email me :)
Standard HTML, SHTML, MySQL + PHP or ASP, Javascript.
If you really want, FrontPage & DreamWeaver too.
But keep in mind you pay extra bandwidth for their bloated code
 
Minor correction here. The "Northwood" actually has 55M transistors
(25M when I subtract the cache).
But since quite a bit of that extra transistor count is actually the
inactive 64bit extensions, shouldn't they be counted as non-active
similar to the cache for determining power consumption?

Perhaps, but I haven't a clue how many transistors are actually
disabled. Intel certainly isn't going to say. However it probably
isn't all that many. AMD stated a few times that adding 64-bit
support cost them about 5% in terms of die space, and that was on a
smaller die (at least in terms of transistor count). Probably no more
than 5M transistors at a maximum, and when you factor in the minor
correction mentioned above, the numbers work out about the same.
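(To show my work on "about the same": here's the Intel side re-run with the Northwood correction to 25M logic transistors and a hypothetical 5M Prescott transistors assumed dark for the disabled 64-bit extensions. The 5M figure is purely a guess scaled from AMD's "~5% die space" remark.)

```python
# Hypothetical re-run of the Intel numbers:
#  - Northwood logic count corrected to 25M (per the note above)
#  - ~5M Prescott transistors assumed disabled (64-bit extensions);
#    this is a guess, not a known figure.
northwood = 82 / 25          # 3.28 W per M transistors
prescott  = 103 / (65 - 5)   # ~1.72 W per M transistors
print(f"{(northwood - prescott) / northwood * 100:.1f}% less power")
# ~47.7% -- still in the same ballpark as IBM's improvement
```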

There are also some test and debug transistors included in the
Prescott which are not enabled, but again I have no idea how many of
these transistors there are. Northwood and the PPC 970/970FX will
also have some disabled transistors, but none of them are documented.
 
Tony Hill said:
Hi all

But we should probably ignore L2 cache transistors in this, since
cache takes up a lot of transistors but relatively little die space
and much less power than logic gates... so new numbers are:

Hi yerself Tony! ;-)

True, L2 caches in the past consumed little power. But with the new
90nm generation and Prescott in particular, leakage current is said to
represent much of the power consumed. Don't cache transistors have
just as much leakage current as logic transistors? If so, is it
really true that caches no longer require much power? Just a thought.

And now, back to March Madness and the NCAA TV games.
 
Hi yerself Tony! ;-)

True, L2 caches in the past consumed little power. But with the new
90nm generation and Prescott in particular, leakage current is said to
represent much of the power consumed. Don't cache transistors have
just as much leakage current as logic transistors? If so, is it
really true that caches no longer require much power? Just a thought.

Ah, Felg. Surely you know that not all transistors are created
equal. Some are slow, others fast. Some leak like a sieve, some
not so. It's all how you choose to use 'em. BTW, SOI helps half
the problem (at least half the time ;-).
And now, back to March Madness and the NCAA TV games.

....but the good guys are gone.
 
Tony said:
Ok, and now for the power consumption. This was actually the tricky
part because IBM basically won't tell anyone what the power
consumption of their chips really is, and they like to use some
"typical" power consumption numbers that are MUCH less than what you
will encounter using the chip full-out. Intel, as has been discussed
a few times before, uses "Thermal Design Power", which is
kinda-sorta-almost the maximum power you'll see in the real world.
Intel's numbers should be fine for this exercise, but I had to dig a
bit to find some useable IBM numbers. Best I could come up with is:

IBM PPC 970 = 90W -> PPC 970FX = 55W

How did you come up with these two numbers?

Do you have a reference?
 
KR Williams said:
thought.

Ah, Felg. Surely you know that not all transistors are created
equal. Some are slow, others fast. Some leak like a sieve, some
not so. It's all how you choose to use 'em. BTW, SOI helps half
the problem (at least half the time ;-).

As always, Keith, you're right. ;-)

But you didn't answer my question: does (for example) Prescott's L2
dissipate significant power, a significant part of its 100+ watt
dissipation?
 
How did you come up with these two numbers?

Do you have a reference?

<rant>I came up with these numbers after spending about a half hour
scrounging web sites to find some info about this, because IBM doesn't
see fit to provide any meaningful documentation for this chip.</rant>

Anyway, I found the numbers on a MacWorld article, which as luck would
have it, doesn't seem to be accessible at the moment. Here's a link
to the article and a Google cache link that might work if the original
doesn't:

http://www.macworld.co.uk/news/top_news_item.cfm?NewsID=7914

http://216.239.39.104/search?q=cach...70fx+power+consumption+90w+55w&hl=en&ie=UTF-8


These numbers appear to be a sort of theoretical maximum, probably
just Max Icc at standard Vcc. I also found another set of
numbers, 39W for the PPC 970FX and 66W for the older 970. As best as I
can tell these were a sort of TDP number. The 39W number seems fairly
reasonable (it was even mentioned in an IBM document.. that has since
been taken offline), but I only just recently heard the 66W number for
the original PPC 970.

Either way, the percentage improvement that IBM achieved with their
130nm -> 90nm transition remained about the same, and that was the
real number I was after all along, so it ended up not mattering much.

Unfortunately most of the time the only power consumption figure I see
thrown around with the PPC 970/970FX chips are "typical" power
consumption figures. This is rather a pain in the butt since
"typical" power consumption means basically nothing unless it is
accompanied by an explanation of how that number is measured. For
example, some companies measure "typical" to be "idle" (most desktop
PCs are idle 99% of the time). Others measure "typical" to mean power
consumption while playing a DVD movie (Transmeta has used this
definition). Others still have defined "typical" to be the power
consumption while running a Winstone/Winbench type of benchmark (AMD
used this definition for their Athlon line). And finally there are
some who just seem to pull a number out of their ass and present that
as "Typical power consumption". Since IBM hasn't provided any hint
otherwise, I'm kind of assuming that they're using the last method! :>
 
On Fri, 26 Mar 2004 23:38:27 -0500, Tony Hill

IBM:
PPC 970 = 90W / 25M transistors = 3.60W/M transistors
PPC 970FX = 55W / 28M transistors = 1.96W/M transistors
Percentage improvement = 45.6% less power

Intel:
P4 130nm = 82W / 28M transistors = 2.93W/M transistors
P4 90nm = 103W / 65M transistors = 1.58W/M transistors
Percentage improvement = 46.1% less power


Just a bit of food for thought, given all the noise about how well
IBM's 130nm -> 90nm process went vs. how "badly" Intel's 130nm -> 90nm
transition has gone.

Interesting analysis. Since both IBM and Intel could be pulling all
kinds of weird tricks with their transistor count and power numbers,
and I could easily be deceived, it's not an analysis I would attempt
on my own.

_Something_ happened with Prescott that Intel didn't plan on. We
don't _really_ know what that something is and we don't really know
what Intel's options are for fixing whatever it might be. However
the numbers work out, there is more leakage and higher power
consumption than Intel expected.

From the blah performance, I am guessing that we are already looking
at a design that's operating off its original target design point. I
don't think Intel would have spent all that money in the expectation
that people would look at the numbers and say, but this won't be
interesting even at 4GHz (which is the situation they've got, even if
they do make it to 4GHz).

What's happening feels reminiscent of what happened at the 1GHz
barrier. Intel, despite its mastery of process technology, just
couldn't get the special sauce to come out right. That fiasco caused
them to hustle the NetBurst architecture onto the stage faster than
they had planned on, and this latest fiasco is encouraging speculation
that Intel may be preparing to hustle NetBurst off the stage faster
than it had planned on.

I have visions of Ph.D.'s working long into the night running
simulations to try to figure out what's going on and what to do about
it. It would be worth the misery of working under so much pressure
just to see what's really going on.

RM
 
Interesting analysis. Since both IBM and Intel could be pulling all
kinds of weird tricks with their transistor count and power numbers,
and I could easily be deceived, it's not an analysis I would attempt
on my own.

Well the transistor count is easy enough to come by, though it
definitely doesn't tell the full picture, ie how many transistors are
only used for testing and therefore disabled in normal use? Prescott
apparently has a lot of those.

As for power consumption, they do play a lot of games with these
numbers, but as long as the games are consistent for both generations
of chips then the numbers still work out.

In the end though, you're right, it is kind of a rough estimate at
best.
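(The reason consistent games don't matter: if both generations' power figures are inflated or deflated by the same hypothetical bias, it cancels out of the ratio, so the percentage improvement is untouched. A two-line demonstration:)

```python
# Scale both generations' power figures by the same (made-up) bias k;
# the percentage improvement is unchanged, since k cancels in the ratio.
def improvement(old_w, old_m, new_w, new_m):
    old = old_w / old_m
    new = new_w / new_m
    return (old - new) / old * 100

for k in (1.0, 0.7, 1.5):   # hypothetical measurement biases
    print(improvement(90 * k, 25, 55 * k, 28))   # same answer every time
```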
_Something_ happened with Prescott that Intel didn't plan on. We
don't _really_ know what that something is and we don't really know
what Intel's options are for fixing whatever it might be. However
the numbers work out, there is more leakage and higher power
consumption than Intel expected.

Knowing what Intel expected is even harder to figure out than what
they actually ended up with! It seems unbelievable that they would
purposely design a CPU that runs right at its thermal limit at
standard ambient temperatures with a well ventilated case and the
retail box heatsink. However that is exactly what they have with the
Prescott. On the other hand, certainly Intel must have known that
making a processor with 125M transistors that clocks to 3GHz+ will
consume a LOT of power, even on a 90nm fab process.

The one thing I really just can't figure out though is how Intel
managed to design a processor on a more advanced fab process, using
well over twice as many transistors as the "Northwood" P4, and yet
they don't seem to be getting anything from it. While the "Prescott"
is occasionally faster, clock for clock, in some benchmarks, it loses
as many as it wins (if not more so!). Combine that with the fact that
it doesn't seem to be clocking any better than the older Northwood and
it seems to me like Intel has a serious *design* problem.

Who knows though, "Prescott" is just the first implementation of this
second-generation NetBurst core. Maybe it's just going through some
growing pains and needs a few tweaks before it really gets going.
Perhaps we won't see its full potential until "Tejas" is released
sometime next year.
What's happening feels reminiscent of what happened at the 1GHz
barrier. Intel, despite its mastery of process technology, just
couldn't get the special sauce to come out right. That fiasco caused
them to hustle the NetBurst architecture onto the stage faster than
they had planned on, and this latest fiasco is encouraging speculation
that Intel may be preparing to hustle NetBurst off the stage faster
than it had planned on.

To be fair to NetBurst, it's already been here for 3 and a half years,
and doesn't look like it's going to disappear before the end of this
year. Considering that Intel's previous longest-running architecture,
the P6 core, lasted only just over 5 years before the P4/NetBurst core
was introduced, I'd say that it's not been hustled off stage TOO
quickly.

Best estimates I've seen are that the P4 and NetBurst core will be
replaced in late 2006/early 2007 with a completely new design. This
would give the NetBurst core a total run of roughly six years.
I have visions of Ph.D.'s working long into the night running
simulations to try to figure out what's going on and what to do about
it. It would be worth the misery of working under so much pressure
just to see what's really going on.

Haha, you can take that pressure! I'll stick to my
back-of-the-envelope calculations myself! :>
 
Tony Hill said:
To be fair to NetBurst, it's already been here for 3 and a half
years, and doesn't look like it's going to disappear before the
end of this year. Considering that Intel's previous
longest-running architecture, the P6 core, lasted only just over 5
years before the P4/NetBurst core was introduced, I'd say that
it's not been hustled off stage TOO quickly.

and Tualatin is (was)? and Dothan is?
Best estimates I've seen are that the P4 and NetBurst core will be
replaced in late 2006/early 2007 with a completely new design.

based on the current Dothan :) being old P6 hehe


Pozdrawiam.
 
As always, Keith, you're right. ;-)

But you didn't answer my question: does (for example) Prescott's L2
dissipate significant power, a significant part of its 100+ watt
dissipation?

Sorry, Felg. No insiders here on Prescott. ;-)
 
Well the transistor count is easy enough to come by, though it
definitely doesn't tell the full picture, ie how many transistors are
only used for testing and therefore disabled in normal use? Prescott
apparently has a lot of those.

If they didn't the thing would never work. Testing is a huge
part of getting working product.
As for power consumption, they do play a lot of games with these
numbers, but as long as the games are consistent for both generations
of chips then the numbers still work out.

I've found that public information is misleading to useless for
such comparisons.
In the end though, you're right, it is a kind of a rough estimate at
best.

I'd use the term "suspect". ;-)

<no guesses on *what* Intel is thinking>
 
and Tualatin is (was)? and Dothan is?

The first PPro was released late 1995, while the first P4 was released
late 2000. Tualatin wasn't released until early 2001, so it came out
after the P4. Sure, the core did live on for a little bit longer in
the form of the Celeron, mobile PIII processors and even a handful of
higher-end desktop, workstation and server parts. Similarly the P4 is
not going to disappear in 2007 right after Intel releases a follow-up
core. It will continue to live on for at least a year or two in a
variety of products.

"Banias" and "Dothan" share a lot of the same technology as the P6,
but they definitely aren't the same. There were some rather important
changes made to the core; it most definitely is NOT just a "PIII on a
P4 bus" as many people have suggested.

Also, quite simply, there are only so many ways that you can possibly
make a different core. The PIII, the Athlon, the Athlon64/Opteron and
the Pentium-M all share a lot of common concepts and ideas.
 
If they didn't the thing would never work. Testing is a huge
part of getting working product.

For sure. Rumor has it though that the Prescott has a higher
percentage of transistors used exclusively for testing when compared
to other processors of similar size and performance levels. How
accurate these rumors are... well that's another question altogether.
I've found that public information is misleading to useless for
such comparisons.

Perhaps, but they can still make for some interesting
discussions/flame wars! :>
I'd use the term "suspect". ;-)

Hey now! Nothing *too* suspect about them!

Actually all I was really trying to point out is that the problems
Intel is having with the Prescott are NOT strictly process-related
problems as seemed to be the common consensus. For a chip with 125M
transistors running at 3+ GHz, it's power consumption really out of
whack with expectations.
 
Tony Hill said:
"Banias" and "Dothan" share a lot of the same technology as the
P6, but they definitely aren't the same. There were some rather
important changes made to the core; it most definitely is NOT just
a "PIII on a P4 bus" as many people have suggested.

Also, quite simply, there are only so many ways that you can
possibly make a different core. The PIII, the Athlon, the
Athlon64/Opteron and the Pentium-M all share a lot of common
concepts and ideas.

First P2 ("deuthchech" or something like that), ppga Celerons
(mendocinos), then fcpgas (coppermine) and tadaaa tualatins share the
same bus and work on the very same motherboards. That's as close as one
can get, so in my opinion it's the same core. Dothan is just a tweaked-up
tualatin (I bet one could make them run on old iBX). It's still the same
old core, just like Athlon FX is also the same tweaked-up slot K7 core.
Cut here, add there, but the basic design never changes.

Pozdrawiam.
 
On Sun, 04 Apr 2004 10:40:20 -0400, Tony Hill

Actually all I was really trying to point out is that the problems
Intel is having with the Prescott are NOT strictly process-related
problems as seemed to be the common consensus. For a chip with 125M
transistors running at 3+ GHz, it's power consumption [isn't?] really out of
whack with expectations.

Can it be? I've discovered a mistake in one of Tony Hill's posts?

RM
 
Robert Myers said:
On Sun, 04 Apr 2004 10:40:20 -0400, Tony Hill

Actually all I was really trying to point out is that the problems
Intel is having with the Prescott are NOT strictly process-related
problems as seemed to be the common consensus. For a chip with 125M
transistors running at 3+ GHz, it's power consumption [isn't?] really out of
whack with expectations.

Can it be? I've discovered a mistake in one of Tony Hill's posts?

Yes, Robert, you have. He typed "it's" when he (surely) intended
"its". However, this is a routine error on Tony's part. "It's", as
we all know, means "it is". ;-)

(since this is a grammatical, not spelling, error I've left out the
obligatory mispeling in my reply)
 
Robert said:
On Sun, 04 Apr 2004 10:40:20 -0400, Tony Hill

Actually all I was really trying to point out is that the problems
Intel is having with the Prescott are NOT strictly process-related
problems as seemed to be the common consensus. For a chip with 125M
transistors running at 3+ GHz, it's power consumption [isn't?] really out of
whack with expectations.


Can it be? I've discovered a mistake in one of Tony Hill's posts?

And Prescott heat is not out of whack with /whose/ expectations ?
It certainly is out of whack from the viewpoint of consumers -
Prescott can't compete with an Opteron/Athlon64/AthlonFX but uses
almost twice as much juice and produces almost twice as much heat.
Sure makes it easy for everyone to take no more than two seconds
to evaluate and discard the Prescott option - especially now that
air conditioning season is about to begin in the northern hemisphere.
 
First P2 (deuthchech or something like that), ppga Celerons
(mendocinos), then fcpgas (coppermine) and tadaaa tualatins share the
same bus and work on the very same motherboards. That's as close as one
can get, so in my opinion it's the same core. Dothan is just a tweaked-up
tualatin (I bet one could make them run on old iBX).

Not a hope in hell that the Dothan or Banias will run on an old iBX
system, you would have to totally redesign the bus to start with.

When you compare the Pentium-M "Banias" to the old PIII "Tualatin",
there are some significant differences. In fact, there are a lot of
things in there that more closely resemble AMD's Athlon than the old
PIII, ie the Micro-op fusion is not entirely unlike AMD's MacroOps.

The pipeline has also been lengthened, possibly to the same length as
AMD's K7 core or even the K8 core. Fortunately they basically
eliminated the problems associated with a longer pipeline by including
a totally redesigned branch prediction unit (much of which has now
found its way into the P4 "Prescott").

There are, of course, some noticeable similarities between the old
PIII and the Pentium-M though. These become most apparent in the
execution units, which are pretty much the same. When you get right
down to it, there are only a few ways to do execution units, and
adding extra units adds a lot of cost (more transistors and more
power) without much of a gain in performance.

Beyond those differences, the Pentium-M doubled the size of the L1
cache, doubled the size of the L2 cache and completely redesigned the
register stack unit.

Basically when you get down to it, the Pentium-M uses the same
execution units as the Pentium-III and most of the front-end decoders
are the same, but every other piece of the chip has been significantly
redesigned.
It's still the same old
core, just like Athlon FX is also the same tweaked-up slot K7 core.
Cut here, add there, but the basic design never changes.

The basic design never changes because there are only so many ways to
design a wide-issue, out-of-order execution superscalar chip. The
story with the AMD K7 vs. K8 core is pretty much the same as the PIII
vs. Pentium-M. Completely different bus design (this has been TOTALLY
changed for the K8, a rather dramatic departure), longer pipeline,
changed cache and TLB, improvements to the branch predictor, tweaks
and modifications to the scheduler, the addition of SSE2 and some
modifications to the instruction decoder. The execution units
remained more or less the same, but most of the rest of the chip has
changed.

If you look at the execution units of the Pentium-III, Pentium-M,
Athlon, Athlon64/Opteron and even the IBM PowerPC 970 and Alpha 21264,
you will find that they share a LOT of the same basic design. Does
this make them the same core?
 