Itanium finally passes Alpha at HP

  • Thread starter: Yousuf Khan
drooling). But one thing you leave out is the Alpha team's tendency to keep
projects going long after their planned release dates. EV8 would not be

My memory is as blurry as ever, but I faintly seem to remember that 21064 and
21164 were pretty much right on time. 21364 was late indeed, but with
plenty of excuses for it, given the tribulations of the company during
this time. As for 21264, I can't remember at all, tho I'd guess it was
a bit late, as most other CPUs are. So it seemed like the team was not
particularly bad w.r.t deadlines.


Stefan
 
Sorry for OT, but when hearing those statements from the above quote I
really ask myself why Alpha is being replaced by Itanium.

Why would you think the reasons are technical?


Stefan
 
Nick Maclaren said:
|>
|> It's also wise to observe exactly who was doing the talking - Terry Shannon,
|> well-known HP shill.

That is unfair. He is biassed, because he wouldn't get cooperation
if he wasn't, but he is not simply a shill.

I suppose if you have kept anywhere nearly as close track of Terry's pro-HP
blather as I have, especially since June 25, 2001, you have a right to make
such a statement (though I'll still challenge it).

Have you?

I'll append below a copy of the rebuttal to his two new 'analyses' that I
posted on comp.os.vms:

In "Beyond Superdome", he first waxes poetic about current Superdome
capabilities, such as their internal interconnect fabric. Let's see: this
is the server architecture (at least somewhat reminiscent of the old and
rather mediocre GS320 server architecture) that using 64 top-of-the-line
Itanics barely manages to stay ahead of the new POWER5 box that requires
only 16 processors (on a grand total of 8 chips, since they're dual-core) in
TPC-C, right?

Then he crows, "HP delivers dual core before Intel" as some kind of
significant achievement. Well, maybe. Of course, Sun is delivering
dual-core SPARC processors today, and IBM started delivering dual-core
POWER4s nearly three years ago. So what beating Intel to the punch mostly
proves is just how far behind the curve Itanic really is, I'd suggest.

Then he starts talking about "How Superdome will maintain leadership", but
in fact that will be impossible - because it's not in the lead right now, so
there's no way it can 'maintain' any lead. And in fact, it will only fall
farther behind during the time-frame during which any reasonably informed
projections can be made.

Terry's first projected performance graph for OLTP explains why. Doubling
current system performance by about a year from now actually sounds pretty
impressive, until you recognize that Superdome's TPC-C performance today
with 64 processors falls slightly behind today's previous-design-generation
POWER4+ systems that use only half that number of processors and only
slightly manages to beat today's POWER5 boxes that use only 1/4th as many
processors. POWER5 will shortly be available with up to 64 processors,
which means that it should beat today's Superdome performance by a factor of
at least 3 within a few months. When Montecito comes along late next year
it will indeed close much of this gap with POWER5 (Terry's second
TPC-C-specific performance graph suggests it should slightly exceed 2
million tpmC), but POWER5 (a full process generation behind Montecito but
still heading for about 3 million tpmC late *this* year) will no longer be
IBM's top-of-the-line product by then, since POWER5+ (in the same process
generation as Montecito) should then be shipping and upping the ante
significantly.

No, once the full-sized POWER5 boxes appear there's no way that
top-of-the-line Superdome OLTP performance should be able to reach more than
about 50% of top-of-the-line POWER performance any time soon. Maybe Tukwila
will help close that gap when it arrives in 2007. Or maybe not, because
POWER6 is due around then. And Fujitsu has regular enhancements to SPARC64
coming along to keep pace with Itanic (though not POWER), regardless of what
one may think of Sun's future efforts for that architecture.

But perhaps the more important observation is that all the glowing
descriptions Terry makes about Superdome are not only features that IBM
perfected many years ago but are things that make pricing anything *but*
commodity-level. So Superdome won't be offering industry-leading
performance *or* industry-leading price/performance, because the x86-64
brigade will be attacking it from beneath on the second front.

'Son of Superdome' will be "Superdome-centric with Alpha attributes"?
That's, like, deja vu all over again: exactly the kind of thing that people
like Terry and Kerry and Rob were telling us the week of June 25, 2001 -
except that the time-frame being discussed back then for the appearance of
the "Alpha/IA64 hybrid" was about now, not 2007.

Well, given that 'about now' is upon us and I don't see any "Alpha/IA64
hybrids" being benchmarked, 2007 seems at least a lot more credible. I
guess my prediction of 2006 three years ago was slightly optimistic, but for
a 5-year-out guesstimate I don't feel *that* ashamed of it.

Terry may have been able to make the Superdome story sound superficially
attractive, but it just doesn't stand up to scrutiny when the *rest* of the
industry is taken into account. And it's also worth considering just how
Terry's fawning descriptions of more long-term architecture development gibe
with the recent reports of a grinning Shane Robison wielding an axe to R&D
like Jack Nicholson in The Shining.

Moving right along, we come to "Why IPF and Why HP: Because SPARC is dead,
Power5 isn't ready for prime time and Extensions won't cut it in the
datacenter."

SPARC is dead, eh? Or 'no longer relevant', as a later slide says. Someone
better tell Fujitsu so it will stop stomping all over the latest Itanics in
commercial benchmarks like jbb2000: that's really not suitable behavior for
a 'dead' processor. And by all means make sure those HP customers who are
defecting to Sun know this: what on earth do you suppose they're thinking?!

As for POWER5 not being ready for prime time, I guess we can say good-bye to
IBM: if they've released an unready product to their customer base, said
base won't be with them for long. Or could one possibly suppose that Terry
is simply blowing yet more thick, black smoke out of his ass for HP?

Intel waited until 1986 to 'begin executing a plan to achieve microprocessor
dominance'? Don't let the iAPX-432 people hear you say that! Oh, wait -
that failed and disappeared after a few years of futile effort...

And Terry's still shouting as loudly as he can (what size font was that?)
that Compaq made the *right* decision to kill Alpha. Well, despite what he
claims, three-plus years later history really doesn't seem at all inclined
to support that thesis, but when you're talking about what history *will*
prove you always have a built-in response to such observations: just wait
some more...

Terry's purported 'analyses' of the relative potential of EPIC vs. RISC, of
relative performance predictions for Itanic vs. Alpha, and of the commercial
viability of Alpha remain as chock-full of shit as ever. They are no more
convincing when looking back from today's vantage point than they were three
years ago, and I'm not going to bother to debunk them in detail yet again:
POWER5 has already done a more than adequate job of doing that out in the
real world, and EV8 (which of course would be shipping today, had it not
been canceled) would have done an even better one. As for the idea that at
least Itanic would provide a compatible hardware platform on which to run
both IA64 *and* IA32 code, well... turns out the hardware supporting IA32
wasn't quite up to the job, so they're replacing it with software
emulation - you know, like Alpha used? But that emulation, though faster
than the previous disaster on Itanic, still can't hold a candle to native
IA32 processors that can run *both* 32-bit and 64-bit code at full speed.

And what's with the slide that shows 64-bit Itanic code out-performing IA32
by a factor of about 2:1 right about now? Last time I checked, they were
pretty much dead-even in many benchmarks, and where that was not true the
leads were split about evenly. Couldn't be just a *bit* of misdirection in
such a slide, could there?

Note how carefully Terry refers to x86-64 as 'extensions' to a 32-bit
architecture, rather than as an actual 64-bit architecture. Kind of makes
you wonder why he doesn't refer to IA32 as 'extensions' to a 16-bit
architecture, doesn't it? After all, that's exactly the same concept.

If you don't believe that IA32 qualifies as a 'real' 32-bit architecture
(despite rather a lot of commercial and scientific evidence to the
contrary), I guess you could swallow the suggestion that x86-64 isn't really
a 64-bit architecture, even though it shows every promise of competing on
equal (and often better) footing with the 'real' 64-bit architectures out
there. Ah, Terry. And if you think that Itanic offers any performance
advantage over x86-64, you haven't looked at benchmarks lately.

As for talking about the decline in quality of the trade press, Terry,
that's pretty hard to stomach coming from such a transparent HP
sock-puppet. But the bolder the lie, the more you seem attracted to it.

But inundating readers with dozens of pages of impressive-sounding buzzwords
that he himself apparently understands only in the vaguest terms may
actually be effective in convincing some portion of the population. It
really does seem that HP ought to be paying him *something* for this effort,
plus perhaps a significant tip for the total abandonment of personal
integrity that it requires.

- bill
 
Alex Johnson said:
This is untrue. The EV8 was reported as having vastly better
performance than Itanium or Pentium 4, and I was eagerly waiting for it
(almost drooling). But one thing you leave out is the Alpha team's
tendency to keep projects going long after their planned release dates.
EV8 would not be available realistically until 2006.

Utter crap.

- bill
 
I suppose if you have kept anywhere nearly as close track of Terry's pro-HP
blather as I have, especially since June 25, 2001, you have a right to make
such a statement (though I'll still challenge it).

Have you?

No, but I have read a fair number of his articles, both before and
after that, and have found them useful. I can tell you why our
opinions differ.

I regard the trumpet blowing, flag waving and generalised hype as
the content-free rubbish that it is. I doubt that you will disagree
with THAT - and, if you could get him drunk enough, I doubt that
Terry Shannon would, either. Producing that verbiage is the price of
getting the information that he does. I let that wash over my head.

Where he differs from the true shills is that he doesn't manipulate
the facts, and leaves the real information in there for those who
are prepared to dig it out. And I have generally found it pretty
reliable. There are other commentators who I have seen lying black
is white about hard facts in order to justify their position.

I agree that the quality of his articles has gone down as DEC gave
way to Compaq and Compaq to HP, because he has been trying to
maintain a positive spin in the face of a more and more negative
situation.


Regards,
Nick Maclaren.
 
Matt said:
Call me an Alpha fanboy but I find it simply amusing that a 0.18 µm
Alpha EV7z finally gets overtaken in terms of performance by a not
yet released 0.09 µm Madison with 5 times the amount of on-die cache.
Well, I wonder when the point of "overtaken" would have been reached
if the original plans for an EV79 (0.13µm, 3 MB L2, 1.8 GHz+) had been
realized and not delayed and finally canceled by HP.

All very valid, but a technical correction on your rant: Madison9M is
0.13 µm not 0.09 µm.
 
Bill said:
In "Beyond Superdome", he first waxes poetic about current Superdome
capabilities, such as their internal interconnect fabric. Let's see: this
is the server architecture (at least somewhat reminiscent of the old and
rather mediocre GS320 server architecture) that using 64 top-of-the-line
Itanics barely manages to stay ahead of the new POWER5 box that requires
only 16 processors (on a grand total of 8 chips, since they're dual-core) in
TPC-C, right?

I believe you have misinterpreted the "16 processor" POWER5. IBM
actually refers to chips. "16 processor" as reported is 16 POWER5
chips, comprised of 32 cores, allowing 64 threads of execution. So the
64-thread Madison vs the 64-thread POWER5 having similar performance is
just a sign that things are about equal. I'm stunned by how good POWER5
is. But I know that next year Montecito will go from 1 thread per
package to 4 threads per package. Itanium will be down to a 16P system
to compete with IBM's 16P system.

Then he crows, "HP delivers dual core before Intel" as some kind of
significant achievement. Well, maybe. Of course, Sun is delivering
dual-core SPARC processors today, and IBM started delivering dual-core
POWER4s nearly three years ago. So what beating Intel to the punch mostly
proves is just how far behind the curve Itanic really is, I'd suggest.

And Itanium being behind the curve is a joint decision between intel and
HP, pushed by HP. If not for staffing levels on Itanium a few years
back and HP pushing to be the first to do the interesting dual-core
project, there would have been a dual-core Itanium 2 on the market last
year.

Doubling
current system performance by about a year from now actually sounds pretty
impressive, until you recognize that Superdome's TPC-C performance today
with 64 processors falls slightly behind today's previous-design-generation
POWER4+ systems that use only half that number of processors and only
slightly manages to beat today's POWER5 boxes that use only 1/4th as many
processors.

As explained above, if you compare per thread, these machines are
equivalent in size (64P Madison, 32P * 2 cores POWER4+, 16P * 2 cores *
2 threads POWER5).

When Montecito comes along late next year
it will indeed close much of this gap with POWER5 (Terry's second
TPC-C-specific performance graph suggests it should slightly exceed 2
million tpmC), but POWER5 (a full process generation behind Montecito but
still heading for about 3 million tpmC late *this* year) will no longer be
IBM's top-of-the-line product by then, since POWER5+ (in the same process
generation as Montecito) should then be shipping and upping the ante
significantly.

I have not seen these graphs. Could you tell me what configuration
those X million tpmC results are for? 4P, 16P, 64P, 64 *thread*. How
are the estimates being made? I don't have a lot of TPC numbers, but I
know a 4-socket Madison today is 121K and a 4-socket POWER5 (yes, that's
16 threads) is 371K and Montecito is supposed to also be around 370K in
4-socket. It will be a tight race. If you could explain the
configurations, that would help me. If you could quote published
4-socket numbers for POWER4 and POWER4+, that would help me (I'm trying
to make a table).

And Fujitsu has regular enhancements to SPARC64
coming along to keep pace with Itanic (though not POWER), regardless of what
one may think of Sun's future efforts for that architecture.

SPARC is dead, eh? Or 'no longer relevant', as a later slide says.
Someone better tell Fujitsu so it will stop stomping all over the
latest Itanics in commercial benchmarks like jbb2000: that's really
not suitable behavior for a 'dead' processor. And by all means make
sure those HP customers who are defecting to Sun know this: what on
earth do you suppose they're thinking?!

Fujitsu is far ahead of Sun in performance, but they are far behind even
the laggard (intel) when it comes to features. They say dual-core at
end of '05, dual-core with 2 threads each sometime in '07. Compare that
to Montecito which is mid-'05 with dual-core and 2 threads per core.
Two years ahead. What Fujitsu *does* have that keeps pace, or even
stays ahead, is RAS. I don't have much hope for the SPARC family going
ahead. Niagara and Rock could either be a revolution or a flop, but I
think that if Sun sticks with SPARC-64, it will drag them to the bottom
of the ocean.

Itanium is a lousy performer in Java. That is because Java employs
self-modifying code and the Itanium spec explicitly states you can't do
that. It was shortsighted to put that in, but they hoped to escape the
IA32 complexities self-modifying code added to the design. It cost them
vast amounts of performance. It was analyzed and Montecito should have
much better jbb results as they've added new instructions and features
to directly speed up SMC. After all, intel targeted Sun when they
marketed Itanium. To leave Java performance in the shitter would be
marketing suicide.

Well, given that 'about now' is upon us and I don't see any "Alpha/IA64
hybrids" being benchmarked, 2007 seems at least a lot more credible. I
guess my prediction of 2006 three years ago was slightly optimistic, but for
a 5-year-out guesstimate I don't feel *that* ashamed of it.

Yeah, having "hybrid" designs now was always BS. It was from an
external guess with no information. Assuming the Alpha folks were
divided up and sent to each project being worked on in 2001, there might
be Alpha concepts coming out now, but intel kept the Alphans together
and gave them a new project of their own that wasn't on any roadmaps in
2001. Shannon just couldn't have known that and spread his rosy, idealized
picture of the future. It was unprofessional to report on what he'd
like to see rather than what he knows, but it's common practice.

Alex
 
My memory is as blurry as ever, but I faintly seem to remember that
21064 and 21164 were pretty much right on time. 21364 was late
indeed, but with plenty of excuses for it, given the tribulations of
the company during this time. As for 21264, I can't remember at
all, tho I'd guess it was a bit late, as most other CPUs are. So it
seemed like the team was not particularly bad w.r.t deadlines.

The EV6 was significantly delayed due to funding being chopped and
having to remove/not do development and debugging extras for the chip.
Net result was when early chips dropped their guts, you had 2 choices:
reset and read out the Jtag, nothing interesting here, move on, or
suck out a pile of bits and wonder which of them are live registers,
where the PC is, or thinks it is...

That delayed the 264, and also delayed the EV7, as it was to use the
EV6 core as a drop in part, sort of, plus delayed people moving onto
the EV8. Well done curly!

The EV8 was not far from 1st tape out when the axe fell I was told.

--
Paul Repacholi 1 Crescent Rd.,
+61 (08) 9257-1001 Kalamunda.
West Australia 6076
comp.os.vms,- The Older, Grumpier Slashdot
Raw, Cooked or Well-done, it's all half baked.
EPIC, The Architecture of the future, always has been, always will be.
 
Alex Johnson wrote:
[...]
Itanium is a lousy performer in Java. That is because Java employs
self-modifying code and the Itanium spec explicitly states you can't do
that.

The last few times I looked, HP's Itanium JVM (derived from Sun's Hotspot)
holds the top SpecJBB numbers, and not just at one price point but
for every hardware level (4 way, 16 way, 64 way). I must admit I haven't
looked in a couple of months though.

In the last results I saw, a 64 way Superdome had better numbers than
the top Sun platform (106 way ?).

The Itanium architecture has no problem with self-modifying code.
One just has to be explicit about it (sync.i). It's trivial for a JIT.
Bundle-sized atomic writes (documented in the publicly available
architecture documents) will provide further help once available.

I won't comment on the rest of your post.

Eric
 
|>
|> In the last results I saw, a 64 way Superdome had better numbers than
|> the top Sun platform (106 way ?).

That's the F15K with all possible MaxCats. 106 CPUs in a box is
a joke, and nobody has that many. We are seriously unusual, even
at 100. The UltraSPARC IIIcu never was the fastest CPU around,
and the strength of the F15K is that it maintains performance
under parallel, memory-limited loading. My guess is that benchmark
is heavily CPU-limited.

There is now the dual-core F25K, which can have up to 144 CPUs
in a box. That might be faster. But no current SPARCs are known
for blazing single-CPU performance.


Regards,
Nick Maclaren.
 
Alex Johnson said:
I believe you have misinterpreted the "16 processor" POWER5.

You believe incorrectly.

IBM
actually refers to chips.

What IBM may or may not 'refer' to (and they're not always very consistent
in this) is not what matters in this case. What matters is how TPC-C counts
processors - and TPC-C counts cores as processors.

"16 processor" as reported is 16 POWER5
chips, comprised of 32 cores, allowing 64 threads of execution.

Incorrect. It is 16 cores, on 8 chips, allowing 32 threads of execution.

So the
64-thread Madison vs the 64-thread POWER5 having similar performance is
just a sign that things are about equal.

Absolute rubbish. The POWER5 system uses 1/4 as many cores, on 1/8 as many
chips, to achieve 80% of the result. Equating dual-thread SMT to twice as
many cores is the kind of nonsense even someone like Terry probably would
not try to pull off: at *most*, it likely improves the TPC-C throughput of
each core by about 30%, which still leaves each 'raw' (non-SMT) POWER5 core
pumping *well* over twice the TPC-C throughput of each Itanic core (hardly
surprising, since even the 32-core non-SMT POWER4+ system marginally beat
out the 64-processor Superdome in TPC-C).
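The per-core argument above can be checked with some rough arithmetic. The sketch below is an illustration, not part of the original post. It uses the two TPC-C results quoted elsewhere in this thread (809,144 tpmC for the 16-core p5 570 and 1,008,144 tpmC for the 64-processor Superdome) and treats the 30% SMT gain as an assumed estimate rather than a measured figure. The function and variable names are made up for the example.

```python
# Rough per-core TPC-C comparison from figures quoted in this thread.
# ASSUMPTION: two-way SMT adds about 30% per core (an estimate, not a measurement).

POWER5_TPMC, POWER5_CORES = 809_144, 16          # IBM eServer p5 570 16P
SUPERDOME_TPMC, SUPERDOME_CORES = 1_008_144, 64  # HP Integrity Superdome

SMT_GAIN = 0.30  # assumed uplift from SMT; change it to test other guesses


def per_core(tpmc, cores):
    """Throughput per core."""
    return tpmc / cores


def core_ratio(smt_gain=0.0):
    """POWER5 per-core throughput over Itanium per-core throughput.

    A non-zero smt_gain divides the POWER5 figure by (1 + smt_gain) to
    approximate a 'raw' core with SMT switched off.
    """
    power5 = per_core(POWER5_TPMC, POWER5_CORES) / (1.0 + smt_gain)
    itanium = per_core(SUPERDOME_TPMC, SUPERDOME_CORES)
    return power5 / itanium


if __name__ == "__main__":
    print(f"total throughput, POWER5 vs Superdome: {POWER5_TPMC / SUPERDOME_TPMC:.0%}")
    print(f"per-core ratio as measured (SMT on):   {core_ratio():.2f}x")
    print(f"per-core ratio with SMT gain removed:  {core_ratio(SMT_GAIN):.2f}x")
```

With these inputs the SMT-discounted ratio comes out a little under 2.5, which is consistent with "well over twice". A larger assumed SMT gain shrinks the ratio accordingly.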

I'm stunned by how good POWER5
is. But I know that next year Montecito will go from 1 thread per
package to 4 threads per package. Itanium will be down to a 16P system
to compete with IBM's 16P system.

If you consider that having less than half the TPC-C performance of the
POWER5 system with an equal number of cores qualifies as 'competing with'
it, perhaps.

And Itanium being behind the curve is a joint decision between intel and
HP, pushed by HP. If not for staffing levels on Itanium a few years
back and HP pushing to be the first to do the interesting dual-core
project, there would have been a dual-core Itanium 2 on the market last
year.

More bullshit.

Adding staffing to Itanic wouldn't have speeded up the applicable process
technology significantly, so any dual-core Itanic released last year would
still have been in 130 nm. IIRC the current Madison core occupies about 43%
of the space on a 376 mm^2 chip: doubling it would have left no room for
the gargantuan caches that Itanic requires to achieve competitive
performance, not to mention creating a 200W chip to have to cool.
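The die-area point can be made concrete with a quick calculation. This is only a sketch built on the recalled ("IIRC") figures in the paragraph above: a 376 mm^2 die with the core taking about 43% of it. It also assumes the die size stays fixed and that everything outside the cores (cache, I/O, bus logic) shares whatever is left. The names are illustrative.

```python
# Die-area budget for a hypothetical dual-core 130 nm Madison.
# ASSUMPTIONS: 376 mm^2 die, core ~43% of it (recalled figures from the post),
# die size held constant when a second core is added.

DIE_AREA_MM2 = 376.0
CORE_FRACTION = 0.43


def area_left(cores):
    """Area in mm^2 left for cache and everything else after placing `cores` cores."""
    return DIE_AREA_MM2 * (1.0 - cores * CORE_FRACTION)


if __name__ == "__main__":
    for n in (1, 2):
        left = area_left(n)
        print(f"{n} core(s): {left:.1f} mm^2 left ({left / DIE_AREA_MM2:.0%} of the die)")
```

On those numbers a second core would cut the non-core area from roughly 214 mm^2 to roughly 53 mm^2, which is the squeeze on cache being described.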
As explained above, if you compare per thread, these machines are
equivalent in size (64P Madison, 32P * 2 cores POWER4+, 16P * 2 cores *
2 threads POWER5).

As explained above, that explanation is even more chock-full of shit than
Terry's tend to be.

I have not seen these graphs. Could you tell me what configuration
those X million tpmC results are for? 4P, 16P, 64P, 64 *thread*. How
are the estimates being made? I don't have a lot of TPC numbers, but I
know a 4-socket Madison today is 121K and a 4-socket POWER5 (yes, that's
16 threads) is 371K and Montecito is supposed to also be around 370K in
4-socket. It will be a tight race. If you could explain the
configurations, that would help me. If you could quote published
4-socket numbers for POWER4 and POWER4+, that would help me (I'm trying
to make a table).

Why don't you try getting a clue what you're talking about first? Learning
something about what SMT is and is not would be a good start. Then try
getting some *quantitative* idea about how much the different SMT
implementations you're so casually throwing together add to the performance
of the core they're associated with.

Fujitsu is far ahead of Sun in performance, but they are far behind even
the laggard (intel) when it comes to features.

Except that they seem to be trouncing Intel in jbb2000, and seem likely to
do very well in other commercial benchmarks given Fujitsu's experience in
large-system design. Funny about that.

They say dual-core at
end of '05, dual-core with 2 threads each sometime in '07. Compare that
to Montecito which is mid-'05 with dual-core and 2 threads per core.
Two years ahead.

Or zero years ahead, depending upon how useful Montecito's relatively crude
two-way SMT turns out to be. But even if SPARC64 falls slightly behind
Itanic in performance (a fact decidedly not yet in evidence) it probably
won't hurt Sun: it will still be far more relatively competitive in
performance than any Sun SPARC has been in recent memory, so should if
anything improve Sun's position.

- bill
 
Paul said:
That delayed the 264, and also delayed the EV7, as it was to use the
EV6 core as a drop in part, sort of, plus delayed people moving onto
the EV8. Well done curly!

The EV8 was not far from 1st tape out when the axe fell I was told.

I have to correct myself. I was not accurate when I guessed a 2006 EV8
release to the public. From the horse's mouth: "end of this year or
early 2005." So the Alpha with 4 threads would have been competing
against the POWER with 4 threads and in 6-9 months the Pentium 4 with 4
threads and the Itanium with 4 threads. All is right in the world
again, as every company stays relatively abreast of its competitors'
threadcount.

....except SPARC64, which will have 4 threads in 2007, from news earlier
this year.

Alex
 
You can tell by the low prices used alpha systems
1.) The Alpha servers/workstations available on E-Bay are
seldom with processors faster than 233 or 266 MHz.
In other words, only the really ancient stuff is being
sold on E-Bay - so it's no surprise that the prices are
low.

Components for upgrading more modern Alpha servers, such
as 1 and 2 GB Memory upgrades for Alpha servers, by contrast
are selling for big bucks. People are willing to pay big
premiums to keep their Alpha servers alive and well -
hardly a sign that the Alpha is history.

2.) Check the "for sale" newsgroups for your province or city.
Even in Saskatchewan (sk.forsale) we occasionally get local
sales of Alpha systems comparable to what is available on
E-Bay. However, those too are 233 and 266 MHz systems
almost all the time.

3.) What's wrong with using E-Bay but just limiting your search
to Canadian sellers ?

Still, there WERE some Alphas available some time ago, and I missed
my opportunity to get them... [I remember that in high school I was an
intern at a certain city hall, and there were workstations there with
96 KB of on-die L2 cache that ran at over 500 MHz, at a time when I was
using a brand new PPro... Anyway, they put them in the dumpster when
Compaq decided to kill Alpha, and got PCs as replacements...
[I would have loved to get that piece of HW as an upgrade from my 366
Celeron at the time, since I've been using Linux for quite a while.]
Nowadays people put things on eBay, but if there is a shop that uses
Alpha and is migrating away, there is a chance of getting one, if it's
of medium age, cheaply enough.]
 
Alex said:
This is untrue. The EV8 was reported as having vastly better
performance than Itanium or Pentium 4, and I was eagerly waiting for it
(almost drooling). But one thing you leave out is the Alpha team's
tendency to keep projects going long after their planned release dates.

The problem with *your* analysis is that you view the Alpha demise in
isolation.

Alpha development was drained of resources long before the axe fell.
 
Alex Johnson said:
I believe you have misinterpreted the "16 processor" POWER5. IBM
actually refers to chips. "16 processor" as reported is 16 POWER5
chips, comprised of 32 cores, allowing 64 threads of execution. So the
64-thread Madison vs the 64-thread POWER5 having similar performance is
just a sign that things are about equal. I'm stunned by how good POWER5
is. But I know that next year Montecito will go from 1 thread per
package to 4 threads per package. Itanium will be down to a 16P system
to compete with IBM's 16P system.

No, you are incorrect.

IBM always uses "16 processors" to mean the number of cores. So a
maximum p570 is 8 Power5 Dual core chips, 16 cores and with SMT 32
threads.

On a per chip basis then, Power5 is > 6X the performance on TPC-C
compared to the HP Superdome.

Here's their spec submission for a fully loaded p570:
http://www.spec.org/cpu2000/results/res2004q3/cpu2000-20040712-03234.html

And Itanium being behind the curve is a joint decision between intel and
HP, pushed by HP. If not for staffing levels on Itanium a few years
back and HP pushing to be the first to do the interesting dual-core
project, there would have been a dual-core Itanium 2 on the market last
year.


As explained above, if you compare per thread, these machines are
equivalent in size (64P Madison, 32P * 2 cores POWER4+, 16P * 2 cores *
2 threads POWER5).

Nope, wrong again. Power4+ is 16 chips * 2 cores. Power4+ is 4X better
per chip compared to Itanium in TPC-C.

I have not seen these graphs. Could you tell me what configuration
those X million tpmC results are for? 4P, 16P, 64P, 64 *thread*. How
are the estimates being made? I don't have a lot of TPC numbers, but I
know a 4-socket Madison today is 121K and a 4-socket POWER5 (yes, that's
16 threads) is 371K and Montecito is supposed to also be around 370K in
4-socket. It will be a tight race. If you could explain the
configurations, that would help me. If you could quote published
4-socket numbers for POWER4 and POWER4+, that would help me (I'm trying
to make a table).

IBM has not published any smaller configuration numbers for Power4+.

Here are the best non-clustered results in terms of performance for
Itanium and power5/power4+ on a per core basis.

4 cores
IBM eServer p5 570 4P - Oracle 10G - 194,391
HP Integrity rx5670 Linux - Oracle 10G - 136,110

8 cores
IBM eServer p5 570 8P - Oracle 10G - 371,044
Bull NovaScale 5080 - C/S SQL Server 2000 - 175,366

16 cores
IBM eServer p5 570 16P - IBM DB2 UDB 8.1 - 809,144
Unisys ES7000 Aries 420 Enterprise Server - SQL Server 2000 - 309,036

32 cores
IBM eServer pSeries 690 - IBM DB2 UDB 8.1 - 1,025,486
NEC Express5800/1320Xd - Oracle 10G - 683,575

64 cores
HP Integrity Superdome - Oracle 10G - 1,008,144

All itaniums used 1.5ghz chips
power5s were 1.9Ghz
the p690 (power4+) was 1.9ghz
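The per-chip claims made earlier in this post (POWER5 more than 6X, POWER4+ about 4X the Superdome) follow from the table once each result is divided by its chip count. Here is a minimal sketch of that normalization. The tpmC values and core counts are the ones listed above, and the chip counts are the ones stated in this post (8 dual-core POWER5 chips, 16 dual-core POWER4+ chips) plus the assumption of one core per Itanium 2 chip. The `Result` class and its field names are hypothetical, made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One TPC-C result from the table above, with its core and chip counts."""
    system: str
    tpmc: int
    cores: int
    chips: int

    @property
    def per_core(self):
        return self.tpmc / self.cores

    @property
    def per_chip(self):
        return self.tpmc / self.chips


RESULTS = [
    Result("IBM p5 570 16P (POWER5)", 809_144, cores=16, chips=8),
    Result("IBM p690 (POWER4+)", 1_025_486, cores=32, chips=16),
    # ASSUMPTION: one core per chip for the 1.5 GHz Itanium 2 parts.
    Result("HP Superdome (Itanium 2)", 1_008_144, cores=64, chips=64),
]


def per_chip_ratio(a, b):
    """How many times more tpmC per chip system `a` delivers than system `b`."""
    return a.per_chip / b.per_chip


if __name__ == "__main__":
    baseline = RESULTS[-1]
    for r in RESULTS:
        print(f"{r.system:26s} {r.per_core:9.0f}/core {r.per_chip:9.0f}/chip "
              f"{per_chip_ratio(r, baseline):5.2f}x per chip vs Superdome")
```

Run as written, the per-chip ratios land at about 6.4 for POWER5 and about 4.1 for POWER4+. Different vendors, databases and clock speeds are mixed in these results, so the ratios are a rough guide only.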

The 64 way (32 Chips * 2 cores) Power5 based p5 590 is due to be
announced in the next 2 months!

Another thing to note is performance of power5 is greater when using
DB2 compared to Oracle 10G. If IBM had submitted 4 way and 8 way
TPC-C results using DB2, the performance is likely to be ~220K and
~420K respectively.

Fujitsu is far ahead of Sun in performance, but they are far behind even
the laggard (intel) when it comes to features. They say dual-core at
end of '05, dual-core with 2 threads each sometime in '07. Compare that
to Montecito which is mid-'05 with dual-core and 2 threads per core.
Two years ahead. What Fujitsu *does* have that keeps pace, or even
stays ahead, is RAS. I don't have much hope for the SPARC family going
ahead. Niagara and Rock could either be a revolution or a flop, but I
think that if Sun sticks with SPARC-64, it will drag them to the bottom
of the ocean.

Itanium is a lousy performer in Java. That is because Java employs
self-modifying code and the Itanium spec explicitly states you can't do
that. It was shortsighted to put that in, but they hoped to escape the
IA32 complexities self-modifying code added to the design. It cost them
vast amounts of performance. It was analyzed and Montecito should have
much better jbb results as they've added new instructions and features
to directly speed up SMC. After all, intel targeted Sun when they
marketed Itanium. To leave Java performance in the shitter would be
marketing suicide.

A few things to note:
Montecito's dual thread implementation is not SMT, it is the much
simpler HMT. Performance increase expected from this is much less
than SMT.

Itanium's Java performance is not that lousy. It is approximately similar to
Power4+ and a bit less than SPARC64 V. Way behind Power5 though.

I'll do another comparison using specjbb2000

8 core
IBM eServer p5 570 1.9Ghz - 328996
Fujitsu PRIMEPOWER650 1.89Ghz- 213956
HP Integrity rx7620 Server - 190393

16 core
IBM eServer p5 570 1.9Ghz - 633106
Fujitsu PRIMEPOWER900 1.89Ghz - 402961
HP Integrity rx8620 Server 1.5Ghz - 341098

32 core
Fujitsu PRIMEPOWER1500 1.89Ghz - 663133
NX7700 i9510 1.5Ghz - 580536
IBM eServer pSeries 690 Turbo 1.7Ghz - 553480

48 core
Sun Fire E6900 1.2Ghz - 421773

64 core
HP Integrity Superdome server 1.5Ghz - 1008604
Fujitsu PRIMEPOWER2500 1.3Ghz - 835479

112 core
Fujitsu PRIMEPOWER2500 1.3Ghz - 1420177

Note that power4+ results didn't use their fastest processor (1.9Ghz)
The primepower 2500 is shipping (or about to) with 1.82Ghz Sparc64V.
No official results are in for that config yet.

Of course, Itaniums at 1.7GHz with 9MB of cache are due very soon as well.

As you can see, in specjbb2000, the lead of power5 is not as great as
the lead of that chip in tpc-c compared to itanium.
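
To put a number on that lead, here is a rough sketch using only the 8-core and 16-core SPECjbb2000 scores listed above (p5 570 against the rx7620 and rx8620). The dictionary layout and the `lead` helper are made up for illustration, and no TPC-C ratio is computed because matching Itanium TPC-C figures at these core counts are not quoted in this part of the thread.

```python
# SPECjbb2000 scores quoted in the lists above (higher is better).
SCORES = {
    8: {"power5": 328996, "itanium2": 190393},    # p5 570 vs rx7620
    16: {"power5": 633106, "itanium2": 341098},   # p5 570 vs rx8620
}


def lead(cores):
    """Return the Power5 score divided by the Itanium 2 score at a core count."""
    row = SCORES[cores]
    return row["power5"] / row["itanium2"]


for cores in sorted(SCORES):
    print(f"{cores} cores: Power5 leads by {lead(cores):.2f}x")
```

On these figures the Power5 lead in SPECjbb2000 works out to a bit under 2x at both core counts.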
 
Thu said:
Alex Johnson wrote in message
snip

sorry if I screwed up the attributions. The above may have been written
by Bill Todd. Not sure.

Anyway, finally crawled through the pitch. HP didn't do dual core.
What they did was redesign the package to allow two chips to fit in an
area where Intel packages one chip. Their "special relationship"
apparently lets them buy unpackaged die.

Terry has really slipped since he was writing about DEC.
I guess all his sources got laid off or sent to Intel.

del cecchi
 
Alex Johnson wrote:
[...]
Itanium is a lousy performer in Java. That is because Java employs
self-modifying code and the Itanium spec explicitly states you can't do
that.

The last few times I looked, HP's Itanium JVM (derived from Sun's Hotspot)
holds the top SpecJBB numbers, and not just at one price point but
for every hardware level (4 way, 16 way, 64 way). I must admit I haven't
looked in a couple of months though.

You've missed some recent results. HP's Itanium performance in
SPECjbb2000 isn't really what I would call "lousy", but it's
definitely NOT where Intel would like it to be I'm sure, particularly
if you look at 4-core systems (the most widely used config in this
benchmark). Here the fastest 5 chips are:

1. IBM Power5 - 170127
2. AMD Opteron - 133427
3. Intel XeonMP - 118031
4. Intel Itanium2 - 116466
5. IBM Power4 - 96377


The results for the Itanium are a bit dated and the most current
version of the hardware and software would probably move it up ahead
of the Xeon at least, but it seems unlikely that it could match the
Opteron's performance in this test while the Power5 looks well out of
reach.

Also of note is that HP's PA-8800 processor is still turning in some
VERY respectable scores in this test considering how dated the design
is.  There are no 4-core designs, but their 4-chip/8-core system turns in a
result of 214932, finishing just ahead of Fujitsu's SPARC64V at 213956
and noticeably ahead of the top 8-processor Itanium result of 190393
(again, a slightly dated result but on current hardware).


Anyway, as it stands right now it looks like IBM's Power5 is the chip
to beat in this test (as it is in pretty much all benchmarks) on the
high-end while AMD's Opteron and Intel's XeonMP are almost certainly
the "bang-for-buck" leaders (again, pretty much the norm here).
 
del cecchi said:
snip

sorry if I screwed up the attributions. The above may have been written
by Bill Todd. Not sure.

The last part you quoted was.
Anyway, finally crawled through the pitch. HP didn't do dual core.
What they did was redesign the package to allow two chips to fit in an
area where Intel packages one chip. Their "special relationship"
apparently lets them buy unpackaged die.

Actually, my impression was that Terry was referring to the true dual-core
PA-RISC 8800 (which I think is currently shipping), not to the dual-chip
kludge HP created as an interim band aid for Itanic.

- bill
 
That's the F15K with all possible MaxCats. 106 CPUs in a box is
a joke, and nobody has that many. We are seriously unusual, even
at 100. The UltraSPARC IIIcu never was the fastest CPU around,
and the strength of the F15K is that it maintains performance
under parallel, memory-limited loading. My guess is that benchmark
is heavily CPU-limited.

The OP could have done a simple SPEC query to find out; the HP
result seems to have lost its top billing to a 112-way Fujitsu
Prime Power system. (Which seems to deliver about as many "SpecJBBs"
per CPU per MHz as the HP system.)
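
As a quick check of that per-CPU-per-MHz comparison, a minimal sketch using the 64-way Superdome and 112-way PRIMEPOWER2500 SPECjbb2000 results quoted earlier in the thread. The `per_cpu_per_mhz` helper is just an illustrative name, and dividing by clock speed is a crude normalization, not anything SPEC defines.

```python
# (score, CPUs, clock in MHz) as quoted in the SPECjbb2000 list in this thread.
SYSTEMS = {
    "HP Integrity Superdome": (1008604, 64, 1500),
    "Fujitsu PRIMEPOWER2500": (1420177, 112, 1300),
}


def per_cpu_per_mhz(score, cpus, mhz):
    """Normalize a score by CPU count and by clock frequency in MHz."""
    return score / cpus / mhz


for name, (score, cpus, mhz) in SYSTEMS.items():
    print(f"{name}: {per_cpu_per_mhz(score, cpus, mhz):.2f} per CPU per MHz")
```

Both systems land in the neighbourhood of 10 on this measure, with the HP box slightly ahead.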

Casper
 
On a per chip basis then, Power5 is > 6X the performance on TPC-C
compared to the HP Superdome.
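
The arithmetic behind that figure can be sketched as follows. It assumes, as stated elsewhere in the thread, that the 16P p5 570 result (809,144) came from 8 dual-core Power5 chips and that the Superdome result (1,008,144) came from 64 single-core Itanium chips. The `per_chip` helper is only an illustrative name.

```python
def per_chip(tpmc, chips):
    """TPC-C throughput divided by the number of processor chips."""
    return tpmc / chips


# Assumption: 16 Power5 cores on 8 dual-core chips.
power5 = per_chip(809144, 8)
# Assumption: 64 single-core Itanium chips in the Superdome.
superdome = per_chip(1008144, 64)

ratio = power5 / superdome
print(f"Power5 per chip:    {power5:.0f}")
print(f"Superdome per chip: {superdome:.0f}")
print(f"Ratio:              {ratio:.2f}x")
```

Under those chip counts the ratio comes out a little above 6.4, which is consistent with the "> 6X" claim.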

What I would like to know is how the various designs work on benchmarks
that are not embarrassingly parallel (in a memory access sense). The
tradeoff appears to be between single-threaded performance and
scalability (even for non-communicating codes), and you can't have
both. Now, I haven't tested the Altix or SuperDome myself, but I
can witness that most other designs seem to do one or the other well,
but never both.

This is why SGI and Sun systems do much better in practice than that
sort of benchmark would imply - and do NOT trail in the way that they
are said to in the press.
Montecito's dual thread implementation is not SMT, it is the much
simpler HMT. Performance increase expected from this is much less
than SMT.

What on earth is that new TLA? Anyway, a simpler form of threading
is likely to be MORE efficient than Pentium 4 SMT, because the latter
is a technical failure (though a marketing success). Even Eggers'
model (MIPS-based) showed that there was a pretty marginal gain (and
might be none) over a simple CMT design.


Regards,
Nick Maclaren.
 