DC Opteron spanks IBM Hurricane Xeon

  • Thread starter: Yousuf Khan

Yousuf Khan

Not long after IBM released the Hurricane chipset for Xeons, which took
the crown away from the Opteron, a 4-way dual-core Opteron system came
out swinging. An IBM Hurricane-based 4-way xSeries 366 does 150,704
transactions per minute, while an HP DL585 4-way dual-core Opteron does
187,296 tpmC on the TPC-C benchmark.
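
(For scale: 187,296 vs. 150,704 is roughly 24% more throughput for the
Opteron box, or, read the other way around, the Xeon system comes in
about 20% lower.)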

Looks like the battle continues.

The Linux Beacon--Battle of the X64 Platforms
http://www.itjungle.com/tlb/tlb051005-story02.html

Yousuf Khan
 
Not long after IBM released the Hurricane chipset for Xeons, which took
the crown away from the Opteron, a 4-way dual-core Opteron system came
out swinging. An IBM Hurricane-based 4-way xSeries 366 does 150,704
transactions per minute, while an HP DL585 4-way dual-core Opteron does
187,296 tpmC on the TPC-C benchmark.

Looks like the battle continues.

The Linux Beacon--Battle of the X64 Platforms
http://www.itjungle.com/tlb/tlb051005-story02.html

That article says many more interesting things than that a dual-core
Opteron 4-way beats a single-core Xeon 4-way by a whopping 20%.

Interesting speculation about the origins of the x86-64 instruction
set (could be Intel's, after all, eh, George).

Even more interesting is the comment about the difficulty of going
beyond dual core on a single die: no explanation as to why, other than
hand-waving about cache hierarchies.

If you can't expand the number of cores and you can't speed up the
clock, then that's it, we're done before the end of Moore's law (short
of a completely new architecture; I like the obvious expandability of
the Cell architecture, myself).

RM
 
Yousuf Khan said:
Not long after IBM released the Hurricane chipset for Xeons, which took
the crown away from the Opteron, a 4-way dual-core Opteron system came
out swinging. An IBM Hurricane-based 4-way xSeries 366 does 150,704
transactions per minute, while an HP DL585 4-way dual-core Opteron does
187,296 tpmC on the TPC-C benchmark.

Looks like the battle continues.

The Linux Beacon--Battle of the X64 Platforms
http://www.itjungle.com/tlb/tlb051005-story02.html

Yousuf Khan

How does a 32-way Opteron do compared to a 32-way X3-based x366? And
isn't there a dual-core Xeon coming down the road from your buddies at
Intel? Pretty tough to beat Opteron until it runs out of HT links. :-)

del cecchi
 
Robert said:
Interesting speculation about the origins of the x86-64 instruction
set (could be Intel's, after all, eh, George).

There are only so many ways to skin a cat. Extending x86 out to 64-bit
can be done in a number of ways, all of which would end up looking
similar -- it has to be /x86/ in the end, after all. One thing that
could've been done differently would be whether to retain segment
registers in the extended instruction set. Another choice that was
entirely up to the designers was whether to extend the number of
general-purpose registers or not.
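
As a toy illustration of that last choice (my own sketch, nothing from
the article): AMD64 ended up adding eight new general-purpose
registers, r8 through r15, reachable only through the new REX prefix in
64-bit mode. With GCC on an x86-64 box, something like this exercises
one of them:

    /* Round-trip a value through r8, one of the eight GPRs that
     * AMD64 added on top of the classic eax..edi set. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t val = 0;
        __asm__ volatile (
            "movq $42, %%r8\n\t"   /* r8 only exists in 64-bit mode    */
            "movq %%r8, %0"        /* copy it back out to a C variable */
            : "=r" (val)           /* output: any general register     */
            :                      /* no inputs                        */
            : "r8"                 /* tell GCC we scribbled on r8      */
        );
        printf("value round-tripped through r8: %llu\n",
               (unsigned long long) val);
        return 0;
    }

Build the same thing as 32-bit code and it won't even assemble, since
%r8 simply isn't there.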
Even more interesting is the comment about the difficulty of going
beyond dual core on a single die: no explanation as to why, other than
hand-waving about cache hierarchies.

It just sounded like all he was saying was that you quickly get to a
situation where there are more hardware threads than software to fill
them: lots of execution resources (multiple cores, each with multiple
threads), but not enough programs to run on them. But I can't agree
with that assessment; even on a desktop system you have tons of
background processes running all the time. If you can run each of
those simultaneously without the need to timeslice them as much, then
you'll get a more responsive-feeling system, even if none of these
background processes uses a lot of CPU time.

Yousuf Khan
 
How does a 32-way Opteron do compared to a 32-way X3-based x366?

We'll have to wait until EITHER of those systems is available before
we can know for sure ;)
And isn't
there a dual-core Xeon coming down the road from your buddies at
Intel?

Dual-core Xeons for 1- and 2-processor workstations should be out late
this year or early next year. The dual-core version of the Xeon MP
used in the above-mentioned x366 is probably more than a year away
from release.
Pretty tough to beat Opteron until it runs out of HT links. :-)

Beyond 8 sockets you need to go to a cross-bar design for an Opteron,
and such a design might already be beneficial when going beyond 4
sockets. For the Xeon you have to go to a cross-bar design at 4 sockets
as well, so really the order of things doesn't change much here. The
only problem is that, to date, no company has released an Opteron
chipset designed to work in a cross-bar sort of setup. Those Newisys
folks have talked about one with their Horus chip, and I think they've
even done a few demos, but nothing that's available on the market yet.
 
Tony Hill said:
We'll have to wait until EITHER of those systems is available before
we can know for sure ;)


Dual-core Xeons for 1- and 2-processor workstations should be out late
this year or early next year. The dual-core version of the Xeon MP
used in the above-mentioned x366 is probably more than a year away
from release.


Beyond 8 sockets you need to go to a cross-bar design for an Opteron,
and such a design might already be beneficial when going beyond 4
sockets. For the Xeon you have to go to a cross-bar design at 4 sockets
as well, so really the order of things doesn't change much here. The
only problem is that, to date, no company has released an Opteron
chipset designed to work in a cross-bar sort of setup. Those Newisys
folks have talked about one with their Horus chip, and I think they've
even done a few demos, but nothing that's available on the market yet.

Sorry about the momentary lapse of reason. I was referring to the x460,
which was announced today. Here is the availability stuff:
--------------------------
The IBM eServer xSeries 460 is planned to be available in mid-June. The
x460 entry price starts at $18,129 in the U.S., and typical eight-way
configurations start at $72,182 in the U.S. IBM's eServer X3
architecture-based systems run new scalable 64-bit x86 operating system
software from major technology vendors including Microsoft, Red Hat and
Novell.
---------------------------------

As for performance, an 8-way is said to do 250K tpmC (the press release
is at http://biz.yahoo.com/iw/050601/087845.html ) and it is "dual-core
capable". It goes up to a 32-way.
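
(For rough scale, 250K tpmC on an 8-way is about a third more than the
187K the DL585 put up with four dual-core Opteron sockets.)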

Del Cecchi
 
That article says many more interesting things than that a dual-core
Opteron 4-way beats a single-core Xeon 4-way by a whopping 20%.

Interesting speculation about the origins of the x86-64 instruction
set (could be Intel's, after all, eh, George).

Ha-ha never miss a chance to diminish AMD do you?:-) One thing you maybe
missed of course is that this article is an Intel mouthpiece: he knows all
the Intel CPU *and* platform code names -- in *excruciating* detail -- well
into 2006 and still calls Opteron a "Sledgehammer"... d'oh, what happened
to Venus, Troy, Athens, Denmark, Italy, Egypt?... all of them publicly
available.

Then it's a "conspiracy theorist", err, theory. Surely Intel must have
looked at some kind of x86-64 at some time prior to AMD, but ISTR the
word "impossible" or maybe "impractical" being mentioned at one time
when they were trying to justify Itanium. Certainly Intel seemed to
want *something* of AMD's... or they would never have signed the
cross-license agreement of Jan 1, 2001.

It does also irk me when people refer to AMD64 as "memory
extensions"... it betrays a possible ignorance of the details IMO. The
one thing in AMD64 that we know is Intel's is the use of SSEx for
addressable FP registers... something which AMD adopted after the above
cross-license agreement. Recall that prior to that AMD had defined a
completely new FPU with 16 named registers.
Even more interesting is the comment about the difficulty of going
beyond dual core on a single die: no explanation as to why, other than
hand-waving about cache hierarchies.

Well IBM said it and they are ahead of the game here.
If you can't expand the number of cores and you can't speed up the
clock, then that's it, we're done before the end of Moore's law (short
of a completely new architecture; I like the obvious expandability of
the Cell architecture, myself).

As we already discussed, Cell as we know it would require a major rework
to get above 4GB memory and DP FPU - "doable" I suppose, but.....
 
George Macdonald said:
Then it's a "conspiracy theorist", err, theory. Surely Intel must have
looked at some kind of x86-64 at some time prior to AMD, but ISTR the
word "impossible" or maybe "impractical" being mentioned at one time
when they were trying to justify Itanium. Certainly Intel seemed to
want *something* of AMD's... or they would never have signed the
cross-license agreement of Jan 1, 2001.

At the time of the Intel/HP collaboration, Intel was working on P7,
which was a 64-bit x86 processor. I do not believe that Intel (or
anyone representing Intel) would have suggested anything implying that
x86-64 was "impossible" or "impractical". Switching from an x86-64 ISA
to IA64 was not because x86-64 was "impossible" or "impractical"; IA64
was simply believed to be "better".

When Intel finally decided to do the 64-bit extension to x86, it
could well have done whatever it wanted to do in terms of
programming-model extension. The problem, of course, is that the guy
up in the NW part of the US put his foot down and said that he'd not
support yet another ISA from Intel. It would have been an interesting
power struggle, but if BG wanted to be bullheaded about it, he'd win
by default. Nothing Intel could do.
As we already discussed, Cell as we know it would require a major rework
to get above 4GB memory and DP FPU - "doable" I suppose, but.....

IBM taped out a new CELL processor with a new PPE. The die size
grew from 221 mm^2 to 235 mm^2. Compared to that effort, any memory
system re-work needed to get more memory capacity support is relatively
minor.

Also, DP FPU is already in both the PPE and SPE, so I'm not sure what
you mean by "and DP FPU".
 
George said:
Ha-ha never miss a chance to diminish AMD do you?:-) One thing you maybe
missed of course is that this article is an Intel mouthpiece: he knows all
the Intel CPU *and* platform code names -- in *excruciating* detail -- well
into 2006 and still calls Opteron a "Sledgehammer"... d'oh, what happened
to Venus, Troy, Athens, Denmark, Italy, Egypt?... all of them publicly
available.

Oh, I don't know, the article seemed relatively even-handed, even if he
was more familiar with Intel's technology than AMD's. For example, he
went into tremendous historical background about Intel's power
management technologies and how they ended up in its server chips. But
then he sort of just said that AMD's PowerNow technology has been in
production in Opterons for a long time now, so it's nothing new for
AMD. Yes, it was much less verbiage for AMD, but not dismissive in any
way.

Oh, and he did refer to AMD64 by its previous name of x86-64, while
easily remembering to call Intel's version EM64T. Yes, definitely more
familiar with Intel technology, it looks like.
It does also irk me when people refer to AMD64 as "memory
extensions"... it betrays a possible ignorance of the details IMO. The
one thing in AMD64 that we know is Intel's is the use of SSEx for
addressable FP registers... something which AMD adopted after the above
cross-license agreement. Recall that prior to that AMD had defined a
completely new FPU with 16 named registers.

I get the feeling that AMD was playing the same game as Intel in that
case. There was tremendous speculation about whether AMD would adopt
Intel's SSE2 specs for K8-generation processors, much like there was
speculation about whether Intel would adopt AMD's version of x86-64. In
the end, both parties adopted each other's technology, letting software
companies breathe a huge sigh of relief. It was a big game of bluff.


Yousuf Khan
 
There are only so many ways to skin a cat. Extending x86 out to 64-bit
can be done in a number of ways, all of which would end up looking
similar -- it has to be /x86/ in the end, after all. One thing that
could've been done differently would be whether to retain segment
registers in the extended instruction set. Another choice that was
entirely up to the designers was whether to extend the number of
general-purpose registers or not.
As the article suggests, getting identical instruction sets, even under
what you regard as narrow design constraints, would have been about as
likely as winning the lottery.
It just sounded like all he was saying was that you quickly get to a
situation where there are more hardware threads than software to fill
them: lots of execution resources (multiple cores, each with multiple
threads), but not enough programs to run on them. But I can't agree
with that assessment; even on a desktop system you have tons of
background processes running all the time. If you can run each of
those simultaneously without the need to timeslice them as much, then
you'll get a more responsive-feeling system, even if none of these
background processes uses a lot of CPU time.
If that's what he was saying, he wasn't very explicit about it.
Server applications have an endless supply of threads. Maybe that's
the kind of very specific application he had in mind when talking
about Niagara.

RM
 
Robert said:
As the article suggests, getting identical instruction sets, even under
what you regard as narrow design constraints, would have been about as
likely as winning the lottery.

Oh, is that what the article suggests? I don't know why it even has to
suggest that. It's obvious in that case why they have identical
instruction sets -- Intel copied AMD. Quite legitimately, of course:
they do have a cross-licensing agreement which covers the instruction
set. That's what's allowed AMD to copy SSE3 so quickly too, as well as
the Hyper-Threading interfaces. Multi-core processors are identified
through the same software interfaces that Intel used to identify
Hyper-Threaded processors.
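
As a rough sketch of what those interfaces look like (my own example,
not the article's; it leans on GCC's <cpuid.h> helper), CPUID leaf 1
carries the HTT feature flag and the logical-processors-per-package
count that both kinds of chips report:

    /* Query CPUID leaf 1 and pull out the two fields that both
     * Hyper-Threaded and multi-core x86 chips use to describe
     * their logical processors. */
    #include <stdio.h>
    #include <cpuid.h>   /* GCC helper around the CPUID instruction */

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return 1;                      /* leaf 1 not supported */

        unsigned int htt     = (edx >> 28) & 1;    /* HTT feature flag */
        unsigned int logical = (ebx >> 16) & 0xff; /* logical CPUs per
                                                      physical package */

        printf("HTT flag: %u, logical processors per package: %u\n",
               htt, logical);
        return 0;
    }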
If that's what he was saying, he wasn't very explicit about it.
Server applications have an endless supply of threads. Maybe that's
the kind of very specific application he had in mind when talking
about Niagara.

I think there is a built-in assumption, almost an arrogance, that home
users don't multitask. Home users are multitasking without even being
aware of it most of the time.

Yousuf Khan
 
At the time of the Intel/HP collaboration, Intel was working on P7,
which was a 64-bit x86 processor. I do not believe that Intel (or
anyone representing Intel) would have suggested anything implying that
x86-64 was "impossible" or "impractical". Switching from an x86-64 ISA
to IA64 was not because x86-64 was "impossible" or "impractical"; IA64
was simply believed to be "better".

"Better"... but for whom though?:-) Intel detested the fact that x86 was a
semi-open architecture & ISA and Itanium solved that "problem". I honestly
don't remember the wording but recall that Intel was said to be concerned
about competitiveness with the RISC machines which everybody was raving
about at the time... something which could have been perceived as
impossible.
When Intel finally decided to do the 64-bit extension to x86, it
could well have done whatever it wanted to do in terms of
programming-model extension. The problem, of course, is that the guy
up in the NW part of the US put his foot down and said that he'd not
support yet another ISA from Intel. It would have been an interesting
power struggle, but if BG wanted to be bullheaded about it, he'd win
by default. Nothing Intel could do.

Oh I don't think the putting down of the "foot" was entirely due to hubris
as was suggested here recently. M$ had seen several non-x86 WinXX projects
die from lack of err, commitment... by vendors and/or users; the one thing
M$ absolutely needs is volume and I can't believe they are feeling too
happy about Itanium there. Yet another variant to maintain would have been
intolerable - at least one would have to die and by the time Intel threw
its EM64T hat in, AMD64 already had about a year's worth of time invested.

It would certainly be interesting to know to what extent Yamhill, whose
existence was denied for ~18 months, started out looking like AMD64. By
the time EM64T was revealed, given where AMD64 was, any attempt to
divert things would have appeared as a capricious act born of, umm,
hubris. After all, there are not a lot of different ways you can design
a compatible instruction set - why do it differently?
IBM taped out a new CELL processor with a new PPE. The die size
grew from 221 mm^2 to 235 mm^2. Compared to that effort, any memory
system re-work needed to get more memory capacity support is relatively
minor.

I guess it depends on how high you want to go, but to get to 4GB with
512Mbit chips required some fairly fancy footwork already, with
different-width memory chips, if I'm remembering things right.
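
(Simple arithmetic: 4GB out of 512Mbit parts is 64 DRAM chips, which is
a lot of devices to hang off Cell's XDR interface.)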
Also, DP FPU is already in both the PPE and SPE, so I'm not sure what
you mean by "and DP FPU".

I don't have time to look up the details right now, but ISTR that the
DP performance was not even in the same ballpark as the SP - it just
wasn't good enough.
 
Oh, I don't know, the article seemed relatively even-handed, even if he
was more familiar with Intel's technology than AMD's. For example, he
went into tremendous historical background about Intel's power
management technologies and how they ended up in its server chips. But
then he sort of just said that AMD's PowerNow technology has been in
production in Opterons for a long time now, so it's nothing new for
AMD. Yes, it was much less verbiage for AMD, but not dismissive in any
way.

The very fact that he didn't take the trouble to err, enrich:-) his
knowledge on AMD cores tarnishes his perspective - possibly not dismissive
but somewhat condescending.
Oh, and he did refer to AMD64 by its previous name of x86-64, while
easily remembering to call Intel's version EM64T. Yes, definitely more
familiar with Intel technology, it looks like.


I get the feeling that AMD was playing the same game as Intel in that
case. There was tremendous speculation about whether AMD would adopt
Intel's SSE2 specs for K8-generation processors, much like there was
speculation about whether Intel would adopt AMD's version of x86-64. In
the end, both parties adopted each other's technology, letting software
companies breathe a huge sigh of relief. It was a big game of bluff.

Remember it took Intel 3 years to make the leap - gawd that must have been
hard after denying the very existence of Yamhill for such a while.:-)
 
I guess it depends on how high you want to go, but to get to 4GB with
512Mbit chips required some fairly fancy footwork already, with
different-width memory chips, if I'm remembering things right.

1. Is this really a problem? If we're talking about blades and compute
farms, 4GB per CPU (8GB and 2 CPUs per node) really should be enough
for anything. If we're not talking about blades, and you really want
4GB per CPU, then...

2. FB-DIMM controllers can be easily integrated into the Northbridge.
The pin counts are relatively low, the FlexIO interfaces can certainly
support the additional BW, and the longer latencies can be amortized
in the type of applications (large memory capacity, FP-dominant
number-crunching codes) that we're presumably discussing here. The
drawback is that a couple of channels of fully populated FBDs will
quickly eat a lot of power, and you can't do blades, but certainly
1U or 2U is doable.
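
(Rough numbers: FB-DIMM allows up to 8 DIMMs per channel, and the AMB
on each DIMM is commonly quoted at something like 4-6W, so two fully
populated channels would be on the order of 60-100W in buffer chips
alone, before counting the DRAM itself.)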
I don't have time to look up the details right now, but ISTR that the
DP performance was not even in the same ballpark as the SP - it just
wasn't good enough.

The CELL processor has > 200 GFlops of SP compute power. Saying that
the CELL processor is inadequate because the DP performance is not in
the same ballpark as SP performance is entirely silly. There is no
device that has DP performance in the same ballpark as 200+ GFlops.

To see if the CELL processor is "good enough", you'll have to define
what "good enough" means. Opteron {SC,DC}, Itanium {Madison,Montecito},
Pentium {4,D,M}, etc.: are any of these devices "good enough" in terms
of DP FP performance? How many DP flops can each of these devices
produce per cycle, and of what kind: DP FADD ops, DP FMUL ops, DP FMADD
ops?

FWIW, each SPE in the CELL processor can produce 2 DP FMADD ops every
7 cycles, and the PPE can sustain a throughput of 1 DP FMADD op per
cycle. Pick your favorite device and compare.
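
To put rough numbers on that (assuming the 3.2GHz clock that has been
quoted for the first CELL parts): 8 SPEs x 2 DP FMADDs per 7 cycles x 2
flops per FMADD x 3.2GHz is about 14.6 GFlops, plus roughly 6.4 GFlops
from the PPE's 1 FMADD per cycle -- call it ~21 GFlops DP peak, against
the 200+ GFlops SP figure.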
 
1. Is this really a problem? If we're talking about blades and compute
farms, 4GB per CPU (8GB and 2 CPUs per node) really should be enough
for anything. If we're not talking about blades, and you really want

2. FB-DIMM controllers can be easily integrated into the Northbridge.
The pin counts are relatively low, the FlexIO interfaces can certainly
support the additional BW, and the longer latencies can be amortized
in the type of applications (large memory capacity, FP-dominant
number-crunching codes) that we're presumably discussing here. The
drawback is that a couple of channels of fully populated FBDs will
quickly eat a lot of power, and you can't do blades, but certainly
1U or 2U is doable.

I didn't say it couldn't be done - I said it's a lot of work, and it
still seems like it... especially for someone who wants to justify the
work by targeting general computing markets.
 