Amazing K8 SPECfp score

David Kanter

According to
http://www.sun.com/servers/x64/x4600/benchmarks.jsp#question1 Sun
appears to get around 3500 on SPECfp.

They seem to be using some sort of autoparallelizing trick in Sun's
compiler (which is rather well known for SPECtacular optimizations). I
am definitely impressed and quite surprised...

Of course, this trick may be adopted by other vendors just as easily;
the Sun folks hardly have a monopoly on good compilers (and quite
frankly, I would expect PathScale's to be far more common).

DK
 
So now the question is, is it the cores, the compiler, the o/s, the
communication between the cores, the memory subsystem, all or some of
the above, or something else altogether or as well?

Makes the Woodcrest question less cut and dried.

Ryan
 
Ryan said:
So now the question is, is it the cores, the compiler, the o/s, the
communication between the cores, the memory subsystem, all or some of
the above, or something else altogether or as well?

Makes the Woodcrest question less cut and dried.

Two words: "Intel compilers". That's now in contrast to another two
words: "Sun compilers".

Yousuf Khan
 
Yousuf said:
Two words: "Intel compilers". That's now in contrast to another two
words: "Sun compilers".

It's almost certainly Sun's compilers. Unfortunately, we won't know
until they publish detailed results...

DK
 
Yousuf said:
Two words: "Intel compilers". That's now in contrast to another two
words: "Sun compilers".

Pity, I was really looking forward to carrying on the times where it
was easy to make choices - not long ago you could have a big heater or
a fast cpu, but not both. Now it might really be necessary to do some
work finding out which camp suits an application or o/s etc better.

So is Sun's a better compiler or a better SPEC-targeted compiler, do
you reckon?

Ryan
 
Ryan said:
So is Sun's a better compiler or a better Spec targetted compiler do
you reckon?


Hard to say, just as it's hard to say in the case of Intel's compilers
too. Sometimes Intel compilers are known only for their benchmark
proficiency. I think Sun being one of the big boys in this game too is
not above doing the same sort of thing. But then again, you can't get
good benchmarks without also producing good code.

Yousuf Khan
 
YKhan said:
Hard to say, just as it's hard to say in the case of Intel's compilers
too. Sometimes Intel compilers are known only for the benchmark
proficiency. I think Sun being one of the big boys in this game too is
not above doing the same sort of thing. But then again, you can't get
good benchmarks without also producing good code.

To be perfectly honest, Sun's compiler group is far more aggressive
than Intel's in 'cracking' benchmarks; think about how they cracked
art (the 179.art subtest). However, their compilers are also probably
used far more often, simply because that's the compiler you use for
SPARC/Solaris.

A lot of the optimizations really don't help much, and that's doubly
the case in peak SPEC scores. In peak submissions, you can use any
combination of compiler flags for any benchmark; with base scores, you
have to use the same compiler flags for all tests.

While it's unclear how often high levels of compiler optimization are
used, the latter (base) seems much more realistic.
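
As a rough illustration of the base versus peak distinction, here is a
minimal Python sketch. It assumes the usual SPEC CPU2000 scoring (each
benchmark's ratio is 100 times reference time over measured time, and the
overall score is the geometric mean of those ratios). The benchmark names,
flag sets and run times are all made up, and the real base rules are more
detailed than "one flag set for everything".

```python
from math import prod

# Hypothetical reference and measured run times in seconds.
# None of these are real SPEC results.
REF_TIMES = {"bench_a": 1000.0, "bench_b": 2000.0, "bench_c": 1500.0}
RUN_TIMES = {
    "flags_x": {"bench_a": 100.0, "bench_b": 400.0, "bench_c": 300.0},
    "flags_y": {"bench_a": 200.0, "bench_b": 250.0, "bench_c": 300.0},
}


def geomean(values):
    """Geometric mean of an iterable of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def ratio(bench, run_time):
    """Per-benchmark ratio: 100 * reference time / measured time."""
    return 100.0 * REF_TIMES[bench] / run_time


def base_score():
    """Base (simplified): one flag set for every benchmark; keep the best set."""
    return max(
        geomean(ratio(bench, t) for bench, t in times.items())
        for times in RUN_TIMES.values()
    )


def peak_score():
    """Peak (simplified): pick the fastest flag set separately per benchmark."""
    return geomean(
        ratio(bench, min(times[bench] for times in RUN_TIMES.values()))
        for bench in REF_TIMES
    )


print(f"base: {base_score():.0f}")
print(f"peak: {peak_score():.0f}")
```

In this toy model peak can never come out below base, because it is free
to choose the best flag set for each benchmark individually.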

DK
 
So basically with a good compiler, the Opteron beats Woodcrest?

Ryan
 
David Kanter said:
According to
http://www.sun.com/servers/x64/x4600/benchmarks.jsp#question1 Sun
appears to get around 3500 on SPECfp.

They seem to be using some sort of autoparallelizing trick in Sun's
compiler (which is rather well known for SPECtacular optimizations). I
am definitely impressed and quite surprised...

Of course, this trick may be adopted by other vendors just as easily,
the Sun folks hardly have a monopoly on good compilers (and quite
frankly, I would expect Pathscale's to be far more common).

I think this is one that we're going to have to hold off on until we
get a bit more detail. The results are absolutely shocking, and as
the old saying goes, "If it sounds too good to be true..."

For comparison, the best result Sun has so far is 2518 from an x4200
system using a 2.8GHz single-core Opteron. This is using the same
Solaris 10 and Sun Studio 11 OS and compiler mentioned above. Now
DDR2 should help a reasonable amount in SPEC CFP and there are 3.0GHz
Opterons out there, but the result still seems rather unbelievable.
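
A quick sanity check of those numbers, as a small Python sketch. It uses
the 2518 figure from this post and the "around 3500" figure quoted at the
top of the thread, and it assumes (optimistically, and without
confirmation from Sun) that the new result came from a 3.0GHz part and
that the score scales linearly with clock speed.

```python
# Figures taken from the thread; linear clock scaling is an assumption.
X4200_SCORE = 2518.0   # best prior Sun result, 2.8GHz Opteron
X4600_SCORE = 3500.0   # approximate figure quoted for the x4600
OLD_CLOCK_GHZ = 2.8
NEW_CLOCK_GHZ = 3.0    # assumed; the thread does not confirm the clock


def clock_scaled(score, old_ghz, new_ghz):
    """Scale a score linearly with clock speed (an optimistic assumption)."""
    return score * new_ghz / old_ghz


overall_gain = X4600_SCORE / X4200_SCORE
expected = clock_scaled(X4200_SCORE, OLD_CLOCK_GHZ, NEW_CLOCK_GHZ)
unexplained_gain = X4600_SCORE / expected

print(f"overall gain:       {overall_gain:.2f}x")
print(f"clock-scaled score: {expected:.0f}")
print(f"left to explain:    {unexplained_gain:.2f}x")
```

Under those assumptions the clock bump alone would account for a score of
roughly 2700, leaving a gap of about 1.3x for DDR2, the compiler and
anything else to explain.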
 
Ryan said:
So basically with a good compiler, the Opteron beats Woodcrest?

Hardly. Not only has Sun not released all the appropriate information,
but you are comparing a server that is configured with 4 CPUs, and is
in an entirely different class.

Just as an example, many of the benchmarks in SPECfp gain quite a bit
from extra bandwidth. What if Sun's compiler is doing something
ridiculous like having the other 3 CPUs fetch data from memory, and
then pipe it into the actual K8 running SPECfp using HT?

DK
 
So are you now suggesting that one Opteron doing the heavy lifting can
beat multiple Woodcrests - interesting turnaround, I applaud your
ability to change your mind though, very nimble.
 
Ryan said:
So are you now suggesting that one Opteron doing the heavy lifting can
beat multiple Woodcrests - interesting turnaround, I applaud your
ability to change your mind though, very nimble.

Um, you didn't understand what I said or are purposefully
misinterpreting/twisting it. Let me try and clarify this:

1. SPECfp2000 scores on Woodcrest use a single processor
2. SPECfp2000 is a single threaded benchmark, SPECfp2000_rate uses
multiple processors or threads for computation
3. SPECfp2000 *computation* cannot be sped up using multiple MPUs
4. I am theorizing that perhaps memory activity can be coordinated by
multiple processors, using software directed prefetch or other tricks
5. Sun has yet to release any detailed information about this score,
we have no numbers for subtests, no compiler flags, nor explanations of
those flags.
6. Without this information it is hard to figure out what is happening

IOW, you are definitely jumping to conclusions prematurely. It is
unreasonable to conclude anything based on this information at present;
however, it is quite remarkable and worthy of further examination. It
is rather disappointing that Sun isn't releasing actual benchmark info
and is just giving out scores (which may or may not be certified).

DK
 
David Kanter said:
To be perfectly honest, Sun's compiler group is far more aggressive
than Intel's in 'cracking' benchmarks, think about how they cracked
art. However, their compilers are also probably used far more often,
simply because that's the compiler you use for SPARC/Solaris.

Given Intel's history & reputation of unusable compilers and plugins...
because the optimizations were too umm, "hard-coded", that'd be surprising.
Sun is also directly answerable to clients who buy the hardware, OS and
compiler from a single source - kinda difficult to cover all the bases and
be too adventurous with compiling.

David Kanter said:
A lot of the optimizations really don't help much, and that's doubly
the case in peak SPEC scores. In peak submissions, you can use any
combination of compiler flags for any benchmark, with base scores, you
have to use the same compiler flags for all tests.

While it's unclear how often high levels of compiler optimization are
used, the latter seems much more realistic.

Spec is becoming a joke - bend it to whatever you need/want.
 
Ryan said:
Pity, I was really looking forward to carrying on the times where it
was easy to make choices - not long ago you could have a big heater or
a fast cpu, but not both. Now it might really be necessary to do some
work finding out which camp suits an application or o/s etc better.

I think you may have hit it on the head with OS. The dual core glitches
with Windows just seem to keep multiplying with new drivers and hot fixes.
Some samples: http://support.microsoft.com/?id=896256
http://support.microsoft.com/?kbid=330512 and
http://www.xtremesystems.org/forums/showthread.php?t=81429&highlight=amd+optimizer

Ryan said:
So is Sun's a better compiler or a better Spec targetted compiler do
you reckon?

....or is the problem here Spec?
 
David Kanter said:
3. SPECfp2000 *computation* cannot be sped up using multiple MPUs
4. I am theorizing that perhaps memory activity can be coordinated by
multiple processors, using software directed prefetch or other tricks

Such as Reverse HyperThreading, which "does not exist". But, since
the number of cores available will only go up, think of these
"software directed prefetch or other tricks" as a legitimate way to
squeeze more performance from multiple cores even when running an app
that is essentially single threaded. The only question is: are these
compiler enhancements applicable to general use, or are they hardcoded
to get impressive Spec and can't do much else?

NNN
 
David Kanter said:
Um, you didn't understand what I said or are purposefully
misinterpreting/twisting it. Let me try and clarify this:

1. SPECfp2000 scores on Woodcrest use a single processor
2. SPECfp2000 is a single threaded benchmark, SPECfp2000_rate uses
multiple processors or threads for computation
3. SPECfp2000 *computation* cannot be sped up using multiple MPUs
4. I am theorizing that perhaps memory activity can be coordinated by
multiple processors, using software directed prefetch or other tricks
5. Sun has yet to release any detailed information about this score,
we have no numbers for subtests, no compiler flags, nor explanations of
those flags.
6. Without this information is hard to figure out what is happening

IOW, you are definitely jumping to conclusions prematurely. It is
unreasonable to conclude anything based on this information at present,
however, it is quite remarkable and worthy of further examination. It
is rather disappointing that Sun isn't releasing actual benchmark info
and is just giving out scores (which may or may not be certified).

DK

To be fair (which I try to do as little as possible :) ), I have a
feeling that we're starting to see many more dimensions coming into
play in the performance arena. I think it's going to get harder and
harder to determine how a particular application will act on any cpu /
platform / amount of ram etc. It's always been the case that the only
real hard evidence was benchmarking your app on the hardware you fancy
buying, but it's now getting more relevant. It used to be sort of
possible to get a gut feel about a particular app's performance based
on similar things, but those days are disappearing fast.

The relevance of Spec to anything but Spec has been in doubt for a
little while.

As to what Sun have done I'll be mighty impressed if it's a general
piece, and kudos to them. We'll have to wait and see.

In terms of jumping to conclusions I was trying to out-hype Intel - an
obviously impossible task, but worth it for a giggle.

Ryan
 
George said:
I think you may have hit it on the head with OS. The dual core glitches
with Windows just seem to keep multiplying with new drivers and hot fixes.
Some samples: http://support.microsoft.com/?id=896256
http://support.microsoft.com/?kbid=330512 and
http://www.xtremesystems.org/forums/showthread.php?t=81429&highlight=amd+optimizer

Interesting reading George - so it's time for all the benchmarks on
Windoze on hyperthreaded, dual core or dual cpu machines to be redone
:) This just gets better and better.

Sun's experience in the multi cpu and multithreaded world may be a big
strength in Solaris. In the past this would not have been a big deal,
as the number of cores was pretty limited and raw power was where it
was at. Things are changing pretty fast at the moment. For a decade
or two, the only people with more than one core had loads of money,
now it's pretty well anybody who fancies it. So Solaris's time might
have come - stranger things have happened.

George said:
...or is the problem here Spec?

I think the utility of Spec is to see how fast Spec runs. It's a
single point in the overall performance spectrum, and likely worth as
much as many others and more than some.

Good links - thanks for those

Ryan
 
NNN said:
Such as Reverse HyperThreading, which "does not exist". But, since
the number of cores available will only go up, think of these
"software directed prefetch or other tricks" as a legitimate way to
squeeze more performance from multiple cores even when running an app
that is essentially single threaded.

No, it is not clear that they are valid. Suppose my theory is right.
The compiler is using extra MPU sockets as remote bandwidth to do
prefetch. Then this only works if you were stupid enough to buy a
system with more sockets and MPUs than you need. Why would you ever do
that?

Moreover, this clearly requires recompiling your applications to get
any benefit, and it is unclear whether this benefits all subtests, or
just one or two (and if it has a performance penalty).

Think about it this way, the EV7 also adds memory bandwidth with MPUs.
There's no reason why you couldn't get an 8P system and do the same
thing....

except that nobody who buys servers is stupid enough to buy more
sockets than they really need. Especially when you cross the boundary
from 1/2S where the difference isn't huge, to a 4-8S system, where the
price difference is quite substantial.

NNN said:
The only question is: are these
compiler enhancements applicable to general use, or they are hardcoded
to get impressive Spec and can't do much else?

YMMV : ) The bottom line is we will need to wait and see.

DK
 
Ryan said:
To be fair (which I try to do as little as possible :) ), I have a
feeling that we're starting to see many more dimensions coming into
play in the performance arena. I think it's going to get harder and
harder to determine how a particular application will act on any cpu /
platform / amount of ram etc.

To some extent this is inevitable because you have a lot of interaction
effects between

Ryan said:
It's always been the case that the only
real hard evidence was benchmarking your app on the hardware you fancy
buying, but it's now getting more relevant. It used to be sort of
possible to get a gut feel about a particular apps performance based
on like things, but those days are disappearing fast.

I'm not really sure that's true. I think it's just that there are more
variables now than before. I'd certainly agree that the days are gone
when you could simply count instruction latency and figure out
performance based on that...

Ryan said:
The relevance of Spec to anything but Spec has been in doubt for a
little while.

Well...I think the problem with SPEC is that their benchmark suite was
too long in the tooth. Believe it or not, SPEC is still highly
relevant to a bunch of applications; however, they needed a revision in
2004. I think most benchmarks usually only have a lifetime of around 4
years at best.

Ryan said:
As to what Sun have done I'll be mighty impressed if it's a general
piece, and kudos to them. We'll have to wait and see.

In terms of jumping to conclusions I was trying to out-hype Intel - an
obviously impossible task, but worth it for a giggle.

Sun's a good one to study then : )

DK
 