Amazing K8 SPECfp score

Tony Hill · Jul 15, 2006

No it is not clear that they are valid. Suppose my theory is right.
The compiler is using extra MPU sockets as remote bandwidth to do
prefetch. Then this only works if you were stupid enough to buy a
system with more sockets and MPUs than you need. Why would you ever do
that?

Simple answer to your question: If it gives you a 40% increase in
floating point performance for your high-cost application.

Seriously, if these results are real, I don't think your possible
explanation for the performance invalidates them. It does, however,
place a whole new caveat on the benchmark. Is a 4P Opteron server
that gets 3500 CFP and costs you $20,000 worth the price vs. a 1P
Power5+ system that gets 3000 CFP and costs $10,000?

In any case, we definitely need more info to make any sort of final
judgment on this.
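To put rough numbers on that comparison, here is a minimal sketch of the price/performance arithmetic. The scores and prices are only the hypothetical figures quoted in the post, not measured results, and the helper name `cfp_per_kusd` is made up for illustration.

```python
# Hypothetical figures taken from the post above; not measured results.
systems = {
    "4P Opteron": {"cfp": 3500, "price_usd": 20_000},
    "1P Power5+": {"cfp": 3000, "price_usd": 10_000},
}


def cfp_per_kusd(cfp, price_usd):
    """Return the SPECfp score bought per $1,000 of system price."""
    return cfp / (price_usd / 1000)


ratios = {name: cfp_per_kusd(**spec) for name, spec in systems.items()}

# What the extra $10,000 buys when moving from the cheaper box to the dearer one.
extra_cfp = systems["4P Opteron"]["cfp"] - systems["1P Power5+"]["cfp"]
extra_kusd = (systems["4P Opteron"]["price_usd"]
              - systems["1P Power5+"]["price_usd"]) / 1000
marginal = extra_cfp / extra_kusd

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f} CFP per $1,000")
print(f"Marginal: {marginal:.0f} CFP per extra $1,000")
```

On these made-up numbers the 4P box only wins if the absolute score matters more than the cost per unit of score, which is the caveat being raised.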

David Kanter · Jul 15, 2006

No it is not clear that they are valid. Suppose my theory is right.

Simple answer to your question: If it gives you a 40% increase in
floating point performance for your high-cost application.

Seriously, if these results are real, I don't think your possible
explanation for the performance invalidates them. It does, however,
place a whole new caveat on the benchmark. Is a 4P Opteron server
that gets 3500 CFP and costs you $20,000 worth the price vs. a 1P
Power5+ system that gets 3000 CFP and costs $10,000?

In any case, we definitely need more info to make any sort of final
judgment on this.

I think that's a rather fair stance to take. The main issues would be:

1. Can you recompile your application?
2. How do licensing fees for this work?
3. Is this specific to Solaris and Sun Studio or can GCC/ICC/Pathscale
do it?
4. What classes of applications does this work for? Is it only loop
oriented code with predictable data flow? Does it only work for codes
that require massive bandwidth?

The whole situation is quite unclear and will remain so until Sun can
shed some light on the matter.

DK

Ryan Godridge · Jul 15, 2006

I think that's a rather fair stance to take. The main issues would be:

1. Can you recompile your application?

Good point.

2. How do licensing fees for this work?

This is the Sun Studio compiler, and as I understand it, you can get
it free. Or at least that's the case with Sun Studio 11 (the new one,
and the one with the claims in).

3. Is this specific to Solaris and Sun Studio or can GCC/ICC/Pathscale
do it?

If Sun can do it in their compilers, anybody can. If it's an o/s
thing that's a bit trickier.

4. What classes of applications does this work for? Is it only loop
oriented code with predictable data flow? Does it only work for codes
that require massive bandwidth?

This is the big question.

The whole situation is quite unclear and will remain so until Sun can
shed some light on the matter.

DK

Ryan

David Kanter · Jul 15, 2006

I think that's a rather fair stance to take. The main issues would be:

Good point.

This is the Sun Studio compiler, and as I understand it, you can get
it free. Or at least that's the case with Sun Studio 11 (the new one,
and the one with the claims in).

I meant license fees for your app. For instance, with Oracle, or
another application with a good deal of TLP, it would never make sense
to do this, especially considering they would probably just charge
you for all 4 CPUs in this situation.

If Sun can do it in their compilers, anybody can. If it's an o/s
thing that's a bit trickier.

This is the big question.

There's a gentleman who posts at my site and works for Sun who has
indicated that this is VERY workload dependent, and I get the sense
that it probably only works on loops without carried dependencies.

DK
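For readers unfamiliar with the term, the following is a small sketch of what "loops without carried dependencies" means. It says nothing about what the Sun compiler actually does; the function names and data are invented for illustration. In the first loop each iteration touches only its own index, so iterations can run in any order (or be split up and fetched ahead of time). In the second, each iteration needs the previous iteration's result.

```python
def scale_add(b, c, order):
    """No carried dependency: a[i] depends only on b[i] and c[i]."""
    a = [0.0] * len(b)
    for i in order:
        a[i] = 2.0 * b[i] + c[i]
    return a


def running_sum(b, order):
    """Carried dependency: a[i] needs a[i-1] from the previous iteration."""
    a = [0.0] * len(b)
    for i in order:
        prev = a[i - 1] if i > 0 else 0.0
        a[i] = prev + b[i]
    return a


b = [1.0, 2.0, 3.0, 4.0]
c = [0.5, 0.5, 0.5, 0.5]
forward = list(range(len(b)))
backward = forward[::-1]

# Reordering the iterations leaves the independent loop's result unchanged...
independent_ok = scale_add(b, c, forward) == scale_add(b, c, backward)
# ...but changes the result of the loop with a carried dependency.
dependent_ok = running_sum(b, forward) == running_sum(b, backward)

print(independent_ok, dependent_ok)
```

The order-independence of the first loop is what makes it a plausible candidate for the kind of aggressive prefetching being speculated about in this thread.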

Tony Hill · Jul 16, 2006

I think that's a rather fair stance to take. The main issues would be:

1. Can you recompile your application?
2. How do licensing fees for this work?
3. Is this specific to Solaris and Sun Studio or can GCC/ICC/Pathscale
do it?
4. What classes of applications does this work for? Is it only loop
oriented code with predictable data flow? Does it only work for codes
that require massive bandwidth?

Yup, but then again, most of those questions apply to *ANY* SPEC
CPU2000 scores! Caveat emptor indeed.

The whole situation is quite unclear and will remain so until Sun can
shed some light on the matter.

Definitely. It'll be interesting to see if there's an overall
improvement in scores or if it's just a single benchmark with a HUGE
improvement and fairly modest gains elsewhere (as was the case back
with the UltraSparc III and 179.art).

Felger Carbon · Jul 16, 2006

Tony Hill said:
Yup, but then again, most of those applications apply to *ANY* SPEC
CPU2000 scores! Caveat emptor indeed.

About a hundred years ago, I learned that the purpose of a fast FPU is to
convert a math problem into an I/O problem. ;-)

krw · Jul 17, 2006

About a hundred years ago, I learned that the purpose of a fast FPU is to
convert a math problem into an I/O problem. ;-)

All problems turn into latency problems at the extreme. Nothing
new here Felg. ...carry on folks.
