Dempsey vs Opty 280 vs Paxville


Rob Stow

A blurb at the inquirer takes you to this:
http://www.tecchannel.de/server/hardware/432957/

"Dempsey" = 3.2 GHz dual-core, with 533 MHz DDR2 ECC FB-DIMMs
Opty 280 = 2.2 GHz dual-core, with PC3200 DDR ECC Reg
Paxville = 2.8 GHz dual-core, with PC3200 DDR ECC Reg

The article is in German, but as usual the charts don't need
interpretation.

Dempsey wins about 60% of the benchmarks, some narrowly and some by
significant margins. Opty 280 wins about 40% of the benchmarks -
most of them narrowly, but by a big margin in CineBench rendering.

The difference, of course, is that Dempsey is 6+ months away from
going retail but you can get an Opty 275 box today and you might
have to wait a week for Opty 280. Even being an AMD fanboy I
find it hard not to be eager to see what Dempsey can do when it
is actually released. Sure, AMD won't be standing still in the
meantime - but neither will Intel.

It is also especially interesting to see that in this review
Paxville performed fairly well against the Opty 280 compared to
what we saw in the GamePC review from about 2 weeks ago. Opty
280 beat Paxville 2.8 GHz in just about everything, but they are
more closely matched than GamePC would lead us to believe.



I would have been very interested to read about Dempsey
power/heat issues, but if the author said anything about this it
is in German - no charts.
 
It is also especially interesting to see that in this review Paxville
performed fairly well against the Opty 280 compared to what we saw in
the GamePC review from about 2 weeks ago. Opty 280 beat Paxville 2.8
GHz in just about everything, but they are more closely matched than
GamePC would lead us to believe.

someone is clearly lying
 
Rob said:
A blurb at the inquirer takes you to this:
http://www.tecchannel.de/server/hardware/432957/

"Dempsey" = 3.2 GHz dual-core, with 533 MHz DDR2 ECC FB-DIMMs
Opty 280 = 2.2 GHz dual-core, with PC3200 DDR ECC Reg
Paxville = 2.8 GHz dual-core, with PC3200 DDR ECC Reg

The article is in German, but as usual the charts don't need
interpretation.

Dempsey wins about 60% of the benchmarks, some narrowly and some by
significant margins. Opty 280 wins about 40% of the benchmarks -
most of them narrowly, but by a big margin in CineBench rendering.

The difference, of course, is that Dempsey is 6+ months away from
going retail but you can get an Opty 275 box today and you might
have to wait a week for Opty 280. Even being an AMD fanboy I
find it hard not to be eager to see what Dempsey can do when it
is actually released. Sure, AMD won't be standing still in the
meantime - but neither will Intel.

It is also especially interesting to see that in this review
Paxville performed fairly well against the Opty 280 compared to
what we saw in the GamePC review from about 2 weeks ago. Opty
280 beat Paxville 2.8 GHz in just about everything, but they are
more closely matched than GamePC would lead us to believe.

If you are interested in reading an English article that describes the
architecture of the Dempsey system, check out:

http://www.realworldtech.com/page.cfm?ArticleID=RWT110805135916

I also included Intel's performance estimates, which are, of course,
best case. I will be releasing my own measurements in the future.

DK

 
hackbox.info said:
someone is clearly lying

Not necessarily. It was suggested by someone else here
(George M ?) that the guys at GamePC must have set things up
wrong because they were reporting memory bandwidth numbers that
were half of what he expected. Hence, GamePC might have been
honestly reporting inaccurate or invalid results.
 
someone is clearly lying

Yup, it's the benchmarks! :>

Seriously though, the choice of benchmarks will have a BIG impact on
how the two systems perform relative to one another. Even within the
same application a different datafile can easily skew results somewhat
one way or the other.
 
hahahaha, that is a good one :)

It's worth noting that a number of the repeatable scores listed in
this test are rather umm, terrible. For example, in the article they
list a pair of Opteron 280 chips as having a SPEC CFP2000 rate_base
score of 42.4, but if you look here:

http://www.spec.org/cpu2000/results/res2005q4/cpu2000-20050919-04711.html

AMD managed a score of 62.6 using the same processors, same OS and
same compiler (vs. 68.7 in Linux with the Pathscale compiler).

Similarly, a pair of Intel Xeon 2.8GHz (Paxville) chips is reported at
35.9 in this test while HP was able to manage 39.4 in the same setup
(Intel doesn't have any published results themselves). While this
~10% difference might be excusable due to differences in configuration
and optimization, the 48% difference in results for the Opterons is a
bit much and suggests to me that they screwed up that benchmark pretty
badly.
 
suggests to me that they screwed up that benchmark pretty
badly.

Or they made the entire article up - they already confessed to the Dempsey
part (scores from Intel). I'll stick to GamePC benchmarks for now.
 
someone is clearly lying

Not really - GamePC tests were more desktop/game oriented; those are more
workstation stuff... from my POV. Heavy duty calcs, like Linpack, get a
lot of benefit from the 2x2MB L2 cache on Paxville. I believe Daytripper
has already remarked that Paxville really likes cache friendly apps. P4
also has an advantage in heavy FPU over Opteron, where clock speed does
make a difference.

One thing worth noting for the Xeon 3.6GHz is how HyperThreading tanks on
Linpack - perfect example of how things can go wrong if two cache hungry
threads are competing for it.
 
Not necessarily. It was suggested by someone else here
(George M ?) that the guys at GamePC must have set things up
wrong because they were reporting memory bandwidth numbers that
were half of what he expected. Hence, GamePC might have been
honestly reporting inaccurate or invalid results.

On reviewing that, I'd say the half I mentioned previously was a bit
excessive. Just as Opterons can't get the bandwidth an Athlon64 can, Xeons
won't get what a P4 can... so I'd expect the Xeon to be on a par with the
Opteron at least. AIUI this was an early Paxville-compatible E7520
mbrd so it's possible that the BIOS is not optimized yet.
 
Or they made the entire article up - they already confessed to the Dempsey
part (scores from Intel). I'll stick to GamePC benchmarks for now.
____________________

This benchmarking ("scores from Intel") is more fitting to be called
_benchmarketing_. Smoke, mirrors, FUD. And the most important
benchmark - Performance/$ - is missing. I kinda doubt Dempsey (+ board
+ brand spanking new type of RAM) will be priced anywhere close to Opty +
board + DDR.
;-)
NNN
 
Not really - GamePC tests were more desktop/game oriented; those are more
workstation stuff... from my POV. Heavy duty calcs, like Linpack, get a
lot of benefit from the 2x2MB L2 cache on Paxville.

Linpack doesn't touch the cache much at all. The real advantage Intel
has in that test is SSE2. Linpack uses almost exclusively SSE2
operations, and in both the P4 and the Opteron the performance of
SSE2, so long as it can get data, is directly proportional to the
clock speed. In this case a 2.4GHz Opteron is going to perform almost
exactly like a 2.4GHz P4.
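
For a concrete picture (a rough sketch in C, not Linpack's actual
source): the kernel that dominates Linpack is DAXPY (y = a*x + y), and
with SSE2 it boils down to a packed multiply-add loop like this, which
is where the ~2 flops per clock peak on both chips comes from:

    #include <emmintrin.h>  /* SSE2 intrinsics */

    /* Assumes n is even and x, y are 16-byte aligned. */
    void daxpy_sse2(int n, double a, const double *x, double *y)
    {
        __m128d va = _mm_set1_pd(a);             /* a in both lanes */
        for (int i = 0; i < n; i += 2) {
            __m128d vx = _mm_load_pd(&x[i]);     /* 2 doubles of x */
            __m128d vy = _mm_load_pd(&y[i]);     /* 2 doubles of y */
            vy = _mm_add_pd(vy, _mm_mul_pd(va, vx)); /* 2 muls + 2 adds */
            _mm_store_pd(&y[i], vy);
        }
    }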

Note that Linpack results do not necessarily reflect real-world
performance, even for applications that primarily revolve around
solving matrices. Linpack doesn't do much to stress the memory
subsystem on most modern desktops, while real-world linear algebra
often does.
I believe Daytripper
has already remarked that Paxville really likes cache friendly apps. P4
also has an advantage in heavy FPU over Opteron, where clock speed does
make a difference.

One thing worth noting for the Xeon 3.6GHz is how HyperThreading tanks on
Linpack - perfect example of how things can go wrong if two cache hungry
threads are competing for it.

Linpack operates in such a way that it's very easy to keep your
pipeline filled at all times which really defeats the purpose of
Hyperthreading.
 
Tony said:
It's worth noting that a number of the repeatable scores listed in
this test are rather umm, terrible. For example, in the article they
list a pair of Opteron 280 chips as having a SPEC CFP2000 rate_base
score of 42.4, but if you look here:

http://www.spec.org/cpu2000/results/res2005q4/cpu2000-20050919-04711.html

AMD managed a score of 62.6 using the same processors, same OS and
same compiler (vs. 68.7 in Linux with the Pathscale compiler).

The Intel compiler strikes again!

Yousuf Khan
 
Tony said:
Linpack doesn't touch the cache much at all. The real advantage Intel
has in that test is SSE2. Linpack uses almost exclusively SSE2
operations, and in both the P4 and the Opteron the performance of
SSE2, so long as it can get data, is directly proportional to the
clock speed. In this case a 2.4GHz Opteron is going to perform almost
exactly like a 2.4GHz P4.

What you stated above is only true if you're talking about the Intel
compiler, where it will use SSE2 on Intel processors, but disable them
on AMD processors.

Yousuf Khan
 
Linpack doesn't touch the cache much at all. The real advantage Intel
has in that test is SSE2. Linpack uses almost exclusively SSE2
operations, and in both the P4 and the Opteron the performance of
SSE2, so long as it can get data, is directly proportional to the
clock speed. In this case a 2.4GHz Opteron is going to perform almost
exactly like a 2.4GHz P4.

Is Linpack really that dumb?... in which case it's not a very useful
benchmark. I haven't looked at Linpack source code but there are certainly
things which can be done to benefit from cache... e.g. beyond the usual
matrix arrangements, surely a "simple" decomposition would be possible and
desirable... even based on cache size... not unusual in the real world.
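
To make that concrete, a rough C sketch of the sort of cache blocking I
mean (BLOCK is an illustrative guess, not a tuned value):

    #define BLOCK 64  /* tile edge; pick so the tiles fit in cache */

    /* C += A*B on n x n matrices, processed in BLOCK x BLOCK tiles
       so each tile gets reused many times before being evicted. */
    void matmul_blocked(int n, const double *a, const double *b, double *c)
    {
        for (int ii = 0; ii < n; ii += BLOCK)
        for (int kk = 0; kk < n; kk += BLOCK)
        for (int jj = 0; jj < n; jj += BLOCK)
            for (int i = ii; i < ii + BLOCK && i < n; i++)
            for (int k = kk; k < kk + BLOCK && k < n; k++) {
                double aik = a[i*n + k];  /* held in a register */
                for (int j = jj; j < jj + BLOCK && j < n; j++)
                    c[i*n + j] += aik * b[k*n + j];
            }
    }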

As for the P4 (SSE2) FPU, it is known to be a somewhat better
implementation than Athlon64 - mentioned here just recently. The 2-way
associativity of Athlon64's L1 has always bothered me here too... for
intelligently arranged matrix ops.
Note that Linpack results do not necessarily reflect real-world
performance, even for applications that primarily revolve around
solving matrices. Linpack doesn't do much to stress the memory
subsystem on most modern desktops, while real-world linear algebra
often does.

Yeah, one of my beefs about many of the linear algebra "benchmarks": they
don't do what real-world code does. Again, Linpack is a dumb benchmark if
it doesn't stress the memory.
Linpack operates in such a way that it's very easy to keep your
pipeline filled at all times which really defeats the purpose of
Hyperthreading.

Yeah well I thought it was worth highlighting in view of the HT hype we've
seen here just recently... from someone who's never seen it do worse. The
fact that performance with HT drops significantly would indicate to me that
there is indeed quite good use of cache, the drop being partly due to the
expected cache collisions between the threads, with possibly some TLB
degradation too. How else do you keep a pipeline filled?

Perhaps I should not have singled out Linpack - the point was to highlight
the difference between the GamePC "desktop" oriented approach and the
tecchannel workstation set. It's also difficult to know where those
results may have been distorted by Intel optimizations - as pointed out by
Derek Baker, quoting "Intel provided" results is umm, suspicious enough...
on top of what appears to be an off-the-shelf Opteron system vs. Intel
lab-provided systems... for which ones?? IOW the whole thing could be an
Intel PR job.

BTW did you notice what appear to be heat-pipe cooling on the
Micron-provided FB-DIMM buffer section?
 
The Intel compiler strikes again!

One part Intel compiler, one part 32-bit code on a 32-bit OS vs.
64-bit code on a 64-bit OS.

Considering that Intel probably doesn't do ANY optimizations at all
for the Opteron, they still manage to produce fairly competitive
results when all else is equal. 32-bit Pathscale on 32-bit Windows
(if such a beast exists) probably wouldn't be much better than Intel's
compiler.
 
What you stated above is only true if you're talking about the Intel
compiler, where it will use SSE2 on Intel processors, but disable them
on AMD processors.

No, it's true in ALL compilers. Both Intel's and AMD's implementations
only allow a theoretical maximum of 2 flops per clock cycle with SSE2.
Opterons can get somewhat closer to their theoretical peak, but since the
Xeon's clock speed is usually quite a bit higher it ends up coming out
well ahead.
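
To put rough numbers on it (back-of-envelope, using that 2 flops/clock
peak): a 2.2GHz Opteron core tops out around 4.4 GFLOPS while a 3.6GHz
Xeon core tops out around 7.2 GFLOPS, so even running at a somewhat
lower fraction of peak the Xeon still comes out ahead on Linpack.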

FWIW all you need to do is have a look at the Top500.org list for a
whole lot of examples. Linpack is the one and only benchmark used for
that list.
 
Is Linpack really that dumb?... in which case it's not a very useful
benchmark.

Not so much dumb as HUGELY dated. The benchmark was first written in
the late 1970's and hasn't really changed in the past 20 years. I
doubt that it's used much in real-world HPC; there are better
algorithms out there today. Here's a quick overview of it:

http://www.top500.org/lists/linpack.php

I haven't looked at Linpack source code but there are certainly
things which can be done to benefit from cache... e.g. beyond the usual
matrix arrangements, surely a "simple" decomposition would be possible and
desirable... even based on cache size... not unusual in the real world.

As you can see above, Linpack isn't really designed to be the fastest
way to solve the problem, but rather a standard way of comparing MANY
different computer architectures. That standard was also largely
chosen a LONG time ago. It has its uses and can provide a reasonable
guess as to how good a system will be at solving matrices, but it's
definitely not going to give you an exact indication of how your
system will perform on real-world code, even if that code is linear
algebra.
Yeah, one of my beefs about many of the linear algebra "benchmarks": they
don't do what real-world code does. Again, Linpack is a dumb benchmark if
it doesn't stress the memory.

Not only does it not stress memory much, it also doesn't stress
internode communication much either, and yet it is used to determine
what the "fastest supercomputer" in the world is.

Linpack does have its uses, but it's hardly the end-all, be-all of
benchmarks.
Yeah well I thought it was worth highlighting in view of the HT hype we've
seen here just recently... from someone who's never seen it do worse. The
fact that performance with HT drops significantly would indicate to me that
there is indeed quite good use of cache, the drop being partly due to the
expected cache collisions between the threads, with possibly some TLB
degradation too. How else do you keep a pipeline filled?

Prefetch everything into your L1 data cache in blocks and run through
that entire block before moving on to the next chunk.
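
Something like this rough sketch, say (CHUNK is a guess sized for a
smallish L1, not a tuned value):

    #include <xmmintrin.h>  /* _mm_prefetch */

    #define CHUNK 512  /* doubles per block */

    /* Sum an array while software-prefetching the next block, so the
       loads feeding the pipeline almost always hit cache. */
    double sum_blocked(const double *x, int n)
    {
        double s = 0.0;
        for (int b = 0; b < n; b += CHUNK) {
            /* hint the next chunk into cache ahead of time,
               one 64-byte line (8 doubles) per prefetch */
            for (int p = b + CHUNK; p < b + 2*CHUNK && p < n; p += 8)
                _mm_prefetch((const char *)&x[p], _MM_HINT_T0);
            int end = (b + CHUNK < n) ? b + CHUNK : n;
            for (int i = b; i < end; i++)
                s += x[i];  /* work stays within the current block */
        }
        return s;
    }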
Perhaps I should not have singled out Linpack - the point was to highlight
the difference between the GamePC "desktop" oriented approach and the
tecchannel workstation set. It's also difficult to know where those
results may have been distorted by Intel optimizations - as pointed out by
Derek Baker, quoting "Intel provided" results is umm, suspicious enough...

Extremely suspicious given that the results they achieved themselves
on the same test (for both Paxville and Opteron, but especially
Opteron) are *SIGNIFICANTLY* lower than published results using the
same chips, OS and same compiler. When you compare the numbers that
AMD was able to achieve vs. what Intel achieved on the SPEC scores it
paints a VERY different picture, with the AMD chips coming out on top.
BTW did you notice what appear to be heat-pipe cooling on the
Micron-provided FB-DIMM buffer section?

LOL! I hadn't noticed that before! I sure hope that having a fan on
your memory doesn't become standard in the future, I've got enough
fans in my system as it is!
 