DDR2 versus FBD

  • Thread starter: David Kanter
George said:
OK I got it.

Fer Chrissakes it's a sales presentation.

Wow, you noticed that?
Highlights?... in a sales job? WTF are you smoking?... ahh, but it's a
Freudian slip: it made your day to be able to play your Intel vs. AMD card
again. But surely you didn't think we'd fail to notice, err, "marketing
materials"!

You know it's funny you should think of it this way. If you look at my
post, the word "AMD" does not appear, nor does "Intel". In fact, I'm
pretty sure I only talked about FBD and DDR2. Certainly more companies
than just Intel are using FBD (Sun for instance) and more companies
than AMD use DDR2 (HP, IBM, Intel). I think the only Freudian slip is
coming from someone else entirely in the discussion.
This stuff is all in the AMD tech docs and DDR2-800 is not even shown for
socket F. OTOH socket AM2 supports 2 DIMMs per channel at DDR2-800;

Socket AM2 is for desktops, and I doubt it supports RDIMMs.

The fact of the matter is that DDR2 trades off higher bandwidth for
capacity.
so I
tend to think AMD is focussing on what's actually available in the DDR2
supply FTM. As with previous CPUs we'll see improvements to the memory
controller for multiple DIMM support.

Perhaps we will, but I rather doubt it.

DK
 
Wow, you noticed that?


You know it's funny you should think of it this way. If you look at my
post, the word "AMD" does not appear, nor does "Intel". In fact, I'm
pretty sure I only talked about FBD and DDR2. Certainly more companies
than just Intel are using FBD (Sun for instance) and more companies
than AMD use DDR2 (HP, IBM, Intel). I think the only Freudian slip is
coming from someone else entirely in the discussion.

You don't need to restate your affiliation every time - it's a matter of
record here.
Socket AM2 is for desktops, and I doubt it supports RDIMMs.

No it doesn't but it does support ECC and the option of 2T command rate at
higher loads... so quite a good option for a workstation.
The fact of the matter is that DDR2 trades off higher bandwidth for
capacity.

Of course - that's generally how it works: something has to be traded.
Perhaps we will, but I rather doubt it.

They've done it before with DDR and there were reported probs with the
initial DDR2 implementation.
 
David said:
At some point in time, some of the folks here doubted that DDR2 would
present a capacity problem. I would like to instruct those individuals
to read the following PDF, in particular, page 37.

http://www-5.ibm.com/ch/events/fachtagung/pdf/system_x_performance.pdf

Just so you don't miss the highlights:

DDR2-400/533 supports 8 DIMMs across two channels
DDR2-667 supports 4 DIMMs across two channels
DDR2-800 supports 2 DIMMs across two channels

Perhaps now it should become abundantly clear why FBD is in fact
necessary for high bandwidth. There's an illustration of this on page
42 as well.
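[Editor's note: the table above can be sanity-checked with some quick arithmetic. The sketch below assumes 4 GB modules, two channels, and 8 bytes per transfer - illustrative figures of mine, not numbers from the IBM slides.]

```python
# Rough capacity vs. bandwidth numbers implied by the DIMM-count table.
# The 4 GB module size and dual-channel, 8-bytes-per-transfer figures
# are assumptions for illustration, not taken from the IBM presentation.
DIMM_GB = 4
configs = {              # speed grade -> (MT/s, DIMMs across two channels)
    "DDR2-400": (400, 8),
    "DDR2-533": (533, 8),
    "DDR2-667": (667, 4),
    "DDR2-800": (800, 2),
}
for name, (mts, dimms) in configs.items():
    peak_gbs = 2 * mts * 8 / 1000    # dual channel, 8 B per transfer
    print(f"{name}: {dimms * DIMM_GB} GB max, ~{peak_gbs:.1f} GB/s peak")
```

So going from DDR2-400 to DDR2-800 doubles peak bandwidth but cuts maximum capacity to a quarter - which is exactly the tradeoff being argued about here.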

Although IBM's System X now includes its Opteron line, it does sound
like this is all to do with its Xeon-based System X only. After all, IBM
has built its special NUMA chipset specifically for Xeon, so it's
likely only commenting on its own chipset's limits. I see in HP's
Opteron machines that they can get the maximum number of DIMMs when
using DDR-266/DDR-333, but they only have to reduce the maximum number
of DIMMs by 1 when going with the highest performance DDR-400. It's
likely going to be the same sort of thing when they go DDR2.

Yousuf Khan
 
Yousuf said:
Although IBM's System X now includes its Opteron line, it does sound
like this is all to do with its Xeon-based System X only. After all, IBM
has built its special NUMA chipset specifically for Xeon, so it's
likely only commenting on its own chipset's limits.

Huh? Could you elaborate what you mean by "only commenting on its own
chipset's limits"? I don't think I follow.
I see in HP's
Opteron machines that they can get the maximum number of DIMMs when
using DDR-266/DDR-333, but they only have to reduce the maximum number
of DIMMs by 1 when going with the highest performance DDR-400. It's
likely going to be the same sort of thing when they go DDR2.

Well according to IBM, it's worse than that...you have to reduce quite
substantially to get DDR2-800 to work.

DK
 
David said:
Huh? Could you elaborate what you mean by "only commenting on its own
chipset's limits"? I don't think I follow.

X3, Hurricane, whatever it's called these days.
Well according to IBM, it's worse than that...you have to reduce quite
substantially to get DDR2-800 to work.

IBM is probably a bit more sensitive to latencies than Intel is with
their own chipsets. IBM is trying to get a processor that is not NUMA to
work in a NUMA setup. Little changes in latency in one part of the
system might result in large bottlenecks in the overall system.

Yousuf Khan
 
Well according to IBM, it's worse than that...you have to reduce quite
substantially to get DDR2-800 to work.

According to a press brief advertising a competing product when going
from pre-release hardware. Hardly definitive.

One thing that we do know is that you can load up 2 unregistered
DDR2-800 DIMMs per channel on Socket AM2. With Socket F using its
registered DIMMs it should be EASIER, and not harder, to put more
DIMMs in, so I wouldn't put much faith in the aforementioned IBM
document.
 
Yousuf said:
X3, Hurricane, whatever it's called these days.

Yes, I am aware of IBM's chipset, but what was your point about it?
IBM is talking about limitations to Opteron generally, that apply to
their product and HP or Sun as well. I don't see anything indicating
otherwise.
IBM is probably a bit more sensitive to latencies than Intel is with
their own chipsets. IBM is trying to get a processor that is not NUMA to
work in a NUMA setup. Little changes in latency in one part of the
system might result in large bottlenecks in the overall system.

No offense, but this makes no sense whatsoever. A processor is not
NUMA or UMA; that is a function of the chipset and system. Any CC
(cache-coherent) processor can be used in a NUMA system and any
processor can be used in a UMA system.

IBM is no more sensitive to latency than the next guy, and in fact, if
anything they are less sensitive since they can make optimizations
other people cannot.

DK
 
Huh? Could you elaborate what you mean by "only commenting on its own
chipset's limits"? I don't think I follow.


Well according to IBM, it's worse than that...you have to reduce quite
substantially to get DDR2-800 to work.

Why don't you just read AMD's docs... instead of IBM promotional
materials.;-)
 
Why don't you just read AMD's docs... instead of IBM promotional
materials.;-)

George is it really your contention that the IBM tome dissing the
opposition was not a fair and accurate appraisal of the entire subject
:)

Ryan
 
Well according to IBM, it's worse than that...you have to reduce quite
Why don't you just read AMD's docs... instead of IBM promotional
materials.;-)

Maybe because they aren't available...if you can point me to them, I'd
be grateful, but I don't see any online.

However, I did find several vendors with products that tend to confirm
what IBM is claiming:
http://www.inventecesc.com/index.asp?coding=english&page=products&subpage=blade server2
http://www.appro.com/product/pdf/datasheets 6-25-06/Hyperblade datasheet 6-23-6 final.pdf

667MHz DDR2, only 4 DIMMs/MPU. This is hardly a capacity problem, but
it does appear to support the claim that capacity drops at 667MHz.

Again, if anyone has any better information that proves this wrong, I'd
love to see it.

DK
 
George is it really your contention that the IBM tome dissing the
opposition was not a fair and accurate appraisal of the entire subject
:)

Well I started by reading the 1st section on network efficiency and had a
hard time figuring what they were up to. Seemed like 2nd generation TOE
(TCP/IP Offload Engine) was what was being pushed... as better than Intel's
IOAT??:-) I dunno what all the fuss is about to tell the truth - there's
nothing particularly revealing here.
 
Maybe because they aren't available...if you can point me to them, I'd
be grateful, but I don't see any online.

Hmph - take a look here
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_7203,00.html
and the first umm, tome includes the stuff on memory population & loads.
However, I did find several vendors with products that tend to confirm
what IBM is claiming:
http://www.inventecesc.com/index.asp?coding=english&page=products&subpage=blade server2
http://www.appro.com/product/pdf/datasheets 6-25-06/Hyperblade datasheet 6-23-6 final.pdf

667MHz DDR2, only 4 DIMMs/MPU. This is hardly a capacity problem, but
it does appear to support the claim that capacity drops at 667MHz.

I don't see what's so surprising here.
 
Well I started by reading the 1st section on network efficiency and had a
hard time figuring what they were up to. Seemed like 2nd generation TOE
(TCP/IP Offload Engine) was what was being pushed... as better than Intel's
IOAT??:-) I dunno what all the fuss is about to tell the truth - there's
nothing particularly revealing here.

My take is that TCP/IP offload removes work from the cpu - tada.
FB-DIMM gives more capacity and more latency. The new X3 chipset works
better than the old stuff, or rather the Dell & HP - I presume on
Intel, but it's not clear to me - and I think it isn't meant to be
crystal clear.

All in all the future new IBM systems give better coloured blocks on
these particular charts than current AMD systems.

So no real news, not much in the way of substance - they're selling
the sizzle. This seems to be Woodcrest all over at the moment.

But I have to say that the pictures are nicely drawn :)

Ryan
 
George said:
Hmph - take a look here
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_7203,00.html
and the first umm, tome includes the stuff on memory population & loads.

Perhaps you would be so good as to point out where it says anything
about 800MHz DDR2? I didn't see much in the memory controller or DRAM
section...

I saw table 44 which appears to indicate that up to 4 DIMMs/channel can
be used with DDR2-400/533, but only 2 DIMMs/channel at DDR2-667.
DDR2-800 isn't even listed as an option.
I don't see what's so surprising here.

Well maybe it's the fact that you seem to think IBM is lying, yet these
designs appear to confirm (although hardly in a definitive fashion)
what IBM claimed. And there's the fact that the manual also appears to
confirm what IBM claimed...

DK
 
Maybe because they aren't available...if you can point me to them, I'd
be grateful, but I don't see any online.

However, I did find several vendors with products that tend to confirm
what IBM is claiming:
http://www.inventecesc.com/index.asp?coding=english&page=products&subpage=blade server2
http://www.appro.com/product/pdf/datasheets 6-25-06/Hyperblade datasheet 6-23-6 final.pdf

667MHz DDR2, only 4 DIMMs/MPU. This is hardly a capacity problem, but
it does appear to support the claim that capacity drops at 667MHz.

Umm huh? Unless I missed some small print somewhere, the second link
is advertising a system with 16 DIMM sockets for 2 processors and
makes no mention of any speed limitation for those sockets. This
seems to directly contradict what the IBM press brief said! Now how
in the heck they plan on fitting 16 DIMM sockets into a blade server
is beyond me, but that's another story altogether.
 
Tony said:
Umm huh? Unless I missed some small print somewhere, the second link
is advertising a system with 16 DIMM sockets for 2 processors and
makes no mention of any speed limitation for those sockets. This
seems to directly contradict what the IBM press brief said! Now how
in the heck they plan on fitting 16 DIMM sockets into a blade server
is beyond me, but that's another story altogether.

The last page indicates, as you said, 16 RDIMMs ==> 8 RDIMMs/CPU ==> 4
RDIMMs/channel, but it explicitly specifies DDR2-533 or 667. Given
that information, it seems more likely that IBM is wrong, and that in
fact, DDR2-667 can hit 4 RDIMMS/channel, and that DDR2-800 is going to
be restricted to 2 DIMMs/channel.
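[Editor's note: the arithmetic in that sentence, spelled out - assuming the usual two memory channels per Opteron socket:]

```python
# DIMM count per channel implied by the Appro datasheet's 16 sockets.
# Two memory channels per Opteron socket is an assumption here.
total_dimms, cpus, channels_per_cpu = 16, 2, 2
per_cpu = total_dimms // cpus               # 16 / 2 = 8 RDIMMs per CPU
per_channel = per_cpu // channels_per_cpu   # 8 / 2 = 4 RDIMMs per channel
print(per_cpu, per_channel)  # 8 4
```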

The density in blades currently is quite impressive. What doesn't make
sense is why Appro doesn't offer 4GB DIMMs for the Opteron; I don't see
any reason why 4GB DIMMs would be any more difficult, unless they
require 16 devices/module at current densities.

It turns out I was wrong (based on current information), so perhaps AMD
won't have a density problem and IBM was just blowing smoke...

DK
 
The last page indicates, as you said, 16 RDIMMs ==> 8 RDIMMs/CPU ==> 4
RDIMMs/channel, but it explicitly specifies DDR2-533 or 667. Given
that information, it seems more likely that IBM is wrong, and that in
fact, DDR2-667 can hit 4 RDIMMS/channel, and that DDR2-800 is going to
be restricted to 2 DIMMs/channel.

The density in blades currently is quite impressive. What doesn't make
sense is why Appro doesn't offer 4GB DIMMs for the Opteron; I don't see
any reason why 4GB DIMMs would be any more difficult, unless they
require 16 devices/module at current densities.

It turns out I was wrong (based on current information), so perhaps AMD
won't have a density problem and IBM was just blowing smoke...

DK

Looking on the Appro site there's no info as to whether that 533/667
is a choice or a constraint based on number populated.

Ryan
 
David said:
Yes, I am aware of IBM's chipset, but what was your point about it?
IBM is talking about limitations to Opteron generally, that apply to
their product and HP or Sun as well. I don't see anything indicating
otherwise.

No, IBM is most likely talking about limitations to its own products.
IBM has a performance story that it's selling; it wants to show that
its little Xeon-extending chipset is worth paying extra money for, vs.
a run-of-the-mill (Intel) chipset for that processor.

Since this is a server environment, we're obviously talking about
registered DDR2 rather than unbuffered DDR2. So it's completely
ridiculous that it's claiming that you can only get two sticks of
registered DDR2 at certain frequencies. Registration should be good
enough to get you at least 4 sticks. If you're limited to 2 sticks even
with registered RAM, and 2 sticks with unbuffered, then who in their
right mind would buy registered over unbuffered? The extra cost
alone is prohibitive. However, it's not so ridiculous if we assume that
IBM is trying to achieve certain levels of performance, especially
certain levels of latency.
No offense, but this makes no sense whatsoever. A processor is not
NUMA or UMA; that is function of the chipset and system. Any CC
processor can be used in a NUMA system and any processors can be used
in a UMA system.

But is it optimized for NUMA usage?
IBM is no more sensitive to latency than the next guy, and in fact, if
anything they are less sensitive since they can make optimizations
other people cannot.

Think about it, the IBM chipset is quite obviously latency sensitive. It
has to maintain an external cache coherency directory in the chipset. It
has to be able to deliver data from external nodes to a processor that
is expecting the data to arrive at its data ports within a certain
amount of time because it assumes a single shared bus architecture. The
chipset has to make the CPU happy, not the other way around.

Yousuf Khan
 
Yes, I am aware of IBM's chipset, but what was your point about it?
No, IBM is most likely talking about limitations to its own products.
IBM has a performance story that it's selling; it wants to show that
its little Xeon-extending chipset is worth paying extra money for, vs.
a run-of-the-mill (Intel) chipset for that processor.
Since this is a server environment, we're obviously talking about
registered DDR2 rather than unbuffered DDR2. So it's completely
ridiculous that it's claiming that you can only get two sticks of
registered DDR2 at certain frequencies.

Not really.
Registration should be good
enough to get you at least 4 sticks.

Not necessarily. RDIMMs only solve the problem of address and command
bus loading, not data bus loading. It is unclear to me which of
these presents the problem for the limitations on RDIMMs.
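[Editor's note: the address/command vs. data bus distinction can be illustrated with a toy loading model - a sketch with made-up numbers, since real signal integrity depends on topology and the actual silicon. The register chip on an RDIMM buffers the address/command lines, so each module presents one load there, but the data lines see every rank either way.]

```python
# Toy model of per-line electrical loads on a DDR2 channel.
# Illustrative only: real loading limits depend on far more than
# a simple load count.
def bus_loads(dimms: int, ranks_per_dimm: int, registered: bool):
    # Address/command: an RDIMM's register presents a single load per
    # module; an unbuffered DIMM presents one load per rank.
    cmd = dimms * (1 if registered else ranks_per_dimm)
    # Data lines are not buffered by the register, so every rank
    # loads them regardless of registration.
    data = dimms * ranks_per_dimm
    return cmd, data

print(bus_loads(4, 2, registered=True))   # (4, 8)
print(bus_loads(4, 2, registered=False))  # (8, 8)
```

Which is the point above: if data-bus loading is what caps the DIMM count at DDR2-800, registration alone won't raise it.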
If you're limited to 2 sticks even
with registered RAM, and 2 sticks with unbuffered, then who in their
right mind would buy registered over unbuffered? The extra cost
alone is prohibitive. However, it's not so ridiculous if we assume that
IBM is trying to achieve certain levels of performance, especially
certain levels of latency.

But is it optimized for NUMA usage?

What does "optimized for NUMA usage" mean? Is it a snoop filter, a
directory, or is it a large cache? I don't know what you are
hypothesizing about here, so it's hard for me to answer. So let's turn
this around: what do you think AMD does in their processor cores that
makes NUMA work better? Note the key word is *core*, i.e. not HT, not
memory controller, but the core. I suppose we could include the L2
cache as well.
Think about it, the IBM chipset is quite obviously latency sensitive. It
has to maintain an external cache coherency directory in the chipset. It
has to be able to deliver data from external nodes to a processor that
is expecting the data to arrive at its data ports within a certain
amount of time because it assumes a single shared bus architecture.

Try and remind me how snoop filter (not a directory) CC in the chipset
using eDRAM is in any way, shape, or form related to MEMORY LATENCY,
which is what we were discussing. This is ridiculous, all it takes is
a little buffering on the *CHIPSET*.

Moreover there is no 'expectation that data arrives in a certain period
of time'. You've heard of disk accesses right? You don't design a
system that craps out every time the disks spin up...

Do you have any real data to back your claims up? If so, why don't you
show us since then we'll all understand you aren't making stuff up...

DK
 
My take is that TCP/IP offload removes work from the cpu - tada.

What I thought, but IBM's graphs do not bear that out: rather you get higher
bandwidth for more CPU work... apparently... or do you?:-)

You also have to use Windows x86-64 "Chimney" - doesn't work in Windows
IA32 as nVidia and people with nForce4 chipsets have found out... and
"Linux has not declared support for TOE". As usual, Intel has a
"technology" as a compromise.
FB-DIMM gives more capacity and more latency. The new X3 chipset works
better than the old stuff, or rather the Dell & HP - I presume on
Intel, but it's not clear to me - and I think it isn't meant to be
crystal clear.

I think the big question is: has Intel finally managed to do a server
chipset? The jury is still out on that and some early results say "no"...
was it Yousuf who posted the msg about the U.S. govt. rejecting it because
of RAID-5 deficiencies?

FB-DIMM is also bloody HOT - see the heatpipes on the AMBs?... which
Crucial now apparently hides behind spreaders.
All in all the future new IBM systems give better coloured blocks on
these particular charts than current AMD systems.

You mean IBM X3 chipset systems?
So no real news, not much in the way of substance - they're selling
the sizzle. This seems to be Woodcrest all over at the moment.

I get the feeling they wanted to tout their X3 chipset but did not want to
prod Intel too hard while doing it.
 