DDR2 versus FBD

David Kanter

At some point in time, some of the folks here doubted that DDR2 would
present a capacity problem. I would like to instruct those individuals
to read the following PDF, in particular, page 37.

http://www-5.ibm.com/ch/events/fachtagung/pdf/system_x_performance.pdf

Just so you don't miss the highlights:

DDR2-400/533 supports 8 DIMMs across two channels
DDR2-667 supports 4 DIMMs across two channels
DDR2-800 supports 2 DIMMs across two channels

Perhaps now it should become abundantly clear why FBD is in fact
necessary for high bandwidth. There's an illustration of this on page
42 as well.
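
A quick back-of-the-envelope sketch of what those limits mean for total capacity (Python; the 4 GB/DIMM density is my assumption, not something from the deck):

# Rough sketch: max DDR2 capacity per dual-channel controller at each speed
# grade, per the IBM deck's DIMM limits. The 4 GB/DIMM figure is an assumption
# for illustration; swap in whatever DIMM density you're actually buying.

GB_PER_DIMM = 4  # assumed DIMM density

dimm_limits = {          # total DIMMs across two channels (from the deck)
    "DDR2-400/533": 8,
    "DDR2-667": 4,
    "DDR2-800": 2,
}

for grade, dimms in dimm_limits.items():
    print(f"{grade:12s}: {dimms} DIMMs -> {dimms * GB_PER_DIMM} GB max")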

DK
 
George said:
What "folks" would that be?

I wonder...
Doesn't work and appears to require registration... in German. :-(

Doh, sorry about that. Perhaps the presentation was IBM internal. I
managed to DL a copy of it yesterday, but it could be that they just
changed it to require registration...

If you want a copy, send me an email, and I'll shoot it over. It has
some interesting comparisons of K8 to Dempsey, and boy, Dempsey doesn't
look attractive. Woodcrest looks pretty nice though.

[snip]

DK
 
At some point in time, some of the folks here doubted that DDR2 would
present a capacity problem. I would like to instruct those individuals
to read the following PDF, in particular, page 37.

http://www-5.ibm.com/ch/events/fachtagung/pdf/system_x_performance.pdf

Just so you don't miss the highlights:

DDR2-400/533 supports 8 DIMMs across two channels
DDR2-667 supports 4 DIMMs across two channels
DDR2-800 supports 2 DIMMs across two channels

Perhaps now it should become abundantly clear why FBD is in fact
necessary for high bandwidth. There's an illustration of this on page
42 as well.

DK

I'm no memory expert so take this with a shovel of salt, but as I
understand it.

Current FB Dimm is DDR2 memory with a different module interface. The
interface is point to point fast serial allowing more modules to be
attached for a comparable pin count. The interface does not have a
shared bus between modules. It is the extra bus loading which
mandates the use of slower DDR2 with current bus architectures when
more modules are added.

The capacity issue is the number of pins on the controller required
for a memory channel.

The extra bandwidth is due in part to being able to use higher speed
modules when larger numbers are used, and being able to use higher
numbers of modules.

This isn't DDR2 Vs FB Dimm - this is serial Vs parallel, and
everything's going serial 'cos in general it tends to be faster.

Ryan
 
Ryan said:
I'm no memory expert so take this with a shovel of salt, but as I
understand it.

Did you get a copy of the PDF? George mentioned he couldn't access
it...let me know and I'll email it.
Current FB Dimm is DDR2 memory with a different module interface. The
interface is point to point fast serial allowing more modules to be
attached for a comparable pin count.

So the issue isn't modules/pin, it's bandwidth/pin. Each pin costs
quite a bit of time/effort/money, so the lower the pin count, the
better. FBD is 70 pins/channel, DDR2 is 240 pins/channel.

In turn, fewer pins means more channels. More DIMMs/channel is a
result of going serial, as you noted below.
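
Here's a rough bandwidth-per-pin comparison using those pin counts. The per-channel bandwidths are DDR2-667-class figures; the FBD number assumes concurrent read and write traffic, so treat it as ballpark:

# Bandwidth-per-pin sketch using the pin counts above and DDR2-667-class numbers.
# The FBD read+write split (simultaneous northbound/southbound traffic) is my
# recollection of the spec -- ballpark, not gospel.

ddr2_pins_per_channel = 240      # all pins, per the count above
fbd_pins_per_channel  = 70

ddr2_peak_gbs = 5.3              # DDR2-667: 667 MT/s * 8 bytes
fbd_peak_gbs  = 5.3 + 2.7        # FBD: concurrent read + write (assumed)

print(f"DDR2: {ddr2_peak_gbs / ddr2_pins_per_channel * 1000:.0f} MB/s per pin")
print(f"FBD : {fbd_peak_gbs  / fbd_pins_per_channel  * 1000:.0f} MB/s per pin")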
The interface does not have a
shared bus between modules.

Right, the address and command lanes are bypass routed to the next DIMM
in the channel. That's why the unloaded latency is higher, because you
have several hops to get to a DIMM.
It is the extra bus loading which
mandates the use of slower DDR2 with current bus architectures when
more modules are added.

That's right. In general for DDR, capacity decreases as bandwidth
increases.
The capacity issue is the number of pins on the controller required
for a memory channel.

Sort of. The capacity issue is two fold:

1. # of channels, which is what you identified above - FBD can have
more channels with a given number of pins --> higher total bandwidth

2. Number of DIMMs/channel. In FBD, you can have up to 8
DIMMs/channel for any speed grade, whereas in DDR2 there is a hard
limit that decreases as the bandwidth goes up.

To be fair, the latency does get a bit worse with each DIMM you add,
but performance will actually increase for the 2nd DIMM and possibly
the 3rd, since the memory controller can play more interleaving
tricks...however, giving up a bit of latency is not a huge deal compared
to letting your data set spill to disk (or having to buy the super high
capacity 4 and 8GB DIMMs, which cost an arm, leg and your first born
child).
The extra bandwidth is due in part to being able to use higher speed
modules when larger numbers are used, and being able to use higher
numbers of modules.

Yes and no. Using more modules does not increase the theoretical peak
bandwidth, but will increase the average bandwidth. The real gain in
bandwidth is because Intel uses 4 channels of FBD where AMD uses 2
channels of DDR2. Now, in a 2S system, AMD will have a total of 4
channels of DDR2; so things will be somewhat more equal. However, the
capacity is definitely going to be a problem. For DDR2-800, a 2S
system would have 4 DIMMs, which means you cannot get nearly enough
memory.

I think the bottom line is that it means instead of trading capacity
for bandwidth, you trade capacity for latency (and some extra heat).
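
To put rough numbers on that 2S comparison (4 GB DIMMs assumed, and the 8 DIMMs/channel figure is the FBD spec limit I cited above):

# 2-socket capacity sketch using the figures in this post. DIMM density (4 GB)
# is assumed for illustration.

GB_PER_DIMM = 4

# AMD 2S @ DDR2-800: 2 channels per socket, but only 1 DIMM per channel
# at that speed grade (2 DIMMs across two channels, per the IBM deck).
amd_dimms = 2 * 2 * 1

# Intel 2S with FBD: 4 channels on the chipset, up to 8 DIMMs per channel
# (the spec limit cited here; real chipsets may stop short of that).
intel_dimms = 4 * 8

print("AMD 2S, DDR2-800 :", amd_dimms, "DIMMs =", amd_dimms * GB_PER_DIMM, "GB")
print("Intel 2S, FBD    :", intel_dimms, "DIMMs =", intel_dimms * GB_PER_DIMM, "GB")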
This isn't DDR2 Vs FB Dimm - this is serial Vs parallel, and
everything's going serial 'cos in general it tends to be faster.

Yup, that's right. It really is a serial versus || kind of thing. The
last thing to mention is that since the routing is easier for FBD, the
boards should be a bit cheaper and easier to make. I couldn't really
quantify that though...

DK
 
Did you get a copy of the PDF? George mentioned he couldn't access
it...let me know and I'll email it.
I managed to get it on the third attempt - dunno what's going on.

Nice FB Dimm for dummies type video linked from here -

http://www.futureplus.com/products/fs2338/fbd_overview.html
So the issue isn't modules/pin, it's bandwidth/pin. Each pin costs
quite a bit of time/effort/money, so the lower the pin count, the
better. FBD is 70 pins/channel, DDR2 is 240 pins/channel.

In turn, fewer pins means more channels. More DIMMs/channel is a
result of going serial, as you noted below.


Right, the address and command lanes are bypass routed to the next DIMM
in the channel. That's why the unloaded latency is higher, because you
have several hops to get to a DIMM.

"Unloaded latency" ?
That's right. In general for DDR, capacity decreases as bandwidth
increases.


Sort of. The capacity issue is two fold:

1. # of channels, which is what you identified above - FBD can have
more channels with a given number of pins --> higher total bandwidth

2. Number of DIMMs/channel. In FBD, you can have up to 8
DIMMs/channel for any speed grade, whereas in DDR2 there is a hard
limit that decreases as the bandwidth goes up.
Is that a question of the current controller implementations, or is it
intrinsic to the I/O standard?
To be fair, the latency does get a bit worse with each DIMM you add,
but performance will actually increase for the 2nd DIMM and possibly
the 3rd, since the memory controller can play more interleaving
tricks...however, giving up a bit of latency is not a huge deal compared
to letting your data set spill to disk (or having to buy the super high
capacity 4 and 8GB DIMMs, which cost an arm, leg and your first born
child).

It's swings and roundabouts again, but I think Intel are right with
this one. Parallel interfaces have got nowhere to go.
Yes and no. Using more modules does not increase the theoretical peak
bandwidth, but will increase the average bandwidth. The real gain in
bandwidth is because Intel uses 4 channels of FBD where AMD uses 2
channels of DDR2. Now, in a 2S system, AMD will have a total of 4
channels of DDR2; so things will be somewhat more equal. However, the
capacity is definitely going to be a problem. For DDR2-800, a 2S
system would have 4 DIMMs, which means you cannot get nearly enough
memory.
I'm confused when you say more modules do not increase theoretical
bandwidth - is that a width/depth issue? I.e. if the interface were
wide enough, adding extra modules would add bandwidth.
I think the bottom line is that it means instead of trading capacity
for bandwidth, you trade capacity for latency (and some extra heat).


Yup, that's right. It really is a serial versus || kind of thing. The
last thing to mention is that since the routing is easier for FBD, the
boards should be a bit cheaper and easier to make. I couldn't really
quantify that though...

DK

I'd like to think you're right there, but I bet we won't see the
difference :(
 
[snip]
I managed to get it on the third attempt - dunno what's going on.

Nice FB Dimm for dummies type video linked from here -

http://www.futureplus.com/products/fs2338/fbd_overview.html


"Unloaded latency" ?

So unloaded latency would be the time it takes to access data in memory
when the system is otherwise idle. So you remove any queuing delay.
When someone says that an MPU's memory latency is X nanoseconds,
usually they mean unloaded. Loaded latencies can be substantially
higher, depending on the load. Just as an example, with DDR2 you have
bus turn around (on top of queuing), which is where a bubble is
inserted between R and W activities. FBD supports simultaneous R and W
transactions.
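
A toy model of the difference, if it helps; every number in it is an invented placeholder, the structure is the point:

# Toy decomposition of loaded vs unloaded latency, per the description above.
#   loaded = unloaded + queuing delay (+ read/write turnaround for DDR2).

def loaded_latency_ns(unloaded_ns, utilization, turnaround_ns=0.0):
    # Crude M/M/1-style queuing term: delay blows up as the channel saturates.
    queuing = unloaded_ns * utilization / (1.0 - utilization)
    return unloaded_ns + queuing + turnaround_ns

UNLOADED_NS = 60     # assumed unloaded latency, for illustration only
TURNAROUND_NS = 10   # assumed cost of R/W turnaround bubbles under mixed traffic

for util in (0.2, 0.5, 0.8):
    with_ta = loaded_latency_ns(UNLOADED_NS, util, TURNAROUND_NS)
    without = loaded_latency_ns(UNLOADED_NS, util)  # FBD-style: R and W overlap
    print(f"util={util:.1f}: {without:.0f} ns without turnaround, "
          f"{with_ta:.0f} ns with")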

[snip]
Is that a question of the current controller implementations, or is it
intrinsic to the I/O standard?

I'm not sure which you are referring to, so I'll answer for both. 8
DIMMs/channel is a hard limit for FBD. For DDR2, the limit is in the I/O
standard. Fundamentally the issue is the number of DRAM devices that
each channel can support, which has dropped off rather substantially.
Going from SDR-->DDR halved that, going to DDR2-400 halved that again,
and DDR2-667 and DDR2-800 will only make it worse.
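
Just to illustrate the trend (the starting device count is made up, only the halving pattern matters):

# The trend described above: the number of DRAM loads a parallel channel can
# tolerate roughly halves with each signaling-rate bump. Concrete device
# counts vary by chipset, so the starting value here is only illustrative.

loads = 32  # illustrative starting point for SDR, not a quoted figure
for gen in ("SDR", "DDR", "DDR2-400", "DDR2-667/800"):
    print(f"{gen:14s}: ~{loads} device loads per channel")
    loads //= 2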
It's swings and roundabouts again, but I think Intel are right with
this one. Parallel interfaces have got nowhere to go.

Yup I agree. The one thing that AMD is right about is that the heat
can be an issue. The initial implementations of AMBs are not as cool as
everyone would have liked.
I'm confused when you say more modules do not increase theoretical
bandwidth - is that a width/depth issue? I.e. if the interface were
wide enough, adding extra modules would add bandwidth.

Yes. Going wider (i.e. more channels) adds bandwidth. Going deeper
(more DIMMs/channel) increases effective bandwidth.

Time for ASCII diagrams

------|--|--|
CPU-|
------|--|--|

Will have better effective bandwidth than:

------|
CPU-|
------|

Both will have better peak bandwidth (and better real bandwidth) than:

------|--|--|
CPU-|
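
And the same idea as a quick sketch; the per-channel bandwidth is DDR2-667-class, and the utilization figures are invented to show the shape of the effect:

# "Wider vs deeper": peak bandwidth scales with channel count, while extra
# DIMMs per channel only improve how much of that peak you actually hit
# (more ranks/banks for the controller to interleave across). The
# utilization figures are invented for illustration.

PER_CHANNEL_GBS = 5.3                      # DDR2-667-class channel
UTILIZATION = {1: 0.55, 2: 0.70, 3: 0.75}  # assumed, rises with DIMMs/channel

def bandwidth(channels, dimms_per_channel):
    peak = channels * PER_CHANNEL_GBS
    return peak, peak * UTILIZATION[dimms_per_channel]

for channels, dimms in ((2, 3), (2, 1), (1, 3)):   # the three diagrams above
    peak, eff = bandwidth(channels, dimms)
    print(f"{channels} ch x {dimms} DIMMs: peak {peak:.1f} GB/s, "
          f"~{eff:.1f} GB/s effective")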
I'd like to think you're right there, but I bet we won't see the
difference :(

Probably not. A lot of times 'cheaper' really seems to mean "we won't
jack up the price and make you bleed", rather than "it will be less
expensive".

DK
 
Did you get a copy of the PDF? George mentioned he couldn't access
it...let me know and I'll email it.


So the issue isn't modules/pin, it's bandwidth/pin. Each pin costs
quite a bit of time/effort/money, so the lower the pin count, the
better. FBD is 70 pins/channel, DDR2 is 240 pins/channel.
[snipped]

If you are counting signal pins, it's more like 150 pins per channel for
DDR2....

/daytripper
 
So the issue isn't modules/pin, it's bandwidth/pin. Each pin costs
quite a bit of time/effort/money, so the lower the pin count, the
better. FBD is 70 pins/channel, DDR2 is 240 pins/channel.
[snipped]

If you are counting signal pins, it's more like 150 pins per channel for
DDR2....

No, I'm counting command, data, power, ground, everything. After all,
8 channels of DDR2 without any power or ground makes for an extremely
low performance system...

DK
 
So the issue isn't modules/pin, it's bandwidth/pin. Each pin costs
quite a bit of time/effort/money, so the lower the pin count, the
better. FBD is 70 pins/channel, DDR2 is 240 pins/channel.
[snipped]

If you are counting signal pins, it's more like 150 pins per channel for
DDR2....

No, I'm counting command, data, power, ground, everything. After all,
8 channels of DDR2 without any power or ground makes for an extremely
low performance system...

DK

Um...it's a bit dishonest, or at least misleading, to discuss chipset
requirements for memory interconnects while including all of the pins that
don't actually connect to the chipset...

/daytripper
 
daytripper said:
So the issue isn't modules/pin, it's bandwidth/pin. Each pin costs
quite a bit of time/effort/money, so the lower the pin count, the
better. FBD is 70 pins/channel, DDR2 is 240 pins/channel.
[snipped]

If you are counting signal pins, it's more like 150 pins per channel for
DDR2....

No, I'm counting command, data, power, ground, everything. After all,
8 channels of DDR2 without any power or ground makes for an extremely
low performance system...

DK

Um...it's a bit dishonest, or at least misleading, to discuss chipset
requirements for memory interconnects while including all of the pins that
don't actually connect to the chipset...

I never was talking about chipset requirements; I was talking about
system board requirements. Do you have to route all 240 pins for each
channel? Yes. Do you have to have those pins to ACTUALLY WORK? Yes.
If you just talk about signal and data pins, then you are missing half
of the picture. The ultimate question is: what is the bandwidth/pin, and
what do you trade off for higher capacity?

Suppose I design an ultra-high end memory system; let's call it FBD++.
It uses the same 24 signal and data pins, but it drives them at 10GHz.
That'll have pretty impressive bandwidth/pin....except for the fact
that I'll need several hundred power and ground pins. Ooops, it
doesn't look so good in terms of bandwidth/pin anymore...

Not counting power and ground pins is a big mistake.

Now, if you wish to indulge in such a mistake, then you'll find DDR2
lacking there as well. I believe you cited ~150 pins for DDR2. FBD
uses 48-51 signal and data pins, depending on how you count it. But it
is not a meaningful comparison anyway...
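
To make the point concrete, here is the same comparison both ways; the FBD++ numbers come from my hypothetical above, and the FBD channel bandwidth and the ~300 P&G pin count are assumptions for illustration:

# Bandwidth-per-pin looks very different depending on whether you count only
# signal pins or everything you have to pay for. "FBD++" is the hypothetical
# above; its pin count and data rate are made up, and the bandwidth here
# follows directly from them. The FBD figures are ballpark.

def bw_per_pin(gbs, signal_pins, total_pins):
    return gbs / signal_pins, gbs / total_pins

# FBD++: 24 signal pins at 10 GHz -> 24 * 10 Gbit/s = 30 GB/s (hypothetical)
fbdpp_signal_only, fbdpp_all = bw_per_pin(30.0, 24, 24 + 300)  # ~300 P&G assumed

# Real FBD channel, using the counts cited in this thread
fbd_signal_only, fbd_all = bw_per_pin(8.0, 50, 70)   # ~50 signal, 70 total pins

print(f"FBD++: {fbdpp_signal_only:.2f} GB/s/pin signal-only, "
      f"{fbdpp_all:.2f} counting P&G")
print(f"FBD  : {fbd_signal_only:.2f} GB/s/pin signal-only, "
      f"{fbd_all:.2f} counting P&G")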

DK
 
So the issue isn't modules/pin, it's bandwidth/pin. Each pin costs
quite a bit of time/effort/money, so the lower the pin count, the
better. FBD is 70 pins/channel, DDR2 is 240 pins/channel.
[snipped]

If you are counting signal pins, it's more like 150 pins per channel for
DDR2....

No, I'm counting command, data, power, ground, everything. After all,
8 channels of DDR2 without any power or ground makes for an extremely
low performance system...

DK

Um...it's a bit dishonest, or at least misleading, to discuss chipset
requirements for memory interconnects while including all of the pins that
don't actually connect to the chipset...

I never was talking about chipset requirements; I was talking about
system board requirements. Do you have to route all 240 pins for each
channel? Yes.

No - and clearly you've never, ever designed a motherboard. One does not route
power and ground, one drops a via into an inner layer plane. If you think you
could actually operate DDR2 dimms and FBDIMMs with their power and ground
references connected via routed wire, you'd be quite wrong...
Do you have to have those pins to ACTUALLY WORK? Yes.

Irrelevant. FBDIMMs have as many power and ground pins at the connector as
DDR2 dimms. That connector pinout does not define the "channel" in any
meaningful way.
If you just talk about signal and data pins, then you are missing half
of the picture. The ultimate question is: what is the bandwidth/pin, and
what do you trade off for higher capacity?

And you're distorting that argument with irrelevant factoids. Where and how
you measure bandwidth per pin makes a difference. Taking that measure at the
dimm connector ain't the one that makes any difference.
Suppose I design an ultra-high end memory system; let's call it FBD++.
It uses the same 24 signal and data pins, but it drives them at 10GHz.
That'll have pretty impressive bandwidth/pin....except for the fact
that I'll need several hundred power and ground pins. Ooops, it
doesn't look so good in terms of bandwidth/pin anymore...
Irrelevant.

Not counting power and ground pins is a big mistake.

Depends on what you're actually counting. If you were counting how many
chipset package pins are required to support one memory interconnect vs
another, *THAT* would be relevant. But you say you weren't making that
distinction ;-)
Now, if you wish to indulge in such a mistake, then you'll find DDR2
lacking there as well. I believe you cited ~150 pins for DDR2. FBD
uses 48-51 signal and data pins, depending on how you count it. But it
is not a meaningful comparison anyway...

Indeed, much of your discussion has suffered from a lack of meaning.

Fwiw, I've designed system boards using DDR six years ago, using DDR2 four and
two years ago, and last year and this year, using FBDIMMs. No big whoop, but
while you've been raving about bandwidth, you've pretty much ignored *latency*
- which often is far more important than bandwidth alone. And latency doesn't
favor FBDIMMs...

Cheers

/daytripper
 
[snip]
No - and clearly you've never, ever designed a motherboard. One does not route
power and ground, one drops a via into an inner layer plane. If you think you
could actually operate DDR2 dimms and FBDIMMs with their power and ground
references connected via routed wire, you'd be quite wrong...

I don't seem to recall saying they are routed alongside data/signals.
Obviously, that doesn't work. But the point is that power and ground
pins cost money. If you ignore P&G you only get an incomplete picture.
I agree the big expense is pins at the memory controller (wherever
that is), but you cannot just ignore P&G.

I haven't designed MBs, you're right.
Irrelevant. FBDIMMs have as many power and ground pins at the connector as
DDR2 dimms. That connector pinout does not define the "channel" in any
meaningful way.

Right, I wasn't talking about connector pinout; FBD uses a similar
connector to DDR2, with a key in the DIMM so you don't plug them into the
wrong places.
And you're distorting that argument with irrelevant factoids. Where and how
you measure bandwidth per pin makes a difference. Taking that measure at the
dimm connector ain't the one that makes any difference.

Irrelevant.

Why is it irrelevant again? If I need to get an extra P&G plane,
doesn't that add to cost?
Depends on what you're actually counting. If you were counting how many
chipset package pins are required to support one memory interconnect vs
another, *THAT* would be relevant. But you say you weren't making that
distinction ;-)

Well, I guess that's a distinction I should have made. It seems like
the point we want to focus on.
Indeed, much of your discussion has suffered from a lack of meaning.

Fwiw, I've designed system boards using DDR six years ago, using DDR2 four and
two years ago, and last year and this year, using FBDIMMs. No big whoop, but
while you've been raving about bandwidth, you've pretty much ignored *latency*
- which often is far more important than bandwidth alone. And latency doesn't
favor FBDIMMs...

You're right, I have not mentioned latency much. FBD makes it so you
trade off latency for capacity. You add a few ns of latency for each DIMM
in a channel, but I cannot recall the number off the top of my head; I
want to say it's between 3 and 15 ns, but that's a rather large range.
There is also the SERDES delay...

I think the figures show generally that FBD is about 20ns slower
unloaded, depending on the configuration. I have an Intel slide which
I used in an article here:

http://www.realworldtech.com/page.cfm?ArticleID=RWT110805135916&p=4

However, no independent tests have really done a good job of showing
loaded latency for FBD versus DDR2.
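
Roughly, the shape of it looks like this (baseline and per-hop numbers are placeholders picked from the ranges above, not measurements):

# Rough shape of the FBD latency penalty: a fixed SERDES/protocol overhead
# plus a per-hop delay for each DIMM the request has to pass through.
# All figures below are placeholders spanning the ranges quoted above.

DDR2_UNLOADED_NS = 60        # assumed DDR2 baseline for comparison
FBD_FIXED_OVERHEAD_NS = 20   # "about 20ns slower unloaded"
PER_HOP_NS = 5               # somewhere in the quoted 3-15 ns range

def fbd_unloaded_ns(dimm_position):
    """Unloaded latency to the Nth DIMM in an FBD channel (1-indexed)."""
    return DDR2_UNLOADED_NS + FBD_FIXED_OVERHEAD_NS + (dimm_position - 1) * PER_HOP_NS

for n in (1, 4, 8):
    print(f"DIMM {n}: ~{fbd_unloaded_ns(n)} ns unloaded")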

DK
 
At some point in time, some of the folks here doubted that DDR2 would
present a capacity problem. I would like to instruct those individuals
to read the following PDF, in particular, page 37.

http://www-5.ibm.com/ch/events/fachtagung/pdf/system_x_performance.pdf

Just so you don't miss the highlights:

DDR2-400/533 supports 8 DIMMs across two channels
DDR2-667 supports 4 DIMMs across two channels
DDR2-800 supports 2 DIMMs across two channels

Perhaps now it should become abundantly clear why FBD is in fact
necessary for high bandwidth. There's an illustration of this on page
42 as well.

DK

This seems to be quite an issue in the Intel world - after all, you can
cram only so many memory channels into a north bridge. When the
north bridge is effectively part of the CPU (Opteron), every
additional CPU brings another 2 channels to the table, so in the AMD world
FBD is not a necessity right now, not even "nice to have", due to the much
higher heat dissipation of FBD compared to DDRx. When these initial
kinks are ironed out, and bandwidth requirements go waaay up
because of 4+ cores, AMD will be ready, just as it was with DDR2.

NNN
 
Why is it irrelevant again? If I need to get an extra P&G plane,
doesn't that add to cost?

Not "extra", just partitioned (and not the ground plane - splitting grounds,
except in unique situations, like around network interface connectors, is a
Bad Thing). No extra cost in that respect at all, which goes back to my point,
that measuring bw/pin at the memory connector is irrelevant.
Well, I guess that's a distinction I should have made. It seems like
the point we want to focus on.

Indeed. And to be honest, I *thought* that's what you were discussing.
My bad ;-)


Btw, I forgot to mention this earlier, but the "8 FBDIMMs per channel" is a
functional myth, as Intel, in their infinite wisdom, failed to provide
sufficient encoding in the channel to support more than 8 ranks of drams.

Thus, the de facto limit is in fact 4 FBDIMMs with any of the current
chipsets...
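
So the arithmetic works out like this (dual-rank modules being the typical server config is my assumption):

# The rank-encoding limit described above: the channel can only address
# 8 ranks, so the DIMM count you actually get depends on ranks per DIMM.

RANKS_PER_CHANNEL = 8

for ranks_per_dimm in (1, 2):
    dimms = RANKS_PER_CHANNEL // ranks_per_dimm
    print(f"{ranks_per_dimm}-rank DIMMs: {dimms} per channel")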

Cheers

/daytripper
 
I wonder...


Doh, sorry about that. Perhaps the presentation was IBM internal. I
managed to DL a copy of it yesterday, but it could be that they just
changed it to require registration...

Oops, thanks to L'Angel... it appears to be something to do with Adobe
Reader and/or Seamonkey or the combo of both. It works with IE. <sigh> I
am heartily sick of Adobe reader, its silent self-installing Download
Manager, all its "versions" and bloat and functional deficiencies... and
now they're going to bugger up Flash/Shockwave.<moan>
 
Why is it irrelevant again? If I need to get an extra P&G plane,
Not "extra", just partitioned (and not the ground plane - splitting grounds,
except in unique situations, like around network interface connectors, is a
Bad Thing). No extra cost in that respect at all, which goes back to my point,
that measuring bw/pin at the memory connector is irrelevant.

I think we are in total agreement here; DIMM connector = not relevant.
Memory controller = highly relevant.
Indeed. And to be honest, I *thought* that's what you were discussing.
My bad ;-)


Btw, I forgot to mention this earlier, but the "8 FBDIMMs per channel" is a
functional myth, as Intel, in their infinite wisdom, failed to provide
sufficient encoding in the channel to support more than 8 ranks of drams.

Thus, the de facto limit is in fact 4 FBDIMMs with any of the current
chipsets...

I didn't realize that. Someone should really discuss that a bit more
thoroughly:

http://uw7doc.sco.com/2005forum/presentations/breakouts/Intel_product_overview_1.0.pdf

I hadn't seen a board with > 16 DIMMs, but I sort of assumed that had
to do with board space and lack of a visible market. Not quite so...

However, 16 DIMMs for a 2S system is pretty darn good.

DK
 
At some point in time, some of the folks here doubted that DDR2 would
present a capacity problem. I would like to instruct those individuals
to read the following PDF, in particular, page 37.

http://www-5.ibm.com/ch/events/fachtagung/pdf/system_x_performance.pdf

OK I got it.

Fer Chrissakes it's a sales presentation.
Just so you don't miss the highlights:

Highlights?... in a sales job? WTF are you smoking?... ahh, but it's a
Freudian slip: it made your day to be able to play your Intel vs. AMD card
again. But surely you didn't think we'd fail to notice, err, "marketing
materials"!
DDR2-400/533 supports 8 DIMMs across two channels
DDR2-667 supports 4 DIMMs across two channels
DDR2-800 supports 2 DIMMs across two channels

Perhaps now it should become abundantly clear why FBD is in fact
necessary for high bandwidth. There's an illustration of this on page
42 as well.

This stuff is all in the AMD tech docs and DDR2-800 is not even shown for
socket F. OTOH socket AM2 supports 2 DIMMs per channel at DDR2-800 so I
tend to think AMD is focussing on what's actually available in the DDR2
supply FTM. As with previous CPUs we'll see improvements to the memory
controller for multiple DIMM support.
 