DDR2 versus FBD

  • Thread starter: David Kanter
Perhaps you would be so good as to point out where it says anything
about 800MHz DDR2? I didn't see much in the memory controller or DRAM
section...

I saw table 44 which appears to indicate that up to 4 DIMMs/channel can
be used with DDR2-400/533, but only 2 DIMMs/channel at DDR2-667.
DDR2-800 isn't even listed as an option.

I already told you that ([email protected])... not
a good listener?... or your horse is too high.:-)

Could it be that it's not shown because PC2-6400 RDIMMs are not generally
available?... and were possibly not even available as sample parts when the
docs were written?
Well maybe it's the fact that you seem to think IBM is lying, yet these
designs appear to confirm (although hardly in a definitive fashion)
what IBM claimed. And there's the fact that the manual also appears to
confirm what IBM claimed...

No, the question is: why did IBM show DDR2-800 when AMD is not even
claiming support... yet? I made no such suggestion that IBM is lying -
quit trying to put words in my mouth.

Bottom line: this is marketing material and contains a non-mfr approved
configuration; I'd also like to know where they acquired PC2-6400 RDIMMs -
not on Micron's or Crucial's lists! Why pick out such a non-listed, not
generally-available configuration and show its limitations for only that
system and not for the others - not "lying" but intellectually dishonest?..
I think so... though within tolerance limits for umm, marketing materials
to some people?
 
What I thought but IBM's graphs do not bear that out: rather you get higher
bandwidth for more CPU work... apparently... or do you?:-)
Looking at the graphs again, it looks like 1st gen TOE gives a higher
bandwidth but cpu usage is dependent on MTU size; there's a crossover
point around the 2K to 4K mark where TOE gives higher bandwidth and
lower cpu usage. 2nd gen TOE seems to do what you suggest: more
bandwidth, more cpu. With an MTU size of 1K to 8K the bandwidth
difference seems markedly higher for a small cpu increase.

Overall though, the different MTU sizes give much bigger differences in
both bandwidth and cpu than either TOE or IOAT.

They then bring in 10Gb on X3 - I don't see where that fits compared
to the rest. There is a claim for 30Gb/sec with large packets and
jumbo frames with multiple 10Gb nics.
You also have to use Windows x86-64 "Chimney" - doesn't work in Windows
IA32 as nVidia and people with nForce4 chipsets have found out... and
"Linux has not declared support for TOE". As usual, Intel has a
"technology" as a compromise.


I think the big question is: has Intel finally managed to do a server
chipset? The jury is still out on that and some early results say "no"...
was it Yousuf who posted the msg about the U.S. govt. rejecting it because
of RAID-5 deficiencies?
I remember something about that - a 3 month qualification process
curtailed early.
FB-DIMM is also bloody HOT - see the heatpipes on the AMBs?... which
Crucial now apparently hides behind spreaders.
The AMBs have got typical power requirements around 6W at the moment,
which isn't a lot compared to a cpu, but I don't fancy a Zalman flower
on each dimm module!

I'm not sure I'd want an extra 96W from AMBs alone.
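That 96W is just the configuration discussed upthread worked through at the
quoted typical AMB figure - a quick Python sketch, nothing official:

    # rough AMB power estimate; 6 W is the "typical" per-AMB figure quoted above,
    # 16 FB-DIMMs is simply the fully populated box discussed upthread
    amb_power_w = 6
    dimms = 16
    print(dimms * amb_power_w, "W from the AMBs alone")   # -> 96 W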
You mean IBM X3 chipset systems?

No - that was really me suggesting that they coloured in the graphs
nicely for the intended audience :) Look guys bigger bars, must be
way better.
I get the feeling they wanted to tout their X3 chipset but did not want to
prod Intel too hard while doing it.

It's an interesting read in the sense that it's dressed up as some
form of study, with this talk of labs and tests, but it really seems
to be a puff piece for the X3.
 
The last page indicates, as you said, 16 RDIMMs ==> 8 RDIMMs/CPU ==> 4
RDIMMs/channel, but it explicitly specifies DDR2-533 or 667. Given
that information, it seems more likely that IBM is wrong, and that in
fact, DDR2-667 can hit 4 RDIMMs/channel, and that DDR2-800 is going to
be restricted to 2 DIMMs/channel.

That seems quite reasonable and expected. 2 DIMMs/channel (4 per
processor) shouldn't pose too much of a limitation for most
applications. With the option to increase that to 4 DIMMs/channel
with virtually no performance penalty, AMD should be able to cover
99%+ of all x86 server applications. The number of cases where
servers use more than 16GB of memory (32GB if/when 4GB registered DDR2
modules become available) per processor installed should be fairly
slim.
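The arithmetic behind those figures, assuming 2GB registered modules are the
largest generally available today (a quick Python sketch):

    # per-processor capacity with the on-die dual-channel controller
    channels_per_cpu = 2
    dimms_per_channel = 4
    for dimm_gb in (2, 4):          # 2GB RDIMMs now, 4GB if/when available
        print(dimm_gb, "GB DIMMs ->",
              channels_per_cpu * dimms_per_channel * dimm_gb, "GB per processor")
    # -> 16 GB and 32 GB per processor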
The density in blades currently is quite impressive. What doesn't make
sense is why Appro doesn't offer 4GB DIMMs for the Opteron; I don't see
any reason why 4GB DIMMs would be any more difficult, unless they
require 16 devices/module at current densities.

I noticed that too and didn't much understand it. Indeed 4GB DIMMs
probably would require 16 devices/module, but with registered DIMMs
that shouldn't be an issue. The only thing I can think of is that
they don't have a qualified supplier of 4GB DDR2 DIMMs for the Opteron
yet but do have one for the Xeon. Those modules are EXTREMELY rare at
this stage, so it's likely that they're still working on testing and
validation.
 
I noticed that too and didn't much understand it. Indeed 4GB DIMMs
probably would require 16 devices/module,

Depending on whether you can get 1Gb, 2Gb or 4Gb DRAMs.
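Working the device count through for a 4GB module (a quick Python sketch; the
extra one-per-eight is for ECC):

    # devices needed for a 4GB module at each plausible DDR2 density
    module_bits = 4 * 8 * 2**30                  # 4 GB expressed in bits
    for density_gbit in (1, 2, 4):
        devices = module_bits // (density_gbit * 2**30)
        print(f"{density_gbit}Gb DRAMs: {devices} data devices,",
              f"{devices + devices // 8} with ECC")
    # 1Gb -> 32 (36 w/ECC), 2Gb -> 16 (18 w/ECC), 4Gb -> 8 (9 w/ECC)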
but with registered DIMMs
that shouldn't be an issue.

No, the issue is with loading the data bus. RDIMMs only help make
addressing and command busses easier to implement; they don't help on
the data side at all.
The only thing I can think of is that
they don't have a qualified supplier of 4GB DDR2 DIMMs for the Opteron
yet but do have one for the Xeon. Those modules are EXTREMELY rare at
this stage, so it's likely that they're still working on testing and
validation.

Hard to say...

DK
 
fammacd=! said:
I already told you that ([email protected])... not
a good listener?... or your horse is too high.:-)

Could it be that it's not shown because PC2-6400 RDIMMs are not generally
available?... and were possibly not even available as sample parts when the
docs were written?


No, the question is: why did IBM show DDR2-800 when AMD is not even
claiming support... yet? I made no such suggestion that IBM is lying -
quit trying to put words in my mouth.

Because IBM has their own engineering groups?
Bottom line: this is marketing material and contains a non-mfr approved
configuration; I'd also like to know where they acquired PC2-6400 RDIMMs -
not on Micron's or Crucial's lists! Why pick out such a non-listed, not
generally-available configuration and show its limitations for only that
system and not for the others - not "lying" but intellectually dishonest?..
I think so... though within tolerance limits for umm, marketing materials
to some people?

I worked for several years in a purchasing engineering organization
(supporting mostly Intel products). It wasn't unusual at all to
spec components tighter than the manufacturer's data book. Three
reasons: sometimes we needed a spec that wasn't there, sometimes product
differentiation, and because we could. ;-)

If you can actually *buy* that configuration from IBM, why don't
you believe it to exist? You might not like the price though. ;-)
 
Depending on whether you can get 1Gb, 2Gb or 4Gb DRAMs.


No, the issue is with loading the data bus. RDIMMs only help make
addressing and command busses easier to implement; they don't help on
the data side at all.

True, but the clocks, address, and command busses go to *every*
chip on the DIMM. Data doesn't and as such is much more lightly
loaded.
Hard to say...

No it's not.
 
True, but the clocks, address, and command busses go to *every*
chip on the DIMM. Data doesn't and as such is much more lightly
loaded.

So, I don't design memory systems, but it seems to me that if you want
to actually get data out of a DRAM, it needs to be on the databus. How
exactly does a DRAM work if it's not connected to the data bus?
No it's not.

My point is that it seems strange that there would be difficulties
getting 4GB DIMMs to work with the K8, when you can do so with the
Blackford chipset. If the explanation is that it is easier to do high
density FBD, then that implies to me that it is something with the data
bus, since the major change between RDIMMs and FBD is that the data bus
is buffered as well.

However, it could just be that it's harder to meet spec for the K8...

DK
 
So, I don't design memory systems, but it seems to me that if you want
to actually get data out of a DRAM, it needs to be on the databus. How
exactly does a DRAM work if it's not connected to the data bus?

Bit 0 isn't connected to bit 63. The data bus on a DIMM has one
load per bit per rank. The control, address, and clock signals
have as many loads as there are chips.
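To put rough numbers on that for a hypothetical dual-rank 72-bit (ECC)
registered DIMM - a quick Python sketch, pick your device width:

    # electrical loads per signal trace on a dual-rank 72-bit RDIMM (sketch)
    ranks = 2
    for device_width in (4, 8):
        chips_per_rank = 72 // device_width       # 18 (x4) or 9 (x8) devices
        data_loads = ranks                        # one load per data bit per rank
        addr_cmd_loads = ranks * chips_per_rank   # address/command fan out to every chip
        print(f"x{device_width}: data line {data_loads} loads,",
              f"addr/cmd line {addr_cmd_loads} loads")
    # x4 -> 2 vs 36, x8 -> 2 vs 18; the register buffers addr/cmd, not data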
 
The AMBs have got typical power requirements around 6W at the moment,
which isn't a lot compared to a cpu, but I don't fancy a Zalman flower
on each dimm module!

Does "typical" average out the pounding the primary takes?
I'm not sure I'd want an extra 96W from AMBs alone.

The Intel Dempsey seed systems had a fan just stuck on top of the DIMMs:
http://www.tecchannel.de/server/hardware/432957/index19.html. Dunno if
they've made some progress to taming the heat since?
 
Bit 0 isn't connected to bit 63. The data bus on a DIMM has one
load per bit per rank. The control, address, and clock signals
have as many loads as there are chips.

OK, I see what you're saying, the complexity of the data bus network is
a function of whether the DRAMs are x4, x8, x16, etc. and can be
partitioned accordingly. OTOH, addressing really cannot be.

Thanks for clearing that up.
It could also mean that the work hasn't been done yet. It's not a
trivial amount of work to qualify these things. You don't just
take one sample, plug it in, then see if Windows boots.

I'm aware of that.

DK
 
Does "typical" average out the pounding the primary takes?
Not sure about that, I just saw a piece that a new AMB was down 20% on
the typical power requirements of 6W.

Does the primary take more of a pounding? If there was a difference
in usage then I guess it would only affect dynamic power, not static,
and I don't think we've got figures for that - or at least I haven't.
The Intel Dempsey seed systems had a fan just stuck on top of the DIMMs:
http://www.tecchannel.de/server/hardware/432957/index19.html. Dunno if
they've made some progress to taming the heat since?

Even if they shift the heat off the dimms it still has to get out of
the case.
 
OK, I see what you're saying, the complexity of the data bus network is
a function of whether the DRAMs are x4, x8, x16, etc. and can be
partitioned accordingly. OTOH, addressing really cannot be.

...and no one in their right mind uses anything but x16 chips on
DIMMs (eight gets 128b). This is the sort of thing GeorgeM has
been talking about for ages WRT "high density DIMMs".
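For reference, the device count per 64-bit rank by width (a quick sketch; add
one device per eight for ECC):

    # DRAM devices needed to fill a 64-bit rank, by device width
    for width in (4, 8, 16):
        print(f"x{width}: {64 // width} devices per rank")
    # x4 -> 16, x8 -> 8, x16 -> 4; eight x16 devices is the 128-bit case above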
Thanks for clearing that up.


I'm aware of that.

That's why it's "not hard to say".
 
True, but the clocks, address, and command busses go to *every*
chip on the DIMM. Data doesn't and as such is much more lightly
loaded.
[...]

Not the clocks, they've gone through an on-dimm PLL for a few dimm
generations...

/daytripper
 
Not sure about that, I just saw a piece that a new AMB was down 20% on
the typical power requirements of 6W.

Does the primary take more of a pounding? If there was a difference
in usage then I guess it would only affect dynamic power, not static,
and I don't think we've got figures for that - or at least I haven't.

FB-DIMMs are chained together through the AMBs, acting as transceivers for
the channel, so the Primary carries all signals for all downstream DIMMs.
Of course the memory controller signals are going to see the same activity
so there'll be emm, heat there too. I wonder if that's why AMD is delaying
FB-DIMM implementation... till the heat issue is tamed?
Even if they shift the heat off the dimms it still has to get out of
the case.

From what I've seen, modern cases with 120mm fans seem to do a pretty good
job there, while remaining relatively quiet.

If I remember now, back when I read the translation of that full article,
that system wasn't really a "seed" - it was in an Intel performance lab in
Portland... hmm, nice way to do performance comparison benchmarks.:-)
 
FB-DIMMs are chained together through the AMBs, acting as transceivers for
the channel, so the Primary carries all signals for all downstream DIMMs.
Of course the memory controller signals are going to see the same activity
so there'll be emm, heat there too. I wonder if that's why AMD is delaying
FB-DIMM implementation... till the heat issue is tamed?
Ok makes sense, so anything hitting the last dimm is going to have all
the AMBs going like the clappers. I'd got a single star topology in
my head for this - unfortunately neither the specs nor any
documentation supported this :( So this explains the increasing
latency at each additional dimm on any particular channel, and also
the power draw at the early dimms.
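A toy model of the chain, purely to show the shape of it - the per-hop figure
is a made-up placeholder, not a measured number:

    # FB-DIMM channel as a daisy chain: a read from DIMM n goes out and back
    # through AMBs 1..n, so latency grows per slot and the earlier AMBs
    # forward traffic for everything behind them.  Figures are illustrative.
    base_ns, per_hop_ns = 50, 5
    dimms = 4
    for n in range(1, dimms + 1):
        latency = base_ns + 2 * n * per_hop_ns    # out and back through n AMBs
        carries = dimms - n + 1                   # AMB n forwards DIMMs n..last
        print(f"DIMM {n}: ~{latency} ns, its AMB carries traffic for {carries} DIMM(s)")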

Hmmm I prefer the star topology - no increasing latency and lower
total power - just one problem with pins and tracks, but never mind.

AMD seem to be changing memory when it gains a reasonable level of
market acceptance. In this they seem to be happy to follow Intel - as
long as it's not Rambus. They don't have a need to force the issue,
and don't have to swallow the costs.
From what I've seen, modern cases with 120mm fans seem to do a pretty good
job there, while remaining relatively quiet.

If I remember now, back when I read the translation of that full article,
that system wasn't really a "seed" - it was in an Intel performance lab in
Portland... hmm, nice way to do performance comparison benchmarks.:-)

So a little bit of relatively exotic cooling could be easily applied
to the problem in that environment.
 
True, but the clocks, address, and command busses go to *every*
chip on the DIMM. Data doesn't and as such is much more lightly
loaded.
[...]

Not the clocks, they've gone through an on-dimm PLL for a few dimm
generations...

Ok, the only DIMMs I've played with in several years have either
been PC133 or registered for my home systems. They don't let me
play with real hardware at work anymore. ;-)
 
Ok makes sense, so anything hitting the last dimm is going to have all
the AMBs going like the clappers. I'd got a single star topology in
my head for this - unfortunately neither the specs nor any
documentation supported this :( So this explains the increasing
latency at each additional dimm on any particular channel, and also
the power draw at the early dimms.

Yep - for docs on any of that kind of stuff, I find the Micron Data Sheets
very well written and umm, quite approachable. I've also found their
Technical Notes very useful.
Hmmm I prefer the star topology - no increasing latency and lower
total power - just one problem with pins and tracks, but never mind.

You'd think there must be a better way but you have to assume the "best
minds in the industry" have been applied to the task.
AMD seem to be changing memory when it gains a reasonable level of
market acceptance. In this they seem to be happy to follow Intel - as
long as it's not Rambus. They don't have a need to force the issue,
and don't have to swallow the costs.

I'm not sure DDR2 was Intel's initiative any more than DDR was. IIRC both came
out of JEDEC after the memory mfrs' wake-up call with the Rambus aggro.
Intel did pump some cash into Micron to get FB-DIMM into place though.
So a little bit of relatively exotic cooling could be easily applied
to the problem in that environment.

A bit of irony in the Google translation of that page: Portland --> haven
country - no idea where that comes from but seems to correlate with the
high honor apparently felt in such an invitation. Hmm, wonder who paid the
airfare?:-) At any rate, I'm sure they did a fine job.<shrug>
 
...and no one in their right mind uses anything but x16 chips on
DIMMs (eight gets 128b). This is the sort of thing GeorgeM has
been talking about for ages WRT "high density DIMMs".

Is that true? From what I've read at Kingston's site, x8 is the
norm for servers... don't know about desktops though.

DK
 