I have been asked how dual channel RAM works, and I found myself
at a loss, realizing that I don't really know what it is or exactly how it
works, especially with the AMD on-chip memory controller.
Could someone explain it to me well, and is there a good website or two
explaining the topic?
TIA!
I'm not sure "Dual Channel" is a technically precise enough term
that a defensible description can be made for it in all situations.
Perhaps one of the architecture groups would be a better place for
the question than a motherboard group.
The first distinction I would start with is the
width of the processor bus versus the width of individual memory
DIMMs. The P4 and the AthlonXP have 64 bit external interfaces to
whatever is used for the Northbridge. A DIMM is also 64 bits wide.
It is a no brainer figuring out how to connect a 64 bit device
to another 64 bit device. (We'll ignore how rate matching is
done when the FSB and memory are running async.)
Now, using more than one DIMM at a time requires a couple of
things. There is a relationship between cache organization
and the size of data objects fetched from main memory. Generally,
I notice that the size of a data fetch from the memory controller
is not larger than a single cache line. So, to profitably use
dual channel memory, the cache must also be sized to handle
what is coming from main memory. (In other words, not just any
processor can benefit from a dual channel organization, just as
if I tried to glue quad or octal channels on today's processors,
nothing good would happen. The performance would suck, as most
of the time spent on each channel would be wasted on overhead.)
If I want to use two channels, I have to figure out how to organize
(interleave) memory addresses. For example, I could have bytes 0-7
on one DIMM, then bytes 8-15 on the second DIMM. When I go back to the
first DIMM again, I'd be looking for bytes 16-23 and so on. The
overhead of setting up a memory operation on a DIMM is large enough
that data is requested from the memory in bursts. The processor will
be doing cache line sized operations to the memory, because working
on a single byte somewhere in main memory would be a super-expensive
way to do business.
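The interleaving just described can be sketched in a few lines of Python. This is only an illustration of the 8-byte example above, not a description of any real memory controller; the names CHUNK_BYTES, NUM_CHANNELS and locate() are made up for the sketch, and real hardware may interleave at a different granularity.

```python
# Illustrative only: maps a byte address to a channel, using the
# 8-byte interleave from the example in the text.
CHUNK_BYTES = 8    # one 64-bit DIMM transfer
NUM_CHANNELS = 2   # assumed: two channels, one DIMM each


def locate(addr):
    """Return (channel, byte offset within that channel) for a byte address."""
    chunk = addr // CHUNK_BYTES          # which 8-byte chunk overall
    channel = chunk % NUM_CHANNELS       # chunks alternate between channels
    offset = (chunk // NUM_CHANNELS) * CHUNK_BYTES + addr % CHUNK_BYTES
    return channel, offset


if __name__ == "__main__":
    for start in range(0, 48, CHUNK_BYTES):
        channel, offset = locate(start)
        end = start + CHUNK_BYTES - 1
        print(f"bytes {start}-{end} -> channel {channel}, local offset {offset}")
```

Under these assumptions, bytes 0-7, 16-23 and 32-39 land on channel 0, and bytes 8-15, 24-31 and 40-47 land on channel 1.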
So, what makes a situation "Dual Channel" ? These are the
attributes that come to mind.
1) The memory channels are autonomous. In other words, I can plug a
DIMM into channel0. Or, I can plug a DIMM into channel1. The
channels have equal significance, and one channel is
indistinguishable from the other.
2) When a cache line sized operation is required by the processor,
the two channels alternate supplying information to the
processor. This allows, say, the 3.2GB/sec theoretical bandwidth
of a DDR400 DIMM to be added to the 3.2GB/sec bandwidth of a
second DIMM.
3) How flexible the memory controller is, in terms of mixing DIMMs,
doesn't influence the "Dual Channel" moniker. Some dual channel
controllers allow mixing DIMMs on each channel, and operate in
interleaved mode, as long as there are equal quantities of RAM
on each channel. Others require exact matched DIMMs, perhaps
placed in particular slots. None of that matters to this
discussion.
So, I guess the key distinguishing feature is the ability to
interleave memory addresses, storing bytes 0-7, 16-23, 32-39 in
one channel, and 8-15, 24-31, 40-47 in the other channel.
Now, in all of this, I didn't mention "rate matching". If the
bandwidth is not balanced between the processor and the memory,
then data could "pile up" or "run dry". A well matched situation
would be a FSB800 processor (6.4GB/sec bus interface) talking to
two DDR400 (3.2GB/sec) DIMMs. Some kind of control mechanism
must be in place, to pace the data. Either a FIFO queue temporarily
holds the data somewhere, or the individual words of data in the
burst are occasionally delayed by a cycle, until the consumer of
the data is ready for that data word. This is outside the discussion
of dual channel (and for motherboards, I doubt I could answer the
question anyway).
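The bandwidth figures quoted above follow from simple arithmetic: transfers per second times 8 bytes per transfer on a 64 bit path. A small Python sketch of that calculation is below. The function name peak_gb_per_s() is invented for the illustration, and these are theoretical peak numbers only; nothing here models the FIFO or pacing mechanism.

```python
# Theoretical peak bandwidth of a 64-bit (8-byte) data path.
BUS_BYTES = 8


def peak_gb_per_s(mega_transfers_per_s, channels=1):
    """Peak bandwidth in GB/sec (1 GB = 1000 MB, as in marketing figures)."""
    return mega_transfers_per_s * BUS_BYTES * channels / 1000


fsb800 = peak_gb_per_s(800)                   # FSB800 processor bus
ddr400_single = peak_gb_per_s(400)            # one DDR400 DIMM
ddr400_dual = peak_gb_per_s(400, channels=2)  # two DDR400 DIMMs, interleaved

if __name__ == "__main__":
    print(f"FSB800 bus:     {fsb800:.1f} GB/sec")
    print(f"DDR400 single:  {ddr400_single:.1f} GB/sec")
    print(f"DDR400 dual:    {ddr400_dual:.1f} GB/sec")
```

The dual DDR400 figure comes out equal to the FSB800 figure, which is the "well matched" case.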
There are a couple of examples we could go through, to see some
of the issues around dual channel. First, we'll start with the
Nforce2.
On the Nforce2, if we set up the system to be synchronous, in
fact the memory subsystem has twice the theoretical bandwidth
of the processor interface. And, that means dual channel will be
a waste, in the sense that on a read, the burst of data from
the Northbridge must be slowed to match the processor. That is
why you don't see a big difference between single channel and
dual channel mode in synchronous operation. (The excess of
data is still handy though, because a memory bus is seldom
100% efficient, and dead cycles occasionally happen on a channel.
The dual channels might have data to transfer, due to the
excess. So a slot might get filled that would otherwise be
empty. Typically, users see a 5% difference on an average
application.)
Where dual channel would really pay off on an Nforce2, is
precisely where most people would not run it. Say you had
two PC2100 DIMMs and an FSB400 processor. If the DIMMs
were run single channel, it would be 2.1GB/sec memory
bandwidth versus 3.2GB/sec processor bandwidth. In dual
channel mode, the memory offers 4.2GB/sec, and there would
be a significant difference between those two configs.
But, only an idiot would buy PC2100 DIMMs today, so the
Nforce2 doesn't have an opportunity to shine.
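A rough way to compare the two Nforce2 cases is to treat the slower side (processor bus or memory) as the limit. The Python sketch below does only that. It is a crude model that ignores bus efficiency and dead cycles, so it shows zero gain in the synchronous case rather than the small real-world gain mentioned above. The helper name bottleneck_mb_per_s() is invented, and pairing FSB400 with PC3200 as the synchronous example is an assumption.

```python
# Crude model: usable peak bandwidth is limited by the slower of the
# processor bus and the memory subsystem. Figures are nominal MB/sec.
FSB400 = 3200   # FSB400 processor bus
PC3200 = 3200   # one DDR400 DIMM (assumed synchronous partner for FSB400)
PC2100 = 2100   # one PC2100 DIMM


def bottleneck_mb_per_s(fsb_mb_per_s, dimm_mb_per_s, channels):
    """Theoretical limit with `channels` interleaved DIMMs of equal speed."""
    return min(fsb_mb_per_s, dimm_mb_per_s * channels)


if __name__ == "__main__":
    for name, dimm in (("PC3200 (synchronous)", PC3200), ("PC2100", PC2100)):
        single = bottleneck_mb_per_s(FSB400, dimm, 1)
        dual = bottleneck_mb_per_s(FSB400, dimm, 2)
        gain = 100.0 * (dual - single) / single
        print(f"{name}: single {single} MB/sec, dual {dual} MB/sec, "
              f"theoretical gain {gain:.0f}%")
```

With PC2100, the limit moves from the memory (2100 MB/sec) to the processor bus (3200 MB/sec) when the second channel is added, which is where the significant difference comes from.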
The second case I'll bring up, is the one that prompted
your question. That is, what is the situation on
Opteron/Athlon64 ? First of all, a dead giveaway is that if
you download documents from AMD, they don't use the
term "dual channel" in their technical descriptions.
Instead they use "64 bit mode" and "128 bit mode" when
describing memory. That should tell you right away,
that something is up.
One thing to notice about A64, is DIMMs cannot live in
A1 and A2 by themselves. First you have to populate a
B1 or B2 slot, before an A slot can be used. That violates
(1) above.
I suspect what is going on there, is the processor internal
organization is no longer the 64 bit width we are accustomed
to, on the P4 and AthlonXP. If the processor were 128 bits
wide (or the bandwidth equivalent thereof), then there
would be no need to interleave accesses to the DIMMs. When a
DIMM is present in A1 and B1, the same command is sent to
both, and it is as if the DIMM is 128 bits wide. Yes, the
data organization of the DIMMs still looks like the
interleaving on a dual channel situation, but in this
case, the processor can "eat" data simultaneously from
both DIMMs. Since there is no longer a "bottleneck" at
the processor interface, there is no notion of interleaving
(the interleaving doesn't have a physical significance, in
this case it is all internal and hidden from us).
Well, how does 64 bit mode work then ? One way to do it,
would be to make two reads to a DIMM, glue the two 8 byte
quantities together, to make a 128 bit wide word for the
processor (internal) interface.
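To make the guess above concrete, here is a Python sketch of both cases: building a 128 bit word from two back-to-back reads of one DIMM ("64 bit mode"), and building it from one read of each of two DIMMs ("128 bit mode"). This is speculation in code form, not AMD's documented behaviour. The function names, the little-endian byte order, and the choice of which DIMM supplies the low half are all assumptions.

```python
# Speculative sketch: DIMM contents are modelled as Python bytes objects.

def read64(dimm, offset):
    """Read one 8-byte quantity from a DIMM as an integer (little-endian)."""
    return int.from_bytes(dimm[offset:offset + 8], "little")


def read128_one_dimm(dimm, offset):
    """'64 bit mode': two sequential reads from one DIMM, glued together."""
    low = read64(dimm, offset)
    high = read64(dimm, offset + 8)
    return (high << 64) | low


def read128_two_dimms(dimm_a, dimm_b, offset):
    """'128 bit mode': the same offset is read from both DIMMs at once."""
    low = read64(dimm_a, offset)     # assumed: DIMM A holds the low half
    high = read64(dimm_b, offset)    # assumed: DIMM B holds the high half
    return (high << 64) | low


if __name__ == "__main__":
    data = bytes(range(16))                  # 16 bytes of sample data
    one = read128_one_dimm(data, 0)
    two = read128_two_dimms(data[:8], data[8:], 0)
    print(hex(one))
    print(one == two)
```

Both paths produce the same 128 bit word. The difference is that the single-DIMM path needs two reads to get there, while the two-DIMM path needs one.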
Does A64 benefit from 128 bit mode ? Absolutely. Those
Sandra memory benchmarks don't lie.
I don't know if any AMD documents dwell on details like this,
and this is the best I can do to sketch how it _might_ work.
The AMD processor actually contains a memory controller,
a crossbar, and multiple HyperTransport links. Whether the
processor is an Athlon64 or one of the several types of
Opteron determines how many of the HT links are
connected to pins on the processor. The crossbar is a
routing device that decides which interface will satisfy
a request for data. I have no idea what the internal
organization of the crossbar is - in Opteron, there are a
lot of cache protocol issues going on in there, and architects
don't tend to dwell on the tiny issues, like bus widths, in
a discussion like that. I would say, the maximum width
of a bus inside A64, would be the size of a cache line
(a safe bet) - whatever that is.
If you want to "wallow in the architecture", try this site.
I don't want to rewrite any of the fine material the
author has provided here. (Note - if you are on a dialup
modem, this will take a while to load.)
http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html
To sum up - Athlon64/Opteron is not dual channel. Two
DIMMs operating in 128 bit mode simply match the
internal organization of the processor better.
Paul