I have a Sempron 2600+ plugged into an A7V880. It's a 333 MHz FSB
system, rather than a 400 MHz one, and I have two Kingston 256 MB
DIMMs in the blue slots, as suggested. When it boots, the BIOS does
tell me it is in Dual Channel mode. In the BIOS you can turn on or
off dual channel memory access. What is interesting is that according
to the Sandra benchmark the memory bandwidth numbers REMAIN NEARLY
IDENTICAL whether dual channel memory is enabled or disabled:
Int = 2261 MB/s, Float = 2100 MB/s.
Also, if you take a look at Kingston's site, there is a PDF whitepaper
on Intel's dual channel memory, and one graphic is a chart that
outlines the bandwidths of different types of memory in GB/sec.
DDR333 peaks at 5.4GB/sec.
No real world results seem to measure up to the theoretical limit,
including those for all the Intel CPU/chipset combos, and I'm just
curious why.
I'm most curious as to why there is no difference in performance on
the board I have when dual channel memory is off--it's one of the
reasons I bought the board.
Dave in Colorado
Pretty cool, eh?
Makes you wonder why they put dual channel on
the Athlon motherboards.
The deal is all in the numbers. Both the Athlon and the P4 have a
64 bit data bus. The Athlon's is a DDR bus, and the P4's is a QDR
(quad pumped) bus. If the Athlon is clocked at 200MHz, there are
400 megatransfers per second, or 3200MB/sec. That bandwidth is
obviously fully matched by a single DDR DIMM running at PC3200,
clocked at 200MHz for a DDR400 transfer rate.
The P4 is quad pumped, and with a 200MHz clock, has 800 megatransfers
per second. With the same 64 bit bus width, that gives a 6.4GB/sec
transfer rate. That is fully met by two PC3200 DIMMs running in dual
channel (Uber DIMM) mode.
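To make the arithmetic above concrete, here is a small Python sketch of
the peak numbers. The function name and the convention of 1 MB = 10**6
bytes are my own choices for illustration; the clock rates and bus
widths are the ones quoted above.

```python
def peak_bandwidth_mb_s(clock_mhz, transfers_per_clock, bus_width_bits=64):
    """Theoretical peak bandwidth in MB/s, taking 1 MB as 10**6 bytes."""
    bytes_per_transfer = bus_width_bits // 8
    megatransfers_per_s = clock_mhz * transfers_per_clock
    return megatransfers_per_s * bytes_per_transfer


# Athlon FSB: 200 MHz clock, double pumped (DDR), 64 bits wide.
athlon_fsb = peak_bandwidth_mb_s(200, 2)
# P4 FSB: 200 MHz clock, quad pumped (QDR), 64 bits wide.
p4_fsb = peak_bandwidth_mb_s(200, 4)
# One PC3200 DIMM: 200 MHz clock, DDR, 64 bits wide.
pc3200_dimm = peak_bandwidth_mb_s(200, 2)

print("Athlon FSB      :", athlon_fsb, "MB/s")
print("P4 FSB          :", p4_fsb, "MB/s")
print("One PC3200 DIMM :", pc3200_dimm, "MB/s")
print("Two PC3200 DIMMs:", 2 * pc3200_dimm, "MB/s")
```

One DIMM already matches the Athlon bus, while it takes two to match
the P4 bus.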
So, the P4 can actually hoover in the data from a dual channel
configuration, while the Athlon cannot. Of course, if you have some
slow memory, like two sticks of DDR266 memory, then two of those in
dual channel configuration will be faster than one stick at DDR266
on the Athlon. But, as the sticks get closer to matching the
processor FSB and running synced with the processor, there is less
and less reason for dual channel.
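The bottleneck argument can be sketched as "the processor sees the
smaller of the FSB bandwidth and the total memory bandwidth". This is a
deliberately simplified model that only looks at theoretical peaks and
ignores timing overhead. The function name is made up for illustration,
and the module figures are the nominal PC2100/PC3200 ratings.

```python
def usable_bandwidth_mb_s(fsb_mb_s, dimm_mb_s, channels):
    """Peak bandwidth the CPU can see: the slower of FSB and memory."""
    return min(fsb_mb_s, dimm_mb_s * channels)


# Nominal peak figures in MB/s (assumed labels, not measurements).
ATHLON_FSB_400 = 3200
P4_FSB_800 = 6400
PC2100 = 2100  # DDR266 module
PC3200 = 3200  # DDR400 module

cases = [
    ("Athlon FSB400 + PC3200", ATHLON_FSB_400, PC3200),
    ("Athlon FSB400 + PC2100", ATHLON_FSB_400, PC2100),
    ("P4 FSB800 + PC3200", P4_FSB_800, PC3200),
]

for name, fsb, dimm in cases:
    single = usable_bandwidth_mb_s(fsb, dimm, channels=1)
    dual = usable_bandwidth_mb_s(fsb, dimm, channels=2)
    print(f"{name}: single={single} MB/s, dual={dual} MB/s")
```

In this model the second channel only helps when a single stick is
slower than the FSB, which is the slow-memory Athlon case and the P4
case.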
Dual channel on an Athlon is good for allowing simultaneous AGP
texture transfer, or for allowing PCI DMA transfers to happen at
the same time as the processor is accessing the memory. It is also
helpful if the Northbridge has integrated graphics, and uses main
memory for frame buffer and texture memory. And, dual channel memory
also allows more sticks to be installed on a motherboard, before
the buses become overloaded and need to have the clock rate reduced.
With video cards having decent sized video memory now, AGP texture
transfer is not a common kind of bus cycle. And PCI DMA, at about
100MB/sec, is not going to make a dent in a multi GB/sec interface.
Only a board with integrated Northbridge graphics is going to be
a winner with dual channel.
About the best way to see a difference is to run
memtest86 from memtest.org. It has a bandwidth display in the upper
left hand corner of the screen. I think you'll find a difference
between the single and dual channel configs there. For most normal
uses, you'll see virtually no application difference between the
two test cases (i.e., even if there were a 5% difference in bandwidth,
the application difference would be about 1.5%).
As for the theoretical limit versus the practical limit: the command
bus is SDR and the data bus is DDR. From one command to the next,
there are internal limits in the memory (the memory timing numbers)
that prevent commands from being issued back to back. All of the time
spent setting up the memory for a transfer is time not spent
transferring data, so those "dead" cycles represent inefficiency.
One of the biggest efficiency killers is "command rate" or for the
Nforce2 folks, "Command per clock (CPC)" mode. If a memory channel
is heavily loaded (two or more double sided DIMMs being typical),
the address bus begins to fail on setup time. The solution, which
in many cases is not a configurable BIOS option, is to run in
2T mode (AKA "CPC off" mode). What happens is, the address bus is
driven for two clock cycles, but the info on the bus is only
strobed on the second cycle. This gives 1+ cycles of setup time
for the info on the address bus, and solves the loading problem.
But it also adds a whole wasted cycle every time a command is sent.
On an Athlon64 system, setting command rate at 2T will chop
1000MB/sec off the Sandra benchmark, a 20% or so hit (I'm going
from memory here, and don't want to track down a reference for this
- try looking on Abxzone for more info).
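As a rough illustration of how dead cycles eat into the theoretical
number, here is a toy model in Python. Every figure in it is an
assumption picked for the example: the burst length, the single dead
clock at 1T, and the idea that 2T adds exactly one more dead clock per
burst. It is not a measurement of any chipset, and it is not meant to
reproduce the Sandra figure quoted above.

```python
def bus_efficiency(data_clocks, dead_clocks):
    """Fraction of clocks that actually carry data."""
    return data_clocks / (data_clocks + dead_clocks)


# Assumed burst of 8 transfers; on a DDR data bus that is 4 clocks.
DATA_CLOCKS = 4
# Hypothetical dead clocks per burst at 1T (illustrative value only).
DEAD_CLOCKS_1T = 1
# Toy assumption: 2T wastes one extra clock per command sent.
DEAD_CLOCKS_2T = DEAD_CLOCKS_1T + 1

eff_1t = bus_efficiency(DATA_CLOCKS, DEAD_CLOCKS_1T)
eff_2t = bus_efficiency(DATA_CLOCKS, DEAD_CLOCKS_2T)

print(f"1T efficiency: {eff_1t:.0%}")
print(f"2T efficiency: {eff_2t:.0%}")
print(f"Relative loss going to 2T: {1 - eff_2t / eff_1t:.0%}")
```

Real controllers overlap commands with data transfers, so the actual
penalty depends on the access pattern and the other timings.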
If you want another puzzler to play with, I tried setting CAS to
2.5 or 3 on my A7N8X-E board, and got as close to identical
bandwidth numbers as you could ask for. If you have the option to
try that on your board, I'm curious whether your chipset does the
same thing or not. My suspicion is the Nforce2 chipset may not
actually support fractional data transfer cycles, and maybe it
actually only runs at CAS2 or CAS3, but not CAS2.5. I wouldn't
expect all chipsets that support Athlon to do that, so there is
another experiment for you to try. (I think I did that experiment
in dual channel mode, and maybe it behaves differently in single
channel mode. As I've put the board away for the time being, it
may be a few days before I can try that again.)
Paul