Robert Hancock said:
Most likely what happens is that at DDR400 with 4 DIMMs it uses 2T
command rate instead of 1T, which reduces the memory bandwidth (I've
heard the figure of 20%).
The datasheet for the 875 mentions neither a register for PAT
nor a register that controls command rate (1T/2T). So I decided
to run some tests on my P4C800-E today, to see what impact
four sticks have on bandwidth.
The first test is two sticks in dual channel, with FSB and MEM
running in sync. I then added two more sticks (all four are
matching Ballistix PC3200), keeping the FSB and MEM at the same
standard speeds. For the third test, I bumped the clock up by
5MHz, still keeping the FSB and MEM in sync. The third test
was chosen because of the known issue where PAT is disabled if
four sticks are used and the clock is not exactly 200MHz. The
PAT state in each test was verified with both CTIAW and CPUZ.
2x512MB DS 2-2-2-6 DDR400 FSB800 P4C 2.8GHz
Main memory speed (MB/s): Read=3609.4, Write=2410.7 (measured by cachemem)
Read=2955MB/sec (measured by memtest86+ 1.4)
Both CTIAW and CPUZ claim memory is actually set to 2-2-2-5
PAT "(1) fully enabled"
CPU-Z version 1.22 Latency Test
         stride->
size(Kb)    4    8   16   32   64  128  256  512
       1    2    2    2    2    2    2    2    2
       2    2    2    2    2    2    2    2    2
       4    2    2    2    2    2    2    2    2
       8    2    2    2    3    2    3    3    3
      16    3    3    7   15   20   19   18   19
      32    3    4    8   17   19   18   19   19
      64    3    4    9   18   18   18   19   19
     128    3    4    9   18   18   18   18   19
     256    3    4    9   18   19   19   20   21
     512    3    4    9   18   20   20   21   25
    1024    4    8   15   28   53  180  172  210
    2048    5    9   15   29   55  178  176  187
    4096    5    8   16   30   58  215  173  190
    8192    5    8   15   28   58  216  176  190
   16384    5    8   15   28   58  216  170  189
   32768    4    9   16   29   52  183  174  186
4x512MB DS 2-2-2-6 DDR400 FSB800 P4C 2.8GHz
Main memory speed (MB/s): Read=3307.7, Write=1858.5 (measured by cachemem)
Read=2733MB/sec (measured by memtest86+ 1.4)
Both CTIAW and CPUZ claim memory is set to 2-2-2-6 (agrees with BIOS)
PAT "(1) fully enabled"
CPU-Z version 1.22 Latency Test
         stride->
size(Kb)    4    8   16   32   64  128  256  512
       1    2    2    2    2    2    2    2    2
       2    2    2    2    2    2    2    2    2
       4    2    2    2    2    2    2    2    2
       8    2    2    2    4    2    2    3    3
      16    3    3    7   15   18   19   19   19
      32    3    4    8   17   18   18   19   19
      64    3    4    9   18   18   18   20   19
     128    3    4    9   18   18   18   18   19
     256    3    4    9   18   19   19   19   21
     512    3    4    9   18   20   23   21   25
    1024    5    9   16   30   58  209  208  239
    2048    5    9   17   32   63  210  203  214
    4096    5    9   17   33   64  239  201  216
    8192    5    9   16   31   58  208  203  214
   16384    5    9   17   33   63  210  201  215
   32768    5   10   17   31   58  210  201  239
4x512MB DS 2-2-2-6 DDR410 FSB820 P4C 2.87GHz (clk=205MHz)
Main memory speed (MB/s): Read=3230.7, Write=1890.8 (measured by cachemem)
Read=2802MB/sec (measured by memtest86+ 1.4)
Both CTIAW and CPUZ claim memory is set to 2-2-2-6 (agrees with BIOS)
PAT "(0) reserved" - means, AFAIK, that PAT is disabled.
CPU-Z version 1.22 Latency Test
         stride->
size(Kb)    4    8   16   32   64  128  256  512
       1    2    2    2    2    2    2    2    2
       2    2    2    2    2    2    2    2    2
       4    2    2    2    2    2    2    2    2
       8    2    2    2    3    3    2    2    3
      16    3    3    7   15   18   19   19   19
      32    3    4    9   17   18   19   19   19
      64    3    4    9   18   18   18   19   19
     128    3    4    9   18   18   18   19   18
     256    3    4    9   18   19   19   19   19
     512    4    4    9   18   20   23   22   26
    1024    5    9   17   32   62  223  222  229
    2048    5    9   17   32   62  223  217  229
    4096    5    9   17   32   61  224  218  229
    8192    5    9   17   32   62  222  215  229
   16384    5    9   17   34   65  223  216  228
   32768    5   10   17   33   61  222  245  255
What is interesting to me is that the most significant
effect seems to be the transition from 2 sticks to
4 sticks. In both cases, PAT is supposed to be enabled,
at least according to CPUZ and CTIAW, but the four-stick
case is slower. I did notice that even though the BIOS
memory timings were set the same for all tests (2-2-2-6
in the BIOS), the last parameter was actually 5 when using
two sticks, and 6 when using four sticks. (Does the BIOS
have a subtle bug in it? Or is this evidence of
something?)
The other puzzling thing is that the last test, with the
slightly elevated clock, has much "smoother" columns of
numbers. How can disabling PAT cause such a phenomenon?
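
To put a rough number on the "smoother" impression, here is a minimal
Python sketch that compares the spread of one column across the three
runs. The figures are copied from the stride-128 column of the CPU-Z
tables above, for block sizes 1024 through 32768. Picking that one
column, and using the population standard deviation as the measure of
"smoothness", are assumptions made purely for illustration.

```python
from statistics import mean, pstdev

# Stride-128 latency figures for block sizes 1024..32768,
# copied from the three CPU-Z latency tables in this post.
stride128 = {
    "2 sticks, 200MHz": [180, 178, 215, 216, 216, 183],
    "4 sticks, 200MHz": [209, 210, 239, 208, 210, 210],
    "4 sticks, 205MHz": [223, 223, 224, 222, 223, 222],
}


def spread(samples):
    """Return (mean, population standard deviation) of one column."""
    return mean(samples), pstdev(samples)


# ASSUMPTION: standard deviation is used here as a stand-in for
# how "smooth" a column looks; other measures would work too.
results = {label: spread(col) for label, col in stride128.items()}

for label, (avg, sd) in results.items():
    print(f"{label}: mean={avg:.1f} stdev={sd:.1f}")
```

By this measure the 205MHz run has by far the tightest column, but
the sketch only describes the numbers. It says nothing about why
they come out that way.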
Another puzzler is the measurement of bandwidth. Between the
second and third test, memtest86+ finds a bandwidth increase
that matches the clock speedup factor (205/200, about 2.5%).
Cachemem got a drop in read BW and an increase in write BW. I
ran Cachemem multiple times, and the measured bandwidth
numbers repeat down to the last digit, so the drop and
the increase are reproducible.
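
The comparison between the second and third test can be checked with a
few lines of Python. This is only a sketch of the arithmetic, using
the bandwidth figures reported above. The helper name pct_change is
made up for the example.

```python
def pct_change(before, after):
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


# Clock went from 200MHz to 205MHz between test 2 and test 3.
clock_gain = pct_change(200, 205)

# (test 2, test 3) bandwidth in MB/s, as reported in this post.
readings = {
    "memtest86+ read": (2733.0, 2802.0),
    "cachemem read": (3307.7, 3230.7),
    "cachemem write": (1858.5, 1890.8),
}

changes = {name: pct_change(b, a) for name, (b, a) in readings.items()}

print(f"clock: {clock_gain:+.2f}%")
for name, delta in changes.items():
    print(f"{name}: {delta:+.2f}%")
```

The memtest86+ figure tracks the clock change closely, while the two
cachemem figures move in opposite directions.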
All I can conclude, for the benefit of the OP, is that
using four sticks costs about 7.5% of memory read bandwidth
by memtest86+ (about 8.4% by cachemem). When you raise the
clock and PAT is disabled, what happens really depends on
which tool is measuring the bandwidth.
It looks like if you raise the clock high enough, you'll get
the bandwidth back, so that is always an option.
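
For anyone who wants to redo the sums, here is a small Python sketch
of the two-stick versus four-stick comparison, with a rough estimate
of the clock needed to win the read bandwidth back. The estimate
assumes bandwidth scales linearly with clock, which is only what the
memtest86+ numbers hint at and was not tested beyond 205MHz. Treat
the result as a guess, not a measurement.

```python
def pct_change(before, after):
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


# Bandwidth in MB/s at DDR400, as reported in this post.
two_sticks = {
    "memtest86+ read": 2955.0,
    "cachemem read": 3609.4,
    "cachemem write": 2410.7,
}
four_sticks = {
    "memtest86+ read": 2733.0,
    "cachemem read": 3307.7,
    "cachemem write": 1858.5,
}

loss = {
    name: pct_change(two_sticks[name], four_sticks[name])
    for name in two_sticks
}

# ASSUMPTION: read bandwidth scales linearly with the clock.
# This is an extrapolation, not something measured in the tests.
BASE_CLOCK_MHZ = 200.0
break_even_mhz = (
    BASE_CLOCK_MHZ
    * two_sticks["memtest86+ read"]
    / four_sticks["memtest86+ read"]
)

for name, delta in loss.items():
    print(f"{name}: {delta:+.2f}%")
print(f"estimated break-even clock: {break_even_mhz:.1f}MHz")
```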
This is one case where a logic analyser would be called for,
rather than trying to conclude anything from software testing.
Paul