titus12 said:
Should the DDR2 memory DRAM Frequency be as close to the Bus Speed of the
CPU as possible? I have a P4 630-3.0GHz (4HT inside) processor at 800FSB.
Are the settings below OK? There are two DDR2-800 PC6400 2GB sticks on an
Asus P5GD2-X Motherboard.
I want to make my system as fast as possible to hold off buying Vista.
DRAM Frequency - 301.1 MHz
FSB:RAM - 2:3
CL, tRCD, tRP - all 4 clocks
Cycle Time - 12 clocks
Core Speed 3010.8 MHz
Multiplier - x15.0
Bus Speed - 200.7 MHz
FSB - 802.9
Thank you;
David
Those are the fastest stock settings on the P5GD2-X motherboard, according
to the manual.
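As a quick sanity check, the numbers you posted are consistent with one another. The sketch below recomputes them from the bus speed, assuming the usual P4 relationships (quad-pumped FSB, DDR memory transferring twice per clock). The function and variable names are made up for illustration.

```python
# Values as reported in the post above.
BUS_MHZ = 200.7           # "Bus Speed"
MULTIPLIER = 15.0         # "Multiplier"
FSB_RAM_RATIO = (2, 3)    # "FSB:RAM"

def derived_clocks(bus_mhz, multiplier, ratio):
    """Recompute the other readings from the base bus clock."""
    fsb_part, ram_part = ratio
    core_mhz = bus_mhz * multiplier            # CPU core speed
    fsb_rate = bus_mhz * 4                     # P4 FSB is quad-pumped
    dram_mhz = bus_mhz * ram_part / fsb_part   # DRAM clock from the 2:3 ratio
    ddr2_rate = dram_mhz * 2                   # DDR: two transfers per clock
    return core_mhz, fsb_rate, dram_mhz, ddr2_rate

core, fsb, dram, ddr2 = derived_clocks(BUS_MHZ, MULTIPLIER, FSB_RAM_RATIO)
print(f"core {core:.1f} MHz, FSB {fsb:.1f}, DRAM {dram:.1f} MHz, DDR2-{ddr2:.0f}")
```

The small differences from your readings (3010.8 and 802.9) are just rounding of the displayed bus speed. A DRAM frequency of about 301 MHz means the memory is running at roughly the DDR2-600 rate.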
Note that the memory interface bus is not 100% efficient. That means
there are "dead spots" where no data is transferred. This can happen
even if two transactions are available, back to back, from the
processor; there might still be gaps. It is caused by the bus protocols
used and the internal operation of DRAM (how cells are read and
refreshed). The bus transactions might look like this:
_____ _____ _____ _____
x-----x-----x-----x-----x_____x_____x_____x_____x-----x
This complicates matters when you're trying to "do the math" to match
the FSB and memory performance. For example, on the surface of it,
if you had dual channel memory at PC2-4800 (which is the DDR2-600 rate),
the bandwidth might look like 2*4800 = 9600MB/sec. The processor's FSB
is 8 bytes * 800 = 6400MB/sec. You might think the memory is faster than
the FSB, but the memory might be only 60% efficient (about 5760MB/sec),
in which case they're actually pretty close to one another in bandwidth.
If the memory was single channel (only one stick installed, other
channel empty), then you'd get some percentage of 4800MB/sec instead.
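The arithmetic above can be written out as a short sketch. The 60% efficiency is only the illustrative figure used in this post, not a measured value, and the helper name is made up.

```python
def peak_mb_per_s(mega_transfers, bytes_per_transfer=8, channels=1):
    """Theoretical peak bandwidth in MB/sec for a 64-bit (8 byte) wide bus."""
    return mega_transfers * bytes_per_transfer * channels

EFFICIENCY = 0.60  # assumed, illustrative only

fsb_peak = peak_mb_per_s(800)               # FSB800
dual_peak = peak_mb_per_s(600, channels=2)  # two channels of DDR2-600
single_peak = peak_mb_per_s(600)            # one channel of DDR2-600

print("FSB peak:            ", fsb_peak, "MB/sec")
print("Dual channel peak:   ", dual_peak, "MB/sec")
print("Dual channel at 60%: ", round(dual_peak * EFFICIENCY), "MB/sec")
print("Single channel at 60%:", round(single_peak * EFFICIENCY), "MB/sec")
```

With these assumptions, dual channel lands a little below the FSB figure, while single channel falls well short of it.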
The interfaces are flow controlled, so nothing gets lost. If one
subsystem is faster than another, the logic makes sure all the data
is transferred correctly.
Inside the processor, many of the requests made by the core are satisfied
by the L1 or L2 cache. Only things that are not in the cache result
in a need to go to the FSB and ask for them. That is why, for
some core-limited (not memory demanding) applications, the performance
of the memory matters little. The L1 and L2 answer all the requests,
and do it at full speed, with low latency.
In your case, your biggest memory improvement comes from ensuring the
memory is in dual channel mode. That allows a doubling of memory
bandwidth, because two channels can be working at the same time.
By comparison, changing a setting from DDR2-533 to DDR2-600 would be
almost unmeasurable.
For best performance:
1) Make the CPU core frequency go as fast as possible. This gives
the best improvement. On a 3GHz processor, you could overclock it
slightly, and that would be your biggest gain.
2a) Make the memory go as fast as possible. When the memory goes faster,
there are still those gaps, but on average you squeeze in more bytes
transferred per second. The improvement stops once the memory is
going so fast that the FSB limits any additional transfer rate.
A rule of thumb I made up is the 1/3rd rule. If I can improve the
memory bandwidth by 10%, the average application level performance
improves by 3%. That means your program finishes 3% sooner. This
rule is only valid over a limited range of memory bandwidth improvements,
so don't get carried away with it. And Photoshop behaves differently
than Microsoft Word - when I say average, that means some
programs do better or worse than the calculated 3% number.
2b) Operating in dual channel mode is another way to make the memory
subsystem go faster.
By far, the CPU speed improvement is the best. If you get a 10% clock
improvement, you might see a 9% application improvement, whereas a
10% memory improvement might yield a 3% application improvement.
So the CPU is the thing to concentrate on first, before the memory.
You can spend a lot more money on memory and not get back much
of an improvement.
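Here is the same rule of thumb as a small sketch. The 0.9 and 0.3 factors are just the rough numbers from this post, and adding the two contributions together is my own simplification. Treat the result as a ballpark, valid only for small changes.

```python
CPU_FACTOR = 0.9  # ~9% application gain per 10% CPU clock gain (rough)
MEM_FACTOR = 0.3  # ~3% application gain per 10% memory bandwidth gain (rough)

def estimated_app_gain(cpu_pct=0.0, mem_pct=0.0):
    """Ballpark application-level gain in percent.

    Assumes the two effects simply add, which is only plausible
    for small percentage changes.
    """
    return cpu_pct * CPU_FACTOR + mem_pct * MEM_FACTOR

print(f"10% CPU overclock:  ~{estimated_app_gain(cpu_pct=10):.1f}%")
print(f"10% faster memory:  ~{estimated_app_gain(mem_pct=10):.1f}%")
print(f"Both together:      ~{estimated_app_gain(cpu_pct=10, mem_pct=10):.1f}%")
```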
Using benchmarks like SuperPI, you can evaluate all of these possibilities
for yourself. Speed up the processor clock and run a benchmark. Speed up
the memory (e.g. try a run at DDR2-533 and a run at DDR2-600). Try
with one stick of RAM, then with two matched sticks in dual channel
configuration, and run the benchmark each time. Pretty soon, you'll be
able to verify some of the above statements for yourself.
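To compare two configurations, it helps to run the benchmark a few times in each and look at how much sooner the typical run finishes. A minimal sketch follows; the timings in it are made-up placeholders, not real measurements, so substitute your own.

```python
from statistics import median

def percent_sooner(baseline_runs, tuned_runs):
    """Percent reduction in run time, using the median of each set of runs."""
    base = median(baseline_runs)
    tuned = median(tuned_runs)
    return (base - tuned) / base * 100.0

# Hypothetical run times in seconds (placeholders only).
baseline = [50.2, 50.0, 50.1]
tuned = [48.6, 48.5, 48.7]

print(f"Finishes about {percent_sooner(baseline, tuned):.1f}% sooner")
```

Using the median rather than a single run keeps one odd result (a background task kicking in, for instance) from skewing the comparison.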
SuperPI reports its execution time in seconds. My 3GHz P4 processor
gets about 50 seconds for 1 million digits of PI.
Try it for yourself. This program uses only one CPU core, so
running it on a quad core processor sees no additional benefit
from the other three cores. The world record for 1 million
digits is around 9 seconds, as far as I can remember.
http://www.xtremesystems.com/pi/super_pi_mod-1.5.zip
Paul