P2B poor memory performance

Erwin Dokter

I have three systems, all based on the 440BX chipset: one Dell GX1 and
two P2B rev 1.02 based systems. The Dell and one P2B have a Celeron 1.4
GHz on a Powerleap; the other P2B has a 1.2 GHz on a Powerleap. No
overclocking.

I just noticed something that I'm not quite happy about: the P2Bs
have very poor memory performance. All systems run at 100 MHz FSB and
have two sticks of Micron 128MB CL2. When running the memory read
benchmark in Aida32, the Dell clocks in at 750 MB/s, while the P2Bs
barely reach 520 MB/s. It is as if the memory were running at 66 MHz,
but Aida32 says everything is running at 100 MHz.

Theoretically, all systems should have a maximum memory bandwidth of
800 MB/s. Is there some setting in the BIOS (1014 beta 3) that I
overlooked? I set everything to defaults and tweaked only a few minor settings.
All machines run Windows 2000 Professional.
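The 800 MB/s figure follows from the 440BX's 64-bit SDRAM data bus doing one transfer per clock at 100 MHz. A minimal sketch of that arithmetic, assuming MB means 10^6 bytes and ignoring refresh and command overhead (which real benchmarks never escape); it also shows where a 66 MHz bus would top out:

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz):
    """Theoretical peak in MB/s (10**6 bytes/s), assuming one transfer per clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# The 440BX memory data bus is 64 bits wide (not counting ECC bits).
for clock_mhz in (100.0, 66.67):
    print(f"{clock_mhz:6.2f} MHz -> {peak_bandwidth_mb_s(64, clock_mhz):5.1f} MB/s")
```

At 66.67 MHz the ceiling is about 533 MB/s, which is presumably why 520 MB/s looks like a 66 MHz bus.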

-- Erwin Dokter
 
Erwin said:
I have three systems, all based on the 440BX chipset: one Dell GX1 and
two P2B rev 1.02 based systems. The Dell and one P2B have a Celeron 1.4
GHz on a Powerleap; the other P2B has a 1.2 GHz on a Powerleap. No
overclocking.

I just noticed something that I'm not quite happy about: the P2Bs
have very poor memory performance. All systems run at 100 MHz FSB and
have two sticks of Micron 128MB CL2. When running the memory read
benchmark in Aida32, the Dell clocks in at 750 MB/s, while the P2Bs
barely reach 520 MB/s.

Which, BTW, is about what AIDA32 also says on my P2B-D at 100 MHz here.
(2-2-2-8 timing, MA wait state "Fast" - and that with two of the four
sticks being 3-2-2 Toshibas and the low stock VIO. That certainly speaks
for the quality of both the board and the modules.) This low buffered
bandwidth readout seems to be common to *all* Asus BX boards, including
the P2B-S and P3B-F. (The benchmarks w/o buffering in Sandra and the
like are perfectly fine.) I didn't know that this wasn't true for other
manufacturers' BX boards. This seems to be an oddity, given that all other
kinds of benchmarks are perfectly in line with what one would expect;
the P2L97-DS, which gave a slightly higher memory bandwidth in AIDA32,
felt noticeably slower. I'd suggest you try this rather nice tool by the
author of the popular "System Speed Test" for DOS:
<http://user.rol.ru/~dxover/cburst/>.
(Peak mem bandwidth: 670-something MB/s, mem latency 57 cycles.) It
would be interesting to know how the Dell system fares. Apparently those
guys love tweaking their stuff, given the BX on my notebook is also
pretty much set for maximum performance.

Stephan
 
OK, and for some tweaking:
http://mozcom.com/~ronnieg/articles/tweakbx.html

* Host Bus Fast Data Ready enabled gave a very slight boost (also
available as a BIOS option)
* CL, RCD, RP were already set to 2 each
* I won't touch Leadoff Command Timing; IIRC changing this froze my
systems 100% of the time
* DRAM Leadoff Timing was already at 00 (probably the MA Wait State
BIOS option took care of this)
* DRAM Idle Timer to 0111 gave a noticeable boost (IIRC that's also a
BIOS option)
* DRAM Refresh Rate to 62.4 µs seemed fastest, faster than 124.8 µs
actually

Result:

Sandra MAX3! memory b/w benchmark with buffering disabled: 347 MB/s ALU,
377 MB/s FPU. System clock is 501.08 MHz divided by 5.

The system seems stable; at least burnbx.exe (from CPUBurn) didn't complain.
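15.6 µs is the usual SDRAM refresh interval (4096 rows refreshed within 64 ms); the other choices are 4, 8 and 16 times longer. A small sketch of how long one full refresh sweep takes at each setting, assuming 4K-refresh SDRAM rated for 64 ms retention (typical for parts of this era, but check the chip datasheet). Longer intervals cut refresh overhead but push the sweep past that rating, which is why a stability test like burnbx is worth running:

```python
ROWS_PER_REFRESH_CYCLE = 4096  # assumption: 4K-refresh SDRAM (64 ms / 15.6 us)


def full_sweep_ms(interval_us, rows=ROWS_PER_REFRESH_CYCLE):
    """Time in ms to refresh every row once at the given per-row interval."""
    return interval_us * rows / 1000.0


for interval_us in (15.6, 62.4, 124.8, 249.6):
    print(f"{interval_us:6.1f} us -> every row refreshed once per "
          f"{full_sweep_ms(interval_us):7.1f} ms")
```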

Stephan
 
Stephan said:
* DRAM Idle Timer to 0111 gave a noticeable boost (IIRC that's also a
BIOS option)

I retract that. With longer refresh periods, results are actually best
with the default setting of 0011 a.k.a. 8 (and the difference this
option makes is very small). Seems Sandra needs a few runs to "warm up".
* DRAM Refresh Rate to 62.4 µs seemed fastest, faster than 124.8 µs
actually

Anyway, 249.6 µs turned out to be fastest: 352 MB/s ALU, 382 MB/s FPU.
The difference from 62.4 µs (349/379 MB/s) is very small, though, almost
within the random noise. Hmm, now that is funny: even if I set this back to
the default of 15.6 µs, the benchmark result is still very good
(349/377). I'm under the impression that Windows messes with the chipset
registers somehow, since after boot the result is much lower (a mere
320/350). I think I noticed something similar with the P2L97-DS, which
also was slower after a fresh boot than after waking up from suspend to
disk.

Stephan
 
Some interesting results that baffle me somewhat...

I ran the test on both machines. They ended up showing *exactly* the same
charts and values; only the memory speed line shows different results.
Here's the breakdown from the reports:

Dell Optiplex GX1:
Memory 32-bit Bandwidth:
Read: 575,28 MB/s ( 7,78 Cycles), Write: 170,34 MB/s ( 26,29 Cycles)
Memory 64-bit Bandwidth:
Read: 603,35 MB/s ( 14,84 Cycles), Write: 170,29 MB/s ( 52,59 Cycles)
Memory 128-bit Bandwidth:
Read: 603,30 MB/s ( 29,69 Cycles), Write: 170,37 MB/s (105,14 Cycles)
Memory Peak Bandwidth: 604,66 MB/s, Latency: 201,27 Cycles

Asus P2B:
Memory 32-bit Bandwidth:
Read: 397,85 MB/s ( 11,36 Cycles), Write: 175,29 MB/s ( 25,78 Cycles)
Memory 64-bit Bandwidth:
Read: 398,04 MB/s ( 22,71 Cycles), Write: 175,23 MB/s ( 51,59 Cycles)
Memory 128-bit Bandwidth:
Read: 398,72 MB/s ( 45,34 Cycles), Write: 175,29 MB/s (103,15 Cycles)
Memory Peak Bandwidth: 398,21 MB/s, Latency: 160,58 Cycles

So... What's Dell's secret here? And should it matter as the graphs
look exactly alike?

-- Erwin Dokter
 
Erwin said:
I have three systems, all based on the 440BX chipset: one Dell GX1 and
two P2B rev 1.02 based systems. The Dell and one P2B have a Celeron 1.4
GHz on a Powerleap; the other P2B has a 1.2 GHz on a Powerleap. No
overclocking.

I just noticed something that I'm not quite happy about: the P2Bs
have very poor memory performance. All systems run at 100 MHz FSB and
have two sticks of Micron 128MB CL2. When running the memory read
benchmark in Aida32, the Dell clocks in at 750 MB/s, while the P2Bs
barely reach 520 MB/s. It is as if the memory were running at 66 MHz,
but Aida32 says everything is running at 100 MHz.

Theoretically, all systems should have a maximum memory bandwidth of
800 MB/s. Is there some setting in the BIOS (1014 beta 3) that I
overlooked? I set everything to defaults and tweaked only a few minor settings.
All machines run Windows 2000 Professional.

-- Erwin Dokter

I don't know if it would make any sense or difference, but you could have
a look at the 'L2 cache ECC' setting for the CPUs, or the lack of it.
The theory is that it might be disabled by default in the Dell and enabled in
the P2Bs, but as I said, I don't know if and to what extent it would matter.

Regards
Nikos
 
Erwin said:
Some interesting results that baffle me somewhat...

I ran the test on both machines. They ended up showing *exactly* the same
charts and values; only the memory speed line shows different results.
Here's the breakdown from the reports:

Thanks for pointing me to the Report button; I'd never used that before ;).

The results on my moderately tweaked P2B-D look like this:

Processor: Intel Pentium®III 501,16 MHz Core: [0681] Coppermine 0.18 µm

Memory 32-bit Bandwidth:
Read: 439,44 MB/s ( 4,35 Cycles), Write: 233,44 MB/s ( 8,18 Cycles)
Memory 64-bit Bandwidth:
Read: 746,99 MB/s ( 5,11 Cycles), Write: 233,80 MB/s ( 16,35 Cycles)
Memory 128-bit Bandwidth:
Read: 747,38 MB/s ( 10,23 Cycles), Write: 233,48 MB/s ( 32,75 Cycles)
Memory Peak Bandwidth: 677,25 MB/s
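The Cycles figures appear to be CPU clocks per access, derived from the bandwidth and the 501.16 MHz core clock. A minimal sketch that checks this against the P2B-D report above, assuming "MB" means 2^20 bytes and that the 32/64/128-bit tests move 4/8/16 bytes per access; this is an inference from the numbers, not the tool's documented formula:

```python
def cycles_per_access(cpu_mhz, bandwidth_mb_s, bytes_per_access):
    """CPU clocks per access implied by a bandwidth figure.

    Assumption: the report's "MB" means 2**20 bytes.
    """
    seconds_per_access = bytes_per_access / (bandwidth_mb_s * 2**20)
    return cpu_mhz * 1e6 * seconds_per_access


# (label, MB/s, bytes per access, Cycles as reported) from the P2B-D report
ROWS = [
    ("32-bit read", 439.44, 4, 4.35),
    ("64-bit read", 746.99, 8, 5.11),
    ("128-bit read", 747.38, 16, 10.23),
    ("32-bit write", 233.44, 4, 8.18),
    ("64-bit write", 233.80, 8, 16.35),
    ("128-bit write", 233.48, 16, 32.75),
]

for label, mb_s, width, reported in ROWS:
    calc = cycles_per_access(501.16, mb_s, width)
    print(f"{label:14s} calculated {calc:6.2f}  reported {reported:6.2f}")
```

All six rows come out within about 0.01 cycles of the reported values.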

The 128-bit read bandwidth dips down to 500-something sometimes.

Results are virtually unchanged after reverting to 15.6 µs refresh (i.e.
"BIOS-level" tweaks only).

From the bandwidth table, 1024K row:

MMX Read 473,19
SSE Read 473,32

MMX Write 165,63
SSE Write 165,08

So... What's Dell's secret here?

No idea. I'd do register dumps with WPCREDIT on both systems and compare
them, paying particular attention to memory-related and possibly "reserved"
registers. (Doing a "live" comparison with the program itself might work
out better, though. Well, after looking at the register file, I'm
actually pretty sure of that. ;)
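Comparing two register dumps by eye is tedious. A minimal sketch, assuming each dump has been saved as text lines of the form "offset: byte byte ..." in hex (a hypothetical format; WPCREDIT's own export may look different, so adapt the parser), that lists every offset whose value differs. The sample values are made-up placeholders, not real 440BX register contents:

```python
def parse_dump(text):
    """Parse hex-dump lines such as '50: 0C 02 00 00' into {offset: byte}.

    The line format is an assumption for illustration only.
    """
    regs = {}
    for line in text.strip().splitlines():
        addr, sep, data = line.partition(":")
        if not sep:
            continue  # skip lines without an 'offset:' prefix
        base = int(addr, 16)
        for i, byte in enumerate(data.split()):
            regs[base + i] = int(byte, 16)
    return regs


def diff_dumps(text_a, text_b):
    """Return (offset, value_a, value_b) for every register that differs."""
    a, b = parse_dump(text_a), parse_dump(text_b)
    return [
        (off, a.get(off), b.get(off))
        for off in sorted(a.keys() | b.keys())
        if a.get(off) != b.get(off)
    ]


def fmt(value):
    """Two-digit hex, or '--' if the register is missing from one dump."""
    return "--" if value is None else f"{value:02X}"


# Made-up placeholder values, NOT real register contents.
dump_dell = "50: 0C 02 00 00\n60: 01 02 03 04"
dump_p2b = "50: 0C 03 00 00\n60: 01 02 03 04"

for off, va, vb in diff_dumps(dump_dell, dump_p2b):
    print(f"offset {off:02X}h: {fmt(va)} vs {fmt(vb)}")
```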
And should it matter as the graphs
look exactly alike?

Good question. No idea, but I'd certainly like to know where that strange
difference comes from!

Stephan
 
Nikolaos said:
I don't know if it would make any sense or difference, but you could have
a look at the 'L2 cache ECC' setting for the CPUs, or the lack of it.
The theory is that it might be disabled by default in the Dell and enabled in
the P2Bs, but as I said, I don't know if and to what extent it would matter.

I don't think it's actually possible to disable L2 ECC on a CuMine PIII
or higher... this only applied to PIIs with off-die L2 cache.

Stephan
 
Stephan said:
I don't think it's actually possible to disable L2 ECC on a CuMine PIII
or higher... this only applied to PIIs with off-die L2 cache.

Stephan

Maybe. But I have a dual Socket 370 board here (Iwill DVD266u-RN), and
unless I've been smoking interesting stuff, there's certainly an L2 ECC
setting in the BIOS. Plus, the board does not support PPGA CPUs, only
FC-PGA/FC-PGA2 ones, so only Coppermine or Tualatin cores were meant to
run on it in the first place.
So far I've assumed the setting was there for a reason. It'd be pretty
interesting (and shameful for those who put together the BIOS) if it
indeed isn't possible to disable L2 ECC with these cores.
Nevertheless, I'll try benchmarking with either setting and see if it
makes any difference.

Regards
Nikos
 
OK, I ran a buffered Sandra 2004 memory test with L2 ECC "disabled",
10 times in a row. Interestingly enough, the results were:

749/773
761/788
792/822
802/819
802/822
804/826
804/827
805/828
810/831
809/831

at which point I stopped.
This seems like a pretty big disparity between the 1st and 10th run. Is
this a known Sandra issue? And if so, what should one believe from the above
(if anything)?
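A common way to deal with such warm-up effects (general benchmarking practice, not something Sandra documents) is to discard the early runs and only average once consecutive results have settled. A minimal sketch applied to the ten results above, treating the two numbers of each pair as separate series; the 1% threshold is an arbitrary assumption:

```python
def first_stable_run(results, tolerance=0.01):
    """1-based run number from which every result differs from the previous
    one by less than `tolerance` (relative). The threshold is arbitrary."""
    start = len(results) - 1
    for i in range(len(results) - 1, 0, -1):
        change = abs(results[i] - results[i - 1]) / results[i - 1]
        if change >= tolerance:
            break
        start = i - 1
    return start + 1


first = [749, 761, 792, 802, 802, 804, 804, 805, 810, 809]
second = [773, 788, 822, 819, 822, 826, 827, 828, 831, 831]

for name, runs in (("first", first), ("second", second)):
    n = first_stable_run(runs)
    settled = runs[n - 1:]
    gain = (runs[-1] - runs[0]) / runs[0] * 100
    print(f"{name}: run 1 -> 10 gain {gain:.1f}%, stable from run {n}, "
          f"mean of settled runs {sum(settled) / len(settled):.1f}")
```

With this threshold the first figure settles from run 4 and the second from run 3; everything before that looks like warm-up.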

I will also run 10 consecutive tests in exactly the same manner with L2
ECC enabled and post back. For reference, though, a single test I squeezed
in after a reboot showed 750/772, so it's pretty much identical to the
corresponding run #1 of the non-ECC tests.

Regards
Nikos
 
Maybe. But I have a dual Socket 370 board here (Iwill DVD266u-RN),
and unless I've been smoking interesting stuff, there's certainly an
L2 ECC setting in the BIOS. Plus,

Yes... Level 2 (on-processor) caching can be turned OFF.

Many folks turn off L2 caching, as it can interfere with
overclocking stability. Windows 2000/XP also has a registry
setting for the amount of L2 to use, which defaults to 256 KB;
I'm not sure whether it auto-adjusts for 128 KB or just sets it to 0.
For processors with 512 KB of L2, one needs to set it manually.

Running without L2 caching can cut performance by
30-50% overall in Windows and other tasks.
 
NT said:
Yes... Level 2 (on-processor) caching can be turned OFF.

Of course it can be turned off. However, we are not talking about
disabling the L2 cache, but the L2 ECC function.
Many folks turn off L2 caching, as it can interfere with
overclocking stability. Windows 2000/XP also has a registry
setting for the amount of L2 to use, which defaults to 256 KB;
I'm not sure whether it auto-adjusts for 128 KB or just sets it to 0.
For processors with 512 KB of L2, one needs to set it manually.

Eh? What setting is this?
Running without L2 caching can cut performance by
30-50% overall in Windows and other tasks.

Regards
Nikos
 
Nikolaos Tampakis said:
Of course it can be turned off. However, we are not talking about disabling the L2 cache, but the L2 ECC function.

Error correction: to a degree, the same story (overclocking stability).
But cache ECC should be left on, and RAM ECC should be off
unless you're using custom buffered RAM and an appropriate motherboard.
Eh? What setting is this?

Keywords:
registry level 2 cache windows
http://www.ntcompatible.com/faq-33.html

There is some controversy over whether it has any effect on newer CPUs
or on units with on-die L2; I included it as a note.
 
The registry hack mentioned here does not work.
MS has documented this as a myth.
If the CPU has its L2 cache enabled, all of it is used regardless.
Don't waste your time on this.
- Tim
 
NT said:
Error correction: to a degree, the same story (overclocking stability).
But cache ECC should be left on, and RAM ECC should be off
unless you're using custom buffered RAM and an appropriate motherboard.

Keywords:
registry level 2 cache windows
http://www.ntcompatible.com/faq-33.html

There is some controversy over whether it has any effect on newer CPUs
or on units with on-die L2; I included it as a note.

There's no controversy, it doesn't have any effect. Hence the original 'eh'.

Regards
Nikos
 
Here is the complete table of 10 consecutive runs per L2 ECC setting
(enabled/disabled) of the Sandra 2004 memory bandwidth benchmark, under
identical conditions (reboot, no network connections, leave Windows XP alone
for a couple of minutes, then begin the consecutive tests).

Run  L2 ECC enabled  L2 ECC disabled
  1  746/768         749/773
  2  759/786         761/788
  3  792/816         792/822
  4  792/820         802/819
  5  793/822         802/822
  6  793/822         804/826
  7  794/822         804/827
  8  798/822         805/828
  9  802/823         810/831
 10  801/824         809/831
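To put a number on the comparison, a quick summary of the table: per-setting means and the run-by-run difference for each of the two figures. A minimal sketch, assuming the figures are MB/s; it makes no claim about statistical significance:

```python
from statistics import mean

# (first, second) figure of each run, copied from the table
ECC_ON = [(746, 768), (759, 786), (792, 816), (792, 820), (793, 822),
          (793, 822), (794, 822), (798, 822), (802, 823), (801, 824)]
ECC_OFF = [(749, 773), (761, 788), (792, 822), (802, 819), (802, 822),
           (804, 826), (804, 827), (805, 828), (810, 831), (809, 831)]


def summarize(column):
    """Means per setting, mean run-by-run difference (off - on), and
    how many runs were faster with ECC off."""
    on = [run[column] for run in ECC_ON]
    off = [run[column] for run in ECC_OFF]
    diffs = [b - a for a, b in zip(on, off)]
    return mean(on), mean(off), mean(diffs), sum(d > 0 for d in diffs)


for column in (0, 1):
    m_on, m_off, m_diff, faster = summarize(column)
    print(f"figure {column + 1}: on {m_on:.1f}, off {m_off:.1f}, "
          f"diff {m_diff:+.1f} MB/s, off faster in {faster}/10 runs")
```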

So it seems as though the L2 ECC setting does have an effect: with it
disabled, the results were equal or slightly higher in all but one of the
20 figures. Of course this isn't rock-solid evidence, just a sensible indication.

Regards
Nikos
 
[SecondLevelDataCache registry entry]
There's no controversy, it doesn't have any effect. Hence the original 'eh'.

Indeed:
http://support.microsoft.com:80/support/kb/articles/Q183/0/63.asp

| This is not related to the hardware; it is only useful for computers
| with direct-mapped L2 caches. Pentium II and later processors do not
| have direct-mapped L2 caches. SecondLevelDataCache can increase
| performance by approximately 2 percent in certain cases for older
| computers with ample memory (more than 64 MB) by scattering physical
| pages better in the address space so there are not so many L2 cache
| collisions.

I.e. I might gain a tiny bit of performance by adjusting this setting on
my NT4 install on an old HX board with its onboard cache, but with
anything beyond (Super) Socket 7 it's worthless.
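The "cache collisions" in the KB text can be illustrated with page colors, the idea behind the general page-coloring technique. This is a textbook sketch, not how Windows actually uses SecondLevelDataCache. In a direct-mapped cache, two physical pages whose addresses differ by a multiple of the cache size map onto the same cache lines and keep evicting each other; an allocator that knows the cache size can hand out pages of different colors instead. Assumes 4 KB pages and a hypothetical 256 KB cache:

```python
PAGE_SIZE = 4096  # bytes; standard x86 page size


def page_color(phys_addr, cache_bytes, ways=1):
    """Return the 'color' of the page containing phys_addr.

    Pages with the same color compete for the same cache sets. With
    ways == 1 (direct-mapped), two same-color pages cannot both stay cached.
    """
    num_colors = cache_bytes // (ways * PAGE_SIZE)
    return (phys_addr // PAGE_SIZE) % num_colors


CACHE = 256 * 1024  # hypothetical 256 KB L2

a, b, c = 0x00000, 0x40000, 0x41000  # b is exactly 256 KB above a
print(page_color(a, CACHE), page_color(b, CACHE), page_color(c, CACHE))
# prints: 0 0 1  (a and b collide in a direct-mapped cache; c does not)
```

With ways=4 the same 256 KB cache has only 16 colors, but up to four same-colored pages can be cached at once, which is why the trick matters much less for set-associative caches like those of the Pentium II and later.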

Stephan
 
Nikolaos Tampakis said:
Of course it can be turned off. However we are not talking about
disabling the L2 cache, but the L2 ECC function.


Eh? What setting is this?

This is a common 'semi-myth'. There is a setting that adjusts the 'assumed'
cache size in some parts of the memory management algorithm, which can
change the default block ordering used by the WNT/2K/XP memory management
system. However, the 'effect' is almost below detectability (like 0.01%, if
you are lucky!). It does not affect the usage of the cache itself. MS
advises not to change it.

Best Wishes
 