still got performance problems with P4P800-E and Prescott

  • Thread starter: Johnny

Johnny

This is a repost, as the other one didn't appear, so if it pops up twice, sorry.

A while ago I posted about the dismal performance I'm getting with this board
and a Prescott 3.0GHz CPU with 2 x 512MB Crucial 2-2-2-5 DDR400 memory. I've
noticed the PassMark CPU tests give significantly different results, but I'm
not sure whether that is unusual. Is it possible the CPU or motherboard is
faulty even though the system works, albeit relatively slowly? This thing has
me totally flummoxed. I've swapped in a power supply from another machine
with no change (I thought it might be a power issue). I haven't got access to
another 800FSB CPU to compare with, and I'm not sure I'll get any sense out
of tech support, since the machine is actually working, which is frustrating
in the extreme. If I select turbo mode the board dies: it blacks out
completely, and I need a hard power-off to get the BIOS back, with a POST
message saying that overclocking failed. I'm really getting pissed off with
this now. Is it likely the CPU or the mainboard is faulty, or just a
combination of the two? Who knows?

Are you talking about Passmark giving different results from
repeating the Passmark test ? Or are you talking about comparing
Passmark from your current machine, to a previous slower machine,
or comparing to the Futuremark database ?

With respect to your turbo mode setting, turbo requires the use
of CAS2 memory, which you've got. So, it should have worked.
A "black out" is what happens with CAS2.5 or CAS3 memory, when
turbo is selected.

There are several ways to end up with a slow processor:

1) Internal CPU cache has ECC protection. If the internal cache
has bad bits in it, a single bit in error can be corrected
by the ECC checker, but at the price of extra cycles to
attempt to correct the data.

I don't know whether Memtest86 can detect this kind of
fault or not.

2) Intel processors have thermal throttle. In the case of the
Prescott, the processor reduces the internal instruction rate
when the die temperature reaches 70C. If the CPU die is not
making good contact with the heat spreader on the top of the
chip, it might be possible for the die to be hot, yet the
heatsink won't be that hot. There is a thermal paste inside
the processor, between the top of the die and the heat
spreader, and if that paste was missing, your performance
could drop.

3) ACPI has an option to reduce the processor clock rate. But
I doubt that is doing anything in this case. ACPI might
use this option, when the processor is idle, to reduce the
processor operating temperature. During benchmarks, the OS
would turn this off again.

4) Many Northbridge chips have throttle capabilities for the
DIMMs. See section 5.5 (pg.140) of this document, for features
of the 875 Northbridge regarding protecting the DIMMs against
overheat. I doubt Asus bothered with thermal sensors next to
the DIMMs, but there is still the software method:

http://developer.intel.com/design/chipsets/datashts/25252502.pdf

"The number of hexwords transferred over the DRAM interface
are tracked per row. The tracking mechanism takes into account
that the DRAM devices consume different levels of power based
on cycle type (i.e., page hit/miss/empty). If the programmed
threshold is exceeded during a monitoring window, the activity
on the DRAM interface is reduced. This helps in lowering the
power and temperature."

5) You could be experiencing an "interrupt storm". There have been
motherboards in the past, where a particular PCI chip on the
motherboard keeps asserting its IRQ, causing the interrupt
handler to be invoked needlessly, and sucking performance from
the machine. Looking at performance counters might identify such
a problem. (There is one report in Google against the Promise
20378, so see if you can run with that chip disabled, then
run Passmark again.)

6) The PCI Latency Timer setting could influence performance.
A setting lower than 16 could make I/O slow, but the BIOS
on this machine doesn't allow settings that low. Lower settings
promote "fairness" between peripherals, so a sound card can
still get data while a disk drive is doing burst transfers.
A high setting might allow a better disk benchmark, at the
expense of general usability of the computer.
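Items 2 and 3 above describe throttling that cuts throughput without changing the advertised clock. Assuming the throttle works by gating the clock on and off for a fraction of the time (a duty cycle), the arithmetic is simple, and it explains how a tool can report 3000MHz while a benchmark behaves like a much slower chip. A minimal Python sketch; the 12.5% steps are an illustrative assumption, not the documented behaviour of this CPU or board.

```python
def effective_mhz(nominal_mhz: float, duty_cycle: float) -> float:
    """Average clock rate when the core clock only runs part of the time.

    duty_cycle is the fraction of time the clock is running, in (0, 1].
    A tool that reports the nominal clock would still show nominal_mhz.
    """
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    return nominal_mhz * duty_cycle


if __name__ == "__main__":
    # Assumed duty-cycle steps of 1/8 (12.5%); purely illustrative.
    for eighths in range(8, 0, -1):
        duty = eighths / 8
        print(f"duty {duty:6.1%} -> about {effective_mhz(3000, duty):4.0f} MHz effective")
```

A benchmark that scores roughly half of what it should while the reported clock is unchanged would be consistent with a 50% duty cycle; a steady clock-speed reading alone does not rule throttling out.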

I haven't had too much luck using performance counters in Windows.
I've read that there are all sorts of fancy metrics in Windows, but
maybe you need a plugin/snapin to see them ? I still don't know
what the missing ingredient might be.

It may be easier to see some of these performance counters in
Linux.

The toughest part of your problem will be finding baseline
numbers for exactly what your combination of hardware should be
doing. Does Futuremark collect enough data to make sure
the BIOS settings that affect memory performance are the
same when you compare against other hardware? If Passmark is
not collecting info on whether PAT is enabled, for example,
that might make a difference to benchmarks.

I tried researching in two directions. I looked for benchmarks
that are a bit simpler than Passmark, and for the CPU there
is the HINT benchmark, but all knowledge of it is gone from
the .gov site it was on, and even web.archive.org has no
copy of the site. I also tried to find info on performance
counters, and didn't have much luck there either. Intel
has a $$$ program called VTune, a profiler used by
software developers, but it isn't free.

I was hoping that by using free tools we could compare machines,
and see whether you really are slower than other comparable machines,
and which part of the machine is slower. Some things you can
try:

1) memtest.org has version 1.4 of memtest86 available. It is
presumably the same as the other versions, when it comes to
measuring bandwidth. I get L1=8KB=22940MB/s, L2=512KB=19571MB/s,
and main memory is 2955MB/s. Memtest claims PAT is enabled
on my machine. I have a 2.8C Northwood, 2x512MB 2-2-2-6 RAM,
running at stock speed.
2) ftp://ftp.heise.de/pub/ct/ctsi/ctiaw.zip
This runs from a DOS window, and reports a few settings.
It is a way to verify that PAT is enabled. Mine says "fully
enabled".

http://abxzone.com/forums/showthread.php?t=49613&highlight=ctiaw

It also reports two values at the top of the screen, the
"sleep" speed and the "load" speed. On my 2.8C, the values
reported are both close to 2800MHz. It seems other processors
are using different frequencies for this, but I don't know
why. It could be the ACPI throttle feature, not sure.
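For a rough, free comparison in the same spirit as the memtest86 bandwidth figures in item 1, a copy-bandwidth probe can be sketched in a few lines of Python. This is an assumption-laden sketch: it times a large in-memory buffer copy, so its numbers include interpreter and OS overhead and are not comparable to memtest86's figures. Only compare results from this same script run on different machines.

```python
import time


def copy_bandwidth_mb_s(size_mb: int = 64, repeats: int = 5) -> float:
    """Best-of-N rate, in MB/s, at which a size_mb buffer can be copied.

    Each copy reads size_mb and writes size_mb, so the raw memory traffic
    is roughly twice the figure returned.
    """
    src = bytearray(b"\x5a") * (size_mb * 1024 * 1024)  # non-zero fill
    dst = bytearray(len(src))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment: one bulk copy
        best = min(best, time.perf_counter() - start)
    return size_mb / max(best, 1e-9)  # guard against a zero timing


if __name__ == "__main__":
    print(f"~{copy_bandwidth_mb_s():.0f} MB/s copied (best of 5)")
```

Taking the best of several runs, rather than the average, filters out runs disturbed by other activity on the machine.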

As for performance counters, booting a copy of Knoppix or some
other Linux distro, might give access to more info than you can
get easily from Windows. If I do "vmstat 5" in a console
window, it says I get exactly 1000 interrupts per second.
(The number will be related to clock tick interrupts, as in
this scenario the system was idle, except for vmstat running.)
If I had a defective 20378, that number would undoubtedly climb.
I don't know how much work it is to get Windows to display
the same stat, whether it is total interrupts, or interrupts
per peripheral device.
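vmstat's "in" column is a system-wide total. To see which device is responsible, a sketch like the following samples /proc/interrupts twice and computes a per-source rate; a misbehaving controller such as a defective 20378 would show up as one line with a rate far above the rest. The parser assumes the usual layout (a header row of CPU columns, then one row per interrupt source) and may need adjusting for a particular kernel.

```python
import os
import time


def parse_interrupts(text: str) -> dict:
    """Map each /proc/interrupts row label to its count summed over all CPUs."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for token in rest.split()[:ncpu]:
            if not token.isdigit():  # stop at controller/device names
                break
            counts.append(int(token))
        if counts:
            totals[label.strip()] = sum(counts)
    return totals


def interrupt_rates(before: dict, after: dict, seconds: float) -> dict:
    """Interrupts per second for each source between two snapshots."""
    return {src: (after[src] - before.get(src, 0)) / seconds for src in after}


def sample(seconds: float = 5.0, path: str = "/proc/interrupts") -> dict:
    """Take two snapshots `seconds` apart and return per-source rates."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return interrupt_rates(first, second, seconds)


if __name__ == "__main__" and os.path.exists("/proc/interrupts"):
    rates = sample()
    busiest = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:10]
    for src, rate in busiest:
        print(f"{src:>6}: {rate:10.1f} /s")
```

On an idle machine the timer line should dominate; any other single source running at thousands per second is worth investigating.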

HTH,
Paul
 
In XP, Perfmon is in Administrative tools in the control panel. By default
when you open it, it displays the following 3 metrics (aka counters):

% Processor Time
Avg.Disk Queue Length
Pages/sec

The first is the traditional CPU usage graph. Avg. Disk Queue Length
indicates how busy the disk subsystem is; it is not much use to people with a
single IDE or SATA drive that does not support TCQ or NCQ, who are not running
multithreaded server-style workloads (IDE systems will rarely build up much of
a queue, as they have no supported queueing mechanism). The third metric,
Pages/sec, counts pages that had to be read from or written to disk because
they were not in physical memory. It indicates memory overloading / virtual
memory use. On a system with adequate memory it should be zero, or close to
it, most of the time.

If you right-click on any of the metrics shown at the bottom of Perfmon, you
can select Properties, where you can change the scale of the displayed
metrics, the colour, the line style and many other things. If you add a metric
that is immediately off the scale (e.g. Disk Read Bytes/sec), you can
adjust the scale to fit the screen and/or adjust the extent of
the Y axis (e.g. make it 0-200 instead of the default 0-100).

The Yellow light bulb on the toolbar is handy - click and it will highlight
the graph line for the counter (metric) you have selected at the bottom.

To add a counter, click the + sign. There is a lot to learn here. A real
lot. To add "interrupts" click +, in the Performance Object drop list select
"Processor", then in the Select counters list, scroll to the bottom and
select Interrupts / sec and click Add.

Note that there is an Explain button which shows a brief technical
explanation of the metric. If you need more help on these metrics, I
suggest going to http://support.microsoft.com/ or http://msdn.microsoft.com/
and doing a search, or try Google.

On my system, if I add Interrupts/sec, the average reading comes out at
around 1300 per second (the system is quite idle). Obviously I won't see
a meaningful graph unless it is scaled appropriately; I have it
scaled to 0.01, which gives a usable graph hovering around the "13"
mark.

Often the output displayed by a tool like Perfmon won't mean much to you
unless you have an idea of what 'normal' is, so I suggest having a tinker:
get to understand what some of the more usual counters are and what normal
looks like for them, and if you get stuck later you can always compare
running systems.

HTH
- Tim
 

Well, I'm using Win2K, and Perfmon had nothing in the window,
and there were no counters in the list at the bottom of the screen.

But I discovered that by double-clicking the empty area
(how intuitive...), I got the "Add Counters" dialog to pop up.
Gotta confess I'm not an icon guy, and that row of
icons at the top of the screen is just a blur for me. (I'm
a menu guy from way back, and hate the world of tiny icons.)
I never would have considered that big cross up there to be
a plus sign. Maybe if I click the light bulb in the row
of icons, I'll be rewarded with a clue? (I wonder if Tognazzini,
the Apple interface guru, has anything to say on
the "world of tiny icons" :-)

I found this before making my discovery above. It has the raw
info used to make the "add counters" items.

http://download.microsoft.com/downl...howperf/1.00.0.1/NT5/EN-US/showperf_setup.exe

Thanks for bootstrapping me! I never would have wasted
another moment on this interface if you hadn't got
me to click on stuff :-)

Paul
 
I just recently assembled a new Celeron D computer,
and I saw the same thing: turn on Turbo and it fails, even
when not overclocked. I haven't looked into it much
further yet, because the performance is pretty good
regardless. Eventually I overclocked it by some 40%
(a 2.4GHz Celeron D running at 3.4GHz) with the stock HSF
and a minor increase in Vcore. The CPU temperature hardly
ever goes above 50C, with QFan disabled; I don't need it,
since my Antec PSU is quiet enough. I also had to disable
Spread Spectrum modulation. I'm not sure why it defaults to
enabled, or why this setting even exists.

My memory is Corsair Value Select, 2x512. It's
running in dual mode, and my memory benchmarks
are just fine for P4 class. All in all it's performing
comparable to a 2.9GHz Northwood, which is normal.
I'm not sure if I even need Turbo.

How do you tell if Turbo is on or off when it's
set to Auto in BIOS?
 
Paul said:
Are you talking about Passmark giving different results from
repeating the Passmark test ? Or are you talking about comparing
Passmark from your current machine, to a previous slower machine,
or comparing to the Futuremark database ?
I'm comparing the exact same machine with repeated runs and differing
results within a short time period.
With respect to your turbo mode setting, turbo requires the use
of CAS2 memory, which you've got. So, it should have worked.
A "black out" is what happens with CAS2.5 or CAS3 memory, when
turbo is selected.
The memory is 2-2-2-5 Corsair (not crucial).
How many ways are there to make a slow processor:

1) Internal CPU cache has ECC protection. If the internal cache
has bad bits in it, a single bit in error can be corrected
by the ECC checker, but at the price of extra cycles to
attempt to correct the data.

I don't know whether Memtest86 can detect this kind of
fault or not.
I have a sinking feeling I'm going to need a like-for-like swap-out
comparison, which I can't do yet. I have a couple of PCs to build, so I might
well get the opportunity, although I obviously want to avoid replicating this
problem in those machines, so I may well choose a different manufacturer's
board.
2) Intel processors have thermal throttle. In the case of the
Prescott, the processor reduces the internal instruction rate
when the die temperature reaches 70C. If the CPU die is not
making good contact with the heat spreader on the top of the
chip, it might be possible for the die to be hot, yet the
heatsink won't be that hot. There is a thermal paste inside
the processor, between the top of the die and the heat
spreader, and if that paste was missing, your performance
could drop.
The CPU temperature hovers around 40C; tonight it is 37-38C. I'm always
careful to spread a thin but effective layer of thermal grease on the CPU
heatsink. I removed the heatsink to check the grease was where it needed to
be and doing its job: the coverage was complete (at least visually), and the
temperature monitors seem to support that. I should say I have built a good
number of PCs, so I'm not a total novice. I always check carefully that the
CPU is properly seated and gripped before fitting the heatsink etc. I don't
normally have any issues at all that aren't inherent faults in the hardware.
3) ACPI has an option to reduce the processor clock rate. But
I doubt that is doing anything in this case. ACPI might
use this option, when the processor is idle, to reduce the
processor operating temperature. During benchmarks, the OS
would turn this off again.
I don't think, at this point, temperature is an issue in this instance.
5) You could be experiencing an "interrupt storm". There have been
motherboards in the past, where a particular PCI chip on the
motherboard keeps asserting its IRQ, causing the interrupt
handler to be invoked needlessly, and sucking performance from
the machine. Looking at performance counters might identify such
a problem. (There is one report in Google against the Promise
20378, so see if you can run with that chip disabled, then
run Passmark again.)
All integrated devices are turned off for test purposes: Promise controller,
FireWire, audio, network. I left the serial and parallel ports enabled.

Paul, first of all thanks for the help and for taking the time to reply; it's
much appreciated. I'm using Win2K Pro.

One example I noticed tonight: the PassMark CPU suite of tests all repeat
within fractions of a percent (I don't know how this software rates in the
scheme of things, but it was readily available so I used it). The exception
is the CPU integer math test, which scored 170 MOPS, then a few seconds later
246.8, then a few seconds later 168, but it doesn't repeat that
higher-then-lower pattern. All three scores are well below my other test
machine, a 533FSB 2.8GHz Northwood on a P4PE with generic DDR333 RAM, which
scores 261 consistently. During this testing of the Prescott 3.0GHz there is
no change in the monitored temperatures on the P4P800-E: CPU = 37-38C,
board = 29C.
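To put a number on how far apart those integer math runs are, the three scores can be summarised and compared with the reference machine. A small Python sketch using only the standard library; the 5% cut-off for calling a benchmark "repeatable" is an arbitrary assumption, not a PassMark guideline.

```python
from statistics import mean, stdev


def run_spread(scores):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    m = mean(scores)
    s = stdev(scores)
    return m, s, s / m


prescott_int_math = [170, 246.8, 168]  # MOPS, the three runs quoted above
northwood_reference = 261              # MOPS, the P4PE machine, "consistently"

m, s, cv = run_spread(prescott_int_math)
print(f"mean {m:.1f}, stdev {s:.1f}, variation {cv:.0%}")
print(f"best run is {max(prescott_int_math) / northwood_reference:.0%} of the reference")
# Assumed rule of thumb: more than ~5% run-to-run variation is suspicious.
print("repeatable" if cv < 0.05 else "not repeatable")
```

The coefficient of variation (standard deviation divided by the mean) lets runs on machines with very different absolute scores be compared on the same footing.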

This is what the ftp://ftp.heise.de/pub/ct/ctsi/ctiaw.zip software reports:

**** INTEL/AMD/VIA memory config info, c't/Andreas Stiller V2.7 June 03 ****
Kernel Driver: WinNT DIRECTNT.SYS V01.09
Pentium 4,(0F34-00)ca 3274 MHz (sleep) 2999 MHz (load)
Bus Speed: max=200MHz, ratio=15 => 200 MHz
Hostdevice: (2570) Springdale i865 MCH, Vendor: (8086) Intel, Rev:0002h
----------------------------------------------------------------
Intel Springdale i865 MCH Rev:02: Bus:0, Device-Nr:0, Function:0
System Frequency : FSB533/133 MHz
Memory Frequency : DDR266/133 MHz (1:1)
IOQ Depth : 12 deep
Top of usable Memory : 1024.0 MByte
Extended SMRAM (Tseg) : disabled
Overflowdevice : disabled and unlocked, ID= 2576h, Rev: 2
Memory Delays Base Address : FECF0000 not prefetchable
CPU Parking : disabled
Memory : row0: 512 MByte/16 KB Pages
: row1: 512 MByte/16 KB Pages
DRAM-Channels : Dual Channel Linear, DDR
ECC & Refresh : Non-ECC, Refresh=7.8 µs
PAT-mode : (1) fully enabled
Active to Precharge Delay : 5 clocks .. 70 µs
Tcl - Trcd -Trp : 2.5-2-2 T (DRAM Clocks)

Memory Read Bandwidth : ca. 5715.6 MBytes/s, Cacheline size= 64

So it looks like the system and memory frequencies are set, or operating,
incorrectly at 533FSB/266MHz, although the board's BIOS is autodetecting and
displaying 800FSB/400MHz; surely this is the software not reading the system
settings correctly. The CAS 2.5 also looks suspicious, unless Corsair are a
bunch of bandits. Selecting 2.0 in the BIOS blacks out the board and gives
the overclocking failed message after a hard reset.

memtest 1.4 gives the following info

Pentium 4 (0.09) 2999Mhz
L1 cache 16K 20969MB/s
L2 cache 1024K 18396MB/s
Memory 1023M 2928MB/s
Chipset : i848/i865 ECC disabled FSB199Mhz PAT disabled
RAM 199Mhz (DDR 398) CAS 2.5-2-2-5 Dual channel (128bit)1

test #6 moving inversions, 32bit reports 3 counts of an error at 130.1 MB

Whether that is significant in relation to the speed issues I doubt,
although I have to say I wouldn't have expected to see any errors on new RAM.
I'm going to keep messing with it, but I've had enough for tonight. Will give
it a go tomorrow evening, bastard computers.

Thanks,
J
 
I don't normally top post, but don't want to try to trim the
rest of this down.

Some random observations:

1) Could this be a Hyperthreading problem? Is Hyperthreading
disabled in the BIOS? I don't know how the OS schedules work
across the two logical processors, but perhaps if you were quitting
Passmark between runs, the program is running on a different virtual
processor each time, and one virtual processor has more load
than the other. If you disable Hyperthreading in the BIOS,
the performance difference might stop.

In any case, Hyperthreading is not all it is cracked up to
be. In some cases, it is a clear win, but in other cases it
can trash the performance of the memory subsystem, and actually
run slower than without it.

2) Increase Vdimm to the Corsair. DDR400 memory needs 2.6V to
start with, and you may find bumping the memory voltage up
a couple notches stops the errors. If the memory passes memtest86
in an overnight test without errors, use Prime95 torture test
in mixed mode, and see if it runs error free as well. I've had
memory pass memtest86 and fail Prime95.

3) Look up your Corsair memory here:

http://corsairmicro.com/corsair/xms.html

Click the link and download the datasheet. For example, 3200XL
is rated for 2.75V and you could try that. The datasheet for
3200XL claims the SPD is loaded with 2-2-2-5, so it shouldn't
start at 2.5-2-2 on its own. If this is some other memory,
you may need to post in this forum, and get some help with
your product - or search for someone having the same system
as you've got:

http://www.houseofhelp.com/forums/forumdisplay.php?forumid=128

4) CTIAW and memtest86 disagree on your PAT setting. I don't know
what to make of that.

5) There is a possible reason for CTIAW mis-reporting the bus
speed. An 865PE Northbridge is not supposed to have PAT, but
Asus and others use a trick to enable it. The processor has
two signals called BSEL, and they indicate the bus speed rating
of the processor (400, 533, 800 etc). The BSEL signals are
normally routed from the processor to the Northbridge and to
the clockgen. What Asus did, is they disconnected that link.
Asus sends a fake value of BSEL to the Northbridge - I think
if the FSB is set to 533, PAT is enabled, so by sending the
533 bit pattern to the Northbridge, but setting the clockgen
to 800, PAT is enabled, and the memory can run at DDR400, just
like on an 875P Northbridge. I think what CTIAW could be doing,
is reading the Northbridge register, instead of checking the
clockgen. This trick is great for fooling the hardware, but
software authors have to be aware of the trick too, to get
the info right.

6) I dug up some benchmarks you can try. Maybe these will be
reproducible from run to run.

http://www.super-computing.org/
ftp://pi.super-computing.org/windows/super_pi.zip

Super_pi computes PI, and you select the number of digits from
the menu. You double click the .exe, to run a Windows dialog.
Select the number of digits to calculate and then run it.
I just ran 1 million digits, and it took 48 seconds
on my 2.8C with 2x512MB 2-2-2-6 memory. I did two test runs and
they had exactly the same test time. A file is created in the
install directory with the results of the calculation.
The test time and the amount of memory used increase
with the digits setting. Some people use the 32M setting
as a stability test for new motherboards.

Here is a second test:

This is some kind of finite element analysis. It was
posted by the author a while back. It uses a good chunk
of memory and, judging by the CPU heating, is not memory
bound but does a fair amount of computing. To use it,
unzip the file, fire up an MS-DOS window, cd to the unzipped
directory, then type "now" into the MS-DOS window to execute
now.bat. After it reaches "step 992", it will finish and
print the number of "MUPs", which are millions of operations
per second. My computer takes 202 or 203 seconds to run the
benchmark, and achieves a rating of 12.27 MUPs (the number
is printed in scientific notation, so shift the decimal
point as appropriate).

http://users.viawest.net/~hwstock/bench/3d0/3d0.zip

Instructions and some background info are here:
http://www.abxzone.com/forums/showthread.php?t=70142

Those two tests are reproducible for me. Give them a
try, with and without Hyperthreading turned on in the
BIOS.

Note: The 3d0 program is a bit unhygienic, and leaves
a bunch of files in its directory. You may want to
delete all but the original files when the directory
fills up.
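The idea behind both tests, timing a fixed amount of work and then repeating it, can be sketched in Python. This is not the algorithm Super_pi uses; it is a Machin-formula pi calculation (pi = 16·arctan(1/5) - 4·arctan(1/239)) chosen only because it is short and deterministic. Run it several times with Hyperthreading on and off: the spread between runs matters more than the absolute time.

```python
import time


def arctan_inv(x: int, scale: int) -> int:
    """arctan(1/x) * scale via its Taylor series, in integer arithmetic."""
    term = scale // x
    total = term
    x_squared = x * x
    n, sign = 1, -1
    while term:
        term //= x_squared                     # next odd power of 1/x
        total += sign * (term // (2 * n + 1))  # alternating series term
        n, sign = n + 1, -sign
    return total


def pi_digits(digits: int) -> str:
    """Pi truncated to `digits` decimal places, without the decimal point."""
    guard = 10  # extra digits to absorb truncation error
    scale = 10 ** (digits + guard)
    pi = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    return str(pi // 10 ** guard)


def time_runs(digits: int = 10000, repeats: int = 3) -> list:
    """Wall-clock seconds for each of `repeats` identical calculations."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        pi_digits(digits)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    runs = time_runs()
    print(", ".join(f"{t:.2f}s" for t in runs))
    print(f"spread: {(max(runs) - min(runs)) / min(runs):.1%}")
```

Because every run does identical integer work, a healthy machine should produce near-identical times; large swings point at something outside the program, such as scheduling or throttling.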

HTH,
Paul
 
Paul said:
I don't normally top post, but don't want to try to trim the
rest of this down.

Some random observations:

1) Could this be a Hyperthreading problem ? Is Hyperthreading
disabled in the BIOS ? I don't know my Hyperthreading policy
versus OS, but perhaps if you were quitting Passmark between
runs, maybe the program is running on a different virtual
processor each time, and one virtual processor has more load
than the other. If you disable Hyperthreading in the BIOS,
the perf difference might stop.

In any case, Hyperthreading is not all it is cracked up to
be. In some cases, it is a clear win, but in other cases it
can trash the performance of the memory subsystem, and actually
run slower than without it.
WOW!!! Before altering any voltages or settings, just running the standard
[auto] jumperless detection settings and simply setting the CPU Hyperthreading
option to [disabled], the results are now, well, somewhat different!!
How thorough or accurate PassMark is I know not, but for purposes of
comparison it's useful. It's difficult to present the results in here, but
the scores for the CPU suite of tests are as follows:

CPU test              HT [enabled]      HT [disabled]

Integer math          170/246 (varies)  257 (solid)
Floating point math   230               291
MMX                   181               278
SSE                   131               164
Compression           1319              1868
Encryption            6.8               10.9
Image rotation        113               195.9
String sorting        665               810

CPU PassMark          322               467

I haven't managed to get anything other than very close to the numbers above
with Hyperthreading [disabled]; it is solid. Disabling Hyperthreading has
also affected the memory test benchmark speeds, presumably due to the
increased CPU performance.
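The size of the change is easier to see as a percentage. A short sketch that works it out from the table above; the integer math row is left out because the Hyperthreading-enabled result varied between runs.

```python
# PassMark CPU scores from the table above: (HT enabled, HT disabled)
results = {
    "Floating point math": (230, 291),
    "MMX": (181, 278),
    "SSE": (131, 164),
    "Compression": (1319, 1868),
    "Encryption": (6.8, 10.9),
    "Image rotation": (113, 195.9),
    "String sorting": (665, 810),
    "CPU PassMark": (322, 467),
}


def percent_change(before: float, after: float) -> float:
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100


for test, (ht_on, ht_off) in results.items():
    print(f"{test:20} {percent_change(ht_on, ht_off):+6.1f}%")
```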

all this before altering any voltages or any other settings, blimey!
2) Increase Vdimm to the Corsair. DDR400 memory needs 2.6V to
start with, and you may find bumping the memory voltage up
a couple notches stops the errors. If the memory passes memtest86
in an overnight test without errors, use Prime95 torture test
in mixed mode, and see if it runs error free as well. I've had
memory pass memtest86 and fail Prime95.

3) Look up your Corsair memory here:

http://corsairmicro.com/corsair/xms.html

Click the link and download the datasheet. For example, 3200XL
is rated for 2.75V and you could try that. The datasheet for
3200XL claims the SPD is loaded with 2-2-2-5, so it shouldn't
start at 2.5-2-2 on its own. If this is some other memory,
you may need to post in this forum, and get some help with
your product - or search for someone having the same system
as you've got:
The product is CMX512-3200XLPT listed on their site under CMX512-3200XL and
it clearly states 2.75V. Changing the voltage to 2.75V has stopped the
blackouts.

For interest, here are the Passmark memory results before (but with
hyperthreading disabled) and after the voltage change. The "configure DRAM
timing by speed" option is [enabled] in the BIOS.

test                  [auto]   2.75V [auto]   [manual] 2.75V / 2.0-2-2-5

allocate small block  1162.8   1163           1164.8
read cached           1390     1389.7         1389.9
read uncached         1326.6   1328.3         1328.8
write                 809.4    809.7          809.4

Altering the DRAM burst timing between 4 and 8 clocks appeared to make no
difference in these tests. Having memory acceleration enabled gave the
following: 1165.4, 1389.3, 1340.2, 810, so only read uncached improved,
slightly but consistently.

**** INTEL/AMD/VIA memory config info, c't/Andreas Stiller V2.7 June 03 ****
Kernel Driver: WinNT DIRECTNT.SYS V01.09
Pentium 4,(0F34-00)ca 3274 MHz (sleep) 2999 MHz (load)
Bus Speed: max=200MHz, ratio=15 => 200 MHz
Hostdevice: (2570) Springdale i865 MCH, Vendor: (8086) Intel, Rev:0002h
----------------------------------------------------------------
Intel Springdale i865 MCH Rev:02: Bus:0, Device-Nr:0, Function:0
System Frequency : FSB533/133 MHz
Memory Frequency : DDR266/133 MHz (1:1)
IOQ Depth : 12 deep
Top of usable Memory : 1024.0 MByte
Extended SMRAM (Tseg) : disabled
Overflowdevice : disabled and unlocked, ID= 2576h, Rev: 2
Memory Delays Base Address : FECF0000 not prefetchable
CPU Parking : disabled
Memory : row0: 512 MByte/16 KB Pages
: row1: 512 MByte/16 KB Pages
DRAM-Channels : Dual Channel Linear, DDR
ECC & Refresh : Non-ECC, Refresh=7.8 µs
PAT-mode : (1) fully enabled
Active to Precharge Delay : 5 clocks .. 70 µs
Tcl - Trcd -Trp : 2-2-2 T (DRAM Clocks)

Memory Read Bandwidth : ca. 5780.5 MBytes/s, Cacheline size= 64
http://www.houseofhelp.com/forums/forumdisplay.php?forumid=128

4) CTIAW and memtest86 disagree on your PAT setting. I don't know
what to make of that.

5) There is a possible reason for CTIAW mis-reporting the bus
speed. An 865PE Northbridge is not supposed to have PAT, but
Asus and others use a trick to enable it. The processor has
two signals called BSEL, and they indicate the bus speed rating
of the processor (400, 533, 800 etc). The BSEL signals are
normally routed from the processor to the Northbridge and to
the clockgen. What Asus did was disconnect that link.
Asus sends a fake value of BSEL to the Northbridge - I think
if the FSB is set to 533, PAT is enabled, so by sending the
533 bit pattern to the Northbridge, but setting the clockgen
to 800, PAT is enabled, and the memory can run at DDR400, just
like on an 875P Northbridge. I think what CTIAW could be doing,
is reading the Northbridge register, instead of checking the
clockgen. This trick is great for fooling the hardware, but
software authors have to be aware of the trick too, to get
the info right.
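One way to test that theory against the CTIAW output above is a back-of-the-envelope peak bandwidth calculation. The sketch below is an editor's illustration, not CTIAW's own method, and assumes standard 64-bit DDR channels (8 bytes per transfer, two transfers per clock), both channels active, and 1 MB/s = 10^6 bytes/s. On those assumptions the measured 5780.5 MBytes/s is roughly 135% of the dual-channel DDR266 peak (about 4267 MB/s), which is hard to explain if the memory really ran at DDR266, but about 90% of the DDR400 peak of 6400 MB/s. If CTIAW's "MBytes" are 2^20 bytes the measured figure only gets larger, so the comparison points the same way. That fits the idea that CTIAW is reading the faked Northbridge setting while the memory actually runs at DDR400.

```python
# Back-of-the-envelope check of the CTIAW report. Assumptions: 64-bit
# (8-byte) DDR channels, two transfers per clock, two channels in use,
# and MB meaning 10**6 bytes.
BYTES_PER_TRANSFER = 8
CHANNELS = 2


def peak_mb_per_s(mem_clock_mhz: float) -> float:
    """Theoretical peak bandwidth (MB/s) for dual-channel DDR at this base clock."""
    return mem_clock_mhz * 2 * BYTES_PER_TRANSFER * CHANNELS


# From the CTIAW output: "Bus Speed: max=200MHz, ratio=15", memory at 1:1.
bus_mhz, ratio = 200, 15
print(f"core clock {bus_mhz * ratio} MHz, FSB{bus_mhz * 4} (quad-pumped bus)")

measured = 5780.5  # "Memory Read Bandwidth : ca. 5780.5 MBytes/s"
for label, clock_mhz in (("DDR266", 400 / 3), ("DDR400", 200.0)):
    peak = peak_mb_per_s(clock_mhz)
    print(f"{label}: peak {peak:.0f} MB/s, measured = {measured / peak:.0%} of peak")
```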

6) I dug up some benchmarks you can try. Maybe these will be
reproducible from run to run.

http://www.super-computing.org/
ftp://pi.super-computing.org/windows/super_pi.zip

Super_pi computes PI, and you select the number of digits from
the menu. You double click the .exe, to run a Windows dialog.
Select the number of digits to calculate and then run it.
I just ran 1 million digits, and it takes 48 seconds
on my 2.8C with 2x512MB 2-2-2-6 memory. I did two test runs and
they had exactly the same test time. A file is created in the
install directory with the results of the calculation.
The test time and the amount of memory used increase
with the digits setting. Some people use the 32M setting
as a stability test for new motherboards.
44 seconds with hyper threading [disabled]
53 seconds with hyper threading [enabled]

As you say, this test is consistent.
Here is a second test:

This is some kind of finite element analysis. It was
posted by the author a while back. It uses a good chunk
of memory, and judging by the CPU heating, is not memory
bound, but does a fair amount of computing. To use it,
unzip the file, fire up a MSDOS window, cd to the unzipped
directory, then type "now" into the MSDOS window, to execute
now.bat . After it reaches "step 992", it will finish, and
print the number of "MUPs", which are millions of operations
per second. My computer takes 202 or 203 seconds to run the
benchmark, and achieves a rating of 12.27 MUPs (the number
is printed in scientific notation, so shift the decimal
point as appropriate).
with hyperthreading [enabled]

242-244 seconds, 10.16-10.24 MUPs, +/- 0.04% (I assume)

with hyperthreading [disabled]

203 seconds, 12.21 MUPs, +/- 0.06%, consistently.
http://users.viawest.net/~hwstock/bench/3d0/3d0.zip

Instructions and some background info are here:
http://www.abxzone.com/forums/showthread.php?t=70142

Those two tests are reproducible for me. Give them a
try, with and without Hyperthreading turned on in the
BIOS.

Note: The 3d0 program is a bit unhygienic, and leaves
a bunch of files in its directory. You may want to
dump all but the original files, when the directory
fills up.
Be interested to hear what you make of that lot. Obviously hyperthreading is
doing the bulk of the damage but the memory scores seem a little low also.
I'll run the memtest and mess with some other BIOS settings later but I have
to go make some money now.

many thanks,
J
 
"Johnny" said:
Paul said:
I don't normally top post, but don't want to try to trim the
rest of this down.

Some random observations:

1) Could this be a Hyperthreading problem ? Is Hyperthreading
disabled in the BIOS ? I don't know my Hyperthreading policy
versus OS, but perhaps if you were quitting Passmark between
runs, maybe the program is running on a different virtual
processor each time, and one virtual processor has more load
than the other. If you disable Hyperthreading in the BIOS,
the perf difference might stop.

In any case, Hyperthreading is not all it is cracked up to
be. In some cases, it is a clear win, but in other cases it
can trash the performance of the memory subsystem, and actually
run slower than without it.
WOW!!! Before altering any voltages or settings, just running the standard
[auto] jumperless detection settings and simply setting CPU hyperthreading
[disabled] option, the results are now, well, somewhat different!!
How thorough or accurate passmark is I know not but for purposes of
comparison it's useful. It's difficult to present the results in here but
the scores for example of the CPU suite of tests are as follows in my
attempt at a table (hope it comes out ok).

cpu test             hyperthreading [enabled]   hyperthreading [disabled]

integer math         170/246 (varies)           257 (solid)
floating point math  230                        291
mmx                  181                        278
sse                  131                        164
compression          1319                       1868
encryption           6.8                        10.9
image rotation       113                        195.9
string sorting       665                        810

CPU passmark         322                        467
integer math

I haven't managed to get anything other than very close to the numbers
above with hyperthreading [disabled]; it is solid. Disabling
hyperthreading has also affected the memory test benchmark speeds,
presumably due to the increased CPU performance.

all this before altering any voltages or any other settings, blimey!

Does the memtest86 memory bandwidth indicator change as a function
of the BIOS Hyperthreading setting ? It shouldn't. In any case, one
thing that strikes me, is how negative an effect hyperthreading is
having on your results.
2) Increase Vdimm to the Corsair. DDR400 memory needs 2.6V to
start with, and you may find bumping the memory voltage up
a couple notches stops the errors. If the memory passes memtest86
in an overnight test without errors, use Prime95 torture test
in mixed mode, and see if it runs error free as well. I've had
memory pass memtest86 and fail Prime95.

3) Look up your Corsair memory here:

http://corsairmicro.com/corsair/xms.html

Click the link and download the datasheet. For example, 3200XL
is rated for 2.75V and you could try that. The datasheet for
3200XL claims the SPD is loaded with 2-2-2-5, so it shouldn't
start at 2.5-2-2 on its own. If this is some other memory,
you may need to post in this forum, and get some help with
your product - or search for someone having the same system
as you've got:
The product is CMX512-3200XLPT listed on their site under CMX512-3200XL and
it clearly states 2.75V. Changing the voltage to 2.75V has stopped the
blackouts.

For interest, here are the Passmark memory results before (but with
hyperthreading disabled) and after the voltage change. The "configure DRAM
timing by speed" option is [enabled] in the BIOS.

test                  [auto]   2.75V [auto]   [manual] 2.75V / 2.0-2-2-5

allocate small block  1162.8   1163           1164.8
read cached           1390     1389.7         1389.9
read uncached         1326.6   1328.3         1328.8
write                 809.4    809.7          809.4

As the auto and manual settings seem to be doing the same thing, I think
you can conclude that the SPD on the 3200XL is 2-2-2. You can play
with the 5 number manually, as by calculation, the 5 number is supposed
to be the sum of two of the other parameters plus 2 (four beats of
DDR data taking 2 cycles). On an AMD system, raising that number to
10 is best, while on the P4, a lower value is better, but play with it
a bit, and see what happens.
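Paul's rule of thumb can be written down as a tiny function. Which "two of the other parameters" he means is not stated; this sketch assumes they are tRCD and CAS latency, and it rounds up because the BIOS only accepts whole clocks. For 2-2-2 timings and a burst of four it gives 6, so treat the result as a starting point for experiment, as Paul suggests, not a hard rule.

```python
import math


def tras_rule_of_thumb(cas: float, trcd: int, burst_length: int = 4) -> int:
    """Rule-of-thumb tRAS in DRAM clocks (assumption: tRCD + CAS + burst time).

    DDR moves two beats per clock, so a burst of `burst_length` beats
    occupies burst_length / 2 clocks (2 clocks for a burst of 4).
    CAS latency may be fractional (e.g. 2.5), so the sum is rounded up.
    """
    return math.ceil(trcd + cas + burst_length / 2)


print(tras_rule_of_thumb(2, 2))     # 2-2-2 timings, burst of 4 -> 6
print(tras_rule_of_thumb(2.5, 3))   # CAS 2.5, tRCD 3 -> 8
print(tras_rule_of_thumb(2, 2, 8))  # same timings, burst of 8 -> 8
```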

In terms of memory bandwidth, your CTIAW and memtest86 bandwidth
indicators are in the same ballpark as mine, so I don't think you
are far off from optimal. Certainly, overclocking the memory will
be the single biggest determinant of memory bandwidth, and the
nice thing about the 3200XL, is you can play with it a bit. I think
it can be pushed up to DDR500, at the expense of relaxing the timing
numbers a bit. My Ballistix doesn't like that quite as much.

These two documents describe some of the things you can do to
optimize memory bandwidth. But with the Asus hack to enable PAT,
the rules might be more like an 875 than an 865. The chips, after
all, are the same die, but with different signals pinned out.

ftp://download.intel.com/design/chipsets/applnots/25273001.pdf (875P)
ftp://download.intel.com/design/chipsets/applnots/25303601.pdf (865PE)
Altering the DRAM burst timing between 4 and 8 clocks appeared to make no
difference in these tests. Having memory acceleration enabled gave the
following: 1165.4, 1389.3, 1340.2, 810, so only read uncached improved,
slightly but consistently.

When the cache is enabled for a certain area of memory, the memory
controller likes to fetch cache-line-sized chunks. That might be why
normally, the 4 versus 8 setting doesn't make a difference. Perhaps
the memory used by PCI cards for I/O is uncached ? I've left mine
set at 4. (I think the cache line size is 64 bytes, and with dual
channel memory, 16 bytes are transferred per beat, so the 4 setting
would be right for it. If you were in single channel mode, perhaps
8 would be the right setting, times 8 bytes per beat.)
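The cache-line arithmetic in that parenthesis can be checked with a few lines of Python. This sketch assumes, as Paul does, a 64-byte cache line (CTIAW also reports "Cacheline size= 64") and 8 bytes per beat for each 64-bit channel, with dual channel mode moving both channels together.

```python
CACHE_LINE_BYTES = 64           # P4 cache line, as reported by CTIAW
BYTES_PER_BEAT_PER_CHANNEL = 8  # one 64-bit DDR channel


def beats_per_cache_line(channels: int) -> int:
    """Data beats needed to fill one cache line with `channels` in lockstep."""
    return CACHE_LINE_BYTES // (BYTES_PER_BEAT_PER_CHANNEL * channels)


print(beats_per_cache_line(2))  # dual channel -> 4
print(beats_per_cache_line(1))  # single channel -> 8
```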
**** INTEL/AMD/VIA memory config info, c't/Andreas Stiller V2.7 June 03 ****
Kernel Driver: WinNT DIRECTNT.SYS V01.09
Pentium 4,(0F34-00)ca 3274 MHz (sleep) 2999 MHz (load)
Bus Speed: max=200MHz, ratio=15 => 200 MHz
Hostdevice: (2570) Springdale i865 MCH, Vendor: (8086) Intel, Rev:0002h
----------------------------------------------------------------
Intel Springdale i865 MCH Rev:02: Bus:0, Device-Nr:0, Function:0
System Frequency : FSB533/133 MHz
Memory Frequency : DDR266/133 MHz (1:1)
IOQ Depth : 12 deep
Top of usable Memory : 1024.0 MByte
Extended SMRAM (Tseg) : disabled
Overflowdevice : disabled and unlocked, ID= 2576h, Rev: 2
Memory Delays Base Address : FECF0000 not prefetchable
CPU Parking : disabled
Memory : row0: 512 MByte/16 KB Pages
: row1: 512 MByte/16 KB Pages
DRAM-Channels : Dual Channel Linear, DDR
ECC & Refresh : Non-ECC, Refresh=7.8 µs
PAT-mode : (1) fully enabled
Active to Precharge Delay : 5 clocks .. 70 µs
Tcl - Trcd -Trp : 2-2-2 T (DRAM Clocks)

Memory Read Bandwidth : ca. 5780.5 MBytes/s, Cacheline size= 64
http://www.houseofhelp.com/forums/forumdisplay.php?forumid=128

4) CTIAW and memtest86 disagree on your PAT setting. I don't know
what to make of that.

5) There is a possible reason for CTIAW mis-reporting the bus
speed. An 865PE Northbridge is not supposed to have PAT, but
Asus and others use a trick to enable it. The processor has
two signals called BSEL, and they indicate the bus speed rating
of the processor (400, 533, 800 etc). The BSEL signals are
normally routed from the processor to the Northbridge and to
the clockgen. What Asus did was disconnect that link.
Asus sends a fake value of BSEL to the Northbridge - I think
if the FSB is set to 533, PAT is enabled, so by sending the
533 bit pattern to the Northbridge, but setting the clockgen
to 800, PAT is enabled, and the memory can run at DDR400, just
like on an 875P Northbridge. I think what CTIAW could be doing,
is reading the Northbridge register, instead of checking the
clockgen. This trick is great for fooling the hardware, but
software authors have to be aware of the trick too, to get
the info right.

6) I dug up some benchmarks you can try. Maybe these will be
reproducible from run to run.

http://www.super-computing.org/
ftp://pi.super-computing.org/windows/super_pi.zip

Super_pi computes PI, and you select the number of digits from
the menu. You double click the .exe, to run a Windows dialog.
Select the number of digits to calculate and then run it.
I just ran 1 million digits, and it takes 48 seconds
on my 2.8C with 2x512MB 2-2-2-6 memory. I did two test runs and
they had exactly the same test time. A file is created in the
install directory with the results of the calculation.
The test time and the amount of memory used increase
with the digits setting. Some people use the 32M setting
as a stability test for new motherboards.
44 seconds with hyper threading [disabled]
53 seconds with hyper threading [enabled]

As you say, this test is consistent.

I just don't understand why your results are being hammered
so badly by Hyperthreading. The OS cannot be taking up that much
memory bandwidth in the background. And, since your processor
has a 1MB cache, it shouldn't be measurably thrashing the cache
either. I wonder if Windows is actually using the whole
cache ? I remember reading a while back, about a situation where
Windows needed to be manually adjusted to use the whole cache
(back in the P3 era). Something still isn't right here.
Here is a second test:

This is some kind of finite element analysis. It was
posted by the author a while back. It uses a good chunk
of memory, and judging by the CPU heating, is not memory
bound, but does a fair amount of computing. To use it,
unzip the file, fire up a MSDOS window, cd to the unzipped
directory, then type "now" into the MSDOS window, to execute
now.bat . After it reaches "step 992", it will finish, and
print the number of "MUPs", which are millions of operations
per second. My computer takes 202 or 203 seconds to run the
benchmark, and achieves a rating of 12.27 MUPs (the number
is printed in scientific notation, so shift the decimal
point as appropriate).
with hyperthreading [enabled]

242-244 seconds, 10.16-10.24 MUPs, +/- 0.04% (I assume)

with hyperthreading [disabled]

203 seconds, 12.21 MUPs, +/- 0.06%, consistently.

The Hyperthreading penalty seems to be the same here, as
Super_PI. It seems strange that they would be the same, as
these programs won't have the same memory access pattern.
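Putting the two penalties side by side makes the coincidence concrete. This sketch simply divides Johnny's HT-enabled times by his HT-disabled times; for 3d0 it uses 243 s, the midpoint of the 242-244 s range, which is the editor's choice. Both come out at roughly 20% longer with hyperthreading enabled.

```python
def ht_penalty_percent(ht_on_seconds: float, ht_off_seconds: float) -> float:
    """Extra run time with hyperthreading enabled, relative to HT disabled."""
    return (ht_on_seconds / ht_off_seconds - 1.0) * 100.0


# Johnny's timings from this thread (3d0 HT-enabled uses the 242-244 s midpoint).
RUNS = {"Super_PI 1M": (53, 44), "3d0": (243, 203)}

for name, (on, off) in RUNS.items():
    print(f"{name:12s} {ht_penalty_percent(on, off):5.1f}% slower with HT enabled")
```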
Be interested to hear what you make of that lot. Obviously hyperthreading is
doing the bulk of the damage but the memory scores seem a little low also.
I'll run the memtest and mess with some other BIOS settings later but I have
to go make some money now.

many thanks,
J

<<snip>>

All I can say, is Hyperthreading is doing way more damage than
it should be. Try memtest86 again, with Hyperthreading enabled
and then with it disabled. There should be no change in the
bandwidth readout. If there is, there is some other serious
problem there.

In my registry, I see an entry called SecondLevelDataCache, but
it is set to zero, which implies the L2 size is detected automatically.
If L2 were disabled, you would see the performance plummet.

HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\CONTROL\SESSION MANAGER\MEMORY MANAGEMENT

According to this, changing it shouldn't help:
http://www.winguides.com/registry/display.php/116/
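If you want to check the value on your own machine without opening regedit, Python's standard winreg module can read it. This is an editor's sketch: it only works on Windows, returns None elsewhere or when the value is absent, and a result of 0 means Windows works out the L2 size itself, as described above.

```python
import sys
from typing import Optional

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
VALUE_NAME = "SecondLevelDataCache"


def read_second_level_data_cache() -> Optional[int]:
    """Return the SecondLevelDataCache DWORD (size in KB), or None if unavailable."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg  # standard library, Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except OSError:
        return None  # key or value missing, or access denied
    return int(value)


print(read_second_level_data_cache())
```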

You might try downloading Sandra Lite 2005 and run the
"Cache and Memory" benchmark. The 2002 version I've got
has that benchmark, and the "bumps" in the curve tell
you where the cache breakpoints are. A Prescott, with
its 1MB cache, should have a breakpoint at the 1MB mark
if the cache is working.
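Reading the "bumps" off a cache and memory bandwidth curve can also be done mechanically. A minimal sketch, assuming you have copied (block size, bandwidth) pairs from the Sandra graph in increasing block-size order: it reports each block size after which bandwidth falls by more than a chosen fraction (30% here, an arbitrary threshold). With a working L2 on a Prescott, one of the reported sizes should be at or near 1 MB.

```python
from typing import List, Sequence, Tuple


def cache_breakpoints(
    curve: Sequence[Tuple[int, float]], min_drop: float = 0.30
) -> List[int]:
    """Return block sizes after which bandwidth drops by more than `min_drop`.

    `curve` is (block size in bytes, bandwidth) pairs sorted by block size.
    The size just before a steep drop approximates the capacity of the
    cache level that the working set has just outgrown.
    """
    points = []
    for (size, bw), (_next_size, next_bw) in zip(curve, curve[1:]):
        if next_bw < bw * (1.0 - min_drop):
            points.append(size)
    return points
```

Call it as cache_breakpoints(pairs) with the pairs read from the graph; an empty result means no drop exceeded the threshold.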

http://www.sisoftware.co.uk/index.html?dir=dload&location=sware_dl_all&langx=en&a=

I think if I try to install it, it will remove the older software,
so I cannot do this right now. I hope the Lite version still has
that benchmark...

HTH,
Paul
 
On Sun, 28 Nov 2004 18:52:47 -0000, "Johnny" <[email protected]>
wrote:

What video card are you running, and which drivers? You would be
surprised how one piece of hardware can ruin your day, even if it's
good... Swap vendors, say ATI for NVIDIA or vice versa... Try an AGP
card that is 4x, not 8x...

Or try this: get your mitts on a tried and true quality PCI video card
(e.g. an ATI 7xxx with 32MB DDR). Remove the AGP card, install the PCI
card, and see how your system performs.

It sounds like you have some pretty good hardware; just finding the
right combination will make all the difference...

Or else your board is foobar...
 
Bill said:
What video card are you running, and which drivers? You would be
surprised how one piece of hardware can ruin your day, even if it's
good... Swap vendors, say ATI for NVIDIA or vice versa... Try an AGP
card that is 4x, not 8x...

Or try this: get your mitts on a tried and true quality PCI video card
(e.g. an ATI 7xxx with 32MB DDR). Remove the AGP card, install the PCI
card, and see how your system performs.

It sounds like you have some pretty good hardware; just finding the
right combination will make all the difference...

Or else your board is foobar...
My thoughts about the board, CPU and memory are similar at this point. The
gfx card is a 256MB ATI Radeon 9600XT at 8x on AGP.

I'll try swapping it out for a PCI card. I also noticed the CPU temperature
jumps to 48-50C when hyperthreading is disabled. Once I enable
hyperthreading and watch asusprobe, I see it drop back to 38-40C, though
obviously this means the performance is crap as well. I'm at the stage now
of contacting the vendor and telling them I want replacements or my money back.

The memtest86 results are showing several errors in the RAM as well. I left
it running all night and see the same addresses cropping up on each pass
(although, bizarrely, on different tests). It's a long, long time since I've
had problems like this.
 
Paul said:
"Johnny" said:
Paul said:
I don't normally top post, but don't want to try to trim the
rest of this down.

Some random observations:

1) Could this be a Hyperthreading problem ? Is Hyperthreading
disabled in the BIOS ? I don't know my Hyperthreading policy
versus OS, but perhaps if you were quitting Passmark between
runs, maybe the program is running on a different virtual
processor each time, and one virtual processor has more load
than the other. If you disable Hyperthreading in the BIOS,
the perf difference might stop.

In any case, Hyperthreading is not all it is cracked up to
be. In some cases, it is a clear win, but in other cases it
can trash the performance of the memory subsystem, and actually
run slower than without it.
WOW!!! Before altering any voltages or settings, just running the
standard [auto] jumperless detection settings and simply setting CPU
hyperthreading [disabled] option, the results are now, well,
somewhat different!!
How thorough or accurate passmark is I know not but for purposes of
comparison it's useful. It's difficult to present the results in
here but the scores for example of the CPU suite of tests are as
follows in my attempt at a table (hope it comes out ok).

cpu test             hyperthreading [enabled]   hyperthreading [disabled]

integer math         170/246 (varies)           257 (solid)
floating point math  230                        291
mmx                  181                        278
sse                  131                        164
compression          1319                       1868
encryption           6.8                        10.9
image rotation       113                        195.9
string sorting       665                        810

CPU passmark         322                        467
integer math

I haven't managed to get anything other than very close to the numbers
above with hyperthreading [disabled]; it is solid. Disabling
hyperthreading has also affected the memory test benchmark speeds,
presumably due to the increased CPU performance.

all this before altering any voltages or any other settings, blimey!

Does the memtest86 memory bandwidth indicator change as a function
of the BIOS Hyperthreading setting ? It shouldn't. In any case, one
thing that strikes me, is how negative an effect hyperthreading is
having on your results.
Yeah, me too - this thing has me flummoxed. The bandwidth indicators in
memtest are exactly the same with hyperthreading enabled and disabled. The
CPU temperature increases by 10C to 48-50C when hyperthreading is disabled,
then drops back to 38-40C when I enable it.

If I select Turbo Mode in the BIOS settings the system still bombs out
despite the memory tweaks.
2) Increase Vdimm to the Corsair. DDR400 memory needs 2.6V to
start with, and you may find bumping the memory voltage up
a couple notches stops the errors. If the memory passes memtest86
in an overnight test without errors, use Prime95 torture test
in mixed mode, and see if it runs error free as well. I've had
memory pass memtest86 and fail Prime95.

3) Look up your Corsair memory here:

http://corsairmicro.com/corsair/xms.html

Click the link and download the datasheet. For example, 3200XL
is rated for 2.75V and you could try that. The datasheet for
3200XL claims the SPD is loaded with 2-2-2-5, so it shouldn't
start at 2.5-2-2 on its own. If this is some other memory,
you may need to post in this forum, and get some help with
your product - or search for someone having the same system
as you've got:
The product is CMX512-3200XLPT listed on their site under
CMX512-3200XL and it clearly states 2.75V. Changing the voltage to
2.75V has stopped the blackouts.

For interest, here are the Passmark memory results before (but with
hyperthreading disabled) and after the voltage change. The "configure
DRAM timing by speed" option is [enabled] in the BIOS.

test                  [auto]   2.75V [auto]   [manual] 2.75V / 2.0-2-2-5

allocate small block  1162.8   1163           1164.8
read cached           1390     1389.7         1389.9
read uncached         1326.6   1328.3         1328.8
write                 809.4    809.7          809.4

As the auto and manual settings seem to be doing the same thing, I
think you can conclude that the SPD on the 3200XL is 2-2-2. You can
play with the 5 number manually, as by calculation, the 5 number is
supposed to be the sum of two of the other parameters plus 2 (four
beats of DDR data taking 2 cycles). On an AMD system, raising that
number to 10 is best, while on the P4, a lower value is better, but
play with it a bit, and see what happens.

In terms of memory bandwidth, your CTIAW and memtest86 bandwidth
indicators are in the same ballpark as mine, so I don't think you
are far off from optimal. Certainly, overclocking the memory will
be the single biggest determinant of memory bandwidth, and the
nice thing about the 3200XL, is you can play with it a bit. I think
it can be pushed up to DDR500, at the expense of relaxing the timing
numbers a bit. My Ballistix doesn't like that quite as much.

These two documents describe some of the things you can do to
optimize memory bandwidth. But with the Asus hack to enable PAT,
the rules might be more like an 875 than an 865. The chips, after
all, are the same die, but with different signals pinned out.

ftp://download.intel.com/design/chipsets/applnots/25273001.pdf (875P)
ftp://download.intel.com/design/chipsets/applnots/25303601.pdf (865PE)
Altering the DRAM burst timing between 4 and 8 clocks appeared to
make no difference in these tests. Having memory acceleration
enabled gave the following: 1165.4, 1389.3, 1340.2, 810, so only read
uncached improved, slightly but consistently.

When the cache is enabled for a certain area of memory, the memory
controller likes to fetch cache-line-sized chunks. That might be why
normally, the 4 versus 8 setting doesn't make a difference. Perhaps
the memory used by PCI cards for I/O is uncached ? I've left mine
set at 4. (I think the cache line size is 64 bytes, and with dual
channel memory, 16 bytes are transferred per beat, so the 4 setting
would be right for it. If you were in single channel mode, perhaps
8 would be the right setting, times 8 bytes per beat.)
**** INTEL/AMD/VIA memory config info, c't/Andreas Stiller V2.7 June 03 ****
Kernel Driver: WinNT DIRECTNT.SYS V01.09
Pentium 4,(0F34-00)ca 3274 MHz (sleep) 2999 MHz (load)
Bus Speed: max=200MHz, ratio=15 => 200 MHz
Hostdevice: (2570) Springdale i865 MCH, Vendor: (8086) Intel, Rev:0002h
----------------------------------------------------------------
Intel Springdale i865 MCH Rev:02: Bus:0, Device-Nr:0, Function:0
System Frequency : FSB533/133 MHz
Memory Frequency : DDR266/133 MHz (1:1)
IOQ Depth : 12 deep
Top of usable Memory : 1024.0 MByte
Extended SMRAM (Tseg) : disabled
Overflowdevice : disabled and unlocked, ID= 2576h, Rev: 2
Memory Delays Base Address : FECF0000 not prefetchable
CPU Parking : disabled
Memory : row0: 512 MByte/16 KB Pages
: row1: 512 MByte/16 KB Pages
DRAM-Channels : Dual Channel Linear, DDR
ECC & Refresh : Non-ECC, Refresh=7.8 µs
PAT-mode : (1) fully enabled
Active to Precharge Delay : 5 clocks .. 70 µs
Tcl - Trcd -Trp : 2-2-2 T (DRAM Clocks)

Memory Read Bandwidth : ca. 5780.5 MBytes/s, Cacheline size= 64
go on with CR

http://www.houseofhelp.com/forums/forumdisplay.php?forumid=128

4) CTIAW and memtest86 disagree on your PAT setting. I don't know
what to make of that.

5) There is a possible reason for CTIAW mis-reporting the bus
speed. An 865PE Northbridge is not supposed to have PAT, but
Asus and others use a trick to enable it. The processor has
two signals called BSEL, and they indicate the bus speed rating
of the processor (400, 533, 800 etc). The BSEL signals are
normally routed from the processor to the Northbridge and to
the clockgen. What Asus did was disconnect that link.
Asus sends a fake value of BSEL to the Northbridge - I think
if the FSB is set to 533, PAT is enabled, so by sending the
533 bit pattern to the Northbridge, but setting the clockgen
to 800, PAT is enabled, and the memory can run at DDR400, just
like on an 875P Northbridge. I think what CTIAW could be doing,
is reading the Northbridge register, instead of checking the
clockgen. This trick is great for fooling the hardware, but
software authors have to be aware of the trick too, to get
the info right.

6) I dug up some benchmarks you can try. Maybe these will be
reproducible from run to run.

http://www.super-computing.org/
ftp://pi.super-computing.org/windows/super_pi.zip

Super_pi computes PI, and you select the number of digits from
the menu. You double click the .exe, to run a Windows dialog.
Select the number of digits to calculate and then run it.
I just ran 1 million digits, and it takes 48 seconds
on my 2.8C with 2x512MB 2-2-2-6 memory. I did two test runs and
they had exactly the same test time. A file is created in the
install directory with the results of the calculation.
The test time and the amount of memory used increase
with the digits setting. Some people use the 32M setting
as a stability test for new motherboards.
44 seconds with hyper threading [disabled]
53 seconds with hyper threading [enabled]

As you say, this test is consistent.

I just don't understand why your results are being hammered
so badly by Hyperthreading. The OS cannot be taking up that much
memory bandwidth in the background. And, since your processor
has a 1MB cache, it shouldn't be measurably thrashing the cache
either. I wonder if Windows is actually using the whole
cache ? I remember reading a while back, about a situation where
Windows needed to be manually adjusted to use the whole cache
(back in the P3 era). Something still isn't right here.
Here is a second test:

This is some kind of finite element analysis. It was
posted by the author a while back. It uses a good chunk
of memory, and judging by the CPU heating, is not memory
bound, but does a fair amount of computing. To use it,
unzip the file, fire up a MSDOS window, cd to the unzipped
directory, then type "now" into the MSDOS window, to execute
now.bat . After it reaches "step 992", it will finish, and
print the number of "MUPs", which are millions of operations
per second. My computer takes 202 or 203 seconds to run the
benchmark, and achieves a rating of 12.27 MUPs (the number
is printed in scientific notation, so shift the decimal
point as appropriate).
with hyperthreading [enabled]

242-244 seconds, 10.16-10.24 MUPs, +/- 0.04% (I assume)

with hyperthreading [disabled]

203 seconds, 12.21 MUPs, +/- 0.06%, consistently.

The Hyperthreading penalty seems to be the same here, as
Super_PI. It seems strange that they would be the same, as
these programs won't have the same memory access pattern.
Be interested to hear what you make of that lot. Obviously
hyperthreading is doing the bulk of the damage but the memory scores
seem a little low also. I'll run the memtest and mess with some
other BIOS settings later but I have to go make some money now.

many thanks,
J

<<snip>>

All I can say, is Hyperthreading is doing way more damage than
it should be. Try memtest86 again, with Hyperthreading enabled
and then with it disabled. There should be no change in the
bandwidth readout. If there is, there is some other serious
problem there.

In my registry, I see an entry called SecondLevelDataCache, but
it is set to zero, which implies the L2 size is detected automatically.
If L2 were disabled, you would see the performance plummet.

HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\CONTROL\SESSION MANAGER\MEMORY MANAGEMENT

According to this, changing it shouldn't help:
http://www.winguides.com/registry/display.php/116/

You might try downloading Sandra Lite 2005 and run the
"Cache and Memory" benchmark. The 2002 version I've got
has that benchmark, and the "bumps" in the curve tell
you where the cache breakpoints are. A Prescott, with
its 1MB cache, should have a breakpoint at the 1MB mark
if the cache is working.

http://www.sisoftware.co.uk/index.html?dir=dload&location=sware_dl_all&langx=en&a=

I think if I try to install it, it will remove the older software,
so I cannot do this right now. I hope the Lite version still has
that benchmark...

HTH,
Paul
 
I also noticed the CPU temperature jumps to 48-50C when hyperthreading
is disabled. Once I enable hyperthreading and watch asusprobe
I see it drop back to 38-40C though

I have the same effect with Motherboard Monitor, but not in the BIOS.
Motherboard Monitor and other programs are not compatible with
hyperthreading.
 
Ken said:
I have the same effect with Motherboard Monitor, but not in the BIOS.
Motherboard Monitor and other programs are not compatible with
hyperthreading.
Hmmm - my experiences with this setup are that it's hyperthreading itself
where all the incompatibilities lie.
 
My thoughts about the board, CPU and memory are similar at this point. The
gfx card is a 256MB ATI Radeon 9600XT at 8x on AGP.

I'll try swapping it out for a PCI card. I also noticed the CPU temperature
jumps to 48-50C when hyperthreading is disabled. Once I enable
hyperthreading and watch asusprobe, I see it drop back to 38-40C, though
obviously this means the performance is crap as well. I'm at the stage now
of contacting the vendor and telling them I want replacements or my money back.

The memtest86 results are showing several errors in the RAM as well. I left
it running all night and see the same addresses cropping up on each pass
(although, bizarrely, on different tests). It's a long, long time since I've
had problems like this.

hmm...corsair...
 
Bill said:
hmm...corsair...
I've given it up as a bad job - there's only so many times I'm prepared to
hit my head against the wall. The parts have been issued an RMA today and
get sent back tomorrow. What are your experiences with Corsair RAM Bill?
 
I've given it up as a bad job - there's only so many times I'm prepared to
hit my head against the wall. The parts have been issued an RMA today and
get sent back tomorrow. What are your experiences with Corsair RAM Bill?

Actually, none. I just read a lot of posts and try to base some of what I
purchase on that info... If you have any other RAM available, it may
be worth the trouble to try it out. I've been running Crucial 2 x 512
DDR400 since they were released and I've had no problems with that.

Or try the RAM in another working system...
Sorry I can't be of more help, but you can still try to rule out
some of the hardware as the problem...
 