Memory strangeness

  • Thread starter: Dave the Funkatron

Dave the Funkatron

Hi folks,

I have been having some strange problems with my machine for some time
now -- mainly random crashes and file corruption (on two different
hard drives -- one SATA, one UDMA). I thought this might be related to
the RAM, so I downloaded and ran Windows Memory Test from
http://oca.microsoft.com/en/windiag.asp. It failed EVERY test on the
first pass, and so I killed the subsequent passes. Now, to figure out
what this all means. The "names" of all the tests are rather cryptic,
and so I'm not sure what this kind of total failure would mean.

So, it occurs to me that the problem might be anywhere. Obviously, the
RAM is a good place to start looking, but could it not also be the
motherboard (the memory slots, or the L2 cache), or maybe the CPU (L1
cache), or perhaps the power supply not giving out the juice like it
should?

In an ideal world, I could start swapping components until I find the
one that is causing problems. Alas, I don't have a closet full of
spare hardware, and was hoping that there might be another way to
narrow things down a bit. Any advice in this regard would be
appreciated.

Thanks.

Dave
 
And with no description of the computer make and model, or the
motherboard make and model, CPU model/speed, where would we
begin to make suggestions? Is the power supply some $20 piece of
junk, or a better unit? Have you added a lot of power-hungry graphics
cards to the box? Every bit counts.

For a second opinion on memory, try memtest86+ from http://www.memtest.org/ .

If more than one stick of RAM is involved, you could also try with
fewer sticks of RAM present.

The motherboard has regulation circuits on it, and some of those
can be relatively insensitive to power quality. Power rails used
directly by hardware, can be more of an issue. (For CPU, 12V is
converted to 1.5V or so. For RAM, 3.3V or 5V might be used to make
the RAM voltage, and in some cases there isn't a lot of headroom
in the conversion.)

Some motherboards have a hardware monitor page in the BIOS, and
programs like Speedfan (almico.com) or the old MBM5 (site no longer
running), can also monitor voltages while sitting in Windows. Using
any of those options, you can check to see if the main power
rails are within 5% of nominal 3.3V, 5V, 12V, -12V and so on.

The fact that you've been running an OS means the memory isn't
all bad. If there were significant problems in the area where
the OS or the registry is stored, you probably
wouldn't survive a reboot or two. Maybe after memtest86+ runs,
you'll have a smaller error count to deal with. In some cases,
simply entering the BIOS and bumping up Vdimm (2.5V to 2.7V on DDR,
1.8V to 2.0V on DDR2, and so on) might be enough to satisfy an
almost error-free collection of RAM.

Any chance someone tinkered with the BIOS settings
and screwed something up? Do you leave the machine
unplugged or unpowered a lot of the time? That
could cause the CMOS battery to run down, and perhaps
the BIOS settings have returned to defaults. And
maybe defaults aren't what the box wants.

Paul
 
Thanks Paul,

I guess I was looking for general trouble-shooting ideas, so thanks
for those. If you really want to see my system specs, here they are:

Asus P5B mainboard
Intel Core2duo 2.13GHz CPU
Aeneon DDR2 PC2-5300 RAM (1 stick)
NVidia GeForce 7600 GT PCIe graphics board
Coolermaster eXtreme Power 500W power supply
Maxtor ST3200822A IDE hard drive
Maxtor STM3500630AS SATA hard drive
Pioneer DVD+-RW DUAL LAYER+R 4X CDRW IDE optical drive
Windows XP 64 operating system


I did have an install of MBM, and it gave some grim voltage readings:

Expected:   Actual:
 +3.3        +3.28
 +5.0        +5.51
+12.0       +11.00
-12.0       -11.05
 -5.0        -4.61

However, when I go into the bios and look at the voltages there,
things are much more well-behaved. SpeedFan also gives a less grim
report, though it lists the following voltages:

Expected:     Actual:
+3.3           +3.28
+12.0         +11.88
vcore(1.5?)    +1.18

So, perhaps the PS is the culprit, though I am not sure why I get
different readings from within the OS or within the BIOS. Assuming it
is a power supply problem, the next question might be: is it broken,
or just not powerful enough to drive my system? I'm guessing broken,
as it seems to me that my system does not have enough stuff to consume
the 500W that the PS is rated for. Correct me if you think I might be
wrong.

Another thing I noticed: the SpeedFan Charts tab reports that HD0 and
HD1 are running at a flat 39 and 41 degrees C. This seems a bit warm
to me, though not necessarily outside of operating temperatures.

As far as the BIOS losing settings, come to think of it, I did have it
in storage for a few months. I don't remember it having a strange time
when I took it out, but maybe something did get lost. Would it be
worth resetting the BIOS or something like that?
 
Looks like MBM5 is out to lunch. I have a similar problem here, where
MBM reads 12.76V, but a check with a multimeter indicates my 12V rail
is just about perfect in terms of voltage.

This is my Speedfan right now.

Vcore  1.60V  (set at 1.55 I think)
+12V  12.04V  (Speedfan does do a better job)
3.3V   3.28V  (Agrees with MBM5)
Vcc    5.01V  (Probably my +5V rail)
5Vsb   5.04V  (The standby supply, for suspend to RAM)
Vbat   0.00V  (Well, nobody is perfect... It isn't zero.)

For temps, my room temp is 25C (separate sensor), case is 30C,
and SMART for both my drives reports 34C. The drives are right
next to the air intake vent. To put your 39 to 41C in perspective,
37C is human body temperature. Temperatures like yours are only
a problem if the humidity in the room is high (high enough to make
your carpets mildew). Disk drives have an allowed temperature/humidity
curve, if you can find the info on the manufacturer's site.

Your Speedfan readings are within 5%, so that looks good. It would have
been nice to see the 5V value as well from that.
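
The 5% check is easy to work through by hand or in a few lines of Python. The sketch below is only illustrative: the function names `deviation_pct` and `in_spec` are made up for this example, the 5% figure is the rule of thumb used in this thread, and the readings are the MBM and SpeedFan values Dave posted.

```python
# Percent deviation of a measured rail voltage from its nominal value.
# Function names are hypothetical; the 5% tolerance is the rule of thumb
# quoted in this thread, not a value read from any specification here.

TOLERANCE_PCT = 5.0


def deviation_pct(nominal: float, measured: float) -> float:
    """Deviation of the measured magnitude from the nominal magnitude, in percent.

    Magnitudes are compared so that a negative rail sitting closer to zero
    than it should shows up as a negative ("low") deviation.
    """
    return (abs(measured) - abs(nominal)) / abs(nominal) * 100.0


def in_spec(nominal: float, measured: float, tol: float = TOLERANCE_PCT) -> bool:
    """True if the measured value is within +/- tol percent of nominal."""
    return abs(deviation_pct(nominal, measured)) <= tol


# (nominal, measured) pairs as posted earlier in the thread.
mbm_readings = [(3.3, 3.28), (5.0, 5.51), (12.0, 11.00), (-12.0, -11.05), (-5.0, -4.61)]
speedfan_readings = [(3.3, 3.28), (12.0, 11.88)]

if __name__ == "__main__":
    for label, readings in (("MBM", mbm_readings), ("SpeedFan", speedfan_readings)):
        for nominal, measured in readings:
            status = "ok" if in_spec(nominal, measured) else "OUT OF SPEC"
            print(f"{label:8s} {nominal:+6.1f} V  {measured:+6.2f} V  "
                  f"{deviation_pct(nominal, measured):+6.1f}%  {status}")
```

On those numbers, the MBM values for +5V, +12V, -12V and -5V all land roughly 8 to 10% away from nominal, while both SpeedFan values are within about 1%. That only says the two programs disagree; a multimeter is still needed to tell which one is right.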

A few months storage shouldn't hurt the thing. I get about 3 years roughly
from a CR2032, in storage. So a few months shouldn't flatten the thing.
The BIOS should restore the default settings, if the checksum byte doesn't
agree with the rest of the CMOS contents.

You should enter the BIOS setup screens and have a look around. As I suggested
in the previous post, a little extra Vdimm sometimes helps. But only if there
is a small error count accumulating in Memtest86+. I've also had RAM with
a "bad spot" in it, where some locations were bad each time the program
tests them. If there are transient faults, a few on each pass, and at
different locations, then a change to Vdimm, or bumping up CAS one
notch, might fix it. You might also want to see what your warranty
options are on the RAM. The crap I used to buy locally, had a one year
warranty. I no longer buy the "on sale" specials locally.

Paul
 
Memtest86+ actually crashes on me after finding around 32000 errors
(overflow on a signed 16-bit int?). That makes me think that the
memory is really flaky, but then the mystery is: how am I able to boot
into the OS?
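
The guess about the counter can be illustrated, though not confirmed. The sketch below only shows how a signed 16-bit value wraps once it passes 32767. Whether Memtest86+ really stores its error count that way is an assumption on my part, and `as_int16` is a helper name made up for this example.

```python
import ctypes

# Largest value a signed 16-bit integer can hold.
INT16_MAX = 2**15 - 1  # 32767


def as_int16(value: int) -> int:
    """Reinterpret an integer as a signed 16-bit value (two's complement wrap)."""
    return ctypes.c_int16(value).value


if __name__ == "__main__":
    # Hypothetical error counts around the 16-bit limit.
    for count in (32000, INT16_MAX, INT16_MAX + 1, INT16_MAX + 2):
        print(f"true count {count:6d} -> stored as {as_int16(count):6d}")
```

A counter that flips negative just past 32767 would be consistent with a crash at "around 32000" errors, but it is only one possible explanation.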

Also, I can't find any settings in the BIOS for voltage. It just
reports the voltage, but won't let me adjust it. The manual for the
P5B motherboard doesn't mention power settings either.

So, I think I'll go RAM shopping and let you know what happens after
that.

Thanks.

Dave
 
P5B

Don't go memory shopping yet.

Set "AI Tuning" to [Manual]. Some more settings should appear.
One of them is "Memory Voltage". It goes from [1.8V] to
[2.1V]. If your RAM has specs, it might mention either the
voltage needed to meet those specs, or it will mention the
maximum voltage the manufacturer recommends. I expect when
you give it a bit more, you might notice some change.

Paul
 
Dave said:
I did have an install of MBM, and it give some grim voltage readings:

Expected:   Actual:
 +3.3        +3.28
 +5.0        +5.51
+12.0       +11.00
-12.0       -11.05
 -5.0        -4.61

If the power supply system is defective, then even different loads
(running the BIOS versus executing the OS) would cause different
numbers. With the numbers displayed, no other test can report
reliable results until you have first confirmed power 'system'
integrity.

That voltage monitor is not accurate until first calibrated using a
multimeter. A multimeter will also establish power supply
integrity. You must know whether the power supply 'system' is
'definitively good' or 'definitively bad'. 'Unknown' (the third
state) is what currently exists.

DC voltages on the orange, red, purple, and yellow wires must
exceed 3.23 (orange), 4.87 (red and purple), and 11.7 (yellow) VDC.
Diagnostics including Memtest86 are useful only after those VDC
numbers are within spec.
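
Those minimums can be written down as a small checklist. This is a rough sketch, not a diagnostic tool: the mapping of wire colour to threshold assumes the usual ATX colour code (orange 3.3V, red 5V, purple 5V standby, yellow 12V), `failing_wires` is a made-up helper name, and the sample readings are invented for illustration.

```python
# Minimum acceptable DC voltages, taken from the figures quoted above.
# The colour -> threshold mapping assumes the usual ATX colour code
# (orange = 3.3 V, red = 5 V, purple = 5 V standby, yellow = 12 V).
MIN_VOLTS = {
    "orange": 3.23,
    "red": 4.87,
    "purple": 4.87,
    "yellow": 11.7,
}


def failing_wires(readings: dict[str, float]) -> list[str]:
    """Return the wire colours whose reading does not exceed its minimum."""
    return [
        colour
        for colour, minimum in MIN_VOLTS.items()
        if colour in readings and readings[colour] <= minimum
    ]


if __name__ == "__main__":
    # Hypothetical multimeter readings, taken while the machine is busy.
    sample = {"orange": 3.30, "red": 5.02, "purple": 5.04, "yellow": 11.62}
    bad = failing_wires(sample)
    if bad:
        print("below minimum:", ", ".join(bad))
    else:
        print("all measured rails above minimum")
```

With the invented sample above, only the yellow (12V) wire would be flagged. Real readings should be taken with the meter while the system is under load.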

A computer typically averages 150 watts. A 350 watt supply is more
than sufficient for almost every computer. Is that particular power
supply sufficiently sized? Multimeter numbers (see the minimum
acceptable numbers above) will also confirm whether the power supply
is large enough, once you get the computer running so that all
peripherals are accessed simultaneously (multitasking). Even if its
label says it is large enough, you do not know until actual
measurements confirm what theoretically should be. Just another
reason why the multimeter is a tool as useful as a screwdriver, and
is also sold where screwdrivers are sold.
 