How often does RAM just die after several years of use?


Ant

Hello!

A few days ago in my old Debian box, my computer randomly locked up
twice and rebooted once. Once, I got system failure during POST. I
turned off and on the machine. It didn't boot up (no POST). Turned off
and turned on again, then computer worked for another 24 hours. Then
after that, I got no POST (all fans inside computer worked though), no
video signals, no beeps, no keyboard lights, etc. Clearing CMOS,
changing video card, etc. did not help.

What helped was removing, swapping, and testing the two installed 512 MB
Kingston RAM modules (together and separately). My friend and I
found that one module was going/had gone bad. I never had a problem with it
for the last 4-5 years (bought it new). Since it has a lifetime warranty,
it was sent in for a replacement.

I am curious. How often does old RAM (machine runs almost 24/7) go bad
like this? I haven't had any more lockups, reboots, or other weirdness
with the other 512 MB of memory. I didn't see any
burnt/melted/discolored areas or smell any funny odor on the bad
memory. The temperatures have been cool [under 70 degrees(F)] due to
winter time (lots of rain lately though).

Thank you in advance. :)
--
"For while the giants have just been talking about an information
superhighway, the ants have actually been building one: the Internet."
From "The Accidental Superhighway." The Economist: A Survey of the
Internet, 1-7 July 1995, insert.
/\___/\
/ /\ /\ \ Phil/Ant @ http://antfarm.home.dhs.org (Personal Web Site)
| |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
\ _ / Remove ANT from e-mail address: (e-mail address removed)
( ) or (e-mail address removed)
Ant is currently not listening to any songs on his home computer.
 
I am curious. How often do old RAM (machine runs almost 24/7) go bad
like this?

Almost never, except that in today's world, lead is evil so
we now have tin whiskers growing on everything. I don't
think your modules are post-RoHS though, so I would instead
suspect the motherboard is failing and subjecting them to
excessive voltage, OR there was a power surge that caused
damage.


I haven't had any more lockups, reboots, or other weirdness
with the other 512 MB of memory. I didn't see any
burnt/melted/discolored areas or smell any funny odor on the bad
memory. The temperatures have been cool [under 70 degrees(F)] due to
winter time (lots of rain lately though).

There's not much point in wondering about the 1 time in a
million that things didn't work out as expected. Literally,
there are a million other systems out there that didn't have
this problem, and it could be anything from a DRAM defect to
a power surge to mild damage from an ESD event long ago. Unless
we can find what actually caused the fault, there is nothing
we can do to prevent it except continue following best
practices.

If I had to play the odds, I'd say the capacitors in either the
PSU or motherboard are failing, but you didn't clarify what
"testing" means: whether you took the questionable module and
tried it in another system (and, if an older system, one known
to have clean memory contacts and capacitors that have not
aged to the point of degraded performance).

Sadly, nothing is made to last forever. Some failures are
sooner than others.
 
Ant said:
A few days ago in my old Debian box, my computer randomly
locked up twice and rebooted once. Once, I got system failure
during POST. I turned off and on the machine. It didn't boot up
(no POST). Turned off and turned on again, then computer
worked for another 24 hours. Then after that, I got no POST
(all fans inside computer worked though), no video signals, no
beeps, no keyboard lights, etc. Clearing CMOS, changing video
card, etc. did not help.


Did you try cleaning the motherboard's lithium battery contacts?
Did you try replacing the lithium battery?

What helped was removing, swapping, testing, etc. the two
installed 512 MB of Kingston RAM modules (together and separately).
My friend and I found the one module was going/went bad. I never
had problem with this for the last 4-5 years (bought it new). Since
it is lifetime warranty, it was sent in for a replacement.


What tests were these? Did you try just cleaning the contacts on
the memory sticks and the motherboard connectors with contact
cleaner and then reseating the memory sticks?

*TimDaniels*
 
There's not much point in wondering about the 1 time in a
million that things didn't work out as expected. Literally,
there are a million other systems out there that didn't have
this problem, and it could be anything from a DRAM defect to
a power surge to mild damage from an ESD event long ago. Unless
we can find what actually caused the fault, there is nothing
we can do to prevent it except continue following best
practices.

What's ESD? Power surge. Hmm. Wouldn't my APC Back-UPS XS BX1500
have prevented those small ones? I know these consumer-grade ones can't
handle big ones. My other more powerful machine and 19" LCD monitor are
on this UPS too. Here are the only power hiccups my Debian box recorded
recently via USB (only posted the power issue errors):

# cat /var/log/apcupsd.events
Tue Aug 28 01:30:19 PDT 2007 apcupsd 3.12.4 (19 August 2006) debian startup succeeded
Sun Sep 02 21:46:55 PDT 2007 Power failure.
Sun Sep 02 21:47:01 PDT 2007 Running on UPS batteries.
Sun Sep 02 21:47:13 PDT 2007 Mains returned. No longer on UPS batteries.
Sun Sep 02 21:47:13 PDT 2007 Power is back. UPS running on mains.
Sun Sep 09 10:41:36 PDT 2007 Power failure.
Sun Sep 09 10:41:38 PDT 2007 Power is back. UPS running on mains.
Mon Sep 10 07:25:54 PDT 2007 Power failure.
Mon Sep 10 07:25:58 PDT 2007 Power is back. UPS running on mains.
Sat Sep 22 18:09:25 PDT 2007 Power failure.
Sat Sep 22 18:09:27 PDT 2007 Power is back. UPS running on mains.
Thu Sep 27 22:24:19 PDT 2007 Power failure.
Thu Sep 27 22:24:22 PDT 2007 Power is back. UPS running on mains.
Thu Oct 11 10:46:29 PDT 2007 Power failure.
Thu Oct 11 10:46:32 PDT 2007 Power is back. UPS running on mains.
Sun Oct 21 08:23:46 PDT 2007 Power failure.
Sun Oct 21 08:23:49 PDT 2007 Power is back. UPS running on mains.
Sun Oct 28 08:42:14 PDT 2007 Power failure.
Sun Oct 28 08:42:17 PDT 2007 Power is back. UPS running on mains.
Wed Nov 07 06:25:56 PST 2007 Power failure.
Wed Nov 07 06:25:59 PST 2007 Power is back. UPS running on mains.
Fri Nov 09 06:25:57 PST 2007 Power failure.
Fri Nov 09 06:26:00 PST 2007 Power is back. UPS running on mains.
Sat Nov 10 07:12:11 PST 2007 Power failure.
Sat Nov 10 07:12:14 PST 2007 Power is back. UPS running on mains.
Thu Nov 15 06:25:57 PST 2007 Power failure.
Thu Nov 15 06:26:00 PST 2007 Power is back. UPS running on mains.
Sun Dec 30 09:16:56 PST 2007 Power failure.
Sun Dec 30 09:16:58 PST 2007 Power is back. UPS running on mains.
Tue Jan 01 08:28:12 PST 2008 Power failure.
Tue Jan 01 08:28:14 PST 2008 Power is back. UPS running on mains.
Sat Jan 05 07:41:30 PST 2008 Power failure.
Sat Jan 05 07:41:33 PST 2008 Power is back. UPS running on mains.
Sun Jan 06 17:03:28 PST 2008 Power failure.
Sun Jan 06 17:03:31 PST 2008 Power is back. UPS running on mains.
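
To put numbers on how short these hiccups are, here is a minimal Python sketch that pairs each "Power failure." line with the next "Power is back." line and reports the gap in seconds. The function names and the sample lines (copied from the log above) are only for illustration; the sketch assumes the timestamp layout shown in this log and ignores the time zone field, so a gap that spans a PDT/PST change would be off by an hour.

```python
from datetime import datetime

# A few lines copied from the apcupsd.events excerpt above.
SAMPLE = """\
Sun Sep 02 21:46:55 PDT 2007 Power failure.
Sun Sep 02 21:47:01 PDT 2007 Running on UPS batteries.
Sun Sep 02 21:47:13 PDT 2007 Mains returned. No longer on UPS batteries.
Sun Sep 02 21:47:13 PDT 2007 Power is back. UPS running on mains.
Sun Sep 09 10:41:36 PDT 2007 Power failure.
Sun Sep 09 10:41:38 PDT 2007 Power is back. UPS running on mains.
Mon Sep 10 07:25:54 PDT 2007 Power failure.
Mon Sep 10 07:25:58 PDT 2007 Power is back. UPS running on mains.
"""


def parse_line(line):
    """Return (timestamp, message), or None for lines that are too short."""
    parts = line.split()
    # Expected fields: weekday, month, day, HH:MM:SS, zone, year, message...
    if len(parts) < 7:
        return None
    stamp = " ".join([parts[1], parts[2], parts[3], parts[5]])
    when = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
    return when, " ".join(parts[6:])


def outage_durations(text):
    """List (start_time, seconds) for each failure/recovery pair."""
    durations = []
    started = None
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, message = parsed
        if message.startswith("Power failure"):
            started = when
        elif message.startswith("Power is back") and started is not None:
            durations.append((started, (when - started).total_seconds()))
            started = None
    return durations


if __name__ == "__main__":
    for start, seconds in outage_durations(SAMPLE):
        print(start.isoformat(sep=" "), f"{seconds:.0f} s")
```

On the sample lines this gives gaps of 18, 2 and 4 seconds, which matches reading the timestamps by eye.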

If I had to play the odds, I'd say the capacitors in either the
PSU or motherboard are failing, but you didn't clarify what
"testing" means: whether you took the questionable module and
tried it in another system (and, if an older system, one known
to have clean memory contacts and capacitors that have not
aged to the point of degraded performance).

According to my computer hardware customization logs, I did have to
replace my Debian box's PSU on 5/14/2007 with a new Fortron FSP650-80GLC PSU
(650 watts) because its old Antec PSU stopped working (computer no longer
booted up). I don't know if that could have damaged/killed my RAM. If it
is related, it took this long to cause the symptoms.

We also tried different memory slots. We did not try another computer
since there wasn't one available. My friend also tried underclocking the
speed and it still crashed. The memory module was already sent out for
RMA, so we couldn't test it on another machine even if there were one.

Sadly, nothing is made to last forever. Some failures are
sooner than others.

True, but I wasn't expecting a component with no moving parts to die that
fast. I can understand moving parts like drives. My friend and I thought it
was a dead motherboard too. We saw nothing fried, no weirdly shaped
capacitors, no discolorations, no odd odors, etc.
 
Did you try cleaning the motherboard's lithium battery contacts?
Did you try replacing the lithium battery?

Yep, and I even tried booting up the computer without the battery!

What tests were these? Did you try just cleaning the contacts on
the memory sticks and the motherboard connectors with contact
cleaner and then reseating the memory sticks?

Yep. Even tried other memory slots (there are three). We didn't have
another machine to try the bad memory module though, but it is too late
now because it was sent back to Kingston to be RMA'ed.
 
I don't know if this could have stressed my RAM and machine. I do
use AMD's Cool'n'Quiet feature to keep my machine cooler and use less
power when it is not being used intensely. Here's an example when idle:

$ sensors -f
k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp: +80.6°F

w83697hf-isa-0290
Adapter: ISA adapter
VCore: +1.12 V (min = +2.11 V, max = +2.48 V)
+3.3V: +3.28 V (min = +1.46 V, max = +0.21 V)
+5V: +4.97 V (min = +0.89 V, max = +0.22 V)
+12V: +11.43 V (min = +0.24 V, max = +9.67 V)
-12V: +0.72 V (min = -5.37 V, max = +0.63 V)
-5V: +5.10 V (min = +1.18 V, max = -7.66 V)
V5SB: +5.59 V (min = +0.13 V, max = +0.48 V)
VBat: +1.55 V (min = +2.24 V, max = +3.46 V)
fan1: 2360 RPM (min = 1622 RPM, div = 4)
fan2: 2376 RPM (min = 1985 RPM, div = 4)
temp1: +82.4°F (high = +201.2°F, hyst = +221.0°F) sensor = thermistor
temp2: +86.0°F (high = +176.0°F, hyst = +167.0°F) sensor = thermistor
beep_enable:enabled
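
For readers who think in Celsius, here is a small Python sketch that pulls the current temperature readings out of `sensors -f` style lines and converts them. The regular expression and function names are illustrative assumptions that fit the output pasted above; other chips or lm-sensors versions may format their lines differently.

```python
import re

# Temperature lines taken from the sensors output above
# (wrapped lines joined back together).
SAMPLE = """\
Core0 Temp: +80.6°F
temp1: +82.4°F (high = +201.2°F, hyst = +221.0°F) sensor = thermistor
temp2: +86.0°F (high = +176.0°F, hyst = +167.0°F) sensor = thermistor
"""

# Label, colon, then the first signed number followed by °F on the line.
READING = re.compile(r"\s*([^:]+):\s*([+-]?\d+(?:\.\d+)?)°F")


def fahrenheit_to_celsius(deg_f):
    return (deg_f - 32.0) * 5.0 / 9.0


def current_temps_celsius(text):
    """Map each sensor label to its current reading in Celsius (0.1 precision)."""
    temps = {}
    for line in text.splitlines():
        match = READING.match(line)
        if match:
            label = match.group(1).strip()
            temps[label] = round(fahrenheit_to_celsius(float(match.group(2))), 1)
    return temps


if __name__ == "__main__":
    for label, deg_c in current_temps_celsius(SAMPLE).items():
        print(f"{label}: {deg_c} C")
```

The three idle readings work out to roughly 27, 28 and 30 degrees Celsius.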


I never had problems using the Cool'n'Quiet option. I did try disabling it
(hence the clearing of CMOS) while diagnosing and testing my machine, before
discovering the bad memory module.


 
What's ESD?

Electrostatic discharge. It would typically happen prior to
installing the memory or while installing it.

Power surge. Hmm. Wouldn't my APC Back-UPS XS BX1500
prevented those small ones?

In an ideal world, yes. In the real world, parts are built to
a budget and limited by their reaction time and impedance
to earth ground. We'd need some reason to suspect a surge was
the cause before spending more time on this would be fruitful:
memory generally isn't as likely to be damaged by a surge as
other parts, mainly those through which the surge entered the
system, and you report no other failures.

I know these consumer based ones can't
handle big ones. My other more powerful machine and 19" LCD monitor are
on this UPS too. Here are the only power hiccups my Debian box recorded
recently via USB (only posted the power issue errors):

# cat /var/log/apcupsd.events [snip]

Either your power is flaky or you have set too high a
threshold in your APC software. (At least, IIRC, some of the
APC software I've seen allows the user to set the threshold
at which it goes from line to backup power). If you have an
APFC (active PFC) PSU (no 110/220V switch on it), it is quite tolerant of
changing line voltage and you can set your UPS threshold
very broad.


According to my computer hardware customization logs, I did have to
change my Debian's PSU on 5/14/2007 with a new Fortron FSP650-80GLC PSU
(650 watts) due to its old Antec PSU stopped working (computer no longer
booted up). I don't know if that was related to damage/kill my RAM. It
took this long to cause the symptoms if it is related.

It can't be directly assumed. Antecs are not only popular,
but certain models are known to fail prematurely due to bad
and/or marginal capacitors. In those cases where they did
fail, though, I don't ever recall it damaging the memory. Modern
memory is behind at least one additional regulation stage on the
motherboard.


We also tried different memory slots. We did not try another computer
since there wasn't one available. My friend also tried underclocking
speed and that still crashed. The memory module was already sent out for
RMA, so we can't test it on another machine if there was one.



True, but I wasn't expecting a component with no moving parts to die that
fast. I can understand moving parts like drives. My friend and I thought it
was a dead motherboard too. We saw nothing fried, no weirdly shaped
capacitors, no discolorations, no odd odors, etc.

There's really not much point in being too concerned about a
single part failure. Random failures are just that, but if
you have more memory failing in the same board in the
future, I would think about retiring the motherboard.
 
I don't know if this is related to stress my RAM and machine out. I do
use AMD's Cool'n'Quiet feature to keep my machine cooler and not use
power when not used intensely. Here's an example when idled:

$ sensors -f [snip]


I never had problems using Cool'n'Quiet option. I did try disabling it
(hence clearing CMOS) when diagnosing and testing my machine and then
discovering the bad memory module.

Cool'n'Quiet won't stress your RAM. The only thing you can
control that would is manually increasing the
memory voltage by a substantial amount.
 
What's ESD?
Electrostatic discharge, would typically happen prior to
installing the memory or while installing.

Ah, if that was the case, then it would probably have happened at the end of
2006. Or maybe in May 2007 when the PSU was replaced (though I doubt the
memory was even touched), or by something else?

In an ideal world yes, in the real world parts are built to
a budget and restrained by their reaction time and impedance
to earth ground. We'd have to suspect it was the cause to
make more time spent on this fruitful as memory generally
isn't as likely damaged by a surge as other parts, mainly at
least those through which the surge entered the system and
you report no other failures.
Hm.
Either your power is flaky or you have set too high a

Maybe. I do sometimes notice my lamp blink when this happens, if it is
on. Probably a power issue. I live in an old house (built in the late
70s/early 80s). Power comes in underground. I have had this UPS since
9/25/2005 according to logs on
http://alpha.zimage.com/~ant/antfarm/about/toys.html ... Before it, I
had an APC Back-UPS 650VA (BK650MC) that had to be replaced because it
kept shutting down even when there was AC power!

threshold in your APC software. (At least, IIRC, some of the
APC software I've seen allows the user to set the threshold
at which it goes from line to backup power). If you have an
APFC PSU (no 110/220V switch on it), it is quite tolerant of
changing line voltage and you can set your UPS threshold
very broad.

Hmm, I am not sure how to set this. I used the default configuration
from the apcupsd package in Debian. I didn't set my OS to shut down the
computer or anything when the battery is low. Maybe my current UPS
statistics will help to see what's funky?

$ /sbin/apcaccess
APC : 001,037,0895
DATE : Wed Jan 30 17:43:16 PST 2008
HOSTNAME : FooBar
RELEASE : 3.14.2
VERSION : 3.14.2 (15 September 2007) debian
UPSNAME : ANTian
CABLE : USB Cable
MODEL : Back-UPS RS 1500
UPSMODE : Stand Alone
STARTTIME: Sat Jan 26 23:02:42 PST 2008
STATUS : ONLINE
LINEV : 117.0 Volts
LOADPCT : 11.0 Percent Load Capacity
BCHARGE : 100.0 Percent
TIMELEFT : 70.9 Minutes
MBATTCHG : 5 Percent
MINTIMEL : 3 Minutes
MAXTIME : 0 Seconds
SENSE : High
LOTRANS : 097.0 Volts
HITRANS : 138.0 Volts
ALARMDEL : Always
BATTV : 26.9 Volts
LASTXFER : Low line voltage
NUMXFERS : 0
TONBATT : 0 seconds
CUMONBATT: 0 seconds
XOFFBATT : N/A
SELFTEST : NO
STATFLAG : 0x07000008 Status Flag
MANDATE : 2005-03-16
SERIALNO : QB0512132444
BATTDATE : 2001-09-25
NOMINV : 120
NOMBATTV : 24.0
FIRMWARE : 8.g8 .D USB FW:g8
APCMODEL : Back-UPS RS 1500
END APC : Wed Jan 30 17:44:11 PST 2008
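
As a rough way to read the status dump above, here is a minimal Python sketch that turns `apcaccess` style "KEY : value" lines into a dictionary and shows how far the reported line voltage sits from the two transfer thresholds. The helper names are made up for illustration, and the sketch only assumes the field layout visible in this particular output.

```python
# A subset of the apcaccess fields shown above.
SAMPLE = """\
STATUS : ONLINE
LINEV : 117.0 Volts
LOADPCT : 11.0 Percent Load Capacity
SENSE : High
LOTRANS : 097.0 Volts
HITRANS : 138.0 Volts
LASTXFER : Low line voltage
"""


def parse_status(text):
    """Split each 'KEY : value' line on the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def volts(value):
    """Take the leading number from a value such as '117.0 Volts'."""
    return float(value.split()[0])


def line_voltage_margins(fields):
    """Return (volts above LOTRANS, volts below HITRANS) for the current LINEV."""
    line = volts(fields["LINEV"])
    return line - volts(fields["LOTRANS"]), volts(fields["HITRANS"]) - line


if __name__ == "__main__":
    status = parse_status(SAMPLE)
    low_margin, high_margin = line_voltage_margins(status)
    print(f"{low_margin:.1f} V above LOTRANS, {high_margin:.1f} V below HITRANS")
    print("Last transfer reason:", status["LASTXFER"])
```

With the values above, the line voltage at that moment was 20 V above the low transfer point and 21 V below the high one. That is a single snapshot, so it says nothing about how deep the brief dips were.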


It can't be directly assumed. Antecs are not only popular,
but certain models are known to fail prematurely due to bad
and/or marginal capacitors. In those cases where they did
fail, though, I don't ever recall it damaging the memory. Modern
memory is behind at least one additional regulation stage on the
motherboard.

Hmm. OK.

There's really not much point in being too concerned about a
single part failure. Random failures are just that, but if
you have more memory failing in the same board in the
future, I would think about retiring the motherboard.

OK. Will note. I hope Kingston doesn't say the RMA'ed RAM is fine and
sends it back. Again, didn't have another machine to test it on.
 
Ant said:
[snip]
What's ESD? Power surge. Hmm. Wouldn't my APC Back-UPS XS BX1500 have prevented
those small ones? I know these consumer-grade ones can't handle big ones.
[snip]

I have a friend whose job is to sell and install pro UPSes. He told me that
while they are better than consumer units, they still fail to stop all
surges!
 
I know these consumer based ones can't
handle big ones. My other more powerful machine and 19" LCD monitor are
on this UPS too. Here are the only power hiccups my Debian box recorded
recently via USB (only posted the power issue errors):

# cat /var/log/apcupsd.events [snip]
Thu Nov 15 06:25:57 PST 2007 Power failure.
Thu Nov 15 06:26:00 PST 2007 Power is back. UPS running on mains.
Sun Dec 30 09:16:56 PST 2007 Power failure.
Sun Dec 30 09:16:58 PST 2007 Power is back. UPS running on mains.
Tue Jan 01 08:28:12 PST 2008 Power failure.
Tue Jan 01 08:28:14 PST 2008 Power is back. UPS running on mains.
Sat Jan 05 07:41:30 PST 2008 Power failure.
Sat Jan 05 07:41:33 PST 2008 Power is back. UPS running on mains.
Sun Jan 06 17:03:28 PST 2008 Power failure.
Sun Jan 06 17:03:31 PST 2008 Power is back. UPS running on mains.
Either your power is flaky or you have set too high a

Maybe. I do notice sometimes my lamp blink when this happens if it is
on. Probably a power issue. I live in an old house (built in the late
70s/early 80s). Power is done underground. I have had this UPS since
9/25/2005 according to logs on
http://alpha.zimage.com/~ant/antfarm/about/toys.html ... Before it, I
had an APC Back-UPS 650VA (BK650MC) that had to be replaced because it
kept shutting down even if there was AC power! That is when I replaced
it.

I wouldn't worry about it too much - it is perfectly normal and if
it wasn't for the existence of the logs I doubt you would even
notice. The mains power supply isn't as perfectly regulated as
many would have you believe - these are probably merely dips rather
than outages - FWIW my UPS kicks in briefly an average of every 3
days or so. In any case these messages are indicating that your
UPS is doing its job. Reading up on your UPS I see it is a line
interactive model so there will be some changeover time but the
ATX specification mandates that PSUs must be able to bridge at
least 17ms without power (oddly enough this is still the case at
50Hz). If you have a decent PSU as well as your UPS there shouldn't
be any problem.
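
To put the 17 ms figure in perspective, here is a small Python sketch comparing it with the length of one mains cycle at 50 Hz and 60 Hz. The 17 ms value is the one quoted above. The 10 ms transfer time is only an assumed placeholder for a line-interactive UPS, not a number taken from this thread or from any datasheet.

```python
HOLD_UP_MS = 17.0          # PSU hold-up time quoted in the post above
ASSUMED_TRANSFER_MS = 10.0  # hypothetical UPS changeover time, for illustration only


def cycle_ms(mains_hz):
    """Duration of one full AC cycle in milliseconds."""
    return 1000.0 / mains_hz


def cycles_bridged(hold_up_ms, mains_hz):
    """How many AC cycles the PSU can ride through on its own."""
    return hold_up_ms / cycle_ms(mains_hz)


def psu_covers_transfer(transfer_ms, hold_up_ms=HOLD_UP_MS):
    """True if the PSU hold-up time is at least as long as the UPS changeover."""
    return transfer_ms <= hold_up_ms


for hz in (50, 60):
    print(
        f"{hz} Hz: one cycle = {cycle_ms(hz):.1f} ms, "
        f"{HOLD_UP_MS:.0f} ms covers {cycles_bridged(HOLD_UP_MS, hz):.2f} cycles"
    )

print("PSU bridges the assumed changeover:", psu_covers_transfer(ASSUMED_TRANSFER_MS))
```

At 60 Hz the hold-up time is slightly more than one full cycle, while at 50 Hz it is less than one, which is presumably why the same figure at 50 Hz looks odd.
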
Hmm, I am not sure how to set this. I used the default configuration
from the apcupsd package in Debian. I didn't set my OS to shut down the
computer or anything when the battery is low. Maybe my current UPS
statistics will help show what's funky?
LOTRANS : 097.0 Volts
HITRANS : 138.0 Volts

Those are the ones: I'm not sure about how Debian arranges its disk
layout but the file you need to edit is apcupsd.conf which is
_usually_ in /etc/. You don't have the opportunity to set exact
voltages, just select from three or four predefined settings with
a simple number. You should consult your UPS documentation and the
apcupsd man page for more details. kill -HUP the running apcupsd
to get it to pick up the changes in the configuration.

I'd agree with Kony here. While it is certainly not impossible,
I consider it highly unlikely that any power event would damage your
memory and only your memory.
 
On 1/31/2008 6:46 AM PT, Andrew Smallshaw typed:

[snipped]
Those are the ones: I'm not sure about how Debian arranges its disk
layout but the file you need to edit is apcupsd.conf which is
_usually_ in /etc/. You don't have the opportunity to set exact
voltages, just select from three or four predefined settings with
a simple number. You should consult your UPS documentation and the
apcupsd man page for more details. kill -HUP the running apcupsd
to get it to pick up the changes in the configuration.

# cat /etc/apcupsd/apcupsd.conf
## apcupsd.conf v1.1 ##
#
# for apcupsd release 3.10.18 (21 July 2005) - debian
#
# "apcupsd" POSIX config file
#
# ========= General configuration parameters ============
#
# UPSNAME xxx
# Use this to give your UPS a name in log files and such. This
# is particulary useful if you have multiple UPSes. This does not
# set the EEPROM.
#UPSNAME
#
# UPSCABLE [ simple | smart | ether | usb |
# 940-0119A | 940-0127A | 940-0128A | 940-0020B |
# 940-0020C | 940-0023A | 940-0024B | 940-0024C |
# 940-1524C | 940-0024G | 940-0095A | 940-0095B |
# 940-0095C | M-04-02-2000 ]
#
# defines the type of cable that you have.
UPSCABLE usb
#
# Old types, still valid, are mapped to the new drivers
#
# keyword driver used
# UPSTYPE [ backups dumb
# | sharebasic dumb
# | netups dumb
# | backupspro apcsmart
# | smartvsups apcsmart
# | newbackupspro apcsmart
# | backupspropnp apcsmart
# | smartups apcsmart
# | matrixups apcsmart
# | sharesmart apcsmart
#
# *** New driver names. They can be used directly
# rather than using one of the above aliases.
#
# UPSTYPE [ dumb | apcsmart | net | usb | snmp | test]
#
# defines the type of UPS you have.
UPSTYPE usb
#
#
#DEVICE <string> /dev/<serial port>
# name of your UPS device
#
# Here a table of the possible devices related with the UPS drivers.
#
# Driver Device Description
# dumb /dev/tty** Serial character device
# apcsmart /dev/tty** Serial character device
# usb <BLANK> A blank DEVICE setting enables
# autodetection, best choice for most
# installations.
# net hostname:port Network link to a master apcupsd
# through NIS
# snmp hostname:port:vendor:community
# SNMP Network link to an SNMP-enabled
# UPS device. Vendor is the MIB used by
# the UPS device: can be "APC" or "RFC"
# where APC is the powernet MIB and RFC
# is the IETF's rfc1628 UPS-MIB.
# Port is usually 161.
#DEVICE /dev/ttyS0
#
#LOCKFILE <path to lockfile>
# path for serial port lock file
LOCKFILE /var/lock
#
#
#
# ======== configuration parameters used during power failures ==========
#
# The ONBATTERYDELAY is the time in seconds from when a power failure
# is detected until we react to it with an onbattery event.
#
# This means that, apccontrol will be called with the powerout argument
# immediately when a power failure is detected. However, the
# onbattery argument is passed to apccontrol only after the
# ONBATTERYDELAY time. If you don't want to be annoyed by short
# powerfailures, make sure that apccontrol powerout does nothing
# i.e. comment out the wall.
ONBATTERYDELAY 6
#
# Note: BATTERYLEVEL, MINUTES, and TIMEOUT work in conjunction, so
# the first that occurs will cause the initation of a shutdown.
#
# If during a power failure, the remaining battery percentage
# (as reported by the UPS) is below or equal to BATTERYLEVEL,
# apcupsd will initiate a system shutdown.
BATTERYLEVEL 5
#
#
# If during a power failure, the remaining runtime in minutes
# (as calculated internally by the UPS) is below or equal to MINUTES,
# apcupsd, will initiate a system shutdown.
MINUTES 3
#
#
# If during a power failure, the UPS has run on batteries for TIMEOUT
# many seconds or longer, apcupsd will initiate a system shutdown.
# A value of 0 disables this timer.
#
# Note, if you have a Smart UPS, you will most likely want to disable
# this timer by setting it to zero. That way, you UPS will continue
# on batteries until either the % charge remaing drops to or below BATTERYLEVEL,
# or the remaining battery runtime drops to or below MINUTES. Of course,
# if you are testing, setting this to 60 causes a quick system shutdown
# if you pull the power plug.
# If you have an older dumb UPS, you will want to set this to less than
# the time you know you can run on batteries.
TIMEOUT 0
#
#
# Time in seconds between annoying users to signoff prior to
# system shutdown. 0 disables.
ANNOY 300
#
# Initial delay after power failure before warning users to get
# off the system.
ANNOYDELAY 60
#
# The condition which determines when users are prevented from
# logging in during a power failure.
# NOLOGON <string> [ disable | timeout | percent | minutes | always ]
NOLOGON disable
#
#
# If killdelay is set, apcupsd will continue running after a
# shutdown has been requested, and after the specified time in
# seconds attempt to kill the power. This is for use on systems
# where apcupsd cannot regain control after a shutdown.
# KILLDELAY <seconds> 0 disables
KILLDELAY 0
#
#
# ==== Configuration statements the network information server =========
#
# NETSERVER [ on | off ] on enables, off disables the network
# information server. If netstatus is on, a network information
# server process will be started for serving the STATUS and
# EVENT data over the network (used by CGI programs).
NETSERVER on
#
# NISIP <dotted notation ip address>
# IP address on which NIS server will listen for incoming connections.
# Default value is 0.0.0.0 that means any incoming request will be
# serviced but if you want it to listen to a single subnet you can
# set it up to that subnet address, for example 192.168.10.0
# Additionally you can listen for a single IP like 192.168.10.1
NISIP 0.0.0.0
#
# NISPORT <port> default is 3551 as registered with the IANA
# port to use for sending STATUS and EVENTS data over the network.
# It is not used unless NETSERVER is on. If you change this port,
# you will need to change the corresponding value in the cgi directory
# and rebuild the cgi programs.
NISPORT 3551
#
# If you want the last few EVENTS to be available over the network
# by the network information server, you must define an EVENTSFILE.
EVENTSFILE /var/log/apcupsd.events
#
# EVENTSFILEMAX <kilobytes>
# By default, the size of the EVENTSFILE will be not be allowed to exceed
# 10 kilobytes. When the file grows beyond this limit, older EVENTS will
# be removed from the beginning of the file (first in first out). The
# parameter EVENTSFILEMAX can be set to a different kilobyte value, or set
# to zero to allow the EVENTSFILE to grow without limit.
EVENTSFILEMAX 10
#
# ========== Configuration statements used if sharing =============
# a UPS and controlling it via the network
#
# The configuration statements below are used if you
# want to share one UPS to power multiple machines and have them
# communicate by the network. Obviously, the master is connected
# to the UPS via the serial cable, and it communicates to the
# "slaves" via the network -- i.e. the slaves get their info
# concerning the UPS via the ethernet.
#
# UPSCLASS [ standalone | shareslave | sharemaster | netslave | netmaster ]
# normally standalone unless you share a UPS with multiple machines.
UPSCLASS standalone
#
# Unless you want to share the UPS (power multiple machines).
# this should be disable
# UPSMODE [ disable | share | net | sharenet ]
UPSMODE disable
#
# NETTIME <int>
#NETTIME 100
#
# NETPORT <int>
#NETPORT 6544
#
# MASTER <machine-name>
#MASTER
#
# SLAVE <machine-name>
#SLAVE slave1
#SLAVE slave2
#
# USERMAGIC <string>
#USERMAGIC
#
#
#
#
# ===== Configuration statements to control apcupsd system logging ========
#
# Time interval in seconds between writing the STATUS file; 0 disables
STATTIME 0
#
# Location of STATUS file (written to only if STATTIME is non-zero)
STATFILE /var/log/apcupsd.status
#
#
# LOGSTATS [ on | off ] on enables, off disables
# Note! This generates a lot of output, so if
# you turn this on, be sure that the
# file defined in syslog.conf for LOG_NOTICE is a named pipe.
# You probably do not want this on.
LOGSTATS off
#
#
# Time interval in seconds between writing the DATA records to
# the log file. 0 disables.
DATATIME 0
#
# FACILITY defines the logging facility (class) for logging to syslog.
# If not specified, it defaults to "daemon". This is useful
# if you want to separate the data logged by apcupsd from other
# programs.
#FACILITY DAEMON
#
#
#
#
# ========== Configuration statements used in updating the UPS EPROM =========
#
# UPS name, max 8 characters -- used only during -n or --rename-ups
#UPSNAME UPS_IDEN
#
# Battery date - 8 characters -- used only during -u or --update-battery-date
#BATTDATE mm/dd/yy
#
# The following items are set during -c or --configure
#
# Sensitivity to line voltage quality (H cause faster transfer to batteries)
# SENSITIVITY H M L (default = H)
#SENSITIVITY H
#
# UPS delay after power return (seconds)
# WAKEUP 000 060 180 300 (default = 0)
#WAKEUP 60
#
# UPS Grace period after request to power off (seconds)
# SLEEP 020 180 300 600 (default = 20)
#SLEEP 180
#
#
# Low line voltage causing transfer to batteries
# The permitted values depend on your model as defined by last letter
# of FIRMWARE or APCMODEL. Some representative values are:
# D 106 103 100 097
# M 177 172 168 182
# A 092 090 088 086
# I 208 204 200 196 (default = 0 => not valid)
#LOTRANSFER 208
#
# High line voltage causing transfer to batteries
# The permitted values depend on your model as defined by last letter
# of FIRMWARE or APCMODEL. Some representative values are:
# D 127 130 133 136
# M 229 234 239 224
# A 108 110 112 114
# I 253 257 261 265 (default = 0 => not valid)
#HITRANSFER 253
#
# Battery change needed to restore power
# RETURNCHARGE 00 15 50 90 (default = 15)
#RETURNCHARGE 15
#
# Alarm delay
# 0 = zero delay after pwr fail, T = power fail + 30 sec, L = low battery, N = never
# BEEPSTATE 0 T L N (default = 0)
#BEEPSTATE T
#
# Low battery warning delay in minutes
# LOWBATT 02 05 07 10 (default = 02)
#LOWBATT 2
#
# UPS Output voltage when running on batteries
# The permitted values depend on your model as defined by last letter
# of FIRMWARE or APCMODEL. Some representative values are:
# D 115
# M 208
# A 100
# I 230 240 220 225 (default = 0 => not valid)
#OUTPUTVOLTS 230
#
# Self test interval in hours 336=2 weeks, 168=1 week, ON=at power on
# SELFTEST 336 168 ON OFF (default = 336)
#SELFTEST 336
#
#
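
With a file this long it is easy to lose track of which directives are actually active. Below is a minimal Python sketch that lists only the uncommented directives. It assumes the simple "KEYWORD value" layout seen in the file above and treats any line starting with # as a comment; the function name and the shortened sample text are just for illustration.

```python
def active_directives(conf_text):
    """Return {directive: value} for every non-comment, non-blank line."""
    directives = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        directives[key] = value
    return directives


# A shortened excerpt of the configuration pasted above.
SAMPLE = """\
UPSCABLE usb
UPSTYPE usb
#DEVICE /dev/ttyS0
ONBATTERYDELAY 6
BATTERYLEVEL 5
MINUTES 3
TIMEOUT 0
#LOTRANSFER 208
#HITRANSFER 253
"""

for name, value in active_directives(SAMPLE).items():
    print(f"{name} = {value}")
```

Run against the full file, this would show that LOTRANSFER and HITRANSFER are still commented out, so whatever transfer voltages the UPS reports did not come from this file.
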

So, do I remove the # and change the values on these two lines?
#LOTRANSFER 208
#HITRANSFER 253

If so, what values do I put in? According to the apcaccess command, I have:
FIRMWARE : 8.g8 .D USB FW:g8
APCMODEL : Back-UPS RS 1500
The last letter of my firmware is "g" and of my model is "S". Neither
matches the letters (D, M, A, and I) shown in the .conf comments. :(
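
One way to narrow this down is to check which row of the tables in the .conf comments contains the values the UPS itself reports. The Python sketch below does that lookup using the LOTRANS and HITRANS lines quoted earlier in the thread. The tables are copied from the comments above, which describe them only as "representative values", so a missing match proves nothing. Reading the ".D" in the FIRMWARE string as the model letter is my assumption, not something the thread confirms.

```python
# Representative values copied from the apcupsd.conf comments above.
LOW_TRANSFER = {
    "D": (106, 103, 100, 97),
    "M": (177, 172, 168, 182),
    "A": (92, 90, 88, 86),
    "I": (208, 204, 200, 196),
}
HIGH_TRANSFER = {
    "D": (127, 130, 133, 136),
    "M": (229, 234, 239, 224),
    "A": (108, 110, 112, 114),
    "I": (253, 257, 261, 265),
}


def parse_volts(status_line):
    """Turn a status line such as 'LOTRANS : 097.0 Volts' into a float."""
    _, _, value = status_line.partition(":")
    return float(value.split()[0])


def matching_letters(table, volts):
    """Return the model letters whose row contains the given voltage."""
    return [letter for letter, values in table.items() if volts in values]


low = parse_volts("LOTRANS : 097.0 Volts")
high = parse_volts("HITRANS : 138.0 Volts")

print("LOTRANS", low, "matches rows:", matching_letters(LOW_TRANSFER, low))
print("HITRANS", high, "matches rows:", matching_letters(HIGH_TRANSFER, high))
```

The low-transfer value appears only in the D row, which would fit the ".D" in the firmware string. The high-transfer value is not in any row, so the UPS documentation is still the place to check what is permitted for this model.
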

I'd agree with Kony here. While it is certainly not impossible,
I consider it highly unlikely that any power event would damage your
memory and only your memory.

OK. :)
--
"You're kissing an ant hill." --Mike Nelson