A7N8X Motherboard Low Temperature Sensitivity, CMOS Checksum Error

Thread starter: kony

kony

Followups-To alt.comp.hardware

A7N8X Motherboard Low Temperature Sensitivity, CMOS Checksum Error



SHORT VERSION:

If room ambient temp drops below 25C, the system is so
unstable it can't even complete a POST. If the system is
running when room temp drops, it starts to act erratically
(odd Explorer pauses and Prime95 errors), and powering it
off then back on a dozen seconds later fails, producing only
"CMOS Checksum Error" and an automatic boot to floppy,
running awdflash.

Once ambient temp rises to about 32C, the system always
POSTs and runs fine. Between these temps, failures to get
beyond "CMOS Checksum Error" and errors in Windows become
more frequent as the temp drops. Multiple troubleshooting
attempts have been made (clear CMOS, BIOS flash, swap
hardware, remount board in case, strip down system, etc.),
and the problem appears isolated to the motherboard itself.
Different BIOS versions, BIOS defaults, etc., have been
tried. The system is not overclocked.

What are the potential cause(s) and the best methods to
check these?




LONG VERSION:


A7N8X Deluxe Motherboard rev 2.00
Socket A, nForce2-400
Current BIOS 1008
Athlon XP2400
512MB Kingston PC3200
Fortron 400W Power Supply
ATI AIW 128 Pro
1 HDD, CDROM, Floppy, (typical non-power-user PC)

Approximately 1 year old, the system remained unchanged for
that period and had (guessing) about 800 hours of on-time.
It was not used very much and, AFAIK, for nothing
demanding; it had an easy life so far. All settings were
conservative default values, with no overclocking and
minimal BIOS changes.

The board appears to be very sensitive to temperature, not
to heat but to cold.
If the case thermometer (not the motherboard's integrated
temp sensor but a separate digital probe) reads below
approx. 27C, then powering on the system from the soft-off
state results in it displaying the following message:

"CMOS Checksum Error"

The system then proceeds to do an Award BIOS recovery,
booting to awdflash if the appropriate floppy is already in
the drive. There is no option to do anything else: BIOS
setup is not accessible, and the attempt to boot the floppy
is automatic, not a normal "boot to floppy" event as would
occur with any normally working system that has a boot
floppy in it. Clearing CMOS and/or loading setup defaults
does not resolve this.
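As background on the message itself: the BIOS keeps a checksum over the setup bytes in battery-backed CMOS RAM and compares it at each POST, so a single corrupted or misread byte in the covered range is enough to trigger "CMOS Checksum Error" even when the saved settings were fine. The sketch below assumes the classic IBM PC/AT layout (a 16-bit sum of bytes 0x10-0x2D, stored high byte first at 0x2E/0x2F); the Award BIOS on this board may cover a different range, so treat the offsets and helper names as illustrative assumptions.

```python
# Sketch of a PC/AT-style CMOS checksum check.
# ASSUMPTION: classic AT layout; the range and offsets used by the
# A7N8X's Award BIOS may differ.
CHECKSUM_START = 0x10  # first byte covered by the checksum
CHECKSUM_END = 0x2D    # last byte covered (inclusive)
CHECKSUM_HI = 0x2E     # stored checksum, high byte
CHECKSUM_LO = 0x2F     # stored checksum, low byte


def compute_checksum(cmos: bytes) -> int:
    """Return the 16-bit sum of the covered CMOS bytes."""
    return sum(cmos[CHECKSUM_START:CHECKSUM_END + 1]) & 0xFFFF


def stored_checksum(cmos: bytes) -> int:
    """Return the checksum value saved in CMOS (high byte first)."""
    return (cmos[CHECKSUM_HI] << 8) | cmos[CHECKSUM_LO]


def checksum_ok(cmos: bytes) -> bool:
    """True if the saved checksum matches the recomputed one."""
    return compute_checksum(cmos) == stored_checksum(cmos)


def write_checksum(cmos: bytearray) -> None:
    """Recompute and store the checksum, as saving setup would."""
    value = compute_checksum(cmos)
    cmos[CHECKSUM_HI] = value >> 8
    cmos[CHECKSUM_LO] = value & 0xFF
```

Flipping any single bit in the covered range after `write_checksum` makes `checksum_ok` return False, which is the condition the POST reports.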

If the ambient temp is right at 25C or slightly higher,
repeatedly powering off and then on will get the system to
POST with setup defaults for FSB & multiplier at a rate of
roughly 1 in 5 tries, more often as room temp rises and less
as it falls. Even if the system is manually set to the same
speeds or a very low speed (6X CPU multiplier and 100MHz FSB
and memory), after saving these changes (or not) it takes
several tries to get the system to POST again, displaying
"CMOS Checksum Error" each time. Unplugging the power
supply from AC had no effect, nor did clearing CMOS.
I've not pinned down the EXACT temp, since it gets
progressively worse and the range is fairly tight, but
roughly 25C and 32C are the fail and pass thresholds;
within a 10C rise it certainly goes from unable to POST to
working fine. This was later deliberately reproduced by
cranking up an air conditioner, so it is clearly
low-temperature related, but first, the steps that led to
this conclusion...
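One way to pin down the fail/pass thresholds more precisely than "roughly 25C and 32C" is to log every power-on attempt with the case-probe reading and tabulate the success rate per temperature band. A minimal sketch; the record format, the function name, and the 1C band width are hypothetical choices, not part of the original troubleshooting.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_rate_by_band(
    attempts: Iterable[Tuple[float, bool]], band_c: float = 1.0
) -> Dict[float, Tuple[int, int, float]]:
    """Group (temp_C, posted_ok) records into temperature bands.

    HYPOTHETICAL record format: one tuple per power-on attempt.
    Returns {band_lower_edge_C: (successes, attempts, rate)},
    ordered from coldest to warmest band.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [successes, total]
    for temp_c, posted_ok in attempts:
        band = (temp_c // band_c) * band_c  # lower edge of the band
        counts[band][1] += 1
        if posted_ok:
            counts[band][0] += 1
    return {
        band: (ok, total, ok / total)
        for band, (ok, total) in sorted(counts.items())
    }
```

The lowest band whose rate reaches 1.0 and stays there is a reasonable working estimate of the "always POSTs" threshold, given enough attempts per band.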

It does sometimes POST after saving the changes, but if it
is then powered off it may not POST the next time; it still
seems marginal regardless of the BIOS settings, since even
loading setup defaults and clearing CMOS didn't change the
roughly 1-in-5 success rate. Every time it fails, the video
does come up and it does attempt to boot to floppy; it
never just acts dead and always has a video display. Every
common troubleshooting procedure I could think of was tried,
to no avail. There were certainly more than are mentioned
here, but given the length of the post I'll try to list
what seems most relevant.

Initially suspecting BIOS corruption, I flashed the board
with the same BIOS version, which at the time seemed to
work, but it later turned out that the real difference was
a higher ambient temperature than before, because soon
enough the temp dropped and the system again failed to do
anything more than "CMOS Checksum Error". The next time I
had a chance I flashed the next, newest BIOS version, with
no change. I'd been suspecting the often-rumored "nvidia
BIOS corruption" problem, which seems to occur from exiting
the BIOS too quickly when saving settings, but this was not
the case with this board.

Later it was noted that doing NOTHING to the system other
than leaving it to sit until the ambient temp rose would
return it to a 100% stable state. Even overclocked quite a
bit, it passed several different stress tests at 32C room
temp, but once the temp falls again it is still not stable
even at 6 x 100. I could understand it if these were arctic
conditions, but such a drastic change within a span of 10C
seems quite unusual. Indeed, several other systems in the
same room work fine at the same temp. Also notable is that
if the temp is barely high enough to get it to POST and
boot, running a stress test like Prime95 produces errors
within a minute or so, yet with the ambient temp 10C higher
the system not only passes the same Prime95 test for 24
hours, but can even pass it running at 50% higher FSB,
memory clock, and CPU frequency.

Trying to isolate the problem, I changed power supplies 3
times with known/proven good 400W+ units, unplugged
nonessential cards, swapped video and memory, ran in a
minimal configuration and checked every mechanical
connection as well as possible (including pulling,
inspecting, and reinstalling the EEPROM and jumpers), all
with known good/working parts. It seems that the
motherboard itself is simply intolerant of a quite mild
temperature drop. Normally I'd just replace it, but this is
quite puzzling, unique for such a small temp span, and I'd
like to get to the bottom of it. There are no visible
problems with the board: the capacitors look fine and there
is no visible cracking or other physical abnormality, though
I don't have the means to check this with a microscope,
especially since a motherboard is a bit wider than most
'scopes can reach. Since this is a very popular motherboard
and I've not heard of anyone else having this problem (or
perhaps they just didn't isolate the cause as low temp?), I
wonder if this is an isolated flaw, but the closest
examination I could make showed nothing unusual, and it
works fine, never showing this problem (or any other that
I'm aware of), once room temp rises by 5-10C. The system
case is very well ventilated; ambient room temp never causes
the interior air temp to rise much except immediately
adjacent to the heatsink, as expected.

I thought about the battery, but its voltage reads OK and
it shouldn't explain the instability after booting and
running Windows. I don't recall whether I ended up putting
a different battery in it, but I will do so just for the
heck of it.

One possibility I'm wondering about is whether one or more
marginal capacitors are the cause, with their ESR rising as
the temp falls. It might be difficult to check this easily,
though. I'd thought about touching a small light bulb to
each in turn, individually warming them to see if that made
any difference, but that could take quite a long time,
especially if multiple caps are involved, since it could be
necessary to wait until each cooled before trying the next
cap. It also seems difficult to determine their core temp
without getting the outside can quite hot, as any
non-destructive temp reading would be of the outer can. It
seems a rather crude way of warming them, too, but I'm
drawing a blank on how to warm an individual capacitor
without also warming the surrounding area, or at least
minimizing that as much as reasonably possible. I could
instead touch the leads with a soldering iron but would
prefer to leave the solder alone if possible.
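For why a marginal electrolytic could matter at all: the ESR of aluminum electrolytic capacitors generally rises as temperature falls, and the ripple such a capacitor leaves on a regulator output scales roughly with its ESR. A rough first-order relation, assuming the ESR term dominates the capacitor's impedance at the regulator's switching frequency (a sketch of the mechanism, not a measurement from this board), where ΔI_L is the inductor ripple current:

```latex
\Delta V_{\mathrm{ripple}} \approx \Delta I_{L} \cdot \mathrm{ESR}(T)
```

So if cooling pushed a worn capacitor's ESR up by some factor, ripple on that rail would grow by roughly the same factor, which could push an already-marginal rail out of tolerance.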

I suppose I could take the opposite approach and warm up the
board, then use freeze spray on each cap, but that doesn't
seem a very good approach either, since it might easily
(probably would) lower the cap temp too much, introducing
further failures that aren't present at 27C but only at
much colder temps, and again it seems difficult to
thoroughly chill the core of an individual cap without
shifting the surrounding area's temp across this small 10C
thermal margin. Another possibility I'd considered is
temporarily placing a tantalum cap in parallel with as many
suspect caps as possible, since tantalums should be much
more tolerant of low temp (IIRC), but this also seems to be
a lengthy, tedious process that would best be avoided. Does
anyone have a better idea?
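The parallel-capacitor idea above can be sanity-checked with the usual parallel-resistance formula. This sketch treats each capacitor's ESR as a pure resistance and ignores reactance and lead inductance, an assumption that only roughly holds near the frequency where both parts look mostly resistive; the function names are hypothetical.

```python
def parallel_esr(esr_a: float, esr_b: float) -> float:
    """Approximate combined ESR (ohms) of two capacitors in parallel.

    ASSUMPTION: each ESR is modelled as a plain resistor, so this is
    only a rough estimate of the real impedance.
    """
    if esr_a <= 0 or esr_b <= 0:
        raise ValueError("ESR values must be positive")
    return (esr_a * esr_b) / (esr_a + esr_b)


def parallel_capacitance(*caps_f: float) -> float:
    """Capacitances in parallel simply add (farads)."""
    return sum(caps_f)
```

A healthy part in parallel pulls the combined ESR below either capacitor's own value, which is why temporarily bridging a suspect cap could make the fault disappear if that cap is the culprit.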


Even if I don't solve this problem, I wanted to at least get
this bit of info out there: at the very least this one
board is affected by a relatively small temp change, and
given the type of problem I wonder if it's more common. A
CMOS Checksum Error is not all that uncommon, and "some"
occurrences of a Checksum Error might be misdiagnosed...
and some boards that don't even get far enough to post "CMOS
Checksum Error" might do so if they were a little warmer.
 
kony said:
Followups-To alt.comp.hardware

A7N8X Motherboard Low Temperature Sensitivity, CMOS Checksum
Error



SHORT VERSION:

Replace the CMOS battery; that board is well known for having a
marginally acceptable battery installed. This ties in with the
temperature and CMOS checksum errors: a failing battery will seem not to
work at lower temps but be fine when the temp increases.

As for the board being sensitive to cold, I'm afraid that your theory is
wrong; the colder you run most electronics the better. Overclockers have
run this board submerged in a non-conductive fluid at -30C.

It may seem odd that a £1 battery could be causing all your probs, but
I've seen it on two of those boards, and read about plenty of other battery
problems. It's an excellent board otherwise.
 
Replace the CMOS battery; that board is well known for having a
marginally acceptable battery installed. This ties in with the
temperature and CMOS checksum errors: a failing battery will seem not to
work at lower temps but be fine when the temp increases.

Battery voltage (removed from board) was 3.05V under a
~300 ohm load, measured with a DMM, but I replaced it
anyway... it didn't help.
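For scale, Ohm's law puts the load current in that test at roughly the following, assuming the DMM reading was taken across a plain ~300 ohm resistor:

```latex
I = \frac{V}{R} = \frac{3.05\,\mathrm{V}}{300\,\Omega} \approx 10.2\,\mathrm{mA}
```

Assuming the board uses the usual 3V coin cell, that is a fairly heavy draw compared with the small standby current a CMOS/RTC circuit needs, so a cell still reading about 3.05V under it was probably not the weak link, consistent with the replacement not helping.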


As for the board being sensitive to cold, I'm afraid that your theory is
wrong; the colder you run most electronics the better. Overclockers have
run this board submerged in a non-conductive fluid at -30C.

Well, the theory is not that ALL boards won't work when cold.
Rather, the fact of the matter is that cold DOES affect this
one particular board's operation, so at this point the focus
is on WHAT the cold is affecting, whether it be a broken
trace or crack, marginal capacitors, bad contacts or ???

It may seem odd that a £1 battery could be causing all your probs, but
I've seen it on two of those boards, and read about plenty of other battery
problems. It's an excellent board otherwise.

Yes, it's a nice board for its time, and I wish it were
merely the battery. The last time I tried to power it up it
wouldn't even manage the 1-in-5 POST rate it had previously,
so it may be progressively getting worse? Supposedly the
system was doing this "rarely" over the past month, but more
and more often, frequently enough to prompt the owner to
bring it to me a few days ago.

Anyway, I still haven't found the time to individually test
any (motherboard) components, but when it wouldn't POST
after a dozen tries I held a hair-dryer up to the whole
thing for a couple of minutes, and once again it posted
fine, booted to Windows, rebooted, and flashed the BIOS
again successfully. Then a few hours later I tried to turn
it on and nothing again! If it were mine (I also have an
A7N8X, which doesn't have this problem) I'd try replacing
component(s) and keep an eye on it, stress testing and
such, but it has to go back to its owner, so ultimately
I'll probably just have them buy a new one soon. Yet I'm
still curious to find out what's going on with it.
 
Use a piece of cardboard to direct the heat to one half of the
motherboard at a time. After you determine which half is heat
sensitive, then determine which half of that half is the problem,
etc., until you pinpoint the component.
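The halving procedure described above is a binary search over board regions. A minimal sketch, where `region_is_sensitive` is a hypothetical stand-in for one manual trial (shield everything except the given region, heat it, power on, note whether it POSTs); it assumes a single culprit and a repeatable result from each trial.

```python
from typing import Callable, List, Sequence


def bisect_regions(
    components: Sequence[str],
    region_is_sensitive: Callable[[List[str]], bool],
) -> str:
    """Narrow a list of board components down to one heat-sensitive part.

    HYPOTHETICAL: region_is_sensitive(region) models one manual trial,
    returning True if heating only that region makes the board POST.
    Assumes exactly one culprit and repeatable trial results.
    """
    candidates = list(components)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Heat the first half; if that fixes POST the culprit is
        # there, otherwise it must be in the remaining half.
        if region_is_sensitive(first_half):
            candidates = first_half
        else:
            candidates = candidates[mid:]
    if not candidates:
        raise ValueError("no components given")
    return candidates[0]
```

With 32 candidate regions this takes 5 trials instead of up to 32, though each trial still means letting the board cool back below the failure threshold first.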
 
kony said:
Followups-To alt.comp.hardware

A7N8X Motherboard Low Temperature Sensitivity, CMOS Checksum
Error



snip

Kony

If you've narrowed the problem to the motherboard by substituting all other
components, then the 'freeze spray' technique would appear to be about the
only way of finding which component it is.
I had a kitset computer many years ago that did exactly what you describe
(erratic within a temperature range). The can of 'Freeze' found the culprit
in less than 10 minutes (a Z80 CPU, as it happens).

The only 'concern' with using freeze spray is when the humidity is high
enough to cause condensation, shorting out high-impedance lines (high
impedance = low-power CMOS circuitry these days).
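Whether a freeze-sprayed part will collect condensation can be estimated from the room's dew point: moisture forms on any surface chilled to or below it. A minimal sketch using the Magnus approximation with the commonly used coefficients a = 17.62 and b = 243.12C; the function names are illustrative, and the room readings in the usage note are example values, not from the thread.

```python
import math


def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Dew point in C from air temperature (C) and relative humidity (%).

    Uses the Magnus approximation; fine for room conditions.
    """
    if not 0 < rel_humidity_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    a, b = 17.62, 243.12
    gamma = math.log(rel_humidity_pct / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)


def will_condense(surface_c: float, temp_c: float, rh_pct: float) -> bool:
    """True if a surface at surface_c is at or below the air's dew point."""
    return surface_c <= dew_point_c(temp_c, rh_pct)
```

For a 25C room at 50% relative humidity the dew point comes out near 14C, and freeze spray takes a part well below that, so some condensation is likely unless the air is quite dry.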

Just do it methodically; the board's U/S anyway, so you've nothing to
lose.

Cheers
Paul.
 
PC said:
.....snip....

Kony

If you've narrowed the problem to the motherboard by substituting all other
components, then the 'freeze spray' technique would appear to be about the
only way of finding which component it is.

.....snip.....

Agreed. I haven't used the DX version of the A7N8X board but I've
built a couple of systems with the -X version. They've been in
continuous use by avid gamers for several months in temps ranging from
an estimated 12C to 35C without any problem.

It's obvious this mobo has an unusual temperature sensitivity. Unusual
because, as you will no doubt be aware, faults usually show up with a
rise in temperature. But with years of servicing electronic products
under my belt, I've seen all kinds of weird things happen. E.g., >99%
of defective resistors generally increase in resistance or become
open, but I've seen a couple of cases where their resistance had
lowered drastically.

In this case, material contraction due to lowered temp must be causing
a bad joint to lose electrical contact (I know this isn't much help).
The bad contact could be at a poor solder point, inside a component
(ICs, capacitors...), wire crimp, or a microscopic crack in the PCB
copper tracks.
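To put a rough number on "material contraction", linear size change is ΔL = α·L·ΔT. A minimal sketch, assuming about 17 ppm/C for copper and about 15 ppm/C for FR-4 in-plane (typical textbook figures, not measured for this board); the names are illustrative.

```python
# ASSUMED typical expansion coefficients, per degree C.
ALPHA_COPPER = 17e-6
ALPHA_FR4_XY = 15e-6  # FR-4 laminate, in-plane


def length_change_m(length_m: float, alpha_per_c: float, delta_t_c: float) -> float:
    """Linear size change in metres (negative when cooling)."""
    return alpha_per_c * length_m * delta_t_c


# Example: a 10C drop across the 305 mm (12 in) width of a full ATX board.
copper_um = length_change_m(0.305, ALPHA_COPPER, -10.0) * 1e6
fr4_um = length_change_m(0.305, ALPHA_FR4_XY, -10.0) * 1e6
```

Under these assumptions copper shrinks about 52 micrometres and FR-4 about 46 over that span. The few-micrometre mismatch is small, but it is the kind of strain that could separate an already-cracked joint or trace; this illustrates scale only and does not identify the fault.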
 
Followups-To alt.comp.hardware

A7N8X Motherboard Low Temperature Sensitivity, CMOS Checksum
Error



snip

Dry solder joint? Board mount shrinks infinitesimally at lower temp?
David in Norfolk UK
 
Just to add: I have an ASUS M5A78L-M LX motherboard...

My screen comes up when booting in an air-conditioned room, but not at all in a non-AC room. The motherboard has become temperature sensitive like yours. It's a bit worse, I think, as in the non-AC room it doesn't boot at all no matter how many attempts you make. At an early stage it was only slightly temperature sensitive; as time passed it has become more temperature sensitive.


I have done everything: reflashed the BIOS at a service center, changed the PSU, screen, etc.
 