spontaneous reboot problem

  • Thread starter: Mark N

Mark N

About ten days ago I suddenly started having a problem with my PC
rebooting, started from nothing to doing it very regularly, no gradual
development into the problem. Sometimes it reboots very quickly,
sometimes it will run for an hour or so without any problem. I ran my
virus and malware scans, nothing new there, and have blown out the box
for dust buildup, checked power connections, memory seating, etc. Don't
think it's windows or a hard drive problem, since it sometimes reboots
before even getting to the load point. I have a hardware monitor
function in the BIOS, and have watched readings there until it reboots,
and not much indication of a problem - voltages look okay, CPU temp
doesn't get alarmingly high. I was away for a week, came home and the
problem started up with a vengeance, couldn't even get Windows loaded,
now has backed off a little.

My box has an MSI K8N Neo Platinum mobo, Athlon64 3400+ proc, 1 gig
Corsair performance memory, ATI 800XT AIW, Enermax 420W PS, XP SP2.
Only recent hardware change was swapping out the optical drive and
adding a new SATA hard drive a number of weeks ago (now have three, the
others EIDE); everything worked well for quite a while after. The rest
is about two years old, last rebuild. I did move some cabling around
while doing the changes, and I noticed that one cable might have been
contacting the CPU fan a bit, although not stopping it or causing a lot
of noise. There were some occasional odd noises just before the reboot
problem started, and I was a bit concerned about a hard drive going
south at that point, but may have been the fan. Nothing lately, though.

I assume first it's a heat problem, but the box is big and cool and all
fans working. I've been a little concerned about the CPU fan, so
thinking of replacing that (retail cooler); I'm not sure if it's always
spinning up faster as the temperature rises. It doesn't seem to get very
hot, usually first readings are around 47 degrees C, work up over 50,
the fan spins up faster around 55 and pushes it down a few degrees, and
sometimes the reading just before a reboot is just 50 or a bit more. Not
sure how accurate or delayed those readings are, though. Next thought is
the PS, but no unusual readings there, with the same caveats on those.

Any ideas on what's happening and fixes? I haven't done that much yet,
but this is a new one for me and has me a bit stumped. Thanks in advance
for any assistance.
 
About ten days ago I suddenly started having a problem with my PC
rebooting, started from nothing to doing it very regularly, no gradual
development into the problem. Sometimes it reboots very quickly,
sometimes it will run for an hour or so without any problem. I ran my
virus and malware scans, nothing new there, and have blown out the box
for dust buildup, checked power connections, memory seating, etc. Don't
think it's windows or a hard drive problem, since it sometimes reboots
before even getting to the load point.

What are you calling the "load point"? What exact spot in
the boot process? Never mind if what you wrote below means
that it doesn't even have to have gone beyond the bios menu;
leaving it sitting at that health page is a good place to
isolate hardware from software.

I have a hardware monitor
function in the BIOS, and have watched readings there until it reboots,
and not much indication of a problem - voltages look okay, CPU temp
doesn't get alarmingly high. I was away for a week, came home and the
problem started up with a vengeance, couldn't even get Windows loaded,
now has backed off a little.

Oh, disregard the above question, since it seems you can
leave the system sitting in the bios health menu and have it
happen.

Most likely suspects are a dodgy power supply or motherboard
failing. You might inspect the capacitors on the
motherboard, even in the PSU after leaving it unplugged from
AC for a few minutes.

My box has an MSI K8N Neo Platinum mobo, Athlon64 3400+ proc, 1 gig
Corsair performance memory, ATI 800XT AIW, Enermax 420W PS, XP SP2.
Only recent hardware change was swapping out the optical drive and
adding a new SATA hard drive a number of weeks ago (now have three, the
others EIDE); everything worked well for quite a while after. The rest
is about two years old, last rebuild. I did move some cabling around
while doing the changes, and I noticed that one cable might have been
contacting the CPU fan a bit, although not stopping it or causing a lot
of noise. There were some occasional odd noises just before the reboot
problem started, and I was a bit concerned about a hard drive going
south at that point, but may have been the fan. Nothing lately, though.


Reinspect that cable to be sure it isn't shorting out on
anything. Inspect the other cards & cables while the system
is open. If nothing else helps, reduce system to a minimal
state by disconnecting all parts unessential towards getting
a POST and the bios menu scenario in which it is resetting.

I assume first it's a heat problem, but the box is big and cool and all
fans working. I've been a little concerned about the CPU fan, so
thinking of replacing that (retail cooler); I'm not sure if it's always
spinning up faster as the temperature rises. It doesn't seem to get very
hot, usually first readings are around 47 degrees C, work up over 50,
the fan spins up faster around 55 and pushes it down a few degrees, and
sometimes the reading just before a reboot is just 50 or a bit more. Not
sure how accurate or delayed those readings are, though. Next thought is
the PS, but no unusual readings there, with the same caveats on those.

If the temp never gets above 55 it's cool enough (save for
some extreme overclock), assuming the temp report is
accurate. If it's getting much hotter you should be able to
fairly easily feel the heatsink as hot rather than mildly
warm. I'm assuming the chipset and other parts are
seemingly cool enough as well.

An intermittent failure related to temp could instead be a
crack in a circuit board (motherboard most likely) or a cold
solder joint. In these modern times of RoHS, lead-free
solder, it could even be tin whiskers that intermittently
short as thermal expansion closes the gap between two
component leads. Is it possible the case has a standoff
where there shouldn't be one, shorting against the back of
the motherboard once it has expanded slightly from temp
change? One thing to try would be taking the parts out of
the case and laying them out on a non-conductive (not
anti-static) surface for another trial.

A PSU can fail such that voltages look ok until the event,
if you have a spare you might try it (all else failing) or
order one from someplace with a good return policy.

Any ideas on what's happening and fixes? I haven't done that much yet,
but this is a new one for me and has me a bit stumped. Thanks in advance
for any assistance.

Some ATI X800 video cards seem prone to fail after a while.
If you had another card you might swap that in temporarily,
but playing the odds it seems less likely.
 
About ten days ago I suddenly started having a problem with my PC
rebooting, started from nothing to doing it very regularly, no gradual
development into the problem. Sometimes it reboots very quickly,
sometimes it will run for an hour or so without any problem. I ran my
virus and malware scans, nothing new there, and have blown out the box
for dust buildup, checked power connections, memory seating, etc. Don't
think it's windows or a hard drive problem, since it sometimes reboots
before even getting to the load point. I have a hardware monitor
function in the BIOS, and have watched readings there until it reboots,
and not much indication of a problem - voltages look okay, CPU temp
doesn't get alarmingly high. I was away for a week, came home and the
problem started up with a vengeance, couldn't even get Windows loaded,
now has backed off a little.

Kony has already mentioned the PSU and possible bad electrolytics
- they're both possibilities but it would be an unusual failure
mode since if you're still in the BIOS your machine will be at
nowhere near full load. I'd be thinking about issues with the
quality of the mains too and be tempted to try running it through
a surge protector or ideally a UPS.

Has anything happened to the mains supply that you are aware of?
Any major construction work locally, electrical storms or gales/floods
that may have caused damage?
 
Kony has already mentioned the PSU and possible bad electrolytics
- they're both possibilities but it would be an unusual failure
mode since if you're still in the BIOS your machine will be at
nowhere near full load.

That is true, but there are a couple factors that offset
this.

1) Without a modern OS loaded, there is no
ACPI/HLT-Cooling, so the CPU is nearer to full load than it is
to the idle (Windows) power consumption level. When
everything is still cooler, the components, particularly
power electronics, have different parameters that change
along with temp. Normally these are negligible in a proper
design, BUT if a part is failing progressively, there will
be a certain scenario in which this degraded performance
first surfaces... IF it degrades slowly enough to notice it.
 
Temperature increase can make a functional but defective computer
apparent. High reliability equipment gets thermal cycled - sometimes
called 'burn in' testing - to find defects before failure occurs. Is
your heatsink incorrect? Well then the computer never did what it
must do - work inside a room at 100 degrees F and be perfectly happy.
Your heatsink, if selected or installed by one with engineering
knowledge such as the manufacturer, is not a problem. However if
selected by just any computer assembler, then start asking questions
that were answered by that heatsink manufacturer. What is its degree
C per watt number? Don't just assume and then start replacing
things. That is called shotgunning.
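
The degree C per watt question can be put into numbers. A heatsink's thermal resistance gives a rough steady-state CPU temperature as the intake air temperature plus resistance times dissipated power. A minimal Python sketch of that arithmetic follows. The 0.35 C/W and 89 W figures are illustrative assumptions only, not published specs for the retail cooler or the Athlon64 3400+, and the function names are made up for this example.

```python
def f_to_c(temp_f):
    """Convert an absolute temperature from Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def cpu_temp_c(intake_air_c, theta_c_per_w, power_w):
    """Steady-state estimate: intake air temperature plus heatsink rise."""
    return intake_air_c + theta_c_per_w * power_w


# Assumed, illustrative numbers (not manufacturer data).
THETA = 0.35    # heatsink thermal resistance, degrees C per watt
POWER = 89.0    # CPU dissipation, watts

for room_f in (70.0, 100.0):
    estimate = cpu_temp_c(f_to_c(room_f), THETA, POWER)
    print(f"room {room_f:.0f} F -> estimated CPU {estimate:.1f} C")
```

The air reaching the heatsink inside a case is usually warmer than the room, so the real figure would sit somewhat above this estimate. The result is only meaningful when compared against the CPU manufacturer's maximum temperature spec.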

Your computer is intermittent. Therefore use a hairdryer on high to
heat selective parts at temperatures uncomfortable to touch. That is
a perfectly normal temperature to all working retail electronics.
This is also when comprehensive hardware diagnostics best find
failures. At higher temperature, defective electronics then will fail
consistently. Just another way to find defects and how to have found
that defect long before your problems were apparent.

Hardware monitor will show changes (a monitor) but is not valid as a
voltage measurement until calibrated - which is why you need a 3.5
digit multimeter. A tool so 'complex' as to be sold even in K-mart.
Any voltage below 3.23, 4.87, or 11.7 is a problem and would explain
intermittent crashes.
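
As a sketch of how those limits turn meter readings into a yes-or-no answer, the short Python check below flags any rail that reads under its minimum. Mapping 3.23, 4.87, and 11.7 to the 3.3 V, 5 V, and 12 V rails is an assumed reading of the numbers above, and the sample readings are invented for illustration.

```python
# Minimum acceptable voltages quoted in the post, keyed by the rail
# they are assumed to refer to.
MIN_VOLTS = {"3.3V": 3.23, "5V": 4.87, "12V": 11.7}


def low_rails(readings):
    """Return the names of rails whose measured voltage is below the minimum."""
    return [rail for rail, volts in readings.items()
            if rail in MIN_VOLTS and volts < MIN_VOLTS[rail]]


# Hypothetical multimeter readings, for illustration only.
sample = {"3.3V": 3.30, "5V": 4.80, "12V": 11.9}
print(low_rails(sample) or "all rails above minimum")
```

Readings taken while the system is under load, and again at the moment it reboots, would say more than a single idle measurement.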

Line voltages should have no effect on a computer. This assumes the
power supply was purchased with internal functions that were even
standard 30 years ago. AC electric voltages must drop so low that
incandescent lamps are at less than 40% intensity and still computer
both starts and runs 100% OK. Are your lights dimming that low? If
not, then AC voltage is not (should not be) a problem. Anything that
a power conditioner or UPS would do to solve a problem is supposed to
be already inside the power supply. Some power supplies installed by
those who do not learn electricity are so inferior as to need power
conditioning. Fix the problem. Don't purchase something many times
more money to only cure symptoms. Don't cure symptoms. Eliminate the
problem.

As noted earlier, room temperature and therefore CPU temperatures
must increase 30 degree F higher; CPU temperature still below
manufacturer max spec numbers. Again, consult numbers such as
manufacturer specs. Confirm voltages per limits provided above.
Eliminate potential suspects completely and then move on to other
suspects. Hair dryer, multimeter, manufacturer specs ... just some
simple things that answer questions with yes or no. Answers with
'maybe' are wasted time; often from speculation such as AC line noise
solved by a surge protector or UPS.

Each useful solution is not only recommended. Reasons why it is
useful is also important. No reasons why? This is why some recommend
surge protectors or UPS as a solution.

Meanwhile, if I understand this correctly, your problem even occurs
when only in BIOS. Now we have limited the problem to very few
suspects. Only a few parts of a motherboard, a tiny part of memory,
the video controller (maybe), and a power supply 'system'.
 
[Comments regarding overheating snipped from here but will be
referred back to later]
Line voltages should have no effect on a computer. This assumes the
power supply was purchased with internal functions that were even
standard 30 years ago. AC electric voltages must drop so low that
incandescent lamps are at less than 40% intensity and still computer
both starts and runs 100% OK. Are your lights dimming that low? If

Yes, this amount of drop could be perfectly reasonable. Even if
we accept your figures (which sound arbitrary to me, as did your
minimum voltages) a 60% dip in illumination is an unexceptional
sag. Recall that the human eye is logarithmic in behaviour - a
60% drop at normal levels of indoor illumination would be _perceived_
as a drop in the region of 10%. When you consider that, it ceases
to be impossible and becomes a reasonably common occurrence.

I do accept in part your comments about the filtering capability
of switched mode power supplies, they do have a certain amount of
resilience built in. But that is mainly either a function of the
way they work (as in the case of tolerance to variations in supply
voltage) or a minimum level to ensure proper operation for at least
the majority of the time (as is the case for protection against
transients). Both have their limits. A UPS or surge protector
mitigates (not so) exceptional events over and above those expected
in normal operation.

One other point: the UPS on this machine is set to switch in at
184 V against a mains rating of 230 V. I hear the relays switch
over momentarily maybe a couple of times a week. It may do it more
often and I just don't notice. More severe sags can actually last
long enough for the alarm to briefly sound - that's much rarer but
still maybe every three months on average.

simple things that answer questions with yes or no. Answers with
'maybe' are wasted time; often from speculation such as AC line noise
solved by a surge protector or UPS.

So, you can immediately identify the cause of any problem without
first running through a list of hypotheses to be considered and
eliminated?

Each useful solution is not only recommended. Reasons why it is
useful is also important. No reasons why? This is why some recommend
surge protectors or UPS as a solution.

Why did I suggest the power as a _possible_ cause (unlike you I do
not claim to have a definitive answer, I merely offered a suggestion
based on experience)? Well, we're not talking about a clear and
apparent equipment failure here, so the first question I'd ask is
"So what's changed?" This is basic troubleshooting methodology.

If we take the OP at their word then nothing has changed to the
system so the next step is to look at external factors, the mains
being one of them. If you refer back to my original post I did
ask about any factors that the poster may have known about that
could affect the local power supply, although of course the OP may
not be aware of routine maintenance or upgrade work being carried
out.

[Earlier comments]
Temperature increase can make a functional but defective computer
apparent. High reliability equipment gets thermal cycled - sometimes
called 'burn in' testing - to find defects before failure occurs. Is
your heatsink incorrect? Well then the computer never did what it
must do - work inside a room at 100 degrees F and be perfectly happy.
Your heatsink, if selected or installed by one with engineering
knowledge such as the manufacturer, is not a problem. However if
selected by just any computer assembler, then start asking questions
that were answered by that heatsink manufacturer. What is its degree
C per watt number? Don't just assume and then start replacing
things. That is called shotgunning.

You asked for a justification for my suggestion regarding the mains
which I was happy to give. Now, given that the system hasn't
changed, why do you suspect it has suddenly started to suffer
through some kind of thermal stress?
 
On 2007-03-10, w_tom wrote:
One other point: the UPS on this machine is set to switch in at
184 V against a mains rating of 230 V. I hear the relays switch
over momentarily maybe a couple of times a week. It may do it more
often and I just don't notice. More severe sags can actually last
long enough for the alarm to briefly sound - that's much rarer but
still maybe every three months on average.

Buy an Active PFC PSU and be done with the UPS wear by
lowering the voltage setting. It should not need to come on
at 184V for the computer. Maybe something else you have
might, but any decent PSU should have no problem, it would
just lower the max possible output from the PSU but that max
is within the range of margin it should already have if
suitably sized for the system.
 
Nothing arbitrary was posted. Experience alone (without fundamental
facts and numbers) results in junk science reasoning - also called
speculation. That guess on illumination is just that - speculation.
A computer must work just fine even when bulb illumination drops to
below 40%. Formulas for numbers of voltage and illumination are an
industry standard - see IES Handbook. Experience alone does not
provide this.
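
The IES Handbook formula itself is not reproduced in this thread. As a rough stand-in, a commonly quoted rule of thumb says an incandescent lamp's light output varies with roughly the 3.4th power of applied voltage. The Python sketch below uses that assumed exponent to relate the 40% intensity figure to a supply voltage. Both the exponent and the helper names are assumptions for illustration, not values taken from the handbook.

```python
# Assumed rule-of-thumb exponent relating incandescent light output
# to applied voltage; published values vary slightly.
LUMEN_EXPONENT = 3.4


def voltage_fraction_for_intensity(intensity_fraction, exponent=LUMEN_EXPONENT):
    """Fraction of nominal voltage that yields the given fraction of light."""
    return intensity_fraction ** (1.0 / exponent)


def intensity_fraction_at(volts, nominal_volts, exponent=LUMEN_EXPONENT):
    """Fraction of nominal light output at a reduced supply voltage."""
    return (volts / nominal_volts) ** exponent


NOMINAL = 230.0
frac = voltage_fraction_for_intensity(0.40)
print(f"40% intensity is about {frac * NOMINAL:.0f} V on a {NOMINAL:.0f} V supply")
print(f"184 V gives about {100 * intensity_fraction_at(184.0, NOMINAL):.0f}% intensity")
```

Under that assumption, lamps at 40% output correspond to a supply somewhere in the mid 170s of volts on a 230 V system, which is in the same region as the low-voltage figures argued over in this thread.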

Switching power supply 'system' makes those AC electric problems
irrelevant as even defined by another industry standard (CBEMA) of 30
years ago. Industry standard numbers demonstrate why all appliances
work just fine without UPSes or power conditioners. Computer supplies
typically are even more resilient. How many other household
appliances are failing more frequently? Some must be if AC electric is
reason for failure as Andrew speculates. More numbers from both
standards and experience - one needs both - are provided below.

Your AC voltages must maintain specific limits at different points
in household distribution. 230 volts must not drop below 205 volts.
Yes, another standard. Sometimes voltage does drop lower. Then those
with both experience and knowledge locate wiring problems. They don't
install power conditioning.

  Meanwhile a 230 volt UPS may flip to battery backup mode typically
at 200 volts. (Yours trips at an untypically low 185 volts - well look
how much lower all computers must work so that your UPS can trip at
185). If Andrew's UPS is tripping daily, experience and the numbers
say Andrew should identify a marginal wiring problem - maybe a loose
screw. Since Andrew only uses experience, then Andrew considers
acceptable what is really unacceptable. Just another reason why
experience without learning fundamental technology results in
erroneous conclusions.

230 volt power supplies must work just fine with 100% load when AC
voltage is even lower - 180 VAC. This even stated in Intel ATX
standards. Does Andrew still think anything posted was arbitrary? I
don't do arbitrary.
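
For comparison, the voltage figures quoted in this exchange can be expressed as a percentage of the 230 V nominal. The short Python sketch below does only that arithmetic. The labels are paraphrases of the posts, and the numbers are the ones the posters gave, not values checked against any standard.

```python
NOMINAL_VOLTS = 230.0

# Figures as stated by the posters in this thread (not verified here).
QUOTED_LEVELS = {
    "household minimum (as stated)": 205.0,
    "typical UPS transfer (as stated)": 200.0,
    "Andrew's UPS transfer setting": 184.0,
    "PSU full-load minimum (as stated)": 180.0,
}


def percent_of_nominal(volts, nominal=NOMINAL_VOLTS):
    """Express a voltage as a percentage of the nominal supply voltage."""
    return 100.0 * volts / nominal


for label, volts in QUOTED_LEVELS.items():
    print(f"{label}: {volts:.0f} V = {percent_of_nominal(volts):.1f}% of nominal")
```

Laid out this way, the UPS transfer setting sits at 80% of nominal, slightly above the quoted PSU full-load minimum.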

For Mark N and the bottom line: a computer must work just fine even
when incandescent lamps dim to 40% intensity. These both from
industry standard numbers and from too many decades of experience. If
PSU voltages are too low and cause a reboot, then problem is apparent
in multimeter numbers defined previously or in
http://tinyurl.com/yvf9vh .
Meter numbers will either report voltages just fine (so we move on to
other suspects as defined by Kony) or report power supply 'system' as
reason for reboot.

As Kony also notes:
If the temp never gets above 55 it's cool enough
System also must be stable when air temperatures are raised another
30 degree F. Heat causing failure in a 70 degree room - then hardware
is obviously defective. Heat typically is a great diagnostic tool to
find defects; not something to be cured. Increased heat can make
reboots more frequent. Again heat used to identify a defect.

Obvious symptoms such as bulging capacitors may also find a
problem. But again, reason for failure is definitive - therefore a
useful tool to find defect.

Fact that reboot occurs even when only in BIOS is another useful
fact that limits problem to very few suspects as posted previously.

Some recommend UPSes or power conditioners when basic knowledge (ie
all those industry standards) is not first learned. Experience alone
often results only in speculation. If a UPS or power conditioner
'fixes' the problem, then power supply 'system' is probably defective,
or incandescent lamps are dimming excessively and obviously.

How to find reasons for reboot? Procedure also provided in
http://tinyurl.com/yvf9vh
and two minutes exonerates or condemns power supply
'system' (including AC voltage problems). Also useful are what those
voltages do when system reboots. If problem is not in the power
supply 'system' (as meter will report), then move on to other
suspects. Don't just speculate which is what happens when experience
is not tempered by fundamental knowledge - ie the above numbers. Get
numbers to know what is and is not working - without speculation.

Experience without those standards and numbers creates only
speculation. I don't do arbitrary.
 
w_tom said:
For Mark N and the bottom line: a computer must work just fine even
when incandescent lamps dim to 40% intensity. These both from
industry standard numbers and from too many decades of experience. If
PSU voltages are too low and cause a reboot, then problem is apparent
in multimeter numbers defined previously or in
http://tinyurl.com/yvf9vh .
Meter numbers will either report voltages just fine (so we move on to
other suspects as defined by Kony) or report power supply 'system' as
reason for reboot.
As Kony also notes:

Thanks to everyone for their suggestions, and the worthwhile discussion
on power issues. But I seem to have discovered the problem doing one of
the things Kony originally suggested. First, what it apparently isn't:

- No high heat readings, and while the suggestion that the readings
need to be calibrated in order to be determined correct, they are
relatively the same as they've been since I built the box two years ago.
So my guess is no heat problems.

- Not a memory voltage issue; I tried that and no change, and also
tried increasing the Vcore voltage, as the readings have always been at
1.44 instead of 1.5, but no help.

- No power issues in my home, lights never dim, and reboots have been
constant, no more or less at peak and off-peak hours.

- I also noticed that the PC was having performance problems when it
was up and running, things like video playback skipping and lagging when
they hadn't before.

The box had never before rebooted until two weeks ago, and the problem
recurs at least every hour, and sometimes every few seconds - I woke up
last night to continuous beeping, each time the PC POSTed. The last
change to the box was several weeks earlier, when I changed optical
drives and added a hard drive, so I decided to start there. I had been a
bit concerned about the SATA HD, as I seem to recall that early SATA
mobos have had some problems, so I unhooked it and the problem was gone.
I updated the mobo drivers and reconnected, but the problem was back. So
now I've changed the power line going to this drive, and that seems to
have righted it, I guess a power problem of sorts after all.

I really don't know anything about power supplies, but I assume I was
trying to pull too much out of one source. Not sure if a higher wattage
PS would solve such problems, but it appears I was managing what I have
rather poorly.

So any chance that I've caused any other problems in the course of all
this? All that rebooting kind of makes me uncomfortable, and I don't
know that this hasn't put undue strain on the PS. It also seems odd that
this problem didn't crop up for several weeks after adding the drive.

Thanks again...
 
The box had never before rebooted until two weeks ago, and the problem
recurs at least every hour, and sometimes every few seconds - I woke up
last night to continuous beeping, each time the PC POSTed. The last
change to the box was several weeks earlier, when I changed optical
drives and added a hard drive, so I decided to start there. I had been a
bit concerned about the SATA HD, as I seem to recall that early SATA
mobos have had some problems, so I unhooked it and the problem was gone.
I updated the mobo drivers and reconnected, but the problem was back. So
now I've changed the power line going to this drive, and that seems to
have righted it, I guess a power problem of sorts after all.

I really don't know anything about power supplies, but I assume I was
trying to pull too much out of one source.

What else did you have hooked up to the same leads? It is
doubtful you were trying to pull too much, more likely the
connector - drive connection was just flaky for whatever
reason, perhaps the connector just didn't plug in good - it
is a shame everyone is trying to shrink all these connectors
down as small as they can go reliably in a lab environment
then expecting real-world use and products to do as well in
all cases. If they were just running out of room on the
back of a hard drive to fit all the pins I could see it, but
the only problem is short-sighted engineering. Engineering
for ideals that don't hold true as often as the legacy power
connections. I'm being a bit overly critical, usually they
work fine but in some ways it is a step backwards.

Not sure if a higher wattage
PS would solve such problems, but it appears I was managing what I have
rather poorly.

So any chance that I've caused any other problems in the course of all
this? All that rebooting kind of makes me uncomfortable, and I don't
know that this hasn't put undue strain on the PS. It also seems odd that
this problem didn't crop up for several weeks after adding the drive.

An intermittent power connection (to the drive) shouldn't
have been resetting the system when sitting at a bios menu.
Were the connector contacts grossly mangled or defective to
the extent that they might have shorted against each other
temporarily? If not, I would wonder about the drive and PSU
still.

A poor connection could take a while to reveal itself,
particularly so when a marginal connection can foul the
contacts from slight arcing. Hopefully the problem is
solved but you might try to examine the contacts on the
drive, and connector, and give it some time to see if the
problem resurfaces... and if it does, then try another drive
and connector, so you have two mating contacts that weren't
potentially fouling themselves.
 
Well, it seems I spoke too soon. After a while I had recurrence of the
reboot problems, even with the new drive disconnected. Then when I
turned on the PC this morning it was back with a vengeance, rebooting
just a matter of seconds after POSTing. After many attempts it managed
to finally start up Windows, and has been stable enough to at least be
typing on this (although it has already rebooted once in the process).
The whole thing doesn't really seem random, it takes time to "warm up"
sufficiently before being able to get farther in the process. Now it's
stable enough to run in Windows for 5-10 minutes anyway.

Anyway, I don't see anything physically wrong on the mobo, and all the
cabling looks fine. I guess I'll try replacing the video card next,
which is easy and would eliminate that idea. Not sure where to go next
after that...
 
My son's was doing just that. Re-booting, specially when big programs
were run, but Not a heat issue. After a few things, the only thing
that cured it was to re-install Windows. He didn't lose any programs
doing the re-install, it's ok now.
 
If when you say:
I don't see anything physically wrong on the mobo

You specifically looked for bad/swollen capacitors, then my money's
still on the Power Supply.

M
 
Buy an Active PFC PSU and be done with the UPS wear by
lowering the voltage setting. It should not need to come on
at 184V for the computer. Maybe something else you have
might, but any decent PSU should have no problem, it would
just lower the max possible output from the PSU but that max
is within the range of margin it should already have if
suitably sized for the system.

To be honest I haven't played with the trigger voltage but left it
on the default setting - it probably is higher than it needs to
be. I'll probably leave it that way since artificially lowering
the potential for testing would mean patching a cable and NetBSD
lacks a journaling filesystem for "real life" testing. That makes
unscheduled reboots at best a pain since at the next boot it spends
25 minutes checking the filesystem ;-)
 
Increased memory voltage? Increases CPU voltage. That is
shotgunning. It learns nothing useful. And if it did fix anything,
it only 'cured symptoms'.

You are still not doing what is said in CSI - follow the evidence.
Trying to fix it rather than first collecting facts only leads to ...
well the previous post described this as experience without basic
knowledge. Due to such speculation, then things that look like
solutions actually solve nothing. Terms like 'spinning your wheels'
apply because advice that would 'follow the evidence' was ignored.

Everything posted implies something in a power supply 'system' has
long been defective. You have used subjective reasoning to declare
that power supply good. Power supply could have been insufficient two
years ago. With age and that optical drive, then the defective power
supply finally created failures. Remember what the meter does? It
may even find problems long before failures happen. Had you used the
meter two years ago, then maybe this problem would have never
occurred.

Worse - you are thinking binary. The world is ternary. Notice
three conditions. a) Power supply good and system works. b) Power
supply defective but system works. c) Power supply defective and
system fails. All three exist. Your reasoning has assumed only a)
and c). But again why shotgunning is avoided. But again why those
with only experience sometimes only cure symptoms.

You have established zero items as operational. After so much
effort, nothing is known. Nothing has been accomplished. Get the
meter. In but two minutes, those numbers would have made major
accomplishments. Start a process of learning what is and is not
sufficient. Stop doing the 'try this and try that'. You have
demonstrated only again why shotgunning is not very successful and why
shotgunning takes so much more time. After everything posted, we are
still no closer to a solution because you did not get the meter.

How do we know nothing is being learned? You are trying to fix it.
That comes later. First get the facts. Did you also collect data
from the system (event) log? Another useful fact that long ago might
have identified the problem. Just another example of why we first
collect data long before trying to fix anything.

It is not odd "that this problem didn't crop up for several weeks".
That common occurrence is also why shotgunning is so unreliable. But
again, you made an assumption, then drew conclusions from that
assumption. That is also how wild speculation only complicates a
problem. Follow the evidence. Establish what is good. Get the meter
so that every effort results in something other than wild speculation.

So now you will suspect the video card? On what data? Again, more
shotgunning. If another video card changes things, does that mean the
video card is defective? No. It also means the power supply 'system'
is defective, heat is a problem, motherboard bus is defective,
software magically went defective, and ... Stop shotgunning. Or
just tell everyone to go away because you really don't want help.
You have ZERO reasons to blame the video card. ZERO. But you are
shotgunning.
 
w_tom said:
Temperature increase can make a functional but defective computer
apparent. High reliability equipment gets thermal cycled - sometimes
called 'burn in' testing - to find defects before failure occurs. Is
your heatsink incorrect? Well then the computer never did what it
must do - work inside a room at 100 degrees F and be perfectly happy.
Your heatsink, if selected or installed by one with engineering
knowledge such as the manufacturer, is not a problem. However if
selected by just any computer assembler, then start asking questions
that were answered by that heatsink manufacturer. What is its degree
C per watt number? Don't just assume and then start replacing
things. That is called shotgunning.
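
The degree C per watt number is the heatsink's thermal resistance: the CPU sits roughly that many degrees above the surrounding air for every watt it dissipates. A minimal sketch of that arithmetic follows. All three inputs are assumptions picked for illustration, not measurements from this system, and the function names are made up for the example.

```python
def f_to_c(deg_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def cpu_temp_c(ambient_c, theta_c_per_w, power_w):
    """Estimate CPU temperature: ambient plus thermal resistance times power."""
    return ambient_c + theta_c_per_w * power_w


# Hypothetical inputs, for illustration only.
ambient = f_to_c(100.0)   # the 100 degree F room mentioned above
theta = 0.40              # assumed heatsink rating, degrees C per watt
power = 89.0              # assumed CPU heat output, watts

print(round(cpu_temp_c(ambient, theta, power), 1), "C estimated")
```

Plugging in the real degree C per watt figure from the heatsink manufacturer and the CPU's rated power gives a number to compare against the CPU's maximum rated temperature, which is the question to ask before replacing anything.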

Your computer is intermittent. Therefore use a hairdryer on high to
heat selective parts at temperatures uncomfortable to touch. That is
a perfectly normal temperature to all working retail electronics.
This is also when comprehensive hardware diagnostics best find
failures. At higher temperature, defective electronics then will fail
consistently. Just another way to find defects and how to have found
that defect long before your problems were apparent.

Hardware monitor will show changes (a monitor) but is not valid as a
voltage measurement until calibrated - which is why you need a 3.5
digit multimeter. A tool so 'complex' as to be sold even in K-mart.
Any voltage below 3.23, 4.87, or 11.7 is a problem

Wrong, as always.
and would explain intermittent crashes.

Pig ignorant lie.
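
For anyone who does put a meter on the rails, here is a rough sketch of checking readings against both sets of limits being argued over: the minimums posted above (3.23, 4.87, 11.7) and a plain 5% tolerance below nominal. The 5% figure is an assumption here, being the tolerance commonly quoted from the ATX design guide for the positive rails, so check the guide itself before relying on it. The sample readings are invented.

```python
# Minimums as posted above, keyed by nominal rail voltage.
POSTED_MINIMUMS = {3.3: 3.23, 5.0: 4.87, 12.0: 11.7}

# Assumed tolerance on the positive rails (see the note above the code).
ATX_TOLERANCE = 0.05


def atx_minimum(nominal):
    """Lowest voltage still inside the assumed -5% tolerance."""
    return nominal * (1.0 - ATX_TOLERANCE)


def check_rails(readings):
    """Map nominal voltage -> (below posted minimum, below assumed ATX minimum)."""
    result = {}
    for nominal, measured in readings.items():
        result[nominal] = (
            measured < POSTED_MINIMUMS[nominal],
            measured < atx_minimum(nominal),
        )
    return result


# Hypothetical meter readings taken under load, not from any real system.
sample = {3.3: 3.20, 5.0: 4.90, 12.0: 11.35}
for nominal, (below_posted, below_atx) in check_rails(sample).items():
    print(nominal, "V rail | below posted minimum:", below_posted,
          "| below assumed ATX minimum:", below_atx)
```

A reading that falls between the two limits is exactly the case the two posters disagree about, so it is worth recording the actual numbers rather than just pass or fail.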
 
w_tom said:
Nothing arbitrary was posted.

Bare faced lie, most obviously with the minimum rail voltages you plucked from your arse.
Experience alone (without fundamental facts and numbers)
results in junk science reasoning - also called speculation.

Your minimum rail voltages in spades.
That guess on illumination is just that - speculation.

Your minimum rail voltages in spades.
A computer must work just fine even when bulb illumination drops to below 40%.

Pig ignorant drivel. You've plucked that out of your arse too.
Formulas for numbers of voltage and illumination are an industry
standard - see IES Handbook. Experience alone does not provide this.

Pity it's completely irrelevant to what mains voltage the system will still work fine at.
Switching power supply 'system' makes those AC electric problems
irrelevant as even defined by another industry standard (CBEMA) of 30
years ago. Industry standard numbers demonstrate why all appliances
work just fine without UPSes or power conditioners. Computer supplies
typically are even more resilient. How many other household
appliances are failing more frequently? Some must be if AC electric is
reason for failure as Andrew speculates. More numbers from both
standards and experience - one needs both - are provided below.

More of your drivel, actually.
Your AC voltages must maintain specific limits at different points
in household distribution. 230 volts must not drop below 205 volts.
Yes, another standard. Sometimes voltage does drop lower.

Funny that.
Then those with both experience and knowledge locate
wiring problems. They don't install power conditioning.

Plenty do when the mains isn't as good as it should be.
Meanwhile a 230 volts UPS may flip to battery backup mode typically
at 200 volts. (Yours trips at an untypically low 185 volts - well look
how much lower all computers must work so that your UPS can trip at
185). If Andrew's UPS is tripping daily, experience and the numbers say
Andrew should identify a marginal wiring problem - maybe a loose screw.

Or a lousy mains supply.
Since Andrew only uses experience, then Andrew considers
acceptable what is really unacceptable. Just another reason
why experience without learning fundamental technology
results in erroneous conclusions.

Your minimum rail voltages in spades.
230 volt power supplies must work just fine with 100% load when AC
voltage is even lower - 180 VAC. This even stated in Intel ATX standards.

Intel doesn't set the ATX standards.
Does Andrew still think anything posted was arbitrary? I don't do arbitrary.

You clearly do with those stupid minimum acceptable rail voltages
which aren't anything like what the ATX standard says.
For Mark N and the bottom line: a computer must work just fine even
when incandescent lamps dim to 40% intensity. These both from
industry standard numbers and from too many decades of experience.

Irrelevant to what the ATX standard requires.
If PSU voltages are too low and cause a reboot, then problem
is apparent in multimeter numbers defined previously or in
http://tinyurl.com/yvf9vh .
Meter numbers will either report voltages just fine (so we move on to
other suspects as defined by Kony) or report power supply 'system' as
reason for reboot.

Wrong, as always.
As Kony also notes:
System also must be stable when air temperatures are raised another
30 degree F. Heat causing failure in a 70 degree room - then hardware
is obviously defective. Heat typically is a great diagnostic tool to
find defects; not something to be cured. Increased heat can make
reboots more frequent. Again heat used to identify a defect.
Obvious symptoms such as bulging capacitors may also find a
problem. But again, reason for failure is definitive - therefore a
useful tool to find defect.

Meaningless waffle.
Fact that reboot occurs even when only in BIOS is another useful
fact that limits problem to very few suspects as posted previously.
Some recommend UPSes or power conditioners when basic knowledge (ie
all those industry standards) is not first learned. Experience alone
often results only in speculation. If a UPS or power conditioner
'fixes' the problem, then power supply 'system' is probably defective,
or incandescent lamps are dimming excessively and obviously.
Pathetic.

How to find reasons for reboot? Procedure also provided in
http://tinyurl.com/yvf9vh
and two minutes exonerates or condemns power supply
'system' (including AC voltage problems).

Bare faced lie, regardless of how often it's repeated.
Also useful are what those voltages do when system reboots.
If problem is not in the power supply 'system' (as meter will report),

Not necessarily.
then move on to other suspects.

No thanks.
Don't just speculate which is what happens when experience
is not tempered by fundamental knowledge - ie the above numbers.
Get numbers to know what is and is not working - without speculation.

Your minimum rail voltages are pure speculation.
Experience without those standards and numbers
creates only speculation. I don't do arbitrary.

Bare faced lie with your minimum rail voltages.
 
Bare faced lie,

Snipped all the comments.

I am new to this forum and was enjoying the interchange of ideas when
Rod proceeded to make comment and not help.

Do I need to ignore anything from this person?

Now to the problem at hand....

I have worked on computers for many years.
Flaky PSU's are sometimes just that, flaky.
They do not cause a failure, or reboot until it happens.
Switching power feeds to the HD temporarily cured it.
This really indicates a PSU problem.
However, swapping out, to cure, is not very specific, as the other
helpers have noted.

Please take the time to be thorough.
YB
 
If when you say:


You specifically looked for bad/swollen capacitors, then my money's
still on the Power Supply.

M

Agreed, I'd think that more likely than the video card.
 
To be honest I haven't played with the trigger voltage but left it
on the default setting - it probably is higher than it needs to
be. I'll probably leave it that way since artificially lowering
the potential for testing would mean patching a cable and NetBSD
lacks a journaling filesystem for "real life" testing. That makes
unscheduled reboots at best a pain since at the next boot it spends
25 minutes checking the filesystem ;-)

There shouldn't be any testing necessary if the PSU is
already spec'd for a 110-220+ range, it should be expected
to run equally well already... if it didn't, it would be as
prudent to test it again at its present voltage in case it
had degraded in general, not-volt-specific, function.
 