random blue screens when warm

Craig

Hello,
My computer (a three-year-old homebuilt Athlon-based PC) just
recently started giving me random blue screens. These only seem to
occur when the PC is warm (the problem begins about half an hour after
turning on the PC, with the PC undergoing light usage: office apps, email,
etc.). The problem first exhibited itself, however, during a game of
Doom 3. After warming up, the computer is practically unusable, with
blue screens occurring immediately after logging in (and sometimes
before). If I let the computer sit for a couple of hours, all is fine
until it has warmed up again.

I've checked my CPU temps and they are the same as always...around 49
degrees C (around 53 C after processor-intensive activities). The
voltages on the power supply all appear okay according to a utility
that came with the motherboard. I've added no new hardware, nor have I
downloaded any updates recently. The problem persists even with the PC
booted into safe mode.

I'm beginning to suspect that one of my DIMMs may have gone bad, or that
there is a problem with the power supply or with the chipset on the
motherboard (although I do have it undergoing additional cooling by
offsetting the oversize fan on my CPU such that it blows onto the
northbridge chip's heatsink). But if it was the RAM, shouldn't the
problem also occur when the PC is cold? The fact that it only occurs
when warm makes me wonder about the power supply (despite the nominal
voltage reports). The chipset would be scary...no way to test this
without purchasing a new board, and if I did this I may as well scrap
the PC altogether and build a new one!

Anyway, this is a tough nut to crack, and I am wondering if anybody else
cares to shed insight into this problem.
Craig
 
Just wanted to add that I've also checked this computer for spyware and
viruses and it is clean. I think I mentioned that no updates had been
applied (at least not for a few months prior to this problem
manifesting itself).
Craig
 
Craig said:
My computer (a three-year-old homebuilt Athlon-based PC) just
recently started giving me random blue screens. These only seem to
occur when the PC is warm (the problem begins about half an hour after
turning on the PC, with the PC undergoing light usage: office apps, email,
etc.). The problem first exhibited itself, however, during a game of
Doom 3. After warming up, the computer is practically unusable, with
blue screens occurring immediately after logging in (and sometimes
before). If I let the computer sit for a couple of hours, all is fine
until it has warmed up again.
I've checked my CPU temps and they are the same as always...around
49 degrees C (around 53 C after processor-intensive activities). The
voltages on the power supply all appear okay according to a utility
that came with the motherboard. I've added no new hardware, nor have
I downloaded any updates recently. The problem persists even with
the PC booted into safe mode.
I'm beginning to suspect that one of my DIMMs may have gone bad,

Unlikely, but it's easy to test that possibility with memtest86 run overnight.

there is a problem with the power supply,
or with the chipset on the motherboard.

Or bad caps on the motherboard.

(Although I do have it undergoing additional cooling by
offsetting the oversize fan on my CPU such that it blows
onto the northbridge chip's heatsink.) But if it was the RAM,
shouldn't the problem also occur when the PC is cold?

Not necessarily. RAM doesn't usually go bad; the more common
problem is that the timing used isn't suitable for the
RAM, and that can vary with temperature.

The fact that it only occurs when warm makes me wonder
about the power supply (despite the nominal voltage reports).

Yes, glitches on the rails won't usually be visible in the voltage reports.

The chipset would be scary...no way to test this without
purchasing a new board, and if I did this I may as well
scrap the PC altogether and build a new one!

There's nothing to indicate that that's the problem.

Anyway, this is a tough nut to crack and am wondering
if anybody else cares to shed insight into this problem.

You've just got to try the possibilities, obviously with the easiest to try first.
 
Craig said:
Hello,
My computer (a three-year-old homebuilt Athlon-based PC) just
recently started giving me random blue screens. These only seem to
occur when the PC is warm

Have you tried reseating everything such as memory and cards? Maybe
something has worked loose.
 
Go check your Event Viewer first to make sure you don't
have any Application or System errors.
That's easy, and free.
The next most likely problems are overclocking (heat generation)
and possible memory or PSU faults.

You could download a memory test program and run it to
test your RAM.
Now you're on your way <g>.

www.memtest86.com/
http://www.short-media.com/download.php?d=458
 
I thought about trying a memory tester, but I suspect it will simply
blue screen on me during the test (it's consistent about blue screening
as soon as it has warmed up). Good tips about the PS rails. I'll try
reseating everything when I start to remove hardware.

The computer has been overclocked in the past (video, CPU and bus), but
at the moment it is running at stock speeds, and the memory and the
video card are actually running a bit underclocked. (The card came that
way from the manufacturer. As for the memory, I purchased
DIMMs to use on an Athlon 3200 but ended up building it with a 2500,
with the intention of overclocking to 3200 speed. It was never
entirely stable at that speed; it would blue screen on me at bad
times...like when I was winning during a game! A rarity for this old
man. :-) ) Besides, it had been running rock stable until just
recently. The problems happened suddenly, as though someone had
"thrown a switch" and turned on the blue screens. This is why I'm sure
it is a hardware failure of some sort.

Craig
 
Craig said:
I thought about trying a memory tester, but I suspect it will simply
blue screen on me during the test (it's consistent about blue screening
as soon as it has warmed up).

Use memtest86. It makes a bootable CD or floppy and, once you've booted
from the CD or floppy, it goes straight into the test, thus taking
Windows out of the equation.

It's well worth a go.

Also, have you had your hand on the CPU heatsink while the machine is
running, to see if 49 C is sensible for the actual temperature of the
chip? I just wonder if the mainboard is not seeing the true temps on the
chip.

Cheers

Alex
 
Craig said:
.... snip ...
Besides, it had been running rock stable until just
recently. The problems happened suddenly, as though someone had
"thrown a switch" and turned on the blue screens. This is why I'm sure
it is a hardware failure of some sort.

Actually, sudden changes are more often software; hardware
tends to degrade more gradually if it's still working as well
as it is/was.

Take the cover off and check all fans, and clean out dust if
warranted. Leave cover off and point a desk fan into the
system and see if it helps. Inspect capacitors for bulging
or venting.

Memory can easily become less stable if it overheats, but
it's doubtful the system temp is changing THAT much unless
you had a serious cooling problem. The PSU could be having
trouble, but more significant at this point might be
examining what those Blue Screens state... what's the error
message and stop code? It could be a great hint, in
addition to checking Event Viewer.
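
As a rough illustration of why the stop code is worth writing down, here is a minimal Python sketch that pulls the code out of a "STOP:" line copied from a blue screen and looks it up in a small table. The table is a deliberately incomplete sample of well-known Windows bug-check codes, and the helper names (describe_stop, tally) are made up for this example. A commonly repeated rule of thumb, not a guarantee, is that a different code on every crash hints at memory or other hardware, while the same code every time hints at one driver or component.

```python
import re

# Deliberately incomplete sample of Windows bug-check (stop) codes.
STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001A: "MEMORY_MANAGEMENT",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0x0000008E: "KERNEL_MODE_EXCEPTION_NOT_HANDLED",
    0x0000009C: "MACHINE_CHECK_EXCEPTION",
}

# Matches the hex code that follows "STOP:" on a blue screen.
STOP_LINE = re.compile(r"STOP:\s*(0x[0-9A-Fa-f]{1,8})")


def describe_stop(line):
    """Return (code, name) for a 'STOP:' line, or None if no code is found."""
    match = STOP_LINE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return code, STOP_CODES.get(code, "unknown (look it up)")


def tally(lines):
    """Count how often each stop-code name appears in a list of noted lines."""
    counts = {}
    for line in lines:
        found = describe_stop(line)
        if found is not None:
            name = found[1]
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Feeding tally() the lines noted down over a few crashes shows at a glance whether the machine keeps dying the same way or a different way each time.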

Don't think about "but it might blue screen" while checking
memory, just do it... besides, we mean memtest86+, not a
Windows-based memory tester that can't test nearly as much
memory.

If you have a multimeter, check and monitor voltage that way
and see if you can cause the fault, perhaps by running a gaming
benchmark like 3DMark (whichever version is appropriate for
your particular video card, so as to not be bottlenecked too
much), allowing a good load on both video and CPU.
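
To make "check the voltage" a bit more concrete, here is a small sketch that compares multimeter readings against the tolerances usually quoted for ATX supplies: plus or minus 5% on the +3.3 V, +5 V and +12 V rails, and plus or minus 10% on -12 V. Those figures are an assumption taken from commonly cited ATX guidance, so check them against the documentation for your own supply. The function names and any readings you feed in are made up for illustration.

```python
# Nominal rail voltage and allowed fractional deviation.
# ASSUMPTION: tolerances as commonly quoted for ATX power supplies.
RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def check_rail(name, measured):
    """Return True if a measured voltage is inside the rail's tolerance band."""
    nominal, tolerance = RAILS[name]
    margin = abs(nominal) * tolerance
    return nominal - margin <= measured <= nominal + margin


def report(readings):
    """Map each measured rail to 'ok' or 'OUT OF RANGE'."""
    return {
        name: "ok" if check_rail(name, volts) else "OUT OF RANGE"
        for name, volts in readings.items()
    }
```

Take one set of readings at idle and another while the benchmark is running; a rail that sits in range at idle but sags out of range under load is the interesting result. Brief glitches can still slip past a meter, so an in-range reading does not clear the PSU.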

Is it possible you have AC electrical problems, perhaps a
heater is coming on and your PSU is too anemic to cope with
this? Perhaps the system is on the floor next to a heater
duct that came on and elevated temp a lot?

I doubt your chipset is overheating if the rest of the
chassis is reasonably ventilated, especially with the CPU
fan blowing over towards it. However, some software monitor
programs may be able to show its temp. It would be good to
more closely monitor voltages and temps and, while the cover
is off as mentioned above, to recheck all mechanical
connections such as cards, cables, memory, etc.

"Rarely" it's possible for an open core CPU like an Athlon
XP-era chip to dry out its thermal compound and appear cool
enough, but to have an island of compound over the area with
the thermal diode embedded and no sinking of heat over
other areas of the CPU... so the temp looks ok but some
parts of the CPU are hotter than others. What I've just
described I would normally consider so unlikely and remote
as to be unworthy of consideration, but I had just this
situation occur a few months ago, and after pulling off the
heatsink and putting fresh compound on, it was fine thereafter.
 
Thank you to all who offered advice...I think I found the problem (and I got
lucky, too). Tonight I decided to try some troubleshooting, anticipating a
long procedure of removing, swapping and replacing hardware. Well, I began
by removing one of my three DIMMs (after letting the computer warm up and
begin blue screening). Lo and behold, I hit the jackpot on my first try! I
removed the DIMM and the computer booted just fine. I let it run for a
couple of hours, including taxing it with DOOM3 and moving some files (did
these to generate some heat and put a good load on the power supply). All
remained stable. I put the DIMM back in, rebooted and instant blue screen!
The memory stick in question is a 1GB DIMM from USModular. Allegedly it has
a lifetime warranty, but I checked the warranty terms and they dictate that
it be returned with the original receipt, original package and original
anti-static bag! (WTF?) I'm going to phone them tomorrow to see how rigid
this policy is. Computer is running fine now using my remaining memory (a
pair of 512MB modules). I installed that DIMM almost exactly a year
ago....interesting that it would fail so suddenly.
Craig
 
Craig said:
.... snip ...
Computer is running fine now using my remaining memory (a pair of
512MB modules). I installed that DIMM almost exactly a year ago.
...interesting that it would fail so suddenly.

Now is the time to check whether you can replace all your memory
with ECC modules, which in turn depends on whether or not the
chipset is ECC capable. If you had had ECC, the only symptom would
have been a slowdown, accompanied by a record in the BIOS memory of
corrections applied.
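
For anyone wondering how a single bad bit can be corrected rather than crash the machine, here is a toy sketch using a Hamming(7,4) code in Python. This is only an illustration of the principle: real ECC modules use a wider code (typically 8 check bits protecting 64 data bits, with double-error detection as well), and the function names here are made up for the example.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming(7,4) codeword.

    Codeword positions are numbered 1..7; parity bits sit at 1, 2 and 4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word):
    """Return (data_bits, position); position 0 means no error was seen."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # syndrome names the flipped position
    if position:
        c[position - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], position


stored = encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate one memory cell flipping a bit
data, fixed_at = decode(stored)
print(data, fixed_at)  # prints [1, 0, 1, 1] 5
```

The nonzero position returned by decode() is the kind of event that would end up in a corrections log while the program carries on with the right data.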
 
Craig said:
.... snip ...
I removed the DIMM and the computer booted just fine.
.... snip ...
I put the DIMM back in, rebooted and instant blue screen!

If all you did was remove the one module, it could be the
memory slot instead of the module. You might then try
putting one of the remaining modules in the slot it was in,
and/or putting that one questionable module alone in a
different slot.
 