Thanks Paul,
Things have drifted into the twilight zone since your last instructions.
Actually the problems all started when my system started crashing with
kernel panics. It said that the OS (Debian GNU/Linux) was trying to kill
the idle process and so naturally it shuts down. I noticed buried in
the listing of the register contents and other messages that there was a
paging error so I guessed that the hard drive had gone bad. So I got a
new hard drive (90 GB Hitachi) and when I installed it I didn't bother
to turn off the machine and so the problems you are responding to
started. I did as you suggested. The CD-ROM was on its own cable plugged
into the secondary IDE interface. I found an old hard drive as you
suggested and disconnected the CD-ROM and hard drive and connected the
old hard drive to the secondary IDE and booted. It was detected
normally. So I figured the CD-ROM was bad but I reconnected the old
drive to the primary IDE and the CD-ROM to the secondary IDE and lo
and behold the machine booted to an old Debian installation disk I had
in the CD-ROM. So I reconnected the new hard drive and then when I
booted not even the POST would start. I disconnected the new hard
drive and put the old one back on and this time it got hung
up on the POST at the PNP init step. I reset about 20 times and
sometimes it would boot but mostly it would hang at some stage
of the POST, either at PNP init or after it detected the IDE devices.
Sometimes it would boot normally. Anyway I let it boot one time to
the Debian installation disk and then started the installation process
and after it loaded the kernel it went into a kernel panic saying
that the OS had tried to kill the idle process and I
got all those same messages that I mentioned at the start of this
post. I know that was not the fault of the hard drive. So now
I am wondering if the CPU (AMD K6-2/500) is bad. Any thoughts
would be appreciated if you can make sense of any of this.
P.S. I forgot to mention that on one of the resets the
beep code went long short short short so I reseated the
video card and that fixed that (again).
About all I can suggest is working methodically, starting with a
minimal system and working up. Even doing that, ad-hoc testing
using small utilities doesn't guarantee that you can find the
exact thing wrong with the system. Even in large systems, sometimes
you can only isolate to the nearest three pieces of hardware.
Have you looked at the power monitor BIOS page lately?
It can tell you the voltages coming from the PS. Every
time you try a new hardware configuration, drop into that
page and check that the power supply is happy. The label
on the power supply should have limits on it (like +/-5% for
each output) and you should check the measured values to
see if the power is good. (If you have a spare power supply,
popping it into the case at this point wouldn't be a bad
idea.)
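The tolerance check described above can be sketched in a few lines of
Python. This is only an illustration: the 5% figure is the example
tolerance mentioned above (use whatever the label on your supply says),
the readings are made-up numbers standing in for values copied by hand
from the BIOS page, and the function names are hypothetical.

```python
def rail_status(nominal, measured, tolerance=0.05):
    """Return (low_limit, high_limit, ok) for one power supply output."""
    margin = abs(nominal) * tolerance  # abs() so negative rails work too
    low, high = nominal - margin, nominal + margin
    return low, high, low <= measured <= high


def check_readings(readings, tolerance=0.05):
    """readings maps nominal voltage -> value read from the BIOS page."""
    return {nominal: rail_status(nominal, measured, tolerance)[2]
            for nominal, measured in readings.items()}


if __name__ == "__main__":
    # Hypothetical readings, typed in by hand from the BIOS monitor page.
    readings = {3.3: 3.28, 5.0: 4.70, 12.0: 12.30}
    for nominal, measured in readings.items():
        low, high, ok = rail_status(nominal, measured)
        verdict = "OK" if ok else "OUT OF RANGE"
        print(f"{nominal:+.1f} V rail: measured {measured:.2f} V, "
              f"allowed {low:.2f} to {high:.2f} V, {verdict}")
```

With these sample numbers only the +5 V rail is flagged, since 4.70 V is
below the 4.75 V lower limit.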
As far as an orderly test sequence goes, you really need
a working hard drive to do some of the testing. A plug-in
PCI IDE card (one rated to use ATA or ATAPI devices, so
both hard drives and CD-ROMs can be connected) could be
used to replace the ailing IDE interfaces on the
Southbridge. But of course, this will require drivers
being installed to boot from the card.
Test sequence:
1) Memtest via floppy. If memory errors are printed to
the screen, and the memtest program doesn't crash, then
there could be a memory problem.
2) 100% CPU load test. Run a logging program in the background
at the same time, to record voltages as monitored
by the power monitor chip. This implies the use of an operating
system, to have two or more tasks running at the same time.
The CPU, memory, and power supply will be stressed. You
are monitoring the power supply, the memory has been tested,
and this leaves the CPU. At this point, computation errors
could be caused by a flaky FSB (bus between CPU and Northbridge)
or the CPU itself. The CPU Vcore can also be weak, but in the
P5A case, I don't see a monitor capability for Vcore, so I would
need a cheap multimeter from Radio Shack to check it.
3) Video card test. 3D operations draw the most power on a video
card, and get more of the video chip doing calculations.
If artifacts are seen on the screen during testing, this
could be the video chip or the video memory chips. If the
system crashes during a 3D test, then the AGP bus from
Northbridge to video card is implicated (as the CPU and its
bus have already been tested).
4) With the core of the system tested, the rest of the tests
will be of peripherals. A simple sequential write test to
the disk, followed by read-back, is enough to check for surface
integrity of the disk itself, plus the large amount of
traffic over the IDE ribbon cables helps test the IDE bus
interface pins and DMA transfer hardware. I don't consider
bashing the drive with repetitive seeks to be very useful
(so-called access time testing).
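The sequential write and read-back in step 4 could be improvised
along the following lines. This is a rough sketch, not a substitute for
a proper drive test: the chunk size, the SHA-256 based pattern, and the
function names are all assumptions made for the example. Note that the
operating system may serve the read-back from its cache rather than
from the disk, so for a real run the file should be much larger than
the installed RAM.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024  # 1 MiB per write; an arbitrary choice for this sketch


def pattern(index, size):
    """Reproducible pseudo-random data for chunk number `index`."""
    block = hashlib.sha256(index.to_bytes(8, "big")).digest()  # 32 bytes
    return (block * (size // len(block) + 1))[:size]


def chunk_sizes(total_bytes, chunk=CHUNK):
    """Split total_bytes into full chunks plus one short tail chunk."""
    full, rest = divmod(total_bytes, chunk)
    return [chunk] * full + ([rest] if rest else [])


def write_pattern(path, total_bytes, chunk=CHUNK):
    """Sequentially write the pattern and ask the OS to flush it out."""
    with open(path, "wb") as f:
        for i, size in enumerate(chunk_sizes(total_bytes, chunk)):
            f.write(pattern(i, size))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, total_bytes, chunk=CHUNK):
    """Sequentially read the file back; return the indices of bad chunks."""
    bad = []
    with open(path, "rb") as f:
        for i, size in enumerate(chunk_sizes(total_bytes, chunk)):
            if f.read(size) != pattern(i, size):
                bad.append(i)
    return bad


if __name__ == "__main__":
    # Small demonstration run in the temp directory. For real use, point
    # `path` at the drive under test and raise `total` well above RAM size.
    total = 8 * CHUNK
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_pattern(path, total)
        print("bad chunks:", verify_pattern(path, total))
    finally:
        os.remove(path)
```

Regenerating each chunk from its index means the expected data never
has to be held in memory, so the test file can be as large as the
drive allows.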
I don't have test programs to suggest for every operating
system out there, so you'll have to search around and
improvise. Of the few commercial computer test programs
I've used, the only one that impressed me was one that came
with a Sun computer. The personal computer test programs
aren't very verbose in their test interface, so you cannot
tell how thorough they really are. (Inserting faults in
a computer, and then seeing whether a test program can find
them, is the only way to be certain about a commercial testing
program. A glossy GUI is a telltale sign the program is crap,
as all the software effort should go into testing, with an
indication in each test case, as to what has been tested.)
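The fault-insertion idea can be shown with a toy model. The sketch
below is purely illustrative and does not touch real hardware: the
FaultyMemory class, the two test routines, and the choice of a
stuck-at-zero bit as the injected fault are all assumptions made for
the example. It injects every possible single-bit fault in turn and
counts how many each test routine catches.

```python
class FaultyMemory:
    """Toy memory model with an optional stuck-at-zero bit (the fault)."""

    def __init__(self, size, stuck_addr=None, stuck_bit=0):
        self.cells = bytearray(size)
        self.stuck_addr = stuck_addr
        self.stuck_mask = 0xFF ^ (1 << stuck_bit)

    def write(self, addr, value):
        if addr == self.stuck_addr:
            value &= self.stuck_mask  # the faulty bit can never be set
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


def zero_fill_test(mem, size):
    """Weak test: writes only zeros, so a stuck-at-zero bit is invisible."""
    for addr in range(size):
        mem.write(addr, 0x00)
    return all(mem.read(addr) == 0x00 for addr in range(size))


def complement_test(mem, size):
    """Stronger test: 0x55 then 0xAA, so every bit is driven to 1 once."""
    for value in (0x55, 0xAA):
        for addr in range(size):
            mem.write(addr, value)
        if any(mem.read(addr) != value for addr in range(size)):
            return False
    return True


def detection_rate(test, size=64):
    """Inject each single stuck-at-zero fault; return the fraction caught."""
    caught = 0
    total = 0
    for addr in range(size):
        for bit in range(8):
            total += 1
            if not test(FaultyMemory(size, addr, bit), size):
                caught += 1
    return caught / total


if __name__ == "__main__":
    print("zero fill :", detection_rate(zero_fill_test))   # prints 0.0
    print("complement:", detection_rate(complement_test))  # prints 1.0
```

Both routines report "pass" on healthy memory, so only the injected
faults reveal that the first one is testing almost nothing.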
HTH,
Paul