REPOST: A7V333: no longer able to access hard disks at boot time

  • Thread starter: Chris Metzler

Chris Metzler

Hi. I posted this problem originally about three months ago, but
didn't get a response. I'm trying again, hoping someone will have
some advice for me.

My basic problem: after almost a year of working properly, the
system suddenly seems unable to boot off of either hard drive on
IDE0/1, even though the drives seem to be working perfectly when
accessed after booting through some other method.

CONFIGURATION
-------------

- A7V333-RAID w/ Athlon XP 2000, 2x Corsair 512MB CAS2 DDR333/PC2700 RAM.
- IDE0: Lite-On LTR-48125W CD-RW, WD 1200JB hard drive
- IDE1: Lite-On XJ-HD165H DVD/CD-ROM, WD 1200JB hard drive
- IDE2 (Promise): WD 800JB hard drive, WD 1200JB hard drive
- IDE3 (Promise): WD 1200JB hard drive
- Teac (?) floppy drive
- Matrox Millennium G550 video card with Viewsonic P95f+ monitor
- Creative Soundblaster Live! 5.1 sound card
- D-Link DFE-530TX+ NIC
- HP LJ1200 attached through parallel port.

The OS is located on the WD 800JB (80 GB) drive (the master on IDE2,
the first channel of the on-board Promise adapter), freeing the four
120 GB drives (one on each IDE channel) to be used as a pair of
RAID1 arrays. For faster execution, RAID functions are handled
by the OS; the on-board Promise hardware is used solely for the
extra IDE channels. The Promise *RAID* capabilities are not used
at all.

To be able to boot off CD when desired, the machine has to be
configured to boot from IDE0/1 rather than the Promise channels,
since the on-board Promise controller only allows disks to be
connected. To allow the OS to reside on a disk on the Promise
channels while still booting off IDE0/1, a bootloader is installed
into the MBR of the first 120 GB hard disk (IDE0 slave) which
points to the OS on the 80 GB drive. The BIOS is configured to
attempt to boot off, in order:

1. the CD-RW (IDE0 master);
2. floppy;
3. the first 120 GB hard disk (IDE0 slave).

When it attempts to boot off the hard disk, it loads and executes
the bootloader which loads the OS off the 80 GB drive on IDE2.

I've had this system configured as described since early last
November, working flawlessly under heavy load for a workstation,
including numerous cold boots (after e.g. shutting down for a
thunderstorm) and some warm boots (after e.g. Linux kernel
recompiles). Not one crash or glitch of any sort, ever.


THE PROBLEM
-----------

I'm no longer able to boot from the hard drives on IDE0 or IDE1.

The POST finds the drives OK: they're listed in the opening
screen after memory is counted; and entering the BIOS setup screen
shows all four devices on IDE0/1 just fine. So it's not as if
it doesn't see the drives at all. But when it comes time to
read from the MBR, trouble. If the boot sequence is left as
above, and there are no bootable media in the CD or floppy drives,
then the system hangs after trying the floppy drive, with the
floppy drive light on; a hard reset is necessary. If the boot
order is changed so that the hard drive comes first, then the
system hangs immediately upon starting to try to boot, and again
a hard reset is necessary. No error messages of any sort -- just
a hang. Disabling CD and/or floppy booting doesn't change this;
the boot still hangs when it gets to trying to boot from the hard
drive.

However, booting off the CD or a floppy is possible. In fact, I
can boot the bootloader that would normally be in the first hard
drive's MBR off a floppy, and use *that* to access the OS on IDE2
master without any problem. It's just like things would normally
work, except the first step of the boot process goes through the
floppy rather than the IDE0 slave like it used to.

At this point, I might normally guess that there's some sort of
problem with the drive. However, the OS doesn't use BIOS routines
to access disks; it uses its own, entirely. Once the OS is booted,
I'm able to mount and mess with *all* disks, including the disk
in question. I've done hours of tests on it, filling it up and
doing compares and so forth. The disk works perfectly through the
OS; there's only a problem when attempting to access the MBR through
the BIOS at boot-time.

Another hint that it's not a problem with the disk is that I
installed the bootloader into the MBR of the *other* hard drive
on IDE0/1, the IDE1 slave, and changed the BIOS setting to boot
off that drive instead of the IDE0 slave. Exactly the same thing
happened -- system hang when it came time to boot off that disk.

The fact that there are no error messages at all made me suspicious
that maybe the MBR simply got munged somehow. So I re-installed
the bootloader into the MBR, more than once. No effect.

It is as if the BIOS can no longer read from the drives, or at least
can no longer read from their MBRs. But how can the BIOS get munged
in such a way as to work perfectly in every way *except* in trying
to boot off a hard drive?

I've spent the last three months booting off a floppy as a result
of this, and I'd really like to fix this problem if I can.

Any advice or suggestions would be welcomed.

Thanks.

-c
 
Chris..
No wonder you didn't get any replies - you have a problem you don't see all
that often, but you seem to have thought it through to a logical conclusion.
You didn't mention the OS..... XP? WIN2K?
In XP there are 3 commands you can use at the recovery console command line.
BOOTCFG (/REDIRECT or /REBUILD), FIXBOOT and FIXMBR
I would use BOOTCFG with no switches to see where it's trying to boot from,
then use FIXMBR if all is as it should be.
To access the drive on the RAID controller from the RC you might need the
Promise driver loaded (F6 on bootup).
My first thought would be a corrupt MBR....... but let us know the real
answer....

JT.
 
Could it be related to the settings of the disk detection in the
BIOS "Main" section ? Most people set these to "Auto", but perhaps
you've done something different ? Maybe the geometry detected
for the disks is somehow screwed up.

When the BIOS detects hardware, it stores information in the ESCD
section of the flash chip. This seems to be a cache of hardware
settings. The ESCD gets updated whenever new hardware is detected.
Possibly, the ESCD would get wiped after flashing the BIOS (even
if you flash with the same version of the BIOS you are currently
using). This is because the ESCD is just a section of the flash,
and the flash image you download from Asus contains an empty ESCD,
that would write over the current ESCD section. Maybe that would
be enough to get the BIOS to reenumerate the hardware. (I don't know
if any disk geometry info is stored in the ESCD or not - my point in
explaining this, is that the use of "caches" in hardware can lead to
modal behavior that is hard to explain otherwise. Even using the
"clear the CMOS" procedure doesn't result in the ESCD being
updated.)

Note that a backup of the current BIOS, saved to a file, will have
different contents than the same version of BIOS freshly downloaded
from Asus. So, if you find that flashing with an Asus BIOS results
in a worse mess than you have currently, flashing back to a backup
copy of your current BIOS should return you to where you were.

Also, when flashing BIOS on the A7V family, you might want to check
a7vtroubleshooting.com, as there are comments there about boards
dying if flashing from one particular version of BIOS to another.
Just in case you decided to upgrade to a newer version of BIOS.

http://www.a7vtroubleshooting.com/info/bios/index.htm#333

Good luck and be careful. Messing around with disks with live data
on them is asking for trouble. My own procedure for this kind of
work is a full backup before touching anything. (I'm lucky in that
I only use a single drive on a computer, and keep a backup drive
powered down and disconnected during normal use of the computer.)
That way, if something gets wedged while playing, I can recover
using the backup image.

HTH,
Paul
 
Experience hints that this is not a BIOS problem but a MBR glitch - an NT
snafu that can drive you bananas.
I'll second the backup strategy though, but don't spend too much time
looking in or upgrading the BIOS.

JT
 
"John Tindle" said:
Experience hints that this is not a BIOS problem but a MBR glitch - an NT
snafu that can drive you bananas.
I'll second the backup strategy though, but don't spend too much time
looking in or upgrading the BIOS.

JT

Do you know of any way to examine the contents of the MBR ? Because
he's already tried to repair the MBR to no effect. I hate stuff you
cannot debug. (Hmm. I wonder if you can load an MBR that just prints
something on the screen :-)
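
One way to look at an MBR from the running OS is to read the first
512-byte sector of the disk and decode its fixed layout: boot code in
the first 446 bytes, four 16-byte partition entries starting at offset
446, and the 0x55 0xAA signature in the last two bytes. A minimal
sketch in Python follows; the function names read_mbr and parse_mbr
are made up for illustration, and the device path mentioned afterwards
is an assumption.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # partition table starts right after the boot code
ENTRY_SIZE = 16      # four entries of 16 bytes each


def read_mbr(path: str) -> bytes:
    """Read the first sector of a disk device or image file (read-only)."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)


def parse_mbr(sector: bytes) -> dict:
    """Decode the signature and partition table of a classic MBR."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + ENTRY_SIZE * i
        entry = sector[start:start + ENTRY_SIZE]
        # Layout: status, 3 bytes CHS start, type, 3 bytes CHS end,
        # 32-bit LBA of first sector, 32-bit sector count (little-endian).
        status, ptype, lba_start, count = struct.unpack("<B3xB3xII", entry)
        partitions.append({
            "bootable": status == 0x80,
            "type": ptype,
            "lba_start": lba_start,
            "sectors": count,
        })
    return {
        "signature_ok": sector[510:512] == b"\x55\xaa",
        "boot_code_empty": not any(sector[:TABLE_OFFSET]),
        "partitions": partitions,
    }
```

On a Linux system of this era the IDE0 slave would typically show up
as /dev/hdb (an assumption about the naming on this machine), so
parse_mbr(read_mbr("/dev/hdb")) run as root would show whether the
signature and partition table look sane. The sketch only reads; it
writes nothing to the disk.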

I guess my thinking was, that booting from the floppy was using an OS
routine to access the disk, while booting from the hard drive was using
the BIOS for at least a short period of time. So, either the IDE operation
is asking for too large a group of sectors, or maybe the geometry doesn't
match, and instead of the MBR, something entirely different is being
fetched.
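
The geometry idea can be made concrete with the standard CHS-to-LBA
formula, LBA = (C * heads + H) * sectors_per_track + (S - 1). The
sketch below is only an illustration; the function name and the two
sample geometries (16 and 255 heads, 63 sectors per track) are
assumptions, not values read from this machine. Note that cylinder 0,
head 0, sector 1 maps to LBA 0 under any geometry, so a mismatch would
more plausibly affect the sectors the boot code loads next than the
MBR read itself.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int, sectors_per_track: int) -> int:
    """Translate a CHS address to a linear block address for a given geometry."""
    if not 0 <= head < heads:
        raise ValueError("head out of range for this geometry")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("CHS sector numbers start at 1")
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


# The MBR address is the same block under both geometries ...
mbr_small = chs_to_lba(0, 0, 1, heads=16, sectors_per_track=63)
mbr_large = chs_to_lba(0, 0, 1, heads=255, sectors_per_track=63)

# ... but the same CHS triple further into the disk is not.
next_small = chs_to_lba(1, 0, 1, heads=16, sectors_per_track=63)
next_large = chs_to_lba(1, 0, 1, heads=255, sectors_per_track=63)
```

Here mbr_small and mbr_large are both 0, while next_small is 1008 and
next_large is 16065, which is the kind of disagreement that would make
a boot stage fetch something entirely different from what was
installed.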

Maybe posting the problem in one of the storage or Linux groups would
tweak some memories ?

Paul
 
I have a floppy boot disk for W2k that has got my system going in the past
when I couldn't boot from the HDD.

It seems to have NTLDR and a few other things on it.

Could it be these startup files that are dodgy?

I think I got the info for the startup disk either from MS KB or one of
those boot disk internet sites. It was free anyway.

the_gnome
 
It would help to know if Chris' OS is XP or NT/WIN2K.
NT/2K uses a different set of loaders than XP. ie no boot.ini or NTLDR.
You *can* make a backup or copy of the MBR, but you need a utility to do
it.
DiskPro was useful as is Harddrive Mechanic...
Symantec has a tool called BOOT_ALL that copies the MBR to a boot.dat file
for analysis. I guess Chris could do that to verify that there is no
undetected virus...
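
Copying the MBR to a file does not strictly need a dedicated utility
if some OS can already read the disk. The Python sketch below is a
stand-in for the idea, not a description of how BOOT_ALL or the other
tools work; the function name and file names are hypothetical. It
saves the first sector to a file and returns a SHA-256 digest, so two
dumps (say, before and after re-installing the bootloader) can be
compared.

```python
import hashlib


def dump_first_sector(source: str, dest: str, size: int = 512) -> str:
    """Copy the first `size` bytes of `source` to `dest`; return their SHA-256."""
    with open(source, "rb") as src:
        data = src.read(size)
    if len(data) != size:
        raise ValueError("source is shorter than one sector")
    with open(dest, "wb") as out:
        out.write(data)
    return hashlib.sha256(data).hexdigest()
```

If two dumps of the same disk give different digests, something
rewrote the sector in between; identical digests only show the bytes
did not change, not that they are correct.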

JT
 