Weird HD problem.


George Macdonald

Again, I'd rather avoid .storage aggro:-) but this one has me baffled: I
have a K8 system (Athlon64 3500+ on an MSI K8N Neo4-F) which is used as a
File/Print/DNS server with a swappable RAID-1 setup on a Promise Fasttrak100
TX2 PCI card. This was a hot-swappable RAID kit sold by Promise, called the
Fasttrak100 TX2 Pro
http://www.promise.com/product/product_detail.asp?product_id=4 and it's
been working fine for 4-5 years now. I bought two complete kits so I'd
have four drawers and a spare controller.

On Saturday, one of the RAID array drives started dropping out of "ready",
triggering a rebuild, followed by repeating "drop-outs" and rebuilds... so
Sunday I trudged off to the office to err, fix it.:-) We have four
identical Seagate UDMA-100 drives and one of the active pair gets swapped
once per week.

With the swappable drawers/enclosures etc. it took me a while to boil this
down to the "faulty part" but here's the essence of it: the drive which
failed to rebuild reports a bad sector for every single sector under the
Seagate Seatools diagnostics when connected to the Promise Fasttrak100 TX2
PCI card; the same drive passes OK when connected to the motherboard
IDE-PATA connector.

This is with the same IDE cable -- tried with several 80-conductor
cables, including two fresh ones -- connecting the drive directly to the
IDE ports, i.e. without the enclosures. All the other three drives pass
diagnostics, and work fine in real use, when connected to the Fasttrak100
TX2; I also tried a different Fasttrak100 TX2 PCI card, with same results.

So, the bottom line is that this one drive just doesn't work right when
connected to the Fasttrak100 TX2 but is OK when connected to the mbrd
(nForce4) IDE-PATA connector. I don't understand this - how can that be?
AFAIK the Seagate diags only read from the drive so it would not appear to
be a signal cable/connector thing... how can the controller affect whether
the drive gets bad sector reads? I guess I should check full operation of
the drive on the mbrd IDE connection but dunno if I have the time or
motivation.

I'm going to get a new drive of course but... anybody with some wisdom
here?
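
The swap tests described above can be tabulated to see which component is common to every failure. What follows is only a minimal Python sketch of that reasoning: the labels (`drive_A`, `tx2_card_1` and so on) are made-up names, and listing the three good drives against the first TX2 card is an assumption, since the post does not say which card they were tested on.

```python
# Swap-test results as described in the post. The labels (drive_A..drive_D,
# tx2_card_1, tx2_card_2, nforce4_onboard) are made-up names for illustration.
# Assumption: the three good drives are listed against the first TX2 card;
# the post does not say which card they were tested on.
OBSERVATIONS = [
    ("drive_A", "tx2_card_1", "fail"),
    ("drive_A", "tx2_card_2", "fail"),
    ("drive_A", "nforce4_onboard", "pass"),
    ("drive_B", "tx2_card_1", "pass"),
    ("drive_C", "tx2_card_1", "pass"),
    ("drive_D", "tx2_card_1", "pass"),
]


def common_factors(observations):
    """Find the drive and/or controller shared by every failing test."""
    failures = [(d, c) for d, c, result in observations if result == "fail"]
    fail_drives = {d for d, _ in failures}
    fail_controllers = {c for _, c in failures}

    # A component is "common" only if it is the single one seen in all failures.
    drive = next(iter(fail_drives)) if len(fail_drives) == 1 else None
    controller = (
        next(iter(fail_controllers)) if len(fail_controllers) == 1 else None
    )

    # Does the common drive pass anywhere? If so it is marginal, not dead.
    drive_passes_elsewhere = any(
        d == drive and result == "pass" for d, _, result in observations
    )
    return {
        "common_drive": drive,
        "common_controller": controller,
        "drive_passes_elsewhere": drive_passes_elsewhere,
    }


summary = common_factors(OBSERVATIONS)
print(summary)
```

With these entries the one drive is the only component present in every failure, no single controller card is, and the drive still passes on the onboard port. That is consistent with a marginal drive rather than a dead one, though it does not say why the TX2 in particular exposes it.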
 
So, the bottom line is that this one drive just doesn't work right when
connected to the Fasttrak100 TX2 but is OK when connected to the mbrd
(nForce4) IDE-PATA connector. I don't understand this - how can that be?

I'm assuming that you tried moving the drive to different ports on the
controller. Marginal logic levels or timing would be my guess. Termination
on the drive or controller could also play a factor.
 
So, the bottom line is that this one drive just doesn't work right when
connected to the Fasttrak100 TX2 but is OK when connected to the mbrd
(nForce4) IDE-PATA connector. I don't understand this - how can that be?
AFAIK the Seagate diags only read from the drive so it would not appear to
be a signal cable/connector thing... how can the controller affect whether
the drive gets bad sector reads? I guess I should check full operation of
the drive on the mbrd IDE connection but dunno if I have the time or
motivation.

I'm going to get a new drive of course but... anybody with some wisdom
here?

I suspect this to be a promise problem, especially if it works off the
onboard controller. I would swap the controller, if you have one on
hand. Then we would know for sure, but Promise has not been known for
their quality as of late.

Gnu_Raiz
 
I suspect this to be a promise problem, especially if it works off the
onboard controller. I would swap the controller, if you have one on
hand. Then we would know for sure, but Promise has not been known for
their quality as of late.

I was afraid the post was maybe a bit too long:-):
"I bought two complete kits so I'd have four drawers and a spare
controller." &
"I also tried a different Fasttrak100 TX2 PCI card, with same results."

AFAICT none of the current add-in HD controller cards seem to be getting
good marks on quality.:-(
 
I'm assuming that you tried moving the drive to different ports on the
controller. Marginal logic levels or timing would be my guess. Termination
on the drive or controller could also play a factor.

Yep, different port *and* a different controller of the same model purchased
at the same time, in the same PCI slot. I think you may be right about the
signal margins. Since the drive had worked fine for 3 years or so on this
same controller, I'm inclined to suspect a possible degrading of the drive's
internal margins, in combination with a noisy controller or PCI Bus on the
MSI mbrd.
 
George Macdonald said:
So, the bottom line is that this one drive just doesn't work right when
connected to the Fasttrak100 TX2 but is OK when connected to the mbrd
(nForce4) IDE-PATA connector. I don't understand this - how can that be?

Have you tried a different mobo or PSU? It sounds like the TX2 doesn't
produce adequate signals, and that would be a really bad goof.

But the TX2 won't work well if it doesn't get enough power, or
if the power is dirty (noisy). When caps fail closed, the results
are spectacular. But how do you know when they fail open?

-- Robert
 
While I can't pinpoint your problem, I recently learned for myself that
MSI mobos and Promise controllers don't play well together. But in your
case... either the drive or the controller is marginal. Or both...
With drives, especially UDMA-100, so cheap these days, just replace it
and don't worry.
NNN
 
Have you tried a different mobo or PSU? It sounds like the TX2 doesn't
produce adequate signals, and that would be a really bad goof.

Both PSU and mbrd are relatively new - mbrd close to a year and the Antec
500W PSU about 6 months. I tried to use a different system but it was a
VIA chipset-based mbrd and the bloody diags would not run on it.<sigh> By
that time, after several hours, I needed to get a working server.

I don't see how the TX2 signal strength is an issue - the diag is reporting
"bad sector" for every single sector read, which I interpret as meaning the
hdd electronics are getting ECC errors reading the platter, which cannot be
corrected.

But the TX2 won't work well if it doesn't get enough power, or
if the power is dirty (noisy). When caps fail closed, the results
are spectacular. But how do you know when they fail open?

Always a possibility but the other disks in the set of 4 work fine. The
Promise SuperSwap enclosures have power/fan monitors which all show normal,
with voltage a little high, if anything... though that may not mean much.

I don't know if I'm going to have time to pursue this any further; ATM I'm
treating it as a hard disk which has gone "bad", if only marginally, and will
replace it.

My current thinking is that either the controller or mbrd PCI Bus/slot has
a higher level of noise on signals than the on-board nVidia controller...
though I'm not sure if/how that noise can get through to the hard drive's
head amplification circuits to corrupt them. The TX2 has one empty PCI
slot between it and the PCI-E video card -- maybe close enough for some
kind of cross-talk? -- and the IDE cables are not long enough to go an
extra slot away.:-(
 
George Macdonald said:
Always a possibility but the other disks in the set of 4
work fine. The Promise SuperSwap enclosures have power/fan
monitors which all show normal, with voltage a little high,
if anything... though that may not mean much.

Do the other disks still work fine when plugged into that controller port?

My current thinking is that either the controller or mbrd
PCI Bus/slot has a higher level of noise on signals than
the on-board nVidia controller... though I'm not sure
if/how that noise can get through to the hard drive's
head amplification circuits to corrupt them. The TX2 has
one empty PCI slot between it and the PCI-E video card --
maybe close enough for some kind of cross-talk? -- and the
IDE cables are not long enough to go an extra slot away.:-(

Crosstalk is over ~1mm, not 2+cm. Long EIDE cables may be
a problem. Too straight may cause crosstalk.

-- Robert
 
Do the other disks still work fine when plugged into that controller port?

Yep, and the bad drive also fails on a different TX2 of the exact same model.
I'm also beginning to wonder about the accuracy of HDD diags now.

Crosstalk is over ~1mm, not 2+cm. Long EIDE cables may be
a problem. Too straight may cause crosstalk.

Yeah, well, with "some kind of" I was figuring possibly RF, or maybe even noise
induced through proximity of PCI and PCI-E traces... considering the quite
different frequencies of PCI & PCI-E. The EIDE cables are the Promise ones
which came with the TX2 and are standard length - 18"?

This whole thing has me wondering about the MSI mbrds - first sign of
possible problems with them but I dunno what else to look at -- Asus maybe
-- *and* I need to build a new Intranet Web Server. I wish some of the Web
sites would just use a spectrum analyzer on the motherboards instead of all
this subjective testing... but then that would probably put them out of
business.:-)
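
On the doubt about the accuracy of the HDD diags: one way to cross-check a vendor tool is an independent read-only pass over the drive on each controller, noting where the OS reports read errors. The Python below is a rough sketch of that idea, not a replacement for Seatools. The 512-byte sector size is an assumption, the path you pass in is whatever raw device node your OS exposes (that usually needs admin rights), and reads can be served from the OS cache, so results are only indicative.

```python
import os

SECTOR = 512  # bytes per logical sector; an assumption, adjust if needed


def scan_readonly(path, chunk_sectors=128):
    """Read `path` from start to end without writing anything.

    Returns (bytes_read_ok, bad_offsets), where bad_offsets holds the byte
    offset of the start of each chunk whose read raised an I/O error.
    """
    chunk = SECTOR * chunk_sectors
    good_bytes = 0
    bad_offsets = []
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk, size - offset)
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                data = os.read(fd, want)
            except OSError:
                # The OS reported a failed read: record it, skip the chunk.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of device/file
            good_bytes += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return good_bytes, bad_offsets
```

Running it against the same drive once on the TX2 and once on the onboard port, then comparing the two `bad_offsets` lists, would show whether the errors follow the controller or stay at the same places on the disk. Errors are only located to the nearest chunk, so use a smaller `chunk_sectors` for finer detail.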
 