drive keeps having partition problems

mechphisto

I have a friend with a 250GB Seagate Barracuda 7200.10 SATA (in a Win
XP Pro machine) that's throwing weird partition issues at the drop of
a hat.

Last week, after the PC crashed in a game, the entire drive (3 primary
NTFS partitions) became inaccessible. Partition Magic 8 showed it as
one errored partition. I could get data off any of the three
partitions with a partition recovery program, but not actually restore
the partitions themselves. I finally had to 0-out the drive and re-
partition.

Now, yesterday, during a power surge in a rain storm (he really does know
better than that), the PC powered down, and when it came back, the first
(active) partition (this time FAT32) was showing as unformatted! (At
least the other 2 partitions are still there and Partition Magic can
"see" them.)

CHKDSK, Partition Magic's tests, and other drive-checking programs
all find no errors with the drive. (Well, once it's partitioned
and formatted.)
Any ideas?
Thanks,
Liam
 
Potential reasons:

Driver or hardware issues that cause transient problems and
prevent timely write-back of information (unlikely) or cause
writes to the wrong areas. It may be faulty RAM. It may be
a faulty cache entry or the like.

What would be interesting is whether there is any other
data corruption and what its exact form is (e.g.
is there an all-zero sector or is some other data in it).

Have you looked at the SMART attributes (any other test is
really quite meaningless today) and run a long SMART selftest?

Arno
 
Hmm, how does one look at SMART attributes?
I normally have SMART off at the BIOS (have read SMART can just cause
more issues than it's worth oftentimes.) If I turn it on, what tool
can I use to view the attributes?

I can do another 0'ing out of the drive. Once I do that, how can I
then check to see if indeed it wrote all 0's?

Thanks for the reply!
Liam
 
Hmm, how does one look at SMART attributes?
I normally have SMART off at the BIOS (have read SMART can just cause
more issues than it's worth oftentimes.) If I turn it on, what tool
can I use to view the attributes?

I use the smartmontools (commandline, available on Linux
and windows). There is also a tool called "Everest", that
allows SMART access. And to just see the attributes, you can
use the current SpeedFan.

I can do another 0'ing out of the drive. Once I do that, how can I
then check to see if indeed it wrote all 0's?

Run a long SMART selftest. It does a complete surface scan.

Arno
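
A long self-test checks that the surface can be read, not what the sectors contain. To confirm directly that a zero fill took, one option is to read the drive back and look for any non-zero byte. A minimal Python sketch, assuming the raw disk can be opened like a file (for example /dev/sdb from a Linux live CD; that path is a placeholder, and the function name is made up for illustration):

```python
import io

def first_nonzero_offset(stream, chunk_size=1024 * 1024):
    """Return the byte offset of the first non-zero byte, or None if all zero."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None  # reached the end and saw nothing but zeros
        # Stripping leading zero bytes leaves the first non-zero byte in front.
        stripped = chunk.lstrip(b"\x00")
        if stripped:
            return offset + (len(chunk) - len(stripped))
        offset += len(chunk)

# Small in-memory demonstration instead of a real disk:
blank = io.BytesIO(bytes(4096))
dirty = io.BytesIO(bytes(1000) + b"\x55" + bytes(3095))
print(first_nonzero_offset(blank))  # None
print(first_nonzero_offset(dirty))  # 1000
```

For a real drive the stream would come from something like open("/dev/sdb", "rb"), run with enough privileges to read the raw device. Expect a full pass over 250GB to take a while.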
 
Head over to Seagate's web site and get the diagnostics for the hard
drive.

http://www.seagate.com/www/en-us/support/downloads/seatools/

Until the drive has been tested and found to be "ok" do not use this
drive at all. It sounds like it is failing.
 
smlunatick said:
Head over to Seagate's web site and get the diagnostics for the hard
drive.

http://www.seagate.com/www/en-us/support/downloads/seatools/

Until the drive has been tested and found to be "ok" do not use this
drive at all. It sounds like it is failing.

Agreed, and for good measure, replace the sata cable. I've just thrown
out yet ANOTHER sata cable that was causing intermittent issues where
the drive would become inaccessible.

Ari

--
spammage trappage: remove the underscores to reply
Many people around the world are waiting for a marrow transplant. Please
volunteer to be a marrow donor and literally save someone's life:
http://www.abmdr.org.au/
http://www.marrow.org/
 

A drive not being accessible is one thing. Erasure of all partitions is
another, and seems to indicate a big problem, since the partitions are
gone but the drive is still connected (the BIOS may show it).
 
Easy Recovery Pro found no drive errors, including surface scan, but
it did find partition errors. (And advised to do a drive test. Hrrm.)
The Everest SMART shows OK and "passes" on all lines for the drive.
The smartctl is a bit complicated... I used "-t long" and never got a
report.
But the -H switch indicates overall self-assessment test is PASSED.
Not sure what else I can check.
I'm going to reformat the lost partition and put the OS back on...and
then test again and see if anything changes.
 
Easy Recovery Pro found no drive errors, including surface scan, but
it did find partition errors. (And advised to do a drive test. Hrrm.)
The Everest SMART shows OK and "passes" on all lines for the drive.

That doesn't prove anything; post the actual report.
 
Easy Recovery Pro found no drive errors, including surface scan, but
it did find partition errors. (And advised to do a drive test. Hrrm.)
The Everest SMART shows OK and "passes" on all lines for the drive.
The smartctl is a bit complicated... I used "-t long" and never got a
report.

But the -H switch indicates overall self-assessment test is PASSED.
Not sure what else I can check.
I'm going to reformat the lost partition and put the OS back on...and
then test again and see if anything changes.

The "pass" on the SMART attributes is sometimes over-optimistic. Can
you post the output from "smartctl -a <device>" here?

Arno
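
For what it's worth, "-t long" only starts the self-test inside the drive and returns right away, which would explain never getting a report. The result shows up later in the self-test log ("-l selftest") and in the "-a" output. A rough Python sketch of the sequence, assuming smartmontools is installed and smartctl is on the PATH; the device name /dev/sda is a placeholder and the helper names are made up:

```python
import subprocess

def smartctl_command(option_args, device):
    """Build the argument list for one smartctl call (nothing is executed here)."""
    return ["smartctl", *option_args, device]

DEVICE = "/dev/sda"  # placeholder; substitute the actual drive

STEPS = [
    smartctl_command(["-t", "long"], DEVICE),      # start the extended self-test
    smartctl_command(["-l", "selftest"], DEVICE),  # later: read the self-test log
    smartctl_command(["-a"], DEVICE),              # full report, the one to post
]

def run(command):
    """Run one smartctl command and return whatever it printed."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout

# Only print the commands here; call run(step) on a machine with the drive attached.
for step in STEPS:
    print(" ".join(step))
```

The extended test can take an hour or more on a drive this size, so the second and third commands are meant to be run after the first has had time to finish.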
 
In comp.sys.ibm.pc.hardware.storage Franc Zabkar said:
[...]
BTW, don't be alarmed by the very high numbers for Raw Read Error
Rate, Seek Error Rate, and Hardware ECC Recovered for Seagate HDs. My
own testing and research leads me to believe that these are normal and
do not in fact reflect errors.

Same here. I believe these are from read accesses that were
started immediately after a seek and before the heads had really
settled. This is fine: if the read and the ECC fail, the disk can do
a re-read with settled heads. On writing, the disk gives the
heads more time after a seek.

Arno
 
smlunatick said:
A drive not being accessible is one thing. Erasure of all partitions is
another, and seems to indicate a big problem, since the partitions are
gone but the drive is still connected (the BIOS may show it).

A bad cable can mimic a lot of issues - even causing BIOS to misreport
drive size, yet the drive is still detected, etc.

Ari

 
I've seen strange BIOS reports for PATA drives.
Here is a case where a dropped bit in a PATA cable caused a FUJITSU
MPE3102AT or MPE3102AP hard drive to be mis-detected by the BIOS as a
"BUJIP5Q IPA3102AP". Consequently a Fujitsu HD diagnostic was unable
to see the drive.

Fascinating. But I don't see how that would corrupt the partition
table(s). Hmm. Changed bits in sector numbers for writes? But
that should also result in data corruption all over the disk.
AFAICS, a serial (SATA) cable could not produce the same kind of
error. The drive would be either correctly detected or not at all.

It can be detected with errors. SATA puts checksums on all instructions
and data transfers. I had one cable that caused the disk to be
removed from the system (Linux) after a few minutes, because
the kernel got too many disk errors on access.

Arno
 
Franc said:
I've seen strange BIOS reports for PATA drives.

Here is a case where a dropped bit in a PATA cable caused a FUJITSU
MPE3102AT or MPE3102AP hard drive to be mis-detected by the BIOS as a
"BUJIP5Q IPA3102AP". Consequently a Fujitsu HD diagnostic was unable
to see the drive.

http://groups.google.com/group/micr..._discussion/msg/1e561df4ba4b54c4?dmode=source
http://groups.google.com/group/micr..._discussion/msg/50a71eb70cc9c35f?dmode=source

AFAICS, a serial (SATA) cable could not produce the same kind of
error. The drive would be either correctly detected or not at all.

I should have taken a photo last night of the salad-like detection of a
WD 320GB SATAII drive that popped onto my screen during drive detection
with a 'bad' cable then :) On a prior power-on it didn't detect the
drive at all.

Ari


 
In comp.sys.ibm.pc.hardware.storage Franc Zabkar said:
On 2 Apr 2008 09:48:17 GMT, Arno Wagner put finger to
keyboard and composed: [...]
It can be detected with errors. SATA puts checksums on all instructions
and data transfers. I had one cable that caused the disk to be
removed from the system (Linux) after a few minutes, because
the kernel got too many disk errors on access.

Arno
Then how do you explain spodosaurus's observations?

Matches what I said: There may be errors on the cable, but
they are detected (not: corrected), so the data on disk typically
stays intact. If the disk is detected, then it is detected
correctly. It may, however, fail temporarily or only be
detected in some cases. The OS may also decide the disk is unusable.

Arno
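
The "detected, not corrected" point can be illustrated with any CRC. The sketch below uses Python's zlib.crc32 as a stand-in; it is not the exact CRC computation the SATA link layer uses, and the frame layout and function names are invented for the example:

```python
import zlib

def make_frame(payload):
    """Append a 4-byte CRC-32 to the payload, as a sender would."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def frame_is_intact(frame):
    """Recompute the CRC over the payload and compare with the trailing 4 bytes."""
    payload, received_crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received_crc

frame = make_frame(b"write sector 63")

# Flip a single bit in transit, as a marginal cable might.
damaged = bytearray(frame)
damaged[3] ^= 0x01

print(frame_is_intact(frame))           # True
print(frame_is_intact(bytes(damaged)))  # False
```

The receiver learns only that the frame is bad. It cannot tell which bit flipped, so all it can do is reject the transfer and have it retried, which fits a flaky cable producing lots of errors at the OS level without silently changing what lands on disk.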
 
In comp.sys.ibm.pc.hardware.storage Franc Zabkar said:
On 2 Apr 2008 00:17:16 GMT, Arno Wagner put finger to
keyboard and composed:
I just tried booting to DOS with Smartdrv disc caching enabled for
drive C:.
If I execute ...
smartudm 0 /r con
... on my 120GB Seagate HD, the "Seek Error Rate" increases by 8
points each time and the "Raw Read Error Rate" and "Hardware ECC
recovered" values both increase by 3 points. The latter two parameters
have identical values. If your hypothesis were correct, then I would
think that there should be at least as many read errors as seeks.

You do get the same number of read errors as ECC recoveries, don't you?
That is why the read errors are "raw": they are counted before attempting ECC.
Instead I suspect that there are no real errors at all. At the very
least it seems to me that all three parameters reflect some kind of
count rather than a rate, although that begs the question, why only 3
reads for every 8 seeks?

These are errors, but expected and recoverable ones. Calling
them not real errors is maybe inaccurate, but captures the
spirit. And, yes, these are counts, that get decreased
periodically in some fashion. As to why 3 read errors for 8
seek errors, I would think that not finding a sector is also
a seek error, but if you have nothing, you also have
nothing to read wrongly.
To test my hypothesis that the "Seek Error Rate" figure is actually a
count, I captured the SMART data before and after a SeaTools zero fill
operation on a 13GB ST313021A Seagate HD. The difference in the Seek
Error Rate was 52232 counts.
According to the U Series 8 Product Manual ...

... this drive has 18700 tracks/inch and 3 data surfaces.
Assuming that there are 3 seeks per track (due to the action of the
embedded servo during head switching ???),

Yes, that requires a seek with modern drives. Historically,
head switches could be done without one in some designs, and
were faster. Not anymore.
then one would expect that
the distance between the first and last tracks would be ...
52232 / 18700 / 3 * 2.54 = 2.36 cm
I measured the difference between the outside and inside diameters on
two typical discs to be 3.5cm. Is it plausible that the usable data
area amounts to only 2.36cm of the surface?

Quite. At least it directly fits what I have seen in 3.5" drives
I opened.

Arno
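
Franc's estimate above can be rechecked in a few lines of Python. The figures are the ones quoted in the post; the 3 seeks per track is his assumption rather than something stated in the product manual, and the variable names are only for illustration:

```python
SEEK_COUNT_DELTA = 52232   # change in the Seek Error Rate figure over the zero fill
TRACKS_PER_INCH = 18700    # from the U Series 8 product manual, as quoted
SEEKS_PER_TRACK = 3        # Franc's assumption (head switches across 3 data surfaces)
CM_PER_INCH = 2.54

tracks_covered = SEEK_COUNT_DELTA / SEEKS_PER_TRACK
band_width_cm = tracks_covered / TRACKS_PER_INCH * CM_PER_INCH
print(round(band_width_cm, 2))  # 2.36
```

Changing SEEKS_PER_TRACK shows how sensitive the result is to that one assumption.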
 
Timothy Daniels said:
Arno Wagner wrote
It makes sense that the radius of the outside tracks don't differ too much from the radius of the inside tracks.
Assuming that all tracks have the same number of bits,

Dud assumption. The datasheets always show that the sectors per
track vary in bands across the platter surface with all modern drives.
the bits on an outside track that had twice the radius of the inside track would be twice as long as the bits on the
inside track.

See above.
The ability of the electronics to interact with the magnetic media is probably optimized for a small range of bit
lengths, and thus for a small range of track radii.

Guess again.
 