zeroing/data-filling fixes hard disk?

  • Thread starter: wlcna

Can zeroing a hard drive fix previously unusable blocks? It makes no
intuitive sense to me how writing particular data to a drive could affect
the readability of any of its sectors, but that is what I seem to be seeing.

My hard disk drive *seemed* to be irretrievably messed up, and now it
claims to be fine after I've done some combination of zero-filling and
data-filling it, with all tests done with the drive offline. I've never
seen such a thing before. Is my drive really fixed? I had heard before
that zero-filling a hard drive could make a problem drive work properly
again; that never made sense to me, yet now I appear to be seeing it.

I'm running Linux with this drive. I discovered the problem when suddenly
I noticed strange "I/O Error" messages when I was just doing ordinary
things in directories. I looked into the logs and found disturbing
low-level sector access problems. I ran the Linux "badblocks" program in
read-only mode and saw the same errors popping up on the system console.
Immediately after this, the machine became unbootable.
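For reference, a read-only scan like the one described can be run with
something like the following (a sketch; /dev/sdX is a placeholder for the
actual drive, and badblocks should be pointed at the whole disk, not a
partition):

```shell
# Non-destructive, read-only surface scan; -s shows progress, -v is verbose.
# /dev/sdX is a placeholder for the real device node.
badblocks -sv /dev/sdX > bad-blocks.txt

# In another terminal, watch the kernel log for the low-level I/O errors:
dmesg --follow | grep -i 'I/O error'
```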

I figured the HD was toast.

Anyway, I got all my data off as best I could and just completed various
"wipes" of the disk, some zero-filling and some data-filling (0xAA's and
such, done in pieces because the wipe was taking forever once it hit bad
sectors in the early part of the drive, so I skipped the sections I knew
had bad stuff in them to make the wipe go faster).
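A piecemeal wipe like that can be done with dd's seek option to jump past a
known-bad region; a rough sketch, with /dev/sdX and all sizes/offsets as
made-up placeholders (and note this destroys everything on the target):

```shell
# Zero-fill the healthy start of the drive (first 4 GiB here, made-up size).
dd if=/dev/zero of=/dev/sdX bs=1M count=4096 oflag=direct

# Skip past the known-bad stretch, then resume zero-filling to the end
# of the disk (no count means dd writes until the device is full).
dd if=/dev/zero of=/dev/sdX bs=1M seek=6144 oflag=direct
```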

And now? My drive is magically reporting no problems. I have the test
running now, I'm seeing no console messages at all, and it has already
passed the part of the drive that previously produced reams and reams of
these messages.

Does this make any possible sense?
 

It's called "sparing". Any contemporary drive has a number of spare
sectors and the logic to use them--when it is reading and it finds a bad
sector it can't do a lot about it except complain because it doesn't
have a good copy of the data, but when it's writing and it encounters a
bad sector it just marks it bad and swaps in one of the spares. So your
zero-fill wrote to all normally accessible sectors and the drive could
then swap out all the bad ones.
 
wlcna said:
Anyway, I got all my data off as best I could and just completed various
"wipes" of the disk, some zero-filling and some data-filling (0xAA's and
such, done in pieces because the wipe was taking forever once it hit bad
sectors in the early part of the drive, so I averted the sections I knew
had bad stuff in them to make the wipe go faster).

[...]

Does this make any possible sense?

No. Not with those problem areas skipped (i.e., never written over).
 
J.Clarke said:
It's called "sparing". Any contemporary drive has a number of spare
sectors and the logic to use them -- when it is reading and it finds a bad
sector it can't do a lot about it except complain because it doesn't
have a good copy of the data,

This is where the sector is marked a 'bad sector candidate' in the drive's
internal administration.

J.Clarke said:
but when it's writing and it encounters a bad sector it just marks it bad

You've got that mangled entirely. There is no way a drive can detect a
bad data sector on a write unless it knows beforehand from its own
internal administration.

J.Clarke said:
and swaps in one of the spares.

If necessary. The drive will test the sector, and if it turns out good it
will reuse that sector.

J.Clarke said:
So your zero-fill wrote to all normally accessible sectors and the drive
could then swap out all the bad ones.

Problem is: he said that he avoided some areas that contained the bad
sectors to speed things up.
 
J.Clarke said:
It's called "sparing". Any contemporary drive has a number of spare
sectors and the logic to use them--when it is reading and it finds a bad
sector it can't do a lot about it except complain because it doesn't
have a good copy of the data, but when it's writing and it encounters a
bad sector it just marks it bad and swaps in one of the spares. So your
zero-fill wrote to all normally accessible sectors and the drive could
then swap out all the bad ones.

Thanks much for this clarification. That does make sense now. I've
always thought of disks as contiguous tracks and cylinders; this
"fix up" idea is new to me but makes sense.

I do wonder how effective it would be, though, because in my case the
problem seemed to be affecting more and more sectors over time. Given
that a disk *is* a thing with tracks, cylinders and such, isn't it true
that when some sectors go bad, others may tend to start going bad along
with them over time?

Anyway, just curious, but thanks for this information.
 
Folkert Rienstra said:
Problem is: he said that he avoided some areas that contained the bad
sectors to speed things up.

Here's one point that may clarify things: I was initially doing read-only
tests and later write-based tests. The write-based tests were the ones
running when I stopped seeing the console messages about failures.

Now, I may have formulated my plan for skipping the bad sectors while
doing the read tests, but at the end of the whole process I did use a
write-based test on the whole disk, and that was when I saw what I've
mentioned: no low-level errors appearing on the console. I did not
realize reads and writes differed in whether errors can be detected.

Perhaps if I had done a write-based test initially, I wouldn't have seen
any messages, because according to you the drive can't tell anything about
a sector being bad at write time.

It seems the lack of errors I was seeing may not even be meaningful, then,
since at write time the drive can't tell whether a sector is bad. I
stopped the test before it finished the second part, going from the
beginning again and actually comparing the values it wrote (Linux
badblocks writes in one long sweep and only verifies in a later sweep).
I think I then tried a simple read-only badblocks test and that seemed to
be working; I'm not sure about that though. I was really getting sick of
the drive at that point, and it's back in a box now, hopefully never to
be seen by me again.
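For completeness, the write-based badblocks modes discussed here look
roughly like this (a sketch; /dev/sdX is a placeholder, and -w destroys
all data on the device):

```shell
# Destructive write test: writes the pattern across the whole device in one
# sweep, then reads it back and compares in a second sweep. 0xaa is one of
# the patterns mentioned above.
badblocks -wsv -t 0xaa /dev/sdX

# Non-destructive read-write alternative (saves and restores each block):
badblocks -nsv /dev/sdX
```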
 
wlcna said:
Thanks much for this clarification. That does make sense now. I've
always thought of disks as tracks and cylinders that are contiguous,
this "fix up" idea is new to me but makes sense.

I would wonder how effective it would be though, because for example
in my case the problem seemed to be affecting more and more sectors
all the time. Being that a disk *is* a thing with tracks, cylinders
and such, isn't it true that when some go bad, they may tend to start
going bad in a certain unison over time?

Generally speaking, and ignoring sparing, a drive will have a few bad
sectors when it comes from the factory. A few more may show up in the
first few weeks of operation. After that, others show up very rarely.
Eventually the drive dies for whatever reason; one of the ways a drive
may die is to show a rapidly increasing number of bad sectors. When
this happens, sparing may mask the problem for a while, but eventually
it surfaces. When you have a drive showing a rapidly increasing number
of bad sectors, it's generally wise to consider it on the verge of
failure and back it up before dinking with it further. Even though
writing all sectors allows the bad ones to be swapped out, eventually
all the spares will get used up.
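One way to watch this happening is via the drive's own SMART counters
(assuming smartmontools is installed; attribute names are conventional
and somewhat vendor-specific):

```shell
# Dump the SMART attribute table; the sparing-related counters are
# Reallocated_Sector_Ct (sectors already swapped for spares) and
# Current_Pending_Sector (candidates awaiting a write to be tested).
smartctl -A /dev/sdX | grep -Ei 'realloc|pending|uncorrect'

# Overall health verdict as judged by the drive itself:
smartctl -H /dev/sdX
```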
 
wlcna said:
Here's one point that may clarify things: I was doing initially read-only
tests and then later write-based tests. The write-based tests are the
ones that were occurring when I stopped seeing the console messages about
failures.

Now I may have formulated my plan for skipping the bad sectors while
doing the read tests, but at the end of the whole process,
I did use a write-based test on the whole disk,

OK, that makes a lot more sense.

wlcna said:
and that was when I saw what I've mentioned, no low-level errors appearing
on the console.

As expected.

wlcna said:
I did not realize there was this read/write error traceability difference.

Data errors are only detected on reads; writes are not checked.
Write errors may occur, e.g., when the drive is unable to find the sector.
There are two possibilities: the error is soft (retries eventually
succeed) and the sector is reassigned on the read, or the error is hard
(unrecoverable) and the sector is marked as a 'candidate bad sector'.
A 'candidate bad sector' can be reassigned on the next write to that
sector.

wlcna said:
Perhaps if I had done a write-based test initially, I wouldn't have seen
any messages b/c according to you the drive can't tell anything about a
sector being bad at write time.

Correct. That would require a write-check (which is very slow).

wlcna said:
It seems that the lack of errors I was seeing may not even be meaningful
according to what you've said, since at write time the drive can't tell if
it's a bad sector or not.

It certainly can, from its internal administration, except that that info
is gathered at read time. Writes to a sector go unchecked except when it
is internally registered as a 'bad sector candidate'.
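The lifecycle described above can be sketched as a toy model in bash
(purely illustrative; the table names and numbers are made up and bear no
relation to real firmware):

```shell
#!/usr/bin/env bash
# Toy model of the candidate-bad-sector lifecycle:
#   - a failed READ marks the LBA as a pending ("candidate bad") sector
#   - the next WRITE to a pending LBA tests it: if still bad, the LBA is
#     remapped to a spare; if good, the sector is reused and the flag cleared
declare -A pending remap
next_spare=1000                       # made-up starting spare location

read_sector() {                       # usage: read_sector LBA OK|FAIL
  if [ "$2" = FAIL ]; then pending[$1]=1; fi
}

write_sector() {                      # usage: write_sector LBA GOOD|BAD
  local lba=$1 verdict=$2             # verdict = result of the sector test
  if [ -n "${pending[$lba]:-}" ]; then
    if [ "$verdict" = BAD ]; then
      remap[$lba]=$next_spare         # swap in a spare for this LBA
      next_spare=$((next_spare + 1))
    fi
    unset "pending[$lba]"             # either way, no longer a candidate
  fi
}

read_sector 42 FAIL                   # read error -> LBA 42 becomes a candidate
write_sector 42 BAD                   # write tests it, still bad -> spared out
read_sector 7 FAIL                    # another candidate...
write_sector 7 GOOD                   # ...tests good -> reused in place

echo "remap[42]=${remap[42]:-none} remap[7]=${remap[7]:-none}"
# prints: remap[42]=1000 remap[7]=none
```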
 
wlcna said:
Thanks much for this clarification. That does make sense now.
I've always thought of disks as tracks and cylinders that are contiguous,

It still was like that, once, when the spares resided in the same track or
cylinder and the remaining track was reordered (called a pushdown) so that
the sequential order was still maintained. That was a bit expensive and
also stood in the way of making drives faster, so on current drives the
spares are usually at the end of the medium. This means that sector A and
sector A+1 can be in very different places after a bad sector is
reassigned.

Some drives can be low-level formatted in such a way that the sequential
order of *all* sectors is restored, like when the drive came from the
factory.

wlcna said:
this "fix up" idea is new to me but makes sense.

I would wonder how effective it would be though, because for example in
my case the problem seemed to be affecting more and more sectors all the
time.

It does help in situations where external problems caused the bad sectors.
After taking care of the external causes, the drive can be back to new.

wlcna said:
Being that a disk *is* a thing with tracks, cylinders and such, isn't
it true that when some go bad, they may tend to start going bad in a
certain unison over time?

I fail to see what that has to do with "tracks, cylinders and such".
 
Some drives can be commanded to do read-after-write check for a few
power-ups (Maxtor, others?). The vendor's utility may employ this feature.
 
Alexander Grigoriev said:
Some drives can be commanded to do read-after-write
check for a few power-ups (Maxtor, others?).

More strictly, quite a few drives can have that enabled and disabled, and
Maxtor chooses to enable it by default, with it being turned off after a
number of power cycles.

Alexander Grigoriev said:
The vendor's utility may employ this feature.

Quite possibly.
 
Alexander Grigoriev said:
Some drives can be commanded to do read-after-write check for a few
power-ups (Maxtor, others?).
The vendor's utility may employ this feature.

It doesn't need to when that capability is already part of the sparing
process. I also don't think it makes much of a difference whether the
write-check is drive-initiated or programmed. And then there is the
question of why the app would offer a verification run when it already
does that for every write.
 