Ron Reaugh
I'm seeing a set of behaviors over a period of time across several different
mfgs' EIDE HDs (mfged in the last 3 years) that leads me to a new
hypothesis.
It has always been my understanding that the bad sector processing procedure
was one whereby if the drive began to find a sector that was hard to
read (retries required) then it would copy the data to one of its spare
sectors and move that bad sector to the 'bad pool' onboard the HD. This
process would happen on the fly and be essentially transparent to the user
and OS.
Also it was my understanding that if the HD suddenly found a sector that it
could not read and get a completely proper data validation check then the
drive would return an error to the OS declaring a read failure.
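To make those two behaviors concrete, here is a minimal toy model of how I understand it is supposed to work. It is only a sketch of my understanding, not actual drive firmware: the names (ToyDrive, bad_pool, max_retries) and the retry threshold are made up for illustration.

```python
from dataclasses import dataclass, field


class ReadError(Exception):
    """Reported to the OS when a sector cannot be read with a valid check."""


@dataclass
class ToyDrive:
    spares: int                                    # spare sectors still available
    data: dict = field(default_factory=dict)       # lba -> sector contents
    bad_pool: set = field(default_factory=set)     # sectors already flawed out

    def read(self, lba, retries_needed, max_retries=8):
        # Case 2: the data never validates, so the OS gets a hard read error.
        if retries_needed > max_retries:
            raise ReadError(f"sector {lba} unreadable")
        # Case 1: readable but marginal, so copy it to a spare on the fly.
        # The caller never sees this happen.
        if retries_needed > 0 and self.spares > 0 and lba not in self.bad_pool:
            self.spares -= 1
            self.bad_pool.add(lba)
        return self.data[lba]
```

In this model there is no third path: a sector either comes back validated (possibly after a silent remap) or the read fails loudly.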
When's the last time anyone ever saw a HD data read failure during normal
processing on an EIDE HD in 98SE, ME or XP off a standard mobo EIDE
controller for just a single sector for a small, few-bit data flaw? When's the
last time anyone ever saw a "small read error" in defrag? I've always
ascribed my 'no' answers to the above two questions to a belief that flaw
processing was working and one didn't ever see sectors go full bad anymore.
The theory that I'm developing is that the drive's internal flawing process
is in fact doing what I describe above but it is also doing it when it can't
read a sector completely, with validation, before copying it to the spare OR
it's not doing the flawing and leaving the 'mostly readable' sector with the
error and the OS (or something) is letting it 'slide by' unreported.
This only becomes apparent during critical processes like Windows
initialization where one simply sees "Windows Protection Error". In more
than one case I've traced such "Windows Protection Error"s to code (DLL or
EXE) that's sat on the HD untouched with NO intervening defrag for a long
time prior. A sector just suddenly has bad data in it and Scandisk thorough
isn't finding it. I believe that I'm seeing a slow accumulation of
unreported bad sectors on drives that Scandisk thorough does not find nor
fix. Those accumulating few bad bits would likely go unnoticed anywhere
else or just be fixed by reinstalling an app with a shrug as to what the
problem had been.
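One way to test this, rather than rely on Scandisk, would be to keep a checksum manifest of files that should never change and compare it later. The following is a minimal sketch, assuming Python is available on the machine; snapshot and silent_changes are names I made up for illustration. A file whose size and timestamp are unchanged but whose hash differs is a candidate for this kind of unreported corruption.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path under root to (mtime_ns, size, sha256 hex digest)."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest[path] = (st.st_mtime_ns, st.st_size, digest)
    return manifest


def silent_changes(old, new):
    """Paths whose timestamp and size match but whose contents differ."""
    return sorted(
        p for p in old
        if p in new
        and old[p][:2] == new[p][:2]   # nothing claims to have written the file
        and old[p][2] != new[p][2]     # yet the bytes read back differently
    )
```

Take one snapshot while the system is known good, another some months later, and feed both to silent_changes. An empty list would count against the theory; untouched DLLs or EXEs showing up would support it.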
I've seen this behavior on quite a number of drives in the several months
prior to a drive obviously going bad or entering a period where the flaw
processing is using up the spares. I see this especially on HDs where a
Scandisk thorough keeps restarting back to checking the folders when only
Explorer and Scandisk are in task manager. I'm guessing that lots of
flawing is going on. Somehow the drive is signalling the OS to restart
Scandisk (something changed) without ever declaring any errors.
One recent case I studied was on a Quantum 20GB LCT15 which runs ME and just
suddenly quit booting on a Windows Protection Error. No defrag in >6
months. A module in the Nvidia driver was bad (I assume reloading the Nvidia
driver fixed it) and that driver was untouched for many months before.
Scandisk thorough took 15 hours to complete, restarting every 5 minutes or
so. The Scandisk finished with NO bad sectors being reported nor any other
error message (normal completion).
I then ran the latest Maxtor Diags and it passed the advanced test. I then
ran the 'low level format' and it went to completion and reported "PASSED"
and didn't take unusually long (1-2 hours). However, I noticed that only
39102208 blocks were available rather than the target value for the drive
that had been displayed all along during the low level format, 39102336. All
the spares must be used up.
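For scale, here is the arithmetic on those two block counts. The 512-byte block size is my assumption (the usual EIDE sector size); the diag did not display it.

```python
SECTOR_BYTES = 512        # assumption: standard 512-byte blocks

target = 39102336         # block count shown during the low level format
available = 39102208      # block count reported as available afterwards

missing = target - available
missing_bytes = missing * SECTOR_BYTES

print(missing, "blocks short,", missing_bytes, "bytes")
# prints: 128 blocks short, 65536 bytes
```

So the drive comes up 128 blocks (64 KiB) short of its target. That shows only the shortfall; that it means the spare pool is exhausted is my inference.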
I can find no other reasonable explanation. HD data is going bad and is
not being reported or not being detected.
My theory is that EIDE HD mfgs have determined that they can decrease their
support and warranty costs if they subtly change the error processing to let
'little errors' go by. A bit here and a bit there and nobody will notice,
which is true in the vast majority of cases. Read no evil... report no evil
for a few bits here and there saves them big bucks.
Anyone got any ideas/information?