Have I found a CHKDSK Bug on NTFS Drives?

Thomas Platt

For starters, I'm a technically-oriented IT pro with over
35 years working in the computer industry. In addition,
I have over 22 years working on PCs running a variety of
operating systems - including Linux, Unix, CP/M, DR-DOS and
several versions of Windows including 3.1, 95, 98, ME and
XP. So, I'm both very thorough AND highly experienced
when it comes to analyzing and diagnosing hardware and
software problems.

I'm running XP Home Edition on my Compaq Presario Laptop
model 1510. Attached to this machine via the 1394
firewire port are two 250GB external Maxtor hard drives
which are formatted using the NTFS file system. It's
worth noting here that these two hard drives contain a
relatively small number of VERY large AVI data files which
average 15 - 20 GB apiece in size and that one drive is
essentially used as a static backup of the files stored
on the other. In short, drive E is used to store backups
of the files that were originally captured and stored on
drive F in case of a calamity involving drive F. I say it
is a static backup because there's no effort to maintain
the drive dynamically (i.e. no ghost utility or anything
like that). These AVI files rarely change. I only use
them as my source to create videos.

Yesterday, while running a defrag on one of those two
250GB hard drives, my system suddenly shut down with an
apparent temperature problem. It has had this problem for
months but Compaq has been unable to fix it -- advising
me instead that I "shouldn't use my computer so hard". (I
kid you not!)

Anyway, I came back to the machine yesterday to find it
had powered off and the defrag had suddenly been
terminated. This morning, in an effort to see whether any
damage had been done (write caching is disabled on both
of these drives, so there should not have been any
damage), I ran chkdsk/f on the drive that was being
defragged when the failure occurred and chkdsk found (and
I very carefully wrote down) nineteen files which it
claimed had bad clusters and of course it also claimed to
have fixed those errors. But later, when I ran the
chkdsk/f again on the same drive to verify that the
errors had indeed been corrected, I found that chkdsk
reported the exact same errors in the exact same list of
19 files that it had found and (supposedly) corrected the
first time and of course, once again it claimed to have
fixed the errors!

At this point, I got quite curious. So, I ran chkdsk/f
again on the backup drive (i.e. the one that was NOT
being defragged when the system shutdown occurred) and lo
and behold, chkdsk reported the exact same errors on the
same 19 files on THAT drive PLUS errors on four other
Windows backup drive images (which also average over 20 GB
in size) that were also stored on that drive but were NOT
on the first drive. In short, chkdsk found, reported and
supposedly corrected the exact same errors on a
nearly identical set of files stored on two different
hard drives.
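
A comparison like this does not have to rely on a hand-written list. Below is a minimal Python sketch that captures two chkdsk reports and shows only the lines that differ. It assumes chkdsk is on the PATH, that the script runs with administrator rights, and that chkdsk is invoked without /f (read-only mode) so the capture itself changes nothing on the volume. The function names are hypothetical helpers, not part of any Microsoft tool.

```python
import difflib
import subprocess


def capture_chkdsk(drive: str) -> list[str]:
    """Run chkdsk in read-only mode (no /f) and return its output lines."""
    result = subprocess.run(["chkdsk", drive], capture_output=True, text=True)
    return result.stdout.splitlines()


def diff_runs(first: list[str], second: list[str]) -> list[str]:
    """Return only the lines that were removed ("- ") or added ("+ ")."""
    return [
        line
        for line in difflib.ndiff(first, second)
        if line.startswith(("- ", "+ "))
    ]
```

Calling diff_runs(capture_chkdsk("F:"), capture_chkdsk("F:")) before and after a repair pass would show whether the second report really repeats the first. Progress or timing lines may differ between runs, so a few differing lines are not necessarily meaningful.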

As a result of this exercise, I strongly suspect I've
encountered a BUG in chkdsk which causes it to report
errors erroneously on very large data files stored on an
NTFS file system. I've checked the Microsoft knowledge
base on both chkdsk and the NTFS file system but naturally
found no mention of such a problem.

Has anyone else encountered or reported what are
apparently spurious errors like this from chkdsk?

Thanks!
 
Hmm, that's unusual, haven't run into that before, my porn files don't get
that large :) j/k...

What does your hard drive manufacturer's diagnostic utility report?

Might I suggest mirroring the drives? (I don't know for certain whether
that's supported in home edition, but it should be...) Also, get on
pricewatch and get yourself some mega fans, a replacement heatsink, etc.
Shouldn't cost more than 25 bucks. If you're hardcore, you could always go
for the water cooling...
 
Your post raises a few questions:

1) Nothing to do with the problem, if there is one, but
don't you think that defragging disks used as you
describe is a bit anal? This ain't 1985, you know, and
disks used as you describe (large files, few changes)
are not likely to ever become fragmented to the point
where performance will be affected.

2) How do you know that the errors reported by chkdsk were
the result of the shutdown? Isn't it possible that the
errors were present before that, and copied to the
backup drive, which would account for them being reported
in both places? I know this doesn't account for
chkdsk's failure to actually remedy the situation, but
it makes more sense than assuming that a chkdsk bug
is the culprit.
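
One way to test the "copied to the backup" idea is to check whether the files on the two drives are byte-for-byte identical. Below is a minimal Python sketch, assuming both drives are mounted and readable. The function names are made up for illustration, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a 20 GB AVI never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source: Path, backup: Path) -> dict[str, str]:
    """Map each file under source to 'match', 'differs' or 'missing'."""
    report = {}
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = backup / relative
        if not dst_file.is_file():
            report[str(relative)] = "missing"
        elif file_digest(src_file) == file_digest(dst_file):
            report[str(relative)] = "match"
        else:
            report[str(relative)] = "differs"
    return report
```

Something like compare_trees(Path("F:/"), Path("E:/")) would do it for the drives described. A "match" only proves the two copies are identical, not that either one is good, but it would show whether the file contents survived the shutdown unchanged on the drive that was being defragged.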
 
OK; so far, so good. Crucial Q: How do you make the backups? At a
file level (i.e. copying files) or at a raw disk imaging level? If
imaging, is that a "dumb" image that carries over file system defects
as-is, or "smart" image that regenerates file system structure?

The answer to that bears on Wislu's later comments.

Are both HDs outside of this overheating case? How hot do they get?
Hot HDs tend to throw bad sectors and die.

There could be damage, given that Defrag is a stream of back-to-back
file operations that should be atomic, but aren't quite - it's almost
inevitable that cluster contents and file system structure will
be left out of sync. Even a small bomb will injure someone, if
thrown into a large and tightly-packed crowd.

Not sure how transaction rollback would handle this. Normally, if you
were copying one of these huge files and the copy was interrupted by a
bad exit, the whole interrupted transaction would be rolled back.
That's nice spin-speak for: the whole file is discarded.

But I dunno if defrag file content moves are managed as formal file
transactions. On the one hand, they should be - the risk is as high,
or higher - but on the other hand, the performance impact may suck,
and there may be issues with where this rollback info is stored (on a
volume that is being defragged, hmm.)

"I ran chkdsk/f on the drive that was being defragged ... nineteen
files which it claimed had bad clusters"

Bad clusters, as in surface defects? That's a whole nother order of
pain, like going to see your psychoanalyst for a cranial bullet wound.

Ew.. if you had said "ChkDsk /r found the same 19 bad clusters", and I
knew ChkDsk /r re-tested existing bad clusters, I'd say "well, fine".
Or, if ChkDsk /r didn't re-test clusters but kept finding new bad
ones, I'd say "well, that's what one might expect of a dying HD".

But files and recurrent logic errors; that's odd.

Does ChkDsk say exactly WHAT these errors are? Some directory entry
metadata errors may persist and be carried wherever the file goes,
e.g. invalid date stamps or illegal file name characters. In fact, an
app that always applies the same bad metadata may have this effect.
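
That kind of metadata can at least be checked from the outside. Below is a rough Python sketch that walks a drive and flags odd names or date stamps. The character set follows the usual Win32 naming rules, and the date window (nothing before 1980, nothing more than a day in the future) is an assumption chosen for illustration. It only sees what the OS file APIs expose, not the raw NTFS structures, so a name the OS refuses to enumerate will not show up here.

```python
import os
import time
from datetime import datetime, timezone
from pathlib import Path

# Characters the Win32 naming rules reject in a file name.
ILLEGAL_CHARS = set('<>:"/\\|?*')
# Assumed sanity window start: the DOS epoch, 1980-01-01 UTC.
EARLIEST = datetime(1980, 1, 1, tzinfo=timezone.utc).timestamp()


def name_problem(name: str) -> bool:
    """True if the name holds a reserved or control character."""
    return any(ch in ILLEGAL_CHARS or ord(ch) < 32 for ch in name)


def stamp_problem(mtime: float, now: float) -> bool:
    """True if a modification time falls outside the assumed window."""
    return mtime < EARLIEST or mtime > now + 86400


def scan(root: Path) -> list[tuple[str, str]]:
    """Walk root and list (path, reason) pairs for suspicious entries."""
    now = time.time()
    findings = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if name_problem(name):
                findings.append((full, "bad name"))
            if stamp_problem(os.stat(full).st_mtime, now):
                findings.append((full, "bad date stamp"))
    return findings
```

If scan(Path("F:/")) flags the same 19 files that ChkDsk complains about, that would point at metadata travelling with the files rather than at surface defects.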

It would be nice to chase this at the raw file structure level, but we
can't, as deep structure is undocumented.

--------------- ----- ---- --- -- - - -
Dreams are stack dumps of the soul
 