NTFS goes bust when repeatedly writing+deleting small files!

  • Thread starter: Martin T.

Martin T.

Hi -

We have a Windows XP SP2 setup that repeatedly writes and deletes small
files on the hard disk. This setup runs at over 40 sites, with exactly
the same hardware. (The systems are collecting production data.)

We write and delete approx. 1 file per second.

Today we had the 7th site with a corrupt file system. And it's always
the directory where these small files are written. (Basically the
folder becomes inaccessible, and after CHKDSK /F the folder gets
converted into a file with 0KB.)

Any ideas what we could do to prevent further problems?

The Details:
-----------------
The files (< 1KB) are written to serve as a communication buffer. The
normal case is that a file is written by one thread (recv. thread) and
immediately read & deleted by another thread (processing thread).
Sometimes the processing thread is a bit slower, or has to wait
for the DB, and then some files exist longer, but most of the time
I guess the files only live for a very short time.

These files are all written, read and deleted in one directory per
communication channel, i.e. we have about 2-4 directories where this
write/read/delete cycle takes place.
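To make the access pattern concrete, here is a minimal sketch of such a file-based buffer. It is not the code running on these systems: the function names, the `.tmp`/`.msg` suffixes and the caller-supplied sequence number are all assumptions made for illustration. The receiver writes under a temporary name and renames the file once it is complete, so the processing side never picks up a half-written message, and every message gets a name that is not reused.

```python
import os
from pathlib import Path
from typing import List


def write_message(channel_dir: Path, seq: int, payload: bytes) -> Path:
    """Receiver side: publish one message file (hypothetical naming scheme)."""
    tmp_path = channel_dir / f"{seq:012d}.tmp"
    final_path = channel_dir / f"{seq:012d}.msg"
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to disk
    # Rename only after the content is complete; readers ignore *.tmp files.
    os.replace(tmp_path, final_path)
    return final_path


def drain_messages(channel_dir: Path) -> List[bytes]:
    """Processing side: read and delete all completed messages in order."""
    messages = []
    for path in sorted(channel_dir.glob("*.msg")):
        messages.append(path.read_bytes())
        path.unlink()
    return messages
```

With zero-padded sequence numbers, sorting the file names gives the order in which the messages were written. This only illustrates the pattern; it says nothing about the cause of the corruption.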

When the FS corruption occurs, no files can be written into the
respective directory, and after CHKDSK /F there is often a file with 0KB
left with the same name as this folder.

Hardware: The machines are all workstation PCs with a 160GB
RAID1 (mirror) SATA system.
OS: Windows XP sp2
Software: Running an ORACLE 9i2 DB (data 3GB Disk usage) and some
services.

Now, as I understand NTFS, all these small files would get stored in
the MFT. What I do not understand is how this would start corrupting
the system. (There are never many small files concurrently on the
disk.)

Here are the stats from one site (apart from the fact that we should
maybe add a defrag task, it doesn't look that bad to me):
--------------------------------------------------
C:\Documents and Settings\Administrator>defrag -a -v c:
Windows Disk Defragmenter
Copyright (c) 2001 Microsoft Corp. and Executive Software
International, Inc.
Analysis Report
Volume size = 149 GB
Cluster size = 4 KB
Used space = 22,21 GB
Free space = 127 GB
Percent free space = 85 %
Volume fragmentation
Total fragmentation = 21 %
File fragmentation = 42 %
Free space fragmentation = 0 %
File fragmentation
Total files = 110.369
Average file size = 264 KB
Total fragmented files = 9.667
Total excess fragments = 44.447
Average fragments per file = 1,40
Pagefile fragmentation
Pagefile size = 1,50 GB
Total fragments = 1
Folder fragmentation
Total folders = 15.774
Fragmented folders = 783
Excess folder fragments = 4.190
Master File Table (MFT) fragmentation
Total MFT size = 352 MB
MFT record count = 126.561
Percent MFT in use = 35
Total MFT fragments = 3
You should defragment this volume.
--------------------------------------------------
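As a quick sanity check on the MFT figures in this report: assuming the usual NTFS file record size of 1 KB (an assumption, since the report does not state it), the record count and the "percent in use" value agree with each other.

```python
MFT_RECORD_BYTES = 1024   # assumption: default NTFS file record size
mft_size_mb = 352         # "Total MFT size" from the report
record_count = 126_561    # "MFT record count" from the report

used_mb = record_count * MFT_RECORD_BYTES / (1024 * 1024)
percent_in_use = 100 * used_mb / mft_size_mb
print(f"{used_mb:.1f} MB used, {percent_in_use:.0f} % of the MFT")
# prints "123.6 MB used, 35 % of the MFT"
```

So under that assumption the MFT itself is far from full, which fits the observation that there are never many small files on the disk at the same time.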

CHKDSK produces something like this:
--------------------------------------------------
CHKDSK Info:
Description:
Checking file system on C:
The type of the file system is NTFS.

The volume is dirty.
Deleted corrupt attribute list entry
with type code 144 in file 112548.
Unable to locate attribute of type 0x90, lowest vcn 0x0,
instance tag 0x6 in file 0x1b7a4.
Deleted corrupt attribute list entry
with type code 176 in file 112548.
Unable to locate attribute of type 0xb0, lowest vcn 0x0,
instance tag 0x5 in file 0x1b7a4.
Unable to locate attribute with instance tag 0x5 and segment
reference 0x400000001b7a4. The expected attribute type is 0x90.
Deleting corrupt attribute record (144, $I30)
from file record segment 112548.
Unable to locate attribute with instance tag 0x6 and segment
reference 0x400000001b7a4. The expected attribute type is 0xb0.
Deleting corrupt attribute record (176, $I30)
from file record segment 112548.
Deleted corrupt attribute list entry
with type code 144 in file 112563.
Unable to locate attribute of type 0x90, lowest vcn 0x0,
instance tag 0x6 in file 0x1b7b3.
Deleted corrupt attribute list entry
with type code 176 in file 112563.
Unable to locate attribute of type 0xb0, lowest vcn 0x0,
instance tag 0x5 in file 0x1b7b3.
Unable to locate attribute with instance tag 0x5 and segment
reference 0x400000001b7b3. The expected attribute type is 0x90.
Deleting corrupt attribute record (144, $I30)
from file record segment 112563.
Unable to locate attribute with instance tag 0x6 and segment
reference 0x400000001b7b3. The expected attribute type is 0xb0.
Deleting corrupt attribute record (176, $I30)
from file record segment 112563.
The index root $I30 is missing in file 0x1b7a4.
Correcting error in index $I30 for file 112548.
The index root $I30 is missing in file 0x1b7b3.
Correcting error in index $I30 for file 112563.
The file name index present bit in file 0x1b7a4 should not be set.
Correcting a minor error in file 112548.
The file name index present bit in file 0x1b7b3 should not be set.
Correcting a minor error in file 112563.
Cleaning up minor inconsistencies on the drive.
CHKDSK is recovering lost files.
Cleaning up 73 unused index entries from index $SII of file 0x9.
Cleaning up 73 unused index entries from index $SDH of file 0x9.
Cleaning up 73 unused security descriptors.
Inserting data attribute into file 112548.
Inserting data attribute into file 112563.
Correcting errors in the master file table's (MFT) BITMAP attribute.
Correcting errors in the Volume Bitmap.
Windows has made corrections to the file system.

156191962 KB total disk space.
22689276 KB in 111048 files.
38848 KB in 15958 indexes.
0 KB in bad sectors.
431206 KB in use by the system.
65536 KB occupied by the log file.
133032632 KB available on disk.

4096 bytes in each allocation unit.
39047990 total allocation units on disk.
33258158 allocation units available on disk.
Internal Info: (...)
------------------------------------------------------------------------
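For reading the log above, it helps to decode the numbers. The sketch below is only a reading aid with hypothetical helper names. It maps the decimal type codes to the standard NTFS attribute names and splits a segment reference into its record number (lower 48 bits) and sequence number (upper 16 bits).

```python
# Standard NTFS attribute type codes (subset).
NTFS_ATTRIBUTE_TYPES = {
    0x10: "$STANDARD_INFORMATION",
    0x20: "$ATTRIBUTE_LIST",
    0x30: "$FILE_NAME",
    0x80: "$DATA",
    0x90: "$INDEX_ROOT",
    0xA0: "$INDEX_ALLOCATION",
    0xB0: "$BITMAP",
}


def attribute_name(type_code: int) -> str:
    """Translate a CHKDSK type code (e.g. 144) into an attribute name."""
    return NTFS_ATTRIBUTE_TYPES.get(type_code, f"unknown (0x{type_code:x})")


def split_segment_reference(ref: int):
    """Return (record_number, sequence_number) of a file reference."""
    record_number = ref & 0xFFFF_FFFF_FFFF  # lower 48 bits
    sequence_number = ref >> 48             # upper 16 bits
    return record_number, sequence_number


print(attribute_name(144))                       # $INDEX_ROOT
print(attribute_name(176))                       # $BITMAP
print(split_segment_reference(0x400000001b7a4))  # (112548, 4)
```

Read this way, file 112548 (0x1b7a4) and file 112563 (0x1b7b3) each lost the $INDEX_ROOT and $BITMAP attributes of their $I30 index, which is the file name index of a directory. CHKDSK then clears the "file name index present" bit and inserts a data attribute, which would match the folder turning into a 0KB file.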

So ...
If somebody knows a solution, or has had a similar problem with many
small-file operations, I'll be happy to hear any suggestions.

best regards,
Martin Trappel
Graz / Austria
 
What SATA controller are you using?

Have you disabled write caching?

Does this happen if you run this on the same hardware without the RAID-1
setup?
 
Can you put an IDE drive in and see if it still occurs? Does the event log say anything?

NTFS remembers the file times of deleted and recreated filenames for an unspecified (but short) period of time, to handle programs that write a temp file and copy it over an existing file. Perhaps your names match.

Have you contacted MS?
 
Leythos said:
What SATA controller are you using?

Device Manager says: "Intel 82801FR SATA RAID Controller"

Leythos said:
Have you disabled write caching?

We didn't change anything from the defaults. I've checked the array
properties in Device Manager and it says "Optimize for Performance", i.e.
write caching would be on.

Strangely enough, I'd guess that this could create problems with power
outages, but the sites where FS errors have occurred (so far) are rather
good in this respect. We have a few sites where we had some power outages,
but this didn't happen there (so far).

Leythos said:
Does this happen if you run this on the same hardware without the RAID-1
setup?

We haven't tried to reproduce the behavior with any other settings up to
now.


Many thanks for your suggestions,
Martin
 