Is there a special reason why you store all your data on a single volume?
Otherwise I would advise dividing the data up and storing it on
several smaller volumes. This should limit the effect of data
corruption on your work and let you run a file system check (chkdsk)
on a single volume rather than recover the data from tape. It will
also increase your overall file system performance and conforms to
the Microsoft performance tuning guidelines.
There are third-party defragmentation tools available that do a much
more sophisticated job than the built-in defrag. You can do online
defragmentation, defragment with less than 15% free space on the
volume, schedule jobs and much more. Try OO Defrag
(www.oo-software.com), for example.
Filling your volumes up to 99% will result in MFT fragmentation in
addition to file fragmentation, which slows down the file system,
particularly with such a large number of files. See MS KB article
174619 for details.
I would not recommend using compression. It will probably not make
the file system faster; more likely it will make it slower.
If you use Windows Server 2003, you can tune the file system via
several registry parameters; for example, you can disable updates to
the Last Access Time attribute or the generation of short names in
the 8.3 naming convention. I don't know whether these parameters are
available under Windows 2000. Check the Windows Server 2003
Performance Tuning Guidelines for details.

Cheers,
Frank
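PS: Here is a minimal sketch of those two tweaks, assuming Python's
standard winreg module. The value names NtfsDisableLastAccessUpdate
and NtfsDisable8dot3NameCreation sit under
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem; double-check them
against the tuning guidelines before running anything, run it with
administrative rights, and remember the settings only take effect
after a reboot.

import winreg

FS_KEY = r"SYSTEM\CurrentControlSet\Control\FileSystem"

# Both values are REG_DWORD flags; 1 = disable the feature.
tweaks = {
    "NtfsDisableLastAccessUpdate": 1,   # stop stamping Last Access Time on reads
    "NtfsDisable8dot3NameCreation": 1,  # stop generating 8.3 short names
}

# Opening the key for writing needs administrative rights.
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, FS_KEY, 0,
                    winreg.KEY_SET_VALUE) as key:
    for name, value in tweaks.items():
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, value)
        print("set %s = %d" % (name, value))

print("Reboot for the changes to take effect.")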
Since your post a week or so ago, a couple of things have occurred to
me (this is based on my reading of your problem description, so I may
have things wrong):
- You are running much too close to the hairy edge and making a lot
of pain for yourself. Multi-TB storage arrays are getting amazingly
cheap. If your boss doesn't approve basic expenses like disk space,
you've got other problems. You (or he) are putting your business at risk.
- I believe your CPU and I/O are being sucked up by the number of files
in your folders. Your performance may improve greatly if you modify
your application to use the first character of the file name as a
subfolder name (i.e. file abcdef.txt gets stored as ./a/bcdef.txt),
giving 36 subfolders. If your application is going to scale up, you
might use the first two characters (1296 folders).
You can easily test this hypothesis on a PC with a big disk: write a
script that creates the folders and 100,000 files following your
naming convention but only one byte in size (there's a rough sketch
of such a script after this list). Do a DIR command, try defrag, etc.
You may be surprised.
- It's possible that a well-designed Oracle or Sybase database could
handle your data much better than NTFS files and folders can, but
that kind of advice doesn't come for free.
- Contact Dell/EMC. Talk to a salesman about a configuration and quote
for a NAS/SAN storage box. If they decide you are serious, you will
be able to ask their engineers how well their file systems will
behave with your data. If you can get them to tell you how much
better their product is than NTFS, you will learn a lot about the
shortcomings of NTFS (if any). There are other NAS/SAN vendors, and
you can see lots of parts pricing at
http://www.aberdeeninc.com/
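By the way, here is a rough sketch of the test I mean, assuming
Python and a made-up helper name like make_test_files; it spreads
one-byte files across first-character subfolders as described in the
second bullet. Adjust the count, naming convention and target drive
to match your real data before drawing any conclusions.

import os
import random
import string

def subfolder_path(root, filename, chars=1):
    # abcdef.txt -> <root>\a\bcdef.txt for chars=1, <root>\ab\cdef.txt for chars=2;
    # chars=0 keeps everything flat in <root>, for comparison.
    return os.path.join(root, filename[:chars], filename[chars:])

def make_test_files(root, count=100_000, chars=1):
    # Create `count` one-byte files using the first-character subfolder scheme.
    alphabet = string.ascii_lowercase + string.digits   # 36 possible leading characters
    for i in range(count):
        name = "".join(random.choices(alphabet, k=6)) + str(i) + ".txt"
        path = subfolder_path(root, name, chars)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(b"x")                                # one-byte payload

if __name__ == "__main__":
    make_test_files(r"C:\ntfs_test", count=100_000, chars=1)

Run it once with chars=0 (everything in one folder) and once with
chars=1, then time a DIR, a file open and a defrag pass on each
layout; the difference should tell you whether the folder fan-out is
where your time is going.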
I still think that using NTFS compression for your file systems will
be a big win, but you've got to solve the fundamental problems first.
IMHO the biggest one is the number of files in your folders.
I'd like to hear how it works out.