Is there a special reason why you store all your data on a single volume?
Otherwise I would advise dividing the data up and storing it on
several smaller volumes. This should limit the effect of data
corruption on your work and let you run a file system check (chkdsk)
on a single volume rather than recover the data from tape. It will
also increase your overall file system performance and conforms to
the Microsoft performance tuning guidelines.
There are third-party defragmentation tools available that do a much
more sophisticated job than the built-in defrag. You can do online
defragmentation, defragment with less than 15% free space on the
volume, schedule jobs and much more. Try OO Defrag
(www.oo-software.com), for example.
Filling your volumes up to 99% will result in MFT fragmentation in
addition to file fragmentation, which slows down the file system,
particularly with such a large number of files. See MS KB article
174619 for details.
I would not recommend using compression. It will probably not make
the file system faster; more likely it will make it slower.
If you use Windows Server 2003, you can tune the file system via
several registry parameters; for example, you can disable updates to
the Last Access Time attribute or the generation of short names in
the 8.3 naming convention. I don't know whether these parameters are
available under Windows 2000. Check the Windows Server 2003
Performance Tuning Guidelines for details.

Cheers,
Frank
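PS: Here is a minimal sketch of those two tweaks, assuming Python's
standard winreg module. The value names NtfsDisableLastAccessUpdate
and NtfsDisable8dot3NameCreation sit under
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem; double-check them
against the tuning guidelines before running anything, run it with
administrative rights, and remember the settings only take effect
after a reboot.

import winreg

FS_KEY = r"SYSTEM\CurrentControlSet\Control\FileSystem"

# Both values are REG_DWORD flags; 1 = disable the feature.
tweaks = {
    "NtfsDisableLastAccessUpdate": 1,   # stop stamping Last Access Time on reads
    "NtfsDisable8dot3NameCreation": 1,  # stop generating 8.3 short names
}

# Opening the key for writing needs administrative rights.
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, FS_KEY, 0,
                    winreg.KEY_SET_VALUE) as key:
    for name, value in tweaks.items():
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, value)
        print("set %s = %d" % (name, value))

print("Reboot for the changes to take effect.")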
Since your post a week or so ago, a couple of things have occurred to
me (this is based on my reading of your problem description, so I may
have things wrong):
- You are running much too close to the hairy edge and making a lot
of pain for yourself. Multi-TB storage arrays are getting amazingly
cheap. If your boss doesn't approve basic expenses like disk space,
you've got other problems. You (or he) are putting your business at risk.
- I believe your CPU and I/O are being sucked up by the number of files
in your folders. Your performance may improve greatly if you modify
your application to use the first character of the file name as a
subfolder name (i.e. file abcdef.txt gets stored as ./a/bcdef.txt),
giving 36 subfolders. If your application is going to scale up, you
might use the first two characters (1296 folders).
You can easily test this hypothesis on a PC with a big disk: write a
script that creates the folders and 100,000 files following your
naming convention but only one byte in size (there's a rough sketch
of such a script after this list). Do a DIR command, try defrag, etc.
You may be surprised.
- It's possible that a well-designed Oracle or Sybase database could
handle your data much better than NTFS files and folders can, but
that kind of advice doesn't come for free.
- Contact Dell/EMC. Talk to a salesman about a configuration and quote
for a NAS/SAN storage box. If they decide you are serious, you will
be able to ask their engineers how well their file systems will
behave with your data. If you can get them to tell you how much
better their product is than NTFS, you will learn a lot about the
shortcomings of NTFS (if any). There are other NAS/SAN vendors, and
you can see lots of parts pricing at
http://www.aberdeeninc.com/
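By the way, here is a rough sketch of the test I mean, assuming
Python and a made-up helper name like make_test_files; it spreads
one-byte files across first-character subfolders as described in the
second bullet. Adjust the count, naming convention and target drive
to match your real data before drawing any conclusions.

import os
import random
import string

def subfolder_path(root, filename, chars=1):
    # abcdef.txt -> <root>\a\bcdef.txt for chars=1, <root>\ab\cdef.txt for chars=2;
    # chars=0 keeps everything flat in <root>, for comparison.
    return os.path.join(root, filename[:chars], filename[chars:])

def make_test_files(root, count=100_000, chars=1):
    # Create `count` one-byte files using the first-character subfolder scheme.
    alphabet = string.ascii_lowercase + string.digits   # 36 possible leading characters
    for i in range(count):
        name = "".join(random.choices(alphabet, k=6)) + str(i) + ".txt"
        path = subfolder_path(root, name, chars)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(b"x")                                # one-byte payload

if __name__ == "__main__":
    make_test_files(r"C:\ntfs_test", count=100_000, chars=1)

Run it once with chars=0 (everything in one folder) and once with
chars=1, then time a DIR, a file open and a defrag pass on each
layout; the difference should tell you whether the folder fan-out is
where your time is going.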
I still think that using NTFS compression for your file systems will
be a big win, but you've got to solve the fundamental problems first.
IMHO the biggest one is the number of files in your folders.
I'd like to hear how it works out.