Why does copying files to a new hard drive not defragment it?

  • Thread starter: Yousuf Khan
Plus, I don't think the fragments that Unix file systems talk about
are the same fragments that they talk about on Windows. In Unix,
a fragment refers to any inode that contains only partial data from a
file, usually at the end of the file, where the remaining data won't
fill the whole inode.

I think that's something different. Otherwise almost every
file would be reported as fragmented, not the 1-2% I typically
see on a filesystem check.

Arno
 
Franc Zabkar said:
On 1 Mar 2009 15:32:34 GMT, Arno <[email protected]> put finger to
keyboard and composed:
Using Win95 DOS, I copied two small files (1 sector and 2 sectors,
respectively) to a newly formatted diskette, deleted them both, and
then copied a third file (3 sectors). Even though the FAT was empty,
the third file was copied to cluster 4 rather than cluster 2 (the
first cluster in the data area), leaving the disc fragmented.

Well, a filesystem in an OS is not only the filesystem itself,
it is also the usage strategy implemented in the filesystem driver.

I have made similar observations with HDDs under MS FAT.

Arno
 
Franc said:
XXCLONE's "Theory of Operation" page:
http://www.xxclone.com/itheory.htm

=====================================================================
When a clone operation is performed for the first time, all the files
created on the target volume will be stored in a contiguous region.
Therefore, the clone operation in full backup mode automatically
performs the so-called "de-frag" operations.

The competing products that are based on a sector-to-sector
duplication principle propagate the same degree of fragmentation found
in the source volume to the target.
=====================================================================


Yup, exactly why I brought this subject up. I knew that XXClone claims
to defrag files during the copy, but it didn't in this case. Also,
neither did the other utility, TeraCopy.

Yousuf Khan
 
Franc said:
Some time ago I experimented with a floppy disc file system (FAT12):
http://groups.google.com/group/comp.os.ms-windows.win95.misc/msg/8f1507a7a914d861?dmode=source

Using Win95 DOS, I copied two small files (1 sector and 2 sectors,
respectively) to a newly formatted diskette, deleted them both, and
then copied a third file (3 sectors). Even though the FAT was empty,
the third file was copied to cluster 4 rather than cluster 2 (the
first cluster in the data area), leaving the disc fragmented.


Well, that means that Windows avoids overwriting previously used
clusters on FAT file systems. I wonder if it retains the same behavior
in NTFS?
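
One way to picture Franc's result is a toy allocator that starts each search at the last cluster it handed out and does not rewind that pointer on delete. This is only a sketch that happens to reproduce the observed cluster numbers; the class name, the `hint` field and the search rule are assumptions, not the documented behaviour of the DOS or Windows FAT driver.

```python
FIRST_DATA_CLUSTER = 2  # the FAT data area starts at cluster 2


class ToyFat:
    """Toy cluster allocator (hypothetical model, not the real driver)."""

    def __init__(self, last_cluster):
        self.last_cluster = last_cluster
        self.used = set()
        self.hint = FIRST_DATA_CLUSTER  # assumed: last cluster handed out
        self.files = {}

    def write(self, name, n_clusters):
        span = self.last_cluster - FIRST_DATA_CLUSTER + 1
        chain = []
        # Scan from the hint, wrapping around once, taking free clusters.
        for i in range(span):
            c = FIRST_DATA_CLUSTER + (self.hint - FIRST_DATA_CLUSTER + i) % span
            if c not in self.used:
                chain.append(c)
                if len(chain) == n_clusters:
                    break
        if len(chain) < n_clusters:
            raise OSError("disk full")
        self.used.update(chain)
        self.hint = chain[-1]
        self.files[name] = chain
        return chain

    def delete(self, name):
        # Clusters become free again, but the hint is deliberately not rewound.
        self.used.difference_update(self.files.pop(name))


fat = ToyFat(last_cluster=20)
first = fat.write("one.txt", 1)    # 1-sector file
second = fat.write("two.txt", 2)   # 2-sector file
fat.delete("one.txt")
fat.delete("two.txt")
third = fat.write("three.txt", 3)  # 3-sector file on an "empty" disk
print(first, second, third)
```

Under these assumptions the third file starts at cluster 4 even though clusters 2 and 3 are free, which matches the floppy experiment.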

But in my case, I had freshly formatted disks, so there should've been
absolutely no records of previously used sectors at all on those disks.
The idea that someone posted about there being temp files on the disk
that got created and deleted during the copy operation itself, would
seem to make some sense here. If a temp file was created and deleted, the
OS would avoid reusing those sectors till it was absolutely necessary.

Yousuf Khan
 
Arno said:
I think that's something different. Otherwise almost every
file would be reported as fragmented, not the 1-2% I typically
see on a filesystem check.

I'm going on memory here from stuff I've read years ago, so my numbers
and terminology might be a little imprecise. Also I read this stuff on
Solaris's UFS, but it's probably valid for most Unixes, including Linux.

A Unix filesystem is usually subdivided into fixed-length regions known
variously as superblocks or extents or whatever. For example, let's say
a 1TB file system might be subdivided into extents of 1GB, for
simplicity. Each extent is further subdivided by inodes, which use up
the extents by arbitrary lengths. Now, there is one inode per file per
extent.

Usually, the inodes are just the metadata of the files, containing a
bitmap of which sectors in an extent a file occupies. But the inodes
take up a bit of space themselves, usually a fixed size of a few
kilobytes. If a file is small enough, then instead of allocating a
bitmap for it, they just stuff the file itself into the inode.
These inodes can be pretty big, maybe in the range of 1K up to 32K,
so why allocate a whole bitmap for a file that may be smaller than the
inode?

A file that's big enough may need to span over multiple extents. If the
final leftover data of the file is bigger than an inode can hold, then
they'll just open another bitmap inode for the last extent. If the
leftover data is small enough to fit inside an inode, then they'll stuff
the leftover into the inode. That's what they call a fragmentary inode,
or fragment in Unix. I believe that whether an inode is a bitmap or a
fragment or something else is determined by its header.

Really small files that reside entirely in a single inode are not
considered fragments. Only those big files that reside over multiple
extents and finish in a fragmentary inode are considered fragmented.

Another thing, big files that reside over multiple extents are not
necessarily in adjoining extents, each of the extents can be scattered
anywhere on the disk. So even if the extents aren't right next to each
other, then these files aren't considered fragmented. They don't worry
about the whole-disk fragmentation, just the fragmentation within extents.
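
For comparison, here is a minimal sketch of how tail fragments work in the classic BSD FFS-style layout, which may differ in detail from the account above: a block is split into equal-sized fragments, and only the last partial block of a file is stored in fragments. The function name and the 8K/1K sizes are illustrative assumptions, and the sketch ignores any extra rules a real implementation applies.

```python
def ffs_layout(file_size, block_size=8192, frag_size=1024):
    """Return (full_blocks, tail_fragments) for a file of file_size bytes.

    Assumed model: whole blocks first, then the leftover bytes rounded up
    to a whole number of fragments.
    """
    full_blocks, tail_bytes = divmod(file_size, block_size)
    tail_frags = -(-tail_bytes // frag_size)  # ceiling division
    return full_blocks, tail_frags


blocks, frags = ffs_layout(20500)
allocated = blocks * 8192 + frags * 1024
print(blocks, frags, allocated)
```

With these sizes a 20,500-byte file takes two full blocks plus five fragments, so 21,504 bytes are allocated instead of the 24,576 that three whole blocks would need.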

Yousuf Khan
 
That's as expected, and it's not fragmentation. DOS, by design,
will avoid writing to erased disk space, until it's needed. This
is to allow recovery of deleted files. The FAT is not empty.
When DOS deletes files it just removes the first character of
each deleted file's name in the directory. You can easily recover
the deleted files with a disk editor, or even a hex editor, by
replacing the first characters of the deleted files' names in the
directory. This is assuming the erased files have not been written
over.

In my test example (see URL), part of the second file was overwritten
by the third, and the directory entry for the first file was
overwritten by the directory entry for the third. The FAT is indeed
empty - the clusters occupied by any deleted file are marked as free.
AIUI, the reason you can often recover a deleted file is because the
directory entry points to the first cluster, and subsequent clusters
are assumed to be contiguous.

See http://en.wikipedia.org/wiki/Undelete#FAT_file_system
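
The delete and undelete steps can be sketched on an in-memory model. The dictionary layout, the helper names and the 512-byte cluster size are assumptions for illustration; only the 0xE5 deleted-entry marker and the freed FAT chain reflect how FAT itself records a deletion. Recovery guesses that the clusters were contiguous, exactly as described above.

```python
import math

CLUSTER_SIZE = 512                  # assumed: one sector per cluster
FREE, END_OF_CHAIN = 0x000, 0xFFF   # FAT12 entry values
DELETED_MARK = 0xE5                 # first name byte of a deleted entry


def delete(entry, fat):
    """Free the cluster chain and mark the directory entry as deleted."""
    c = entry["first_cluster"]
    while c != END_OF_CHAIN:
        nxt = fat[c]
        fat[c] = FREE
        c = nxt
    entry["name"] = bytes([DELETED_MARK]) + entry["name"][1:]


def undelete(entry, fat, first_char):
    """Rebuild the chain, assuming the file occupied contiguous clusters."""
    n = math.ceil(entry["size"] / CLUSTER_SIZE)
    start = entry["first_cluster"]
    guess = list(range(start, start + n))
    # Give up if any guessed cluster has been reused or lies off the disk.
    if any(fat.get(c, END_OF_CHAIN) != FREE for c in guess):
        return None
    for c, nxt in zip(guess, guess[1:] + [END_OF_CHAIN]):
        fat[c] = nxt
    entry["name"] = first_char + entry["name"][1:]
    return guess


fat = {c: FREE for c in range(2, 12)}
fat[3], fat[4] = 4, END_OF_CHAIN    # a 2-cluster file at clusters 3 and 4
entry = {"name": b"SECOND  TXT", "first_cluster": 3, "size": 1024}

delete(entry, fat)
recovered = undelete(entry, fat, b"S")
print(recovered, entry["name"])
```

If another file has taken one of the guessed clusters in the meantime, `undelete` returns `None`, which corresponds to the "written over" case.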

- Franc Zabkar
 
Yousuf Khan said:
I'm going on memory here from stuff I've read years ago, so my numbers
and terminology might be a little imprecise. Also I read this stuff on
Solaris's UFS, but it's probably valid for most Unixes, including Linux.

A Unix filesystem is usually subdivided into fixed-length regions known
variously as superblocks or extents or whatever. For example, let's say
a 1TB file system might be subdivided into extents of 1GB, for
simplicity. Each extent is further subdivided by inodes, which use up
the extents by arbitrary lengths. Now, there is one inode per file per
extent.

Usually, the inodes are just the metadata of the files, containing a
bitmap of which sectors in an extent a file occupies. But the inodes
take up a bit of space themselves, usually a fixed size of a few
kilobytes. If a file is small enough, then instead of allocating a
bitmap for it, they just stuff the file itself into the inode.
These inodes can be pretty big, maybe in the range of 1K up to 32K,
so why allocate a whole bitmap for a file that may be smaller than the
inode?

A file that's big enough may need to span over multiple extents. If the
final leftover data of the file is bigger than an inode can hold, then
they'll just open another bitmap inode for the last extent. If the
leftover data is small enough to fit inside an inode, then they'll stuff
the leftover into the inode. That's what they call a fragmentary inode,
or fragment in Unix. I believe that whether an inode is a bitmap or a
fragment or something else is determined by its header.

Really small files that reside entirely in a single inode are not
considered fragments. Only those big files that reside over multiple
extents and finish in a fragmentary inode are considered fragmented.

Another thing, big files that reside over multiple extents are not
necessarily in adjoining extents, each of the extents can be scattered
anywhere on the disk. So even if the extents aren't right next to each
other, then these files aren't considered fragmented. They don't worry
about the whole-disk fragmentation, just the fragmentation within extents.

Yousuf Khan

Well, essentially correct, I think, although the details do vary.
The bottom line, however, is that you get mostly the same throughput
as an unfragmented file on large sequential accesses. Small accesses
are dominated by access time anyway. One effect is that you do
not need defragmentation. FAT is far more primitive and cannot really
be prevented from fragmenting. No idea about NTFS.
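
A rough back-of-the-envelope model of that claim: total read time is one access per contiguous piece plus the transfer time. The 12 ms access time and 80 MB/s transfer rate are assumed round numbers for illustration, not measurements, and the function name is hypothetical.

```python
def read_time_ms(size_mb, pieces, access_ms=12.0, rate_mb_s=80.0):
    """Estimated time to read a file stored in `pieces` contiguous runs."""
    return pieces * access_ms + size_mb / rate_mb_s * 1000.0


whole = read_time_ms(1000, pieces=1)    # 1 GB in one run
split = read_time_ms(1000, pieces=10)   # 1 GB in ten large runs
small = read_time_ms(0.004, pieces=1)   # a single 4 KB read

print(f"1 GB, 1 piece:   {whole:.0f} ms")
print(f"1 GB, 10 pieces: {split:.0f} ms ({split / whole - 1:.1%} slower)")
print(f"4 KB read: access time is {12.0 / small:.1%} of the total")
```

With these numbers, splitting a 1 GB file into ten large pieces costs under 1% extra, while a 4 KB read spends nearly all of its time on the access itself.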

Arno
 