Defragging and modern disks

  • Thread starter: Michael Daly

Michael Daly

Back when an $800 30MB RLL hard drive was the hot thing, I understood defragging.
However, I was thinking recently about the difference between the apparent
layout of a modern drive and the actual configuration. Since the drive
controller maps the layout the OS thinks it is dealing with to the actual
cyl/track/sector layout that is physically implemented on the disk, does a
conventional defragmenting approach make sense? If the apparent relationship
between two bits of file is not the same as the physical relationship, does
defragging it really optimize or just potentially move a file piece to an equally
arbitrary location on the disk?

Also, on a RAID1 configuration, is there a guarantee that the two disks are
written in exactly the same way? Is it possible that a hardware RAID1 will put
the same data in two very different locations on two disks (thereby further
negating the logic of a conventional defrag algorithm)?

FWIW, the target platform under consideration is (cough) Windows XP.

Mike
 
Michael Daly said:
Back when an $800 30MB RLL hard drive was the hot thing, I understood defragging.

Yes, it made more sense then.

However, I was thinking recently about the difference
between the apparent layout of a modern drive and the actual configuration. Since the drive
controller maps the layout the OS thinks it is dealing with to the actual cyl/track/sector layout
that is physically implemented on the disk, does a conventional defragmenting approach make sense?

No, it doesn't, but for different reasons.

The real reason it's pointless now, except in a few very unusual
situations, is that modern hard drives seek so fast and
modern OSs move the heads around a hell of a lot anyway, even
when doing something as basic as web browsing, because of the
temporary internet cache. You can get significant
fragmentation with very large video files in particular, but the
speed of access to those is determined entirely by the
frame rate, so extra seeks between fragments are
irrelevant to playback speed. Even when you are
editing them, the speed is dominated by the time
required to transcode, not by head seek times.
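As a rough illustration of the point above, here is a back-of-the-envelope sketch. The throughput, seek and rotational-latency figures are assumed round numbers for a desktop drive of that era, not measurements, and `read_time_ms` is a hypothetical helper name.

```python
def read_time_ms(size_mb, fragments, throughput_mb_s=60.0,
                 seek_ms=9.0, rotation_ms=4.2):
    """Estimate the time to read a file stored in `fragments` pieces.

    All default figures are assumptions for illustration only.
    """
    # Time spent actually streaming data off the platters.
    transfer_ms = size_mb / throughput_mb_s * 1000.0
    # One head repositioning (seek plus average rotational latency)
    # is charged for every fragment, including the first.
    positioning_ms = fragments * (seek_ms + rotation_ms)
    return transfer_ms + positioning_ms


# A 700 MB video file, contiguous versus split into 50 fragments.
contiguous = read_time_ms(700, fragments=1)
fragmented = read_time_ms(700, fragments=50)
overhead_pct = (fragmented - contiguous) / contiguous * 100

print(f"contiguous: {contiguous:.0f} ms")
print(f"fragmented: {fragmented:.0f} ms")
print(f"overhead:   {overhead_pct:.1f} %")
```

With these assumed figures, 50 fragments add well under a second to a transfer that takes more than ten seconds. The picture changes for small files with many fragments, where positioning time can dominate.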

The only real advantage of defragged files now is that they
can be easier to recover if you are stupid enough not to have
full backups. But with hard drives so cheap now, you have
to be completely stupid not to have adequate backups.

If the apparent relationship between two bits of file is not the same as the physical
relationship,

They are the same, in the sense that a continuous series of logical blocks
is normally still a continuous series of physical sectors. The only
exception is reallocated sectors, which can involve an extra
head move to the reallocated sector, but modern hard drives
have so few of those that it's completely academic in reality.
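A toy model of that claim, for illustration only. It assumes an identity mapping from logical block to physical sector, which real drives do not use (zoned recording and firmware-specific layouts get in the way), but the property that matters here is the same: neighbouring logical blocks stay neighbours unless one has been reallocated. The function names and the `SPARE_AREA` address are hypothetical.

```python
SPARE_AREA = 1_000_000  # assumed location of the spare sectors


def physical_sector(lba, remapped):
    """Return the physical sector for a logical block in this toy model."""
    # Identity mapping unless the drive has reallocated the sector.
    return remapped.get(lba, lba)


def count_head_jumps(first_lba, length, remapped):
    """Count places where consecutive logical blocks are not physically adjacent."""
    jumps = 0
    prev = physical_sector(first_lba, remapped)
    for lba in range(first_lba + 1, first_lba + length):
        cur = physical_sector(lba, remapped)
        if cur != prev + 1:
            jumps += 1
        prev = cur
    return jumps


# A contiguous logical run with no reallocated sectors: no extra head moves.
print(count_head_jumps(100, 50, {}))                 # 0
# One reallocated sector costs a jump out to the spare area and a jump back.
print(count_head_jumps(100, 50, {120: SPARE_AREA}))  # 2
```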

does defragging it really optimize

Yes, it still does.

or just potentially move a file piece to an equally arbitrary location on the disk?

Nope, that doesn't happen.

Also, on a RAID1 configuration, is there a guarantee that the two disks are written in exactly the
same way?

Not a guarantee, but it's close enough to that.

Is it possible that a hardware RAID1 will put the same data in two very different locations on two
disks (thereby further negating the logic of a conventional defrag algorithm)?

The short answer is no.

FWIW, the target platform under consideration is (cough) Windows XP.

The other thing that many defraggers attempt to do is to locate files
on the hard drive to maximise the speed of access. That's a separate
issue from minimising the number of fragments each file has. But XP does
that reorganisation itself, and it really only affects boot time much anyway.
 
Previously Michael Daly said:
Back when an $800 30MB RLL hard drive was the hot thing, I understood defragging.
However, I was thinking recently about the difference between the apparent
layout of a modern drive and the actual configuration. Since the drive
controller maps the layout the OS thinks it is dealing with to the actual
cyl/track/sector layout that is physically implemented on the disk, does a
conventional defragmenting approach make sense? If the apparent relationship
between two bits of file is not the same as the physical relationship, does
defragging it really optimize or just potentially move a file piece to an equally
arbitrary location on the disk?
Also, on a RAID1 configuration, is there a guarantee that the two
disks are written in exactly the same way? Is it possible that a
hardware RAID1 will put the same data in two very different
locations on two disks (thereby further negating the logic of a
conventional defrag algorithm)?
FWIW, the target platform under consideration is (cough) Windows XP.

Defraggers have always optimized for linear reads. For those it is not
relevant how the C/H are linearized. A RAID1 will always put
the same data into the same logical sector, except for the RAID
superblock. That is different for each RAID component (disk,
or with software RAID: partition or file).

In addition, modern filesystems do not require defragmentation
in many cases.

Arno
 
Michael said:
Also, on a RAID1 configuration, is there a guarantee that the two disks
are written in exactly the same way? Is it possible that a hardware
RAID1 will put the same data in two very different locations on two
disks (thereby further negating the logic of a conventional defrag
algorithm)?


There's absolutely no guarantee that the two disks are written exactly
the same way, whether it's software or hardware RAID. Hardware RAID has
no more insight into the internal organization of its disks
than software RAID does; hardware RAID is just software RAID
moved into the disk storage array's processor. I've seen various volume
managers write mirror data at completely opposite ends of each mirror
disk. The only thing the software cares about is the overall size of the
volumes, not the specific organization of each volume.

Yousuf Khan
 
Previously Yousuf Khan said:
There's absolutely no guarantee that the two disks are written exactly
the same way, whether it's software or hardware RAID. Hardware RAID has
no more insight into the internal organization of its disks
than software RAID does; hardware RAID is just software RAID
moved into the disk storage array's processor. I've seen various volume
managers write mirror data at completely opposite ends of each mirror
disk. The only thing the software cares about is the overall size of the
volumes, not the specific organization of each volume.

Interesting. I did not know that. Care to name a manager that
does this?

Well, Linux software RAID 1 does write exactly the same data to all
disks, except for the RAID superblock, of course. The RAID superblock
is placed at the end of the disk/partition/file. With these two
properties you can mount each disk individually, without RAID.
This is one of the design criteria for the RAID-1
implementation in the kernel, so it can be relied on.
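A minimal sketch of the layout described above, as a toy in-memory model rather than anything resembling the kernel's md code. The class and method names are hypothetical. Note that superblock placement at the end matches the md metadata formats 0.90 and 1.0; the later 1.1 and 1.2 formats place it at or near the start instead.

```python
class Member:
    """One RAID-1 component, modelled as a list of blocks."""

    def __init__(self, total_blocks, member_id):
        self.blocks = [b""] * total_blocks
        # Per-member metadata lives in the last block, so it differs
        # between members while the data area does not.
        self.blocks[-1] = b"SUPERBLOCK member=%d" % member_id

    def data_area(self):
        """What you would see if you mounted this member on its own."""
        return self.blocks[:-1]


class Mirror:
    """Toy RAID-1: every write goes to the same logical block on each member."""

    def __init__(self, total_blocks, members=2):
        self.members = [Member(total_blocks, i) for i in range(members)]
        self.data_blocks = total_blocks - 1  # last block holds the superblock

    def write(self, lba, data):
        if not 0 <= lba < self.data_blocks:
            raise IndexError("LBA outside the data area")
        for member in self.members:
            member.blocks[lba] = data

    def read(self, lba):
        # Any member can serve a read; this sketch always uses the first.
        return self.members[0].blocks[lba]


mirror = Mirror(total_blocks=8)
mirror.write(0, b"boot sector")
mirror.write(5, b"file data")
first, second = mirror.members
print(first.data_area() == second.data_area())  # True
print(first.blocks[-1] == second.blocks[-1])    # False
```

Because the data area starts at block 0 and is identical on both members, either one looks like an ordinary disk when read on its own.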

Arno
 
Arno said:
Interesting. I did not know that. Care to name a manager that
does this?

The ones I'm most familiar with are the ones that run under Solaris,
which are Solstice Disk Suite and Veritas Volume Manager. They both do it.

Well, Linux software RAID 1 does write exactly the same data to all
disks, except for the RAID superblock, of course. The RAID superblock
is placed at the end of the disk/partition/file. With these two
properties you can mount each disk individually, without RAID.
This is one of the design criteria for the RAID-1
implementation in the kernel, so it can be relied on.


The RAID superblock sounds like the same thing as what they call
Metadata in Disk Suite, or Private sections in Volume Manager. They just
maintain the persistent organization data for their respective volume
management software.

And these are just the software RAID products. In hardware RAID, you
have even less control over placement, and the storage array just chooses
the disks for you.

Yousuf Khan
 
The ones I'm most familiar with are the ones that run under Solaris,
which are Solstice Disk Suite and Veritas Volume Manager. They both do it.

The RAID superblock sounds like the same thing as what they call
Metadata in Disk Suite, or Private sections in Volume Manager. They just
maintain the persistent organization data for their respective volume
management software.

They are. The smart thing Linux software RAID does is placing
them at the end, so the beginning looks like an ordinary disk.

And these are just the software RAID products. In hardware RAID, you
have even less control over placement, and the storage array just chooses
the disks for you.

Agreed. One reason I like software RAID better. I can just plug the
disks into any other PC in any way, boot some current Linux
and get at my data. If a hardware controller goes up in smoke,
no such easy solution.

Arno
 
Arno said:
They are. The smart thing Linux software RAID does is placing
them at the end, so the beginning looks like an ordinary disk.

These other products usually do the same thing too. An exception
is when one is converting an existing non-mirrored boot disk to
mirrored. In that case, it has to build the RAID metadata wherever it
can, so they often just steal a bit of space from the swap partition and
put their RAID metadata there.

Agreed. One reason I like software RAID better. I can just plug the
disks into any other PC in any way, boot some current Linux
and get at my data. If a hardware controller goes up in smoke,
no such easy solution.

I'd say for mirroring, a good software RAID package is just as good as
any hardware RAID. It's only when you're doing RAID5 that hardware RAID
makes a bit of a difference to performance. And even then, hardware
RAID5 still can't compete against software RAID0+1 for maximum performance.

Yousuf Khan
 
These other products usually do the same thing too. An exception
is when one is converting an existing non-mirrored boot disk to
mirrored. In that case, it has to build the RAID metadata wherever it
can, so they often just steal a bit of space from the swap partition and
put their RAID metadata there.

I'd say for mirroring, a good software RAID package is just as good as
any hardware RAID. It's only when you're doing RAID5 that hardware RAID
makes a bit of a difference to performance. And even then, hardware
RAID5 still can't compete against software RAID0+1 for maximum performance.

That matches my experience. At least under Linux, software RAID
is a match for hardware RAID. And it is cheaper, better integrated
into the system, more flexible and easier to manage.

Arno
 
Arno said:
That matches my experience. At least under Linux, software RAID
is a match for hardware RAID. And it is cheaper, better integrated
into the system, more flexible and easier to manage.

Yeah, the only really time-consuming, processor-intensive part of RAID is
the parity calculation in RAID5 (and its variations), especially when you've
lost a disk and you're rebuilding data on the fly from parity. The second most
intense use of processing power is building new parity
for write operations. If you are performance constrained, but not
capacity constrained, then you should always choose RAID1 mirroring over
RAID5 parity.
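For readers who have not seen it spelled out, the parity work mentioned above is a byte-wise XOR across the blocks of a stripe. The following is a minimal sketch on in-memory byte strings, ignoring striping and parity rotation; the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def update_parity(old_parity, old_data, new_data):
    """Recompute parity for a small write without reading the whole stripe."""
    return xor_blocks([old_parity, old_data, new_data])


# One stripe spread over three data disks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Degraded read: disk 1 is lost, so its block is rebuilt from the
# surviving data blocks and the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True

# Small write: replace the block on disk 0 and patch the parity.
new_block = b"ZZZZ"
new_parity = update_parity(parity, data[0], new_block)
print(new_parity == xor_blocks([new_block, data[1], data[2]]))  # True
```

A mirror needs none of this: a write is a plain copy to each disk and a degraded read is a plain read from the survivor.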

Yousuf Khan
 
Yeah, the only really time-consuming, processor-intensive part of RAID is
the parity calculation in RAID5 (and its variations), especially when you've
lost a disk and you're rebuilding data on the fly from parity. The second most
intense use of processing power is building new parity
for write operations. If you are performance constrained, but not
capacity constrained, then you should always choose RAID1 mirroring over
RAID5 parity.

Actually, I found that even RAID6 is not too hard on the CPU
when writing on a dual-core system. Reading is no problem with a
non-degraded array, of course.

Arno
 