
Issue #1, April 2005

Slack Wisdom

Slack Wisdom is a column that gathers Slack wisdom from various newsgroups and mailing lists. For the first issue of The Slack World, we have chosen a posting made by Lew Pitcher to the alt.os.linux.slackware newsgroup. The posting is reproduced verbatim, with kind permission from the author. The complete thread can be found here.


Defragmentation of a hard drive

Newsgroups: alt.os.linux.slackware
From: Lew Pitcher
Date: Thu, 04 Nov 2004 14:05:38 -0500
Local: Thurs, Nov 4 2004 11:05 am
Subject: Re: Defragment Hard Drive

Miguel De Anda wrote:
> Ok. I've looked in to this and it seems that everybody says you don't need
> to defragment your hard drive with the ext3 filesystem. I don't believe
> that.

Then, you don't really understand what fragmentation is, and why it
causes problems in MSWindows.

Suffice it to say that
a) ext2 and ext3 are designed to be resistant to the effects of
   fragmentation and only suffer problems under extremely high (say 90%)
   fragmentation, and

b) in any multiuser, multiprocess operating system, you are going to have
   filespace fragmentation. It's the operating system's job to minimize the
   effects of fragmentation, and Linux does a particularly good job of that.

If you want my stock detailed answer, here it is...

In a single-user, single-tasking OS, it's best to keep all the data
blocks for a given file together, because most of the disk accesses over
a given period of time will be against a single file. In this scenario,
the read-write heads of your HD advance sequentially through the hard
disk. In the same sort of system, if your file is fragmented, the
read-write heads jump all over the place, adding seek time to the hard
disk access time.

In a multi-user, multi-tasking, multi-threaded OS, many files are being
accessed at any time, and, if left unregulated, the disk read-write
heads would jump all over the place all the time. Even with
'defragmented' files, there would be as much seek-time delay as there
would be with a single-user single-tasking OS and fragmented files.

Fortunately, multi-user, multi-tasking, multi-threaded OSs are usually
built smarter than that. Since file access is multiplexed from the point
of view of the device (multiple file accesses from multiple, unrelated
processes, with no order imposed on the sequence of blocks requested),
the device driver incorporates logic to accommodate the performance hits,
like reordering the requests into something sensible for the device (i.e.,
an "elevator" algorithm or the like).
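
The reordering idea can be sketched in a few lines of Python. This is only an illustration of a simple elevator (SCAN-style) ordering, assuming block addresses are plain integers and the head sweeps upward first; the function name `elevator_order` is made up for this example, and a real kernel I/O scheduler is considerably more involved.

```python
def elevator_order(head, requests):
    """Order pending requests as one upward sweep from the current head
    position, followed by one downward sweep for what was left behind."""
    # Blocks at or beyond the head are served on the way up, lowest first.
    upward = sorted(r for r in requests if r >= head)
    # Blocks behind the head are served on the way back down, highest first.
    downward = sorted((r for r in requests if r < head), reverse=True)
    return upward + downward


# Hypothetical pending queue, in the order the requests arrived.
pending = [6, 0, 9, 1, 7, 3]
print(elevator_order(5, pending))  # [6, 7, 9, 3, 1, 0]
```

The head changes direction once instead of on almost every request, regardless of which file or process each block belongs to.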

In other words, fragmentation is a concern when one (and only one)
process accesses data from one (and only one) file. When more than one
file is involved, the disk addresses being requested are 'fragmented'
with respect to the sequence that the driver has to service them, and
thus it doesn't matter to the device driver whether or not a file was
fragmented.

To illustrate:

I have two programs executing simultaneously, each reading two different
files.

The files are organized sequentially (unfragmented) on disk...

 1.1  1.2  1.3  2.1  2.2  2.3  3.1  3.2  3.3  4.1  4.2  4.3  4.4

Program 1 reads file 1, block 1
                file 1, block 2
                file 2, block 1
                file 2, block 2
                file 2, block 3
                file 1, block 3

Program 2 reads file 3, block 1
                file 4, block 1
                file 3, block 2
                file 4, block 2
                file 3, block 3
                file 4, block 4

The OS scheduler causes the programs to be scheduled and executed such
that the device driver receives requests

                file 3, block 1
                file 1, block 1
                file 4, block 1
                file 1, block 2
                file 3, block 2
                file 2, block 1
                file 4, block 2
                file 2, block 2
                file 3, block 3
                file 2, block 3
                file 4, block 4
                file 1, block 3

Graphically, this looks like...

   1.1  1.2  1.3  2.1  2.2  2.3  3.1  3.2  3.3  4.1  4.2  4.3  4.4
$------------------------------>:3.1:
  :1.1:<--------------------------'
    `----------------------------------------->:4.1:
       :1.2:<------------------------------------'
         `-------------------------->:3.2:
                 :2.1:<----------------'
                   `------------------------------->:4.2:
                     :2.2:<--------------------------'
                        `---------------->:3.3:
                           :2.3:<-----------'
                             `------------------------------->:4.4:
             :1.3:<---------------------------------------------'

As you can see, the accesses are already 'fragmented' and we haven't
even reached the disk yet (up to this point, the accesses have been
against 'logical' addresses). I have to stress this: the above situation
is no different from an MSDOS single file physical access against a
fragmented file.
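
The head movement in the illustration can be added up with a short Python sketch. It assumes (these are assumptions for the arithmetic, not part of the original posting) that each block occupies one position in the order shown, that the head starts at the leftmost block, and that distance is counted in blocks. The name `head_travel` is hypothetical.

```python
# Disk layout from the illustration: index in the list = position on disk.
layout = ["1.1", "1.2", "1.3", "2.1", "2.2", "2.3",
          "3.1", "3.2", "3.3", "4.1", "4.2", "4.3", "4.4"]
position = {block: i for i, block in enumerate(layout)}

# Requests in the order the device driver receives them.
received = ["3.1", "1.1", "4.1", "1.2", "3.2", "2.1",
            "4.2", "2.2", "3.3", "2.3", "4.4", "1.3"]


def head_travel(order, start=0):
    """Total distance (in blocks) the head moves to serve `order`."""
    total, head = 0, start
    for block in order:
        total += abs(position[block] - head)
        head = position[block]
    return total


as_received = head_travel(received)
# The same requests, served in a single pass across the disk.
one_sweep = head_travel(sorted(received, key=position.get))
print(as_received, one_sweep)  # 76 12
```

Under these assumptions the head travels 76 blocks when the requests are served in arrival order, even though every file is stored contiguously, against 12 blocks when the same requests are served in one sweep.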

So, how do we minimize the effect seen above? If you are MSDOS, you
reorder the blocks on disk to match the (presumed) order in which they
will be requested. On the other hand, if you are Linux, you reorder the
requests into a regular sequence that minimizes disk access using
something like an elevator algorithm. You also read ahead on the drive
(optimizing disk access), buffer most of the file data in memory, and
you only write dirty blocks. In other words, you minimize the effect of
'file fragmentation' as part of the other optimizations you perform on
the access requests before you execute them.

Now, this is not to say that 'file fragmentation' is a good thing. It's
just that 'file fragmentation' doesn't have the impact here that it
would have in MSDOS-based systems. The performance difference between a
'file fragmented' Linux file system and a 'file unfragmented' Linux file
system is minimal to none, where the same performance difference under
MSDOS would be huge.

Under the right circumstances, fragmentation is a neutral thing, neither
bad nor good. As to defragging a Linux filesystem (ext2fs), there are
tools available, but (because of the design of the system) these tools
are rarely (if ever) needed or used. That's the impact of designing up
front the multi-processing/multi-tasking multi-user capacity of the OS
into its facilities, rather than tacking multi-processing/multi-tasking
multi-user support on to an inherently single-processing/single-tasking
single-user system.

--
Lew Pitcher
IT Consultant, Enterprise Data Systems,
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed are my own, not my employers')
  

