Toby,
Without indexes, the database is left to search the entire table for matches
on the Where clause - these are often referred to as FTS or Full Table Scan.
These acceses occur when you select from a single table, and also when you
join two or more tables.
As a general rule of thumb, when you are expecting less than 10% of the rows
to be returned, and where the size of the table is greater than 10 blocks,
indexes have the potential of adding an incredible performance boost.
Consider the problem in searching a table of customers for those records
that have the phone number of 800-555-1212 (US NAN assumed here).
If there are no indexes, and if there are 2,000,000 rows in the database,
and each row is on average 320 bytes, and if you expect only one or two rows
in the resultset, the database reads all 2,000,000 rows, comparing each
phone number to the value above. If we assume a block size of 8k, then we
may be able to squeeze 25 or so records into a block. To read the 2,000,000
rows would require 87,000 or so disk reads. (**disk reads are VERY slow**)
This would result in a substantial amount of compute resources to do the
2,000,000 string compares, as well as an incredible amount of disk activity
if we assume that the customer table was not already in memory (650 mbytes
plus some for overhead of segment headers, etc) - and all that work for just
a couple of records. One solution would be an index on phonenumber. An
index is a separate segment, very much related to the associated table, and
is organized in a structure that is optimized for seaches, especially
searches that involve a test for equality (or inequality). In the simplest
case (excluding IOTs, clusters, and composite indexes) the index is
typically a balanced, sorted tree structure.
The root of the tree is some value near the middle, and the root has two
nodes - a left node and a right node. The left and right nodes are for all
values that less than the root value and greater than the root value
respectively. Each of these left and right nodes is again a distinct node
that is the root for the structure below it, each with a left and right
node, and so on until you get to the terminal nodes, which are typically
called leaf nodes. A leaf node contains a value (a phone number is our
case) and a block and offset pointer into the associated table.
While this is an absurd simplification, the concept is quite solid. The
subject is treated quite well by Knuth in his Vol I, The Art of Computer
Programming-Fundamental Algorithms. The actual implementation is platform
specific, and well optimized for block (or multiblock) disk reads (i.e. a
node may not be atomic as implied, rather a block of nodes - or as many
values as would fit in a block) - mainly because indexes are there to
minimize disk reads (although other benefits include supporting constraints
and sorting).
Back to our problem of above; using the index, the records that satisfy the
where clause might be accessed in as few as a 6 or so disk reads. It is the
ratio of the disk reads using an index to the disk reads of a FTS that we
might use to measure the effectiveness of the index. But indexes do not
alway help! Consider what happens when we write the select statement such
that it returns 1,000,000 records (i.e. all of the records in a particular
state or country) The worse case for the FTS would be 87,000 disk reads.
But with the index, we could easily exceed that count several times if the
distibution of the items in the data table were dramatically different from
the distribution of the keys in the index block (i.e. the index is sorted by
state or country, the table data is sorted by person's last name, and no two
adjacent index leaf nodes point to records in the same block in the data
table). This is a case where the index actually slows performance - and a
FTS is a LOT quicker.
As Val said, this subject is not a 5 minute topic - and moreover, it is not
about "to index or not to index", it is about appropriate indexes - based on
the data and how it will be accessed (select, update, insert, etc). Hang in
there -- while this stuff may seem overly complicated at first, it's just
way tooooo much fun to even consider doing anything else...
regards
roy fine