Sahil Malik said:
Mark -
I explained the "why" you have to compact your tables earlier, but here
it is again ---
Every time you add a column, the physical location of that column on the
disk might be very different from where the table is - therefore running a
SQL query on it will end up making the hard disk head jump a lot.
First of all, reading tables rarely gets all the way to the "hard disk head".
Data is usually cached in the server's memory. In memory there is a
relatively small "seek time", so having data contiguous matters much less
for logical IO than it does for physical IO. The "seek time" problem exists,
but only at the small scale of the processor's Level 1 and Level 2 caches.
At the scale of a database server's data cache (many tens of megabytes), and
the read size of a page or block (8k), contiguous block reads are not
significantly faster than non-contiguous block reads. What matters in
logical IO is the number of reads, not their location.
Databases - independent of whichever you might choose, DB2/Oracle/SQL
Server - the tables will get fragmented as you add more columns into them.
Adding columns, and adding rows that use the new columns, does not cause
fragmentation in Sql Server or Oracle. Updating existing rows with the new
column can be expensive, but adding a column to a table doesn't necessarily
require you to "compact" the table (DBCC DBREINDEX for Sql Server, ALTER
TABLE MOVE for Oracle). Moreover, for all but the largest tables on the most
highly available systems, the table can be compacted and reorganized weekly
without too much trouble.
In Sql Server and Oracle, whole rows are always stored together (except in
Oracle if the row is over 8k and must be chained over multiple blocks). In
either case, when you add a nullable column to a table, nothing much happens
except for the data dictionary update. And in both databases, adding a
non-nullable column with a default is basically the same as adding a
nullable column and then updating each row.
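To make that concrete, both of these are plain DDL; the names are only
examples, and the Oracle syntax differs slightly (ADD (...) and VARCHAR2):

    -- Nullable column: essentially just a data dictionary change
    ALTER TABLE orders ADD notes VARCHAR(200) NULL;

    -- Non-nullable column with a default: the server also has to touch
    -- every existing row to fill in the default value
    ALTER TABLE orders ADD status CHAR(1) NOT NULL DEFAULT 'N';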
In Sql Server if you later update an existing row to set a value for the new
column, the row will be updated on its page. If there is not enough room on
the page for the updated row, you will get a page split and half the rows on
the page will be copied to a new page. So for updating existing rows with
new columns you will get slower update performance, and you may end up with
your table's pages not on contiguous extents. The biggest cost, however, is
the slow updates, not the "fragmentation".
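If you want to check whether the splits have actually hurt, DBCC SHOWCONTIG
gives a rough picture; the table name is just an example:

    DBCC SHOWCONTIG ('dbo.Orders')
    -- Look at Scan Density and Logical Scan Fragmentation in the output;
    -- if they are poor, a DBCC DBREINDEX will put the pages back in order.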
However, this is not a problem for adding new rows which have a value for
the new columns. The new rows will be inserted into empty space on existing
pages, and will not cause any additional page splits, except that the larger
rows may fill up pages slightly faster.
Oracle is a slightly different story. In Oracle, when you update a row on a
full block, the row is "migrated" to another block. Subsequently reading the
row can incur 2 reads instead of 1, since the indexes point to the original
block, which holds the "forwarding address" of the block to which the row
was migrated. So in Oracle you don't pay the price on the update (migrating
a row is cheap), you pay it on the subsequent operations on that row.
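You can get a rough count of migrated (and chained) rows after the fact -
something along these lines, with the table name as a placeholder:

    ANALYZE TABLE orders COMPUTE STATISTICS;
    SELECT chain_cnt FROM user_tables WHERE table_name = 'ORDERS';
    -- If chain_cnt grows large, ALTER TABLE orders MOVE (plus index
    -- rebuilds) will put the migrated rows back in place.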
Again, this migration cost only applies to updating existing rows with the
new column. Adding new rows that include the added column is no more
expensive for having added the column.
In short, there is no serious performance problem with adding columns to
your tables.
In Sql Server you might want to create your clustered index with a smaller
fillfactor to leave room for the additional columns, and you might want to
periodically run DBCC DBREINDEX, but there's no real objection to regularly
adding columns as the OP proposes.
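Concretely, that could look something like the following; the names and the
80 percent fill factor are just illustrative:

    -- Leave 20% free space on each leaf page for future columns/updates
    CREATE CLUSTERED INDEX ix_orders_id ON dbo.Orders (order_id)
        WITH FILLFACTOR = 80;

    -- Periodic rebuild, reapplying the same fill factor
    DBCC DBREINDEX ('dbo.Orders', 'ix_orders_id', 80)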
David