Adding columns to SQL Server tables dynamically

  • Thread starter Thread starter Mark Rae
  • Start date Start date
M

Mark Rae

Hi,

I'm writing a routine which queries a table in a mySQL database and then
adds records from it into a SQL Server 2000 database - pretty easy stuff.

Problem is, that the mySQL table may have additional columns every time I
query it.

I can easily use the Columns collection of both tables to see if there is a
new column in the mySQL table which isn't in the SQL Server table, but
what's the easiest / most efficient way to add a new column to an existing
SQL Server table, given that I know the name and datatype of the column to
be added?

Would it be to pass some sort of "ALTER TABLE..." SQL into an SQLCommand
object, or is there a better way?

Any assistance gratefully received.

Best regards,

Mark Rae
 
Well I have to say, that while the below mentioned approach will work, and
it is pretty much the only approach ADO.NET offers you (and that is not a
shortcoming of ADO.NET in my view), dynamically adding or removing columns
in my eyes is not the wisest thing to do. Everytime you add a column, the
physical location of that column on the disk might be very different from
where the table is - therefore running a sql query on it will end up making
the hard disk head jump a lot.

In short, you need to go through a compact process on your tables after
you've added a column - so while it will work - it is less than the idea
solution - the better way out would be to decide on the table structures in
advance and then do the inserts - OR - come up with a table strucutre that
doesn't need table column additions on the fly.

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik
 
In short, you need to go through a compact process on your tables after
you've added a column

Why...?

- so while it will work - it is less than the idea solution -

Irrelevant - it's the only one I have...
the better way out would be to decide on the table structures in advance
and then do the inserts -

Not possible given my client's business...
OR - come up with a table strucutre that doesn't need table column
additions on the fly

Not possible given my client's business...

LOL! So, is SQLCommand + "ALTER TABLE..." the only way...?
 
In short, you need to go through a compact process on your tables after
you've added a column

Why...?

- so while it will work - it is less than the idea solution -

Irrelevant - it's the only one I have...
the better way out would be to decide on the table structures in advance
and then do the inserts -

Not possible given my client's business...
OR - come up with a table strucutre that doesn't need table column
additions on the fly

Not possible given my client's business...

LOL! So, is SQLCommand + "ALTER TABLE..." the only way...?
 
Mark -

I explained the "why" you have to compact your tables earlier, but here is
it again ---

Everytime you add a column, the physical location of that column on the disk
might be very different from where the table is - therefore running a sql
query on it will end up making the hard disk head jump a lot.

.. Databases - independent of whichever you might choose db2/oracle/sql
server - the tables will get fragmented as you add more columns into them.

And as per SqlCommand + Alter table being the only way ... let me correct
myself ... that is "the" way you'd do that .. except ... even if you were to
execute this thru Query Analyzer and simply run alter table .. you'd still
run into the above problem.

I digress that, that might be the only way you have. I have seen many such
cases in which this was argued, and a solution was found. (Is it possible to
create a temporary buffer database before you throw rows into your real
database?) (Is it possible to run a compact process after this operation has
finished?)

And is SqlCommand "the only way" ? No .. there are other ways ...

OleCommand
IDBCommand
ADO classic
DAO
MFC Classes
OleDb layer
Sql Script
OSql

And finally of course - Query Analyzer and a little guy typing really fast.

In each one of the above, you will have a fragmented table issue. And this
issue would remain if you were to use Java/Oracle/ anything you can come up
with.

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik
 
Sahil Malik said:
Mark -

I explained the "why" you have to compact your tables earlier, but here is
it again ---

Everytime you add a column, the physical location of that column on the
disk might be very different from where the table is - therefore running a
sql query on it will end up making the hard disk head jump a lot.

First of all reading tables rarely gets all the way to the "hard disk head".
Data is usually cached in the server's memory. In memory, there's a
relatively smaller "seek time", so having data contigous for logical IO is
not as important as for physical IO. The "seek time" problem exists, but
only over at small scale of the processor's Level1 and Level 2 cache. At
the scale of a database server's data cache (many 10's of megabytes), and
the read size of a page or block (8k), contiguous blocks reads are not
significantly faster than non-contigous block reads. What matters in
logical IO is the number of reads, not their location.
. Databases - independent of whichever you might choose db2/oracle/sql
server - the tables will get fragmented as you add more columns into them.

Adding columns, and adding rows using the new columns does not cause
fragmentation in Sql Server or Oracle. Updating existing rows with the new
column can be expensive, but adding a column to a table doesn't necesarilly
require you to "compact" the table (DBCC DBREINDEX for SQL, ALTER TABLE MOVE
for Oracle). Moreoever for all but the largest tables on the most highly
available systems, the table can be compacted and reorganized weekly without
too much trouble.

In Sql Server and Oracle whole rows are always stored together (except in
Oracle if the row is over 8k and must be chained over multiple blocks). In
either case when you add a nullable column to a table, nothing much happens
except for the data dictionary update. And in either case adding a
non-nullable column with a default is basically the same as adding a
nullable column and then updating each row.

In Sql Server if you later update an existing row to set a value for the new
column, the row will be updated on its page. If there is not enough room on
the page for the updated row, you will get a page split and half the rows on
the page will be copied to a new page. So for updating existing rows with
new columns you will get slower update performance, and you may end up with
your table's pages not on contiguous extents. The biggest cost is the slow
updates however, not the "fragmentation".

However this is not a problem for adding new rows which have the a value for
the new columns. The new rows will be inserted into empty space on existing
pages, and will not cause any additional page splits. Except that the
larger rows may fill up pages slightly faster.

Oracle is a slightly diffent story. In oracle when updating a row on a full
block, the row is "migraged" to another block. Subsequently reading the row
can incurr 2 reads instead of 1 since the the indexes point to the original
block which holds the "forwarding address" of the block to which the row was
migrated. So in Oracle you don't pay the price on the update (migrating a
row is cheap), you pay it on the subsequent operations on that row.

Also, however, this only applies to updating existing rows with the new
column. Adding new rows with the added column is no more expensive for
having added the column.

In short, there is no serious performance problem with adding columns to
your tables.
In Sql Server you might want to create your clustered index with a smaller
fillfactor to leave room for the additional columns, and you might want to
periodically run DBCC DBREINDEX, but there's no real objection to regularly
adding columns as the OP proposes.

David
 
David,

So would you recommend a production app to add columns through ado.net
sqlcommands? In other words; is it okay to have a production application add
columns on the fly to the database, in addition to select/update/delete
statements? I always thought that was not a recommended practise precisely
because you'd have to run DBCC DBREINDEX on your production servers more
frequently.

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik
 
Sahil Malik said:
David,

So would you recommend a production app to add columns through ado.net
sqlcommands? In other words; is it okay to have a production application
add columns on the fly to the database, in addition to
select/update/delete statements? I always thought that was not a
recommended practise precisely because you'd have to run DBCC DBREINDEX on
your production servers more frequently.

I wouldn't want to see columns added nearly as frequently as
select/update/delete. But for, say, daily import job it isn't a very big
deal, especially if the you only insert new rows with the added columns, not
update existing rows. New nullable columns are really free to add: it's
like they were there the whole time, but null on all existing rows. Mass
updates which significantly increase the size of the target rows will cause
page splits, and will take a long time.

It really depends on the frequency of the changes and the size of the table.
If the table has 100 million rows, then I would never ALTER it except over
the Christmas holiday. For any table under a million rows, it's just not a
big deal. And for any Sql Server whose active database is smaller than the
amount of ram dedicated to Sql Server, it's just not a big deal, since Sql
Server index fragmentation only really affects disk throughput.

"Back in the day", say 1995, RAM sizes were much smaller, and disk access
was relatively faster. Database servers used _much_ more physical IO than
they do these days. Any issue affecting physical IO throughput went right
to the bottom line of server performance.

For instance you might have 4 100 mhz Pentium processors, 128 megs of ram
accessed across a 33mhz bus, and 20gigs of storage spread among 10 hard
drives. The hard drives were almost half as fast as todays drives, and the
large number of spindles probably made up for it. But the RAM was 10x
slower and 10x smaller, and the CPU was 30x slower. In addition to log
flushing, any query of any size, most sorts, and many DML operations
required really significant physical IO.

Compare that with a low-end server today. 2 cpu's at 3.4ghz 2gigs of ram on
the other end of an 800mhz memory bus, and 2 35gig SCSI drives in a mirror.
It's not just that today's machine is faster, it's a different kind of
machine: it's a machine where you only touch the disks when you need to
push something to non-volatile storage. It's not that it's so fast that you
don't have performance problems: you just have different performance
priorities.

David
 
Hi,

Supposing you had several remote MySQL databases, one for each client, each
of which contains a table that you need to amalgamate into a single local
SQL Server table on a nightly basis, for consolidated reporting etc.

a) The schema of the MySQL source tables is identical to the schema of the
SQL Server destination table, apart from the fact that the SQL Server table
has an initial integer field which denotes each client's unique client
identifier - this field, together with the primary key of the MySQL source
tables, makes a composite primary key in the destination SQL Server table.

b) The MySQL source tables have a timestamp field which is updated whenever
a record is added or updated, so you don't have to pull across every MySQL
record every time - only those which have been inserted or updated since the
last time you did it.

c) One of the fields in the MySQL table has a datatype of 'text'.

Since the current and anticipated number of new and updated records isn't
very high (a few thousand at an absolute maximum) would the best approach be
as follows:

1) For each client, fill an ADO.NET DataSet object with the MsSQL records
inserted or upated since the last time the process was run.

2) Iterate through each records in this source DataSet and try to fetch a
DataSet from the destination SQL Server table which matches the primary key.

If an empty DataSet is returned, we know we have to add a new record to the
destination SQL Server table, so we instantiate a DataRow object using the
<mySQLDataSet>.Tables(0).NewRow method.

If a DataSet with one row is returned, we know that we have to update an
existing record in the SQL Server table, so we instantiate a DataRow object
using the <mySQLDataSet>.Tables(0).Rows(0) method.

3) We synchronise the DataRow in the destination table with the DataRow from
the source table.

4) We call the AcceptChanges method of the destination DataRow object.

Question: is this the most efficient way of accomplishing the above? If not,
I'd be grateful to know your thoughts on a better way.

If so, is there an easy way to "synchronise" a DataRow from one DataSet with
a DataRow from another DataSet without having to iterate through all the
columns?

Any assistance gratefully received.

Best regards,

Mark Rae
 
Back
Top