SqlCommand slow on INSERT

  • Thread starter Thread starter John Bailo
  • Start date Start date
J

John Bailo

I have a c# program that loops through a table on a DB2 database.

On each iteration it assigns data to values in the SqlParameter
collection. The command text is an INSERT statement to a Sql Server
database, run with an .ExecuteQuery I enclosed the loop in a
SqlTransaction and commit it at the end.

I timed the program and it inserts about 70 records a second...which I
think is sort of slow...so I set up some Debug.WriteLines to show where
the time was being spent.

The DataReader loop to the DB2 table is instantaneous. Almost 0s spent
getting each record. Same with assigning values.

The slow step is the actual execution of the SqlCommand. However, I
also ran a SQL Trace and monitored the execution of the statement on the
server. It took 0s to execute. The SqlCommand itself is adding an
extra 0.01s to 0.03s which can add up over the course of hundreds of
thousands of records.

So the only overhead is running .ExecuteQuery on the SqlCommand
object(!) Is there anyway to reduce or minimize this overhead, or a
setting that can affect performance.

I mean if my external source and target are running at 0s - my code
shouldn't be adding overhead to run a command!
 
I have a Model A Ford that does not climb hills that well when towing my 65'
boat. What should I do?
Ah, ADO.NET (or any of the data access interfaces) are not designed to do
bulk inserts. While the 2.0 is faster than ever (with batch mode), it's
still orders of magnitude slower than the fastest DAI INSERT loop. Using
SqlBulkCopy or DTS I can move 500,000 rows in about 20 seconds.

--
____________________________________
William (Bill) Vaughn
Author, Mentor, Consultant
Microsoft MVP
INETA Speaker
www.betav.com/blog/billva
www.betav.com
Please reply only to the newsgroup so that others can benefit.
This posting is provided "AS IS" with no warranties, and confers no rights.
__________________________________
 
I have a Model A Ford that does not climb hills that well when towing my 65'
boat. What should I do?

Put smaller tires on the Model A.

I'm not John, but thanks for the tip on SqlBulkCopy! Some of us don't get
very many chances to peek over the trenches to find the new jewels in 2.0.

-Mike
 
Ok, I'm curious.

Just why is BULK INSERT so fast? I mean, does it not go through the
SQL DBMS itself and somehow write directly to the the .mdf file? What
is the mechanism it uses?

Also, running my code on a quad proc (SQL 2000, W2K) I am getting 5,000
records in 10-14s ( or about 450 to 500 recs per second ). Not too shabby!
 
One thing I found that seems to make a little difference is using the
..Prepare() method on my SqlCommand.

This seems to take a SqlCommand and turn it into a sproc, /on-the-fly/,
for SqlCommands that need to be reused.
 
John said:
Ok, I'm curious.

Just why is BULK INSERT so fast? I mean, does it not go through the
SQL DBMS itself and somehow write directly to the the .mdf file? What
is the mechanism it uses?

Well, I'm not a SQL Server expert or anything, but I believe bulk
inserts bypass the transaction log. That probably accounts for some of
the difference anyway.
 
To bad I can't programmatically disable the transaction log or request
sql not to log my inserts...
 
The bulk copy interface and functionality has been in place since the Sybase
versions... a long time. In the Model A days we had to use the BCP utility,
but in SS7 and later we could use one of the SQL Server management "object"
libraries (SQLDMO, SQLSMO) to call Bulk copy functionality. In SS 2000 (or
earlier?) it appeared as TSQL functions too.

Yes, Bulk Copy turns off the TL. It assumes that you're copying to a temp
table (or a permanent work table). Generally, these tables don't have
indexes or other constraints to slow down the import process. BCP also uses
special TDS stream packets to move the data. It's THE way to go when moving
data from/to the server. DTS leverages this technology by permitting you to
setup a scripted BCP operation that can transform (edit) the data as it's
moved from any data source with a provider or driver (.NET, OLE DB, ODBC,
text, tight-string-with-two-cans).

hth

--
____________________________________
William (Bill) Vaughn
Author, Mentor, Consultant
Microsoft MVP
INETA Speaker
www.betav.com/blog/billva
www.betav.com
Please reply only to the newsgroup so that others can benefit.
This posting is provided "AS IS" with no warranties, and confers no rights.
__________________________________
 
The thing is, that answer really doesn't address my question.

As I mentioned, when I run the Trace utility, the duration of each
insert command is 0s.

It's the wait time to return before the SqlCommand and after the
SqlCommand.

So, to me, if I were to string together a bunch of INSERT statements,
that represent the changing parameters of the INSERT, it should run
lightening fast.

My question still stands: why is there so much overhead in the
SqlCommand object? Or, is it the transmission time to send the
command on the network? Or the conversion to TDS protocol?

If either of those factors could be reduced, then I could send my
INSERTs through almost instantly.
 
It's the fundamental difference in the mechanism. First, each INSERT
statement is sent as text to the server, not as raw data. The SQL Server
compiler then needs to compile the INSERT statement(s) and generate a query
plan. Nope, this does not take long, but it takes time. It then logs the
operation to the TL (which can't be disabled) and then to the database. At
that point the constraints are checked, the indexes are built and any RI
checks are made.

In the case of BCP, the protocol (which is proprietary and subject to
change) opens a channel, sends the meta data (once), and the server starts
an agent that simply writes the inbound data stream (binary) to the rows in
the target table. It requires very little overhead--90% of which can't be
disabled.


--
____________________________________
William (Bill) Vaughn
Author, Mentor, Consultant
Microsoft MVP
INETA Speaker
www.betav.com/blog/billva
www.betav.com
Please reply only to the newsgroup so that others can benefit.
This posting is provided "AS IS" with no warranties, and confers no rights.
__________________________________
 
It's the fundamental difference in the mechanism. First, each INSERT
statement is sent as text to the server, not as raw data. The SQL Server
compiler then needs to compile the INSERT statement(s) and generate a
query plan.

....Which then causes me to question: Are you Prepare() ing the insert
statement? I know you mentioned you were using parameters, but I think the
speed bonus does not occurr if the statement is not prepared (because, as
Bill said, the insert statements have to be parsed each time).

At least with Oracle it makes a huge difference in speed.
 
Shouldn't using a .Prepare() method (which I do) eliminate that overhead?

Also, I enclose loop in a SqlTransaction.

Doesn't that mean that all the log entries are written when the
transaction is committed?
 
Also, I enclose loop in a SqlTransaction.
Doesn't that mean that all the log entries are written when the
transaction is committed?

I might be mistaken for SQL Server (I'm certified as an Oracle DBA, but I
only know SQLServer superficially), but what happens in Oracle (somewhat
simplified) is that the DB itself is updated during a transaction, and the
log keeps the original data in it. That way when you commit a transaction
there is very little cost in terms of IO, there is only a cost if you
rollback the transaction (which is assumed to happen very few times). I
would not be suprised at all if SQL Server worked this way too, since, if
the assumption that the ratio of transaction commits to rollbacks is
extremely high holds, transactions take up very little extra overhead.

The only problem is when a transaction involves a great many operations:
The log system gets bogged down since a very big part of the log is "vital"
and could be needed in case of a rollback. For the stuff I do normally, I
have noticed that committing transactions every could hundred record
insertions is a good mid-point performance-wise. But that number is highly
dependant on the server hardware and also on the size of the records being
inserted. Inserting 100,000 records in a single transaction is just asking
for trouble, in other words.
 
I guess I might start believing that the transaction log is the
additional overhead, except that I would expect that to be reflected in
the duration reported in a the SQL trace.

But it isn't! The duration for each INSERT statement is 0 -- or less
than 1 millesecond! Wouldn't the time to write to the transaction log
be reflected in the duration for each INSERT?
 
I guess I might start believing that the transaction log is the additional
overhead, except that I would expect that to be reflected in the duration
reported in a the SQL trace.
But it isn't! The duration for each INSERT statement is 0 -- or less
than 1 millesecond! Wouldn't the time to write to the transaction log be
reflected in the duration for each INSERT?

Writing to the transaction log at insert time is not the only I/O done on
the transaction log. There is another I/O done when you commit the
transaction, to tell the DB that the logs are ok to be recycled, since you
have just commited the transaction (and hence, cannot rollback anymore).

You might try splitting into several smaller transactions (with their own
commits, of course) to see if you get any speed improvements.

But to answer your question, I do not know exactly how SQL trace works, so I
do not know if it includes updates to the log.

Bypassing the log is dangerous though... You would not be able to rollback
(so that means no transactions either), nor would you be able to recover
from data curruption should the DB crash during the insert operation. You
may as well use DBF files instead of SQL server, inserts to that are
lightning fast. I have done this in the past where speed is absolutely of
the essence and data loss is no big deal... But there are lots of
downsides, I guess it's the same as using goto statements: Only people who
fully understand what they are doing should be messing with it!
 
No. You still don't understand. BCP requires no compile overhead and uses
direct IO to the DBMS instead of "logical" writes. And no, we don't usually
write to a permanent table so the Transaction Log issues are not a factor.
When we're ready to post the data to the live tables we run a SP that edits
the data and does a proper INSERT.

--
____________________________________
William (Bill) Vaughn
Author, Mentor, Consultant
Microsoft MVP
INETA Speaker
www.betav.com/blog/billva
www.betav.com
Please reply only to the newsgroup so that others can benefit.
This posting is provided "AS IS" with no warranties, and confers no rights.
__________________________________
 
Would writing to a #temp table, or to a table variable (and then
transferring the data to a regular table with INSERT...SELECT) be faster
than regular INSERTs to a table?
 
Basically, yes. We generally transfer to "temporary" tables in the database
(not to tempdb) and perform intelligent INSERT/UPDATE statements from there.
It's far faster.

--
____________________________________
William (Bill) Vaughn
Author, Mentor, Consultant
Microsoft MVP
INETA Speaker
www.betav.com/blog/billva
www.betav.com
Please reply only to the newsgroup so that others can benefit.
This posting is provided "AS IS" with no warranties, and confers no rights.
__________________________________
 
Hmmm...well, there's no reason I can't do that as an intermediate way in
my code.

Do I have to create the temporary table on the same command that I run
my INSERTS on?

Or can I just create two SqlCommand's and run them on the same
SqlConnection? Will the table remain in memory if I do that?

And then I'll need to do the transfer from the temp table to the table
after that.
 
Back
Top