Help: Bulk insert with ADO.NET from outside database

  • Thread starter Thread starter Usenet User
  • Start date Start date
U

Usenet User

I am programmatically importing lots of rows from dBase 5 into SQL and
want to get an idea what the optimal approach would be for this
scenario.

Currently I'm doing the following:

1) Using OleDbConnection, OleDbCommand and OleDbDataReader to read the
rows from dBase tables;

2) Putting the extracted data into a DataSet;

3) Once the number of rows in the DataSet reaches, say, 1000, I'm
using SqlDataAdapter to submit the data to the database. This step
creates a transaction, issues an INSERT statement for each row in the
DataSet, then commits the transaction.

4) Go to Step 1 and repeat until all data is processes.

The above is inefficient and slow even if the both database
connections are kept open during the whole batch process. For example,
50,000 rows get processed in around 30 seconds.

On the other hand, if I use Import and Export Wizard, the same task
(dBase -> SQL) takes less than 5 seconds.

Out of curiosity, I used SQL Profiler to trace the Import and Export
Wizard to see what it's doing. The last statement in the log is as
follows:

insert bulk [DatabaseName].[dbo].[TableName]
(
"Field_1" float,
"Field_2" float,
"Field_3" nvarchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS,
...
(other table fields here)
...
)
with
(
check_constraints
)

As you can see there is no FROM clause. Also, it's "insert bulk", not
widely-known "bulk insert". (???)

My goal is to achieve the same (or similar speed) with ADO.NET, if
possible. But how?

Also, DTS is not an option, because the tables I'm dealing with change
from batch to batch (different table names, not schema)

Thanks!
 
That performance is not achievable in ADO.NET. The problem is that the 1000
rows that you are updating are still updating 1 row at a time. So 1000
individual calls to the database.

The only way to increase that is to use DTS (or you could query the dBase
files, store as text file, and use BCP, Bulk Insert, Etc.). If you need this
to happen in /Net code, you can create the DTS package and then call it from
your .Net application via COM Interop.

HTH,

John Scragg
 
I believe that ADO.NET 2.0 will have bulk loading capabilities from a DataReader or a DataTable
using the SqlBulkCopy class.
 
John,

Thanks for responding.

Calling a saved DTS from within .NET is not an option, because, as I
mentioned, it must be dynamic because table names differ for every
batch. I suppose, I could re-create the inner guts of the package in
my code using DTS object model but then the program will have to run
on the SQL server, otherwise that model won't be available. In
addition, this would be mean stepping away from ADO.NET completely.

Instead I tried yesterday the same thing you mentioned: export the
data into a temp file with OleDbDataReader and then use SqlCommand to
import the file with BULK INSERT. Worked like a charm: 3.5 sec to
export and 3.5 sec to import - 7 seconds total.

:)

That performance is not achievable in ADO.NET. The problem is that the 1000
rows that you are updating are still updating 1 row at a time. So 1000
individual calls to the database.

The only way to increase that is to use DTS (or you could query the dBase
files, store as text file, and use BCP, Bulk Insert, Etc.). If you need this
to happen in /Net code, you can create the DTS package and then call it from
your .Net application via COM Interop.

HTH,

John Scragg

Usenet User said:
I am programmatically importing lots of rows from dBase 5 into SQL and
want to get an idea what the optimal approach would be for this
scenario.

Currently I'm doing the following:

1) Using OleDbConnection, OleDbCommand and OleDbDataReader to read the
rows from dBase tables;

2) Putting the extracted data into a DataSet;

3) Once the number of rows in the DataSet reaches, say, 1000, I'm
using SqlDataAdapter to submit the data to the database. This step
creates a transaction, issues an INSERT statement for each row in the
DataSet, then commits the transaction.

4) Go to Step 1 and repeat until all data is processes.

The above is inefficient and slow even if the both database
connections are kept open during the whole batch process. For example,
50,000 rows get processed in around 30 seconds.

On the other hand, if I use Import and Export Wizard, the same task
(dBase -> SQL) takes less than 5 seconds.

Out of curiosity, I used SQL Profiler to trace the Import and Export
Wizard to see what it's doing. The last statement in the log is as
follows:

insert bulk [DatabaseName].[dbo].[TableName]
(
"Field_1" float,
"Field_2" float,
"Field_3" nvarchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS,
...
(other table fields here)
...
)
with
(
check_constraints
)

As you can see there is no FROM clause. Also, it's "insert bulk", not
widely-known "bulk insert". (???)

My goal is to achieve the same (or similar speed) with ADO.NET, if
possible. But how?

Also, DTS is not an option, because the tables I'm dealing with change
from batch to batch (different table names, not schema)

Thanks!
 
I believe that ADO.NET 2.0 will have bulk loading capabilities from a DataReader or a DataTable
using the SqlBulkCopy class.

Well, thanks, but we're still on ADO.NET 1.1.
 
You'll get better results if you scrap the whole dataset/dataadapter
mechanism and roll your own import:

- Start a transaction on the destination connection object
- Create your own parameterized insert command, associated with the
connection and transaction
- Create and add the parameters for the command and keep a reference to them
in an array for quick access to avoid using the ParametersCollection of the
command
- Explicitly Prepare() the command beforehand

Your datareader's Read() loop will then merely assign the .Value to each
parameter from the datareader and call ExecuteNonQuery() on the command.

Once finished, commit the transaction of course. It won't be quite as fast
as a bulk insert, but its as close as you can get with ADO.NET 1.1, and it
will be faster than the dataadapter/dataset method.

Robert
 
Back
Top