Massimo
I have this SQL Server 2000 database, which I want to populate with data
generated from a program; the schema is quite simple (one table, seven
columns, each one byte wide), but there is plenty of data: about 450
million rows, for an estimated total of ~3 GB.
So, I wrote my data-generating .NET application, and then tried to put the
data into a DataSet and store it into the DB using a SqlDataAdapter... and
it was painful. My first idea was to store as much data as possible in the
DataSet, so as to reduce the number of DB updates, but I found that once it
reaches about 100K rows the DataSet becomes slower and slower, each
insertion taking more time than the preceding one. So I resorted to
updating the DB more often, and I tried doing it every 20000, every 10000
and every 5000 rows added, but it always took a lot of time (more than one
minute to insert 20K rows).
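For reference, this is roughly what the DataSet code looked like (a
simplified sketch: the table and column names here are made up, and Random
stands in for my real data generator):

using System;
using System.Data;
using System.Data.SqlClient;

class Loader
{
    static void Main()
    {
        // one table, seven one-byte columns
        DataTable table = new DataTable("MyTable");
        for (int c = 1; c <= 7; c++)
            table.Columns.Add("c" + c, typeof(byte));

        using (SqlConnection conn = new SqlConnection("...connection string..."))
        {
            conn.Open();
            SqlDataAdapter adapter = new SqlDataAdapter(
                "SELECT c1, c2, c3, c4, c5, c6, c7 FROM MyTable", conn);
            SqlCommandBuilder builder = new SqlCommandBuilder(adapter); // supplies the INSERT command

            Random rnd = new Random(); // stand-in for the real generated data
            for (int i = 0; i < 20000; i++) // batch size: also tried 10000 and 5000
            {
                DataRow row = table.NewRow();
                for (int c = 0; c < 7; c++)
                    row[c] = (byte)rnd.Next(256);
                table.Rows.Add(row);
            }

            adapter.Update(table); // push the pending rows to the DB
            table.Clear();         // then start filling the next batch
            // (in the real program this repeats until all ~450M rows are written)
        }
    }
}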
Finally, I decided to avoid DataSets altogether: I simply wrote all of the
program output to a 6 GB CSV text file and imported it into the DB using
DTS; and, wonderful... it inserted 40-50K rows *per second*, and the whole
job completed in about five hours.
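The CSV-writing side is about as simple as it gets; something along these
lines (again a simplified sketch, with a made-up file path and Random
standing in for the real generator):

using System;
using System.IO;

class CsvDump
{
    static void Main()
    {
        Random rnd = new Random(); // stand-in for the real generated data
        using (StreamWriter writer = new StreamWriter(@"C:\data\rows.csv"))
        {
            for (long i = 0; i < 450000000L; i++)
            {
                // seven one-byte values per line, comma-separated,
                // in a layout DTS can import with a straight column mapping
                writer.WriteLine("{0},{1},{2},{3},{4},{5},{6}",
                    rnd.Next(256), rnd.Next(256), rnd.Next(256),
                    rnd.Next(256), rnd.Next(256), rnd.Next(256),
                    rnd.Next(256));
            }
        }
    }
}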
So, here's the question: why is it so painfully slow to insert big chunks of
data into a DB using DataSets? It's so slow that even going through a 6 GB
text file gives *much* better throughput...
Massimo