I just did some more tests of pumping several hundred thousand rows into a
database.
I did a dir /s/b > C:\filelist.txt and replicated the contents until I had 509905 rows, then made the table as below:
create table filelist
(
[file] varchar(896) not null,
iid int not null
)
go
(The 896 is because the largest a PK can be is 900 bytes; the int takes the other 4.)
For the iid I assigned incrementing numbers. The results were:
509905 rows
Elapsed time = 12.8324091 secs
rows/sec = 39735.718837081
I added the same kind of PK, ([file], iid) in my case, and got the numbers below:
509905 rows
Elapsed time = 68.0459818 secs
rows/sec = 7493.5357902353
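For reference, the key I added was just a compound key over both columns, something along these lines (PK_filelist is only an illustrative constraint name):
alter table filelist
add constraint PK_filelist primary key clustered ([file], iid) -- 896 + 4 = 900 bytes
go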
No super hardware involved, just the cool new bulk copy feature of .NET 2.0.
Code:
// Needs System, System.Data, System.Data.SqlClient, System.Diagnostics,
// System.IO and System.Windows.Forms.
private void button1_Click(object sender, EventArgs e)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();

    // Build the DataTable; the IID column numbers itself via AutoIncrement.
    DataTable table = new DataTable();
    table.Columns.Add(new DataColumn("File", typeof(string)));
    table.Columns.Add(new DataColumn("IID", typeof(int)));
    table.Columns[1].AutoIncrement = true;
    table.Columns[1].AutoIncrementSeed = 1;
    table.Columns[1].AutoIncrementStep = 1;

    using (StreamReader sr = new StreamReader("c:\\filelist.txt"))
    {
        while (!sr.EndOfStream)
        {
            table.Rows.Add(sr.ReadLine());
        }
    }

    sw.Stop();
    Debug.Write(sw.Elapsed.TotalSeconds + " seconds for " + table.Rows.Count +
        " = " + Convert.ToDouble(table.Rows.Count) / sw.Elapsed.TotalSeconds +
        " rows per second loaded to datatable");

    // Start() resumes the same stopwatch, so the figure below includes the
    // file read as well as the bulk copy.
    sw.Start();

    using (SqlConnection sqlcon = new SqlConnection(
        "data source=lon0371xns;initial catalog=SonarBackup;integrated security=sspi"))
    {
        SqlBulkCopy bc = new SqlBulkCopy(sqlcon);
        bc.DestinationTableName = "FileList";
        bc.NotifyAfter = 5000;
        bc.SqlRowsCopied += new SqlRowsCopiedEventHandler(bc_SqlRowsCopied);
        bc.ColumnMappings.Add(new SqlBulkCopyColumnMapping("File", "File"));
        bc.ColumnMappings.Add(new SqlBulkCopyColumnMapping("IID", "IID"));
        bc.BulkCopyTimeout = 500;

        sqlcon.Open();
        bc.WriteToServer(table);
        bc.Close();
    }

    sw.Stop();
    Debug.Write(sw.Elapsed.TotalSeconds + " seconds for " + table.Rows.Count +
        " = " + Convert.ToDouble(table.Rows.Count) / sw.Elapsed.TotalSeconds +
        " rows per second loaded to db");
}

void bc_SqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
{
    Debug.Write("Written ");
    Debug.WriteLine(e.RowsCopied.ToString());
}
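As a footnote, the same copy can also be done with a table lock and a batch size if you want to experiment. This is just a sketch of that variation, not something I benchmarked, and the 50000 batch size is only a placeholder:
// Sketch only: TableLock and BatchSize are standard SqlBulkCopy members,
// but the settings here are untuned guesses.
using (SqlConnection sqlcon = new SqlConnection(
    "data source=lon0371xns;initial catalog=SonarBackup;integrated security=sspi"))
{
    SqlBulkCopy bc = new SqlBulkCopy(sqlcon, SqlBulkCopyOptions.TableLock, null);
    bc.DestinationTableName = "FileList";
    bc.BatchSize = 50000;          // rows sent per batch
    bc.BulkCopyTimeout = 500;
    sqlcon.Open();
    bc.WriteToServer(table);       // 'table' is the DataTable built above
    bc.Close();
}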
--
Ciaran O'Donnell
http://wannabedeveloper.spaces.live.com
Stephany Young said:
Curiouser and curiouser ....
I just did some tests of pumping several hundred thousand rows into a database.
I did a dir /s/b > C:\test.txt. The result was about 147,000 rows and so I
replicated the content twice more.
This is the table schema:
create table path
(
path varchar(355) not null,
iid int not null
)
go
For the first test I created the database with the defaults (3MB initial size + 1MB growth for the mdf and 1MB initial size + 10% growth for the ldf).
For the iid I assigned 1. The results were:
443538 rows
Elapsed time = 552 secs
rows/sec = 803.510869565217
File size: 54MB and 1MB respectively
For the second test I created the database with a 60MB initial size + 1MB growth for the mdf and a 1MB initial size + 10% growth for the ldf.
The results were:
443538 rows
Elapsed time = 579.234575 secs
rows/sec = 765.731488225297
File size: 60MB and 1MB respectively
This surprised me: I expected it to be quicker because the .mdf was already sized above the maximum needed. Why it was slower (by about 5%) escapes me.
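Roughly speaking, a database pre-sized like that can be created along these lines; the database name, logical names and file paths here are only illustrative:
-- illustrative names and paths
create database test
on (name = test_data, filename = 'c:\data\test.mdf', size = 60MB, filegrowth = 1MB)
log on (name = test_log, filename = 'c:\data\test.ldf', size = 1MB, filegrowth = 10%)
go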
For the third test I created the database with the defaults again (3MB initial size + 1MB growth for the mdf and 1MB initial size + 10% growth for the ldf).
This time I opened and closed the connection for each insert. The results were:
443538 rows
Elapsed time = 706.40625 secs
rows/sec = 627.879495686795
File size: 54MB and 1MB respectively
As expected, significantly slower but still handling in excess of 600 rows
per second.
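The load loop itself isn't shown here; something like the per-row parameterized insert below would produce numbers in this range. It is only a sketch, not the actual test code, and connectionString and lines are stand-ins for the real connection string and the file contents:
// Sketch only: one parameterized insert per row. Tests 1, 2 and 4 kept the
// connection open for the whole run; test 3 moved Open/Close inside the loop.
using (SqlConnection con = new SqlConnection(connectionString))
{
    con.Open();
    SqlCommand cmd = new SqlCommand(
        "insert into path (path, iid) values (@path, @iid)", con);
    cmd.Parameters.Add("@path", SqlDbType.VarChar, 355);
    cmd.Parameters.Add("@iid", SqlDbType.Int);
    foreach (string line in lines)
    {
        cmd.Parameters["@path"].Value = line;
        cmd.Parameters["@iid"].Value = 1;   // tests 1-3 just assigned 1
        cmd.ExecuteNonQuery();
    }
}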
For the fourth test I added a primary key on (path, iid).
The results were:
443538 rows
Elapsed time = 756.671875 secs
rows/sec = 586.169533524687
Again a little slower but still acceptable.
All these tests were done over a network, so network latency is a factor.
For reading the input file only, the results were:
443538 rows
Elapsed time = 0.65625 secs
rows/sec = 675867.428571429
so reading the data is an insignificant overhead.
I suspect that, to be getting the results you report, there must be other
code in your load process that you haven't told us about.
It might be an idea if you posted the pertinent code section.