Appending to an indexed DataTable - performance degrades

  • Thread starter Thread starter juliensellgren
  • Start date Start date
J

juliensellgren

Hi, I have a huge in-memory DataTable (300,000 records), and I am
finding that performance for adding rows at this stage degrades
significantly if the table has a Primary Key, or is attached to a
DataView with a Sort. For example:

// Fill a large datatable (300,000 rows)...

// Add a PK, or an index (without this, performance is good)
largeDataTable.DefaultView.Sort = "ItemID";

// Time the append operation
DateTime ts = DateTime.Now;
System.Diagnostics.Debug.Write("Adding 10,000 rows: ");

// Append a bunch of rows
for (int i=0; i < 10,000; i++)
{
DataRow newRow = largeDataTable.NewRow();
newRow["ItemID"] = System.Guid.NewGuid().ToString();
newRow["Value"] = "Blah";
largeDataTable.Rows.Add(newRow);
newRow.AcceptChanges();
}

// Output time for the operation
Diagnostics.Debug.WriteLine(DateTime.Now.Subtract(ts).TotalMilliseconds
+ " Milliseconds");

It takes about 30,000 milliseconds to append this many rows with the
index on, but 171 milliseconds, without the index. If the table was
small to begin with, the append speed is good - which leads me be to
believe that index maintenance time is proportional to the size of the
target table, as opposed to the number of records being appended. In
otherwords, performance is degrading. :(

A workaround seems to be to temporarly remove the index, and add it
back after the append operation, but I am wondering if anyone has any
other thoughts? With this much data, re-creating the index takes ~ 5000
milliseconds. I have tried DataTable.LoadDataRow(),
DataRowCollection.InsertAt(), and DataTable.ImportRow(), and all suffer
from the same index maintenance issues.

cheers,

- Julien
 
These might be useful for you

DataSet Performance Tips with Indicies
http://objectsharp.com/Blogs/datasetfaq/archive/2004/05/20/468.aspx

Improving ADO.NET Performance
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag/html/ScaleNetChapt12.asp

ADO.NET in Disconnected Mode
http://www.microsoft.com/mspress/books/sampchap/5199a.asp

When you have used LoadDataRow, you probably have used BeginLoadData and
EndLoadData with it to turn off index maintenance and then add it on again
after addition is done. Does it take the time in EndLoadData then (E.g index
maintenance takes the time)?
 
BeginLoadData and EndLoadData seem to have no effect on the index
maintenance. I.e. the index maintenance does not seem to occur in the
call to EndLoadData.

After struggling with this issue for some time, I ended up sub-classing
DataTable, and implementy my own, hash-table based index, which gets
maintained using the RowChanged events of the DataTable. The
performance gain is substantial, with the index maintenance time
negligable.
 
Back
Top