Removing rows from a DataTable is VERY slow

  • Thread starter Thread starter Markus Hjärne
  • Start date Start date
M

Markus Hjärne

I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that aren't
used anymore from the DataTable to save memory. But my tests shows that it
takes very long time to remove the rows. Running the code below on a Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000 rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster way to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + "
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed + "
ms.");

Grateful for any help on this,

Markus Hjarne
 
Indexing engine for DataTable in ADO.NET 1.x is not very optimized, thus you
facing this issue.
It has been re-written in ADO.NET 2.0, and will do these operations much
faster. I believe there is nothing much you can do about this. Storing your
data in multiple data tables will boost performance, but you will have to
access this based on your requirements.
 
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech
 
Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

Martin Schaech said:
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

"Markus Hjärne" <[email protected]> wrote in message
I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that aren't
used anymore from the DataTable to save memory. But my tests shows that it
takes very long time to remove the rows. Running the code below on a Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000 rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster way to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + "
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed + "
ms.");

Grateful for any help on this,

Markus Hjarne
 
Try the process without any index on tha table in question.

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
SLODUG - Slovene Developer Users Group
www.rthand.com

Markus Hjärne said:
Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

Martin Schaech said:
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

"Markus Hjärne" <[email protected]> wrote in message
I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that aren't
used anymore from the DataTable to save memory. But my tests shows that it
takes very long time to remove the rows. Running the code below on a Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000 rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster way to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + "
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed + "
ms.");

Grateful for any help on this,

Markus Hjarne

 
Markus,

I have as well seen that removing rows from a datatable is slow.

When I saw your message I thought is it not possible to clone the table, add
the datarows that should stayto the new one, remove the old and than rename
the new one.

It was just an idea, I did not try it.

Cor


Markus Hjärne said:
Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

Martin Schaech said:
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

"Markus Hjärne" <[email protected]> wrote in message
I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that aren't
used anymore from the DataTable to save memory. But my tests shows that it
takes very long time to remove the rows. Running the code below on a Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000 rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster way to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + "
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed + "
ms.");

Grateful for any help on this,

Markus Hjarne

 
First why is it slow?

.... it is slow because removing involves two main painful tasks that grow in
pain exponentially with rowcount - a) Checking to see if the row even exists
before removing it, for which it has to go thru 100,000 rows twice (one for
nulls and one for the actual object) .. I believe this is in
SyncIList.Remove. .. and b) Settling the rows as you remove it, because the
index will always go from 1,2,3,4, .. rather than 1,2,4 after you remove row
#3. This is unfortunately and logically so the built in arraylist behavior
which you cannot change - but hey it makes sense to have it the way it is.

Cor your suggestion will probably not work for two reasons -
a) DataTable.Clone will not do a deep clone. But that can be gotten around
using a BinaryFormatter/MemoryStream combination.
b) Even when you do have a perfectly deep copy assuming your server didn't
crap out for memory reasons - you still won't be able to add a row into the
previousdt, *unless* you remove it from the new data table (cloned) - and
there you take the penalty of removal anyway.

So we're stuck between a rock and a hardspot - well DataTables are not meant
to store such large number of rows - so my kneejerk reaction is to try and
minimize THAT MUCH data in one datatable - and if you have no other
alternative, look into implementing your own business object instead. Even
if this business object's purpose in life is to simply act as a bucket for
your datarows - creating a datatable when you need it will still probably
work better than what you have now. Not to mention, you can easily code up
the business object to be databindable etc. etc. (So you really don't need
the DataTable to do this dance).

If you really really wanted to be anal and absolutely must have to fix it -
then I believe you can use Reflection and tinker with the inner arraylist
and remove the exist checking portion and code the rest yourself. That's
about the only way I can think of :-/ .. but then again maybe some
concoction with DataView might help .. I don't know .. but before I take
that pain, I'd first try and minimize the amount of data in the datatable.

Another option would be to implement your very own datatable - which is
albeit more work - but it's really a generic business object that suits your
needs. A DataTable is not designed for 100,000 rows IMHO - what is the upper
limit - that is subjective to your use.

I know this isn't really a solution but still hope it helped anyway :-).

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik
http://blogs.apress.com/authors.php?author=Sahil Malik





Cor Ligthert said:
Markus,

I have as well seen that removing rows from a datatable is slow.

When I saw your message I thought is it not possible to clone the table, add
the datarows that should stayto the new one, remove the old and than rename
the new one.

It was just an idea, I did not try it.

Cor


Markus Hjärne said:
Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

Martin Schaech said:
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

"Markus Hjärne" <[email protected]> wrote in message
I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that aren't
used anymore from the DataTable to save memory. But my tests shows
that
it
takes very long time to remove the rows. Running the code below on a Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000 rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster
way
to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + "
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " +
elapsed
+ "
ms.");

Grateful for any help on this,

Markus Hjarne

 
I didn't think I had any index on the DataTable in my attached sample code -
no UniqueConstraint, no primary key, no ForeignKey constraint, nothing. Can
you please explain further what you mean.

Regards,

Markus

Miha Markic said:
Try the process without any index on tha table in question.

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
SLODUG - Slovene Developer Users Group
www.rthand.com

Markus Hjärne said:
Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

Martin Schaech said:
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

"Markus Hjärne" <[email protected]> wrote in message
I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that aren't
used anymore from the DataTable to save memory. But my tests shows
that
it
takes very long time to remove the rows. Running the code below on a Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000 rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster
way
to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + "
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " +
elapsed
+ "
ms.");

Grateful for any help on this,

Markus Hjarne

 
Thanks for your suggestion,

I have actually thought about something simliar myself, but the problem is
that in my real application there aren't just one DataTable, there are quite
a few and there are also data relations between them and on top of that
there are also custom business objects referencing the DataRows...
So I dropped the idea pretty soon, but seeing how slow Remove is I'm quite
sure following your idea will yield better performance than using Remove.
And if I can't find any better idea, I might have to follow that dark road
in the end.

Thanks again,

Markus


Cor Ligthert said:
Markus,

I have as well seen that removing rows from a datatable is slow.

When I saw your message I thought is it not possible to clone the table, add
the datarows that should stayto the new one, remove the old and than rename
the new one.

It was just an idea, I did not try it.

Cor


Markus Hjärne said:
Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

Martin Schaech said:
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

"Markus Hjärne" <[email protected]> wrote in message
I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that aren't
used anymore from the DataTable to save memory. But my tests shows
that
it
takes very long time to remove the rows. Running the code below on a Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000 rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster
way
to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + "
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " +
elapsed
+ "
ms.");

Grateful for any help on this,

Markus Hjarne

 
Thanks for yor reply,

can you tell me what's your source of information when you say that the
indexing engine has been re-written i ADO.NET 2.0?

Regards,

Markus

Deepak said:
Indexing engine for DataTable in ADO.NET 1.x is not very optimized, thus you
facing this issue.
It has been re-written in ADO.NET 2.0, and will do these operations much
faster. I believe there is nothing much you can do about this. Storing your
data in multiple data tables will boost performance, but you will have to
access this based on your requirements.

--
Regards,

Deepak
[I Code, therefore I am]

Markus Hjärne said:
I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that
aren't
used anymore from the DataTable to save memory. But my tests shows that it
takes very long time to remove the rows. Running the code below on a
Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000 rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster way to
do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + "
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed +
"
ms.");

Grateful for any help on this,

Markus Hjarne

 
Thanks for your informative reply,

I can't remember reading anywhere how many rows a DataTable are 'meant' to
hold and if the test results I'm getting is not caused by anything odd I
have done, I actually think Microsoft has done a really lousy job in
implementing the Remove method. I think you can expect more from one of the
largest software companies in the world, especially since you as a
Windows-programmer almost always are at their mercy.

Regards,

Markus

Sahil Malik said:
First why is it slow?

... it is slow because removing involves two main painful tasks that grow in
pain exponentially with rowcount - a) Checking to see if the row even exists
before removing it, for which it has to go thru 100,000 rows twice (one for
nulls and one for the actual object) .. I believe this is in
SyncIList.Remove. .. and b) Settling the rows as you remove it, because the
index will always go from 1,2,3,4, .. rather than 1,2,4 after you remove row
#3. This is unfortunately and logically so the built in arraylist behavior
which you cannot change - but hey it makes sense to have it the way it is.

Cor your suggestion will probably not work for two reasons -
a) DataTable.Clone will not do a deep clone. But that can be gotten around
using a BinaryFormatter/MemoryStream combination.
b) Even when you do have a perfectly deep copy assuming your server didn't
crap out for memory reasons - you still won't be able to add a row into the
previousdt, *unless* you remove it from the new data table (cloned) - and
there you take the penalty of removal anyway.

So we're stuck between a rock and a hardspot - well DataTables are not meant
to store such large number of rows - so my kneejerk reaction is to try and
minimize THAT MUCH data in one datatable - and if you have no other
alternative, look into implementing your own business object instead. Even
if this business object's purpose in life is to simply act as a bucket for
your datarows - creating a datatable when you need it will still probably
work better than what you have now. Not to mention, you can easily code up
the business object to be databindable etc. etc. (So you really don't need
the DataTable to do this dance).

If you really really wanted to be anal and absolutely must have to fix it -
then I believe you can use Reflection and tinker with the inner arraylist
and remove the exist checking portion and code the rest yourself. That's
about the only way I can think of :-/ .. but then again maybe some
concoction with DataView might help .. I don't know .. but before I take
that pain, I'd first try and minimize the amount of data in the datatable.

Another option would be to implement your very own datatable - which is
albeit more work - but it's really a generic business object that suits your
needs. A DataTable is not designed for 100,000 rows IMHO - what is the upper
limit - that is subjective to your use.

I know this isn't really a solution but still hope it helped anyway :-).

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik
http://blogs.apress.com/authors.php?author=Sahil Malik





Cor Ligthert said:
Markus,

I have as well seen that removing rows from a datatable is slow.

When I saw your message I thought is it not possible to clone the table, add
the datarows that should stayto the new one, remove the old and than rename
the new one.

It was just an idea, I did not try it.

Cor


Markus Hjärne said:
Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

"Martin Schaech" <[email protected]> skrev i meddelandet
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

I'm using a DataTable to hold a selection of rows (100,000+) from a
large
database table (1,000,000+ rows). At times I want to remove rows that
aren't
used anymore from the DataTable to save memory. But my tests shows that
it
takes very long time to remove the rows. Running the code below on a
Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000
rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster way
to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed
+
"
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed
+ "
ms.");

Grateful for any help on this,

Markus Hjarne


 
Markus Hjärne said:
I didn't think I had any index on the DataTable in my attached sample
code -
no UniqueConstraint, no primary key, no ForeignKey constraint, nothing.
Can
you please explain further what you mean.

Ah, right, I overlooked the sample code :-)
You are right - it is slow.
It is a bit faster in .net 2 beta but still slow.
I guess the DataRowCollection isn't really optimised for Remove (I can
immagine that Remove is not a frequent operation afterall) method.
However, as others suggested, try retrieving less data at once.

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
SLODUG - Slovene Developer Users Group
www.rthand.com
Regards,

Markus

Miha Markic said:
Try the process without any index on tha table in question.

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
SLODUG - Slovene Developer Users Group
www.rthand.com

Markus Hjärne said:
Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

"Martin Schaech" <[email protected]> skrev i meddelandet
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

I'm using a DataTable to hold a selection of rows (100,000+) from a
large
database table (1,000,000+ rows). At times I want to remove rows
that
aren't
used anymore from the DataTable to save memory. But my tests shows that
it
takes very long time to remove the rows. Running the code below on a
Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000
rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster way
to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed
+ "
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed
+ "
ms.");

Grateful for any help on this,

Markus Hjarne


 
Cor your suggestion will probably not work for two reasons -
a) DataTable.Clone will not do a deep clone. But that can be gotten around
using a BinaryFormatter/MemoryStream combination.

There should not be made a deep copy. The schema should be copied and the
rows that stayed inserted to that new table and that is exactly what the
datatable clone does.

http://msdn.microsoft.com/library/d...l/frlrfsystemdatadatatableclassclonetopic.asp

Where you got the idea that I suggested a deep copy, that gives no result at
all.

When is told creating the table cost 187Ms and removing the rows 32Sec than
why not creating it partially new with the rows that stay?

However it was only an idea.

Cor
 
Markus - if you had to write a datatable, how would you store the rows?
Microsoft uses an ArrayList.

I am curious to know because it is possible to write your own datatable,
albeit given the complexity, you'd have to reduce the feature set - BUT - if
nothing else, people from Microsoft are reading these replies and you just
might have a bright idea there.

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik








Markus Hjärne said:
Thanks for your informative reply,

I can't remember reading anywhere how many rows a DataTable are 'meant' to
hold and if the test results I'm getting is not caused by anything odd I
have done, I actually think Microsoft has done a really lousy job in
implementing the Remove method. I think you can expect more from one of
the
largest software companies in the world, especially since you as a
Windows-programmer almost always are at their mercy.

Regards,

Markus

Sahil Malik said:
First why is it slow?

... it is slow because removing involves two main painful tasks that grow in
pain exponentially with rowcount - a) Checking to see if the row even exists
before removing it, for which it has to go thru 100,000 rows twice (one for
nulls and one for the actual object) .. I believe this is in
SyncIList.Remove. .. and b) Settling the rows as you remove it, because the
index will always go from 1,2,3,4, .. rather than 1,2,4 after you remove row
#3. This is unfortunately and logically so the built in arraylist
behavior
which you cannot change - but hey it makes sense to have it the way it
is.

Cor your suggestion will probably not work for two reasons -
a) DataTable.Clone will not do a deep clone. But that can be gotten
around
using a BinaryFormatter/MemoryStream combination.
b) Even when you do have a perfectly deep copy assuming your server
didn't
crap out for memory reasons - you still won't be able to add a row into the
previousdt, *unless* you remove it from the new data table (cloned) - and
there you take the penalty of removal anyway.

So we're stuck between a rock and a hardspot - well DataTables are not meant
to store such large number of rows - so my kneejerk reaction is to try
and
minimize THAT MUCH data in one datatable - and if you have no other
alternative, look into implementing your own business object instead.
Even
if this business object's purpose in life is to simply act as a bucket
for
your datarows - creating a datatable when you need it will still probably
work better than what you have now. Not to mention, you can easily code
up
the business object to be databindable etc. etc. (So you really don't
need
the DataTable to do this dance).

If you really really wanted to be anal and absolutely must have to fix it -
then I believe you can use Reflection and tinker with the inner arraylist
and remove the exist checking portion and code the rest yourself. That's
about the only way I can think of :-/ .. but then again maybe some
concoction with DataView might help .. I don't know .. but before I take
that pain, I'd first try and minimize the amount of data in the
datatable.

Another option would be to implement your very own datatable - which is
albeit more work - but it's really a generic business object that suits your
needs. A DataTable is not designed for 100,000 rows IMHO - what is the upper
limit - that is subjective to your use.

I know this isn't really a solution but still hope it helped anyway :-).

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik
http://blogs.apress.com/authors.php?author=Sahil Malik





Cor Ligthert said:
Markus,

I have as well seen that removing rows from a datatable is slow.

When I saw your message I thought is it not possible to clone the
table, add
the datarows that should stayto the new one, remove the old and than rename
the new one.

It was just an idea, I did not try it.

Cor


"Markus Hjärne" <[email protected]>

Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

"Martin Schaech" <[email protected]> skrev i meddelandet
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add
them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

I'm using a DataTable to hold a selection of rows (100,000+) from
a
large
database table (1,000,000+ rows). At times I want to remove rows that
aren't
used anymore from the DataTable to save memory. But my tests shows that
it
takes very long time to remove the rows. Running the code below on a
Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add
100,000
rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster way
to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " +
elapsed
+
"
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed
+ "
ms.");

Grateful for any help on this,

Markus Hjarne


 
I did some further tests with Remove() on ArrayList, and it's also very slow
when using the Remove method. But by first creating a Hashtable with all the
elements in the ArrayList as keys and their indices in the original
ArrayList as values, then using the Hashtable to create another ArrayList
with the indices of all the elements to remove, sorting it and then using
RemoveAt() on the original ArrayList, I managed to get resonable
performance.

The sort was necessary to ensure that the element with the highest index was
removed first, then the next highest and so on, since RemoveAt() changes the
indices of all elements with higher index. This algorithm also requires that
each element in the original ArrayList is unique.

So, why not use this technique for the DataTable? Simple because
DataRowCollection.RemoveAt() is as slow as DataRowCollection.Remove()!

But since every DataRow in a DataTable is a unique object, Microsoft could
use this technique in the DataTable class and add a
BeginRemove()/EndRemove() pair of methods (like
BeginLoadData()/EndLoadData()).

/Markus


Sahil Malik said:
Markus - if you had to write a datatable, how would you store the rows?
Microsoft uses an ArrayList.

I am curious to know because it is possible to write your own datatable,
albeit given the complexity, you'd have to reduce the feature set - BUT - if
nothing else, people from Microsoft are reading these replies and you just
might have a bright idea there.

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik








Markus Hjärne said:
Thanks for your informative reply,

I can't remember reading anywhere how many rows a DataTable are 'meant' to
hold and if the test results I'm getting is not caused by anything odd I
have done, I actually think Microsoft has done a really lousy job in
implementing the Remove method. I think you can expect more from one of
the
largest software companies in the world, especially since you as a
Windows-programmer almost always are at their mercy.

Regards,

Markus

Sahil Malik said:
First why is it slow?

... it is slow because removing involves two main painful tasks that
grow
in
pain exponentially with rowcount - a) Checking to see if the row even exists
before removing it, for which it has to go thru 100,000 rows twice (one for
nulls and one for the actual object) .. I believe this is in
SyncIList.Remove. .. and b) Settling the rows as you remove it, because the
index will always go from 1,2,3,4, .. rather than 1,2,4 after you
remove
row
#3. This is unfortunately and logically so the built in arraylist
behavior
which you cannot change - but hey it makes sense to have it the way it
is.

Cor your suggestion will probably not work for two reasons -
a) DataTable.Clone will not do a deep clone. But that can be gotten
around
using a BinaryFormatter/MemoryStream combination.
b) Even when you do have a perfectly deep copy assuming your server
didn't
crap out for memory reasons - you still won't be able to add a row into the
previousdt, *unless* you remove it from the new data table (cloned) - and
there you take the penalty of removal anyway.

So we're stuck between a rock and a hardspot - well DataTables are not meant
to store such large number of rows - so my kneejerk reaction is to try
and
minimize THAT MUCH data in one datatable - and if you have no other
alternative, look into implementing your own business object instead.
Even
if this business object's purpose in life is to simply act as a bucket
for
your datarows - creating a datatable when you need it will still probably
work better than what you have now. Not to mention, you can easily code
up
the business object to be databindable etc. etc. (So you really don't
need
the DataTable to do this dance).

If you really really wanted to be anal and absolutely must have to fix it -
then I believe you can use Reflection and tinker with the inner arraylist
and remove the exist checking portion and code the rest yourself. That's
about the only way I can think of :-/ .. but then again maybe some
concoction with DataView might help .. I don't know .. but before I take
that pain, I'd first try and minimize the amount of data in the
datatable.

Another option would be to implement your very own datatable - which is
albeit more work - but it's really a generic business object that suits your
needs. A DataTable is not designed for 100,000 rows IMHO - what is the upper
limit - that is subjective to your use.

I know this isn't really a solution but still hope it helped anyway :-).

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik
http://blogs.apress.com/authors.php?author=Sahil Malik





Markus,

I have as well seen that removing rows from a datatable is slow.

When I saw your message I thought is it not possible to clone the
table,
add
the datarows that should stayto the new one, remove the old and than
rename
the new one.

It was just an idea, I did not try it.

Cor


"Markus Hjärne" <[email protected]>

Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

"Martin Schaech" <[email protected]> skrev i meddelandet
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add
them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

I'm using a DataTable to hold a selection of rows (100,000+) from
a
large
database table (1,000,000+ rows). At times I want to remove rows that
aren't
used anymore from the DataTable to save memory. But my tests shows
that
it
takes very long time to remove the rows. Running the code below
on
a
Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add
100,000
rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster
way
to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " +
elapsed +
"
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " +
elapsed
+ "
ms.");

Grateful for any help on this,

Markus Hjarne


 
Then I think Microsoft should consider optimizing it for Remove, they got
the chance now when .NET 2.0 is still in beta.

They cannot reasonably figure out how all we developers will come to use the
..NET Framework, IMHO the only thing they can do is to really try to do their
best with every method of every class.

/Markus

Miha Markic said:
Markus Hjärne said:
I didn't think I had any index on the DataTable in my attached sample
code -
no UniqueConstraint, no primary key, no ForeignKey constraint, nothing.
Can
you please explain further what you mean.

Ah, right, I overlooked the sample code :-)
You are right - it is slow.
It is a bit faster in .net 2 beta but still slow.
I guess the DataRowCollection isn't really optimised for Remove (I can
immagine that Remove is not a frequent operation afterall) method.
However, as others suggested, try retrieving less data at once.

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
SLODUG - Slovene Developer Users Group
www.rthand.com
Regards,

Markus

Miha Markic said:
Try the process without any index on tha table in question.

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
SLODUG - Slovene Developer Users Group
www.rthand.com

Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

"Martin Schaech" <[email protected]> skrev i meddelandet
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

I'm using a DataTable to hold a selection of rows (100,000+) from a
large
database table (1,000,000+ rows). At times I want to remove rows
that
aren't
used anymore from the DataTable to save memory. But my tests shows that
it
takes very long time to remove the rows. Running the code below on a
Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000
rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster way
to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed
+ "
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed
+ "
ms.");

Grateful for any help on this,

Markus Hjarne


 
Hi Markus,

This is the source of my information on DataTable Indexing engine for
ADO.NET 2.0

http://msdn.microsoft.com/data/default.aspx?pull=/library/en-us/dnadonet/html/datasetenhance.asp

Regards,

Deepak
[I Code, therefore I am]


Markus Hjärne said:
Thanks for yor reply,

can you tell me what's your source of information when you say that the
indexing engine has been re-written i ADO.NET 2.0?

Regards,

Markus

Deepak said:
Indexing engine for DataTable in ADO.NET 1.x is not very optimized, thus you
facing this issue.
It has been re-written in ADO.NET 2.0, and will do these operations much
faster. I believe there is nothing much you can do about this. Storing your
data in multiple data tables will boost performance, but you will have to
access this based on your requirements.

--
Regards,

Deepak
[I Code, therefore I am]

Markus Hjärne said:
I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that
aren't
used anymore from the DataTable to save memory. But my tests shows that it
takes very long time to remove the rows. Running the code below on a
Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000 rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster way to
do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + "
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed +
"
ms.");

Grateful for any help on this,

Markus Hjarne


 
Markus,

Creating business objects or using System.Data.* - that is a problem area -
which is why the entire Science of ORM exists. Anyway, I read your other
suggestion (about hashtable), and this being a MS news group, more than
certainly some microsoft guy will read that.

Good going.

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik






Markus Hjärne said:
Thanks for your suggestion,

I have actually thought about something simliar myself, but the problem is
that in my real application there aren't just one DataTable, there are quite
a few and there are also data relations between them and on top of that
there are also custom business objects referencing the DataRows...
So I dropped the idea pretty soon, but seeing how slow Remove is I'm quite
sure following your idea will yield better performance than using Remove.
And if I can't find any better idea, I might have to follow that dark road
in the end.

Thanks again,

Markus


Cor Ligthert said:
Markus,

I have as well seen that removing rows from a datatable is slow.

When I saw your message I thought is it not possible to clone the table, add
the datarows that should stayto the new one, remove the old and than rename
the new one.

It was just an idea, I did not try it.

Cor


Markus Hjärne said:
Thanks for your reply,

but I'm not using a DataGrid, only a DataTable, and everything except
removing rows from it is as fast as needed.

Regards,

Markus Hjarne

"Martin Schaech" <[email protected]> skrev i meddelandet
Datagrids are painfully slow with over one or two thousand records.
Instead, use a dataset to store and manipulate the data then add them
to a label. This method is not as pretty or sortable, but faster.

Martin Schaech

I'm using a DataTable to hold a selection of rows (100,000+) from a
large
database table (1,000,000+ rows). At times I want to remove rows that
aren't
used anymore from the DataTable to save memory. But my tests shows that
it
takes very long time to remove the rows. Running the code below on a
Pentium
2.4 Ghz under Windows XP reports that it takes 187ms to add 100,000
rows,
but 32s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong or an alternative, faster way
to do
this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(
new DataColumn(
"Col1",
typeof(int)
)
);

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
table.Rows.Add(new object[] {i});
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed
+
"
ms.");

// Create a list of rows to remove (half of those in table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count/2);
for (int i = 0; i < table.Rows.Count; i++)
{
if ((i % 2) == 0)
rowsToRemove.Add(table.Rows);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
table.Rows.Remove((DataRow)rowsToRemove);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed
+ "
ms.");

Grateful for any help on this,

Markus Hjarne


 
Back
Top