Freeing memory used by old datarows

Leo Tohill

Here's an interesting finding about when the memory of a datarow is freed.

If you do this:

DataRow newRow = dataTable.NewRow();

The new row will never be GC'd, even after newRow goes out of scope, and
even though the new row has state "Detached".

On the other hand, do this:

DataRow newRow = dataTable.NewRow();
newRow.Table.Rows.Add(newRow);
newRow.Delete(); // changes state to "detached"

This allows newRow to be GC'd. This looks like a gimmick to me - it should
not be necessary. But, it works.
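
For reference, here is a minimal sketch of the row states the two snippets above pass through, assuming a table with a single string column. The class and method names are made up for illustration. It only traces RowState; it says nothing about when the garbage collector reclaims anything.

```csharp
using System;
using System.Data;

public static class RowStateDemo
{
    // Returns the RowState observed after NewRow(), after Rows.Add(), and after Delete().
    public static DataRowState[] TraceStates()
    {
        DataTable table = new DataTable("demo"); // hypothetical table name
        table.Columns.Add(new DataColumn("Test", typeof(string)));

        DataRow row = table.NewRow();
        DataRowState afterNewRow = row.RowState;

        table.Rows.Add(row);
        DataRowState afterAdd = row.RowState;

        // The row was added but never accepted, so Delete() takes it
        // back out of the table instead of leaving it marked as Deleted.
        row.Delete();
        DataRowState afterDelete = row.RowState;

        return new DataRowState[] { afterNewRow, afterAdd, afterDelete };
    }

    public static void Main()
    {
        foreach (DataRowState state in TraceStates())
        {
            Console.WriteLine(state);
        }
    }
}
```

The first and last states are both Detached, which is why the difference in memory behaviour between the two snippets is surprising.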

Just FYI y'all.

- Leo Tohill
 
Hi Leo,

That is because DataTable keeps track of its rows.
Why would you create a row and not add it to the table, anyway?
 
The result of calling dataTable.NewRow() is a row that is NOT in the table.
The DataTable does not keep track of rows that have not been added to the
table.

The fact that the row IS garbage collected after an Add() and Delete()
confirms that it indeed CAN be GC'd. The only mystery is why it isn't GC'd
when it has only been NewRow()'d. After all, in each case the status of the
row is Detached.

Why create a row and not add it? In this case, the row is merely a
convenience structure for communicating data from one component to another.
It will never be saved to the db. No table constraints are needed. If it
had behaved as I expected, I could have avoided the overhead of the Add()
and the Delete().
 
Leo Tohill said:
Here's an interesting finding about when the memory of a datarow is freed.

if you do this:

DataRow newRow = dataTable.NewRow();

The new row will never be GC'd, even after newRow goes out of scope, and
even though the new row has state "Detached".

How did you determine this?

Here's a program which suggests otherwise:

using System;
using System.Data;
using System.Threading;

class Test
{
    static void Main()
    {
        DataTable table = new DataTable();
        table.Columns.Add(new DataColumn("Test", typeof(Test)));

        DataRow newRow = table.NewRow();
        newRow[0] = new Test();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine("Sleeping");
        Thread.Sleep(5000);

        // GC.KeepAlive(newRow);
    }

    ~Test()
    {
        Console.WriteLine("Finalizer called");
    }
}

With the KeepAlive line commented out as shown, the result is:

Finalizer called
Sleeping

- in other words, the created instance of Test is finalized before the
sleep. The only thing that has a reference to that instance is the
DataRow, but uncommenting the KeepAlive line changes the output to:

Sleeping
Finalizer called

To me, that suggests that the row is being garbage collected normally.

Do you have a program which demonstrates that *not* happening?
 
OP is right but not for the reasons stated. The table still maintains a
reference to the row. Garbage collection cannot occur in this case as long
as the table is still around. The following code takes advantage of the fact
that a row can only exist in one table. If it did not have an existing
reference the code would not fail.

System.Data.DataSet ds = new System.Data.DataSet();
System.Data.DataSet ds2 = new System.Data.DataSet();

// use this to maintain a root
System.Data.DataTable dataTable = new System.Data.DataTable("test");
System.Data.DataTable dataTable2 = new System.Data.DataTable("test2");
ds.Tables.Add(dataTable);
ds2.Tables.Add(dataTable2);

// newRow is owned by dataTable. It cannot be collected
System.Data.DataRow newRow = dataTable.NewRow();
newRow.Table.Rows.Add(newRow);

// this line does not remove the existing reference
newRow.Delete(); // changes state to "detached"

//ds.Tables[0].Rows.Add(newRow);
//newRow = null;

// this is the line that fails: the row still belongs to dataTable
ds2.Tables[0].Rows.Add(newRow);

I'm not entirely clear on why Jon's code shows that GC occurs. To me this
seems wrong. Maybe somebody can examine this in more detail.
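
The failure described above can be reproduced with a self-contained sketch like the following. The class and method names are hypothetical, and the two tables each get one string column only so that their schemas match. Note what the exception does and does not show: it confirms that the row is still bound to the table that created it (row.Table), which by itself is not evidence that the table holds a reference back to the row.

```csharp
using System;
using System.Data;

public static class CrossTableAddDemo
{
    // Creates a row in one table, adds and deletes it there, then tries to
    // add it to a second table. Returns the exception, or null if none was thrown.
    public static ArgumentException TryAddToOtherTable()
    {
        DataTable first = new DataTable("test");
        DataTable second = new DataTable("test2");
        first.Columns.Add("Test", typeof(string));
        second.Columns.Add("Test", typeof(string));

        DataRow row = first.NewRow();
        first.Rows.Add(row);
        row.Delete(); // row is Detached again, but row.Table is still 'first'

        try
        {
            second.Rows.Add(row);
            return null;
        }
        catch (ArgumentException ex)
        {
            // The row is rejected because it belongs to another table.
            return ex;
        }
    }

    public static void Main()
    {
        ArgumentException ex = TryAddToOtherTable();
        Console.WriteLine(ex == null ? "Add succeeded" : ex.Message);
    }
}
```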


--
Regards,
Alvin Bruney
Got tidbits? Get it here...
http://tinyurl.com/3he3b
 
Alvin Bruney said:
OP is right but not for the reasons stated.

Nearly right, at least...

Alvin Bruney said:
The table still maintains a
reference to the row. Garbage collection cannot occur in this case as long
as the table is still around.

Yup, you're right. Oops.

Alvin Bruney said:
I'm not entirely clear on why Jon's code shows that GC occurs. To me this
seems wrong. Maybe somebody examine this in more detail.

There's nothing else holding a reference to the DataTable in my code,
so that won't be stopping the row from being collected.

So, it looks like the OP was wrong in saying that the new row would
*never* be collected - but it won't be collected until the table is.
 
Jon Skeet said:
So, it looks like the OP was wrong in saying that the new row would
*never* be collected - but it won't be collected until the table is.

Hmm... I can't get the new row to be deleted using the Add/Delete
method shown by the OP though. I'm basically somewhat confused... if
the OP could show a complete example showing *both* behaviours, it
would be very helpful.
 
The row isn't actually deleted. It is only flagged as deleted. It's more
efficient this way and also allows the changes to be reversed by calling
RejectChanges. It's a matter of simply clearing the flag either way. Was
that what you were asking?
 
I'll have a code posting in a few minutes, but some questions on this:

1) You show

// newRow is owned by dataTable. It cannot be collected
System.Data.DataRow newRow = dataTable.NewRow();

How do you support the statement that newRow is owned by the dataTable? It
is, in fact, a detached row. I know that, due to the lack of GC, it "seems"
that the dt owns it, but it could be something else. Perhaps the dataset
holds a collection of all new, never-attached rows.

2) How does this test prove or disprove the GC of the row?

 
This code illustrates the problem. On my machine, the version with the two
lines commented out shows a net increase of 500k+ of memory, while
uncommenting the lines gives a net increase of 1k.

private static void DataRowCleanup()
{
    DataTable table = new DataTable();
    table.Columns.Add(new DataColumn("Test", typeof(string)));
    GC.Collect(); // just to assure a decent baseline
    GC.WaitForPendingFinalizers();
    long startMemory = GC.GetTotalMemory(true);
    for (int i = 0; i < 10000; i++)
    {
        DataRow newRow = table.NewRow();
        newRow[0] = "a fairly long string that will eat up some memory, huh?";
        // uncomment the next two lines, and the new rows become GC-able
        //table.Rows.Add(newRow);
        //newRow.Delete();
    }
    // assertion: there are no remaining references from this class to the
    // new rows. In fact, those rows are totally inaccessible from any user code.
    GC.Collect();
    GC.WaitForPendingFinalizers();
    long endMemory = GC.GetTotalMemory(true);
    long memUsed = endMemory - startMemory;
    Console.WriteLine(memUsed.ToString());
}

- Leo Tohill (the OP)


 
Alvin Bruney said:
The row isn't actually deleted. It is only flagged as deleted. It's more
efficient this way and also allows the changes to be reversed by calling
RejectChanges. It's a matter of simply clearing the flag either way. Was
that what you were asking?

I'd forgotten to call table.AcceptChanges, certainly - but even so, I
still don't get to see the row being garbage collected afterwards.

Basically, I can't reproduce what the OP was seeing.
 
Leo Tohill said:
This code illustrates the problem. On my machine, the version with the
commented lines nets 500k+ of memory, while uncommenting the lines
gives a net 1k .

A few things about this code:

1) On my box I get results of "-1632" (with the comments in) and
"-1308" (without the comments) when run on its own (I've basically
changed your method to be Main).

2) You don't actually end up giving a decent baseline, because it
doesn't include memory being snaffled by the JIT compiler - you need to
run it twice, basically.

3) Your string isn't actually taking up much space at all - it's just a
reference to the string data which will never get garbage collected. A
better test is to have a column with a byte array in, and make that
byte array large (and new for each row).

I've done the latter, and *then* I see an OutOfMemoryException when I
don't have the commented code in, depending on the size of byte array I
create.

There's clearly *something* odd going on here, but I'm not entirely
sure what. I think it's more likely to be one of the obscure GC bugs
which occasionally crops up than a problem with DataTable. Adding a
value of 10K per row, I can still get through 10000 iterations, and
endMemory is only 48K, rather than the 100M which would be visible if
the GC weren't happening at all.

I suspect that the Add/Delete calls are just creating some small
objects which perturb the garbage collector just enough to get
everything working.
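
A minimal sketch of the byte-array variant of the test described above, with a warm-up pass so that JIT overhead is not counted. The names (DetachedRowMemory, Measure, the "Payload" column) are made up for illustration, and the numbers it prints will vary by runtime version and machine, so no expected output is given.

```csharp
using System;
using System.Data;

public static class DetachedRowMemory
{
    // Creates 'iterations' rows, each holding a fresh byte[] payload, and returns
    // the change in GC.GetTotalMemory(true) while the table itself stays alive.
    // rowsInTable reports table.Rows.Count at the end of the loop.
    public static long Measure(int iterations, int payloadBytes,
                               bool addThenDelete, out int rowsInTable)
    {
        DataTable table = new DataTable();
        table.Columns.Add(new DataColumn("Payload", typeof(byte[])));

        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < iterations; i++)
        {
            DataRow row = table.NewRow();
            row[0] = new byte[payloadBytes]; // new array per row, so it cannot be shared
            if (addThenDelete)
            {
                table.Rows.Add(row);
                row.Delete();
            }
        }
        long after = GC.GetTotalMemory(true);

        rowsInTable = table.Rows.Count;
        GC.KeepAlive(table); // make sure the table is still reachable when measuring
        return after - before;
    }

    public static void Main()
    {
        int rows;
        // Warm-up pass: lets the JIT compile everything before the real measurement.
        Measure(100, 1024, false, out rows);
        Measure(100, 1024, true, out rows);

        long plain = Measure(2000, 10 * 1024, false, out rows);
        Console.WriteLine("NewRow only:       " + plain + " bytes, rows in table: " + rows);
        long cycled = Measure(2000, 10 * 1024, true, out rows);
        Console.WriteLine("NewRow+Add+Delete: " + cycled + " bytes, rows in table: " + rows);
    }
}
```

In both modes the table's Rows collection ends up empty, so any memory that stays allocated is not reachable through the public Rows property.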
 
Leo Tohill said:
how do you support the statement that newRow is owned by the dataTable?

In my example, System.Data.DataRow newRow = dataTable.NewRow();
creates the dependency. The active schema for the table object is forced
onto the newRow object, implicitly creating a dependency. I suspect that my
example departs from yours in that regard. Mine was used to show that a
reference is being kept. Maybe I should have constructed an example closer
to your scenario.

Leo Tohill said:
2) how does this test prove or disprove the GC of the row?

Objects with roots cannot be collected. If the row has a reference to it,
collection will not occur. My strategy was to show that the row still
maintained a reference, which is why the collection did not occur. Make
sense?

 
"The active schema for the table object is forced onto the newRow object,
implicitly creating a dependency."
Seems to me that the datarow would hold a reference to the table's schema,
not vice versa. A reference from the row to the table schema should still
allow the row to be GC'd. But of course, we can't know how this is done
without the source, so I certainly agree that something in the table, or
dataset, is referring to the row. The question is, why is it necessary to
add/delete to clear the reference?

"My strategy was to show that the row still maintained a reference which is
why the collection did not occur. make sense?"
As above, a reference from the row to the table shouldn't prevent GC of the
row.

regards,

leo

 
Thanks for the points, you are of course correct.

I've now changed mine to form the string with a suffix of "+new
Random().ToString();" so a new string should be created on each iteration.
I also did the warmup run for the JIT. I still get pretty much the same
results.


- leo
 
How many iterations can you complete with your 10k per row?


 
Leo Tohill said:
How many iterations can you complete with your 10k per row?

Well that's interesting - I've just tried again, and it *does* appear
to be losing memory, when you use GetTotalMemory to display it - but
when it's claiming (say) 400MB (after 40000 iterations) Task Manager is
only displaying about 150MB.

Also, if you put something in to recreate the *table* after every (say)
2000 iterations, the memory usage stays minimal.

It looks like the table *does* maintain a reference to newly created
rows. Whether this is a bug or not, I don't know. It should certainly
be documented though.
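
A rough sketch of the "recreate the table every so often" workaround mentioned above, assuming the rows are only scratch structures that are handed off and then dropped. The names (TableRecycling, Run, the "Payload" column) and the use of DataTable.Clone() to get an empty table with the same schema are illustrative choices, not something taken from the tests in this thread.

```csharp
using System;
using System.Data;

public static class TableRecycling
{
    // Builds 'iterations' scratch rows, swapping the table for an empty
    // structural copy every 'batchSize' rows. Returns how many times the
    // table was replaced.
    public static int Run(int iterations, int batchSize, int payloadBytes)
    {
        DataTable table = new DataTable("scratch");
        table.Columns.Add(new DataColumn("Payload", typeof(byte[])));

        int replacements = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (i > 0 && i % batchSize == 0)
            {
                // Clone() copies the schema but none of the rows, so the old
                // table (and whatever it still references) becomes unreachable
                // once nothing else points at it.
                table = table.Clone();
                replacements++;
            }

            DataRow row = table.NewRow();
            row[0] = new byte[payloadBytes];
            // A real program would pass 'row' to its consumer here.
        }
        return replacements;
    }

    public static void Main()
    {
        Console.WriteLine("Table replaced " + Run(40000, 2000, 10 * 1024) + " times");
    }
}
```

One caveat: a row keeps a reference to its table through row.Table, so the old table can only be collected once the consumer has also let go of the rows that came from it.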
 