Disposing framework objects

Chris Capel

I've heard a lot of talk on the forum about memory management in .NET and
disposing things, finalizing them, garbage collection, etc. My question is
which managed types need to be disposed after being used, and for which is
it OK to simply let the garbage collector do its work, without worrying
about calling Dispose? Is there a general rule that most don't need to be
explicitly disposed, with an exception here and there? Or the other way
around?

Chris
 
That's not necessarily true. The SqlConnection class implements
IDisposable, but does not need to be disposed. Calling the Close method is
enough to close the connection and release resources.

Another example: DataSet class. This does not need to be disposed either.

I think Dispose matters when the garbage collector collects objects whose
resources have not yet been released: the finalizer then calls Dispose to
release them. For example, if you forgot to close your SqlConnection, the
finalizer releases the connection when the garbage collector eventually
runs. Of course, until it runs, your connection is just hanging around, and
this is why the developer should call Close explicitly.
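
A minimal sketch of the Close-in-finally pattern described above (the
connection string is hypothetical; SqlConnection lives in
System.Data.SqlClient):

public void ReadSomething( string connectionString )
{
SqlConnection connection = new SqlConnection( connectionString );
try
{
connection.Open();
// ... execute commands against connection ...
}
finally
{
// Close returns the underlying connection to the pool; calling
// Dispose here would have the same effect.
connection.Close();
}
}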
 
Jon Skeet wrote:
[snip]
I think it would be a good idea to do so though. It can't do any harm,
and in fact it's likely to make garbage collection quicker, as it can
unregister the instance for finalization.

I agree that you'd better call Dispose() on types with finalizers if you
want the best performance (both memory footprint and speed). Because of this
fact and also because I hadn't yet grokked all the implications of GC, I was
a very strong advocate of the "Call Dispose() whenever possible" policy
until recently.
But, with a few thousand lines of experience behind me, I have slowly but
surely come to the conclusion that following this policy often clutters your code
more than anything else (>99% of the code I write is not
performance-critical). I even think that I introduced more bugs this way.
The following real-world code, which I wrote today, makes for a good
example:

public void Log( string eventLogMsg )
{
MemoryStream stream = new MemoryStream();

// add eventLogMsg to stream and reset position
long origin = stream.Position;
StreamWriter writer = new StreamWriter( stream );
writer.Write( eventLogMsg );
writer.Flush();
stream.Position = origin;

// read stream into DataSet
StreamReader reader = new StreamReader( stream );
DataSet dataSet = new DataSet();
dataSet.ReadXml( reader );

// ...
}

eventLogMsg contains an XML message, which I need to read into dataSet,
which is then used for further processing. While I might have missed easier
ways to achieve the same, I do claim that this is perfectly reasonable code.
Two months ago, I would have written the same function as follows:

public void Log( string eventLogMsg )
{
using( MemoryStream stream = new MemoryStream() )
{
// add eventLogMsg to stream and reset position
long origin = stream.Position;
using ( StreamWriter writer = new StreamWriter( stream ) )
{
writer.Write( eventLogMsg );
}
stream.Position = origin; // BOOM

// read stream into DataSet
DataSet dataSet = new DataSet();
using( StreamReader reader = new StreamReader( stream ) )
{
dataSet.ReadXml( reader );
}

// ...
}
}

Problem is, this function will invariably throw an exception when it tries
to reset the Stream.Position property, which is not immediately obvious
unless you know that StreamWriter.Dispose() also calls Close() on the stream
object we passed to its constructor.
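
For completeness, a sketch of how the function could keep the outer using
block without tripping over this trap - flush the writer instead of
disposing it, as in the first version:

public void Log( string eventLogMsg )
{
using( MemoryStream stream = new MemoryStream() )
{
// add eventLogMsg to stream and reset position
long origin = stream.Position;
StreamWriter writer = new StreamWriter( stream );
writer.Write( eventLogMsg );
writer.Flush(); // Flush() leaves the stream open; Dispose() would close it
stream.Position = origin;

// read stream into DataSet
DataSet dataSet = new DataSet();
dataSet.ReadXml( new StreamReader( stream ) );

// ...
}
}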

Sure, there are situations where you absolutely need to call Dispose() or
your program would simply not behave as expected (as was discussed here:
http://groups.google.com/[email protected]).
However, I believe these situations are rarer than the ones where a
naive programmer could introduce a bug by calling Dispose() too early
(Jeffrey Richter voices similar concerns in his excellent book "Applied
Microsoft .NET Framework Programming").

Regards,

Andreas
 
Andreas Huber said:
I agree that you'd better call Dispose() on types with finalizers if you
want the best performance (both memory footprint and speed). Because of this
fact and also because I hadn't yet grokked all the implications of GC, I was
a very strong advocate of the "Call Dispose() whenever possible" policy
until recently.
But, with a few thousand lines of experience behind me, I have slowly but
surely come to the conclusion that following this policy often clutters your code
more than anything else (>99% of the code I write is not
performance-critical). I even think that I introduced more bugs this way.
The following real-world code, which I wrote today, makes for a good
example:

<snip>

I think your example only shows that you should know what Dispose is
likely to do - and I would always have considered your previous code
somewhat suspect.

Dispose isn't just necessary for performance reasons, and I think it's
always safer to dispose appropriately - knowing what's going to happen.
I'll stick by always disposing.
 
There are also some subtle GC-related bugs that can occur if you don't call
Dispose. Calling it has the side-effect of keeping a reference to the object
until it has been Disposed, which prevents the GC from collecting it until
you really are done with it.
 
Dave said:
There are also some subtle GC-related bugs that can occur if you
don't call Dispose. Calling it has the side-effect of keeping a
reference to the object until it has been Disposed, which prevents
the GC from collecting it until you really are done with it.

Exactly what kind of bugs do you have in mind (apart from the one I
described in the linked article)?

Regards,

Andreas
 
Andreas Huber said:
Definitely. I just wanted to show that calling Dispose() often and
early is not as easy as it might seem. I think beginners might do more
harm than good when trying to follow that policy.

But I believe the right way to deal with that is to become more proficient
with Dispose, rather than to avoid Dispose itself.
What do you mean by "safer"? More correct?

Yes - less risk of losing data, etc.
If yes, I'd have to
disagree. I think both extremes (never calling Dispose() vs. always
calling Dispose()) have their potential for bugs. In my experience,
for beginners the latter is more buggy than the former.

But in the end, I suspect they'll need to move to the latter anyway - so
why learn the hard way to avoid some of the bugs of never calling Dispose,
rather than learning to avoid the bugs of always calling Dispose, which is
where they'll end up eventually?
 
Jon Skeet wrote:
[snip]
But in the end, I suspect they'll need to go to the latter - so why
learn a hard way to avoid some of the bugs from never calling Dispose

Sorry, I wasn't very clear: I'd never tell beginners to just stay away from
Dispose(). Rather, I'd tell them to only call Dispose() when there will be a
problem if they don't, e.g. with Files, DB connections, sockets, etc.

I believe that it's equally easy for a proficient .NET programmer to write
code in either mode (fewest possible calls to Dispose() vs. always dispose
as early as possible), with the only difference that the latter is less
readable, IMHO.
So, I wouldn't say that all .NET programmers must eventually fall into the
routine of always disposing. Instead, I prefer to only call Dispose() when
necessary for correctness. If profiling shows that finalization of some
objects is a bottleneck then I'd optimize selectively (I suspect I will
never have to do this).

Note that either style leads to correct and reasonably fast programs, so why
clutter your program with unnecessary using blocks and try/finally blocks?
(I don't want to convince you, the question is just to focus the discussion
:-)

Regards,

Andreas
 
Andreas Huber said:
Note that either style leads to correct and reasonably fast programs, so why
clutter your program with unnecessary using blocks and try/finally blocks?
(I don't want to convince you, the question is just to focus the discussion
:-)

I believe that always disposing is more likely to lead to correct
programs. If you dispose too early, that will usually be very obvious
very quickly. If you dispose too late (or not at all), then you may
well get into the situation where *most* of the time (or possibly all
of the time during testing) it works fine - but then you lose some data
in the field.
 
Andreas Huber said:
What makes you think so? Can you provide an example where the behavior
resulting from not calling Dispose() is hard to reproduce?

Sure. Say you don't bother closing or disposing a stream, and you later
try to read the file that the stream was writing to. If the stream
*happens* to have been finalized before you try to read the file,
that's fine - otherwise you either end up reading just *some* of the
data written to the stream (ie everything that hasn't been flushed) or
you might even fail to be able to open the file in the first place,
depending on the file sharing specified.
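
A minimal sketch of that failure mode (the file name is made up):

StreamWriter writer = new StreamWriter( "log.txt" );
writer.Write( "important data" );
// No Close/Dispose: the buffered data may never reach the file, and the
// file handle stays open until the finalizer happens to run.

using( StreamReader reader = new StreamReader( "log.txt" ) )
{
// Depending on timing and file sharing, this either throws an
// IOException or sees only the data that happened to be flushed.
Console.WriteLine( reader.ReadToEnd() );
}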

Basically, implementing IDisposable means that this class or some
subclass may well have a resource which should be deterministically
disposed of - so you should dispose it at exactly the point where you
know that nothing in the instance is still useful to you.

If you rely on the finalizer to do stuff for you - whether knowingly or
not - then it might be fine during testing, if you happen to generate
enough garbage to get the finalizer called fairly swiftly, but things
might occur differently in real life. It's not easily visible from the
code just when the resource will be disposed, and it clearly *does*
matter with things like files etc. There will be other times when it
matters but it *isn't* as clear - how certain are you that you know
absolutely accurately when something is okay to remain undisposed?
We have several .NET projects running at our company, most programmers
I know only call Dispose() if they must. Yes, there were some related
bugs, but none of the people I asked have spent more than a few
minutes to fix them, because the problem was always reproducible.
Nobody remembers ever having shipped software with such bugs.

Wouldn't the same have been true when *always* using Dispose though, as
well as providing more determinism, better performance and generally
easier coding? (It's easier to remember to always dispose than to check
every time, and the code you provided which gave problems is really the
exception rather than the rule, IME - and easily diagnosed precisely
because the disposal happens deterministically.)
 
Exactly what kind of bugs do you have in mind (apart from the one I
described in the linked article)?

Regards,

Andreas

There are two potential problems: first, the object does not get collected
soon enough, so you are unable to access a system resource. The second is
that it gets collected too soon (and unexpectedly), so you no longer have
access to the system resource.

Adam Nathan's blog posted an example where not calling Dispose on an object
which wrapped a native handle had the potential for the object being garbage
collected too soon.

public static void Main()
{
// Open the key for read-only access
RegistryKey key = Registry.LocalMachine.OpenSubKey(
"Software\\Microsoft\\.NETFramework");

Console.WriteLine("Install root: " +
key.GetValue("InstallRoot"));
}


This link describes the problem.
http://blogs.gotdotnet.com/anathan/commentview.aspx/8ec0b7b2-6290-4500-a8c7-1b6c677214cb

The basic problem is that since key is a local variable it goes out of scope
as soon as key.GetValue is invoked, which means that it is eligible for
garbage collection. But this occurs prior to the p/invoke call to the
underlying registry method that GetValue invokes. So you now have a race
condition between the two events; you'll win that race most of the time, but
that 0.0001% will burn you with a very hard-to-reproduce error. If Dispose
had been called after the call to GetValue then the reference would have
been kept alive until the end of the method.

An alternative is to use ...

using ( RegistryKey key = Registry...yadayadayada() )
{
// code using key goes here
}
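
Yet another option is GC.KeepAlive, which simply guarantees the reference
stays alive up to the call (a sketch):

RegistryKey key = Registry.LocalMachine.OpenSubKey(
"Software\\Microsoft\\.NETFramework");
Console.WriteLine("Install root: " + key.GetValue("InstallRoot"));
// key cannot be collected before this line executes, so the handle
// survives the GetValue call above.
GC.KeepAlive(key);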
 
ozbear said:
Anytime you are using a non-shareable, unmanaged resource, like a
serial port.

How true, that nicely characterizes it!
If you don't explicitly Dispose the object managing the port, it is
in the laps of the gods when the object will be disposed of for you
and the port closed. Another instance in the same program may
experience intermittent "port is in use" open errors depending on when
the object is disposed of for you. Machines used for development
might have far more resources of all kinds available to them so the
GC and its finalizers might not dispose of the object (and close the
port) in the development cycle.

So, the bug would show up almost always on development machines, while
there is a chance that it goes undetected in the field? ;-)

Regards,

Andreas
 
Dave wrote:
[snip]
Adam Nathan's blog posted an example where not calling Dispose on an
object which wrapped a native handle had the potential for the object
being garbage collected too soon.

Adam Nathan himself says that this is a bug in the implementation of the
RegistryKey class. Calling Dispose() would sure prevent the bug from
manifesting, but to me this is more of a side-effect than the root of the
problem. In other words: What the heck are finalizers for, if correctness
mandates that we call Dispose() on *all* objects with unmanaged resources
anyway?
The problem somewhat resembles the one of temporary object lifetime in C++.
I vaguely recall that in the pre-standard days (~1995) there were huge
discussions of lifetime issues of (stack-allocated) temporary objects created
during expression evaluation. E.g.

Matrix r, a, b, c, d;

// fill matrices

r = a * b * c * d;

Here, the C++ standard mandates that the temporary created by a * b must
live until the result has been assigned to r, although the temporary is no
longer needed when it has been multiplied with c, the results of which will
be stored in a new temporary object. It seems that the problem Nathan
documents only crops up because there aren't similar rules in C#.

Regards,

Andreas
 
Jon said:
Sure. Say you don't bother closing or disposing a stream, and you
later try to read the file that the stream was writing to. If the
stream *happens* to have been finalized before you try to read the
file, that's fine - otherwise you either end up reading just *some* of
the data written to the stream (ie everything that hasn't been flushed)
or you might even fail to be able to open the file in the first place,
depending on the file sharing specified.

Ok, you do have a point concerning the ability to reopen the file later.
*When* the file will be closed entirely depends on when the FileStream
finalizer is run, which might be hard to reproduce. However, it's still a
bug that's easy to find, IMHO. The same goes for unflushed buffers: a
buffer works deterministically, so writing the same data must always lead
to the same portion being lost.
Basically, implementing IDisposable means that this class or some
subclass may well have a resource which should be deterministically
disposed of - so you should dispose it at exactly the point where you
know that nothing in the instance is still useful to you.

I wouldn't do so unconditionally on all objects. As Oz puts it:

"Anytime you are using a non-shareable, unmanaged resource, like a serial
port"

I'd refine this to:

"Dispose() must be called on all objects that directly or indirectly own
either a non-shareable resource or a scarce unmanaged resource."

Exactly what is a scarce resource depends on what you're doing. Some
programs will never open more than a few files and will thus work just fine
even if they don't call Dispose() (provided nobody else has to reopen).
Other programs might open loads and loads of files, thus easily bringing the
system to its file handle limit when not calling Dispose() as early as
possible. However, I'm not sure whether that's just a theoretical issue or
something that does happen in the real world.
To be on the safe side, library developers - not knowing what kind of
systems their clients will implement - should therefore always call
Dispose() on objects the number of which depends on the outside world.
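
To make the rule concrete (a sketch; the classes are my choice of example):

// A MemoryStream owns nothing but managed memory, so letting the garbage
// collector reclaim it is harmless.
MemoryStream buffer = new MemoryStream();

// A FileStream owns a non-shareable OS file handle, so it falls under the
// rule and should be disposed deterministically.
using( FileStream file = new FileStream( "data.bin", FileMode.Open ) )
{
// ...
}
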
If you rely on the finalizer to do stuff for you - whether knowingly
or not - then it might be fine during testing, if you happen to
generate enough garbage to get the finalizer called fairly swiftly,
but things might occur differently in real life. It's not easily
visible from the code just when the resource will be disposed, and it
clearly *does* matter with things like files etc. There will be other
times when it matters but it *isn't* as clear - how certain are you
that you know absolutely accurately when something is okay to remain
undisposed?

I use the rule defined above, which - admittedly - is difficult to follow in
rare cases. The resulting bugs will often surface quite quickly.
The situation with always calling Dispose() is similar. In rare cases it's
difficult to determine when an object is no longer needed. Call Dispose()
too early and you have introduced a bug. However, I do see that the bug will
generally surface in a more reproducible manner.
Wouldn't the same have been true when *always* using Dispose though,
as well as providing more determinism, better performance and
generally easier coding? (It's easier to remember to always dispose
than to check every time, and the code you provided which gave
problems is really the exception rather than the rule, IME - and
easily diagnosed precisely because the disposal happens
deterministically.)

I thought a long time about this. I guess I don't have enough experience to
know. You do have a point that the "always call Dispose()" rule is easier to
follow than my rather not so sharp rule. The only downside I'm still
wrestling with is the fact that the resulting programs are harder to
read/understand with all those using blocks and try/finally blocks. Maybe
that's just a matter of getting used to it.

Now that we've nailed down most of the relevant facts around Dispose(), I'm
wondering what the experts do.
Jeffrey Richter, for one, explicitly voices concerns against always calling
Dispose(). It would be nice to know how other experts think.

Regards,

Andreas
 
Dave said:
Andreas Huber said:
Dave wrote:
[snip]
Adam Nathan's blog posted an example where not calling Dispose on an
object which wrapped a native handle had the potential for the object
being garbage collected too soon.

Adam Nathan himself says that this is a bug in the implementation of
the RegistryKey class. Calling Dispose() would sure prevent the bug
from manifesting, but to me this is more of a side-effect than the
root of the problem. In other words: What the heck are finalizers
for, if correctness mandates that we call Dispose() on *all* objects
with unmanaged resources anyway?

It may be a bug/limitation in the Registry class, but at this point
in time if we don't take that into account then it will show up as a
flaw in our code when we utilize it. There is no guarantee that there
aren't many other classes with the same problem, both those written
by MSFT developers and especially those written by non-MSFT
developers, who are less likely to be aware of the potential flaws.

Also, the problem Nathan described is not that someone did not call
Dispose; rather, it was caused by the code not keeping a reference to the
object alive long enough to ensure that the object would not be collected
until the code had actually finished using it.
The call to Dispose served to maintain the reference to the object
until after the call that used the object had finished executing.


Yes, that was clear.
This is more of a lifetime issue than a Dispose/finalizer issue,
though there is some relationship between the two - calling Dispose
when done with an object ensures a reference to it exists until it is
no longer needed. I doubt there will be rules written into the C#
language that will prevent this sort of problem.

I agree, given how GC works.
Nathan didn't say it
was a language problem so much as it was an implementation problem.

I know. I found the behavior disturbingly counter-intuitive, that's why I
suggested there might be a way to fix this within the language.

Regards,

Andreas
 