Ken Durden
I am in search of a comprehensive methodology for using these two
object cleanup approaches (destructors and IDisposable) to get rid of
a number of bugs, unpleasantries, and cleanup-ordering issues we
currently have in our 4-month-old C#/MC++ .NET project.
I'd like to thank in advance anyone who takes the time to read and/or
respond to this message. At a couple points, it may seem like a rant
against C# / .NET, but we are pretty firmly stuck with this approach
and I am simply trying to find solutions which balance all the
different goals and criteria we have for project quality.
First, a little background:
a.)
We're all previously C++ programmers with lots of experience with
ref-counted, synchronized auto-ptrs and things like that. We bought
into .NET because of its remoting advantages over COM, not because of
the notion that it makes resource management easier. We _knew_ it made
non-memory resource management harder, but decided it was worth it.
b.)
We happen to be doing a lot of non-memory resource management in C#.
Some of this is implemented in MC++ classes which contain unmanaged
3rd party resources. These 3rd party handles have ordering constraints
requiring all resources of one type to be released before resources of
a different type can be released. For example, all image handles must
be released before the API handle can be released.
We're also doing things which IMHO make a lot of sense, but put our
face right in the door of .NET so to speak; like, for example, having
an object contain a thread as a member variable which it stops as part
of its destructor. (More on why this blows up further down)
c.)
We're currently using the following IDisposable pattern for any object
which we think needs to have its lifetime managed explicitly (because
it contains large resources, for example).
using System;

class A : IDisposable
{
    ~A()
    {
        Dispose(false);
    }
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }
    protected virtual void Dispose( bool bExplicitDispose )
    {
        if( bExplicitDispose )
        {
            // Cleanup managed resources...
            // This means, call Dispose() on all your "owned" members which
            // implement it. Ownership is kind of a touchy subject in C#, but
            // IDisposable forces it to be considered.
        }
        // Cleanup unmanaged resources
        // This means, delete any _truly_ unmanaged resources you have
        // (like __nogc pointers for an MC++ class). This also means
        // performing actions on managed objects other than calling
        // Dispose though. For example, a Thread is a managed object, but
        // this is an appropriate time to do a controlled stop on it.
    }
}
Hopefully everyone will be familiar with this basic pattern as IMO it
is pretty common and popularly recommended by the so-called experts.
I've tried to document our interpretation of it pretty well, however,
because most of the examples I saw looked more like this:
    protected virtual void Dispose( bool bDisposing )
    {
        if( bDisposing )
        {
            // Cleanup managed resources
        }
        // Unmanaged resources
    }
Not only was the guidance poor on what to actually do in these blocks,
the variable name bDisposing which was used in 90% of example code is
fairly confusing given that you are inside of a function called
Dispose().
The initial guidance from literature we had read was that only certain
objects needed to have an IDisposable, and that IDisposable was really
against the .NET way of doing things. The above pattern supported this
by providing a client-side optimization choice to Dispose of an object
or not, but the code would basically work either way. The destructor
is implemented using Dispose so that clients not concerned with the
optimization get the same effect at some unspecified time later.
Since then, I've read articles which, in discussing some of the
problems with properly implementing destructors, argue that every object
should have an IDisposable interface on it, and that clients should
_always_ call this when they are done using an object. This changes
the intent of IDisposable from an optimization for special
circumstances to the everyday-preferred technique; in this case, the
role of the destructor is to guard against a careless programmer
forgetting to call Dispose.
The difference may be subtle, but it changes the roles of the two
functions fairly significantly IMO.
d.)
We designed a singleton pattern around our understanding of the way
the GC worked using the following concept.
class ASingleton
{
    // Static, since clients access it as ASingleton.Instance.
    public static ASingleton Instance
    {
        get
        {
            // Assorted locking, and allocation of the singleton variable.
            // ...
            return m_singleton;
        }
    }
    public void Release()
    {
        m_singleton = null;
    }
    ~ASingleton()
    {
        // Stuff
        // ...
    }
    private static ASingleton m_singleton = null;
}
The implementation of the Release method was the key to this concept.
Rather than disposing the singleton, we simply set the stored
reference to the singleton to null. From that point on, any client
requesting the singleton would get a null reference and would be
expected to handle that situation accordingly. We thought this would
resolve an issue which is fairly hard to address in C++ (without the
support of Ref-Counted AutoPtrs at least)... Singleton lifetimes near
the end of the program lifetime.
For example... Consider the following code:
void Cleanup()
{
    ASingleton aSingleton = ASingleton.Instance;
    // If the singleton is still alive, then use it... otherwise
    // the program must be going down and we can't use it
    if( aSingleton != null )
    {
        aSingleton.DoSomething();
    }
}
The singleton pattern we developed guaranteed that if the if-test
succeeded, the call to DoSomething() would have a valid object to work
on. No timing issue could introduce the possibility that the
singleton would be destructed between the if-check and the function
call, because the local reference inside the function prevented that
problem.
Of course, it's still possible for clients to abuse this power:
void Cleanup()
{
    // If the singleton is still alive, then use it... otherwise
    // the program must be going down and we can't use it
    if( ASingleton.Instance != null )
    {
        // Second read of Instance: may already be null by now
        ASingleton.Instance.DoSomething();
    }
}
This was a potential bug we believed code reviews would catch.
e.)
For the 3rd party resources I mentioned before, we bundled them into
MC++ classes which derive from C# classes which are the primary
currency of the application. Let's have two classes, ImageAPI and
Image. The OEM Imaging Library requires that all OEM-image handles are
released before releasing the ImageAPI. To enforce this, each Image
object contains a reference to the ImageAPI. It happens to need this
_anyway_ so it can perform error checking on operations. This includes
the operations it needs to do inside of its destructor / IDisposable
interface.
The idea was that as long as Images were in the system which hadn't
been destructed, the ImageAPI couldn't be destructed. As each Image
was destructed, its reference to the ImageAPI would be removed and
eventually no Image object would contain a reference to the ImageAPI.
If the ImageAPI Singleton reference was also gone, then the ImageAPI
would also be disposed.
Problems:
a.)
The IDisposable pattern I wrote up earlier has a huge hole in it. The
hole is that because order of destructor calls is indeterminate, it is
not possible for an object to safely reference any of its own member
variables which are references.
Just to be clear on this, I'll provide a short example:
using System.Threading;
class A
{
    ~A()
    {
        m_event.Set();
    }
    ManualResetEvent m_event = new ManualResetEvent(false);
}
If possible, I'd prefer to not delve into a discussion of whether this
code is necessary, or go to the trouble of creating a realistic
example where it is necessary and/or useful to access member variables
in an object's destructor call.
It is my understanding based on a reading of the standard and from
debugging existing problems, that there is no guarantee that this code
will execute correctly. In fact, the problem I frequently see is an
ObjectDisposedException coming from the member variable when I try to
access it.
My initial understanding was that once an object could no longer be
referenced, it was available for Finalization (call of
destructor) and Collection (release of object memory). As a subtle
point, this is different from saying the object has no more references
to it. This is to support the cyclical dependency issue where two
objects both have back-references to each other; if those two objects
are the only path to each other, then both are available for
Finalization and Collection.
However, this is not what is indicated by the standard. The standard
explicitly says that objects are available for Finalization once it is
not possible to access them via any mechanism ___other than destructor
calls___. This means, in the cyclical back-reference example I
mentioned a second ago, one object can still reference the other in
its destructor and there is no way to ensure or check that the other
object has not been destructed already.
CS Spec (10.9.2):
If the object, or any part of it, cannot be accessed by any possible
continuation of execution, ___other than the running of
destructors___, the object is considered no longer in use, and it
becomes eligible for destruction.
Right after this specification, an example is given of how an object
(B) can bring another object (A) back to life by taking its stored
reference to the second object and placing it in some global location
(Test) as part of its destructor logic. This prevents the second
object (A) from being Collected, but since there is no ordering
constraint on when the two objects (A&B) are destructed, there is no
way to ensure that the object which is now alive and has not been
collected is actually useful. All of its members may have already been
destructed. The language specification even says (p.87):
"In the above program, if the garbage collector chooses to run the
destructor of A before the destructor of B, then the output of this
program might be:"
"To avoid confusion and unexpected behavior, it is generally a good
idea for destructors to only perform cleanup on data stored in their
object's own fields, and not to perform any actions on referenced
objects or static fields."
This seems like a colossal restriction on the role of destructors
which has been glossed over by every C# book I've read (which is up to
about 4 by this point). Furthermore, I'm surprised it's not a bigger
issue for large, complex projects with demanding requirements for
object interaction. I would have expected to see significantly more
griping about this problem than I have; the last thread to
significantly address the issue was back in December of 2002 and was
primarily talking about limitations of beta 1.
b.)
The same problem that caused the Dispose pattern to fall on its face
caused the Singleton pattern to fall on its face. If we can't depend
on member variables to be valid inside of a destructor call, then we
can't depend on object references inside of an Image class to prevent
destruction of an ImageAPI class.
Conclusions:
I honestly don't have very many. Two elegant designs, which looked
like they would have worked perfectly in a large-scale software
environment and avoided some of the problems I've encountered in C++
development in the past, are apparently impossible. I'm not necessarily
saying the design of the Garbage Collector is incorrect (it's certainly
not a bug, it appears to meet all the requirements of the language
spec)... but I'm not convinced it's correct either. Ability to access
member references would have been pretty high on my list of
requirements for a GC implementation.
One of the points I've seen some argue is to proliferate the
IDisposable interface to every single object and require clients to be
extremely diligent in calling Dispose. I personally think
this obviates most of the advantage of having a GC implementation to
begin with; this is painfully similar to having to explicitly delete
every object as in C++. Of course, in modern C++ we have auto-ptrs
which manage this for us and deletes are a rare occurrence.
The using keyword (another keyword I still believe was an afterthought
when they realized how painful non-memory resource management was
going to be) probably makes this easier for most people, but we have
severe exception-safety requirements which require us to handle the
case where object Dispose functions throw... please don't make me go
into this, but the using keyword is off-limits for us until MS makes
some other enhancements to the language.
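For readers wondering what can go wrong there, here is a minimal sketch of one such case, assuming the concern is an exception from Dispose() replacing the exception that was already in flight. The class names ThrowingResource and UsingMaskingDemo are made up for illustration and are not from our code base.

```csharp
using System;

// Hypothetical resource whose cleanup itself fails.
sealed class ThrowingResource : IDisposable
{
    public void Dispose()
    {
        throw new InvalidOperationException("Dispose failed");
    }
}

static class UsingMaskingDemo
{
    // Returns the type name of the exception the caller actually observes.
    public static string Run()
    {
        try
        {
            // 'using' expands to try/finally with Dispose() in the finally block.
            using (new ThrowingResource())
            {
                throw new ArgumentException("original failure");
            }
        }
        catch (Exception e)
        {
            // The exception thrown from Dispose() has replaced the original one.
            return e.GetType().Name;
        }
    }
}
```

The caller sees only the InvalidOperationException from Dispose(); the ArgumentException that started the unwinding is lost.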
Another problem with propagating the IDisposable interface to every
object is that it becomes increasingly hard to decide when it's safe to
call it. Explicitly calling Dispose, or a using statement, works fine
when an object is allocated inside a function, used, and destroyed in
that same function. In a more complex environment where the object may
be stored in multiple places, or a multi-threaded environment where
multiple threads must get their chance to access it, the problem
becomes much more difficult.
If we have objects floating around the system being used in an
asynchronous fashion by multiple threads, we want the objects to be
released sometime after all threads are done with them. We understand
that it doesn't have to be done ASAP, and that the GC will do it
whenever it pleases unless we call GC.Collect (which we do when
necessary).
If you remove the ability to do useful work in a destructor, and move
the cleanup logic from the destructor to the IDisposable interface,
then every object in the system needs an explicit refcount so we can
keep track of _when_ to call Dispose on it. Again, in C++ we'd have a
templatized auto-ptr to do the dirty work; in C#, we have to write
tons of agonizing code to increment and decrement reference counts,
and we have to do it in a way that is 100% exception-safe, or our
object will never be properly cleaned up.
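As a rough sketch of the kind of bookkeeping this means (not our actual code; SharedDisposable, CountingResource and RefCountDemo are hypothetical names, and a plain lock is assumed to be acceptable for the counter):

```csharp
using System;

// Hypothetical wrapper: shares one IDisposable among several owners and
// calls Dispose() exactly once, when the last owner releases it.
sealed class SharedDisposable
{
    private readonly object m_lock = new object();
    private readonly IDisposable m_target;
    private int m_refCount = 1;   // the creator holds the first reference

    public SharedDisposable(IDisposable target)
    {
        if (target == null) throw new ArgumentNullException("target");
        m_target = target;
    }

    public void AddRef()
    {
        lock (m_lock)
        {
            // Once the count has reached zero the target is gone for good.
            if (m_refCount == 0) throw new ObjectDisposedException("SharedDisposable");
            m_refCount++;
        }
    }

    public void Release()
    {
        bool last;
        lock (m_lock)
        {
            if (m_refCount == 0) throw new InvalidOperationException("Release without matching AddRef");
            m_refCount--;
            last = (m_refCount == 0);
        }
        if (last) m_target.Dispose();   // done outside the lock
    }
}

// Stand-in resource that only counts Dispose() calls.
sealed class CountingResource : IDisposable
{
    public int DisposeCalls;
    public void Dispose() { DisposeCalls++; }
}

static class RefCountDemo
{
    // Returns how many times the resource was disposed.
    public static int Run()
    {
        CountingResource resource = new CountingResource();
        SharedDisposable shared = new SharedDisposable(resource);

        shared.AddRef();          // a second owner takes a reference
        try
        {
            // ... second owner uses the resource ...
        }
        finally
        {
            shared.Release();     // must run even if the work above throws
        }

        shared.Release();         // the creator gives up the last reference
        return resource.DisposeCalls;
    }
}
```

Every AddRef() needs its own try/finally like the one above, at every place a reference is handed out, which is exactly the code an auto-ptr would have written for us.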
The conclusion I came to within a couple months of working on C# is
that it was only safe to call Dispose() on an object when you are 100%
certain that you are the only code referencing it. This points
directly to the stupidity of using Dispose in this case to begin
with... if you have to be certain you're the only one referencing it,
then why call Dispose in the first place? Just release your reference
to it and let the GC collect it. That way, you avoid having thousands
of latent assumptions about when object lifetimes are over. Every time a
new piece of logic is added in one place you don't have to check every
Dispose() call in the rest of the code to see if the new code is now
the final owner of the object.
The other conclusion I might be willing to draw is that
"ObjectDisposedException" may be the predominant software bug in .NET
pretty soon, just like Memory Access Violation is in C++. In C++, you
get this message when you access a memory location you don't own.
Usually (in my experience), this is because you're accessing a dangling
pointer which used to refer to a valid object. C# / .NET purports
to get rid of this very common problem (and all the reference count
management often needed to get rid of it), but in reality it just
changes it from a crash into an exception. Better, perhaps, but still
far from correct.
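A tiny illustration of the .NET version of a dangling pointer, using MemoryStream from the framework as the shared object (DisposedAliasDemo is a made-up name for this sketch):

```csharp
using System;
using System.IO;

static class DisposedAliasDemo
{
    // Returns what happens when a second holder uses an object
    // after the first holder has disposed it.
    public static string Run()
    {
        MemoryStream owner = new MemoryStream();
        MemoryStream alias = owner;   // a second reference held elsewhere
        owner.Dispose();              // the first holder decides it is done

        try
        {
            alias.WriteByte(42);      // the reference is valid, the object is not
            return "no exception";
        }
        catch (ObjectDisposedException)
        {
            return "ObjectDisposedException";
        }
    }
}
```

The reference itself is still perfectly good as far as the GC is concerned; it is the object behind it that is no longer usable.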
Also:
I'd honestly like to know if I'm extremely off-base in my usage of the
language or the problems I'm running into. I think I may be a somewhat
special case because of the application domain (hint, it's not web
servers or business software) and the nature of resource ownership
that is sometimes associated with that. But I think nearly all of the
problems I've been running into would be applicable to all problem
domains to varying extents.
I'd also love to know that there's some future MS patch which will
allow me to be confident that an object's member references haven't
been destructed before said object.... but I don't think that's gonna
happen.
Thanks,
-ken