How the GC reclaim the memory of the objects with circled reference

Bill · Mar 7, 2004

Let's simplify using public data member. Say I have (the code should
look same in C# and Java):

Class A
{
public B m_b;
}

Class B
{
public A m_a;
}

A a = new A();
B b = new B();
a.m_b = b;
b.m_a = a;

When a and b are not referenced by any other objects, a and b still
reference to each other. How does .NET know that a and b should be
garbage collected?

I raised the same question when JDK 1.0.x just came out, but don't
recall that I saw a concreate answer (maybe I missed it somehow). I
am pretty sure that Sun is handling it correctly in their current JVM.
If anybody knows how it works in JVM, please make your comments here
too. Thank you!

Regards,
Bill

Richard A. Lowe · Mar 7, 2004

Very simple - the GC marks EVERYTHING as collectable, then walks though all
the references in every stack frame marking all referenceable objects (and
their child objects) as non-collectable. Objects that reference each other,
but have no 'live' reference in the current or parent stack frames will
remain marked collectable.

Richard

Bill · Mar 8, 2004

Thank you very much, Richard. That is exactly what I was looking for.

Bill

Chris Lyon [MSFT] · Mar 8, 2004

Hi Bill

The .NET garbage collector uses a mark-sweep generational algorithm, which means it starts at GC Roots (locals, statics, threads, etc) and traces through their references.
Anything it doesn't touch is considered garbage and is eventually collected. Straight reference counting algorithms break down ith circular references, as you've
demonstrated.

You can read more about how .NET's GC works in these articles:
http://msdn.microsoft.com/msdnmag/issues/1100/gci/
http://msdn.microsoft.com/msdnmag/issues/1200/GCI2/default.aspx

And this site is great for general GC information, including descriptions of various algorithms and analysis:
http://www.memorymanagement.org

Hope that helps.
-Chris

--------------------

From: (e-mail address removed) (Bill)
Newsgroups: microsoft.public.dotnet.languages.csharp,comp.lang.java.programmer
Subject: How the GC reclaim the memory of the objects with circled reference
Date: 7 Mar 2004 12:38:05 -0800
Organization: http://groups.google.com
Lines: 30
Message-ID: <[email protected]>
NNTP-Posting-Host: 67.69.243.132
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: posting.google.com 1078691885 32757 127.0.0.1 (7 Mar 2004 20:38:05 GMT)
X-Complaints-To: (e-mail address removed)
NNTP-Posting-Date: Sun, 7 Mar 2004 20:38:05 +0000 (UTC)
Path: cpmsftngxa06.phx.gbl!TK2MSFTNGP08.phx.gbl!newsfeed00.sul.t-online.de!t-online.de!border2.nntp.ash.giganews.com!border1.nntp.ash.giganews.com! nntp.giganews.com!news.glorb.com!postnews1.google.com!not-for-mail
Xref: cpmsftngxa06.phx.gbl microsoft.public.dotnet.languages.csharp:227090
X-Tomcat-NG: microsoft.public.dotnet.languages.csharp

Let's simplify using public data member. Say I have (the code should
look same in C# and Java):

Class A
{
public B m_b;
}

Class B
{
public A m_a;
}

A a = new A();
B b = new B();
a.m_b = b;
b.m_a = a;

When a and b are not referenced by any other objects, a and b still
reference to each other. How does .NET know that a and b should be
garbage collected?

I raised the same question when JDK 1.0.x just came out, but don't
recall that I saw a concreate answer (maybe I missed it somehow). I
am pretty sure that Sun is handling it correctly in their current JVM.
If anybody knows how it works in JVM, please make your comments here
too. Thank you!

Regards,
Bill

--

This posting is provided "AS IS" with no warranties, and confers no rights. Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

Note: For the benefit of the community-at-large, all responses to this message are best directed to the newsgroup/thread from which they originated.

xarax · Mar 8, 2004

Bill said:
Thank you very much, Richard. That is exactly what I was looking for.

Bill

"Richard A. Lowe" <[email protected]> wrote in message

/snip/

The explanation is actually a brute-force technique for GC.
There are other advanced techniques that do not involve
brute-force. In fact, the JVM specification (including JNI)
is such that the GC component can know *precisely* when an
object loses its last strong reference without an exhaustive
search. AFAIK, the Sun JVM does not use such an advanced
technique.

cody · Mar 8, 2004

The explanation is actually a brute-force technique for GC.

There are other advanced techniques that do not involve
brute-force. In fact, the JVM specification (including JNI)
is such that the GC component can know *precisely* when an
object loses its last strong reference without an exhaustive
search. AFAIK, the Sun JVM does not use such an advanced
technique.

The GC in java and .net are non-determinisitc, you know.
Programming an deterministic Mark&Sweep algorithm is imho impossible.
If you should someday invent such an algorithm you'll be certainly a
candidate
for the Nobel Award of Computer Science.

Bill · Mar 8, 2004

Thank you!

Could you please name a few of those advanced techniques and how they
work? Is there a link somewhere that I can take a look?

Bill

Bill · Mar 8, 2004

Thank you! I always love Richter's books and articles.

paul brown · Mar 9, 2004

maybe its handled in a similar way that people handle circular references
for object persistence frameworks?

i think they use a flag which says to the algorithm doing the
GCing/persisting..."you've been here during your first time round the
circle, so back off"

under the covers, each object has an associated 'marked for GC' flag which
when set tell the GC algorithm it already flagged this object for GC.

if your example continued like this:
a = null;
b = null;

If i understand your question, I think you're wondering how either object is
released, as they still hold a reference to each other??

So, prior to executing these two lines, maybe the GC sees two references
held to 'a' (i'm assuming no other refs held anywhere

After these line execute there is 1 reference each on 'a.b' and 'b.a'...

At the point where 'a=null; b=null;' is done, i think that all a's member
variables are checked for GC before 'a' is GC'd. Lets flag 'a' as marked for
GC. So next the member 'a.b' is looked at. Just one outstanding reference to
'a.b' still around so we need to see if it holds onto anything itself as its
a candidate for GC. if 'a.b' doesnt hold any references to any referenced
objects then it can be GC'd. for the time being flag it too as marked for
GC. So, next we check references held by 'a.b' and find its holding onto a
reference to 'a'...same again, we need to check if that data member is ready
for GC. Except this time round we find 'a' is already flagged for GC, so we
back off (since there's no other data member involved in this example). on
the way back out, we see that 'a.b' had only one data member that is flagged
for GC, and only one reference on itself, so we can GC it. then we come out
one more layer and do the same to the enclosing 'a'. So then, both 'a' and
'a.b' have been GC'd.

i expect this is full of holes!!!!!!!!!!!!!!!!!!

Richard A. Lowe · Mar 9, 2004

Bill, I'm no expert myself, but since the 'brute-force' method .NET uses
would be O(N) time - i.e. linear to the number of objects under
consideration, I'm willing to bet some transitive closure algorithms could
beat that time - GC would certainly be a problem domain appropriate (at
least outwardly) for transitive closures.

Richard

Chris Uppal · Mar 9, 2004

Richard said:
Very simple - the GC marks EVERYTHING as collectable, then walks though
all the references in every stack frame marking all referenceable objects
(and their child objects) as non-collectable. Objects that reference
each other, but have no 'live' reference in the current or parent stack
frames will remain marked collectable.

This is a good, quick, summary of the basic idea underlying many GC algorithms.
I'm not criticising Richard at all when I add:

But please do be aware that this is only a summary, GC algorithms have been
studied for many years (there's at least one whole book about them), and there
are several complex and subtle variants that are in use. Indeed there probably
aren't any mainline Java implementations with a GC that is more than
recognisably similar to the above.

Common ways for algorithms to vary:

Whether they move data as it is GCed
Whether they collect the whole heap at one time (very few do).
Whether they are incremental, or stop the whole program while they run.
Whether they make effective use of >1 processor.
Whether they make any use of reference counting.

-- chris

xarax · Mar 9, 2004

paul brown said:
maybe its handled in a similar way that people handle circular references
for object persistence frameworks?

i think they use a flag which says to the algorithm doing the
GCing/persisting..."you've been here during your first time round the
circle, so back off"

/snip/

Please remember that a "reachable" object must be
reachable starting with a local variable or with a
static field and following the reference chains of
the instance fields. (How else could a thread reach it?)

Note that JNI uses global and local "locking" of an
instance which serve analogous purposes to a static
field and an local variable, respectively. Do not
confuse JNI "locking" with synchronization (holding
an instance monitor). They are two entirely different
concepts. A global "lock" survives across a JNI method
call, while a local "lock" does not. When a JNI method
returns, all local "locks" that the method acquired are
automatically released. (This is similar to popping a
stackframe when a Java method returns and its local
variables are essentially nullified.) Releasing a "lock"
is analogous to assigning null (for the purposes of this
topic).

An object that is only reachable from an instance
field implies that the instance field that refers
to the object is owned by an instance that is also
unreachable. I.e., if you can only reach an instance
by starting at an instance field, then the instance
that owns that instance field is also only reachable by
an instance field (not a static field or a local variable).

So, the "brute-force" GC simply starts its search with
marking everything "collectable" (i.e., the "mark" phase),
then walks the chains of object references that are rooted
in the local variables of live threads (and the local
"locks" of active JNI methods) and all static fields of
loaded classes (and the global "locks" acquired by JNI
methods).

E.g., the GC search will not encounter a circular list of
instances that cannot be reached from any local variable of any
live thread or any static field, and thus all of those instances
will stay marked "collectable". The second phase, called "sweep",
will reclaim everything in the circular list.

The actual mark-and-sweep technique is slightly more
complicated due to SoftReference, WeakReference,
and PhantomReference processing, and some GC algorithms
copy/move live instances to defragment the heap.

--
----------------------------
Jeffrey D. Smith
Farsight Systems Corporation
24 BURLINGTON DRIVE
LONGMONT, CO 80501-6906
http://www.farsight-systems.com
z/Debug debugs your Systems/C programs running on IBM z/OS!
Are ISV upgrade fees too high? Check our custom product development!

How the GC reclaim the memory of the objects with circled reference

Bill

Richard A. Lowe

Bill

Chris Lyon [MSFT]

xarax

cody

Bill

Bill

paul brown

Richard A. Lowe

Chris Uppal

xarax