GC with lots of small ones

AlexS

Hi,

I wonder if anybody can comment on whether what I see is normal in FW 1.1 and
how to avoid it.

I have a .NET assembly which creates literally thousands of temporary strings
and other objects when running. Usually it is something like
{
    string s = someValue;
    // some local processing here
    ...
}
so the expectation is that the GC will collect the string some time later, once
nothing references it. However, it looks like in lots of cases where strings are
returned to the calling method, the GC has problems finding such references and
cleaning them up, especially when objects are created on one thread and
processed on another.

The same goes for arrays, hashtables, etc.

I've seen some of these temporary objects survive through tens of GC
cycles. However, when the number of such temporary objects is low - less than
20000 or so - the GC seems to be able to do the job.

Because of this, the assembly starts choking on memory after 2-4 hours of heavy
duty use. VM size easily grows 10 times or more. My intuition is that the GC
times out before finding the majority of freed references - maybe because it
relocates a lot of data during the first phase?

Are there any "real" recommendations on which techniques should be used to
make an app more GC-friendly? E.g. don't create more than 1000 objects
per minute, or always set temp strings to null after use, or something like
this?

Thanks
Alex
 
AlexS said:
I've seen some of these temporary objects survive through tens of GC
cycles. However, when the number of such temporary objects is low - less than
20000 or so - the GC seems to be able to do the job.

It sounds like a lot of objects are being promoted. Are you just working with
strings, collections, etc., or are you using other, more complicated objects?

Because of this, the assembly starts choking on memory after 2-4 hours of heavy
duty use. VM size easily grows 10 times or more. My intuition is that the GC
times out before finding the majority of freed references - maybe because it
relocates a lot of data during the first phase?

The GC usually only does a generation 0 sweep, doing gen 1 and gen 2 sweeps less
often. If your objects are living just long enough to make it to gen 1, then
you may end up with longer cleanup times. Are you using any objects with
finalizers? An object with a finalizer causes its entire object graph to be
promoted, making it effectively an automatic gen 1. If you are using objects
with finalizers, make sure you are disposing them (or, if you wrote them,
provide an IDisposable implementation that calls GC.SuppressFinalize).

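A minimal sketch of the disposal advice above, not a definitive implementation. The type `NativeBuffer` and its unmanaged allocation are hypothetical, invented only so the finalizer has something to release; `IDisposable`, `GC.SuppressFinalize`, `Marshal.AllocHGlobal` and `Marshal.FreeHGlobal` are the real framework members.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical type that owns unmanaged memory, so it keeps a finalizer as a safety net.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr handle;

    public NativeBuffer(int bytes)
    {
        handle = Marshal.AllocHGlobal(bytes);
    }

    public bool IsDisposed
    {
        get { return handle == IntPtr.Zero; }
    }

    public void Dispose()
    {
        Release();
        // Cleanup is done, so the GC no longer has to keep this object
        // (and everything it references) alive for the finalizer thread.
        GC.SuppressFinalize(this);
    }

    // Runs only if Dispose was never called.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(handle);
            handle = IntPtr.Zero;
        }
    }
}

public static class DisposeDemo
{
    public static void Main()
    {
        NativeBuffer buffer = new NativeBuffer(1024);
        try
        {
            // ... use the buffer ...
        }
        finally
        {
            buffer.Dispose();
        }
        Console.WriteLine(buffer.IsDisposed);
    }
}
```

Calling `Dispose` a second time is harmless here because `Release` checks the handle first.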
Are there any "real" recommendations on which techniques should be used to
make an app more GC-friendly? E.g. don't create more than 1000 objects
per minute, or always set temp strings to null after use, or something like
this?

It'd be hard to be sure about the object creation rate, and I doubt that's
the case. I've seen benchmarks of millions of allocations and deallocations
in a minute's time, so I don't think object restrictions are going to help. Nor
will setting temp variables to null, unless you are in a particular
circumstance.

Take a situation like this (ignoring string interning):

{
    string s = "a very large string indeed";
    DoSomething(s);
    DoSomethingTimeConsumingButUnrelatedToS();
    s = "another huge string"; // the first instance of s can be collected here
    DoSomethingElse(s);
}

whereas with an explicit null the first string can be collected before the
time-consuming call:

{
    string s = "a very large string indeed";
    DoSomething(s);
    s = null; // the first instance of s can be collected here
    DoSomethingTimeConsumingButUnrelatedToS();
    s = "another huge string";
    DoSomethingElse(s);
}

But I expect that situation to be rather rare, and unless the strings are
truly huge (several megs at the least) I wouldn't bother with it.
 
Daniel,

see in text

Thanks
Alex

Daniel O'Connell said:
It sounds like a lot of objects are being promoted. Are you just working with
strings, collections, etc., or are you using other, more complicated objects?

Mostly it's strings and various collections - hashtables, ArrayLists, simple
arrays. I wouldn't say they are "more complicated". Is a hashtable of classes
which contain collections of strings more complicated? I am not sure here.
But according to CLR Profiler the problem does seem to be with promotions.

The GC usually only does a generation 0 sweep, doing gen 1 and gen 2 sweeps less
often. If your objects are living just long enough to make it to gen 1, then
you may end up with longer cleanup times. Are you using any objects with
finalizers? An object with a finalizer causes its entire object graph to be
promoted, making it effectively an automatic gen 1. If you are using objects
with finalizers, make sure you are disposing them (or, if you wrote them,
provide an IDisposable implementation that calls GC.SuppressFinalize).

No finalizers. It was easy to find and fix leaks like SolidBrushes, where
I can use Dispose. Not so for strings.

Some of the strings were collected more efficiently when I used the second
variant. If s=null; is absent, strings are shown as floating around in the
heap - relocated and live. It doesn't always happen, but it happens a lot,
especially in loops and, it seems, in recursive calls. It is strange, though,
that you think it matters only for big strings. My heap is full of small
ones - 0.1-10K. I've also seen lots of chunks from String.Split.

I wonder if there is real difference for GC between

return <string expression>

and

string str=<string expression>;
return str;

?
 
Hrm, nothing that needs finalization or disposal, so I wouldn't consider
anything complicated here. By the sounds of it your objects are living long
enough to survive to generation 2, which could be a problem as the program
runs for a while.
How long does the processing take on these strings? And are there a lot of
duplicated strings?

Without knowing the specifics, I do have some thoughts. Is your design such
that you create a string, do some work that allocates a good many
objects (enough to trigger a gen 1 collection), then create another into the
same variable? If so, it's possible you are prolonging the life of your
strings into a higher generation; nulling would fix that. If this algorithm
is highly recursive, it might actually be a serious source of memory
problems.

I wonder if there is real difference for GC between

return <string expression>

and

string str=<string expression>;
return str;

There shouldn't be. The JIT would probably generate similar or identical
code.
 
Daniel, thanks

Looks like you confirm some of my suspicions.

How long does the processing take on these strings? And are there a lot of
duplicated strings?

In terms of absolute time, less than 1 second. In terms of how many objects
could be created during this period, hundreds if not thousands. Also,
strings could be created in one thread and passed to another before becoming
obsolete.

Without knowing the specifics, I do have some thoughts. Is your design such
that you create a string, do some work that allocates a good many
objects (enough to trigger a gen 1 collection), then create another into the
same variable? If so, it's possible you are prolonging the life of your
strings into a higher generation; nulling would fix that. If this algorithm
is highly recursive, it might actually be a serious source of memory
problems.

Lots of code with such behavior. Because I have thousands of objects and
calls, CLR Profiler literally chokes. A standard profile log is 50-100MB,
which usually kills it. Exceptions, hanging, not enough memory - I've seen
it all :-(

There shouldn't be. The JIT would probably generate similar or identical
code.

Some small consolation :-)

Thanks, Daniel.

I am trying now to think out some way to clean up this mess.
 
Hrmm, this isn't good.

From what I understand, I'm afraid the best course may be to redesign your
app. I think the problem is inherent to the design. You either need to
serialize processing so objects disappear quickly or change the object
allocation code so that the allocations occur just before calculations and
disappear right after. Multithreading may be a big part of this.

Are a lot of your strings identical?
 
If you add the same string to two different collections, are the collection
items identical? I think not.

Could you expand a bit on serializing processing to make objects disappear
quickly? I am not sure I see how it could be done when collections are
filled by recursion or strings are passed between threads.

Unfortunately redesign is out of the question - thousands of lines, which were
developed by several people.

So, to sum up:
- if string creation and processing before releasing the reference take
longer than a GC0 and GC1 cycle, the strings could be lost in the heap
- if a big string is replaced by another big string, like str=<big string>;
<process>; str=<another big string>, it is better to use str=null or
str=String.Empty before the next assignment
- if objects could exist for a long time, they should be nulled explicitly
- if an object implements IDisposable, it must be disposed explicitly before
the next assignment
- when objects are passed between threads or async methods - null them
explicitly
- when objects are passed between recursive calls - null them explicitly
- try to avoid String.Concat as much as possible

Doesn't look very convincing, what do you think? I have never seen most of
this in simple applications, where objects are not highly volatile. And now I
see all of it when doing real processing of real files - parsing, editing.
I mean the negative impact on the heap.

Did I miss anything?

Thanks
Alex
 
If you add the same string to two different collections, are the collection
items identical? I think not.

Generally not, but if you tend to have large numbers of strings that are
identical, you may save memory by interning (or you may not; I forget what
happens when you intern a string that will never be referenced again).
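To illustrate the interning remark, here is a small sketch using `String.Intern`. The class and method names are made up for the example. As far as I know, the framework documentation notes that memory for interned strings is unlikely to be released until the runtime terminates, so interning strings that are never referenced again would tend to cost memory rather than save it.

```csharp
using System;
using System.Text;

public static class InternDemo
{
    // Builds the string at run time so it is a fresh heap object,
    // not a literal from the compiled assembly.
    public static string Build(int id)
    {
        return new StringBuilder("token-").Append(id).ToString();
    }

    public static void Main()
    {
        string a = Build(42);
        string b = Build(42);
        Console.WriteLine(a == b);                          // same contents
        Console.WriteLine(object.ReferenceEquals(a, b));    // but two separate objects

        string ia = string.Intern(a);
        string ib = string.Intern(b);
        Console.WriteLine(object.ReferenceEquals(ia, ib));  // both refer to the pooled instance
    }
}
```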
Could you expand a bit on serializing processing to make objects disappear
quickly? I am not sure I see how it could be done when collections are
filled by recursion or strings are passed between threads.

It really isn't; it was a suggestion for a potential redesign.

Unfortunately redesign is out of the question - thousands of lines, which were
developed by several people.

That's unfortunate... I hope you can figure out a way to reduce memory usage.
I would rarely recommend this, but perhaps you should insert a GC.Collect(2)
call on a timer that fires every half hour and see if it clears gen 2 for
you. It is a hack, but if the problem is over-promotion, it just might work.
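A rough sketch of that timer hack, assuming a `System.Threading.Timer` is acceptable in the application. `PeriodicCollector` and its members are hypothetical names; only `GC.Collect(int)` and the `Timer` constructor are framework APIs.

```csharp
using System;
using System.Threading;

public static class PeriodicCollector
{
    private static Timer timer; // static field keeps the timer itself reachable
    private static int runs;

    public static int Runs
    {
        get { return runs; }
    }

    public static void Start(TimeSpan period)
    {
        timer = new Timer(new TimerCallback(CollectAll), null, period, period);
    }

    public static void Stop()
    {
        if (timer != null)
        {
            timer.Dispose();
            timer = null;
        }
    }

    private static void CollectAll(object state)
    {
        GC.Collect(2); // forces a collection of generations 0 through 2
        Interlocked.Increment(ref runs);
    }

    public static void Main()
    {
        Start(TimeSpan.FromMinutes(30));
        Console.WriteLine("Collector running; press Enter to exit.");
        Console.ReadLine();
        Stop();
    }
}
```

A forced full collection pauses the application while it runs, so this is best treated as a diagnostic experiment rather than a permanent fix.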
So, to sum up:
- if string creation and processing before releasing the reference take
longer than a GC0 and GC1 cycle, the strings could be lost in the heap
- if a big string is replaced by another big string, like str=<big string>;
<process>; str=<another big string>, it is better to use str=null or
str=String.Empty before the next assignment
- if objects could exist for a long time, they should be nulled explicitly

No, if variables could exist for a long time, they should be nulled. It's not
possible to null an object ;).

- if an object implements IDisposable, it must be disposed explicitly before
the next assignment

Yes.

- when objects are passed between threads or async methods - null them
explicitly
- when objects are passed between recursive calls - null them explicitly

Neither of these is true. Nulling a variable won't change anything.

- try to avoid String.Concat as much as possible

Maybe, maybe not. String.Concat can be efficient if you are dealing with 2 or
3 strings, but if you are doing more than that, definitely go with
StringBuilder.
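A short sketch of the difference, with hypothetical helper names. Both methods produce the same text; the first creates a new intermediate string on every pass through the loop, while the second appends into one buffer.

```csharp
using System;
using System.Text;

public static class JoinDemo
{
    // Repeated concatenation allocates a new, longer string on every pass.
    public static string JoinWithConcat(string[] parts)
    {
        string result = "";
        for (int i = 0; i < parts.Length; i++)
        {
            result = result + parts[i] + ";";
        }
        return result;
    }

    // StringBuilder appends into one growing buffer and creates the final string once.
    public static string JoinWithBuilder(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.Length; i++)
        {
            sb.Append(parts[i]).Append(';');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string[] parts = new string[] { "alpha", "beta", "gamma" };
        Console.WriteLine(JoinWithConcat(parts) == JoinWithBuilder(parts)); // prints True
    }
}
```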
 
Hi Alex
So, to sum up:
- if string creation and processing before releasing the reference take
longer than a GC0 and GC1 cycle, the strings could be lost in the heap

True. If your object survives a generation 0 collection and then dies, you've got a "mid-life crisis", and that object (in your case a string) will be in memory much longer than you need it.
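To see promotion happen, a sketch like the following can be used. `GenerationDemo` is a made-up name; `GC.GetGeneration`, `GC.Collect` and `GC.KeepAlive` are framework calls. The exact numbers depend on the runtime and GC mode, so no particular output is promised, but the generation normally rises each time the object survives a collection.

```csharp
using System;

public static class GenerationDemo
{
    // Returns the generation of a fresh object, then its generation after
    // each of two forced collections that it survives.
    public static int[] Track()
    {
        object survivor = new object();
        int[] seen = new int[3];
        seen[0] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[1] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);
        GC.KeepAlive(survivor); // keeps the reference live up to this point
        return seen;
    }

    public static void Main()
    {
        int[] seen = Track();
        Console.WriteLine("new: gen {0}, after one GC: gen {1}, after two: gen {2}",
            seen[0], seen[1], seen[2]);
    }
}
```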
- if a big string is replaced by another big string, like str=<big string>;
<process>; str=<another big string>, it is better to use str=null or
str=String.Empty before the next assignment

By changing the reference from the first string, you've abandoned it in memory and the GC will take care of it. So nulling won't help, but won't hurt either.

- if objects could exist for a long time, they should be nulled explicitly

Only if they are member variables, and the container object is still alive.

- if an object implements IDisposable, it must be disposed explicitly before
the next assignment

That's generally good practice. Consider using the C# using pattern where appropriate.
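For reference, a minimal example of the `using` statement. `StringReader` stands in for any `IDisposable` resource, and `CountLines` is a hypothetical helper.

```csharp
using System;
using System.IO;

public static class UsingDemo
{
    public static int CountLines(string text)
    {
        int count = 0;
        // Dispose is called on the reader when the block exits, even if an exception is thrown.
        using (StringReader reader = new StringReader(text))
        {
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountLines("a\nb\nc")); // prints 3
    }
}
```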
- when objects are passed between threads or async methods - null them
explicitly
- when objects are passed between recursive calls - null them explicitly

Nulling them won't help them get collected any faster unless they are members.
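The member case can be sketched as follows. `ParseSession` is a hypothetical long-lived container; the point is that its field keeps the token array reachable for as long as the session itself is alive, so clearing the field is what makes the array eligible for collection.

```csharp
using System;

// Hypothetical long-lived container.
public sealed class ParseSession
{
    private string[] tokens; // member reference: reachable as long as the session is

    public void Load(string line)
    {
        tokens = line.Split(',');
    }

    public int TokenCount
    {
        get { return tokens == null ? 0 : tokens.Length; }
    }

    // Drop the reference once the tokens are processed, so the array and its
    // strings become unreachable even though the session stays alive.
    public void Release()
    {
        tokens = null;
    }
}

public static class MemberDemo
{
    public static void Main()
    {
        ParseSession session = new ParseSession();
        session.Load("a,b,c");
        Console.WriteLine(session.TokenCount); // prints 3
        session.Release();
        Console.WriteLine(session.TokenCount); // prints 0
    }
}
```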
- try to avoid String.Concat as much as possible

Um, maybe. StringBuilder is your friend if you are creating many large strings.


If you haven't already, check out
Rico Mariani's blog (http://weblogs.asp.net/ricom/)
Brad Abrams' blog (http://weblogs.asp.net/brada/archive/2004/05/24/140645.aspx)
Improving .NET Application Performance and Scalability Chapter 5 (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag/html/scalenetchapt05.asp).


Hope that helps
-Chris

--------------------
Subject: Re: GC with lots of small ones
Date: Wed, 26 May 2004 13:16:02 -0400
Newsgroups: microsoft.public.dotnet.general
 
Thanks, Chris

I just want to confirm that you confirmed my findings. As most of the strings
and other objects are members of long-lived containers, most of my points
seem to be in line with what you said. I wonder if the next version of FW will
be better in this respect. Note, this is heavy duty processing - I have
hundreds of thousands of objects created and existing during the app lifetime.

Now I have a much clearer picture and a better behaving app. I am only sad
because now I have to find out how to make CLR Profiler behave more
correctly:
- It eats too much memory - log files exceed 100MB
- It crashes on exceptions or says there is not enough disk space (while there
is plenty) when displaying some graphs, all too frequently
- It doesn't allow filtering out namespaces, objects and calls

Anyway, I managed to make the app eat less heap. That's progress
already.

Rgds
Alex

PS:
By the way, I found StringBuilder helps with small ones too. What counts as
big - 8 chars or 80?
 