GC does not release memory...memory keeps growing!!!!

  • Thread starter: Mahesh Prasad

Mahesh Prasad

Hi,

I'm having a very frustrating experience with .NET. I have a simple crawler
console application. Its main objective is to read a list of URLs, make HTTP
calls to a web server, and save the HTML files locally.

I set up perfmon to monitor the memory usage of the application and found
that the Gen 2 heap size keeps increasing until the system ultimately runs
out of memory, whereas the Gen 0 and Gen 1 heap sizes are stable (they
increase and decrease as the GC runs).

I understand that objects that have lived long enough are ultimately promoted
to Gen 2. But none of my objects hold enough state to make the Gen 2 heap
grow incessantly! I'm using many temporary objects like HttpWebRequest,
StringBuilder and streams, but these objects live only as long as the HTTP
request lasts, and I'm not keeping them as class members.

I would appreciate it if someone could shed some light on this strange
behaviour. I'm so frustrated that I'm planning to rewrite the code in C++...
at least there I'll have control over when the memory is released.

Thanks in advance.

Mahesh
 
Are you properly calling Dispose on those objects that implement
IDisposable? And when you say "the system runs out of memory" what do
you mean? Does the computer crash?
 
Yes, I'm calling Close() on all the objects that implement the IDisposable
interface. MSDN docs say that Close() and Dispose() do the same thing.

I didn't let the process run till the system crashed. I'm running the
application on the system with 1GB of RAM.
And after running the application for 6 hours, the available memory was less
than 1 MB. The application had become
very sluggish and so I killed the app.

Thanks,

-Mahesh
 
Yes, I'm calling Close() on all the objects that implement the IDisposable
interface. MSDN docs say that Close() and Dispose() do the same thing.

That depends. Some objects implement IDisposable but don't have a Close
method (such as many of the GDI objects and others like HttpApplication).
I didn't let the process run till the system crashed. I'm running the
application on the system with 1GB of RAM.
And after running the application for 6 hours, the available memory was less
than 1 MB. The application had become
very sluggish and so I killed the app.

You could try adding a GC.Collect() at some point in your code where you
know you need a garbage collection, but that's usually a bad thing to
do. Prematurely forcing a collection can actually make the GC perform
worse.
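
If you just want to check whether the memory is reclaimable at all, the usual
diagnostic pattern (for diagnosis only, not as a fix) is something like:

GC.Collect();
GC.WaitForPendingFinalizers();   // let any pending finalizers run
GC.Collect();                    // collect the objects those finalizers released

If the Gen 2 heap still doesn't shrink after that, something is still holding
references to your objects.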
 
Have you tried wrapping a "using" statement around your
IDisposable objects? It will automatically call
Dispose when control leaves the using block. This
will help you take care of potential memory problems.
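
Something like this, for example (sHTML and localFilePath standing in for your
own variables):

// Dispose is called automatically when control leaves the block,
// even if Write throws.
using (StreamWriter sw = File.CreateText(localFilePath))
{
    sw.Write(sHTML);
}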
 
I would appreciate it if someone could shed some light on this strange
behaviour.

What you're doing right now is trying to guess where you're holding
references. In my experience, trying to shotgun the problem is a tempting,
but generally futile and frustrating exercise. The tools are your friends.
Use a profiler and you'll immediately see where your references are.
 
Hi HuangTM,

I'll try the profiler... but in the meantime I'll provide more info about
the application. I can't share the code, but the basic logic is as follows:

1. I have a set of 30,000 URLs that I have to crawl. These URLs are stored in
a database.

2. I use a single DataSet object to read 100 URLs at a time.

3. For each URL I retrieve from the DataSet, I make an HTTP call to the web
server using an HttpWebRequest object.

4. When the web server returns the HTTP response, I save the response stream
in a StringBuilder object.

5. I then parse the HTML text stored in the StringBuilder object looking for
<img> tags, using the Regex class. If I find some image references, I use
another HttpWebRequest object to request the images and save them locally
using a FileStream object.

6. I then save the HTML text in the StringBuilder object as an HTML file
locally, using a FileStream object.

7. After all 100 URLs have been crawled, I clear the DataSet with
DataSet.Clear(), retrieve the next 100 URLs, and the process continues. I use
a DataSet because I have to set the success or failure status for each URL
record; I do that by calling DataAdapter.Update() after all 100 URLs have
been crawled (a rough sketch of this loop follows).
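
The sketch, with table and column names made up (this is an illustration, not
the actual code; it needs System.Data and System.Data.SqlClient):

// connectionString comes from configuration (not shown)
SqlConnection conn = new SqlConnection(connectionString);
SqlDataAdapter adapter = new SqlDataAdapter(
    "SELECT TOP 100 Id, Url, Status FROM Urls WHERE Status IS NULL", conn);
SqlCommandBuilder builder = new SqlCommandBuilder(adapter); // generates the UPDATE command
DataSet ds = new DataSet();

while (adapter.Fill(ds, "Urls") > 0)            // step 2: read 100 URLs at a time
{
    foreach (DataRow row in ds.Tables["Urls"].Rows)
    {
        GetPage((string)row["Url"]);            // steps 3-6: fetch, parse, save
        row["Status"] = "Success";              // or "Failed" if GetPage reported an error
    }
    adapter.Update(ds, "Urls");                 // step 7: write the statuses back
    ds.Clear();                                 // then fetch the next batch
}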

The only objects that can take up significant memory are the StringBuilder,
HttpWebResponse and FileStream objects. But these objects are created at
function scope (i.e. they are not members of my class), so they go out of
scope after step 6 above. I also call Close() on all the objects that support
that method.

During the process I noticed that the Gen 2 heap size just keeps growing.
What I don't understand is that none of my class member variables hold much
memory (they retain very little state), so why does the Gen 2 heap size
increase? The only objects that can take up memory are the ones created at
function scope, and those should be freed the next time the GC kicks in, so
they shouldn't end up in the Gen 2 heap.

I tried calling GC.Collect(), but with no success. Please let me know if you
want more details.

Thanks
Mahesh
 
After running the profiler I found that most of the memory is taken up by
String and Byte[] objects.

I use a string object to hold the HTML returned by the HttpWebResponse
object.

void GetPage(string URL)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
    req.Timeout = 60000;
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    Stream stream = resp.GetResponseStream();
    StreamReader sr = new StreamReader(stream);
    string sHTML = sr.ReadToEnd();

    // close the response stream and the response
    stream.Close();
    resp.Close();
    // close the reader
    sr.Close();

    // save the html file locally
    StreamWriter sw = File.CreateText(localFilePath);
    sw.Write(sHTML);
    // close the writer
    sw.Close();
}

And the Byte array is used to store the binary data from the response stream
and save it locally.

void SaveImage(string URL)
{
    // get the image from the web server
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
    req.Timeout = 60000;
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    if (resp.StatusCode == HttpStatusCode.OK)
    {
        // save the image locally
        SaveImageLocally(ref resp);
    }
    resp.Close();
}

protected void SaveImageLocally(ref HttpWebResponse resp)
{
    Stream stream = resp.GetResponseStream();
    // This stream does not support seeking, so it cannot return its length;
    // allocate enough memory to hold the binary image data.
    byte[] buffer = new Byte[10000];
    int BytesToRead = buffer.Length;
    int BytesRead = 0;
    int n = 0;
    do
    {
        n = stream.Read(buffer, BytesRead, BytesToRead);
        BytesToRead -= n;
        BytesRead += n;
    } while (n > 0);
    stream.Close();

    FileStream fs = new FileStream(localFilePath, FileMode.Create);
    fs.Write(buffer, 0, BytesRead);
    fs.Close();
    buffer = null; // trying to make sure the GC frees this memory
}
 
What concerns me about your code, or at least the portion you have posted, is
that it is not guarded. An exception at any point leaves the buffer and all
your objects stranded in memory, and the same goes for your FileStream
object. At least use a finally block to guard against resource leaks.
Consider for example:

using(FileStream fs = new FileStream(localFilePath,FileMode.Create))
{

}

Constructs such as these provide a basic degree of protection against
leakage. Really, your code needs to be a lot more robust than this; I point
it out so that you will at least be headed in the right direction.
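
For instance, a guarded version of your SaveImageLocally might look something
like this (a sketch only, not your actual code; the same shape works with
try/finally if you prefer). It also copies the response to disk in small
chunks, so no image-sized buffer is held at all:

protected void SaveImageLocally(HttpWebResponse resp)
{
    byte[] chunk = new byte[4096];                        // small scratch buffer
    using (Stream stream = resp.GetResponseStream())
    using (FileStream fs = new FileStream(localFilePath, FileMode.Create))
    {
        int n;
        while ((n = stream.Read(chunk, 0, chunk.Length)) > 0)
        {
            fs.Write(chunk, 0, n);                        // write each chunk as it arrives
        }
    }   // both streams are closed here, even if Read or Write throws
}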

--
Regards,
Alvin Bruney
Got tidbits? Get it here...
http://tinyurl.com/2bz4t
 
Hi Alvin,

I only posted the main portion of the code and stripped out the rest. All
the code is within a try-catch block and
I'm making sure that the stream and other objects are closed properly even
when exceptions are thrown.

Also, in the GetPage() method, before I save the HTML text to a file, I pass
the sHTML string object to a method called "ParseHtmlForImages(ref sHtml)".
I forgot to mention that in my previous post.

Thanks,
-Mahesh

HuangTM said:
Hello Mahesh,

In addition, you may use Allocation Profiler to check the memory usage in
your application:
http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=36a3e
 
my apologies then

--
Regards,
Alvin Bruney
Got tidbits? Get it here...
http://tinyurl.com/2bz4t
 
Hi Mahesh,

Thanks for your information. I am performing research on this issue and
will update you with my information.

Have a nice day!

Regards,

HuangTM
Microsoft Online Partner Support
MCSE/MCSD

Get Secure! -- www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
 
Hello Mahesh,

I reviewed your code snippet carefully; however, I did not find anything
obvious that would cause the problem. I suggest you check who is allocating
the memory in Generation 2 using the Allocation Profiler. In addition, I
believe the following MSDN article is helpful for debugging memory problems:

Debugging Memory Problems (Production Debugging for .NET Framework
Applications)
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/DBGch02.asp

I look forward to hearing from you.

Regards,

HuangTM
Microsoft Online Partner Support
MCSE/MCSD

Get Secure! -- www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
 
I finally got around the GC issue. As I mentioned in my previous posts, the
profiler was showing that the memory was being taken up by the byte array
(Byte[]) that I was using to save images. The function (posted earlier in
this thread) is as follows:

protected void SaveImageLocally(ref HttpWebResponse resp)
{
    Stream stream = resp.GetResponseStream();
    // This stream does not support seeking, so it cannot return its length;
    // allocate enough memory to hold the binary image data.
    byte[] buffer = new Byte[10000];
    int BytesToRead = buffer.Length;
    int BytesRead = 0;
    int n = 0;
    do
    {
        n = stream.Read(buffer, BytesRead, BytesToRead);
        BytesToRead -= n;
        BytesRead += n;
    } while (n > 0);
    stream.Close();

    FileStream fs = new FileStream(localFilePath, FileMode.Create);
    fs.Write(buffer, 0, BytesRead);
    fs.Close();
    buffer = null; // trying to make sure the GC frees this memory
}

As the Byte[] had function scope, the 10K byte array was getting created
every time the SaveImageLocally() function was called. I did this in the
belief that the GC would free all that memory once the array went out of
scope at the end of the function. But as I found out, it does not!

So to get around this problem, I made the "byte[] buffer" array a member of
my class, giving it object scope rather than function scope. In other words,
I'm now reusing the same buffer for all the images.
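
In essence, the class now looks something like this (names are illustrative,
and localFilePath is set elsewhere as in the original code):

class Crawler
{
    private byte[] buffer = new byte[10000];   // allocated once, reused for every image
    private string localFilePath;              // set before each save, as before

    protected void SaveImageLocally(HttpWebResponse resp)
    {
        Stream stream = resp.GetResponseStream();
        int bytesRead = 0;
        int n;
        // fill the shared buffer from the response stream
        while ((n = stream.Read(buffer, bytesRead, buffer.Length - bytesRead)) > 0)
        {
            bytesRead += n;
        }
        stream.Close();

        FileStream fs = new FileStream(localFilePath, FileMode.Create);
        fs.Write(buffer, 0, bytesRead);
        fs.Close();
    }
}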

I went through tons of articles on the .NET GC to understand how it works,
and found lots of interesting stuff, like object resurrection and how objects
that implement Finalize() do not get released even after the first GC.
Bottom line: even if technologies like Java and .NET promise to relieve you
of memory management woes, they have their own set of quirks and problems.
I'm going to be more careful about how I write my code in .NET from now on
(maybe more careful than when I was writing in C++ :-)

Thanks everyone for your time and help!

Mahesh



 
I completely agree with this. Automatic memory management environments still
require a certain measure of vigilance and responsibility when allocating and
deallocating. Assuming that the environment will always do the right thing
comes at the programmer's expense and can lead to a misbehaved program. My 2
cents.

--
Regards,
Alvin Bruney
Got tidbits? Get it here...
http://tinyurl.com/3he3b
 