I finally got around the GC issue. As I mentioned in my previous posts, the
profiler was showing that the memory was being taken up by the byte array
(Byte[]) that I was using to save images. The function (I posted it in my
previous thread) is as follows:
protected void SaveImageLocally(ref HttpWebResponse resp)
{
    Stream stream = resp.GetResponseStream();
    // This stream does not support seeking, so it cannot return its length;
    // allocate enough memory up front to hold the binary image data.
    byte[] buffer = new Byte[10000];
    int BytesToRead = buffer.Length;
    int BytesRead = 0;
    int n = 0;
    do
    {
        n = stream.Read(buffer, BytesRead, BytesToRead);
        BytesToRead -= n;
        BytesRead += n;
    } while (n > 0);
    stream.Close();

    FileStream fs = new FileStream(localFilePath, FileMode.Create);
    fs.Write(buffer, 0, BytesRead);
    fs.Close();

    buffer = null; // making sure GC frees this memory
}
As the Byte[] had function scope, the 10K byte array was getting created every
time SaveImageLocally() was called. I did this with the belief that the GC
would free up all that memory once the array went out of scope.
But as I found out, it does not!!!
So to get around this problem, I made the "byte[] buffer" array a member of
my class, giving it object scope rather than function scope. In other words,
I'm now reusing the same buffer for all the images.
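In case it helps anyone else, here is a minimal sketch of that change (the
class and field names are placeholders for illustration, not my real code):

class ImageCrawler
{
    // Allocated once per crawler instance and reused for every image,
    // instead of a new 10K array on every SaveImageLocally() call.
    private byte[] buffer = new byte[10000];
    private string localFilePath;   // set elsewhere, as before

    protected void SaveImageLocally(ref HttpWebResponse resp)
    {
        Stream stream = resp.GetResponseStream();
        int bytesToRead = buffer.Length;
        int bytesRead = 0;
        int n = 0;
        do
        {
            n = stream.Read(buffer, bytesRead, bytesToRead);
            bytesToRead -= n;
            bytesRead += n;
        } while (n > 0);
        stream.Close();

        FileStream fs = new FileStream(localFilePath, FileMode.Create);
        fs.Write(buffer, 0, bytesRead);
        fs.Close();
    }
}

(The obvious caveat, which applied to my original version too: a fixed 10K
buffer silently truncates any image larger than that.)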
I went through tons of articles on .NET GC to understand how it works and
found lots of interesting stuff, like object resurrection and how objects that
implement Finalize() do not get released even after the first GC.
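On that last point, here is a tiny toy example (mine, not from any of the
articles) of why an object with a Finalize() method survives the first
collection, and how Dispose() plus GC.SuppressFinalize() avoids that:

class Holder : IDisposable
{
    private byte[] data = new byte[10000];

    public void Dispose()
    {
        data = null;
        // Tell the GC the finalizer no longer needs to run, so the object
        // can be reclaimed in a single collection instead of two.
        GC.SuppressFinalize(this);
    }

    // The C# destructor compiles to a Finalize() override. When a GC finds
    // an unreachable Holder that still needs finalization, the object is put
    // on the finalization queue and its memory is only freed by a later GC,
    // after the finalizer thread has run this method.
    ~Holder()
    {
    }
}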
Bottom line: even if technologies like Java and .NET promise to relieve you
from memory management woes, they have their own set of quirks and problems.
I'm going to be more careful about how I write my code in .NET from now on
(maybe even more careful than when I was writing in C++).
Thanks everyone for your time and help!
Mahesh
Alvin Bruney said:
my apologies then
--
Regards,
Alvin Bruney
Got tidbits? Get it here...
http://tinyurl.com/2bz4t
As I posted earlier, the profiler shows that almost all of the memory in use
is taken up by String and Byte[] objects.
I use the string object to read the html returned by the HttpWebResponse
object.
void GetPage(string URL)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
    req.Timeout = 60000;
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

    Stream stream = resp.GetResponseStream();
    StreamReader sr = new StreamReader(stream);
    string sHTML = sr.ReadToEnd();

    // close the reader and the response streams
    sr.Close();
    stream.Close();
    resp.Close();

    // save the html file locally
    StreamWriter sw = File.CreateText(localFilePath);
    sw.Write(sHTML);
    sw.Close();
}
And the Byte array is used to store the binary data from the response
stream
and save it locally.
void SaveImage(string URL)
{
    // Get the image from the web server
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
    req.Timeout = 60000;
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    if (resp.StatusCode == HttpStatusCode.OK)
    {
        // save the image locally
        SaveImageLocally(ref resp);
    }
    resp.Close();
}
protected void SaveImageLocally(ref HttpWebResponse resp)
{
    Stream stream = resp.GetResponseStream();
    // This stream does not support seeking, so it cannot return its length;
    // allocate enough memory up front to hold the binary image data.
    byte[] buffer = new Byte[10000];
    int BytesToRead = buffer.Length;
    int BytesRead = 0;
    int n = 0;
    do
    {
        n = stream.Read(buffer, BytesRead, BytesToRead);
        BytesToRead -= n;
        BytesRead += n;
    } while (n > 0);
    stream.Close();

    FileStream fs = new FileStream(localFilePath, FileMode.Create);
    fs.Write(buffer, 0, BytesRead);
    fs.Close();

    buffer = null; // making sure GC frees this memory
}
Hi HuangTM,
I'll try the profiler...but in the meantime I'll provide more info about the
application.
I can't share the code, but the basic logic is as follows:
1. I have a set of 30,000 URLs that I have to crawl. These URLs are stored in
a database.
2. I use a single DataSet object to read 100 URLs at a time.
3. For each URL I retrieve from the DataSet, I make an HTTP call to the web
server using an HttpWebRequest object.
4. When the web server returns the HTTP response, I save the response stream
in a StringBuilder object.
5. I then parse the HTML text stored in the StringBuilder object looking for
<img> tags, using the Regex class. If I find some image references, I use
another HttpWebRequest object to request the images and save them locally,
using a FileStream object.
6. I then save the HTML text in the StringBuilder object as an html file
locally, using a FileStream object.
7. After all 100 URLs have been crawled, I clear the DataSet object using
DataSet.Clear(), then retrieve the next 100 URLs, and the process continues.
I use a DataSet because I have to set the success or failure status for each
URL record; I do that by calling DataAdapter.Update() after all 100 URLs have
been crawled. (A rough sketch of this batch loop is below.)
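The sketch, with made-up table and column names (it assumes System.Data and
System.Data.SqlClient; my real schema and queries differ):

void CrawlAll(SqlConnection conn)
{
    // SELECT must include the key so SqlCommandBuilder can generate the UPDATE.
    SqlDataAdapter adapter = new SqlDataAdapter(
        "SELECT TOP 100 Id, Url, Status FROM Urls WHERE Status IS NULL", conn);
    SqlCommandBuilder builder = new SqlCommandBuilder(adapter);
    DataSet ds = new DataSet();

    while (adapter.Fill(ds, "Urls") > 0)          // step 2: next batch of at most 100 URLs
    {
        foreach (DataRow row in ds.Tables["Urls"].Rows)
        {
            string url = (string)row["Url"];
            // steps 3-6 happen here: fetch the page, find <img> tags,
            // save the images and the html file locally
            row["Status"] = "OK";                 // or "Failed"
        }
        adapter.Update(ds, "Urls");               // step 7: write success/failure back
        ds.Clear();                               // drop the processed rows before the next Fill
    }
}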
The only objects that can take up memory are the StringBuilder, HttpWebResponse
and FileStream objects. But these objects are created at function scope (i.e.
they are not members of my class), so they go out of scope after Step 6 above.
Also, I call Close() on all the objects that support this method.
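To make the scope point concrete, the per-URL work is shaped roughly like this
(simplified placeholders; I've written it with using blocks here for brevity,
while my actual code calls Close() explicitly):

void SavePage(string url, string localFilePath)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.Timeout = 60000;
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    try
    {
        // Everything here is a local: the reader, the writer and the html
        // text all become unreachable as soon as this method returns.
        using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
        using (StreamWriter sw = File.CreateText(localFilePath))
        {
            sw.Write(sr.ReadToEnd());
        }
    }
    finally
    {
        resp.Close();   // release the connection even if an exception is thrown
    }
}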
During the process I noticed that the Gen 2 heap size just keeps growing.
What I don't understand is that none of my class member variables hold that
much memory (i.e. they retain very little state information), so why does the
Gen 2 heap size increase? The only objects that can take up memory are the
ones created at function scope, and those should be freed the next time the
GC kicks in, so they shouldn't be part of the Gen 2 heap.
I tried calling GC.Collect(), but with no success. Please let me know if you
want more details.
Thanks
Mahesh
Hello Mahesh,
In addition, you may use the Allocation Profiler to check the memory usage in
your application:
http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=36a3e