HttpWebRequest and Multi Threaded apps Question

Hi,

I'm writing a program that crawls a website, using the HttpWebRequest and
HttpWebResponse classes to get content. To make the application more
scalable, it is multithreaded, with each thread making a different request.

One problem I've run into: when writing the response stream (from the
HttpWebResponse.GetResponseStream() method) to the filesystem, a
sufficiently large response blocks all other executing threads. If this
exceeds the timeout, all the other threads that are in the process of making
a web request time out.

I've made every effort to ensure that my application is not locking. Given
the scenario above, would asynchronous web requests be more efficient? I've
not worked with them before, so I'd be grateful for your opinions.

Cheers,

Mark
 
I agree with Ralph. Without being able to look at your code, this seems
pretty bizarre.

One thing you might have run into is the limit on the number of
connections (persistent or non-persistent) that a ServicePoint allows to
a single host. Once that limit is reached, further requests block until
one of the limited number of physical TCP connections becomes available.
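
To check whether the connection limit is a plausible cause, the limit and the
number of open connections for a host can be inspected at runtime. This is
only a minimal sketch: the URL is a placeholder, not the site from the
original post, and the class name is made up for illustration.

```csharp
using System;
using System.Net;

class ConnectionLimitCheck
{
    static void Main()
    {
        // Placeholder crawl target (an assumption); substitute the real host.
        Uri target = new Uri("http://www.example.com/");

        // Process-wide default that newly created ServicePoints start with.
        Console.WriteLine("Default limit:    " + ServicePointManager.DefaultConnectionLimit);

        // The ServicePoint that manages connections to this particular host.
        ServicePoint sp = ServicePointManager.FindServicePoint(target);
        Console.WriteLine("Limit for host:   " + sp.ConnectionLimit);
        Console.WriteLine("Open connections: " + sp.CurrentConnections);
    }
}
```

If the number of open connections sits at the limit while other threads are
waiting, the blocking is probably connection related rather than caused by
locking in the application.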

Cheers,
 
Hi Joerg,

Thanks for the response. I'm wondering if my problem is indeed ServicePoint
related. Do you know of any good resources where I can learn about this? In
my scenario each active thread makes one HTTP request and gets the
response.

I've looked into my problem further and I do find that I have occasional
problems when reading the response stream. When calling Stream.Read(), it
sometimes times out with the following exception:

Message: The operation has timed out
Stack Trace: at System.ConnectStream.Read(...)

It doesn't matter whether I set HttpWebRequest.Timeout to 10000 or 100000; I
still occasionally get these timeouts. When I debug the application, it looks
like the thread gets part of the way through reading the bytes and then
hangs.

I can provide code if requested, but I was wondering if anyone else has seen
symptoms like this, and if they could suggest a remedy.

Thanks,

Mark
 
Mark said:
Hi Joerg,

Thanks for the response. I'm wondering if my problem is indeed
ServicePoint related. Do you know of any good resources where I can
learn about this? In my scenario each active thread makes one HTTP
request and gets the response.

Mark, ServicePoint is unfortunately one of the more arcane classes in
the BCL. I'm not even aware of any MSDN article discussing it.

You'll find some information in "Network Programming for the .NET
Framework" (MS Press), but other than that the class documentation is
your best friend.

Note: The connection limit imposed by ServicePoint(Manager) has a
reason: RFC 2616 (HTTP 1.1), section 8.1.4, says that a single-user
client should not maintain more than 2 connections to any server.

Mark said:
I've looked into my problem further and I do find that I have
occasional problems when reading the response stream. When calling
Stream.Read(), it sometimes times out with the following exception:

Message: The operation has timed out
Stack Trace: at System.ConnectStream.Read(...)

It doesn't matter whether I set HttpWebRequest.Timeout to 10000 or
100000; I still occasionally get these timeouts. When I debug the
application, it looks like the thread gets part of the way through
reading the bytes and then hangs.

Note that the Timeout property only applies while a response is still
pending (that is, to the GetResponse() call), not to a response that
is already being received. Faulty HTTP implementations can cause a lot
of problems here (e.g. a wrong Content-Length, or broken chunking).

If you need a timeout for "in-flight" responses, use async I/O with
ThreadPool.RegisterWaitForSingleObject() to register a timeout for the
async operation.
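
A rough sketch of that approach, assuming each read is started with
Stream.BeginRead and guarded by ThreadPool.RegisterWaitForSingleObject, with
the request aborted if a read does not complete in time. The URL, the
10-second limit, the buffer size, and the names TimedRead and OnWaitDone are
all assumptions made for illustration, not anything from the original code.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;

class TimedRead
{
    const int ReadTimeoutMs = 10000; // assumed per-read limit

    static void Main()
    {
        // Placeholder URL (an assumption); substitute the page being crawled.
        HttpWebRequest request =
            (HttpWebRequest)WebRequest.Create("http://www.example.com/");

        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream body = response.GetResponseStream())
            {
                byte[] buffer = new byte[8192];
                long total = 0;

                while (true)
                {
                    // Start one asynchronous read.
                    IAsyncResult ar = body.BeginRead(buffer, 0, buffer.Length, null, null);

                    // Ask the thread pool to call OnWaitDone once, either when
                    // the read completes or when the timeout elapses.
                    RegisteredWaitHandle wait = ThreadPool.RegisterWaitForSingleObject(
                        ar.AsyncWaitHandle, OnWaitDone, request, ReadTimeoutMs, true);

                    int read;
                    try
                    {
                        // Blocks until the read finishes or fails after an abort.
                        read = body.EndRead(ar);
                    }
                    finally
                    {
                        wait.Unregister(null);
                    }

                    if (read == 0) break; // end of the response body
                    total += read;        // a real crawler would write the bytes to disk here
                }

                Console.WriteLine("Read " + total + " bytes");
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine("Request failed or was aborted: " + ex.Message);
        }
        catch (IOException ex)
        {
            Console.WriteLine("Read failed: " + ex.Message);
        }
    }

    // Runs on a thread-pool thread; timedOut is true only if the wait expired.
    static void OnWaitDone(object state, bool timedOut)
    {
        if (timedOut)
        {
            ((HttpWebRequest)state).Abort();
        }
    }
}
```

For plain synchronous reads, the HttpWebRequest.ReadWriteTimeout property is
also worth checking: it governs reads and writes on the response stream
separately from Timeout.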

Cheers,
 
My guess is that you're running into the connection limit. There's a
default limit of 2 simultaneous connections to the same host built
into the framework, and it applies to HTTP as well as HTTPS requests.
Just put this line into your code after you've created the
HttpWebRequest object:

request.ServicePoint.ConnectionLimit = 50;

Bruce Dunwiddie
http://www.csvreader.com
 