tobin
Hi there,
Question 1:
I'm writing a spider in C# and want to achieve the maximum number of
page fetches per second using HttpWebRequest. Is, for example,
100 pages/sec possible? I'm sure I've seen open source Perl/Java
spiders boast up to 500 pages/sec on an average-spec developer machine.
Note that I'm polite and not trying to hit any given server more than
once within any given minute. Our spider has lots of sites to scan, so
it has plenty to do without breaking the 2 concurrent requests rule!
Question 2:
I suspect one main area of pain will be the thread pool. If I do this
(example only):
for (int i = 0; i < 3000; i++)
{
    ...
    WebRequest r = HttpWebRequest.Create(serverUrl);
    r.BeginGetResponse(SomeCallBack, someState);
}
... will this use the thread pool behind the scenes? If so, will the pool
throw an exception as soon as the 26th item gets added, or does it have
some flexible scheme for queueing callbacks? Throwing appears to be the
behaviour I'm seeing, and I don't know how to make BeginGetResponse() NOT
use the pool, or how to stop the pool from overflowing.
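To make the scenario concrete, here is a minimal sketch of how the loop above could be throttled so that only a bounded number of requests are outstanding at any moment. It uses BeginGetResponse, the asynchronous call that HttpWebRequest actually exposes. The cap of 20, the ThrottledFetcher class, and taking URLs from the command line are all assumptions for illustration, and it is not known whether this cap avoids the pool limit described above.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;

class ThrottledFetcher
{
    // Hypothetical cap on requests in flight; the right value is an assumption to tune.
    const int MaxInFlight = 20;
    static readonly Semaphore slots = new Semaphore(MaxInFlight, MaxInFlight);

    static void Main(string[] args)
    {
        // The command-line arguments stand in for the spider's URL queue (an assumption).
        foreach (string url in args)
        {
            slots.WaitOne(); // block until one of the MaxInFlight slots is free
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
                request.BeginGetResponse(OnResponse, request);
            }
            catch (Exception ex)
            {
                // The request never started, so hand the slot back here.
                Console.Error.WriteLine(url + ": " + ex.Message);
                slots.Release();
            }
        }

        // Wait for outstanding requests by reclaiming every slot.
        for (int i = 0; i < MaxInFlight; i++)
        {
            slots.WaitOne();
        }
    }

    static void OnResponse(IAsyncResult ar)
    {
        HttpWebRequest request = (HttpWebRequest)ar.AsyncState;
        try
        {
            using (WebResponse response = request.EndGetResponse(ar))
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                string body = reader.ReadToEnd();
                Console.WriteLine(request.RequestUri + ": " + body.Length + " chars");
            }
        }
        catch (WebException ex)
        {
            Console.Error.WriteLine(request.RequestUri + ": " + ex.Message);
        }
        finally
        {
            slots.Release(); // free the slot whether the fetch succeeded or failed
        }
    }
}
```

The semaphore only limits how many requests are started at once; it does not change whether the callbacks run on the thread pool.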
Any help is really appreciated.
Tobin