Brian C
I've recently begun my first foray into the world of Windows Forms. I
wrote a little utility to spider a website and compile a list of its
URLs.
The app calls a routine that analyzes a page, finds the links on it,
then recursively calls the same routine to analyze each new link.
It worked fine, except that the application would freeze up while it
was working (eventually unfreezing when it finished).
I thought, OK... here's where I learn about asynchronous processing.
I found this code and rebuilt the app:
http://www.eggheadcafe.com/articles/20060120.asp
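To illustrate the kind of pattern I mean, here's a rough sketch I put
together. This is just my own illustration of the general idea of pushing
the work onto a background thread and updating the form with Invoke; it
isn't the article's code, and names like SpiderForm and DownloadPage are
made up:

using System;
using System.Net;
using System.Threading;
using System.Windows.Forms;

public class SpiderForm : Form
{
    private readonly ListBox urlList = new ListBox();
    private readonly Button startButton = new Button();

    public SpiderForm()
    {
        startButton.Text = "Start";
        startButton.Dock = DockStyle.Top;
        urlList.Dock = DockStyle.Fill;
        Controls.Add(urlList);
        Controls.Add(startButton);
        startButton.Click += delegate { StartSpider("http://example.com/"); };
    }

    private void StartSpider(string startUrl)
    {
        // Hand the slow work to the thread pool; the UI thread returns right
        // away, so the form keeps repainting and responding to clicks.
        ThreadPool.QueueUserWorkItem(delegate
        {
            string html = DownloadPage(startUrl);   // slow network call, off the UI thread

            // Controls must only be touched from the UI thread, so marshal
            // the result back to the form with Invoke.
            Invoke((MethodInvoker)delegate
            {
                urlList.Items.Add(startUrl + " (" + html.Length + " chars)");
            });
        });
    }

    private static string DownloadPage(string url)
    {
        using (WebClient client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new SpiderForm());
    }
}

The real version would parse the HTML for links and repeat, which is
exactly where my trouble starts.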
However, problems immediately arose. Since each spidered page
recursively calls the same (now asynchronous) routine to spider new
links, complications cropped up all over the place.
The issues, as I understand them:
1. keeping track of all the site URLs (I had been using a simple
ArrayList)
2. knowing when all the threads are finished (not an issue in a
single-threaded/synchronous application)
3. keeping a lid on the number of threads
I've come across references to 'locking' and thought that it might
solve issues 1 and 2, but I'm not sure where to go next. As for issue
3, I'm at a bit of a loss.
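Just so it's clear what I'm picturing, here's a rough sketch of how 1-3
might fit together: a lock around the shared list, a counter of
outstanding pages, and a semaphore to cap how many fetches run at once.
This is my own guesswork rather than code from my app, and names like
QueuePage and FindLinks are made up:

using System;
using System.Collections.Generic;
using System.Threading;

public class Spider
{
    // Issue 1: one shared list of URLs, always accessed under the same lock.
    private readonly List<string> visited = new List<string>();
    private readonly object visitedLock = new object();

    // Issue 2: count outstanding pages; signal when the count drops to zero.
    private int pending;
    private readonly ManualResetEvent allDone = new ManualResetEvent(false);

    // Issue 3: cap the number of pages being fetched at once.
    private readonly Semaphore throttle = new Semaphore(5, 5);

    public void Crawl(string startUrl)
    {
        QueuePage(startUrl);
        allDone.WaitOne();   // blocks the caller, so don't call this from the UI thread
    }

    private void QueuePage(string url)
    {
        lock (visitedLock)
        {
            if (visited.Contains(url)) return;   // skip URLs we've already seen
            visited.Add(url);
        }

        Interlocked.Increment(ref pending);
        ThreadPool.QueueUserWorkItem(delegate { ProcessPage(url); });
    }

    private void ProcessPage(string url)
    {
        throttle.WaitOne();                      // at most 5 pages in flight
        try
        {
            foreach (string link in FindLinks(url))
                QueuePage(link);                 // recursion becomes "queue more work"
        }
        finally
        {
            throttle.Release();
            if (Interlocked.Decrement(ref pending) == 0)
                allDone.Set();                   // the last page just finished
        }
    }

    // Placeholder: download the page and pull out its links (details omitted).
    private IEnumerable<string> FindLinks(string url)
    {
        return new string[0];
    }
}

I have no idea whether a pending counter plus a Semaphore is the
idiomatic way to do this, which is really the heart of my question.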
I'm at the point where I figure I should go back to the beginning and
lay out the logic from scratch.
Suggestions? Is the code linked above a good starting point, or should
I perhaps try something completely different?
Googling is difficult when you don't know what you're looking for.