website/webclient

  • Thread starter: Sehboo

Sehboo

I have a website, and I need to programmatically find all the
pages that the first page refers to, and then go to those pages and
retrieve some information. The problem is that the links could be
relative or absolute. I know how to use WebClient. I just don't know
how to parse out the relative or absolute links.

Does anybody know how? I could have many links on the first page.

Thanks
 
Hi Sehboo,

You can use mshtml to load your document and find all the anchor (<A>) tags;
from those you can get the references to the other pages.

It is not an easy thing to do.

You have to add a reference to mshtml, but do not add an Imports statement
for it, because your IDE will freeze.
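
To show the idea of walking the anchor tags, here is a minimal sketch. It is
not mshtml and not VB.NET: it uses Python's standard html.parser as a
stand-in for a DOM, so the names AnchorCollector and collect_links are made
up for this example. Each href is resolved against the address of the page
it came from, so relative and absolute links both come out as absolute URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL.

    Hypothetical helper for illustration; a stand-in for a DOM walk.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin keeps absolute URLs as they are and resolves
                # relative ones against the page they were found on.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html, base_url):
    """Return the absolute URL of every anchor with an href in html."""
    parser = AnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Anchors without an href (for example <a name="top">) are skipped. In .NET
the same resolving step can be done with the Uri constructor that takes a
base Uri and a relative string.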

Cor
 
Sehboo,
Matthew MacDonald's book "Microsoft Visual Basic .NET Programmer's Cookbook"
from MS Press has a sample that uses the System.Net.WebResponse class
together with the System.Text.RegularExpressions.Regex class to find all
the links on a page.
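
I do not have the book's code at hand, so the following is not that sample.
It is only a rough sketch of the same approach (download the page, then
match the links with a regular expression), written in Python rather than
VB.NET. The pattern and the names HREF_PATTERN, extract_links and
fetch_links are my own assumptions; the pattern only handles quoted href
values inside <a> tags.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed pattern: captures the quoted href value of an <a ...> tag,
# ignoring case. Unquoted href values are not matched.
HREF_PATTERN = re.compile(
    r"""<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def extract_links(html, base_url):
    """Return an absolute URL for every quoted href found in html."""
    # urljoin leaves absolute links alone and resolves relative ones
    # against the address of the page they were found on.
    return [urljoin(base_url, href) for href in HREF_PATTERN.findall(html)]


def fetch_links(page_url):
    """Download page_url and return the links it refers to."""
    with urlopen(page_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html, page_url)
```

Calling fetch_links on the first page gives a list of absolute addresses
that can each be downloaded in turn. A regular expression is quick to write
but can be fooled by unusual markup, so treat the result as a starting point.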

Hope this helps
Jay
 