Saving webpage graphics with code

  • Thread starter: Better Ferret

Better Ferret

I am creating a data harvesting application with VB.NET
that parses information from the HTML of web pages. Part
of the information to be collected is in the form of
images. The parsing of the text from the HTML was a chore
but I am stumped trying to retrieve images based on the
src inside the IMG tag. If anyone has some insight into how
to save images locally based on the URL of the graphic, I
would be very grateful to hear about it...
Thanks

Better
 
Hello,

Better Ferret said:
I am creating a data harvesting application with VB.NET
that parses information from the HTML of web pages. Part
of the information to be collected is in the form of
images. The parsing of the text from the HTML was a chore
but I am stumped trying to retrieve images based on the
src inside the IMG tag. If anyone has some insight into how
to save images locally based on the URL of the graphic, I
would be very grateful to hear about it...

http://www.codeproject.com/csharp/webdownload.asp
 
Read the documentation on:
System.Text.RegularExpressions.Regex
System.Net.WebClient

Use the Regex object to search your HTML source for <img src="xxx"> tags.
Use a capturing group in your expression so that you can easily retrieve the
image URL (xxx).
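
A minimal VB.NET sketch of the Regex step, in case it helps. The module name, the function name ExtractImageSources, and the pattern itself are only illustrative assumptions; the pattern handles quoted src values and will miss unquoted ones or unusual markup.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ImgSrcExtractor
    ' Hypothetical helper: returns the raw src values of the <img> tags
    ' found in the given HTML text.
    Function ExtractImageSources(ByVal html As String) As List(Of String)
        ' Group 1 captures the text between the quotes that follow src=.
        ' Assumption: the src value is wrapped in single or double quotes.
        Dim pattern As String = "<img\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']"
        Dim sources As New List(Of String)
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            sources.Add(m.Groups(1).Value)
        Next
        Return sources
    End Function

    Sub Main()
        ' Sample markup made up for this example.
        Dim html As String = "<p><IMG alt=""logo"" SRC=""images/logo.gif""></p>"
        For Each src As String In ExtractImageSources(html)
            Console.WriteLine(src)
        Next
    End Sub
End Module
```

RegexOptions.IgnoreCase is used because tag and attribute names show up in either case in real pages.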

Determine whether the image URL is absolute (includes the server name, etc.)
or relative (use the System.Uri object). If it is relative, resolve it against
the URL of the page it came from (again, System.Uri should help).
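
The absolute-versus-relative step could look roughly like this in VB.NET. It relies on the Uri(Uri, String) constructor, which combines a base address with a relative one and returns an already absolute address unchanged. The function name ResolveImageUrl and the example.com addresses are placeholders, not anything from the original application.

```vbnet
Imports System

Module ImageUrlResolver
    ' Hypothetical helper: combines the address of the page with a src value.
    ' If src is already absolute, the resulting Uri is built from src alone.
    Function ResolveImageUrl(ByVal pageUrl As String, ByVal src As String) As Uri
        Dim baseUri As New Uri(pageUrl)
        Return New Uri(baseUri, src)
    End Function

    Sub Main()
        Dim page As String = "http://example.com/news/index.html"
        ' Relative to the page's folder.
        Console.WriteLine(ResolveImageUrl(page, "images/logo.gif"))
        ' Relative to the server root.
        Console.WriteLine(ResolveImageUrl(page, "/images/logo.gif"))
        ' Already absolute.
        Console.WriteLine(ResolveImageUrl(page, "http://img.example.com/a.gif"))
    End Sub
End Module
```

Letting Uri do the combining avoids a common mistake with plain string concatenation: a src such as images/logo.gif is relative to the page's folder, not to the server root.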

Now retrieve the image file by passing the URL to a WebClient object. You
can retrieve the image as a stream and save the stream to disk using a
BinaryWriter.
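
Here is a rough VB.NET sketch of the download step, reading the image through WebClient.OpenRead and writing it out with a BinaryWriter. The name SaveImage, the 4 KB chunk size, the example.com address, and the output file name are all assumptions made for illustration. The Using statement requires VB 2005 or later; on older compilers a Try/Finally with explicit Close calls would do the same job.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module ImageSaver
    ' Hypothetical helper: downloads the resource at imageUrl and writes
    ' its bytes to localPath, overwriting any existing file.
    Sub SaveImage(ByVal imageUrl As String, ByVal localPath As String)
        Using client As New WebClient()
            Using source As Stream = client.OpenRead(imageUrl)
                Using writer As New BinaryWriter(File.Create(localPath))
                    ' Copy the response in 4 KB chunks until the stream ends.
                    Dim chunk(4095) As Byte
                    Dim bytesRead As Integer = source.Read(chunk, 0, chunk.Length)
                    While bytesRead > 0
                        writer.Write(chunk, 0, bytesRead)
                        bytesRead = source.Read(chunk, 0, chunk.Length)
                    End While
                End Using
            End Using
        End Using
    End Sub

    Sub Main()
        ' Placeholder address and file name; replace with real values.
        SaveImage("http://example.com/images/logo.gif", "logo.gif")
    End Sub
End Module
```

For small images where no progress callbacks are needed, WebClient.DownloadFile(imageUrl, localPath) performs the same download and save in a single call.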



Thanks Herfried,
It was a bit more than I needed but provided some good
insight. I'm not up to speed on C# yet, but it was helpful.
Do you know of any similar VB.NET samples?
The graphics I am downloading are very small and no
callbacks are needed.

Better Ferret
 