Using HTTPWebRequest to make a web spider but some pages return nothing (groups.google.com for insta

  • Thread starter: David

David

I'm trying to use HttpWebRequest to make a web spider. I was able to
retrieve Yahoo's main page with it, but the data differed greatly from what
Firefox or IE retrieve: none of the script commands were in what my program
retrieved. Even when I turned IE's security up all the way so that everything
was disabled, the script commands were still in the page source. I'm not sure
why.

Groups.google.com does not seem to return anything to my spider. I tried
changing the UserAgent property a few times to things like "Mozilla/5.0
(Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309
Firefox/2.0.0.3" but the results didn't change.

Any ideas on what I need to do?

Thanks,

David.
 
I've had this problem myself. HttpWebRequest does not send the cookies that
IE has stored. If the requested page relies on a cookie value being present,
it may return different data without it. Here is some code I have written
which reads IE's cookies for a specific URI and returns them in a
CookieCollection. That collection can be added to a CookieContainer, which
is then assigned to the CookieContainer property of the HttpWebRequest. You
may also need to keep the cookies from each HttpWebResponse for use in
future requests.

' Requires "Imports System.Net" at the top of the file.
Public Shared Function GetCookie(ByVal Uri As Uri) As CookieCollection
    ' IE keeps one text file per site, named like "user@host[1].txt".
    Dim cookieFolder As String = _
        System.Environment.GetFolderPath(Environment.SpecialFolder.Cookies)
    Dim cookieFiles() As String = _
        IO.Directory.GetFiles(cookieFolder, "*.txt")

    For Each cookie As String In cookieFiles
        ' Reduce "user@host[1]" to "host".
        Dim filename As String = IO.Path.GetFileNameWithoutExtension(cookie)
        filename = filename.Substring(filename.IndexOf("@") + 1)
        If filename.Contains("[") Then
            filename = filename.Substring(0, filename.IndexOf("["))
        End If

        If Uri.Host.Contains(filename) Then
            ' Open with FileShare.ReadWrite so the read works even while
            ' IE has the file open.
            Dim cookieData As String
            Using fs As New IO.FileStream(cookie, IO.FileMode.Open, _
                    IO.FileAccess.Read, IO.FileShare.ReadWrite)
                Using reader As New IO.StreamReader(fs, System.Text.Encoding.ASCII)
                    cookieData = reader.ReadToEnd()
                End Using
            End Using

            ' Records are separated by "*". Within a record, the first
            ' line is the cookie name and the second is its value.
            Dim cookies As New CookieCollection
            For Each entry As String In Split(cookieData, "*")
                Dim record As String = entry
                If record.StartsWith(vbLf) Then record = record.Remove(0, 1)
                Dim sCookie() As String = Split(record, vbLf)
                ' Skip the empty tail and any malformed record.
                If sCookie.Length < 2 OrElse sCookie(0).Length = 0 Then Continue For
                cookies.Add(New Cookie(sCookie(0), sCookie(1)))
            Next

            ' Only the first matching file is used.
            Return cookies
        End If
    Next

    Return Nothing
End Function
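
For completeness, here is a rough sketch of how the returned collection might be wired into a request. The names SpiderSketch, BuildJar and FetchPage are made up for illustration, and the seed is left as Nothing because the class that holds GetCookie is not shown above. The cookies are added with the URI overload of CookieContainer.Add because they were created without a domain. A container assigned to a request also picks up the cookies set by the response, so reusing one container across requests should cover the "store cookies for future requests" part.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module SpiderSketch
    ' Hypothetical helper: builds a container, optionally seeded with
    ' cookies such as those returned by GetCookie.
    Function BuildJar(ByVal address As Uri, ByVal seed As CookieCollection) As CookieContainer
        Dim jar As New CookieContainer()
        ' The cookies carry no domain, so let the URI supply it.
        If seed IsNot Nothing Then jar.Add(address, seed)
        Return jar
    End Function

    ' Hypothetical helper: downloads a page, sending the cookies in the
    ' container and collecting any that the server sets in its response.
    Function FetchPage(ByVal address As Uri, ByVal jar As CookieContainer) As String
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(address), HttpWebRequest)
        request.CookieContainer = jar
        request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3"

        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        Dim address As New Uri("http://groups.google.com/")

        ' Assumption: replace Nothing with the result of GetCookie(address).
        Dim seed As CookieCollection = Nothing
        Dim jar As CookieContainer = BuildJar(address, seed)

        ' Reuse the same container for every request in the crawl.
        Dim html As String = FetchPage(address, jar)
        Console.WriteLine("Received {0} characters, {1} cookie(s) stored", html.Length, jar.Count)
    End Sub
End Module
```

Passing the same container to every FetchPage call is the main design choice here: it keeps session cookies flowing between requests without any manual copying from the response.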

Hope this helps,
Andrew