Programmatically Generate a POST to Log Into Site and Screen Scrape

  • Thread starter: Tony Pino

Tony Pino

Hi,

Let's say there's a web site with simple authentication. It
asks you to type a username/password into a couple of text
boxes, and then it gives you a cookie and you're logged in
for 20 minutes or so. What I need to do is automate that.

In other words, in my code-behind, how can I generate a
POST request (with username and password data) to a
server, get the cookie it returns, and issue a request (using
that cookie) to a secured page so I can scrape the data?

Thanks
 
Tony Pino said:
Any ideas?

Check out System.Net.(Http)WebRequest and System.Net.(Http)WebResponse.
For simple use cases, System.Net.WebClient will work as well.
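
One caveat with WebClient is that it does not carry cookies between calls by default. A minimal sketch of a common workaround follows, assuming a small subclass (the name CookieAwareWebClient is hypothetical) that attaches one shared CookieContainer to every request it creates:

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient that reuses one cookie store for all calls.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    // Exposed so callers can inspect what the server has set so far.
    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    // WebClient calls this for each request; attach the shared container here.
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
            http.CookieContainer = cookies;
        return request;
    }
}
```

With a client like this, a call to UploadValues (which sends a form POST) followed by DownloadString on the secured page should send the login cookie back automatically, provided the site uses plain cookie-based authentication.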

Cheers,
 
Thanks for the reply.

I understand (and have seen examples of) using those
classes to simply request a page (like google.com) and
store the HTML in a string object. However, I'm still a
bit confused with how to store an auth cookie so the next
request I make will be authenticated so I can access a
private page.
 
Hi Tony,

HttpWebRequest and HttpWebResponse provide containers to hold cookies on
both the sending and receiving ends, but they don't automatically persist
them between requests, so that becomes your responsibility.

Because the cookie collections are nicely abstracted in these objects, it's
fairly easy to save and restore them. The key to making this work is to have
a persistent object reference to the cookie collection and then reuse the
same cookie store each time.

To do this, let's assume you are running the request on a form (or some
other class; that object is "this" in the example below). You'd create a
member called Cookies:

CookieCollection Cookies;

On the Request end of the connection, before the request is sent to the
server, you can then check whether there's a previously saved set of cookies
and, if so, use them:

Request.CookieContainer = new CookieContainer();

if (this.Cookies != null && this.Cookies.Count > 0)
    Request.CookieContainer.Add(this.Cookies);

So, if you had previously retrieved cookies, they were stored in the
Cookies member and are now added back into the Request's CookieContainer
property. CookieContainer is a collection of cookie collections; it's
meant to be able to store cookies for multiple sites. Here I only deal with
tracking a single set of cookies for a single set of requests.
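
As an alternative to copying collections by hand, one CookieContainer can be kept alive and assigned to every request. HttpWebRequest then stores the cookies from each response in that container and sends them on later requests. The sketch below shows the whole login-then-scrape flow under that approach. The class name ScrapeSession and the form field names "username" and "password" are assumptions; the real field names have to be read from the login page's HTML.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper that keeps one cookie store for a whole session.
public class ScrapeSession
{
    private readonly CookieContainer cookies = new CookieContainer();

    // Builds an application/x-www-form-urlencoded body from name/value pairs.
    public static string BuildFormBody(params string[] namesAndValues)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < namesAndValues.Length; i += 2)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(namesAndValues[i]));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(namesAndValues[i + 1]));
        }
        return sb.ToString();
    }

    // POSTs the login form; cookies set by the server land in the container.
    public void Login(string loginUrl, string user, string password)
    {
        // Field names are assumptions; check the site's login form.
        byte[] body = Encoding.UTF8.GetBytes(
            BuildFormBody("username", user, "password", password));

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(loginUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        request.CookieContainer = cookies;

        using (Stream stream = request.GetRequestStream())
            stream.Write(body, 0, body.Length);

        // The body of the login response is not needed, so just close it.
        using (WebResponse response = request.GetResponse()) { }
    }

    // GETs a page with the same cookie store and returns its HTML.
    public string Get(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;

        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }
}
```

Typical use would be to create one ScrapeSession, call Login with the login form's URL, and then call Get for each secured page. Sites that add hidden form fields or anti-forgery tokens to the login form will need those values posted as well.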

On the receiving end, once the response headers have been retrieved after the
call to GetResponse(), you then use code like the following:

// *** Save the cookies on the persistent object
if (Response.Cookies.Count > 0)
    this.Cookies = Response.Cookies;

This saves the cookie collection until the next request, when it is
reassigned to the Request, which sends it to the server. Note that this is
a very simplistic cookie management approach that will work only if a
single cookie, or a single set of cookies, is set on a given Web site. If
cookies are set in several different places on the site, you will actually
have to retrieve the individual cookies and store them one by one in
the Cookies collection. Here's some code that demonstrates:

if (loWebResponse.Cookies.Count > 0)
{
    if (this.Cookies == null)
    {
        // First response with cookies: keep the whole collection
        this.Cookies = loWebResponse.Cookies;
    }
    else
    {
        // If we already have cookies, update the list
        foreach (Cookie oRespCookie in loWebResponse.Cookies)
        {
            bool bMatch = false;

            foreach (Cookie oReqCookie in this.Cookies)
            {
                if (oReqCookie.Name == oRespCookie.Name)
                {
                    // Same cookie name: take the new value
                    oReqCookie.Value = oRespCookie.Value;
                    bMatch = true;
                    break;
                }
            }

            // Not seen before: add it to the stored collection
            if (!bMatch)
                this.Cookies.Add(oRespCookie);
        }
    }
}

This should give you a good starting point.

Best regards,

Jacob Yang
Microsoft Online Partner Support
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
 
Great, thanks!
 