Read source code of a web page

  • Thread starter: Nicolae Fieraru

Nicolae Fieraru

Hi All,

I use VC .Net 2003 and I want to create a small program which is able to
read the source code of a web site. I have a textbox with the URL, a button
and a multiline textbox for the source code. If the web page is secure, how
can I post my login details?

Any help appreciated.

Regards,
Nicolae
 
It depends on how the server is secured. In some cases

http://username:password@host/path/filename

may work, but in most cases you'll have to use the
WebClient class and set its Credentials property.
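Both options can be sketched with Python's standard library. No C# or VB code was posted in this thread, so this is only an analogue. The helper names `split_credentials`, `basic_auth_header` and `build_basic_auth_opener` are hypothetical. urllib's `HTTPBasicAuthHandler` stands in for WebClient.Credentials, and it covers HTTP Basic authentication only (not Integrated Windows authentication).

```python
import base64
import urllib.parse
import urllib.request


def split_credentials(url):
    """Split 'http://user:pass@host/path' into (clean_url, user, password)."""
    parts = urllib.parse.urlsplit(url)
    # username/password are None when the URL has no "user:pass@" part.
    user = None if parts.username is None else urllib.parse.unquote(parts.username)
    password = None if parts.password is None else urllib.parse.unquote(parts.password)
    # Keep only what follows the last "@": the host and optional port.
    netloc = parts.netloc.rpartition("@")[2]
    clean = urllib.parse.urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return clean, user, password


def basic_auth_header(user, password):
    """Authorization header value a browser derives from user:pass (HTTP Basic)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def build_basic_auth_opener(url, user, password):
    """Opener that answers a 401 Basic challenge from url's site with the
    given credentials (rough analogue of WebClient.Credentials)."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, url, user, password)  # None = any realm
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(manager))
```

Usage would be `url, user, pw = split_credentials("http://alice:s3cret@example.com/private/page.html")`, then `build_basic_auth_opener(url, user, pw).open(url).read()`, which returns the page bytes if the server accepts Basic authentication.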

WebClient Class
http://msdn.microsoft.com/library/d...ef/html/frlrfsystemnetwebclientclasstopic.asp

WebClient.Credentials
http://msdn.microsoft.com/library/d...rfsystemnetwebclientclasscredentialstopic.asp

NetworkCredential Class
http://msdn.microsoft.com/library/d...frlrfsystemnetnetworkcredentialclasstopic.asp

NetworkCredential Constructor
http://msdn.microsoft.com/library/d...fsystemnetnetworkcredentialclassctortopic.asp


'The real crux of the "software crisis" is that software IS hard.'
Robert C. Martin,
'Designing Object-Oriented C++ Applications Using the Booch Method', p. ix
 
The advice from UAError is only appropriate if the web site is on YOUR
computer. If you are trying to read the HTML contents of a page that is
running on someone else's computer, you will need to be able to log in to
that site. I assume you are talking about a situation where you can log in,
but you want to download the HTML to an application.

Most sites do what ASP.NET does, and that is to make available a login page
that uses cookies. In this model, you log in to the server. The server
checks its database or directory for your credentials. If they are found, the server
issues a token. This is just a string of characters to you, but it means something
to the server. The server sends that token to your browser as a cookie.

All subsequent pages on the site require the cookie. Web browsers already
handle this: if a web site sends a cookie, along with instructions for how long
the cookie should remain in the browser, the browser automatically sends
the cookie back with every subsequent request to that web site.
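The browser behaviour described above can be illustrated with Python's `http.cookiejar.CookieJar`, which plays the browser's role. This is a minimal sketch, not .NET code. `FakeResponse`, `remember_cookies` and `cookie_header_for` are hypothetical names, and the fake response only exists so the example runs without a network connection: a cookie stored from one response is sent back to the same site and withheld from other sites.

```python
import email.message
import urllib.request
from http.cookiejar import CookieJar


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; CookieJar only calls .info()."""

    def __init__(self, set_cookie_values):
        self._headers = email.message.Message()
        for value in set_cookie_values:
            # Message.__setitem__ appends, so repeated Set-Cookie headers are kept.
            self._headers["Set-Cookie"] = value

    def info(self):
        return self._headers


def remember_cookies(jar, url, set_cookie_values):
    """Store the cookies the server sent in reply to a request for url."""
    jar.extract_cookies(FakeResponse(set_cookie_values), urllib.request.Request(url))


def cookie_header_for(jar, url):
    """Return the Cookie header a browser would send with a request to url, or None."""
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)
    return request.get_header("Cookie")
```

After `remember_cookies(jar, "http://example.com/login", ["session=abc123; Path=/"])`, a request to any page under `http://example.com/` carries `session=abc123`, while a request to another host carries no Cookie header.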

So your app has to pretend to be a browser. You have to use the GET and
POST methods of the HttpWebRequest class to ask for the login page and to
provide the credentials. You will get back a collection of cookies, and you
need to attach that cookie collection to all subsequent web requests against
that site.
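The GET/POST/GET sequence above can be sketched in Python, where an opener with `HTTPCookieProcessor` plays the part of HttpWebRequest plus a cookie container. This is an analogue under assumptions, not the .NET API. The helper names, the login URL and the form field names (`user`, `password`) are hypothetical, since every site names its login fields differently. Many ASP.NET login forms also carry hidden fields such as `__VIEWSTATE` that must be posted back, which this sketch does not handle.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Return (opener, jar). The opener files every Set-Cookie into jar and
    sends the stored cookies back on later requests, as a browser would."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, fields):
    """Build the POST that submits a login form; `fields` maps the form's
    input names (site-specific, hypothetical here) to their values."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    request = urllib.request.Request(login_url, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


def fetch_html_after_login(login_url, fields, page_url, encoding="utf-8"):
    """GET the login page, POST the credentials, then GET page_url with the
    session cookie(s) attached. Returns the page's HTML as text."""
    opener, _jar = make_session()
    # 1. Ask for the login page; some sites already set a cookie here.
    with opener.open(login_url) as response:
        response.read()
    # 2. Post the credentials; the server replies with its token as a cookie.
    with opener.open(build_login_request(login_url, fields)) as response:
        response.read()
    # 3. Request the protected page; HTTPCookieProcessor adds the Cookie header.
    with opener.open(page_url) as response:
        return response.read().decode(encoding, errors="replace")
```

Reusing one opener for all three requests is what keeps the cookies flowing; creating a fresh opener per request would lose the session token.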

So, look at the HttpWebRequest, HttpWebResponse and CookieCollection classes
(and the HttpWebRequest.CookieContainer property) for further information on
how to do what you want.

Note: you said you wanted to read the "source code" of the site. You will
be able to get the HTML the server sends back, not the server-side source code.
If you want the C# or ASP.NET code for a site, you will need direct access to
the file system it is on. That will normally require the cooperation of the
system administrator, and if that is what you need, my advice above does not
apply. (I'm guessing here.)

I hope this helps,

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
 
> The advice from UAError is only appropriate if the web site is on YOUR
> computer.

Surely you mean a computer within your domain, not your own
local computer? If I'm wrong, I'm wrong, but the latter
would be so restrictive as to be useless. The sample

How To Use WebClient Class To Make HTTP Requests
http://support.microsoft.com/default.aspx?scid=kb;en-us;328820

states that WebClient supports
- Basic authentication
- Integrated Windows authentication

> All subsequent pages on the site require the cookie. Web browsers already
> do this. If a cookie comes from a web site, with instructions for how long
> the cookie should remain in the browser, the browser will automatically pass
> the cookie back to every subsequent request from that web site.

I was under the (now obviously mistaken) impression that
WebClient was like a stripped-down browser. I guess I should
have looked for a Cookies or CookieContainer property and
drawn my own conclusions when it wasn't there.

> You have to use the GET and POST methods of the HttpWebRequest
> class to ask for the login page and to provide the credentials.

You had me confused here until I found the
HttpWebRequest.Method property, which was impossible to find
under "Public Methods" :)


Thanks for straightening this out.
 
Sorry for not being clear, and I certainly didn't mean to offend.

Thanks for taking my remarks in a good spirit.

--
--- Nick Malik [Microsoft]
 
> The advice from UAError is only appropriate if the web site is on YOUR
> computer. If you are trying to read the html contents of a page that is
> running on someone else's computer, you will need to be able to log in to
> that site. I assume you are talking about a situation where you can log in,
> but you want to download the HTML to an application.

That is not true, Nick. If you use Internet Explorer as your browser, then you
see an "Edit" icon which lets you change the content of the page you are on,
even with Notepad. (The only catch is that you can only save it on your own
computer.)

Cor
 