Remote Webpage data extraction

  • Thread starter: Guest

Hi,

I have a question about .NET. Here is the scenario: I have an ASP.NET page
written in C# that contains just a textbox and a button. Whatever URL I
enter into that textbox, my code should read that web page and show its
content within my own page. I don't want to redirect the user to that page.
Instead, when the user hits Submit, my C# code on the back end should read
the remote web page, extract the data, and insert it into my own page. It is
fine if I don't get the images, but I need all the text data.

Does anybody have an idea how to do this? I can't use any third-party
components.

Thanks in advance

Deepson
 
Deepson,

The trouble with what you are doing is that HTML is a description language,
which makes it possible to pull text from many resources. So what you see
on a page does not have to be in that page itself. It can come from a URL
that uses a URL that uses yet another URL.

There can also be a lot of text in JavaScript, or inside Macromedia
plug-ins, Java plug-ins, or other plug-ins.

In my opinion, that makes what you want very difficult to do.

A client-side script that creates an iframe in your page is probably a more
suitable way to go.

However, this is just my thought,

Cor
 
You can fetch web pages using either System.Net.WebClient or
System.Net.WebRequest, but this will only work if the page doesn't use
client-side scripting to render itself fully or partially.
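
As a rough illustration of the WebClient route, here is a minimal sketch. The class name RemotePageReader and its two methods are hypothetical names made up for this example, and the regex-based tag stripping is only a crude stand-in for a real HTML parser, so it may mangle unusual markup.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original thread.
public static class RemotePageReader
{
    // Downloads the raw HTML of the page at the given URL.
    // Only absolute http/https URLs are accepted.
    public static string FetchHtml(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri) ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        {
            throw new ArgumentException("Only absolute http/https URLs are accepted.", "url");
        }

        using (var client = new WebClient())
        {
            return client.DownloadString(uri);
        }
    }

    // Crude text extraction (an assumption of this sketch, not a full parser):
    // drops <script> and <style> blocks, strips the remaining tags,
    // decodes HTML entities, and collapses runs of whitespace.
    public static string ExtractText(string html)
    {
        string noScripts = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        string noTags = Regex.Replace(noScripts, @"<[^>]+>", " ");
        string decoded = WebUtility.HtmlDecode(noTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

In the button's click handler you could then write something like ResultLiteral.Text = HttpUtility.HtmlEncode(RemotePageReader.ExtractText(RemotePageReader.FetchHtml(UrlTextBox.Text))), where ResultLiteral and UrlTextBox are assumed control names on your page. Encoding the text before output keeps the remote content from being interpreted as markup in your own page.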

Cheers,
 