Download HTML As Plain Text

Doominato · May 18, 2004

good day,

I was just wondering how can I download a web page as plain text from a
certain web site. I have tried to use the OpenURL() method from INET control
in my VB.NET app, but it returns elements such as this within the plain
text. Is there a way to filter them or to simply download the page as plain
text?

any help would be greatly appreciated.

Chad Z. Hower aka Kudzu · May 18, 2004

Doominato said:
I was just wondering how can I download a web page as plain text from a
certain web site. I have tried to use the OpenURL() method from INET
control in my VB.NET app, but it returns elements such as this 
within the plain text. Is there a way to filter them or to simply
download the page as plain text?

No. Web pages are not plain text; they are HTML. If you download one, it
will always come in the format it is in, which is HTML.

To have it as plain text you will need to convert it.

--
Chad Z. Hower (a.k.a. Kudzu) - http://www.hower.org/Kudzu/
"Programming is an art form that fights back"

Make your ASP.NET applications run faster
http://www.atozed.com/IntraWeb/
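
The download-then-convert approach described above might be sketched as follows. This is only a minimal illustration, not the method anyone in this thread used: the module and function names are made up for the example, WebClient.DownloadString assumes .NET 2.0 or later, WebUtility.HtmlDecode assumes .NET 4.0 or later, and the regular expression is a naive tag matcher that will not cope with comments, scripts, or a ">" inside an attribute value.

```vb
Imports System.Net
Imports System.Text.RegularExpressions

Module HtmlToText
    ' Downloads the raw HTML of a page as a string.
    ' (Assumes .NET 2.0+ for WebClient.DownloadString.)
    Function DownloadHtml(ByVal url As String) As String
        Using client As New WebClient()
            Return client.DownloadString(url)
        End Using
    End Function

    ' Naive conversion: remove anything that looks like a tag,
    ' then turn entities such as &amp; back into characters.
    Function HtmlToPlainText(ByVal html As String) As String
        Dim withoutTags As String = Regex.Replace(html, "<[^>]*>", String.Empty)
        Return WebUtility.HtmlDecode(withoutTags)
    End Function
End Module
```

Calling HtmlToPlainText(DownloadHtml(url)) would then give the text of the page with the markup removed. For anything beyond very simple pages, a real HTML parser is likely the safer choice.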

Doominato · May 18, 2004

thanks for reply,

I realize that, but I should have said that the page is in HTML format yet
contains only plain text (by the way, this is the type of page I'm talking
about:
http://www.wunderground.com/history/station/71624/2004/5/18/DailyHistory.html?format=1).
If you look at the page and its source, you will see that they look pretty
much the same, except that the source contains HTML tags. So the question
is: how do I remove these tags and convert the page to plain text?

thanks

Raterus · May 18, 2004

That seems really stupid of Weather Underground not to actually provide a comma-delimited file, but that junk instead. I'd be finding out whether I could find someone in their computer department to create real CSV files (I don't know whose idea it was to do it like that).

Meanwhile, if you can get the entire document into a string, you can use Replace(wholeDoc, " ", vbCrLf) and then output that to a real CSV file.

Also, you've probably noticed that there are no line breaks separating the actual data, which makes replacing those with CRLF even more critical!

Good Luck!
--Michael
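
A hedged sketch of the replace-then-save idea above. The tag being replaced did not survive in the post as archived, so this example ASSUMES it is a line-break tag (<br>, <br/> or <br />); the module and function names are invented for illustration.

```vb
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module BreakTagsToLines
    ' ASSUMPTION: the tags separating the records are <br> variants.
    ' Replaces <br>, <br/> and <br /> (in any letter case) with a CRLF,
    ' so that each record ends up on its own line.
    Function BreaksToNewLines(ByVal wholeDoc As String) As String
        Return Regex.Replace(wholeDoc, "<br\s*/?>", vbCrLf, RegexOptions.IgnoreCase)
    End Function
End Module
```

The result could then be written out with System.IO.File.WriteAllText(path, BreaksToNewLines(wholeDoc)), where path is whatever file name you choose. A regular expression is used here instead of the plain Replace function so that the spelling variants of the tag are all caught in one call.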

Doominato · May 18, 2004

Hello,

I got the upper hand on this and was able to clear out all the tags, so now I
have a clean CSV file.

Thank you so much for your help.

Cor Ligthert · May 19, 2004

Hi Doominato

In addition to the others

In an HTML page, every element has the properties InnerText and OuterText.

The InnerText is what lies between the tags; the OuterText includes the tags.

HTML.OuterText is almost always the complete document, including all tags and
everything else. Strangely enough, though, it lacks the declaration line that
now more and more often precedes an HTML page, which as far as I know is
unreachable through the Document Object Model.

I hope this helps?

Cor

Herfried K. Wagner [MVP] · May 19, 2004

Doominato said:
I was just wondering how can I download a web page as plain text from a
certain web site. I have tried to use the OpenURL() method from INET control
in my VB.NET app, but it returns elements such as this within the plain
text. Is there a way to filter them or to simply download the page as plain
text?

Nice algorithm, implemented in VB6:

<URL:http://groups.google.com/groups?selm=ebXm3efoCHA.1976@TK2MSFTNGP10>

Cor Ligthert · May 19, 2004

Hi Herfried,

Have a look at MSHTML sometime; this is very amateurish in my opinion.

http://msdn.microsoft.com/library/d...html/cerefInternetExplorerMSHTMLDHTMLAPIs.asp

Cor

Herfried K. Wagner [MVP] · May 19, 2004

Cor Ligthert said:
Have a look at MSHTML sometime; this is very amateurish in my opinion.

I know that it's possible with MSHTML, but Olaf's algorithm is in VB6
/very/ fast and often it's good enough. I am not sure if it will work
with the "shorttag" option and stuff like that enabled.
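
For readers who cannot reach the linked post, here is a minimal sketch of a fast, single-pass tag stripper in VB.NET. It is NOT Olaf's VB6 algorithm, only an illustration of the general idea of scanning the text once without a parser; the module and function names are hypothetical. Like most simple approaches, it does not handle SGML short tags, comments, script blocks, or a ">" inside an attribute value.

```vb
Imports System.Text

Module TagScanner
    ' Copies every character that lies outside a <...> pair.
    ' One pass over the input, no regular expressions.
    Function StripTags(ByVal html As String) As String
        Dim result As New StringBuilder(html.Length)
        Dim insideTag As Boolean = False
        For Each c As Char In html
            If c = "<"c Then
                insideTag = True
            ElseIf c = ">"c Then
                If insideTag Then
                    insideTag = False
                Else
                    ' A stray ">" outside any tag is kept as text.
                    result.Append(c)
                End If
            ElseIf Not insideTag Then
                result.Append(c)
            End If
        Next
        Return result.ToString()
    End Function
End Module
```

Whether this is "good enough" depends on the pages involved; for well-formed but arbitrary HTML, MSHTML or another real parser remains the more robust route.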
