HTML Emails, General Handling Of HTML strings

Tappy Tibbons · Oct 30, 2003

We are writing a dotnet app that among other things retrieves messages from
a standard POP3 mailbox, and then performs certain actions with that email
automatically.

Some of the emails are coming in as HTML, which makes the email itself 8
times as large when it comes time to store it, as well as makes it
impossible to display to the user using any sort of standard dotnet control.

Is there a way to STRIP OUT all the html tags and get plain text from it? I
know it is possible, and somewhat easy, as IE can save html as plain text,
and Outlook can easily save the Emails themselves as plain text. All I have
found so far to get this done is some kludgey search and replace routines.

Additionally, does anyone know of a textbox control that can actually
display simple html? I don't think a rich text can, or I can't seem to get
it to.

Thanks....

steve · Oct 30, 2003

there are browser tools built-in to display html...but anyway...

this strips all tags, so you may want to clean it up first by replacing <br>,
<p>, </div>, etc. with vbCrLf to keep some kind of formatting...else this
gives you the string as is and doesn't remove the source vbCrLf's that may or
may not affect how the text would appear in the browser.

having said that...hth,

steve

Imports System.Text.RegularExpressions
Imports System.Web ' HttpUtility; needs a reference to System.Web.dll

Public Function stripTags(ByVal html As String) As String
    If html Is Nothing Then Return Nothing
    ' match anything between "<" and the next ">"
    Dim regEx As New Regex("<[^>]*>")
    ' replace each tag with a space so adjacent words don't run together;
    ' input without any tags passes through unchanged
    html = regEx.Replace(html, " ")
    ' convert entities like &gt; &trade; etc. back to characters
    html = HttpUtility.HtmlDecode(html)
    Return html
End Function
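
As a rough sketch of the clean-up steve describes above, block-level tags can be turned into vbCrLf before the remaining tags are stripped. The names HtmlText and StripTagsKeepBreaks, and the choice of which tags count as line breaks, are assumptions for illustration and not part of the original post. The project is assumed to reference System.Web.dll for HttpUtility.

```vbnet
Imports System.Text.RegularExpressions
Imports System.Web

Public Module HtmlText
    ' Hypothetical helper: keeps rough line structure while removing tags.
    Public Function StripTagsKeepBreaks(ByVal html As String) As String
        If html Is Nothing OrElse html.Length = 0 Then Return String.Empty

        ' <br>, <br />, <p ...>, </p> and </div> each become a line break.
        Dim breakPattern As String = "<\s*(br\s*/?|p(\s[^>]*)?|/p|/div)\s*>"
        Dim text As String = Regex.Replace(html, breakPattern, vbCrLf, _
                                           RegexOptions.IgnoreCase)

        ' Every remaining tag is replaced by a single space.
        text = Regex.Replace(text, "<[^>]*>", " ")

        ' Entities such as &gt; and &trade; are converted back to characters.
        Return HttpUtility.HtmlDecode(text).Trim()
    End Function
End Module
```

Called as HtmlText.StripTagsKeepBreaks("<p>Hello<br>world</p>"), this should leave "Hello" and "world" on separate lines. Like any regex approach it is only an approximation: comments, script blocks and a ">" inside an attribute value are not handled.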

Herfried K. Wagner [MVP] · Oct 30, 2003

* "Tappy Tibbons" said:
Additionally, does anyone know of a textbox control that can actually
display simple html? I don't think a rich text can, or I can't seem to get
it to.

You can use the WebBrowser control to display HTML:

311303 WebOCHostVB.exe Hosts the WebBrowser Control in Visual Basic .NET
<http://support.microsoft.com/?id=311303>

--
Herfried K. Wagner
MVP · VB Classic, VB.NET
<http://www.mvps.org/dotnet>

Improve your quoting style:
<http://learn.to/quote>
<http://www.plig.net/nnq/nquote.html>

Cor · Oct 30, 2003

"Tappy Tibbons" said:

Is there a way to STRIP OUT all the html tags and get plain text from it? I
know it is possible, and somewhat easy, as IE can save html as plain text,
and Outlook can easily save the Emails themselves as plain text. All I have
found so far to get this done is some kludgey search and replace routines.

To strip out tags I would use mshtml. You have to set a reference to it, but
when you use it, don't add an Imports statement for it: there are so many
objects in it that your IDE becomes terribly slow. The best way to use it is
Dim iDocument As mshtml.IHTMLDocument2
You can then loop through each tag using the document object model.

To see the HTML tags themselves you can of course use the normal textbox,
which works the same way as the old and still used HTML editor, Notepad.

To see it rendered in a browser, look at the WebBrowser control that Herfried
already mentioned.

I hope this helps a little bit.

Cor
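
A minimal sketch of the mshtml approach Cor describes, assuming a project reference to the Microsoft HTML Object Library and no Imports statement for it. The module and function names (MshtmlText, HtmlToText) are hypothetical. Loading the markup through write/close on an in-memory document is one common way to do this and is not necessarily what Cor had in mind.

```vbnet
Public Module MshtmlText
    ' Hypothetical helper: parses the HTML with mshtml and returns its text.
    Public Function HtmlToText(ByVal html As String) As String
        ' Create an empty in-memory document; no browser window is shown.
        Dim iDocument As mshtml.IHTMLDocument2 = _
            DirectCast(New mshtml.HTMLDocument(), mshtml.IHTMLDocument2)
        iDocument.write(html)
        iDocument.close()

        ' Walk the document object model, one element per tag.
        For Each element As mshtml.IHTMLElement In iDocument.all
            Console.WriteLine(element.tagName)
        Next

        ' innerText is the body content with the markup removed.
        If iDocument.body Is Nothing Then Return String.Empty
        Return iDocument.body.innerText
    End Function
End Module
```

The loop only prints tag names to show the traversal. In a real mail handler it would be the place to inspect or skip individual elements.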

Fergus Cooney · Oct 31, 2003

Hi Tappy,

I wanted to answer this hours ago. Thanks to Cor, I can do so now.

If you add the Microsoft Web Browser control (ShDocVw.) to your ToolBox
and drag it onto your Form, you can use the Navigate2 (sUrl) method to load a
page. Once the page is loaded (i.e. not in the very next statement) you can
access the Document property. This gives you practically the full range of
things you would have if you were writing JavaScript or VBScript within an
Html page.

So to get a page you could have.
axWebBrowser1.Navigate2 ("http://www.google.co.uk")

The Document will be available when the AxWebBrowser1.DocumentComplete
event occurs.

In the handler for this event you can do
Dim oHtmlDoc = axWebBrowser1.Document
Dim oBody = oHtmlDoc.Body

Then this will give you the text with no Html.
TextBox1.Text = oBody.InnerText

To display your own Html in the browser you just need to stuff it into the
Body.

oBody.InnerHtml = "<div style='background-color:yellow'>" _
& "<h1>Hi Tappy!!</h1></div>"

(And it needn't be 'simple' Html, after all, this is the engine of
Internet Explorer in your app!!)

The bits I'm not sure of are the types for oHtmlDoc and oBody. That's why
they're missing from the declarations (and Option Strict is definitely Off).

Regards,
Fergus
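
For the types Fergus was unsure of, one plausible set of declarations is sketched below. It assumes the form from the post (an AxWebBrowser named AxWebBrowser1 and a TextBox named TextBox1, both declared in the designer part of a form called Form1) plus a project reference to mshtml. The choice of mshtml.IHTMLDocument2 and mshtml.IHTMLElement is a best guess based on the IHtmlDocument2 documentation mentioned in this thread, not something the post confirms.

```vbnet
Partial Public Class Form1
    ' Runs when the page requested with Navigate2 has finished loading.
    Private Sub AxWebBrowser1_DocumentComplete( _
            ByVal sender As Object, _
            ByVal e As AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent) _
            Handles AxWebBrowser1.DocumentComplete

        ' Document is exposed as Object, so cast it to the mshtml interface.
        Dim oHtmlDoc As mshtml.IHTMLDocument2 = _
            DirectCast(AxWebBrowser1.Document, mshtml.IHTMLDocument2)
        Dim oBody As mshtml.IHTMLElement = oHtmlDoc.body
        If oBody Is Nothing Then Return

        ' Plain text of the page, with no Html.
        TextBox1.Text = oBody.innerText
    End Sub
End Class
```

With typed declarations like these the handler should also compile with Option Strict On.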

Fergus Cooney · Oct 31, 2003

Hi again Tappy,

I forgot to add the documentation:
Start here and work your way up and down.

Programming and Reusing the Browser
WebBrowser Control
Reference for Visual Basic Developers

http://tinyurl.com/t3vf

You'll probably get most use out of the link to the Document property on
the page given, and then that page's IHtmlDocument2 link.

Have fun. ;-)

Regards,
Fergus

The Tiny Url is the equivalent of:

http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/webbrowser
/webbrowser.asp

Herfried K. Wagner [MVP] · Oct 31, 2003

* "Fergus Cooney" said:
The Tiny Url is the equivalent of:

http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/webbrowser
/webbrowser.asp

You may want to install the patches for OE in order to prevent it from
wrapping urls. I remember the "mondo" patch fixed this issue.

Fergus Cooney · Oct 31, 2003

HERFRIED

GET OFF MY BACK.

I WANT NOTHING FROM YOU

NOTHING

NOTHING AT ALL

ABSOLUTELY NOTHING

CAN YOU COMPLY?
