Stripping HTML

  • Thread starter: David Sawyer

David Sawyer

I am trying to read in an HTML file and strip out the HTML
code so that all I have left is the text of the body.

Does anyone have any suggestions for doing this?
Any HTML stripping routines or objects that perform the
function?
 
Public Function StripHTML(ByVal HTML As String) As String
    Dim strContent As String
    Dim mStartPos As Integer, mEndPos As Integer

    ' Turn paragraph ends into line breaks before the tags are removed.
    strContent = HTML.Replace("</P>", vbCrLf)
    strContent = strContent.Replace("</p>", vbCrLf)

    ' Remove each tag in turn. The search for ">" starts at the "<", so a
    ' stray ">" earlier in the text cannot end the loop early.
    mStartPos = InStr(strContent, "<")
    Do While mStartPos <> 0
        mEndPos = InStr(mStartPos, strContent, ">")
        If mEndPos = 0 Then Exit Do
        strContent = strContent.Remove(mStartPos - 1, mEndPos - mStartPos + 1)
        mStartPos = InStr(mStartPos, strContent, "<")
    Loop

    ' Decode the common named entities. Numeric entities (&#...;) are left as-is.
    strContent = strContent.Replace("&nbsp;", " ")
    strContent = strContent.Replace("&quot;", """")
    strContent = strContent.Replace("&lt;", "<")
    strContent = strContent.Replace("&gt;", ">")
    ' Decode &amp; last so that "&amp;lt;" yields "&lt;" rather than "<".
    strContent = strContent.Replace("&amp;", "&")
    ' %20 is URL encoding rather than an HTML entity, but it turns up in scraped text.
    strContent = strContent.Replace("%20", " ")

    ' Trim surrounding spaces, then any leading line breaks.
    strContent = Trim(strContent)
    strContent = strContent.TrimStart(ControlChars.Cr, ControlChars.Lf)

    ' Line breaks are returned as <br> for display in a web page;
    ' return strContent unchanged if plain text is wanted.
    Return strContent.Replace(vbCrLf, "<br>")
End Function
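
For comparison, here is a shorter sketch of the same idea using regular expressions. It assumes .NET Framework 4.0 or later, since it relies on System.Net.WebUtility.HtmlDecode to decode named and numeric entities. StripHtmlRegex and HtmlStripSketch are names I made up for the example, not part of the routine above. Unlike the routine above, it also drops the contents of script and style blocks and returns plain text with real line breaks.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module HtmlStripSketch
    ' Hypothetical helper: a regex-based variant, not a full HTML parser.
    Public Function StripHtmlRegex(ByVal html As String) As String
        ' Turn paragraph ends and <br> tags into line breaks before tags are removed.
        Dim text As String = Regex.Replace(html, "</p\s*>|<br\s*/?>", Environment.NewLine, RegexOptions.IgnoreCase)
        ' Drop script and style blocks together with their contents.
        text = Regex.Replace(text, "<(script|style)\b[^>]*>.*?</\1\s*>", "", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        ' Remove every remaining tag.
        text = Regex.Replace(text, "<[^>]+>", "")
        ' Decode named and numeric entities in one pass.
        text = WebUtility.HtmlDecode(text)
        Return text.Trim()
    End Function

    Sub Main()
        ' Prints "Fish & chips" on one line and "<done>" on the next.
        Console.WriteLine(StripHtmlRegex("<p>Fish &amp; <b>chips</b></p><p>&lt;done&gt;</p>"))
    End Sub
End Module
```

Regular expressions will still trip over unusual markup, for example a ">" inside an attribute value or a comment, so for arbitrary pages a real HTML parser is probably the safer choice.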
 