Owen Wong
Please look at my newly written class. It is meant to filter
suspicious HTML input from an online HTML editor. I need help with two
things:
1. Does it need to filter more things? I think it certainly does,
but I don't know where to improve it.
2. You can see that I try to filter every link. If the target address
does not start with "http://" or "mailto:", it is replaced with an empty
string. I think the code I wrote could be rewritten to perform
better, but how?
=========================
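Imports System.Text.RegularExpressions

Public Class strOp
    Public Function filterHtml(ByVal s As String) As String
        ' Strip script tags, iframes and server-side includes.
        s = Regex.Replace(s, "<script>|</script>|<iframe.*?>|<!--#include.*?>", "", RegexOptions.IgnoreCase)
        ' Drop any tag that carries an inline event handler.
        s = Regex.Replace(s, "<.*? (?:onload|onclick|ondblclick)[ ]?=[ ]?.*?>", "", RegexOptions.IgnoreCase)
        ' Group 1 captures the href value of each anchor tag.
        Dim re As New Regex("<a .*?href\s*=\s*[""]?([^""]*)[""]?.*?>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim m As Match
        Dim s1, s2 As String
        Dim ms As MatchCollection
        ms = re.Matches(s)
        For Each m In ms
            ' Keep the tag's original casing so that s.Replace can find it again.
            s1 = m.Value
            ' Lower-case only the extracted target for the scheme check.
            s2 = re.Replace(s1, "$1").ToLower()
            If Not (s2.StartsWith("mailto:") OrElse s2.StartsWith("http://")) Then
                s = s.Replace(s1, "<a href=''>")
            End If
        Next
        Return s
    End Function
End Class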
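
For the second point, one direction I am considering is to let the regex do the replacing in a single pass through a MatchEvaluator, instead of collecting the matches and calling String.Replace once per match. This is only a rough sketch under my own assumptions: the names LinkFilter, CheckLink and FilterLinks are made up for illustration, the anchor pattern is the same one as above, and I have not tried it against real editor output.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module LinkFilter
    ' Same anchor pattern as in filterHtml; group 1 captures the href value.
    Private ReadOnly AnchorRe As New Regex("<a .*?href\s*=\s*[""]?([^""]*)[""]?.*?>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Called once per matched anchor tag; returns the text that replaces it.
    Private Function CheckLink(ByVal m As Match) As String
        Dim target As String = m.Groups(1).Value
        If target.StartsWith("http://", StringComparison.OrdinalIgnoreCase) OrElse target.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase) Then
            ' Allowed scheme: keep the tag exactly as it was written.
            Return m.Value
        End If
        ' Anything else: blank out the target.
        Return "<a href=''>"
    End Function

    ' One pass over the input; no MatchCollection and no String.Replace per match.
    Public Function FilterLinks(ByVal s As String) As String
        Return AnchorRe.Replace(s, AddressOf CheckLink)
    End Function
End Module
```

If I read it correctly, FilterLinks should leave an "http://" or "mailto:" link untouched and turn something like a "javascript:" link into <a href=''>, without lower-casing the rest of the tag. Whether it is actually faster on large inputs I have not measured.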