With negative lookahead in .NET regular expressions, you can write this in a
much simpler form:
<(?!br|/br|p|/p).+?>
That will match everything inside of <> except for br, /br, p, or /p, and
you can use that to replace all those tags with an empty string. This is
also more robust, as you don't have to make sure you hit all the tags. I
noticed that <script> is absent from the list below, which could
lead to a security exploit (somebody enters script code, and when
it gets echoed back, it executes on a user's computer).
You will want to use a case-insensitive match (RegexOptions.IgnoreCase);
otherwise the uppercase versions, such as <BR> and <P>, won't be exempted
and will be stripped along with everything else.
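
Here is a minimal sketch of the replacement described above. It is written in Python rather than C#, on the assumption that the negative-lookahead syntax behaves the same way in both engines; in .NET the equivalent call would be Regex.Replace with RegexOptions.IgnoreCase. The names strip_tags, strip_tags_strict, and STRICT_PATTERN are made up for illustration, and the stricter pattern is my own variation, not part of the suggestion above.

```python
import re

# The lookahead pattern from the post, compiled case-insensitively so that
# <BR> and <P> are exempted the same way as <br> and <p>.
TAG_PATTERN = re.compile(r"<(?!br|/br|p|/p).+?>", re.IGNORECASE)

# Hypothetical tightening (an assumption, not from the post): the \b keeps
# "p" from also exempting tags that merely start with p, such as <pre>.
STRICT_PATTERN = re.compile(r"<(?!(?:br|/br|p|/p)\b).+?>", re.IGNORECASE)


def strip_tags(html):
    """Remove every tag that the lookahead does not exempt."""
    return TAG_PATTERN.sub("", html)


def strip_tags_strict(html):
    """Same idea, but exempt only the exact tag names br and p."""
    return STRICT_PATTERN.sub("", html)


if __name__ == "__main__":
    sample = "<P>Hello <b>world</b><BR><script>alert(1)</script></P>"
    print(strip_tags(sample))
    print(strip_tags_strict("<pre>x</pre><p>y</p>"))
```

Note that the alternatives in the lookahead match as prefixes, so the original pattern also leaves tags such as <pre> alone. Only the tags are removed, not the text between them, so the body of a <script> element is left behind as plain text.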
--
Eric Gunnerson
Visit the C# product team at
http://www.csharp.net
Eric's blog is at
http://weblogs.asp.net/ericgu/
This posting is provided "AS IS" with no warranties, and confers no rights.
Tian Min Huang said:
Hello Jim,
Thanks for your post. I wrote the following pattern, which will remove all
HTML tags except for <p>, </p>, <br> and </br>:
|</b[^r]+>|</br[a-z]+>
Please check it on your side and let me know your result.
Have a nice day!
Regards,
HuangTM
Microsoft Online Partner Support
MCSE/MCSD
Get Secure! --
www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.