Regular expression to remove all html tags except for p and br

Thread starter: Guest

Hi all

Can someone help me out with a regex to remove all HTML tags except for <p>, </p>, <br>, and <br/> from a string?

Thanks

Jim
 
Hi Jim,

Thanks for posting in the community.

I am currently looking for someone who can help you with this. We will reply
here with more information as soon as possible.
If you have any further questions, please feel free to post them here.


Thanks!

Best regards,

Gary Chang
Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.
--------------------
 
Hello Jim,

Thanks for your post. I wrote the following pattern which will remove all
html tags except for <p>, </p>, <br> and </br>:

<[^/bp][^>]*>|<p[a-z][^>]*>|<b[^r][^>]*>|<br[a-z][^>]*>|</[^bp]+>|</p[a-z]+>|</b[^r]+>|</br[a-z]+>

Please check it on your side and let me know your result.
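
One way to run that check is a short script. The sketch below uses Python's re module rather than .NET purely for illustration (the pattern uses no engine-specific syntax). The helper name strip_tags and both sample strings are made up for this example. The second sample is worth trying because negated classes such as [^bp]+ can also consume ">" and so may run past the end of a closing tag.

```python
import re

# The pattern from the post, joined onto one logical line.
ENUMERATED = re.compile(
    r"<[^/bp][^>]*>|<p[a-z][^>]*>|<b[^r][^>]*>|<br[a-z][^>]*>"
    r"|</[^bp]+>|</p[a-z]+>|</b[^r]+>|</br[a-z]+>"
)


def strip_tags(html: str) -> str:
    """Replace every match of the posted pattern with an empty string."""
    return ENUMERATED.sub("", html)


# Hypothetical inputs: one simple case, one with text between closing tags.
simple = strip_tags("<div>Hello</div><p>World</p><br>")
greedy = strip_tags("<i>a</i> and <i>c</i>")

print(repr(simple))
print(repr(greedy))
```

If the second result drops more than the tags, the text between two closing tags was swallowed by one of the greedy negated classes, which is something to watch for when testing against real input.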

Have a nice day!

Regards,

HuangTM
Microsoft Online Partner Support
MCSE/MCSD

 
With negative lookahead in .NET regular expressions, you can write this in a
much simpler form:

<(?!br|/br|p|/p).+?>

That will match every tag except those starting with br, /br, p, or /p, so
you can replace all the matched tags with an empty string. This is also more
robust, as you don't have to enumerate every tag you want to hit. I noticed
that <script> is absent from the list below, which could lead to a security
exploit (somebody enters script code, and when it gets echoed back, it
executes on a user's computer).
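
As a rough illustration of the lookahead approach, here is a minimal sketch in Python, whose re module accepts the same (?!...) syntax as .NET for this pattern. The sample string and the names AS_POSTED and TIGHTER are assumptions for the example. AS_POSTED is the pattern from this post with the lookahead group closed; TIGHTER is a hypothetical variant, not something from this thread, that adds a word boundary so the p alternative does not also shield tags such as <pre>.

```python
import re

# Pattern from the post, with the lookahead group's parenthesis closed.
AS_POSTED = re.compile(r"<(?!br|/br|p|/p).+?>")

# Hypothetical tightening: \b requires the tag name to end right after
# "p" or "br", so <pre> or <param> are no longer treated as allowed.
TIGHTER = re.compile(r"<(?!/?(?:p|br)\b)[^>]+>")

# Made-up sample input, all lowercase.
sample = "<script>x()</script><p>Hi<br/>there</p><pre>y</pre>"

as_posted_result = AS_POSTED.sub("", sample)
tighter_result = TIGHTER.sub("", sample)

print(as_posted_result)
print(tighter_result)
```

Note that both patterns remove only the tags themselves; the text between <script> and </script> stays in the string.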

You will want to use a case-insensitive match; otherwise the uppercase
versions of the allowed tags (<P>, <BR>) will be stripped as well.
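
The effect of the case-insensitive option can be seen in a small sketch. It is written in Python, where the flag is re.IGNORECASE; in .NET the corresponding option is RegexOptions.IgnoreCase. The sample string and variable names are assumptions for the example, and PATTERN is the lookahead pattern with its group closed.

```python
import re

# Lookahead pattern with the group's closing parenthesis in place.
PATTERN = r"<(?!br|/br|p|/p).+?>"

# Made-up input that uses uppercase tag names.
sample = "<P>Hi<BR>there</P><B>!</B>"

# Without the flag, "P" and "BR" do not match the lowercase lookahead,
# so the allowed tags are removed along with everything else.
case_sensitive = re.sub(PATTERN, "", sample)

# With the flag, the lookahead recognises the uppercase allowed tags.
case_insensitive = re.sub(PATTERN, "", sample, flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```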

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://weblogs.asp.net/ericgu/

Tian Min Huang said:
Hello Jim,

Thanks for your post. I wrote the following pattern which will remove all
html tags except for <p>, </p>, <br> and </br>:

<[^/bp][^>]*>|<p[a-z][^>]*>|<b[^r][^>]*>|<br[a-z][^>]*>|</[^bp]+>|</p[a-z]+>|</b[^r]+>|</br[a-z]+>

 