Regular Expression Help request

  • Thread starter: Burak

Burak

Hi,

I am trying to parse attribute values out of HTML tags.

For tags with quotes like

<input type="submit" value="order bed">

I am using

\s*=\s*\"*\'*[^"'>]*

and for tags without any quotes

<td align=right SIZE=5 >

I am using

\s*=\s*[^\s]*


Is there a way to combine the two expressions? When I tried to combine
them as follows,

\s*=\s*\"*\'*[^"'s>]*

I did not get good results.

Thank you,

Burak
 
Your match could take one of three forms:

1) Double quotes around value
2) Single quotes around value
3) No quotes around value

Written separately, the three expressions would be:

=\s*"[^">]*"
=\s*'[^'>]*'
=\s*[^\s>]*

You can combine them into one using alternation:

=\s*("[^">]*"|'[^'>]*'|[^\s>]*)
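
As a quick check of the combined expression, here is a minimal sketch in Python (an assumption; the thread does not say which language is in use). The helper name attribute_values is made up for illustration, and the two sample tags are the ones from the question.

```python
import re

# Combined pattern from the reply: the value is double-quoted,
# single-quoted, or a bare run of non-space, non-'>' characters.
ATTR_VALUE = re.compile(r"""=\s*("[^">]*"|'[^'>]*'|[^\s>]*)""")

def attribute_values(tag):
    """Return the raw value text (quotes included) after each '=' in tag."""
    # With a single capturing group, findall returns that group's text.
    return ATTR_VALUE.findall(tag)

print(attribute_values('<input type="submit" value="order bed">'))
# ['"submit"', '"order bed"']
print(attribute_values("<td align=right SIZE=5 >"))
# ['right', '5']
```

The order of the alternatives matters: the quoted forms are tried first, so the bare form only applies when no opening quote follows the "=".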


I don't know exactly how you are using this expression, but if you want to
capture the attribute's value without the "=" or quotes, you could use a
named group like this:

=\s*("(?<value>[^">]*)"|'(?<value>[^'>]*)'|(?<value>[^\s>]*))

This will allow you to access the value through the Groups collection of the
Match, under the name "value", whichever of the three forms matched.
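
The expression above uses the same group name in all three alternatives, which the .NET regex engine accepts. For comparison, here is a rough Python sketch of the same idea (Python is an assumption, not something the thread specifies). Python's re module rejects a repeated group name, so each alternative gets its own name; dq, sq, bare and the helper unquoted_values are all made up for illustration.

```python
import re

# Same three alternatives, but each value sits in its own named group
# because Python's re does not allow one name to be defined twice.
ATTR_VALUE = re.compile(
    r"""=\s*(?:"(?P<dq>[^">]*)"|'(?P<sq>[^'>]*)'|(?P<bare>[^\s>]*))"""
)

def unquoted_values(tag):
    """Return each attribute value in tag without the '=' or the quotes."""
    values = []
    for m in ATTR_VALUE.finditer(tag):
        # Only one alternative takes part in a match; lastgroup is its name.
        values.append(m.group(m.lastgroup))
    return values

print(unquoted_values('<input type="submit" value="order bed">'))
# ['submit', 'order bed']
print(unquoted_values("<td align=right SIZE=5 >"))
# ['right', '5']
```

Using lastgroup rather than testing each group for truth keeps an empty value (for example, alt="") from being mistaken for "no match".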


Hope this helps,

Brian Davis
http://www.knowdotnet.com
 