Regex question

  • Thread starter: remy rakic

remy rakic

Hi all, I was trying to parse some HTML and ran into trouble with
some regex processing (which I have never done before).

What I am trying to do is get the content between two tags, including any
HTML code inside. I can do stuff like this:
"<a>([\w\s]*)</a>" on "<a>Not cool</a><a>Absolutely not</a>", but that
obviously only gets regular text content and no HTML tags. I wonder if
someone could enlighten me on which regex to use in order to get the results
"<really>Really not<cool/><at>all</at>" and "Absolutely not" from the string
"<tag><tag2><a><really>Really
not<cool/><at>all</at></a></tag2>...<tag3><a>Absolutely
not</a></tag3></tag>"? (Note that I can't use XPath, since I'm not sure
whether the site is XHTML compliant or not, and the example is not XML.)

Should I process the content twice, or give up the regex approach for
regular 'string index' parsing?
Thanks in advance
 
Aaah, the non-greedy option, now I know what it is used for. Thanks Ron, it
works like a charm!

Ron Bullman said:
remy,

How about <a>(?<1>.+?)</a>


Ron
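
For anyone following along, here is a minimal sketch of why the non-greedy quantifier makes the difference, run against the sample string from the question. It is written in Python purely for illustration, since the thread does not say which regex engine is in use. Two assumptions: Python does not accept the `(?<1>...)` group syntax, so a plain capturing group `(.+?)` stands in for it, and the line breaks in the posted sample are treated as wrapping (spaces) rather than part of the data. `re.DOTALL` is passed so the pattern would still work if the content did span lines.

```python
import re

# Sample input from the question, joined onto one line (assumption: the
# line breaks in the post are just wrapping, not part of the data).
html = (
    "<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>..."
    "<tag3><a>Absolutely not</a></tag3></tag>"
)

# Non-greedy: .+? takes as little as possible, so each match ends at the
# first </a> that follows its <a>.
lazy = re.findall(r"<a>(.+?)</a>", html, flags=re.DOTALL)

# Greedy, for comparison: .+ takes as much as possible, so the single
# match runs from the first <a> to the last </a> in the input.
greedy = re.findall(r"<a>(.+)</a>", html, flags=re.DOTALL)

print(lazy)
print(greedy)
```

The non-greedy version yields the two pieces asked for, while the greedy one swallows everything between the first `<a>` and the last `</a>` as a single match. Note that this only behaves well for simple cases like the sample: an `<a>` nested inside another `<a>`, or an opening tag with attributes such as `<a href="...">`, would need a different pattern or a real HTML parser.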