A question about a failing regular expression

  • Thread starter: Anthony P.

Anthony P.

Hello Everyone,

My application needs to parse some HTML. As is usual in HTML parsing,
I just need the data between two HTML tags. So here is my regular
expression:

Dim myRegex2 = New Regex("<td headers=""re2 e1"" align=""right"" valign=""bottom"">" & _
                         "((.|\n)*?)<sup>", _
                         RegexOptions.IgnoreCase)

Now, this is supposed to get the text between the <td headers ...> tag
and the <sup> tag. But, instead, it returns the entire tag including all
of the attributes. What am I doing wrong?

Thanks!
 
Sorry, I forgot to add that I am also doing the required myMatch =
myRegex2.Match(sContent) after the expression thereby performing the
match against the string sContent.
 
The expression is returning what you have asked for. Maybe not what you are
interested in, but what you have asked for.

You need to look at what the author of my favorite reference (Balena) calls
"zero-width positive/negative look-ahead/look-behind assertions". These are
"grouping constructs" that test for surrounding text without making that text
part of the match. There is also a "noncapturing group" construct, though I
don't think I've used it.

(I'd like to be more specific but I am at the wrong computer at the moment.)
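
For what it's worth, here is a minimal sketch of the look-behind/look-ahead approach, so that the match itself contains only the text between the tags. The sample string in sContent is made up for illustration (the real page is not shown in this thread), and the use of RegexOptions.Singleline in place of (.|\n) is my own assumption, not something from the original expression.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaroundSketch
    Sub Main()
        ' Hypothetical sample input; the real page content is not shown in the thread.
        Dim sContent As String = "<td headers=""re2 e1"" align=""right"" valign=""bottom"">1,234<sup>a</sup></td>"

        ' (?<=...) is a zero-width look-behind and (?=...) a zero-width look-ahead.
        ' Both must be present around the text, but neither becomes part of the match.
        ' RegexOptions.Singleline lets "." match newline characters as well.
        Dim pattern As String = "(?<=<td headers=""re2 e1"" align=""right"" valign=""bottom"">)" & _
                                ".*?(?=<sup>)"
        Dim rx As New Regex(pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)

        Dim m As Match = rx.Match(sContent)
        If m.Success Then
            ' m.Value holds only the text between the opening <td ...> tag and <sup>.
            Console.WriteLine(m.Value)
        End If
    End Sub
End Module
```

With the made-up input above, m.Value is just 1,234, with no tag text attached.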

ALSO ... do yourself a favor and get a FREE product named Expresso from
Ultrapico. It is WONDERFUL for developing regular expressions.

Regular expressions are very useful but not very intuitive. Ask if you have
further questions.

Good Luck, Bob
 
Anthony P. wrote:
My application needs to parse some HTML. As is usual in HTML parsing,
I just need the data between two HTML tags. So here is my regular
expression:

Dim myRegex2 = New Regex("<td headers=""re2 e1"" align=""right"" valign=""bottom"">" & _
                         "((.|\n)*?)<sup>", _
                         RegexOptions.IgnoreCase)

Now, this is supposed to get the text between the <td headers ...> tag
and the <sup> tag. But, instead, it returns the entire tag including all
of the attributes. What am I doing wrong?
<snip>

You probably figured it out at this point, but it seems you need to
retrieve the grouped text from the Match's Groups property. The Groups
collection is 0-based, but the 0th item is the full matched text, so
you need to retrieve Groups(1):

<example>
Dim M As Match = MyRegex2.Match(sContent)
Do While M.Success
'////
Dim Text As String = M.Groups(1).Value
'////
'...
'Do something with Text
'...
M = M.NextMatch
Loop
</example>
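
To make the difference between the groups concrete, here is a small self-contained sketch that runs the original expression against a made-up string. The contents of sContent are hypothetical, since the real HTML was not posted.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupsSketch
    Sub Main()
        ' Hypothetical sample input; the real page content is not shown in the thread.
        Dim sContent As String = "<td headers=""re2 e1"" align=""right"" valign=""bottom"">1,234<sup>a</sup></td>"

        ' Same expression as in the original post.
        Dim myRegex2 As New Regex("<td headers=""re2 e1"" align=""right"" valign=""bottom"">" & _
                                  "((.|\n)*?)<sup>", _
                                  RegexOptions.IgnoreCase)

        Dim m As Match = myRegex2.Match(sContent)
        If m.Success Then
            ' Groups(0) is the whole match, tags included (the same text as m.Value).
            Console.WriteLine(m.Groups(0).Value)
            ' Groups(1) is the first parenthesized group: only the text between the tags.
            Console.WriteLine(m.Groups(1).Value)
        End If
    End Sub
End Module
```

With this input, Groups(0) contains the opening <td ...> tag, the text, and <sup>, while Groups(1) contains only 1,234. Note that the inner (.|\n) is itself a capturing group, so it shows up as Groups(2) and holds just the last character it matched.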

HTH

Regards,

Branco
 
You probably figured it out at this point, but it seems you need to
retrieve the grouped text from the Match's Groups property. The Groups
collection is 0-based, but the 0th item is the full matched text, so
you need to retrieve Groups(1):
<snip>

Hi Branco,

No, I hadn't figured it out yet, and I thank you for your help. I saw
something about the match's groups the other day, but it didn't click
that that was what I needed. Thank you, sir!

Anthony
 