Regex help on a nested table.

  • Thread starter: Matt T.

Matt T.

I am trying to replace the nested table tag in the following string [1]
using a regular expression, but I am not having any success. I am new
at using regular expressions, so I am sure I am just overlooking
something simple.

I thought <table.*>.*<table.*>.*</table> or some variation of that
would work to create a match, but it does not. Anyone have an idea as
to what would work?

[1]
<table *deleted attributes*>
<tr>
<td>
<table *deleted attributes*>
<tr>
<td><a>Nov</a></td>
<td>December 2003</td>
<td><a>Jan</a></td>
</tr>
</table>
</table>
 
You need to use a lazy (also called non-greedy) quantifier.

*? will match as few characters as possible, so you
should be able to do this:

<table[^>]*>.*?<table[^>]*>.*?</table>

I have a feeling that this operator can cause performance
lag, but it should work fine in your situation.
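
To see what the lazy quantifier changes, here is a minimal C# sketch. The input string is made up for illustration (two sibling tables on a single line, so line breaks do not get in the way), and the code assumes C# 9 or later so it can be written as top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical single-line input: two sibling tables side by side.
string html = "<table><tr><td>a</td></tr></table><table><tr><td>b</td></tr></table>";

// Greedy: .* takes as much as it can, so the match runs to the LAST </table>.
string greedy = Regex.Match(html, "<table[^>]*>.*</table>").Value;

// Lazy: .*? takes as little as it can, so the match stops at the FIRST </table>.
string lazy = Regex.Match(html, "<table[^>]*>.*?</table>").Value;

Console.WriteLine(greedy);
Console.WriteLine(lazy);
```

The greedy version swallows both tables in one match, while the lazy version returns only the first one.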

I haven't tested the code, so give it a try -- did that
work?

JER
 
When you need the '.' to span multiple lines, use singleline mode. Try some
variation of this expression:

(?s)(?<=<table[^>]*>.*?)<table[^>]*>.*?</table>

The (?s) turns on the singleline option, which you could also do in code
like this:

Regex r = new Regex(@"(?<=<table[^>]*>.*?)<table[^>]*>.*?</table>", RegexOptions.Singleline);
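
A small sketch of what singleline mode changes, using a made-up multi-line input rather than the original markup. It assumes C# 9 or later for top-level statements; the variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input with line breaks between the tags.
string html = "<table>\n<tr>\n<td>cell</td>\n</tr>\n</table>";

// Default mode: '.' does not match '\n', so the pattern cannot get past the first line.
bool withoutOption = Regex.IsMatch(html, "<table[^>]*>.*?</table>");

// Singleline mode passed as an option: '.' now matches '\n' as well.
bool withOption = Regex.IsMatch(html, "<table[^>]*>.*?</table>", RegexOptions.Singleline);

// The same mode switched on inline with (?s).
bool withInline = Regex.IsMatch(html, "(?s)<table[^>]*>.*?</table>");

Console.WriteLine(withoutOption);
Console.WriteLine(withOption);
Console.WriteLine(withInline);
```

Only the first check fails to match; the option and the inline (?s) flag behave the same way here.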

The (?<=...) construct is a zero-width positive look-behind assertion,
which means "the text leading up to the match must fit this pattern, but it
is not included in the resulting match". Like the previous poster said, you
also need to use lazy quantifiers - *? - and negated character classes -
[^>]. Lots of
big-sounding words and regex jargon here, but once you get a handle on it,
it offers a lot of power.
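
Putting the pieces together, here is a rough sketch of the look-behind pattern run against the markup from the original question. The border attributes and the replacement comment are placeholders I made up, since the real attributes were deleted from the post, and the code assumes C# 9 or later for top-level statements. It only deals with one level of nesting, so treat it as a starting point rather than a general HTML solution.

```csharp
using System;
using System.Text.RegularExpressions;

// The sample from the question; attribute values are hypothetical stand-ins.
string html = string.Join("\n", new[] {
    "<table border=\"0\">",
    "<tr>",
    "<td>",
    "<table border=\"1\">",
    "<tr>",
    "<td><a>Nov</a></td>",
    "<td>December 2003</td>",
    "<td><a>Jan</a></td>",
    "</tr>",
    "</table>",
    "</table>"
});

// The look-behind requires an earlier <table ...> tag before the match,
// but leaves that outer tag out of the matched text.
var nested = new Regex(@"(?<=<table[^>]*>.*?)<table[^>]*>.*?</table>",
                       RegexOptions.Singleline);

// The match covers only the inner table, from its opening tag to the first </table>.
Match m = nested.Match(html);

// Replace the inner table with a placeholder comment (hypothetical replacement text).
string replaced = nested.Replace(html, "<!-- nested table removed -->");

Console.WriteLine(m.Value);
Console.WriteLine(replaced);
```

The outer table cannot match because nothing precedes it for the look-behind to find, which is why only the nested table is picked up.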


Brian Davis
www.knowdotnet.com
 