Regex Matching Q

  • Thread starter: George Durzi

George Durzi

Consider the following HTML snippet. I want to extract the section shown
below.

<!-- some html -->
<TABLE WIDTH=100% CELLPADDING=0 CELLSPACING=0 border=0><?xml version=1.0
encoding=UTF-16?>
** There is some HTML here that I want to extract **
</TABLE>
<!-- some html -->

I did this:

Regex regex = new Regex(
    @"<TABLE WIDTH=100% CELLPADDING=0 CELLSPACING=0 border=0><?xml version=1.0 encoding=UTF-16?>"
    + "((.|\n)*?)" + "</TABLE>",
    RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled);

No worky ...

Any idea what I'm doing wrong?
Am I building the regular expression correctly?
 
Don't know if this is all that's wrong, but right off the bat you'll
need to escape the "?" characters that you're trying to match in the
"<?xml ... ?>" tag.

You should do the same for the "." character, even though it's probably
not the problem: an unescaped "." matches any character, so it will
match the literal '.' in the target anyway.

So that part of the regex string might need to look like:

<\?xml version=1\.0 encoding=UTF-16\?>
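
If you'd rather not escape by hand, here is a minimal sketch that lets Regex.Escape do it. It assumes the opening markup appears in the page on one line, exactly as written in the question. The class and method names (TableExtractor, TryExtract) are made up for illustration. Two other things differ from the original attempt: RegexOptions.IgnorePatternWhitespace is left out, because it makes the engine ignore unescaped spaces in the pattern, and RegexOptions.Singleline replaces the "(.|\n)*?" trick so that "." also matches newlines (RegexOptions.Multiline only changes how ^ and $ behave).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TableExtractor
{
    // Literal opening markup copied from the question (assumed to be on one line).
    private const string Prefix =
        "<TABLE WIDTH=100% CELLPADDING=0 CELLSPACING=0 border=0>" +
        "<?xml version=1.0 encoding=UTF-16?>";

    // Regex.Escape backslash-escapes metacharacters such as '?' and '.',
    // so the prefix is matched literally. The lazy (.*?) stops at the
    // first closing </TABLE>; Singleline lets '.' cross line breaks.
    private static readonly Regex TableRegex = new Regex(
        Regex.Escape(Prefix) + "(.*?)</TABLE>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);

    // Returns true and sets 'inner' to the captured markup when the table is found.
    public static bool TryExtract(string html, out string inner)
    {
        Match m = TableRegex.Match(html);
        inner = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

Calling TableExtractor.TryExtract(pageHtml, out string inner) should then leave the markup between the XML declaration and the first </TABLE> in inner. If the real page breaks the opening markup across lines or varies the attribute spacing, the escaped prefix would need \s+ in place of the literal spaces.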