Regular Expression Pattern Help

  • Thread starter: Martin Andert

Martin Andert

Hello,
I want to parse some HTML with a regex and have the following problem:


--- html to parse start ---

some text <span class="x">
some text with linebreaks
and tabs and <b>tags <i>in it</i>
goes here
</span> another text

--- html to parse end ---


Now my question: How do I have to write the pattern so I get
"some text with linebreaks and tabs and <b>tags <i>in it</i> goes here"
as a match?

TIA Martin
 
I highly recommend getting Regular Expression Workbench by Eric Gunnerson.
It helps a lot with this sort of stuff.
http://www.gotdotnet.com/Community/...mpleGuid=c712f2df-b026-4d58-8961-4ee2729d7322

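The important thing is to set RegexOptions to Singleline, which makes . match
every character including newlines, so the pattern can span the line breaks
inside the span. Another important point is to use .*? to match zero or more
characters between the span tags non-greedily (the ? makes it non-greedy),
that is, to stop at the next </span> tag found.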
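
To illustrate those two points, here is a small sketch. The input string and
the class name are made up for the demonstration; only Regex.Match,
RegexOptions.Singleline and Match.Groups are real framework members.

```csharp
using System;
using System.Text.RegularExpressions;

class LazyVsGreedy
{
    static void Main()
    {
        // Hypothetical input: two spans, with a line break inside the first one.
        string html = "<span class=\"x\">one\ntwo</span> gap <span class=\"x\">three</span>";

        // No Singleline: '.' does not match '\n', so the first span cannot
        // match and the regex moves on to the second span.
        Match noOption = Regex.Match(html, @"<span class=""x"">(.*?)</span>");
        Console.WriteLine(noOption.Groups[1].Value);   // three

        // Singleline + greedy: runs from the first opening tag to the LAST </span>.
        Match greedy = Regex.Match(html, @"<span class=""x"">(.*)</span>", RegexOptions.Singleline);
        Console.WriteLine(greedy.Groups[1].Value.Contains("gap"));   // True

        // Singleline + non-greedy: stops at the first </span> after the opening tag.
        Match lazy = Regex.Match(html, @"<span class=""x"">(.*?)</span>", RegexOptions.Singleline);
        Console.WriteLine(lazy.Groups[1].Value == "one\ntwo");   // True
    }
}
```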

// In a verbatim string (@"..."), a literal double quote is written as "".
System.Text.RegularExpressions.Regex regex = new
System.Text.RegularExpressions.Regex(@"<span class=""x"">(?<Text>.*?)</span>",
System.Text.RegularExpressions.RegexOptions.Singleline);

System.Text.RegularExpressions.Match match = regex.Match(str);

It will create a named group called Text; after a successful match its value
is available as match.Groups["Text"].Value. See the MS help for Match.Captures
for what else you can do with the result of the above line.

ms-help://MS.VSCC.2003/MS.MSDNQTR.2003FEB.1033/cpref/html/frlrfsystemtextregularexpressionsgroupclasscapturestopic.htm
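
Putting it together, a minimal sketch of a complete program might look like
the following. The class name is hypothetical, and the whitespace-collapsing
step is my assumption about the result you asked for (one line, with the line
breaks and tabs turned into single spaces); the regex by itself returns the
text with its original line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

class SpanExtractor
{
    static void Main()
    {
        // The sample HTML from the question, with a tab added on the third line.
        string html =
            "some text <span class=\"x\">\n" +
            "some text with linebreaks\n" +
            "\tand tabs and <b>tags <i>in it</i>\n" +
            "goes here\n" +
            "</span> another text";

        Regex regex = new Regex(@"<span class=""x"">(?<Text>.*?)</span>",
                                RegexOptions.Singleline);
        Match match = regex.Match(html);

        if (match.Success)
        {
            // Raw content of the named group, line breaks and tabs included.
            string raw = match.Groups["Text"].Value;

            // Assumed post-processing: collapse each run of whitespace
            // to a single space and trim both ends.
            string text = Regex.Replace(raw, @"\s+", " ").Trim();

            // Prints: some text with linebreaks and tabs and <b>tags <i>in it</i> goes here
            Console.WriteLine(text);
        }
    }
}
```

Note that this only handles a span without nested <span> elements; with
nesting, the non-greedy match would stop at the inner </span>.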

Hope that gets you started!

Mike Mayer - Visual C# MVP
http://www.mag37.com/csharp/
(e-mail address removed)
 