Problems with Regular Expression

  • Thread starter: Anders Borum

Anders Borum

Hello!

I'm starting to get into regular expressions, but I'm having trouble writing
one. Basically, I would like to capture all content between (and including)
the first <div> and the last </div> tag in a string. I've got the C# code up
and running, but could use some help getting started with the pattern.

string regEx = "^?(<div)";

Meaning:
^ (anchor to start of string)
? (zero or one time)
Capture
<div
End Capture

Or am I completely wrong here? I am trying to get into RegEx and am reading
up on the subject. It's a completely new way of thinking about logic, so
please help me out here :-)

<html>
...

(START CAPTURE HERE)
<div>
<div>
</div>
</div>
(END CAPTURE HERE)
</html>

Thanks in advance!
 
Hi,

Try:

// Singleline lets . match newlines, so the pattern also works on multi-line input
Match m = Regex.Match(textBox1.Text, "^.*?(?'mygroup'<DIV>.*</DIV>).*?$",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

if (m.Success)
    Console.Write(m.Groups["mygroup"].Value);
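
If you want to try this outside a form, here is a minimal self-contained sketch of the same call wrapped in a method. The class name DivCapture and the method name OuterDivs are made up for illustration, and RegexOptions.Singleline is added on the assumption that the input spans several lines (without it, . stops at each newline).

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivCapture
{
    // Hypothetical helper: returns everything from the first <div> to the
    // last </div>, or an empty string when no such pair is found.
    public static string OuterDivs(string html)
    {
        Match m = Regex.Match(
            html,
            "^.*?(?'mygroup'<div>.*</div>).*?$",
            // IgnoreCase: <DIV> and <div> both match.
            // Singleline: . also matches newline characters.
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return m.Success ? m.Groups["mygroup"].Value : string.Empty;
    }
}
```

Calling DivCapture.OuterDivs on the sample markup from the question should hand back the outer <div> block, nested <div> included, and nothing from the surrounding <html> tags.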


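Some explanation:
^, $ beginning and end of the string; they make the pattern cover the whole
input, but are strictly optional here, since the group on its own would
find the same text
.*? any character, 0 or more times, but not greedy: it tries to match as
little as possible
.* any character, 0 or more times, greedy: it matches as much as possible
() capture group
(?'xxx') named capture group
(?:) group, but do not capture (not used here)

Note that . does not match a newline unless RegexOptions.Singleline is set.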
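
To make the greedy / not greedy difference concrete, here is a small sketch that runs both quantifiers over the same input. The class and method names (QuantifierDemo, Greedy, Lazy, CountGroups) are invented for this example and are not part of the code above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Greedy: .* takes as much as it can, so the match runs to the LAST </div>.
    public static string Greedy(string input)
    {
        return Regex.Match(input, "<div>.*</div>").Value;
    }

    // Lazy: .*? takes as little as it can, so the match stops at the FIRST </div>.
    public static string Lazy(string input)
    {
        return Regex.Match(input, "<div>.*?</div>").Value;
    }

    // Number of groups in a successful match, including group 0 (the whole match).
    // A (?:...) group does not add to this count; a (...) group does.
    public static int CountGroups(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups.Count;
    }
}
```

With the input "<div>a</div><div>b</div>", Greedy returns the whole string while Lazy returns only "<div>a</div>". That is why the pattern above uses a lazy .*? before the group and a greedy .* inside it: the first finds the earliest <div>, the second stretches to the last </div>.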


HTH
Greetings
 
Just wanted you to know that I've been working intensively with RegEx for
the past month now, and really see it as a great alternative in many
situations where plain string operations would fall short.

Regular expressions are great, if that's not an understatement. Really great,
and quite easy too, once you get the hang of the syntax! Output groups rule! :-)
 