Regex question

Du Dang · Apr 2, 2004

Text:
=====================
<script1>
***stuff A
</script1>

***more stuff

<script2>
***stuff B
</script2>

=====================

Regex:
<script>[\s\S]+</script>

I use "[\s\S]" instead of "." because the text contains newline characters.

The regex above gives me a single match running from <script1> to </script2>
instead of two separate matches.

How do I extract <script1> ... </script1> and <script2> ... </script2> as
separate matches?

Thanks,

Du

Chris R. Timmons · Apr 2, 2004

Du,

You can use the "." character to match a newline if you use the
RegexOptions.Singleline option.

Try this:

// Needs: using System; using System.Text.RegularExpressions;
string inputText = @"
<script1>
***stuff A
</script1>

***more stuff

<script2>
***stuff B
</script2>";

// ".*?" is non-greedy, so each match stops at the first closing tag.
string regex = @"<script\d>(?<contents>.*?)</script\d>";

MatchCollection mc = Regex.Matches(inputText, regex,
    RegexOptions.Singleline |
    RegexOptions.IgnoreCase |
    RegexOptions.IgnorePatternWhitespace);

foreach (Match m in mc)
    Console.WriteLine(m.Groups["contents"].Value);

Hope this helps.

Chris.
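
As a rough illustration of the Singleline point above, the sketch below compares "." with and without RegexOptions.Singleline against the "[\s\S]" workaround from the question. The class name SinglelineDemo, the helper CountMatches, and the shortened input are made up for this example and are not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical helper: counts how many times the pattern matches the input.
    public static int CountMatches(string input, string pattern, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        string input = "<script1>\n***stuff A\n</script1>";

        // Without Singleline, "." does not match "\n", so the block is not found.
        Console.WriteLine(CountMatches(input, @"<script\d>.*?</script\d>", RegexOptions.None));

        // With Singleline, "." matches every character, including "\n".
        Console.WriteLine(CountMatches(input, @"<script\d>.*?</script\d>", RegexOptions.Singleline));

        // [\s\S] matches any character regardless of the options used.
        Console.WriteLine(CountMatches(input, @"<script\d>[\s\S]*?</script\d>", RegexOptions.None));
    }
}
```

Under these assumptions the first call should report no match and the other two one match each, which suggests the two approaches are interchangeable for this input.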

Brian Davis · Apr 2, 2004

In addition, you can use a named backreference to make sure that you don't
match anything like "<script1>....</script2>":

<script(?<num>\d+)>(?<contents>.*?)</script\k<num>>

Brian Davis
http://www.knowdotnet.com
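
To see what the named backreference buys you, here is a minimal sketch that runs the pattern above over input containing one well-formed block and one block whose closing tag has the wrong number. The class name BackreferenceDemo, the helper ExtractContents, and the sample input are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // \k<num> must repeat exactly the digits captured by the opening tag.
    private const string Pattern =
        @"<script(?<num>\d+)>(?<contents>.*?)</script\k<num>>";

    // Hypothetical helper: returns the trimmed contents of every matched block.
    public static List<string> ExtractContents(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern, RegexOptions.Singleline))
        {
            results.Add(m.Groups["contents"].Value.Trim());
        }
        return results;
    }

    public static void Main()
    {
        // The second block opens as script2 but closes as script3.
        string input = "<script1>\nA\n</script1>\n<script2>\nB\n</script3>";
        foreach (string contents in ExtractContents(input))
        {
            Console.WriteLine(contents);
        }
    }
}
```

With this input only the first block should be returned, because the closing tag of the second block does not repeat the captured number.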

Du Dang · Apr 3, 2004

Thanks Chris, it works like a charm.

//(?<contents>.*?)
One thing I don't understand: why is the second question mark there?
My understanding is that a named group is written (?<name_here>expression_here).

I tried removing the second question mark and the expression stopped working.

Thanks again for your help,

Du

Du Dang · Apr 3, 2004

Hi Brian, thanks for helping out!!!

Regards,

Du

Chris R. Timmons · Apr 3, 2004

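Quantifiers like + and * are "greedy": they match as many characters as
they can. A question mark placed directly after a quantifier makes it
non-greedy (lazy), so it matches the minimum number of characters required
for a successful match. In (?<contents>.*?), the first question mark is part
of the named-group syntax, and the second one makes .* non-greedy.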

A utility like Expresso
(http://www12.brinkster.com/ultrapico/Expresso.htm) can help in
understanding how greedy and non-greedy quantifiers behave.

Hope this helps.

Chris.
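
The difference between greedy and non-greedy quantifiers can be checked with a small sketch that counts matches on the sample text from the original question. The class name GreedyVsLazyDemo and the helper CountMatches are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Hypothetical helper: counts matches, with "." allowed to match newlines.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern, RegexOptions.Singleline).Count;
    }

    public static void Main()
    {
        string input =
            "<script1>\n***stuff A\n</script1>\n\n***more stuff\n\n" +
            "<script2>\n***stuff B\n</script2>";

        // Greedy: ".*" runs on to the last closing tag, so everything is one match.
        Console.WriteLine(CountMatches(input, @"<script\d>(?<contents>.*)</script\d>"));

        // Non-greedy: ".*?" stops at the first closing tag, so each block matches separately.
        Console.WriteLine(CountMatches(input, @"<script\d>(?<contents>.*?)</script\d>"));
    }
}
```

On this input the greedy pattern should yield one match spanning both blocks, while the non-greedy pattern should yield two, which is the behavior Du saw when removing the second question mark.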

Du Dang · Apr 3, 2004

I think that clears things up quite a bit.

Thank you so much for your help!!!

Regards,

Du
