How to parse for a substring using regular expressions?

Guest

Hi,

does anybody know how I can extract a substring of a text with regular expressions? Let’s consider the following text: “Regular expressions are often used to make sure that a string matches a certain pattern.”

For example, I want to extract everything between “expressions” and “a”, which in the example above is “are often used to make sure that”.

Unfortunately, the best I could do was the following piece of code:

Dim myRegularExpression As New System.Text.RegularExpressions.Regex("expressions[\s\S]*\sa\s")

Dim myMatch As System.Text.RegularExpressions.Match = myRegularExpression.Match("Regular expressions are often used to make sure that a string matches a certain pattern.")

Console.WriteLine(myMatch.Value)

which led to the following result:
"expressions are often used to make sure that a string matches a "

Unfortunately, my regex matches up to the last “a” in the string, but I want it to stop at the first occurrence of “a”.

Does anybody have a clever idea?

Thank you!
Daniel Walzenbach
 
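Yep, regex quantifiers such as * and + are greedy by default: they match as much text as they can and then give characters back only as far as needed for the rest of the pattern to match. That is why [\s\S]* runs past the first " a " and settles on the last one.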

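However, you can change this by putting the ? metacharacter after a quantifier (as in ".*?"). That makes the quantifier lazy: it matches as little as possible and grows only until the rest of the pattern can match.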
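A minimal C# sketch of the difference, run against the sentence from the question. It assumes a .NET 5 or later console project (so top-level statements are allowed); the variable names are only for illustration. The greedy pattern is the one from the question, and the lazy pattern differs only by the added ?.

```csharp
using System;
using System.Text.RegularExpressions;

const string text = "Regular expressions are often used to make sure that a string matches a certain pattern.";

// Greedy: [\s\S]* first runs to the end of the string, then backtracks
// only until "\sa\s" can match, so it ends at the LAST " a ".
string greedy = Regex.Match(text, @"expressions[\s\S]*\sa\s").Value;

// Lazy: [\s\S]*? starts empty and grows one character at a time,
// so it ends at the FIRST " a ".
string lazy = Regex.Match(text, @"expressions[\s\S]*?\sa\s").Value;

Console.WriteLine($"greedy: \"{greedy}\"");
Console.WriteLine($"lazy:   \"{lazy}\"");
```

The greedy line reproduces the result from the question; the lazy line ends at "that a ".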

For example, to match 'expressions' followed by the first 'a' that comes after it, try this:

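/* untested code */
//////////////////////
// needs: using System; using System.Text.RegularExpressions;
// str holds the text to search. \s(.*?)\sa\s captures the words between
// "expressions" and the first standalone "a"; a bare .*?a would stop at the "a" in "are".
MatchCollection matches = Regex.Matches(str, @"expressions\s(.*?)\sa\s");
foreach (Match m in matches)
{
    Console.WriteLine(m.Groups[1].Value);
}

/////////////////////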

You should find that each match now ends at the first delimiter after 'expressions' instead of the last one, so the output is the shorter substring.
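If you want only the text between the two words, without the delimiters themselves, two common .NET approaches are a capture group or lookaround assertions. Below is a minimal sketch, again assuming a .NET 5 or later console project; the \b word boundaries are an addition of mine so that "a" only matches as a whole word and not as the first letter of "are".

```csharp
using System;
using System.Text.RegularExpressions;

const string text = "Regular expressions are often used to make sure that a string matches a certain pattern.";

// Option 1: capture group. Only the lazy (.*?) part is kept in Groups[1];
// \ba\b makes "a" match only as a whole word.
Match m = Regex.Match(text, @"\bexpressions\b\s*(.*?)\s*\ba\b");
string viaGroup = m.Success ? m.Groups[1].Value : "";

// Option 2: lookbehind/lookahead. The delimiters are checked but not
// consumed, so Match.Value is already the text in between.
string viaLookaround = Regex.Match(text, @"(?<=\bexpressions\s).*?(?=\sa\b)").Value;

Console.WriteLine(viaGroup);
Console.WriteLine(viaLookaround);
```

Both lines should print "are often used to make sure that" for the sample sentence. Note that . does not match a newline by default; if the text can span several lines, pass RegexOptions.Singleline or keep [\s\S] as in the original attempt.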

Hope this helps
 