Help with Regex


TumurS

Hi!
I need to parse an input string. The string must consist of 1, 2 or 3
floating-point numbers separated by blanks:

Regex r = new Regex(
    @"^\s*(?<x>[0-9.]+)(\s+(?<y>[0-9.]+)(\s+(?<z>[0-9.]+))*)*\s*$",
    RegexOptions.ExplicitCapture);

Match m = r.Match(input_str);

string x, y, z;

if (m.Success)
{
    x = m.Groups["x"].Value;
    y = m.Groups["y"].Value;
    z = m.Groups["z"].Value;
    if (x != "" && y != "" && z != "")
        ...//all three numbers
    else if (x != "" && y != "")
        ...//only 1st and 2nd
    else if (x != "")
        ...//only one
}

Is my regular expression pattern valid? And how do I get the number of
captures? When I check m.Captures.Count, it always returns 1, regardless of
how many numbers were entered.
 
There is a problem with your regular expression. It will accept a "naked"
dot as a number, and it will accept any number of dots in a number. While a
floating point number may contain a dot with nothing after it, and with
nothing before it, it must either precede or follow a digit, and there may
only be 1 dot in it. Examples:

3.14 // good
.14 // good
3. // good
. // bad
3.14.0 // bad

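The following will validate each number correctly. As written it expects
exactly 3 numbers; to accept 1, 2 or 3, as in your question, change {3} to
{1,3}: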

^\s*(?>(?:\d+\.\d+|\.\d+|\d+\.|\d+)\s*){3}\s*$

Here's the explanation, starting from the innermost part:

(?:\d+\.\d+|\.\d+|\d+\.|\d+)

A match is one of 4 possible combinations, tried in this order:
a. 1 or more digits followed by a dot, followed by 1 or more digits
b. A dot followed by 1 or more digits
c. 1 or more digits followed by a dot
d. 1 or more digits alone
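
To see how the alternation treats the example inputs above, here is a minimal sketch. The class and method names (SingleNumberDemo, IsNumber) are made up for illustration, and the ^ and $ anchors are added here so that the whole input has to be a single number; they are not part of the inner group as posted.

```csharp
using System;
using System.Text.RegularExpressions;

static class SingleNumberDemo
{
    // The alternation from the post, anchored (an assumption of this sketch)
    // so that the entire input must be exactly one number.
    public static readonly Regex Number =
        new Regex(@"^(?:\d+\.\d+|\.\d+|\d+\.|\d+)$");

    public static bool IsNumber(string s)
    {
        return Number.IsMatch(s);
    }

    static void Main()
    {
        // Sample inputs: four well-formed numbers, a naked dot, and a number with two dots.
        foreach (string s in new[] { "3.14", ".14", "3.", "42", ".", "3.14.0" })
        {
            Console.WriteLine("{0,-8} {1}", s, IsNumber(s) ? "good" : "bad");
        }
    }
}
```

The first four inputs should be reported as good, and the naked dot and 3.14.0 as bad.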

The sequence of the options in the "or" portion matters. The regular
expression engine tries the options from left to right and takes the first
one that matches at the current position. The contents of the match are then
"consumed," meaning that they will not be re-evaluated (except in the case
of back-tracking, which I'm coming to). Putting the longest form first means
that a number such as 3.14 is consumed whole, rather than being cut short as
"3." or "3" by one of the shorter options. A number with nothing after the
dot, with nothing before the dot, or with no dot at all is picked up by one
of the remaining options. Each option contains at most 1 dot, so a single
match can never contain more than 1 dot. There is no quantifier on this
group, so exactly one of the combinations forms a match.
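
The effect of the ordering can be checked with a small experiment. This is only a sketch: the class and method names (AlternationOrderDemo, FirstMatch) are invented for the example, and the two patterns are cut-down versions of the alternation, not the full expression.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlternationOrderDemo
{
    // Returns the text of the first match of pattern in input.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        // Short option first: the engine is satisfied with "3" and never tries the longer one.
        Console.WriteLine(FirstMatch(@"\d+|\d+\.\d+", "3.14"));   // prints 3

        // Longest option first, as in the post: the whole number is consumed.
        Console.WriteLine(FirstMatch(@"\d+\.\d+|\d+", "3.14"));   // prints 3.14
    }
}
```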

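This is followed by \s*, indicating that 0 or more whitespace characters may
follow. The \s* is outside the inner grouping, so that it applies whichever
of the "or"-ed combinations matched. Note that because \s* also allows no
whitespace at all, text such as 1.5.5.5 would be accepted as the three
numbers 1.5, .5 and .5; use \s+ between the numbers if a separator must be
present.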

Now for the advanced part:

(?>(?:\d+\.\d+|\.\d+|\d+\.|\d+)\s*){3}

The (?>...) construct makes the enclosed group "atomic," which prevents
back-tracking. Once the group has matched, the regular expression engine is
prevented from back-tracking into it to try a different match, so a run of
digits such as 123 cannot be split up into several numbers. The {3}
indicates that the group must be repeated exactly 3 times. And of course you
recognize the start and end of the expression (^\s* and \s*$), which
indicate that the match must span the whole text, and that the text may have
whitespace before or after.
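
On the question of counting: m.Captures describes the match as a whole, which is why it reports 1. The individual numbers can be read from the Captures collection of a named group instead. The following is a sketch of one possible variant, not the pattern above: it drops the atomic group, requires whitespace between numbers, and accepts 1 to 3 of them. The names FloatListDemo, Num and Parse are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class FloatListDemo
{
    // The number alternation from the post, wrapped in a named group.
    // .NET allows the same group name to appear more than once; each
    // occurrence adds to that group's Captures collection.
    const string Num = @"(?<num>\d+\.\d+|\.\d+|\d+\.|\d+)";

    // One number, then up to two more, each preceded by mandatory whitespace.
    public static readonly Regex Pattern =
        new Regex(@"^\s*" + Num + @"(?:\s+" + Num + @"){0,2}\s*$",
                  RegexOptions.ExplicitCapture);

    // Returns the numbers found, or an empty array if the input is invalid.
    public static string[] Parse(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success) return new string[0];

        CaptureCollection caps = m.Groups["num"].Captures;
        string[] result = new string[caps.Count];
        for (int i = 0; i < caps.Count; i++) result[i] = caps[i].Value;
        return result;
    }

    static void Main()
    {
        foreach (string s in new[] { "3.14", " 1 2. ", "1 .5 3.0", "1 2 3 4", "1.5.5.5" })
            Console.WriteLine("\"{0}\" -> {1} number(s)", s, Parse(s).Length);
    }
}
```

With this variant the first three inputs should yield 1, 2 and 3 numbers, while "1 2 3 4" and "1.5.5.5" are rejected and yield 0.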

--
HTH,

Kevin Spencer
Microsoft MVP

