Regular expression question

Kevin

Hi,

Is anyone in this group good at regular expressions? I have a seemingly
simple problem, but cannot figure it out myself. I appreciate any help!

What I want to check is that "dog" appears exactly twice in the searchString.
If I enter "^(.*dog.*){2}$", the expression matches even when "dog" appears 3
times. I want to check whether "dog" appears exactly 2 times.

Regex regEx = new Regex("???");
string searchString = "my dog is my dog";

Thanks!
 
mdb

What I want to check is that "dog" appears exactly twice in the
searchString. If I enter "^(.*dog.*){2}$", the expression matches even
when "dog" appears 3 times. I want to check whether "dog" appears exactly 2
times.


Hmmm, by nature I don't think that regular expressions will do that in a
single statement. The easiest way to do that is to just match against "dog"
and count the number of matches:

if (Regex.Matches(input, "dog").Count == 2) { ... }
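The counting approach can be wrapped in a small helper. This is only a sketch: the class and method names (OccurrenceCounter, Count, HasExactly) are made up for illustration, and it assumes the search word is a non-empty literal, which is why it goes through Regex.Escape.

```csharp
using System;
using System.Text.RegularExpressions;

static class OccurrenceCounter
{
    // Counts non-overlapping occurrences of a literal word in the input.
    // Regex.Escape keeps characters such as '.' or '+' in the word from
    // being interpreted as regex syntax. Assumes word is not empty.
    public static int Count(string input, string word)
    {
        return Regex.Matches(input, Regex.Escape(word)).Count;
    }

    // True when the word occurs exactly 'expected' times.
    public static bool HasExactly(string input, string word, int expected)
    {
        return Count(input, word) == expected;
    }

    static void Main()
    {
        Console.WriteLine(HasExactly("my dog is my dog", "dog", 2)); // True
        Console.WriteLine(HasExactly("dog dog dog", "dog", 2));      // False
    }
}
```

Because Regex.Matches returns non-overlapping matches, "doggone dog" counts as two occurrences here, since the word is matched anywhere, not only as a standalone word.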
 
Guest

Kevin said:
Hi,

Is anyone in this group good at regular expressions? I have a seemingly
simple problem, but cannot figure it out myself. I appreciate any help!

What I want to check is that "dog" appears exactly twice in the searchString.
If I enter "^(.*dog.*){2}$", the expression matches even when "dog" appears 3
times. I want to check whether "dog" appears exactly 2 times.

Regex regEx = new Regex("???");
string searchString = "my dog is my dog";

I think your problem is the ".*" before and after the "dog" pattern. This
allows the pattern to match something like "foo dog bar dog bletch dog" by
throwing the third "dog" into the "match anything" section before or after
one of the legitimate matches.
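A quick way to see this is to run the original pattern against a string with three dogs. The snippet is just a sketch; the class and method names are invented for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyDemo
{
    // The pattern from the question: two repeats of "anything, dog, anything".
    public static bool OriginalPatternMatches(string input)
    {
        return Regex.IsMatch(input, @"^(.*dog.*){2}$");
    }

    static void Main()
    {
        // Three dogs still match, because one ".*" absorbs the extra "dog".
        Console.WriteLine(OriginalPatternMatches("foo dog bar dog bletch dog")); // True
        // A single dog does not match: each repeat needs its own "dog".
        Console.WriteLine(OriginalPatternMatches("abc dog"));                    // False
    }
}
```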

You probably need to limit this to being within a word, that is, instead of
".*" you probably want something like "\w*" -- match zero or more word
characters in front of and behind the occurrence of "dog".

Note that this will count input like "dogeatdog" as only one match, not as
two.

-- Tom
 
Kevin

Thanks, but is it really not possible? This seems to be a common thing
people want to check...
 
Kevin

Thanks, but the result is not good. The regex failed to match even when
the searchString includes "dog" exactly 2 times.
 
Guest

Kevin said:
Thanks, but is it really not possible? This seems to be a common thing
people want to check...

The {2} regex syntax demands two and exactly two *repeats* of the
parenthesized pattern, not two occurrences of "dog" in the whole input, so
the ".*" inside each repeat is still free to swallow extra ones. Remember
that even the * pattern to match zero or more is just shorthand for {0,}.

Besides, what do you want to do with the successful match -- do you need the
entire captured substring that contains the two "dog" matches, or do you just
need to know whether the match was successful? If the latter, then mdb's
approach is very clean and simple.

-- Tom
 
Greg Bacon

: Is anyone in this group good at regular expressions? I have a
: seemingly simple problem, but cannot figure it out myself. I
: appreciate any help!
:
: What I want to check is that "dog" appears exactly twice in the
: searchString. If I enter "^(.*dog.*){2}$", the expression matches
: even when "dog" appears 3 times. I want to check whether "dog" appears
: exactly 2 times.

Doable but perhaps nonobvious:

static void Main(string[] args)
{
    string[] dogs = new string[] {
        "",
        "no canines",
        "abc dog",
        "dog abc dog",
        "dog dog dog",
        "xyzzy dog foo dog bar dog w00t doggone",
    };

    Regex twodogs = new Regex(
        String.Format("^{0}dog{0}dog{0}$", "([^d]|d[^o]|do[^g])*"));

    foreach (string input in dogs)
        Console.WriteLine(input + ": " + twodogs.IsMatch(input));
}

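The fencepost -- ([^d]|d[^o]|do[^g])* -- is meant to match strings, possibly
empty, that don't have "dog". It is only approximate, though: it cannot
consume a "d" or "do" at the very end of the input, and d[^o] can swallow
the "d" that starts a following "dog", so inputs such as "dog dog mad" or
"ddog dog dog" are misclassified.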

The ^ and $ anchors are important. Without them, you'll see matches for
all strings with at least two dogs.
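A lookahead-based fencepost is one way to sidestep edge cases around a stray "d" (for example input that ends in "d", or "ddog"). The sketch below is a variant of the idea rather than the pattern above, and the names ExactlyTwoDogs, Gap and IsMatch are made up for illustration. It relies on negative lookahead, which .NET supports but not every regex flavor does.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExactlyTwoDogs
{
    // A gap consumes one character at a time, but only at positions
    // where "dog" does not start, so a gap can never contain "dog".
    const string Gap = @"(?:(?!dog).)*";

    // Singleline lets '.' match newlines, so gaps may span lines.
    static readonly Regex TwoDogs = new Regex(
        "^" + Gap + "dog" + Gap + "dog" + Gap + "$",
        RegexOptions.Singleline);

    public static bool IsMatch(string input)
    {
        return TwoDogs.IsMatch(input);
    }

    static void Main()
    {
        string[] dogs = new string[] {
            "",
            "abc dog",
            "dog abc dog",
            "dog dog dog",
            "dog dog mad",
            "ddog dog dog",
        };

        foreach (string input in dogs)
            Console.WriteLine(input + ": " + IsMatch(input));
    }
}
```

The anchors matter here for the same reason as above: without them the gaps at either end are free to stop early and the pattern only proves "at least two".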

Hope this helps,
Greg
 
Guest

Kevin said:
Thanks, but the result is not good. The regex failed to match even when
the searchString includes "dog" exactly 2 times.

Sorry, I was replying off the top of my head (and still am)... let's narrow
this down a bit...

Do you want to find "dog" only as a standalone word or even if it occurs
within a word? If the latter, do you want to find (and count) each
occurrence within a word or count it only once for the entire word no matter
how many times "dog" occurs within it?

"dog" obviously matches the three letter sequence no matter where it occurs.

"(?<=\b)dog(?=\b)" matches the standalone word "dog". Since \b is already
zero-width, this is equivalent to the simpler "\bdog\b", and it also matches
"dog" at the very beginning or end of the input.

"(?<=^|\b)dog(?=$|\b)" spells out the beginning and end of the line
explicitly, but matches the same things.

"(?<=^|\b).*dog.*(?=$|\b)" should match any occurrence of "dog" even when
embedded within a larger word.

Perhaps, then, "((?<=^|\b).*dog.*(?=$|\b)){2}" will be closer to what you
need? Again, this is all off the top of my head so give it a try and see what
happens.

-- Tom
 
mdb

Thanks, but is it really not possible? This seems to be a common thing
people want to check...

Well the problem is that regular expressions are meant to search
substrings, not strings as whole entities. (Certainly there are start-of-
string and end-of-string qualifiers, but since you didn't indicate that any
of the 'dog's would be at the beginning or end of the string, they aren't
really applicable.) They are, in a sense, "mathematical" representations
of string searches. And mathematically, if you are searching for 2 dogs
and you find 3 dogs, then mathematically, you've still found two dogs (2
dogs is a subset of 3 dogs).

This is all just to say that no, I don't think it is possible with ONE
regex statement (I'm certainly not 100% sure about this)... But unless
you're just interested in finding out if it is possible for your own
gratification, there's no reason not to simply ask the Regex system how
many times 'dog' matches.

Of course, there is usually more than one way to do it... here's a way that
doesn't use Matches.Count, but still not in one single statement...

if (Regex.IsMatch(input, "dog.*dog")
    && !Regex.IsMatch(input, "dog.*dog.*dog"))
{
    // 'dog' matches two times, but not three.
    ...
}
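The same "at least N, but not N + 1" idea can be generalized to any word and count. This is a sketch under assumptions: the helper names (AtLeastButNotMore, AtLeast, HasExactly) are hypothetical, the count is assumed to be at least 1, and RegexOptions.Singleline is added so that "." also crosses line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

static class AtLeastButNotMore
{
    // Builds a pattern meaning "word occurs at least n times":
    // word(?:.*word){n-1}. Assumes n >= 1.
    static string AtLeast(string word, int n)
    {
        string w = Regex.Escape(word);
        return w + "(?:.*" + w + "){" + (n - 1) + "}";
    }

    // Exactly n occurrences: at least n, and not at least n + 1.
    public static bool HasExactly(string input, string word, int n)
    {
        return Regex.IsMatch(input, AtLeast(word, n), RegexOptions.Singleline)
            && !Regex.IsMatch(input, AtLeast(word, n + 1), RegexOptions.Singleline);
    }

    static void Main()
    {
        Console.WriteLine(HasExactly("my dog is my dog", "dog", 2)); // True
        Console.WriteLine(HasExactly("dog dog dog", "dog", 2));      // False
    }
}
```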
 
Kevin

I understand what you mean. I prefer an expression that will work in a
language-independent way. It will certainly work in C#.
 
Greg Bacon

: I tried it and it worked! As you mentioned, this is not obvious at
: all. Thank you, Greg.

Glad to help.

Greg
 
Ludovic SOEUR

An easier way, using backtracking and a negative lookbehind:
^.*(dog.*){2}(?<!(dog.*){3})$
And this other example, with only one "dog" in the regex:
^.*((dog).*){2}(?<!(\2.*){3})$

static void Main() {
    string[] dogs = new string[] {
        "",
        "no canines",
        "abc dog",
        "dog abc dog",
        "dog dog dog",
        "xyzzy dog foo dog bar dog w00t doggone"
    };

    //Regex twodogs = new Regex(@"^.*(dog.*){2}(?<!(dog.*){3})$");
    Regex twodogs = new Regex(@"^.*((dog).*){2}(?<!(\2.*){3})$");

    foreach (string input in dogs)
        MessageBox.Show(input + ": " + twodogs.IsMatch(input));
}

If you want to do the same with longer words (for example, elephant), you
only have to change the regex as follows:
^.*(elephant.*){2}(?<!(elephant.*){3})$
or for the second regex:
^.*((elephant).*){2}(?<!(\2.*){3})$

If you want exactly 5 elephants, you only have to change the regex as
follows:
^.*(elephant.*){5}(?<!(elephant.*){6})$
or for the second regex:
^.*((elephant).*){5}(?<!(\2.*){6})$
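Since only the word and the two counts change, the pattern can be generated. The following is a sketch, with ExactCountPattern and Build as made-up names; it uses non-capturing groups and Regex.Escape, and it depends on .NET's support for variable-length lookbehind, which many other regex engines lack.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExactCountPattern
{
    // Builds ^.*(?:word.*){n}(?<!(?:word.*){n+1})$ for a literal word.
    // The lookbehind rejects the match when n + 1 occurrences can be
    // found ending at the end of the input.
    public static Regex Build(string word, int n)
    {
        string w = Regex.Escape(word);
        string pattern = "^.*(?:" + w + ".*){" + n + "}"
                       + "(?<!(?:" + w + ".*){" + (n + 1) + "})$";
        return new Regex(pattern);
    }

    static void Main()
    {
        Regex twodogs = Build("dog", 2);
        Console.WriteLine(twodogs.IsMatch("dog abc dog")); // True
        Console.WriteLine(twodogs.IsMatch("dog dog dog")); // False
    }
}
```

Build("elephant", 5) would give the five-elephant pattern in the same way. As with the hand-written versions, "." does not match a newline by default, so this sketch is meant for single-line input.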

Hope it helps,

Ludovic SOEUR.
 
