Regular Expression problem

The crux of my problem is that I want a regular expression that will match a
sequence of numbers that have been "ANDed" or "ORed" together in text:
e.g.
1
1 OR 2
1 OR 2 OR 3...etc.
1 AND 2
1 AND 2 AND 3...etc.
(But not 1 OR 2 AND 3, which mixes ANDs and ORs)

I tried the regular expression:
(?:(?:[0-9]+(?: AND [0-9]+)*)|(?:[0-9]+(?: OR [0-9]+)*))

Given the string "1 OR 2" my expression will result in two matches (from a
call to Regex.Matches()), one matching "1" and another matching "2". I want
to match the whole string "1 OR 2".

Interestingly, given the string "1 AND 2", it does have the desired
behaviour (i.e. it matches the whole string as one match).
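
The behaviour described above can be reproduced outside .NET. The sketch below uses Python's re module as a stand-in for Regex.Matches() (an assumption: both are backtracking engines that try alternatives left to right); the helper name all_matches is made up for illustration. A likely cause is that the AND branch comes first and its repeated group may match zero times, so on "1 OR 2" it succeeds on "1" alone and the OR branch is never tried.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(?:(?:[0-9]+(?: AND [0-9]+)*)|(?:[0-9]+(?: OR [0-9]+)*))")

def all_matches(text):
    """Return every non-overlapping match, similar in spirit to Regex.Matches()."""
    return [m.group(0) for m in ORIGINAL.finditer(text)]

# The AND branch is tried first and consumes the whole string.
print(all_matches("1 AND 2"))  # ['1 AND 2']

# The AND branch still succeeds, but only on "1" (zero repetitions of " AND n"),
# so the engine never falls through to the OR branch.
print(all_matches("1 OR 2"))   # ['1', '2']
```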

Am I doing something silly or can I not do what I want?

Any help appreciated

Mark
 
Hi Zoodor,

This one was a bit tricky. Here's the solution:

(?m)^\d+(?(?=.)(?:\s+(AND|OR)\s*))(?:\d+(?(?=.)(?:\s+\1\s*)))*$

It breaks down into 2 sections:

(?m)^\d+(?(?=.)(?:\s+(AND|OR)\s*))

First, (?m) turns on multiline mode, so '^' and '$' match at the beginning
and end of each line as well as of the whole string. The match must therefore
begin at the start of the string or of a line. Next, \d+ matches any sequence
of digits. The conditional (?(?=.)...) then says: if the digits are followed
by any character other than a line break, they must be followed by at least
one whitespace character, then one of the sequences "AND" or "OR", then zero
or more whitespace characters. Whichever of "AND" or "OR" matched is stored
in Capturing Group 1.
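
To see the first section on its own, here is a rough sketch in Python. Python's re module does not support the (?(?=.)...) lookahead conditional, so the sketch emulates it as (?:(?=.)yes|(?!.)), which should behave the same way: if a character follows, the "yes" part must match, otherwise nothing is consumed. The names FIRST_SECTION and first_section are made up for illustration.

```python
import re

# Emulated conditional: (?(?=.)X) is written as (?:(?=.)X|(?!.)).
FIRST_SECTION = re.compile(r"(?m)^\d+(?:(?=.)\s+(AND|OR)\s*|(?!.))")

def first_section(text):
    """Return (matched text, captured operator), or None when the section fails."""
    m = FIRST_SECTION.search(text)
    return (m.group(0), m.group(1)) if m else None

print(first_section("1"))        # ('1', None): nothing follows, so no operator is needed
print(first_section("1 OR 2"))   # ('1 OR ', 'OR'): the operator lands in group 1
print(first_section("1 XOR 2"))  # None: something follows, but it is not AND/OR
```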

The second section may be matched 0 or more times:

(?:\d+(?(?=.)(?:\s+\1\s*)))*$

Match any sequence of digits. If it is followed by anything other than a line
break, it must be followed by at least one whitespace character, then the
sequence captured in Capturing Group 1, then zero or more whitespace
characters.

The result is that whichever of "AND" or "OR" is captured first is the
operator required between all subsequent numbers. The choice between the two
exists only in the first section; the second section reuses Capturing Group
1, and is itself optional and may be repeated any number of times. However,
the last repetition must end at the end of a line. This ensures that any line
mixing AND and OR is discarded altogether.
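
One thing worth checking before relying on the pattern above: because \s* may match nothing and the repeated section is optional, it appears to also accept a dangling operator ("1 OR") or a missing space ("1 OR2"). If that matters, the same backreference idea can be written without the conditional. The following is only a sketch in Python, not the pattern from this post; STRICT and is_uniform_chain are hypothetical names.

```python
import re

# Hypothetical stricter variant: every operator must be surrounded by
# whitespace and followed by a number; \1 forces later operators to
# repeat the first one.
STRICT = re.compile(r"\d+(?:\s+(AND|OR)\s+\d+(?:\s+\1\s+\d+)*)?")

def is_uniform_chain(text):
    """True if text is a number, or numbers joined by only ANDs or only ORs."""
    return STRICT.fullmatch(text) is not None

for sample in ["1", "1 OR 2", "2 AND 5 AND 6 AND 7", "1 AND 5 OR 6", "1 OR", "1 OR2"]:
    print(repr(sample), is_uniform_chain(sample))
```

The first three samples should be accepted and the last three rejected.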

I tested this against the following:

1 * success
1 OR 2 * success
1 OR 2 OR 3 * success
2 AND 5 AND 6 AND 7 * success
1 AND 5 OR 6 * fail
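
For anyone wanting to repeat this test outside .NET, here is a rough Python port run over the same five lines. It is a sketch under one assumption: Python's re has no (?(?=.)...) conditional, so each conditional is rewritten as (?:(?=.)...|(?!.)), which should be equivalent. PORTED and LINES are names chosen for illustration.

```python
import re

PORTED = re.compile(
    r"(?m)^\d+(?:(?=.)\s+(AND|OR)\s*|(?!.))"  # first number; operator goes to group 1
    r"(?:\d+(?:(?=.)\s+\1\s*|(?!.)))*$"       # more numbers joined by the same operator
)

LINES = """1
1 OR 2
1 OR 2 OR 3
2 AND 5 AND 6 AND 7
1 AND 5 OR 6"""

# With (?m), each line of the input is matched independently.
matched = [m.group(0) for m in PORTED.finditer(LINES)]

for line in LINES.splitlines():
    print(line, "* success" if line in matched else "* fail")
```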

--
HTH,

Kevin Spencer
Microsoft MVP
