Regular Expression - Match All Except

  • Thread starter: Derek Stone

Derek Stone

In my continuing inability to completely understand regular expressions I
have a new one for you.

I'd like to capture a string "A" unless it is anywhere in between string "B"
and string "C".

Therefore some matches are:

XYZAHIJ
ABC
BCA

Non matches include:

BAC
BXYZAHIJC
BAXYZC

Also, since I'm trying to replace "A" and only "A" the expression needs to
exclude the rest of the match. I assume that means using non-capturing
groups, but again, they and I don't mix.

Thank you,
Derek Stone
EliteVB.com
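
Editor's sketch on the non-capturing-group assumption in the question: a non-capturing group (?:...) still consumes the text it matches, so that text becomes part of the match and is replaced with it. Lookbehind and lookahead are zero-width and leave the surrounding text alone. The strings X and Y below are hypothetical context chosen only for illustration; they are not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Non-capturing groups still consume text, so the match includes X and Y.
    public static string WithNonCapturing(string input) =>
        Regex.Replace(input, "(?:X)A(?:Y)", "_");

    // Lookbehind and lookahead are zero-width, so the match is only the A.
    public static string WithLookaround(string input) =>
        Regex.Replace(input, "(?<=X)A(?=Y)", "_");

    static void Main()
    {
        Console.WriteLine(WithNonCapturing("XAY")); // _
        Console.WriteLine(WithLookaround("XAY"));   // X_Y
    }
}
```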
 
(^[^BC]+A[^BC]+$)|(^A.*)|(.*A$)

This says:

Match any one of three patterns. Each is enclosed in () and separated from
the next by | in the above.

The first pattern, (^[^BC]+A[^BC]+$), says:
^ = insist on beginning of line at the start of the match
$ = insist on end of line at the end of the match
[^BC] = match any single character except B or C
[^BC]+ = match a run of one or more characters, none of which is B or C
A = the target you are matching, which must sit between two such runs

The second pattern, (^A.*), says:
match any line that begins with A

The third pattern, (.*A$), is similar. It says:
match any line that ends in A
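
Editor's sketch: the three-way pattern above can be checked against the six sample strings from the question. The class and method names are made up for this illustration. Note that the pattern tests whole lines; it does not isolate the A.

```csharp
using System;
using System.Text.RegularExpressions;

static class ThreeWayDemo
{
    // The pattern exactly as posted above.
    const string Pattern = @"(^[^BC]+A[^BC]+$)|(^A.*)|(.*A$)";

    public static bool Matches(string line) => Regex.IsMatch(line, Pattern);

    // Returns the text that was actually matched (empty if there is no match).
    public static string MatchedText(string line) => Regex.Match(line, Pattern).Value;

    static void Main()
    {
        string[] samples = { "XYZAHIJ", "ABC", "BCA", "BAC", "BXYZAHIJC", "BAXYZC" };
        foreach (string s in samples)
            Console.WriteLine($"{s}: {Matches(s)}");

        // The match covers the whole line, not just the A.
        Console.WriteLine(MatchedText("XYZAHIJ")); // XYZAHIJ
    }
}
```

The first three samples match and the last three do not, which agrees with the question. Two limits are worth knowing: a Replace with this pattern would replace the entire line, and a line such as BCXAY (A after the C, but neither first nor last) is not matched by any of the three alternatives.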


Suggestion:
Get a visual regex designer; it's a simple tool that makes things much
easier.
One I use and can recommend is Regex, a free download from organicbit.com:
http://www.organicbit.com/regex/fog0000000019.html
 
Try this:

using System.Text.RegularExpressions;

Regex regex = new Regex(@"
    ^            # beginning of string
    (?!B.*A.*C)  # fail if a B at this position is followed later by A, then C
    .*A.*        # match A anywhere
    $            # end of string",
    RegexOptions.ExplicitCapture |
    RegexOptions.Compiled |
    RegexOptions.Singleline |
    RegexOptions.IgnorePatternWhitespace);

It uses a zero-width negative lookahead that says: if, from this point, you
can match B, followed by A, followed by C, the whole match should fail.

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://blogs.gotdotnet.com/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.
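
Editor's sketch: the lookahead pattern from the post above, run against the six sample strings from the question plus one extra input, XBAC, which is an assumption added here to show where the lookahead is evaluated. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    // Same pattern and options as in the post above.
    static readonly Regex Posted = new Regex(@"
        ^               # beginning of string
        (?!B.*A.*C)     # negative lookahead, tested only at the start
        .*A.*           # match A anywhere
        $               # end of string",
        RegexOptions.ExplicitCapture |
        RegexOptions.Compiled |
        RegexOptions.Singleline |
        RegexOptions.IgnorePatternWhitespace);

    public static bool Matches(string input) => Posted.IsMatch(input);

    static void Main()
    {
        string[] samples = { "XYZAHIJ", "ABC", "BCA", "BAC", "BXYZAHIJC", "BAXYZC", "XBAC" };
        foreach (string s in samples)
            Console.WriteLine($"{s}: {Matches(s)}");
    }
}
```

The six strings from the question behave as requested: the first three match and the next three do not. XBAC also matches, because the lookahead sits directly after ^ and is therefore tested only at the first character, which is X rather than B. As with any whole-string pattern, the match is the entire input, not just the A.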
 
Thank you gentlemen. That was most helpful, although I do have a follow-up.

Eric: I was using an expression similar to yours; however, it doesn't span
multiple lines.

Example:

text
B
text
A
text C

The "A" should -not- match, however it does with both my original expression
and yours (due to the line breaks). Granted I didn't provide an example in
my question to cover such a case, so I have no one except myself to blame
for that. How do I account for this? I've tried using both a
RegexOptions.Multiline and RegexOptions.Singleline configuration, with no
luck.

In addition I do use a regular expressions utility, Expresso.

Regards,
Derek Stone
EliteVB.com
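
Editor's sketch of one possible way to handle the multi-line case and replace only the A. This is not a solution posted in the thread. The idea is to let the pattern consume each whole B...C block as one alternative and capture a lone A as the other, then replace only where the capture succeeded. It assumes that the nearest C after a B closes the block, and that a B with no later C protects nothing. The group name target and the class and method names are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceOutsideDemo
{
    // Either swallow a whole B...C block, or capture a lone A.
    // Singleline lets '.' cross line breaks; '.*?' stops at the nearest C.
    static readonly Regex Scanner = new Regex(
        @"B.*?C|(?<target>A)",
        RegexOptions.Singleline);

    // Replaces A only where it was captured, i.e. outside any B...C block.
    public static string ReplaceOutside(string input, string replacement) =>
        Scanner.Replace(input, m => m.Groups["target"].Success ? replacement : m.Value);

    static void Main()
    {
        Console.WriteLine(ReplaceOutside("XYZAHIJ", "_")); // XYZ_HIJ
        Console.WriteLine(ReplaceOutside("ABC", "_"));     // _BC
        Console.WriteLine(ReplaceOutside("BAXYZC", "_"));  // BAXYZC
        // The multi-line example from the post above comes back unchanged.
        Console.WriteLine(ReplaceOutside("text\nB\ntext\nA\ntext C", "_"));
    }
}
```

If "A", "B" and "C" stand in for real strings, passing each through Regex.Escape before building the pattern keeps any special characters literal.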
 