Regex Help


Jules

All,

how do I describe a string consisting of any number of characters, with
an optional (but unique when occurring) end-of-line expression '/'? With
groups if possible, as in "(?<sentence>.*(?<eos>[^\\][\\]{1}$)?)"

Say I have the phrase 'My Taylor is /' as input; I'd like 'sentence'
and 'eos' to match.

Now if I have 'My Taylor is //' or 'My Taylor is / ', 'sentence' should
match, but not 'eos'. In other words, 'eos' should match only when it
occurs once, and last, in the string.

I have tried nesting eos within sentence, juxtaposing them, removing
quantifiers, removing the end-of-string assertion... nothing seems to
work satisfactorily.

Thanks for any help,
Jules
 
Sorry, misspelling: please read "(?<sentence>.*(?<eos>[^\\][\\]{1}$)?)"
as "(?<sentence>.*(?<eos>[^/][/]{1}$)?)"

J.
 
Jules said:
how do I describe a string consisting of any number of characters, with
an optional (but unique when occurring) end-of-line expression '/'? With
groups if possible, as in "(?<sentence>.*(?<eos>[^\\][\\]{1}$)?)"

Say I have the phrase 'My Taylor is /' as input; I'd like 'sentence'
and 'eos' to match.

Now if I have 'My Taylor is //' or 'My Taylor is / ', 'sentence' should
match, but not 'eos'. In other words, 'eos' should match only when it
occurs once, and last, in the string.

(?<sentence> .* [^/] ) (?<eos> / ) $

RegexOptions.IgnorePatternWhitespace

The sentence group may not end with a "/", and the eos group must be
the last character.
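
To see how this pattern treats the three inputs from the question, here is a minimal C# sketch. The class and method names (JonPatternDemo, Describe) are made up for illustration; only the pattern and the option come from the post above. Note that when the trailing '/' is doubled or followed by a space, the whole match fails, not just the eos group.

```csharp
using System;
using System.Text.RegularExpressions;

static class JonPatternDemo
{
    // Pattern quoted from the post above; the spaces are ignored
    // because of RegexOptions.IgnorePatternWhitespace.
    const string Pattern = @"(?<sentence> .* [^/] ) (?<eos> / ) $";

    // Hypothetical helper: returns "sentence|eos" on a match,
    // or null when the whole match fails.
    public static string Describe(string input)
    {
        Match m = Regex.Match(input, Pattern, RegexOptions.IgnorePatternWhitespace);
        return m.Success
            ? m.Groups["sentence"].Value + "|" + m.Groups["eos"].Value
            : null;
    }

    static void Main()
    {
        foreach (string s in new[] { "My Taylor is /", "My Taylor is //", "My Taylor is / " })
            Console.WriteLine("[{0}] -> {1}", s, Describe(s) ?? "(no match)");
    }
}
```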
 
Hi Jules,

Check out how you can use "?" as a lazy modifier to existing quantifiers:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconquantifiers.asp

And negative lookbehind assertion:

http://msdn.microsoft.com/library/d...-us/cpgenref/html/cpcongroupingconstructs.asp

Try this with the IgnorePatternWhitespace and Singleline options:

string pattern =
@"(?<sentence>.*? ) # lazily match sentence so that '.' does not capture eos
(?<eos> (?<! /) # negative lookbehind for /
/)?$ # match single '/' at end of string or just end of string";

--
Dave Sexton

 
Dave said:
string pattern =
@"(?<sentence>.*? ) # lazily match sentence so that '.' does not capture eos
(?<eos> (?<! /) # negative lookbehind for /
/)?$ # match single '/' at end of string or just end of string";

Almost - this matches "My Taylor is /" twice: once as it should, and
again on the empty string after the /. Those optional matches are
tricky!
 
Jules said:
Hi John -
Who?

I'm not sure what you mean by matching the empty string after '/'

The Matches method returns a MatchCollection with a Count of 2.
(Alternatively, Regex.Match().NextMatch.Success == true.)

The 2nd Match has a Length of 0, and a Value of "".
 
Jules said:
Hi John - I'm not sure what you mean by matching the empty string after
'/'

A regular expression which is composed entirely of elements which can
match zero-width strings (foo*, foo?, foo{0,<whatever>}) itself matches
each zero-width string (i.e. the gap between each character). The
difference with this regular expression is that it contains an assertion
at the end - the '$' - that's what stops it finding every zero-length
string between each character in the source. Instead, it finds only the
last one (as well as the actual match).
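
The effect described above can be checked with a short C# sketch. The names EmptyMatchDemo and CountMatches are hypothetical, and Dave's pattern is written on one line here (so IgnorePatternWhitespace is not needed); treat it as an illustration under those assumptions, not as anyone's production code.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmptyMatchDemo
{
    // Dave's pattern from earlier in the thread, collapsed onto one line.
    public const string DavePattern = @"(?<sentence>.*?)(?<eos>(?<!/)/)?$";

    // Hypothetical helper: number of matches Regex.Matches finds.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern, RegexOptions.Singleline).Count;
    }

    static void Main()
    {
        // Everything in "x*" is optional and nothing anchors it, so each
        // gap between characters (and both ends) yields an empty match.
        Console.WriteLine(CountMatches("abc", "x*"));

        // The trailing '$' leaves only the empty match at the end of input.
        Console.WriteLine(CountMatches("abc", "x*$"));

        // Dave's pattern: the real match, then an empty match at the end.
        foreach (Match m in Regex.Matches("My Taylor is /", DavePattern, RegexOptions.Singleline))
            Console.WriteLine("index={0} length={1}", m.Index, m.Length);
    }
}
```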

-- Barry
 
Hi,

I'm not sure where the problem is.

I tested the expression and it worked according to the op's requirements. Is it returning extra information that the op doesn't
require? Does that matter? Is there a better way?
 
I ran the following console app:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        RegexOptions options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline;

        string pattern =
@"(?<sentence>.*? ) # lazily match sentence so that '.' does not capture eos
(?<eos> (?<! /) # negative lookbehind for /
/)?$ # match single '/' at end of string or just end of string";

        // Expected: implicit, outer group match, "My Taylor is " and "/" (3)
        Match("My Taylor is /", pattern, options, 3);

        // Expected: implicit, outer group match and "My Taylor is //" (2)
        Match("My Taylor is //", pattern, options, 2);

        // Expected: implicit, outer group match and "My Taylor is / " (2)
        Match("My Taylor is / ", pattern, options, 2);

        Console.ReadLine();
    }

    // Runs the pattern against one input and reports which groups succeeded.
    private static void Match(string input, string pattern, RegexOptions options, int expectedSuccessCount)
    {
        Match match = Regex.Match(input, pattern, options);

        Console.WriteLine("Input={0}: {1}", input, (match.Success) ? "Success" : "Failed");
        Console.WriteLine("Expected # of successful groups: " + expectedSuccessCount);
        Console.WriteLine();

        if (!match.Success)
            return;

        Console.WriteLine("Match: {0}", match);
        Console.WriteLine();

        Console.WriteLine("Captures: ");

        for (int c = 0; c < match.Captures.Count; c++)
            Console.WriteLine("{0}: {1}", c, match.Captures[c]);

        Console.WriteLine();

        Console.WriteLine("Groups: ");

        // Group 0 is the implicit whole-match group; 1 and 2 are sentence and eos.
        int actualSuccessCount = 0;

        for (int g = 0; g < match.Groups.Count; g++)
        {
            if (match.Groups[g].Success)
                actualSuccessCount++;

            Console.WriteLine("{0} ({1}): {2}", g, (match.Groups[g].Success) ? "Success" : "Failed", match.Groups[g]);
        }

        Console.WriteLine();
        Console.WriteLine("Success: {0}", (expectedSuccessCount == actualSuccessCount) ? "Yes" : "No");

        Console.WriteLine(new string('-', 10));
        Console.WriteLine();
    }
}

which produced the following output:

Input=My Taylor is /: Success
Expected # of successful groups: 3

Match: My Taylor is /

Captures:
0: My Taylor is /

Groups:
0 (Success): My Taylor is /
1 (Success): My Taylor is
2 (Success): /

Success: Yes
----------

Input=My Taylor is //: Success
Expected # of successful groups: 2

Match: My Taylor is //

Captures:
0: My Taylor is //

Groups:
0 (Success): My Taylor is //
1 (Success): My Taylor is //
2 (Failed):

Success: Yes
----------

Input=My Taylor is / : Success
Expected # of successful groups: 2

Match: My Taylor is /

Captures:
0: My Taylor is /

Groups:
0 (Success): My Taylor is /
1 (Success): My Taylor is /
2 (Failed):

Success: Yes
 
Dave Sexton said:
I'm not sure where the problem is.

I wasn't describing a problem. I was describing the concept of regular
expressions matching zero-length strings, since Jules said that he
wasn't sure what "matching the empty string" meant.

That is, the provided RE will match each valid string twice.

-- Barry
 
Hi Barry,

Sorry, my post was intended for Jon, but I responded to you instead.

Please disregard!
 
Dave said:
I tested the expression and it worked according to the op's
requirements. Is it returning extra information that the op
doesn't require? Does that matter?

It may or may not matter. If the OP calls Matches and iterates through
the MatchCollection, the first iteration will be fine, but the second
will have empty captures.

Is there a better way?

Well, yes. The regex I posted before yours matches only once ... and
is actually a bit simpler.

(?<sentence> .* [^/] ) (?<eos> / ) $

RegexOptions.IgnorePatternWhitespace
 
Hi Jon,

I'm sorry but I don't see any post before mine. I see mine as the first response to Jules.

Also, I tried placing your pattern into the program I posted and the last two results fail.

pattern = "(?<sentence> .* [^/] ) (?<eos> / ) $";
[See previous post for code; Also, I only used IgnorePatternWhitespace as you suggested when testing your pattern]

Here is the console output:

Input=My Taylor is /: Success
Expected # of successful groups: 3

Match: My Taylor is /

Captures:
0: My Taylor is /

Groups:
0 (Success): My Taylor is /
1 (Success): My Taylor is
2 (Success): /

Success: Yes
----------

Input=My Taylor is //: Failed
Expected # of successful groups: 2

Input=My Taylor is / : Failed
Expected # of successful groups: 2

Could you please verify these results?

If Jules does in fact require Multiline support, then my original code can be modified by swapping "*" for "+" and using the
Multiline option instead of Singleline. I think that'll do the trick, but I don't think that's what Jules was after in the op.
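
A minimal sketch of that Multiline variant follows, under my reading of the suggestion: ".*?" becomes ".+?" and RegexOptions.Multiline replaces RegexOptions.Singleline. The class name MultilineDemo, the helper Lines, and the sample input are hypothetical, and the lines are assumed to be separated by "\n" only; "\r\n" input would need extra handling.

```csharp
using System;
using System.Text.RegularExpressions;

static class MultilineDemo
{
    // Assumed reading of the suggestion: '*' swapped for '+',
    // to be used with Multiline instead of Singleline.
    const string Pattern =
        @"(?<sentence> .+? )      # lazily match a non-empty sentence
          (?<eos> (?<! /) / )?    # optional single '/' not preceded by another '/'
          $                       # end of each line under RegexOptions.Multiline";

    // Hypothetical helper: one match per non-empty line of the input.
    public static MatchCollection Lines(string input)
    {
        return Regex.Matches(input, Pattern,
            RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
    }

    static void Main()
    {
        foreach (Match m in Lines("one /\ntwo //\nthree"))
            Console.WriteLine("sentence=[{0}] eos={1}",
                m.Groups["sentence"].Value, m.Groups["eos"].Success);
    }
}
```

Because ".+?" must consume at least one character, the zero-length match at the end of each line no longer appears in this variant.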

--
Dave Sexton

 
Dave said:
I'm sorry but I don't see any post before mine. I see mine as the first response to Jules.

I guess your newsreader threads by title, not message ID? I posted a
reply under "Regex Help" at 10:50A, my time; you posted at 11:36A, my
time.

Also, I tried placing your pattern into the program I posted and the last two results fail.

That's what they should do, according to the OP.

Input=My Taylor is /: Success
Right.

Input=My Taylor is //: Failed
Right.

Input=My Taylor is / : Failed

Right.
 
Hi Jon,

I mean your post doesn't exist. I don't see it at all! I'm using OE, default settings. There are no posts, other than the OP,
titled, "Regex Help". I'd like to fix this if the problem is indeed on my end. Any thoughts?

Jon said:
That's what they should do, according to the OP.

Then we disagree on the requirements stated in the op.

OP:
Say I have the phrase 'My Taylor is /' as input; I'd like 'sentence'
and 'eos' to match.
Now if I have 'My Taylor is //' or 'My Taylor is / ', 'sentence' should
match, but not 'eos'. In other words, 'eos' should match only when it
occurs once, and last, in the string.
<snip>

I interpreted the requirements as follows:

Input=My Taylor is /: Success
Right.

Input=My Taylor is //: Failed
Wrong. "sentence" should be "My Taylor is //" and "eos" should fail, but the entire match should not fail since "sentence" is still
valid.

Input=My Taylor is / : Failed
Wrong. "sentence" should be "My Taylor is / " and "eos" should fail, but the entire match should not fail since "sentence" is still
valid.

I also interpreted the op as requiring a Singleline match, incorrectly I now believe after reviewing the original post. I now see
that a Multiline match is probably the desired behavior, and of course you are correct that my example wouldn't be appropriate as
is, since it was designed for Singleline behavior. In this case, wouldn't your pattern require the Multiline flag as well?
Otherwise, $ would match the end of the string, not the end of each line.

I'd like to hear from Jules on this. Do you require further modifications on either expression to get the desired behavior or does
one of them already work for you?
 
Dave, John, Barry,

thanks for the in-depth discussions, examples and snippets. Where my
specs are concerned, the string is single-line, the "wrap" character is
optional and, if present, must be single & at the end of the line.
Therefore Dave's suggestion works for me. Using the Regular
Expression Workbench (a handy tool for this newbie,
http://blogs.msdn.com/ericgu/archive/2003/07/07/52362.aspx) I do see
more (zero-length) matches than I expected but the important thing is
that the groups Success and Value give me what I need.
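
For anyone reading later, here is a rough C# sketch of that usage: take only the first match from Regex.Match and read the Success and Value of the groups. The names FirstMatchDemo, HasEos and AnchoredCount are invented for the example, and the "\A" variant is my own assumption about one way to suppress the extra zero-length match; it was not proposed in the thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class FirstMatchDemo
{
    const RegexOptions Options = RegexOptions.Singleline;

    // Dave's pattern from the thread, collapsed onto one line.
    const string DavePattern = @"(?<sentence>.*?)(?<eos>(?<!/)/)?$";

    // Hypothetical variant, not from the thread: '\A' pins the match
    // to the start of the input, so no second match can begin at the end.
    const string AnchoredPattern = @"\A" + DavePattern;

    // Reads the groups from the first match only, ignoring any later empty match.
    public static bool HasEos(string input, out string sentence)
    {
        Match m = Regex.Match(input, DavePattern, Options);
        sentence = m.Groups["sentence"].Value;
        return m.Groups["eos"].Success;
    }

    // Number of matches found by the anchored variant.
    public static int AnchoredCount(string input)
    {
        return Regex.Matches(input, AnchoredPattern, Options).Count;
    }

    static void Main()
    {
        foreach (string s in new[] { "My Taylor is /", "My Taylor is //", "My Taylor is / " })
        {
            string sentence;
            bool eos = HasEos(s, out sentence);
            Console.WriteLine("sentence=[{0}] eos={1} anchored matches={2}",
                sentence, eos, AnchoredCount(s));
        }
    }
}
```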

Again, thanks to all. Cheers,
Jules
 
Hi Jules,

Thanks for the link. I don't have any regex tools and this one might be useful. The UI, however, is terrible ;)
 