Regex question: Lazy quantifiers

  • Thread starter: Jeff Johnson [MVP: VB]

Jeff Johnson [MVP: VB]

What is the point of lazy * and lazy ? ? "Nothing" will always succeed
first, right? If not, can someone give me an example of when either of these
might be used?
 
In regular expressions, the (*) operator (star) is a cardinality operator
meaning "match zero or more occurrences of the previous pattern". The (?)
operator is a cardinality operator meaning "match zero or one occurrence of
the previous pattern".

so if you had:

(^abc*$) then you would match things like "abc", "abcc", "ab", "abcccccccc"

if you had:

(^abc?$) then you would match things like "ab", "abc"

then if you had:

(^ab*c?$) you would match "abc", "ac", "a", "abbbbbbbbc"
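
For anyone who wants to check these three patterns, here is a minimal sketch using Python's re module. The choice of Python and the name "cases" are my own assumptions; the patterns and strings are the ones listed above.

```python
import re

# Each pattern from the post, paired with the strings it is said to match.
cases = {
    r"^abc*$": ["abc", "abcc", "ab", "abcccccccc"],
    r"^abc?$": ["ab", "abc"],
    r"^ab*c?$": ["abc", "ac", "a", "abbbbbbbbc"],
}

for pattern, strings in cases.items():
    for s in strings:
        # re.search returns a match object on success and None on failure.
        print(pattern, repr(s), bool(re.search(pattern, s)))

# A counterexample: "c?" allows at most one "c", so "abcc" is rejected.
print(bool(re.search(r"^abc?$", "abcc")))
```

Every line in the loop should print True, and the final counterexample should print False.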

It's easy to confuse (*) and (?) with the globbing rules, in which (*)
matches ANY STRING and (?) matches ANY SINGLE CHARACTER. Regular expressions
and globbing are somewhat orthogonal in this case.
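
To make the difference concrete, here is a small sketch that compares glob matching (Python's fnmatch module) with regex matching on the same strings. The test strings "abcXYZ" and "abX" are made up for illustration.

```python
import fnmatch
import re

# Glob: "*" on its own stands for any string, so "abc*" accepts "abcXYZ".
glob_star = fnmatch.fnmatchcase("abcXYZ", "abc*")
# Regex: "c*" means zero or more "c" characters, so "abcXYZ" is rejected.
regex_star = bool(re.search(r"^abc*$", "abcXYZ"))

# Glob: "?" stands for any single character, so "ab?" accepts "abX".
glob_qmark = fnmatch.fnmatchcase("abX", "ab?")
# Regex: "c?" means an optional "c", so "abX" is rejected.
regex_qmark = bool(re.search(r"^abc?$", "abX"))

print(glob_star, regex_star)    # True False
print(glob_qmark, regex_qmark)  # True False

# The regex spellings of the two globs use "." for "any character".
print(bool(re.search(r"^abc.*$", "abcXYZ")))  # True
print(bool(re.search(r"^ab.$", "abX")))       # True
```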

Hope that helps

-- Jake
 
Jeff Johnson said:
What is the point of lazy * and lazy ? ? "Nothing" will always succeed
first, right? If not, can someone give me an example of when either of
these might be used?

I'm not sure I understood your question correctly: you want to know when and
why to use lazy quantifiers (*? or ??), is that right?

You are correct, "nothing" will always succeed first, but if the subpattern
after the lazy quantifier doesn't match, the regex engine will try "more
than nothing". A common example is this one: pattern: "<div>.*</div>". If
the input sequence is:
<div>Line 1</div><div>Line 2</div><div>Line 3</div>
the pattern will match the whole text, because ".*" matches as much as
possible. (You might try this in Expresso or Regulator or something similar.)
Using "<div>.*?</div>" will result in 3 matches, one for each "div".
Doing the same thing without lazy quantifiers is quite complex. (something
like "<div>([^<]|<[^d]|<d[^i]|<di[^v]|<div[^>\s])*</div>").
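
If you don't have one of those tools at hand, the same experiment can be sketched in Python (my choice of language; the thread itself is about .NET regexes, but greedy and lazy quantifiers behave the same way for these patterns). The variable names are just for illustration.

```python
import re

text = "<div>Line 1</div><div>Line 2</div><div>Line 3</div>"

# Greedy: ".*" grabs as much as possible, so there is one match spanning everything.
greedy = re.findall(r"<div>.*</div>", text)

# Lazy: ".*?" grabs as little as possible, so each div is matched separately.
lazy = re.findall(r"<div>.*?</div>", text)

# The longer pattern without lazy quantifiers, as written in the post above.
# finditer is used because findall would return only the group's contents.
no_lazy = [
    m.group(0)
    for m in re.finditer(r"<div>([^<]|<[^d]|<d[^i]|<di[^v]|<div[^>\s])*</div>", text)
]

print(len(greedy))  # 1
print(len(lazy))    # 3
print(lazy)
print(no_lazy == lazy)  # True for this input
```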

Hope this helps,

Niki
 
You are correct, "nothing" will always succeed first, but if the
subpattern after the lazy quantifier doesn't match, the regex engine will
try "more than nothing". A common example is this one: pattern:
"<div>.*</div>". If the input sequence is:
<div>Line 1</div><div>Line 2</div><div>Line 3</div>
the pattern will match the whole text, because ".*" matches as much as
possible. (You might try this in Expresso or Regulator or something
similar.) Using "<div>.*?</div>" will result in 3 matches, one for each
"div".
Doing the same thing without lazy quantifiers is quite complex. (something
like "<div>([^<]|<[^d]|<d[^i]|<di[^v]|<div[^>\s])*</div>").

Excellent! That's exactly what I needed to know. It tells me that *? and ??
are only useful if they are followed in the regular expression by something
else.
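
A quick sketch of that conclusion, again in Python as an assumption on my part. With nothing after the lazy quantifier, the match is empty. With something after it (an anchor counts too), the engine is forced to take more.

```python
import re

# Nothing follows the lazy quantifier: "nothing" succeeds immediately.
star_alone = re.match(r"a*?", "aaa").group()
qmark_alone = re.match(r"a??", "aaa").group()
print(repr(star_alone), repr(qmark_alone))  # '' ''

# Something follows: the engine expands the lazy part until the rest matches.
star_then_b = re.match(r"a*?b", "aaab").group()
star_then_end = re.match(r"a*?$", "aaa").group()
print(repr(star_then_b), repr(star_then_end))  # 'aaab' 'aaa'

# The same holds for "??": it skips the optional "b" unless "c" requires it.
qmark_skips = re.match(r"ab??", "ab").group()
qmark_forced = re.match(r"ab??c", "abc").group()
print(repr(qmark_skips), repr(qmark_forced))  # 'a' 'abc'
```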
 