Questions on a Regular Expression

  • Thread starter: Ioannis Vranos

Ioannis Vranos

Given the regular expression:

S"^([a-zA-Z]+|[a-zA-z]+\\s[a-zA-Z]+)$"


1) Isn't the "[a-zA-Z]+|[a-zA-z]+" part redundant? As far as I can
understand it means exactly the same as "[a-zA-Z]+" alone.

2) Isn't the parenthesis grouping redundant?

3) How can we define the parenthesis characters themselves as expected
characters in a match?


Thanks in advance.
 
Ioannis said:
Given the regular expression:

S"^([a-zA-Z]+|[a-zA-z]+\\s[a-zA-Z]+)$"


1) Isn't the "[a-zA-Z]+|[a-zA-z]+" part redundant? As far as I can
understand it means exactly the same as "[a-zA-Z]+" alone.

No, because of the alternative - it's

[a-zA-Z]+

-or-

[a-zA-z]+\\s[a-zA-Z]+

2) Isn't the parenthesis grouping redundant?

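Not here. The anchors ^ and $ sit outside the group, and alternation binds
more loosely than anything else, so without the parentheses the pattern would
parse as ^[a-zA-Z]+ -or- [a-zA-z]+\\s[a-zA-Z]+$. The group defines the limits
of the alternation; it would be redundant only if nothing stood outside it.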
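
To see what the group contributes when anchors sit outside it, here is a small sketch. It assumes Python's re module as a stand-in for the .NET engine used in the thread, and it uses the corrected [a-zA-Z] class in both alternatives; the sample strings are made up for illustration.

```python
import re

# Assumption: Python's re stands in for the .NET regex engine.
# With the group, ^ and $ apply to both alternatives.
grouped = re.compile(r"^([a-zA-Z]+|[a-zA-Z]+\s[a-zA-Z]+)$")
# Without it, ^ binds only to the first alternative and $ only to the second.
ungrouped = re.compile(r"^[a-zA-Z]+|[a-zA-Z]+\s[a-zA-Z]+$")

# Hypothetical sample inputs.
samples = ["John", "John Smith", "John123", "123 John Smith"]
for s in samples:
    print(repr(s), bool(grouped.search(s)), bool(ungrouped.search(s)))
```

The last two samples are rejected by the grouped pattern but accepted by the ungrouped one, which suggests the parentheses are doing real work in this expression.
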
3) How can we define the parenthesis characters themselves as expected
characters in a match?

Just escape them: \\( and \\). Most engines reject an unmatched right paren,
so escape both. Inside a character class, as in [()], no escaping is needed.
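
A quick way to check the escaping rules is to try them. The sketch below assumes Python's re module rather than the .NET engine, so the backslashes are single inside a raw string instead of doubled as in the C++ literal; the patterns are illustrative only.

```python
import re

# Assumption: Python's re stands in for the .NET regex engine.
# Both parentheses escaped: matches a literal "(" and ")".
escaped = re.compile(r"^\([a-zA-Z]+\)$")
# Inside a character class, parentheses are already literal.
in_class = re.compile(r"^[()a-zA-Z]+$")

print(bool(escaped.search("(abc)")))
print(bool(escaped.search("abc")))
print(bool(in_class.search("(abc)")))

# An escaped "(" followed by an unescaped ")" leaves the ")" unmatched;
# Python's re refuses to compile such a pattern.
try:
    re.compile(r"\(abc)")
    unbalanced_ok = True
except re.error:
    unbalanced_ok = False
print(unbalanced_ok)
```

In Python, at least, the unescaped closing parenthesis is an error, so escaping both sides looks like the safe habit.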

-cd
 
Carl said:
1) Isn't the "[a-zA-Z]+|[a-zA-z]+" part redundant? As far as I can
understand it means exactly the same as "[a-zA-Z]+" alone.


No, because of the alternative - it's

[a-zA-Z]+

-or-

[a-zA-z]+\\s[a-zA-Z]+


I did not understand what you mean by the above. Could you explain in
more detail?


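Not here. The anchors ^ and $ sit outside the group, and alternation binds
more loosely than anything else, so without the parentheses the pattern would
parse as ^[a-zA-Z]+ -or- [a-zA-z]+\\s[a-zA-Z]+$. The group defines the limits
of the alternation; it would be redundant only if nothing stood outside it.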




Just escape them: \\( and \\). Most engines reject an unmatched right paren,
so escape both. Inside a character class, as in [()], no escaping is needed.


Ok, thanks for the info.
 
Ioannis said:
Carl said:
1) Isn't the "[a-zA-Z]+|[a-zA-z]+" part redundant? As far as I can
understand it means exactly the same as "[a-zA-Z]+" alone.


No, because of the alternative - it's

[a-zA-Z]+

-or-

[a-zA-z]+\\s[a-zA-Z]+


I did not understand what you mean by the above. Could you explain in
more detail?

Alternation has low precedence - lower than concatenation, so

(bob|joe|sue)

parses as 'bob' or 'joe' or 'sue' not as 'bo'+('b' or 'j')+'o'+('e' or
's')+'ue'

similarly,

[a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+

parses as

'[a-zA-Z]+' or '[a-zA-Z]+\\s[a-zA-Z]+'

instead of

('[a-zA-Z]+' or '[a-zA-Z]+')\\s[a-zA-Z]+

Does that make sense?
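
The two readings can be compared directly. This is a rough sketch that assumes Python's re module in place of the .NET engine; the second pattern in each pair spells out the reading that alternation would have if it bound tighter than concatenation, and the test words are invented.

```python
import re

# Assumption: Python's re stands in for the .NET regex engine.
# Actual parse: three whole-word alternatives.
names = re.compile(r"^(bob|joe|sue)$")
# Hypothetical parse if | bound tighter than concatenation.
misparsed = re.compile(r"^bo(b|j)o(e|s)ue$")

for word in ["bob", "joe", "sue", "bobosue"]:
    print(word, bool(names.search(word)), bool(misparsed.search(word)))

# The same comparison for the pattern under discussion.
actual = re.compile(r"^(?:[a-zA-Z]+|[a-zA-Z]+\s[a-zA-Z]+)$")
wrong = re.compile(r"^(?:[a-zA-Z]+|[a-zA-Z]+)\s[a-zA-Z]+$")
print(bool(actual.search("John")), bool(wrong.search("John")))
```

A single word such as "John" matches only under the real parse, because the first alternative stands on its own.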

The original expression could be factored, since the alternatives have a
common prefix:

[a-zA-Z]+(\\s[a-zA-Z]+)?

I would expect a DFA-based regex engine might well do that factoring as a
matter of course when computing the DFA.
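
As a sanity check on the factoring, the sketch below compares the two forms on a handful of made-up inputs. It assumes Python's re module instead of the .NET engine, keeps the ^ and $ anchors from the original question, and uses a single \s as in the original pattern.

```python
import re

# Assumption: Python's re stands in for the .NET regex engine.
original = re.compile(r"^([a-zA-Z]+|[a-zA-Z]+\s[a-zA-Z]+)$")
factored = re.compile(r"^[a-zA-Z]+(\s[a-zA-Z]+)?$")

# Hypothetical sample inputs, including some that should be rejected.
samples = ["John", "John Smith", "John  Smith", "John Smith Jr", "", "John1"]
results = {
    s: (bool(original.search(s)), bool(factored.search(s))) for s in samples
}
for s, (a, b) in results.items():
    print(repr(s), a, b)
```

Both patterns agree on every sample here. That is consistent with the factoring, though a few samples are of course not a proof of equivalence.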

-cd
 
Carl said:
Alternation has low precedence - lower than concatenation, so

(bob|joe|sue)

parses as 'bob' or 'joe' or 'sue' not as 'bo'+('b' or 'j')+'o'+('e' or
's')+'ue'

similarly,

[a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+

parses as

'[a-zA-Z]+' or '[a-zA-Z]+\\s[a-zA-Z]+'

instead of

('[a-zA-Z]+' or '[a-zA-Z]+')\\s[a-zA-Z]+

Does that make sense?

The original expression could be factored, since the alternatives have a
common prefix:

[a-zA-Z]+(\\s[a-zA-Z]+)?

I would expect a DFA-based regex engine might well do that factoring as a
matter of course when computing the DFA.


Thanks for the explanation.
 
IV> S"^([a-zA-Z]+|[a-zA-z]+\\s[a-zA-Z]+)$"

Note that the [A-z] range in the second alternative above also includes the
non-alphabetic characters [ \ ] ^ _ ` that lie between Z and a in ASCII.
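
The extra characters in an A-z range can be listed directly. This sketch assumes Python's re module rather than the .NET engine; the test string "foo_bar" is just an illustration.

```python
import re

# Characters between "A" and "z" in ASCII that are not letters.
extra = [chr(c) for c in range(ord("A"), ord("z") + 1)
         if not chr(c).isalpha()]
print(extra)  # ['[', '\\', ']', '^', '_', '`']

# Assumption: Python's re stands in for the .NET regex engine.
typo = re.compile(r"^[a-zA-z]+$")   # range as originally posted
fixed = re.compile(r"^[a-zA-Z]+$")  # intended range
print(bool(typo.search("foo_bar")), bool(fixed.search("foo_bar")))  # True False
```

With the A-z range, a string containing an underscore is accepted as if it were purely alphabetic.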
 
Serge said:
IV> S"^([a-zA-Z]+|[a-zA-z]+\\s[a-zA-Z]+)$"

Note that the [A-z] range in the second alternative above also includes the
non-alphabetic characters [ \ ] ^ _ ` that lie between Z and a in ASCII.


Thanks for the correction. It was just a typo of mine; it was meant to be:


S"^([a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+)$"
 