Regex - NFA, DFA, POSIX - seeking clarification

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

The framework help says: "Microsoft .NET Framework regular expressions incorporate the most popular features of other regular expression implementations such as those in Perl and awk. Designed to be compatible with Perl 5 regular expressions, .NET Framework regular expressions include features not yet seen in other implementations, such as right-to-left matching and on-the-fly compilation."

Perl is traditional NFA (non-POSIX), while awk (and most of its derivatives) are DFA (which are mostly POSIX anyway), so the help is a little ambiguous regarding the nature of Regex's engine

..NET's Regex doesn't (seem) to implement the POSIX longest-leftmost mandate. Jeffrey E. F. Friedl explains this mandate using the expression "(to|top)(o|polo)?(gical|o?logical)" and an input of "topological" (ignore the quotes)

A POSIX engine would be expected to return "top", "", and "ological" for the subexpressions (see Mr. Friedl's book for an explanation of why). However, .NET's Regex returns "to", "polo", and "gical"

So, is it NFA, DFA, semi-POSIX, or ...? Or, am I doing something wrong (a real possibility)

Thanks in advance for any additional info or resources you might provide
 
From [1]:

"The .NET Framework regular expression engine is a backtracking regular
expression matcher that incorporates a traditional Nondeterministic Finite
Automaton (NFA) engine such as that used by Perl, Python, Emacs, and Tcl.
This distinguishes it from faster, but more limited, pure regular expression
Deterministic Finite Automaton (DFA) engines such as those found in awk,
egrep, or lex. This also distinguishes it from standardized, but slower,
POSIX NFAs."

The material you quote seems to relates primarily to source-code
compatibility rather than engine behaviour.

Nick Wienholt, MVP
Maximizing .NET Performance
http://www.apress.com/book/bookDisplay.html?bID=217
Sydney Deep .NET User Group www.sdnug.org

[1]
http://msdn.microsoft.com/library/d.../en-us/cpguide/html/cpconmatchingbehavior.asp

Eric W. Bachtal said:
The framework help says: "Microsoft .NET Framework regular expressions
incorporate the most popular features of other regular expression
implementations such as those in Perl and awk. Designed to be compatible
with Perl 5 regular expressions, .NET Framework regular expressions include
features not yet seen in other implementations, such as right-to-left
matching and on-the-fly compilation."
Perl is traditional NFA (non-POSIX), while awk (and most of its
derivatives) are DFA (which are mostly POSIX anyway), so the help is a
little ambiguous regarding the nature of Regex's engine.
.NET's Regex doesn't (seem) to implement the POSIX longest-leftmost
mandate. Jeffrey E. F. Friedl explains this mandate using the expression
"(to|top)(o|polo)?(gical|o?logical)" and an input of "topological" (ignore
the quotes).
A POSIX engine would be expected to return "top", "", and "ological" for
the subexpressions (see Mr. Friedl's book for an explanation of why).
However, .NET's Regex returns "to", "polo", and "gical".
 
Oops. I apparently stopped short in my searching and didn't notice the wealth of additional regex detail in the help. How embarrassing

Thanks much for taking the time. Sorry to trouble you.
 
Back
Top