Regular expression Match hangs up

  • Thread starter Thread starter Matt McKey
  • Start date Start date
M

Matt McKey

I developing an application that heavily relies on regular
expression processing I frequently have situations where
the Match seems to go to an infinite loop.

Does advise how handle these situations without a whole
application feezing?
 
Are you using alot of back references in the expression? What does the
offending expression look like?
 
Hi Matt,

I've had this problem a *lot*. I'm sending 300+ files a day through a
regular expression parser that I wrote, and it hangs on average 1 time every
3 days. And if I send the offending file through the parser again, it
doesn't hang.

What I finally ended up doing is to start a watcher thread (using
System.Threading.Timer) that, after 10 minutes, kills the thread doing the
parsing. And if the method that does the parsing ends normally (it doesn't
hang), the last thing it does is to get rid of the watcher thread timer by
calling Dispose().

I realize that's a bit of trouble, but it does work, and it enabled me to
save a relationship with my client.

- Steve Harclerode
 
You are probably encountering what is called a "super-linear" or
"exponential" match. This is the result of using nested quantifiers -
notably + and *. For instance, the following expression

^((ha)*)+$

matches the string "hahahaha", but not "hahahah". For a short string, a
non-match fails rather quickly. However, if we evaluate on the longer
string
"hahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahahah",
the regex seems to hang and never return. This is all because there is a *
enclosed by a +. The + and * have to fight over who gets what, so the regex
backtracks, and the number of backtracks increases exponentially as the
string gets longer. If there is a match, the regex returns quickly, but if
it is a non-match, it could take years or even lifetimes to return. The
short answer is that you really have to check both successful and
unsuccessful inputs with both short and long strings to fully test a regular
expression for this behavior.

Brian Davis
www.knowdotnet.com
 
Back
Top