regex - long operations

  • Thread starter: AlexS

AlexS

Hi

Is there a way to interrupt a long-running regex parsing operation, or to
run it in a way that lets me specify a maximum parsing time?

I use regexes to check the basic format of some messages. In some cases,
when the format is wrong, the regex takes 100% CPU for a very long time, in
excess of minutes. I would like to be able to control this process: for
example, when a regex can't be evaluated within 0.5 seconds, it should be
aborted. Then I can fall back to a more specialized parsing expression.

I can't seem to find an elegant way to solve this issue.

Any pointers available?

Thanks!
 
This is generally the result of nested patterns; combined with greedy
matching, the regexp can take a loooong time to run. I don't think you can
abort a running regexp match...
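
To make the nesting problem concrete, here is a small sketch in Python rather than .NET (Python's `re` module is also a backtracking engine, so the effect should be comparable). The pattern `^(a+)+$` and the helper name `seconds_to_reject` are made up for illustration; they are not taken from the original poster's expressions.

```python
import re
import time

# Hypothetical pattern: a greedy quantifier nested inside another one.
NESTED = re.compile(r"^(a+)+$")

def seconds_to_reject(n):
    """Time how long the engine needs to decide that 'a' * n + 'b' fails."""
    text = "a" * n + "b"  # the trailing 'b' guarantees there is no match
    start = time.perf_counter()
    matched = NESTED.match(text)
    elapsed = time.perf_counter() - start
    assert matched is None
    return elapsed

if __name__ == "__main__":
    # Each extra 'a' roughly doubles the number of ways the engine
    # can split the run of 'a's between the inner and outer quantifier.
    for n in (12, 14, 16, 18, 20):
        print(f"n={n:2d}  {seconds_to_reject(n):.4f} s")
```

The exact timings depend on the machine, but the growth from one line to the next is what matters: the input gets two characters longer and the run time grows by a roughly constant factor.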

Jerry
 
You cannot abort a running regexp, but you can abort a
thread. Kick your regexp off in a thread and if it
doesn't return within your max time, abort that thread
and move on.
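
A rough sketch of the "run it elsewhere and give up after a deadline" idea, written in Python rather than .NET. It swaps the thread for a child process, because a process can be killed reliably by the operating system. The function name `search_with_timeout` and the exit-code convention are assumptions made for this example only.

```python
import subprocess
import sys

# Tiny child script: pattern comes from argv, text from stdin.
# Exit code 0 = match, 2 = no match (1 is left for uncaught errors).
_CHILD = (
    "import re, sys\n"
    "pattern = sys.argv[1]\n"
    "text = sys.stdin.read()\n"
    "sys.exit(0 if re.search(pattern, text) else 2)\n"
)

def search_with_timeout(pattern, text, timeout_s):
    """Return True/False for match/no match, or None if the limit was hit."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern],
            input=text,
            text=True,
            stderr=subprocess.PIPE,
            timeout=timeout_s,  # run() kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode == 0:
        return True
    if result.returncode == 2:
        return False
    raise RuntimeError(f"regex worker failed: {result.stderr.strip()}")

if __name__ == "__main__":
    # A pathological pattern on malformed input is cut off at the deadline.
    print(search_with_timeout(r"^(a+)+$", "a" * 40 + "b", 0.5))
```

Starting a process per check has a noticeable cost, so a 0.5 second budget leaves less time for the regex itself than the number suggests; treat this as a sketch of the idea, not a tuned solution.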
 
In the .NET Framework you cannot kill a thread predictably. There's
absolutely no guarantee that Abort (I hope you're talking about
Thread.Abort) will actually stop the thread. Read the manual.

Jerry
 
It doesn't make sense to post a specific regex. Currently I have around 20
different ones. Strings vary from 128 bytes to 10 MB. Users will add some
more. They might also add something stupid, which will run for a long time.

So, it looks like I have a wish. Anyway, my analysis shows that if the
source string is improperly formatted (meaning not what the regex expects)
and the expression contains recursion or collects all repetitions of a
specific pattern, this might happen. I mean 100% CPU for a long time.
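
The observation above can be reproduced with a small Python sketch (Python instead of .NET, but also a backtracking engine). Both patterns are invented for illustration: `NESTED` collects repetitions inside a repeated group, and `FLAT` is a rewrite intended to accept the same strings without the nested repetition.

```python
import re
import time

# Hypothetical format check: words separated by single whitespace characters.
NESTED = re.compile(r"^(\w+\s?)+$")        # repetition inside a repetition
FLAT = re.compile(r"^\w+(?:\s\w+)*\s?$")   # same intent, no ambiguity

def timed_match(regex, text):
    """Return (matched, seconds) for one anchored match attempt."""
    start = time.perf_counter()
    ok = regex.match(text) is not None
    return ok, time.perf_counter() - start

well_formed = "word " * 5        # what the regex expects
malformed = "w" * 20 + "!"       # one bad character at the very end

if __name__ == "__main__":
    for name, regex in (("nested", NESTED), ("flat", FLAT)):
        for label, text in (("well-formed", well_formed),
                            ("malformed", malformed)):
            ok, secs = timed_match(regex, text)
            print(f"{name:6s} {label:11s} matched={ok} {secs:.4f} s")
```

On well-formed input both versions return quickly. On the malformed input only the nested version has to try many different ways of splitting the run of word characters before it can report failure.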

So, either users won't be allowed to make their own regexes (most of them
don't know what "greedy" means except in the context of personal behavior),
or MS has to do something in this respect.

How does Java's regex engine behave in such situations, anybody?

Thanks for confirming my worst expectations...

Alex
 
Java's regexps will behave pretty much the same. This is a problem with
backtracking regexp algorithms, not with any particular implementation. And
personally, I don't think it's a good idea to let users enter their own
expressions, especially when you acknowledge that most of them will have no
idea what they are doing. It would create some nice DoS issues in your app.

Jerry
 