Regular expression problems

Saira

Hello all,



I am not sure where to post this, so I hope this is correct.



We have written a Visual Studio AddIn that relies heavily on regular
expressions, which are often executed thousands of times. When running, it
works successfully for a while, but the Windows Page File Usage gradually
increases, and eventually the application freezes. We have traced the
problem to the regular expressions, but the problem does not apply equally
to all expressions. For example, the expression
"^([\s]*)('|REM( |\t|\v))([\s\S]*)$" does not cause any problems, whereas the
expression "^((("(("")|[^"])*")|([\s\S]*?))*)(('([\s\S]*))|$)$" causes a
severe memory leak. We always execute the expressions in 'interpreted' mode.
Is there a problem with certain expression constructs?



Thanks for any help/tips



Saira
 

I don't know where your memory leak is, but I can tell you for sure that
this expression is burning CPU power and memory. Try it out with pen and
paper on a 5- or 10-character string! It's O(n^2), and it stores every single
match (remember that MS regexes also store intermediate captures!). Also,
what's [\s\S] good for? Wouldn't '.' do the same (with
RegexOptions.Singleline, if it has to match newlines)?
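
To make that concrete, here is a minimal sketch of a leaner pattern. It is
written in Python rather than .NET purely for illustration, and LEAN,
split_comment and the sample line are made-up names, not anything from the
add-in. Assuming the goal is to split a line into code and a trailing '
comment, it replaces the lazy [\s\S]*? alternative with a negated character
class and makes the inner groups non-capturing, so there should be fewer
captures for the engine to keep per iteration.

```python
import re

# The second expression from the post, transcribed unchanged.
ORIGINAL = re.compile(r'''^((("(("")|[^"])*")|([\s\S]*?))*)(('([\s\S]*))|$)$''')

# Hypothetical leaner variant (an assumption, not from the add-in):
# group 1 = code, group 2 = comment text after the apostrophe.
LEAN = re.compile(r'''^((?:"(?:""|[^"])*"|[^'])*)(?:'([\s\S]*))?$''')


def split_comment(pattern, comment_group, line):
    """Return (code, comment) for one line; comment is None if there is none."""
    m = pattern.match(line)
    return m.group(1), m.group(comment_group)


sample = 'x = "it\'s" \' note'
print(split_comment(ORIGINAL, 9, sample))  # comment text is group 9 here
print(split_comment(LEAN, 2, sample))      # comment text is group 2 here
```

On this sample both patterns split the line the same way. Whether the leaner
one actually cures the page file growth in the add-in is something only a
measurement can tell.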

What are you doing with the Match object the RegEx returns? It should be
quite big. I've tried it in Expresso on a medium-sized code file, and it took
about 40 MB of memory. But everything seemed to be properly freed up when the
GC kicked in.

Can you create a small sample that shows the behaviour you've described
(memory leak not cleaned up by the GC)?
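
The shape of such a sample could be roughly the following. This is only a
sketch in Python to show the idea (run the expression many times, force a
collection, compare memory before and after); retained_bytes and its
parameters are hypothetical names, and a .NET version would use GC.Collect
and GC.GetTotalMemory instead of gc and tracemalloc.

```python
import gc
import re
import tracemalloc

# The second expression from the post, transcribed unchanged.
PATTERN = re.compile(r'''^((("(("")|[^"])*")|([\s\S]*?))*)(('([\s\S]*))|$)$''')


def retained_bytes(line, runs):
    """Return the change in traced memory after `runs` matches and a forced GC."""
    tracemalloc.start()
    gc.collect()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(runs):
        match = PATTERN.match(line)
        del match  # drop the match object straight away, as a careful caller would
    gc.collect()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


print(retained_bytes('x = "a" \' comment', 1000))
```

If the number keeps climbing as runs goes up, the matches are being held
somewhere; if it stays roughly flat, the growth is more likely coming from
whatever the surrounding code keeps alive.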

Niki
 