Slow regular expression

  • Thread starter: Franz

Franz

I have a string "=?big5?B?VHdpbnMgpbys9azb?=" (a MIME-encoded string).
I want to use a regular expression to capture the strings "big5", "B" and
"VHdpbnMgpbys9azb".
So I use this pattern string: "\=\?(.*)+\?[QqBb]\?(.*)+\?\="
When I call Regex.Match with it, the program stops responding for more
than a minute.
I would like to know whether the problem is in my pattern or in the regular
expression engine that .NET provides.
Thanks.
 
I'm not sure about the perf problem, but an alternative algorithm would be:

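// Strip the leading "=?" and trailing "?=": "big5?B?VHdpbnMgpbys9azb"
string sub = s.Substring(2, s.Length - 4);

// Split on '?': { "big5", "B", "VHdpbnMgpbys9azb" }
string[] results = sub.Split('?');

where s is the string to parse and results is the resulting array of strings.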
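For reference, the Substring/Split idea can be wrapped into a small self-contained program (C# 9 top-level statements). The helper name `TrySplitEncodedWord` and its prefix, suffix and field-count checks are hypothetical additions for illustration, not part of the reply; the sketch assumes the string holds exactly one encoded-word with nothing around it. Splitting on '?' works for a well-formed word because RFC 2047 does not allow '?' inside the charset, encoding or encoded-text fields.

```csharp
using System;

// Hypothetical helper (not from the thread): splits a single encoded-word such as
// "=?big5?B?VHdpbnMgpbys9azb?=" into { charset, encoding, encoded text }.
// Assumes the string contains exactly one encoded-word and nothing else.
static bool TrySplitEncodedWord(string s, out string[] parts)
{
    parts = Array.Empty<string>();

    // Require the "=?" prefix and "?=" suffix before stripping them.
    if (s.Length < 4
        || !s.StartsWith("=?", StringComparison.Ordinal)
        || !s.EndsWith("?=", StringComparison.Ordinal))
    {
        return false;
    }

    // "big5?B?VHdpbnMgpbys9azb" -> { "big5", "B", "VHdpbnMgpbys9azb" }
    string[] split = s.Substring(2, s.Length - 4).Split('?');
    if (split.Length != 3)
    {
        return false;
    }

    parts = split;
    return true;
}

if (TrySplitEncodedWord("=?big5?B?VHdpbnMgpbys9azb?=", out string[] fields))
{
    // Prints: charset=big5, encoding=B, text=VHdpbnMgpbys9azb
    Console.WriteLine($"charset={fields[0]}, encoding={fields[1]}, text={fields[2]}");
}
```

Returning false instead of throwing keeps malformed input (a missing delimiter, or the wrong number of fields) from producing a misleading array.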
 
In your expression you have the construct (.*)+, which means "one or more
repetitions of zero or more characters". That matches the same text as plain
"zero or more characters", but it gives the regex engine a huge number of
ways to divide the same characters between the * and the +. Whenever a later
part of the pattern fails, the engine backtracks and tries another division,
and the number of divisions grows exponentially with the length of the text.
The * and + are basically fighting over characters: should "big5" be one
iteration of the + in which the * matches all 4 characters, or 4 iterations
of the + in which the * matches 1 character each, and so on? Simply remove
the +s from the expression to get:

\=\?(.*)\?([QqBb])\?(.*)\?\=
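A minimal sketch for comparing the nested pattern with the flattened one, assuming .NET Framework 4.5 or later (or any modern .NET), where the `Regex` constructor accepts a match timeout and throws `RegexMatchTimeoutException` once it is exceeded. The `TimeMatch` helper and the two-second limit are arbitrary illustrative choices; the printed timings depend on the runtime and the input, so none are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

const string input = "=?big5?B?VHdpbnMgpbys9azb?=";

// Pattern from the question: the nested (.*)+ lets the engine divide the
// same characters between * and + in many ways while backtracking.
const string nested = @"\=\?(.*)+\?[QqBb]\?(.*)+\?\=";

// Flattened pattern from the reply: outer + removed, [QqBb] grouped.
const string flat = @"\=\?(.*)\?([QqBb])\?(.*)\?\=";

// Illustrative helper: runs one match under a timeout and reports the outcome.
static void TimeMatch(string label, string pattern, string text)
{
    // The two-second limit is arbitrary; when it is exceeded the engine
    // throws RegexMatchTimeoutException instead of running on.
    var regex = new Regex(pattern, RegexOptions.None, TimeSpan.FromSeconds(2));
    var watch = Stopwatch.StartNew();
    try
    {
        Match match = regex.Match(text);
        Console.WriteLine($"{label}: Success={match.Success} after {watch.ElapsedMilliseconds} ms");
    }
    catch (RegexMatchTimeoutException)
    {
        Console.WriteLine($"{label}: timed out after {watch.ElapsedMilliseconds} ms");
    }
}

TimeMatch("nested (.*)+", nested, input);
TimeMatch("flat (.*)", flat, input);
```

A timeout does not fix a badly written pattern, but it turns an apparent hang into an exception the caller can handle.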

If you want to make sure that at least one character is matched between the
?s, then you can just change each * to +:

\=\?(.+)\?([QqBb])\?(.+)\?\=
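The difference between the two versions shows up when a field is empty. This small C# 9 program uses a made-up input, `=?big5?B??=`, whose encoded-text part is empty: the `(.*)` version accepts it and the `(.+)` version rejects it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input whose encoded-text field (between the last two ?s) is empty.
const string emptyText = "=?big5?B??=";

// (.*) also accepts zero characters, so the empty field still matches.
bool starAccepts = Regex.IsMatch(emptyText, @"\=\?(.*)\?([QqBb])\?(.*)\?\=");

// (.+) needs at least one character in each captured field, so this fails.
bool plusAccepts = Regex.IsMatch(emptyText, @"\=\?(.+)\?([QqBb])\?(.+)\?\=");

Console.WriteLine($"(.*) matches: {starAccepts}"); // (.*) matches: True
Console.WriteLine($"(.+) matches: {plusAccepts}"); // (.+) matches: False
```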


By the way, I have inserted grouping parentheses around the [QqBb] in
order to capture the "B" in a group. This way you will have "big5" in group
1, "B" in group 2, and "VHdpbnMgpbys9azb" in group 3.
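Reading the three groups back out might look like this C# 9 sketch, which uses the `(.+)` variant and the sample string from the question. `Groups[0]` always holds the whole match; groups 1 to 3 are numbered by their opening parentheses from left to right.

```csharp
using System;
using System.Text.RegularExpressions;

string header = "=?big5?B?VHdpbnMgpbys9azb?=";

// Groups are numbered by their opening parentheses, left to right;
// Groups[0] is the text of the whole match.
Match m = Regex.Match(header, @"\=\?(.+)\?([QqBb])\?(.+)\?\=");

if (m.Success)
{
    string charset = m.Groups[1].Value;     // "big5"
    string encoding = m.Groups[2].Value;    // "B"
    string encodedText = m.Groups[3].Value; // "VHdpbnMgpbys9azb"

    // Prints: big5 | B | VHdpbnMgpbys9azb
    Console.WriteLine($"{charset} | {encoding} | {encodedText}");
}
```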

Hope this helps

Brian Davis
www.knowdotnet.com