Regex Bug??????

  • Thread starter: abing

abing

I use a Regex like this (to match a URL):
^(https?:\/\/(([\w-]+.)+[\w-]+))?(\/?\w[\w-\.]*)*((\.\w{1,5})?((\?|#)[^'"\s]*)?|\/)?$
to match this string:
"http://search.microsoft.com/us/SearchMS25.asp?boolean=ALL&nq=NEW&so=RECCNT&ig=01&ig=02&ig=03&ig=04&ig=05&ig=06&ig=07&ig=08&ig=09&ig=10&i=00&i=01&i=02&i=03&i=04&i=05&i=06&i=07&i=08&i=09&qu="
Then my program hangs!

Can anyone tell me why?
 
That's right. I tried your regex and string, and my app hung too.
Strange. Have you reported this issue to MS?

JN
NSQUARED2
 
Some regular expressions are just harder to evaluate than others. It's
possible to create a regular expression that causes a combinatorial
explosion of possible matches that need to be tested, so you have to be
careful how you form your expression if you want to avoid a long-running
search. Maybe that's what you're encountering here.

A simple technique that can sometimes help is to use *? and +? instead of *
and + so that you're lazy matching instead of greedy.
http://www.devnewz.com/devnewz-3-20030917FiveHabitsforSuccessfulRegularExpressions.html
is an interesting article that includes tips like this.
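
For anyone unsure what lazy versus greedy means in practice, here is a minimal sketch. It is written in Python rather than .NET (my assumption, just to keep it short), and the sample text and variable names are made up for illustration.

```python
import re

text = '<a><b>'

# Greedy: ".+" grabs as much as it can, then backs off to the last ">".
greedy = re.search(r'<.+>', text).group()

# Lazy: ".+?" grabs as little as it can, stopping at the first ">".
lazy = re.search(r'<.+?>', text).group()

print(greedy)  # <a><b>
print(lazy)    # <a>
```

Note that laziness only changes which alternative is tried first. When the text cannot match at all, a backtracking engine may still end up trying every alternative, so this is not a guaranteed fix.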

You might also check some of the existing regular expression libraries that
are available online. You'll find that many others have come up with
different solutions to this problem.


 
I am not exactly sure of the intent of the entire expression, but it looks
like the "." at position 22 in your expression should be "\." With ".", it
matches any character, which makes the "([\w-]+.)+" potentially match the
same string in millions of different ways. This results in lots of
backtracking by the regex, which means that the expression is not "hung", it
just won't be finished evaluating for a few million years. These
expressions are sometimes called "exponential" or "super-linear". It is not
really a bug, just a really big gotcha when using regular expressions.
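
To get a feel for the backtracking, here is a rough sketch that times the host part of the pattern with and without the escaped dot. It is in Python rather than .NET (an assumption on my part; Python's re module is also a backtracking engine), it uses only a simplified piece of the expression, and the names and test strings are made up for illustration.

```python
import re
import time

# Host part of the pattern only. Both names are hypothetical, for this sketch.
UNESCAPED = re.compile(r'^([\w-]+.)+[\w-]+$')   # "." matches any character
ESCAPED = re.compile(r'^([\w-]+\.)+[\w-]+$')    # "\." matches a literal dot

def timed_match(pattern, text):
    """Return (matched, seconds) for one anchored match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start

# Made-up inputs: a run of word characters ending in "=", which neither
# pattern can match. Lengths are kept small so the script finishes quickly.
for n in (8, 12, 16, 20, 24):
    text = 'a' * n + '='
    _, slow = timed_match(UNESCAPED, text)
    _, fast = timed_match(ESCAPED, text)
    print(f'n={n:2d}  unescaped={slow:.6f}s  escaped={fast:.6f}s')
```

The exact timings depend on the machine, but the unescaped column should grow much faster than the escaped one as n increases, because every way of splitting the run of letters between "[\w-]+" and "." has to be tried before the match can fail.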

Anyway, here is what I _think_ the expression should look like:

^(https?:\/\/(([\w-]+\.)+[\w-]+))?(\/?\w[\w-\.]*)*((\.\w{1,5})?((\?|#)[^'"\s]*)?|\/)?$


Again, I have not fully tested this expression, and I am not entirely sure
what you intend to match, so please test this carefully.

Brian Davis
www.knowdotnet.com


