Regular expression problem

  • Thread starter: Juan Gabriel Del Cid

Juan Gabriel Del Cid

Hi, Marius.

No matter how complex your regex is, it will not lock the system. It may
take a little time to compile, but it won't lock your system. I used your
pattern in a regex and the program finished without any problems.

My guess is you have this in a loop and the condition never gets changed
when there is a match, or something like that, and you end up in a
never-ending loop.

Maybe if you posted more of your code we could be of more help.

-JG
 
Hi,

I'm having an odd problem with a regular expression in C#. I'm trying
to parse the rows in a web page with this pattern, and the computer
locks up:

"<tr valign=middle>.+?<a
href=\"/listings/details/index\\.cfm\\?itemnum=.+?8000 FB
24/7.+?(Antonius|Marcus).+?PLATINUM (\\d+)K
PP.+?\\$<b>(\\d+(\\.\\d+)*)</b>.+?</tr>"

If the "(Antonius|Marcus).+?" part is taken out of the string,
everything works just fine, and IMO there's no error in that token.
And I've also tried with "(Antonius).+?" and "Antonius.+?", and it
still locks out.

Any help is greatly appreciated :)

Thanks.
 
The code is this:

server_pattern = "(Antonius|Marcus)";
sPattern = "<tr valign=middle>.+?<a href=\"/listings/details/index\\.cfm\\?itemnum=.+?"
    + vendor_pattern + ".+?" + server_pattern + ".+?" + platinum_pattern
    + ".+?" + price_pattern + ".+?</tr>";
Regex r = new Regex(sPattern,
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

mc = r.Matches(s);
if (mc.Count < 1)
{
    s2 += "No matches found !\r\n";
}
else
{
    s2 += mc.Count.ToString() + " matches found \r\n";
    foreach (Match m in mc)
    {
        s2 += m.Groups.Count + " count: " + m.Groups[1].Captures[0] + " \r\n";
    }
}


The condition isn't in a loop.

And you're right, it doesn't lock up my entire PC, just the
application. In the debugger, the statement it hangs on is "if (mc.Count
< 1)", and the watch on mc.Count says 'error: cannot obtain value'.

What seems strange to me is that without that parenthesized group the
expression works fine and returns in about one second (the page is
approximately 80 KB and has about 50 matches for the expression)...
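
A side note on the debugger observation above: as far as I know, Regex.Matches returns a MatchCollection that is filled lazily, so the call itself is cheap and the real matching work happens when Count (or the enumerator) is first used. That would explain why the hang shows up on the "if (mc.Count < 1)" line. A minimal sketch, with a made-up input and pattern that are not from the original post:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyMatchesDemo
{
    // Asks only for the first match; the rest of the input is not scanned.
    public static string FirstValue(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "";
    }

    // Creating the collection is cheap; reading Count forces every match
    // in the input to be computed.
    public static int CountAll(string input, string pattern)
    {
        MatchCollection mc = Regex.Matches(input, pattern);
        return mc.Count;
    }

    public static void Main()
    {
        string input = "a1b22c333";   // hypothetical sample input
        Console.WriteLine(FirstValue(input, @"\d+")); // prints "1"
        Console.WriteLine(CountAll(input, @"\d+"));   // prints "3"
    }
}
```

With a slow pattern, stepping through with Match and NextMatch at least lets the program report results as they arrive instead of waiting on Count.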
 
Ok, I ran it with some sample input and went to lunch... after a few hours
it finished. So you proved me wrong :-). If a regular expression is complex
enough, it can take a (very) long time to evaluate, giving the impression
that it is locked. For any practical purpose, it does lock the application.
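
If you want the application to give up rather than sit there for hours, one option is a match timeout. This is only a sketch and it assumes a newer framework than this thread was written against: the Regex constructor overload that takes a TimeSpan and the RegexMatchTimeoutException type exist in .NET Framework 4.5 and later. The TryCount helper and its arguments are hypothetical names for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimeoutDemo
{
    // Counts matches, but abandons the attempt if a single match
    // operation runs longer than 'limit'.
    public static bool TryCount(string input, string pattern,
                                TimeSpan limit, out int count)
    {
        Regex regex = new Regex(pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline, limit);
        try
        {
            count = regex.Matches(input).Count; // matching happens here
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            count = 0;
            return false;
        }
    }

    public static void Main()
    {
        int n;
        bool ok = TryCount("x1y2", @"\d", TimeSpan.FromSeconds(5), out n);
        Console.WriteLine(ok ? n + " matches" : "gave up");
    }
}
```

A timeout only reports the problem; the pattern itself still needs fixing.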

So, what you need to do is fix the regular expression. One way to do it is
to split the work into two stages:

1. Look for TRs (table rows) and then
2. Look for your product

I also fetch the matches one at a time (Match and NextMatch) instead of
computing all of them up front. Here's the code:

string trPattern = "<tr valign=middle>(.+?)</tr>";

string productPattern =
    "<a href=\"/listings/details/index\\.cfm\\?" +
    "itemnum=.+?8000 FB 24/7.+?(Xev|Antonius|Marcus)" +
    ".+?PLATINUM (\\d+)K PP.+?\\$<b>(\\d+(\\.\\d+)?)</b>";

Regex trRegex = new Regex(trPattern,
    RegexOptions.IgnoreCase | RegexOptions.Singleline |
    RegexOptions.Compiled);

Regex productRegex = new Regex(productPattern,
    RegexOptions.IgnoreCase | RegexOptions.Singleline |
    RegexOptions.Compiled);

Match trMatch = trRegex.Match(fileStr); // fileStr has the HTML

if (!trMatch.Success) {
    Console.WriteLine("No TRs found.");
} else {
    Console.WriteLine("Found some TRs");

    // Stage 1: walk the table rows one at a time.
    while (trMatch.Success) {
        // Stage 2: look for the product inside this row only.
        Match productMatch = productRegex.Match(
            trMatch.Groups[1].Captures[0].Value);

        while (productMatch.Success) {
            Console.WriteLine("Match found -- {0} groups, product: {1}",
                productMatch.Groups.Count,
                productMatch.Groups[1].Captures[0]);

            productMatch = productMatch.NextMatch();
        }
        trMatch = trMatch.NextMatch();
    }
}

This code returns a heck of a lot faster than the previous one.
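
A plausible reason for the difference (not profiled or confirmed in this thread): with RegexOptions.Singleline, every ".+?" in the single big pattern is free to run past "</tr>" into later rows, so a row that lacks one of the pieces makes the engine keep searching through the rest of the page. Matching per row caps how far each ".+?" can wander. The sketch below shows the boundary effect on a tiny made-up table; the HTML, class name and method names are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowScopeDemo
{
    const RegexOptions Options =
        RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // One pattern over the whole page: ".+?" may cross row boundaries.
    public static string SinglePass(string html, string word)
    {
        Match m = Regex.Match(html,
            "<tr>.+?" + Regex.Escape(word) + ".+?</tr>", Options);
        return m.Success ? m.Value : "";
    }

    // Two stages: isolate each row first, then search inside that row only.
    public static List<string> RowsContaining(string html, string word)
    {
        List<string> rows = new List<string>();
        Regex inner = new Regex(Regex.Escape(word), Options);
        for (Match tr = Regex.Match(html, "<tr>(.+?)</tr>", Options);
             tr.Success; tr = tr.NextMatch())
        {
            string row = tr.Groups[1].Value;
            if (inner.IsMatch(row)) rows.Add(row);
        }
        return rows;
    }

    public static void Main()
    {
        string html = "<tr><td>Foo</td><td>Xev</td></tr>" +
                      "<tr><td>Bar</td><td>Antonius</td></tr>";
        // The single pass starts in the first row and spills into the second.
        Console.WriteLine(SinglePass(html, "Antonius"));
        // The two-stage version reports only the row that holds the word.
        foreach (string row in RowsContaining(html, "Antonius"))
            Console.WriteLine(row);
    }
}
```

Besides speed, the per-row version also avoids reporting a "row" that is really two rows glued together.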

Ok, I hope that helps,
-JG
 