RegEx to NOT find matches

Ralf Hermanns

In my project, I get the HTML code of a website, and I need to check for
three texts: "Mail was send to...", "No data matching your search input"
and "Request timed out".

The problem is that one of the patterns always matches and comes back very
fast, while the two others take forever to execute (and confirm that the
text is NOT in the page that came back). By "fast" I mean too quick to
measure, and by "slow" I mean around 3 minutes!

The web page (the text being searched) is between 60 and 150 KB long, and
the patterns are searched with
Dim m4 As Match = Regex.Match(ResponsePage, ".*Script\stimed\sout.*", _
    RegexOptions.Singleline Or RegexOptions.IgnoreCase)

Is it normal for a RegEx to take so long to confirm that something is NOT
in the text? Can I speed that up or avoid it?

Thanks!
Ralf
 

Hey Ralf,

You could try a couple of things:
1) Pre-compile the regular expression (e.g. with RegexOptions.Compiled),
since you are clearly using it heavily.
2) Remove IgnoreCase and specify the error text in its exact case, so the
regex has less work to do.
3) If the error page has an error number (e.g. an IIS error), look for the
error number instead (e.g. 500, 401, etc.).
4) You might also want to investigate lazy quantifiers. They MIGHT help.
 
Hey Ralf,

As you are only concerned with whether there is a match, I would use the
static (Shared) Regex.IsMatch method, as it just returns a Boolean
indicating whether the pattern matches and does not build a Match object.

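Also, I would remove the .* from the start and end of the pattern, as
they are not needed: the regex is tried at every position of the input
anyway. With Singleline, the leading .* first swallows the whole page and
then backtracks through it, and when the text is absent that is probably
repeated from every start position, which would explain the minutes-long
runs on a 150 KB page. As Rad said, I would also remove the IgnoreCase
option, since case-insensitive matching makes the Regex class do extra
work. I would then adjust the pattern as such: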

"[Ss]cript\s[Tt]imed\s[Oo]ut"

Character classes are very fast, but this pattern only allows for upper or
lower case in the initial letters.
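
As a rough sketch of the suggestion above, the check can be reduced to a simple Boolean test. The module name PageChecks and the function name ContainsScriptTimeout are made up for illustration; only Regex.IsMatch and the pattern come from this thread.

```vb
Imports System.Text.RegularExpressions

Module PageChecks
    ' Hypothetical helper (name is an assumption, not from the original post).
    ' Returns True when the page contains the timeout text.
    ' No leading or trailing ".*" is needed, because IsMatch already looks
    ' for the pattern at every position in the input.
    Function ContainsScriptTimeout(ByVal responsePage As String) As Boolean
        Return Regex.IsMatch(responsePage, "[Ss]cript\s[Tt]imed\s[Oo]ut")
    End Function
End Module
```

Calling ContainsScriptTimeout(ResponsePage) would then replace the Regex.Match call from the question, with no Match object to inspect afterwards.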

The best course of action is to initialise the Regex once (instance
members of the Regex class are thread safe), assign it to a static
member and then use the instance IsMatch method.
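
A minimal sketch of that arrangement, assuming a hypothetical ResponseChecker class (the class, field and method names are illustrative only). RegexOptions.Compiled is included as one way of pre-compiling the expression; whether it pays off depends on how often the check runs.

```vb
Imports System.Text.RegularExpressions

Public Class ResponseChecker
    ' Built once when the class is first used and shared by all callers.
    ' RegexOptions.Compiled trades a slower start-up for faster repeated matching.
    Private Shared ReadOnly TimeoutRegex As New Regex( _
        "[Ss]cript\s[Tt]imed\s[Oo]ut", RegexOptions.Compiled)

    ' Hypothetical wrapper around the instance IsMatch method.
    Public Shared Function HasTimedOut(ByVal responsePage As String) As Boolean
        Return TimeoutRegex.IsMatch(responsePage)
    End Function
End Class
```

Each page would then be checked with ResponseChecker.HasTimedOut(ResponsePage), reusing the same Regex instance every time.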

Andy.
 