strings vs regular expressions

Guest

Hi,
I need to compare strings or check for substrings in a given string.
Which would give better performance: string comparison functions or
regular expressions?
 
Hello AVL,

AFAIK, String.Contains will be the fastest, because it only calls IndexOf,
whereas a regex does a lot of wasted processing.

---
WBR, Michael Nemtsev [.NET/C# MVP].
My blog: http://spaces.live.com/laflour
Team blog: http://devkids.blogspot.com/

"The greatest danger for most of us is not that our aim is too high and we
miss it, but that it is too low and we reach it" (c) Michelangelo

 
Hello,

But String.IndexOf has a very bad implementation. If you want a fast string
search, look for a .NET implementation of the Boyer-Moore algorithm; this
is also used in the regular expression internals.
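
For reference, here is a minimal sketch of Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that uses only the bad-character shift table. It is written in Python purely for illustration; it is not the .NET regex engine's internal code, and the name `horspool_find` is hypothetical. The point it shows is the preprocessing step: the shift table is built once per pattern, which then lets the search skip ahead by more than one character after a mismatch.

```python
def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Boyer-Moore-Horspool: a simplified Boyer-Moore using only the
    bad-character rule. Hypothetical helper for illustration.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0  # same convention as str.find
    if m > n:
        return -1

    # Preprocessing ("pre-work"): for each character in pattern[:-1], store how
    # far its last occurrence is from the end of the pattern. Characters not in
    # the table allow a full shift of m.
    shift = {}
    for i, ch in enumerate(pattern[:-1]):
        shift[ch] = m - 1 - i

    pos = 0
    while pos <= n - m:
        # Compare right to left at the current alignment.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Shift by the table entry for the text character that sits under
        # the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return -1


print(horspool_find("here is a simple example", "example"))  # 17
```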

Depending on the length of the text being searched and the frequency, you
might want to consider a precompiled regex.

Anyway, you should perform some performance testing yourself. It really
depends on the circumstances.
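
As a starting point for that kind of testing, here is a small timing harness. It is a sketch in Python (in .NET you would time String.Contains, IndexOf and Regex the same way); the function name `benchmark` and the default run counts are arbitrary assumptions. Note that Python's `re` module caches recently compiled patterns, so the plain `re.search` case still avoids most of the recompilation cost.

```python
import re
import timeit


def benchmark(text: str, needle: str, repeat: int = 5, number: int = 10_000) -> dict:
    """Time several ways of testing whether needle occurs in text.

    Returns {approach name: best total seconds for `number` calls}.
    """
    compiled = re.compile(re.escape(needle))  # compiled once, outside the timing
    approaches = {
        "in operator": lambda: needle in text,
        "str.find": lambda: text.find(needle) != -1,
        "re.search": lambda: re.search(re.escape(needle), text) is not None,
        "precompiled regex": lambda: compiled.search(text) is not None,
    }
    # Sanity check: every approach must give the same answer.
    answers = {name: fn() for name, fn in approaches.items()}
    if len(set(answers.values())) != 1:
        raise RuntimeError(f"approaches disagree: {answers}")

    # Take the best of several runs to reduce noise from other processes.
    return {
        name: min(timeit.repeat(fn, repeat=repeat, number=number))
        for name, fn in approaches.items()
    }


if __name__ == "__main__":
    haystack = "lorem ipsum " * 1_000 + "needle"
    for name, seconds in benchmark(haystack, "needle").items():
        print(f"{name:20s} {seconds:.4f} s")
```

Results depend heavily on text length, needle length and where (or whether) the needle occurs, so it is worth running this with data that looks like your real input.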

Best regards,
Henning Krause
 
Hello Henning Krause [MVP - Exchange],


H> Anyway, you should perform some performance testing yourself. It
H> really depends on the circumstances.

That's the point, because we don't know what the OP is looking for.
 
> But String.IndexOf has a very bad implementation. If you want a fast string
> search, look for a .NET implementation of the Boyer-Moore algorithm; this
> is also used in the regular expression internals.

I wouldn't say that IndexOf has a "very bad" implementation. In *some*
cases it won't be as fast as doing the "pre-work" involved in
Boyer-Moore, but I suspect that in the vast majority of real-world cases
it's far quicker to use the "brute force" method, given that you're only
looking for the string once (as far as String.IndexOf is concerned; you
may be calling it multiple times, of course).
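
To make the "brute force" idea concrete, here is a sketch of a naive substring search with no preprocessing at all. It is an illustration in Python, not the actual source of .NET's String.IndexOf (which the thread doesn't show), and `brute_force_find` is a hypothetical name. Its worst case is O(n*m) character comparisons, but when mismatches usually happen on the first character or two, it does little work per position and has no table to build up front.

```python
def brute_force_find(text: str, pattern: str) -> int:
    """Naive search: try every alignment, compare left to right.

    Returns the index of the first occurrence, or -1. No preprocessing.
    """
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        j = 0
        # Stop at the first mismatching character for this alignment.
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            return start
    return -1


print(brute_force_find("find the needle in the haystack", "needle"))  # 9
```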

I suppose String.IndexOf could apply some heuristics and guess whether
it's worth building the tables (or whatever) for Boyer-Moore, but as I
say, in the vast majority of real cases it won't make any odds.

> Depending on the length of the text being searched and the frequency, you
> might want to consider a precompiled regex.
>
> Anyway, you should perform some performance testing yourself. It really
> depends on the circumstances.

Agreed. If you know you're going to have to search for the same string
lots of times in a performance-critical environment, it may be worth
using regular expressions. I would use Contains until I'd actually
proved it was a bottleneck though :)
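
A minimal sketch of the "search for the same string lots of times" case: the pattern is compiled once and reused for every line, next to the plain substring test you would start with. Python is used for illustration; `count_matching_lines` is a hypothetical helper, and `re.escape` makes the needle match literally. Whether the compiled regex actually wins is exactly what you would need to measure.

```python
import re


def count_matching_lines(lines, needle: str, use_regex: bool = False) -> int:
    """Count lines containing needle, via a substring test or a precompiled regex."""
    if not use_regex:
        # Default: a plain substring test.
        return sum(1 for line in lines if needle in line)
    # Compile once, then reuse the same pattern object for every line.
    pattern = re.compile(re.escape(needle))
    return sum(1 for line in lines if pattern.search(line))


log = ["GET /index.html", "POST /login", "GET /about.html", "GET /index.html?x=1"]
print(count_matching_lines(log, "index.html"))                  # 2
print(count_matching_lines(log, "index.html", use_regex=True))  # 2
```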

Jon
 
Hi AVL,

Just to clear things up regarding regular expressions versus string
functions: use regular expressions when looking for a *pattern* of
characters in a string, where the actual characters can differ from one
match to the next, and string functions when looking for fixed substrings.
By "pattern" I mean, for example, a hyperlink in an HTML document.

A hyperlink is a string that must follow certain rules. It must begin with
the character sequence "<a", followed by one or more whitespace characters,
followed by zero or more attribute name=value pairs, followed by the ">"
character. This is followed by a string of text, which is followed by the
"</a>" character sequence. Note that only a few of the characters are
specified, and you don't know what the rest of them will be. So how do you
look for a string that satisfies these rules? Example:

(?m)(?i)(?<=<a)(?:(?:\s+href=(?<href>[^>]+))|(?:\s+[^=>]+=[^>]+))*>(?<innerHtml>[^<]*)(?=</a>)

The above is a regular expression that identifies substrings that satisfy
those rules. In addition, it captures 2 groups: one for the href attribute
value (the link target) and one for the innerHtml (the link text) of the
anchor.

You could not use a string function to find this pattern. Generally, string
functions are faster than regular expressions, but when looking for patterns
(groups of characters that satisfy rules), regular expressions are the
fastest method.
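
Here is the same pattern applied in Python, as a sketch. Python's `re` module spells named groups `(?P<name>...)` rather than .NET's `(?<name>...)`, and the `(?m)(?i)` inline options are passed as flags instead; otherwise the pattern is unchanged. `extract_links` is a hypothetical helper, and stripping the surrounding quotes from the href value is an extra step not in the original pattern.

```python
import re

# The pattern above, with .NET's (?<name>...) groups rewritten as (?P<name>...).
ANCHOR = re.compile(
    r"(?<=<a)"                                                # preceded by "<a"
    r"(?:(?:\s+href=(?P<href>[^>]+))|(?:\s+[^=>]+=[^>]+))*"  # attributes
    r">(?P<innerHtml>[^<]*)"                                  # ">" then the link text
    r"(?=</a>)",                                              # followed by "</a>"
    re.IGNORECASE | re.MULTILINE,
)


def extract_links(html: str) -> list:
    """Return (href, link text) pairs for each anchor the pattern matches."""
    links = []
    for m in ANCHOR.finditer(html):
        href = (m.group("href") or "").strip("\"'")  # drop surrounding quotes
        links.append((href, m.group("innerHtml")))
    return links


print(extract_links('<a href="/one">One</a> and <A HREF="/two">Two</A>'))
# [('/one', 'One'), ('/two', 'Two')]
```

As written, the `href` group runs all the way to the closing `>`, so an anchor such as `<a href="x" class="y">` would capture `"x" class="y"` as the href; the pattern illustrates matching by rules rather than being a full HTML parser.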

--
HTH,

Kevin Spencer
Microsoft MVP

 
Hi,
> Agreed. If you know you're going to have to search for the same string
> lots of times in a performance-critical environment, it may be worth
> using regular expressions. I would use Contains until I'd actually
> proved it was a bottleneck though :)

"The First Rule of Program Optimization: Don't do it. The Second Rule of
Program Optimization (for experts only!): Don't do it yet." - Michael A.
Jackson
 