> Looping through the string in a StringBuilder is probably the
> safest way to do this:
> [...]
> int count = 0;
> for (int i = 0; i < sb.Length; )
> {
>     if (AreEqual(sb, search, i))
>     {
>         sb.Remove(i, search.Length);
>         sb.Insert(i, replacement);
>         i += replacement.Length;
>         count++;
>     }
>     else
>     {
>         i++;
>     }
> }
> [...]
> It might be faster to use sb.Replace(search, replacement, i,
> search.Length) instead of the sb.Remove()/sb.Insert() pair; I'm
> not sure, but they won't differ that much.
If you simply call StringBuilder.Replace(string, string, int, int)
instead of having your own AreEqual() method followed by a call to
Remove() and Insert(), the performance should be practically
identical, but you wouldn't get any information about how many
replacements occurred.
Alternatively, if you still call AreEqual() and then call
StringBuilder.Replace(string, string, int, int), you're duplicating
effort (which costs performance), because
StringBuilder.Replace(string, string, int, int) has to actually do
the string comparison again. That would actually be _slower_ than
your original code.
You could do a little hack by searching for the first character that
differs between the search and replacement strings (as an
initialization, not as part of the loop), and then bumping a counter
after each call to StringBuilder.Replace() based on whether the
character at the same offset within the current StringBuilder has
changed. That would be only slightly slower than just calling
StringBuilder.Replace(string, string, int, int), but would include
the count.
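
Something like this rough sketch, for example (untested; it assumes
the search and replacement strings differ at an offset that lies
within both of them, so the case where one is a prefix of the other
would need separate handling):

using System;
using System.Text;

static int ReplaceAndCount(StringBuilder sb, string search,
    string replacement)
{
    // Initialization, outside the loop: find the first offset at
    // which the two strings differ.
    int diff = 0;
    int limit = Math.Min(search.Length, replacement.Length);
    while (diff < limit && search[diff] == replacement[diff])
    {
        diff++;
    }

    int count = 0;
    for (int i = 0; i + search.Length <= sb.Length; )
    {
        char before = sb[i + diff];
        sb.Replace(search, replacement, i, search.Length);
        if (sb[i + diff] != before)
        {
            // The character at the differing offset changed, so a
            // replacement must have happened at this position.
            count++;
            i += replacement.Length;
        }
        else
        {
            i++;
        }
    }
    return count;
}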
That said, I would hope that any Replace() method in .NET, including
Regex.Replace(), String.Replace(), or StringBuilder.Replace(), would
be faster than the code you posted. The main reason is that all of
those methods have the opportunity to optimize the construction of
the new string, whereas your example doesn't optimize at all.
At the very least, I would not use the Remove()/Insert() pattern
you've shown. Instead, I would use a String as input, and a
StringBuilder as output, appending text segments to the output
StringBuilder as I scan the input String. That way the code avoids
having to repeatedly shift your character buffer in the StringBuilder
(which happens _twice_ for each replacement in your code). That's
exactly the kind of optimization I'd expect to find inside the .NET
classes.
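
For example, a rough sketch of what I mean (untested; the method
name and the "out" parameter for the count are just for
illustration):

using System;
using System.Text;

static string ScanAndReplace(string input, string search,
    string replacement, out int count)
{
    StringBuilder output = new StringBuilder(input.Length);
    count = 0;
    int start = 0;   // start of input not yet copied to the output

    int match;
    while ((match = input.IndexOf(search, start,
        StringComparison.Ordinal)) >= 0)
    {
        // Copy the unchanged segment, then the replacement.
        output.Append(input, start, match - start);
        output.Append(replacement);
        count++;
        start = match + search.Length;
    }

    output.Append(input, start, input.Length - start);
    return output.ToString();
}

The output builder only ever appends; no characters are moved once
they've been written.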
It might even be worthwhile to defer creation of the output
StringBuilder until you detect the first match that needs to be
replaced, if a significant fraction of inputs can be expected to
need no replacements at all.
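
In the sketch above, that would look something like this at the top
of the method (again untested):

int match = input.IndexOf(search, StringComparison.Ordinal);
if (match < 0)
{
    count = 0;
    return input;   // common case: no matches, no StringBuilder at all
}
StringBuilder output = new StringBuilder(input.Length);
// ...then continue with the same scan-and-append loop, using
// 'match' for the first iteration instead of searching again...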
> It is probably a lot faster than using a regex, though I haven't
> done any measurements.
I would expect Regex to be on par with other explicit mechanisms
like that, especially given the need to count the replacements
(which the built-in non-Regex Replace() methods can't report, so
they would effectively require searching the text twice). If
performance is an issue, then a "scan and build" approach as I
suggest above is probably slightly faster than using the built-in
Replace() methods, simply because you can incorporate a count into
the replacement logic.
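
For what it's worth, if Regex turns out to be fast enough, the count
comes almost for free, since Regex.Replace() can take a
MatchEvaluator delegate (assuming the same input, search, and
replacement variables as above):

using System.Text.RegularExpressions;

int count = 0;
string result = Regex.Replace(input, Regex.Escape(search),
    m => { count++; return replacement; });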
All that said, if performance is an issue (and there's nothing in the
OP to suggest it is), the only way to know for sure what the best
solution is would be to try the different alternatives and measure
them. Even theoretical advantages and disadvantages may be
irrelevant for a typical data set, and intuition is a terrible way to
measure performance.
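
The measuring doesn't need to be fancy, either. A minimal Stopwatch
harness along these lines (the iteration count is arbitrary) is
usually enough to compare the alternatives:

using System;
using System.Diagnostics;

static void Time(string label, Action candidate)
{
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < 10000; i++)
    {
        candidate();   // run one of the alternatives being compared
    }
    sw.Stop();
    Console.WriteLine("{0}: {1} ms", label, sw.ElapsedMilliseconds);
}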
For best performance, it may be that none of the suggestions offered
so far is suitable. There's an optimized text search algorithm, the
name of which I can't recall at the moment, that could probably be
adapted; if nothing else, a degenerate state-graph implementation
(degenerate in that there's only one string to search for) would
probably work too. Either approach would avoid performing a full
string comparison at each character index of the original string
(consider an original string "aaaaaaaaaaaaaaaaaaaaaaaaaa" where you
want to replace all occurrences of "aaaaaab" with something).
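
For what it's worth, one well-known way to build such a state graph
is the Knuth-Morris-Pratt failure function (which may or may not be
the algorithm I'm thinking of). A rough, untested sketch, assuming a
non-empty search string; the scan never backs up in the input, so
the degenerate "aaaa..." case above costs no more than any other:

using System;
using System.Text;

static string StateGraphReplace(string input, string search,
    string replacement, out int count)
{
    // fail[i] is the length of the longest proper prefix of search
    // that is also a suffix of search.Substring(0, i + 1).
    int[] fail = new int[search.Length];
    for (int i = 1, k = 0; i < search.Length; i++)
    {
        while (k > 0 && search[i] != search[k])
            k = fail[k - 1];
        if (search[i] == search[k])
            k++;
        fail[i] = k;
    }

    StringBuilder output = new StringBuilder(input.Length);
    count = 0;
    int state = 0;    // number of search characters matched so far
    int copied = 0;   // start of input not yet copied to the output

    for (int i = 0; i < input.Length; i++)
    {
        while (state > 0 && input[i] != search[state])
            state = fail[state - 1];
        if (input[i] == search[state])
            state++;
        if (state == search.Length)
        {
            // Full match ending at i: copy the text before the
            // match, append the replacement, and start over.
            output.Append(input, copied, i + 1 - search.Length - copied);
            output.Append(replacement);
            count++;
            copied = i + 1;
            state = 0;
        }
    }

    output.Append(input, copied, input.Length - copied);
    return output.ToString();
}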
But even there, as I said, there's no way to know for sure without
measuring. Performance of the various choices is to some extent going
to be data dependent; liabilities that exist in the general case
might not really be that much of a problem. For example, if dealing
with essentially random data, it's not too terrible to just keep
comparing over and over at an incremented index, because those
comparisons will normally terminate quickly when there's no match.
In other words, even the theoretically worst-case implementation
might not turn out to be much different from the more optimized ones.
Until there's a performance problem shown, the OP should stick with
whatever solution _reads_ the best, and is the most maintainable. And
if there is a performance problem shown, measuring each viable
alternative is the only way to know for sure which will be fastest.
Pete