> Hey Pavel, thanks for the reply.
> strParse is typically a lengthy string, easily several thousand
> characters in length. That is why I use Contains().
So you don't care about the rest of the string? But what's in there,
and can't it accidentally match one of your other keywords?
Another problem is that, for a large string, you're going to waste a
lot of time on those repeated Contains calls, since each one scans the
string from the start again. With large strings and a lot of keywords
to try (and your first post seems to imply that you have more than
just the four in your example), that will be noticeable.
One way that _might_ work faster is to use a regex here. Now, I'm not
sure if it will actually be faster or not - but, given the triviality
of such a regex, and the fact that it can be compiled to IL (with
RegexOptions.Compiled), I strongly suspect that it will be, as it only
has to make one pass over the string instead of one pass per keyword.
So what you do is this:
// Requires: using System.Linq; using System.Text.RegularExpressions;
private static string[] categories = { "Consult", "Declaration",
    "Packing", ... };

private static Regex categoriesRegex = new Regex(
    // This produces something like "(CONSULT)|(DECLARATION)|(PACKING)|..."
    string.Join("|", categories.Select(c => "(" + c.ToUpper() + ")").ToArray()),
    // Compile the pattern to IL, as mentioned above.
    RegexOptions.Compiled
);
Note how every keyword ends up in its own capture group. Now, the
function itself:
public string FindCategory(string strParse) {
    Match m = categoriesRegex.Match(strParse);
    // Determine which group matched (this assumes that only one matched,
    // otherwise it'll just pick the first one). Note that group numbers
    // start from #1, as #0 is the entire regex.
    int n = 0;
    for (int i = 1; i < m.Groups.Count; ++i) {
        if (m.Groups[i].Success) {
            n = i;
            break;
        }
    }
    return (n > 0) ? categories[n - 1] : "Not Found";
}
So try this. However - and I cannot stress enough how important this
is - don't trust my word that it's going to be faster (after all, I'm
just guessing!). Profile it. Throw a few strings at it that are likely
to be of the size you'll encounter in production, and see how large
the improvement is. If it's marginal, then you might want to just go
with my earlier suggestion (that also calls Contains() repeatedly, as
your initial sample does), as it's likely to be much more readable and
easier to understand for anyone maintaining that code.
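
For the profiling itself, something along these lines could serve as a
starting point. It is only a rough sketch under a few assumptions: the
three-keyword list, the synthetic 5000-character input, the iteration
count and the names CategoryBenchmark, FindWithContains, FindWithRegex
and TimeMs are all made up for illustration, and the input is assumed
to be upper-case already, since both lookups are case-sensitive.
Replace them with your real keyword list and real strings.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class CategoryBenchmark
{
    // Hypothetical keyword list; substitute the real one.
    static readonly string[] Categories = { "Consult", "Declaration", "Packing" };

    // Upper-cased once up front so the Contains loop does not pay for it on every call.
    static readonly string[] UpperCategories =
        Categories.Select(c => c.ToUpper()).ToArray();

    // One capture group per keyword: "(CONSULT)|(DECLARATION)|(PACKING)".
    static readonly Regex CategoriesRegex = new Regex(
        string.Join("|", UpperCategories.Select(c => "(" + Regex.Escape(c) + ")").ToArray()),
        RegexOptions.Compiled);

    // Baseline: one Contains call, and therefore one scan, per keyword.
    public static string FindWithContains(string text)
    {
        for (int i = 0; i < UpperCategories.Length; ++i)
        {
            if (text.Contains(UpperCategories[i])) return Categories[i];
        }
        return "Not Found";
    }

    // Candidate: a single Match call; the successful group identifies the keyword.
    public static string FindWithRegex(string text)
    {
        Match m = CategoriesRegex.Match(text);
        for (int i = 1; i < m.Groups.Count; ++i)
        {
            if (m.Groups[i].Success) return Categories[i - 1];
        }
        return "Not Found";
    }

    // Times repeated calls of one lookup function, in milliseconds.
    static long TimeMs(Func<string, string> find, string text, int iterations)
    {
        find(text); // warm-up call, so JIT and regex compilation are not timed
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; ++i) find(text);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Synthetic input: 5000 filler characters, then a keyword near the end.
        string text = new string('X', 5000) + " PACKING LIST";
        const int iterations = 1000;

        Console.WriteLine("Contains: {0} ms", TimeMs(FindWithContains, text, iterations));
        Console.WriteLine("Regex:    {0} ms", TimeMs(FindWithRegex, text, iterations));
    }
}
```

Keep in mind that the two lookups are not quite equivalent when more
than one keyword is present: the Contains loop returns the first
keyword in list order, while the regex returns whichever keyword
occurs earliest in the string. Timings on a synthetic string like this
one say little by themselves, so run it against production-sized data
before drawing any conclusions.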
Also, whichever way you go, please post the results of your profiling.
I'd be curious to find out how good I am at this guessing game, and
whether my suggestion has merit or not; and probably so would anyone
else who might stumble onto this thread searching for a solution to a
similar problem.