# of occurences of string in another string

J

Jason Gleason

What's the most efficient way to get the number of occurences of a certain
string in another string..for instance i'm using the following code right
now...

private int CharacterCounter(String text,String Character)

{

int count = 0;

for (int i = 0; i < text.Length; i++)

{

if(text.Substring(i,1) == Character)

{

count++;

}

}

return count;

}

The problem with this way is it's not the fastest doing big strings multiple
times over the course of a program run. Is there an easier/faster way to do
it using regular expressions?
 
J

Justin Rogers

Regular expressions are going to be slower.

I wrote something a while back that stores the offset of all string
occurences within another string. You can see that here:
http://weblogs.asp.net/justin_rogers/archive/2004/03/14/89545.aspx
The relavent code is in SplitByString and appears below as well.

while(index < testString.Length) {
int indexOf = testString.IndexOf(split, index);
if ( indexOf != -1 ) {
offsets[offset++] = indexOf;
index = (indexOf+1);
} else {
index = testString.Length;
}
}

Now, what about fixing that code up to just count?


int index = 0;
while(index < testString.Length) {
int indexOf = testString.IndexOf(split, index);
if ( indexOf != -1 ) {
offsets[offset++] = indexOf;
index = (indexOf+1);
} else {
index = testString.Length;
}
}
 
J

Justin Rogers

Okay, I hit the send before done button again...

int index = 0;
int count = 0;

while(index < testString) {
int indexOf = testString.IndexOf(splitString, index);
if ( indexOf != -1 ) {
count++; index = (indexOf + splitString.Length);
} else { index = testString.Length; }
}

That should get you the number of occurences correctly.
Also note that an issue in my original code didn't take into
account split string length for purposes of offseting. That
makes a big difference in the output of something like (match
all occurences of (aa) in (aaaa). Normally that should be two,
but my old method would have returned three. I guess that is
a highly ambiguous case.


--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers

Justin Rogers said:
Regular expressions are going to be slower.

I wrote something a while back that stores the offset of all string
occurences within another string. You can see that here:
http://weblogs.asp.net/justin_rogers/archive/2004/03/14/89545.aspx
The relavent code is in SplitByString and appears below as well.

while(index < testString.Length) {
int indexOf = testString.IndexOf(split, index);
if ( indexOf != -1 ) {
offsets[offset++] = indexOf;
index = (indexOf+1);
} else {
index = testString.Length;
}
}

Now, what about fixing that code up to just count?


int index = 0;
while(index < testString.Length) {
int indexOf = testString.IndexOf(split, index);
if ( indexOf != -1 ) {
offsets[offset++] = indexOf;
index = (indexOf+1);
} else {
index = testString.Length;
}
}



Jason Gleason said:
What's the most efficient way to get the number of occurences of a certain
string in another string..for instance i'm using the following code right
now...

private int CharacterCounter(String text,String Character)

{

int count = 0;

for (int i = 0; i < text.Length; i++)

{

if(text.Substring(i,1) == Character)

{

count++;

}

}

return count;

}

The problem with this way is it's not the fastest doing big strings multiple
times over the course of a program run. Is there an easier/faster way to do
it using regular expressions?
 
J

Jimi

Jason Gleasonwrote:
Is there an easier/faster way to do
it using regular expressions?

Regular expressions are great fun for your spare time, but in addition
to being cryptic and unmaintainable, they'll kill the performance of
your app.
 
M

mikeb

Jason said:
What's the most efficient way to get the number of occurences of a certain
string in another string..for instance i'm using the following code right
now...

private int CharacterCounter(String text,String Character)

{

int count = 0;

for (int i = 0; i < text.Length; i++)

{

if(text.Substring(i,1) == Character)

{

count++;

}

}

return count;

}

The problem with this way is it's not the fastest doing big strings multiple
times over the course of a program run. Is there an easier/faster way to do
it using regular expressions?

If you're searching for exact matches, regex's will not buy you anything
over a well-designed string search. if you're going to search for
patterns, regex's are a good tool.

If you want to count instances of a single character in a string (like
your example) you might be able to tune it a bit (for example, "text"
is probably faster than "text.Substring(i,1)"), but your basic algorithm
is probably fine - if you're going to search the text string only once.

If you're going to count character instances in the same text string
multiple times, it would probably pay to run through the string once,
tallying up all the counts for the various characters into an array
that's indexed by the character value. Subsequent character counts are
just an indexed array access after that.

If you want to change the search from simply counting instances of a
particular character to counting actual instances of strings, you'll
probably want to do some research on the Boyer-Moore string search
algorithm. The framework's String.IndexOf() method might be a good
starting point for you, but I don't think it'll be as fast as a
hand-coded Boyer-Moore, since IndexOf() takes into account culture
information.

Do a net search on Boyer-Moore - it's a not-too-complex, but very, very
good algorithm for performing exact string matches.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top