Comparing strings

  • Thread starter: Dennieku

Dennieku

Does anybody have an idea how to compare two strings?
What I want to know is the percentage of the differences between the strings

Thx,
Dennieku
 
Dennieku said:
Does anybody have an idea how to compare two strings?
What I want to know is the percentage of the differences
between the strings

Could you define exactly what you mean by "percentage of the
differences"? If you can define it *exactly*, that will probably
suggest a way of calculating it simply.
 
I think what you're looking for (basically) is the Levenshtein edit-distance
algorithm. This algorithm takes two strings as an input. It then counts
the number of corrections (insertions, deletions or substitution of
individual characters) that are necessary to make the strings the same.

For example, with the inputs "Hello World" and "Hllo Woorld", it will return
2, since two corrections are necessary to turn the second string into the
first: insert an "e" in the first word, and delete an "o" in the second word.

This isn't built into the .NET Framework, so you'll have to build your own.
Try a Google search on "edit distance algorithm." There's an explanation of
the algorithm here, with some VB6 code:

http://www.merriampark.com/ld.htm
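
The edit distance described above can be sketched roughly as follows. This is a minimal illustration in Python rather than C# or VB, and it is not the code from the linked page. The function names `levenshtein` and `percent_difference` are made up for this example, and dividing by the length of the longer string is only one assumed way of turning the distance into a "percentage of differences"; the thread does not define one.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # and the first j characters of b. Start with the empty prefix of a.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def percent_difference(a: str, b: str) -> float:
    """Edit distance as a percentage of the longer string's length.
    ASSUMPTION: this normalisation is illustrative, not from the thread."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return 100.0 * levenshtein(a, b) / longest
```

With the two example strings from this post, `levenshtein("Hello World", "Hllo Woorld")` gives 2, and `percent_difference` gives 2 out of 11 characters, or roughly 18%. Other normalisations (for instance dividing by the average length) are equally defensible, which is why the exact definition matters.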
 
Jon Skeet said:
Could you define exactly what you mean by "percentage of the
differences"? If you can define it *exactly*, that will probably
suggest a way of calculating it simply.

I think he means something like DIFFERENCE or
SOUNDEX in SQL Server.

Unfortunately there's nothing like this built into
.NET and it's not a simple algorithm, but I'm sure
you could probably find some examples on some
academic pages. I think it has something to do
with phonetic analysis of the words and then
comparing the likeness of the placement of the
various phonetic parts relative to the other
string. MSR might have something like this
already, they are always surprising me with
new projects.
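
For reference, the phonetic idea mentioned above can be sketched like this. It is a minimal Python illustration of the classic American Soundex coding rules, not SQL Server's implementation, which may differ in edge cases. The name `soundex` and the helper table `_CODES` are made up for this example. Two words that produce the same code are treated as sounding alike.

```python
# Consonant groups that share a Soundex digit; vowels, H, W and Y get no digit.
_CODES = {}
for digit, group in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in group:
        _CODES[letter] = str(digit)


def soundex(word: str) -> str:
    """Return the 4-character classic Soundex code, or "" if word has no letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                # the first letter is kept as-is
    last = _CODES.get(letters[0], "")  # its digit still suppresses a repeat
    for c in letters[1:]:
        if c in "HW":
            continue                   # H and W do not separate equal digits
        code = _CODES.get(c, "")
        if code and code != last:
            result += code
        last = code                    # a vowel resets this to ""
        if len(result) == 4:
            break
    return result.ljust(4, "0")        # pad short codes with zeros
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, so the two names would count as a phonetic match, while a character-by-character comparison would report them as different.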

-c
 
Robert Jacobson said:
I just noticed your cross-post to .dotnet.languages.csharp -- the VB example
probably isn't what you're looking for. <g> That link also has examples in
Java and C++ that would be easy to translate to C#.

In particular, I'd be more than happy to translate it from Java to C#,
if this is indeed the algorithm required.
 