Testing for names that sounds alike

  • Thread starter Thread starter Anthony Fontana
  • Start date Start date
A

Anthony Fontana

When entering new clients, we sometimes do not get correct spelling of their
names. Is there a way to test for names that sound alike but not necessarily
spelled similarly, esp in the first few characters.

Thanks.
 
Google Access groups on SOUNDEX code.

Here is one I put together long ago. You would have to compare soundex code
to soundex code and all that calculation could be slow.

WHERE fSoundex(SomeField) = fSoundex(NewValue) A

'=================== VBA code function follows =======================
Public Function fSoundex(strToEncode) As String
'AUTHOR: John Spencer
'LAST MODIFIED: June 30, 1999
'DESCRIPTION: Returns a string encoded as soundex code
'This version parallels the SOUNDEX used in MS SQL 6.5
'Procedure to encode string as soundex code using using the following rules
'Remove all w and h
'With exception of 1st character remove all aeiouy
'encode all letters in string
'collapse adjacent matching digits into one digit (3333 = 3)
'remove any zero values
'expand the code to 6 digits by adding zeroes to the end
'replace the first digit with the first letter of the original name
'KEEP first FOUR characters

Dim strSource As String, strEncode As String
Dim intPosition As Integer
Dim intLength As Integer
Dim strTEMP As String

On Error GoTo fSoundex_Error
'Get rid of leading & trailing spaces
strSource = Trim(strToEncode)

If Len(strSource) < 2 Then
strEncode = strSource & "000000"
Else
'Loop through remaining characters and encode them
For intPosition = 2 To Len(strSource)
Select Case Mid(strSource, intPosition, 1)
Case "b", "f", "p", "v" 'bfpv
strEncode = strEncode & "1"
Case "c", "g", "j", "k", "q", "s", "x", "z" 'cgjkqsxz
strEncode = strEncode & "2"
Case "d", "t" 'dt
strEncode = strEncode & "3"
Case "l" 'l
strEncode = strEncode & "4"
Case "m", "n" 'mn
strEncode = strEncode & "5"
Case "r" 'r
strEncode = strEncode & "6"
Case " " 'Space
strEncode = strEncode & "9"
Case Else
strEncode = strEncode & "0"
End Select
Next intPosition

If Len(strEncode) > 1 Then 'Remove adjacent duplicate codes
intLength = Len(strEncode)
For intPosition = intLength To 2 Step -1
If Mid(strEncode, intPosition - 1, 1) = _
Mid(strEncode, intPosition, 1) Then
strEncode = Mid(strEncode, 1, intPosition - 1) & _
Mid(strEncode, intPosition + 1)
End If
Next intPosition
End If


If Len(strEncode) > 1 Then 'REMOVE ZEROES
intLength = Len(strEncode)
For intPosition = 1 To intLength
If Mid(strEncode, intPosition, 1) <> "0" Then
strTEMP = strTEMP & Mid(strEncode, intPosition, 1)
End If
Next intPosition
strEncode = strTEMP
End If

strEncode = UCase(Left(strSource, 1)) & Mid(strEncode & "000000", 1, 5)

'if there is a space in the name then truncate at the space
If InStr(strEncode, "9") Then
strEncode = Left(strEncode, InStr(strEncode, "9") - 1) & "00000"
End If

End If 'Something is there

'Truncate value to 4 characters to conform with MS SQL 6.5 Soundex length
fSoundex = Mid(strEncode, 1, 4)
Exit Function

fSoundex_Error:
MsgBox Err.Description

End Function

John Spencer
Access MVP 2002-2005, 2007-2009
The Hilltop Institute
University of Maryland Baltimore County
 
John:

It seems to me that the SoundEx function worked on names
and could say that Smyth is a lot like Smith. Can it, or your
function tell me that these two strings are alike?

ONC123 prot assign.pdf
ONC123_Protocol_Assignment.pdf

It's probably obvious what I am doing here. Got a directory
full of files that don't follow naming conventions. They are all
close, but spaces where underscores belong and abbrevs.

Assuming I replaced underscores with spaces before I did
the compare (or perhaps not), would your fSoundex function
work on this? If not, something else? Fuzzy compare Fn?
 
No SOUNDEX is designed for matching names and even there it is not highly
accurate.

Matching something like that is going to be VERY difficult. I would look at
creating a custom vba function to try to regularize the names.

-- replace underscores with spaces
-- break the name down into pieces and keep the first 4 char of each word.
-- Keep the extension

Something like the following UNTESTED function

Public Function regularName(strIN, Optional iLen As Long = 4) As String
Dim StrOut As String
Dim vWords As Variant
Dim I As Long

If Len(Trim(strIN & "")) = 0 Then
regularName = strIN
Else

strIN = Replace(strIN, "_", " ")
vWords = Split(strIN, " ")
For I = 0 To UBound(vWords)
If Len(vWords(I)) > 0 Then
If InStr(vWords(I), ".") > 0 Then
StrOut = StrOut & " " & _
Left(Left(vWords(I), Len(vWords(I)) - 4), iLen) & _
Right(strIN, 4)
Else
StrOut = StrOut & " " & Left(vWords(I), iLen)
End If
End If
Next I
regularName = Mid(StrOut, 2)

End If

End Function

John Spencer
Access MVP 2002-2005, 2007-2009
The Hilltop Institute
University of Maryland Baltimore County
 
Thanks. This gives me a lot to work with. I realize that "sounds alike" is
a vague term too.
 
Back
Top