A
Andrew
Hi all
I am looking for your advice on a project of mine to create a fully-
featured dictionary class.
I have a source file with around 75000 words of UK English. These are
sorted, then loaded into a Dictionary object, each with a long as its
key.
I have methods (code at the end of this message) as follows:
1) CheckWordExists: Check if a given word exists. If so, returns its
position in the dictionary, as a long. If not, returns the compliment
of its nearest neighbour
2) ScrambleWord: Takes a string, and returns the letters randomly
scrambled
3) ReturnRandomWord: Returns a random word from the dictionary. (Code
not listed below)
4) FindAnagram: Given a string, invokes a loop, calling ScrambleWord.
Then passes this to CheckWordExists to determine if this is a genuine
word rather than a random string of characters. If not, passes the
string to ScrambleWord again, to retrieve another anagram and checks
if this is a real word. This loops either until a valid word is
returned (which is not the original word!) or until 100000 attempts
have been made, at which point it gives up!
Now, the first 3 routines work nicely - taking usually less than a
millisecond to return their values.
However, the FindAnagram method is rather slow. I created a test loop
which does the following:
1) Calls ReturnRandomWord to generate a random 8-character word
2) Passes this to FindAnagram to try to find an anagram
3) If this fails to find an anagram, generate another random word and
try again.
Using a couple of datetime variables and a timespan variable, I tested
this a few times, and gained the following information:
1) Each time FindAnagram runs, it takes an average of 4.3 seconds to
either find an anagram, or report failure.
2) Using 8-letter words results in finding an anagram for, on average,
1 word in 23 taking an average of 1:42 minutes.
Now, I'm wondering if I'm missing something - is there a better way to
approach this than the "brute force" method I'm using to find
anagrams?
If anyone has any thoughts or input, I'd be really grateful!
Thanks in advance
Andrew
Code: (dictionaryList is the name of the Dictionary object which
contains the list of words)
-----------------------------------------------------------
Function CheckWordExists(ByVal wordToCheck As String) As Long
'Function takes a word to find in the dictionary
'If found, will return the value of its position in the
dictionary
'If not, will return the compliment of its nearest neighbour
'NOTE- All values start at 1 - ie the first word in the
dictionary is word 1, etc
' This is so that if a word cannot be found, we never return a
-0 where the first word
' is the closest match.
Dim firstWord As String, lastWord As String
Dim firstVal As Double, lastVal As Double
wordToCheck = wordToCheck.ToLower
firstVal = 0
lastVal = dictionaryList.Count - 1
firstWord = dictionaryList.Item(CLng(firstVal)).ToLower
lastWord = dictionaryList.Item(CLng(lastVal)).ToLower
If wordToCheck >= firstWord And wordToCheck <= lastWord Then
Do Until firstWord = wordToCheck Or lastWord = wordToCheck
'Check if first and last vals are already adjacent -
we have no match
If (lastVal - firstVal) < 1 Then Exit Do
If firstWord.CompareTo(wordToCheck) < 0 Then 'first
word is before wordtocheck
If lastWord.CompareTo(wordToCheck) > 0 Then
'Last word is after wordtocheck
lastVal = firstVal + ((lastVal - firstVal) /
2) 'Move last word back to check again
Else 'Move first and last words up to check again
Dim diff As Double
diff = lastVal - firstVal
firstVal = lastVal
lastVal += diff
End If
Else 'First word is after wordtocheck
'Move firstval up by half difference between first
and last vals
firstVal = firstVal + (lastVal - firstVal) / 2
End If
firstWord =
dictionaryList.Item(CLng(firstVal)).ToLower
lastWord = dictionaryList.Item(CLng(lastVal)).ToLower
Loop
End If
'By this point, we are as close as possible to a match
If firstWord = wordToCheck Then
Return CLng(firstVal) + 1
ElseIf lastWord = wordToCheck Then
Return CLng(lastVal) + 1
Else
Dim i As Integer
On Error Resume Next
Do Until (i = Math.Min(firstWord.Length,
Math.Min(lastWord.Length, wordToCheck.Length)) - 1)
'Find whether firstWord or lastWord is the closest
match
'Do this by comparing each, letter by letter, with
wordToCheck
'until one is further away than the other.
'Stop if we run out of letters to compare!
If
Math.Abs(String.CompareOrdinal(wordToCheck.Chars(i),
firstWord.Chars(i))) < _
Math.Abs(String.CompareOrdinal(wordToCheck.Chars(i),
lastWord.Chars(i))) Then
Return -firstVal - 1
ElseIf
Math.Abs(String.CompareOrdinal(wordToCheck.Chars(i),
firstWord.Chars(i))) > _
Math.Abs(String.CompareOrdinal(wordToCheck.Chars(i),
lastWord.Chars(i))) Then
Return -lastVal - 1
End If
i = i + 1
Loop
Return -firstVal - 1 'By this point the two words are
identical to the given word,
'simply having more letters, which differ. Just return the
first one - no point checking further
End If
End Function
--------------------
Public Function ScrambleWord(ByVal OriginalWord As String) As
String
Dim NewWord(OriginalWord.Length - 1) As Char
Dim rtnString As String = String.Empty
Dim newPos As Integer
Randomize()
For i As Integer = 0 To OriginalWord.Length - 1
Do
newPos = Int(Rnd() * (OriginalWord.Length))
Loop Until NewWord(newPos) = Nothing
NewWord(newPos) = OriginalWord.Chars(i)
Next
For i As Integer = 0 To OriginalWord.Length - 1
rtnString &= NewWord(i)
Next
Return rtnString
End Function
-----------------------------------------------
Public Function FindAnagram(ByVal OriginalWord As String) As
String
Dim testWord As String = String.Empty
Dim i As Integer
Do
i = i + 1
If i > 100000 Then Return "-" 'Run 362880 checks, then
give up (value is 9 factorial)
testWord = Me.ScrambleWord(OriginalWord)
Loop Until Me.CheckWordExists(testWord) > 0 AndAlso
testWord.ToLower <> OriginalWord.ToLower
Return testWord
End Function
----------------------------------------------
I am looking for your advice on a project of mine to create a fully-
featured dictionary class.
I have a source file with around 75000 words of UK English. These are
sorted, then loaded into a Dictionary object, each with a long as its
key.
I have methods (code at the end of this message) as follows:
1) CheckWordExists: Check if a given word exists. If so, returns its
position in the dictionary, as a long. If not, returns the compliment
of its nearest neighbour
2) ScrambleWord: Takes a string, and returns the letters randomly
scrambled
3) ReturnRandomWord: Returns a random word from the dictionary. (Code
not listed below)
4) FindAnagram: Given a string, invokes a loop, calling ScrambleWord.
Then passes this to CheckWordExists to determine if this is a genuine
word rather than a random string of characters. If not, passes the
string to ScrambleWord again, to retrieve another anagram and checks
if this is a real word. This loops either until a valid word is
returned (which is not the original word!) or until 100000 attempts
have been made, at which point it gives up!
Now, the first 3 routines work nicely - taking usually less than a
millisecond to return their values.
However, the FindAnagram method is rather slow. I created a test loop
which does the following:
1) Calls ReturnRandomWord to generate a random 8-character word
2) Passes this to FindAnagram to try to find an anagram
3) If this fails to find an anagram, generate another random word and
try again.
Using a couple of datetime variables and a timespan variable, I tested
this a few times, and gained the following information:
1) Each time FindAnagram runs, it takes an average of 4.3 seconds to
either find an anagram, or report failure.
2) Using 8-letter words results in finding an anagram for, on average,
1 word in 23 taking an average of 1:42 minutes.
Now, I'm wondering if I'm missing something - is there a better way to
approach this than the "brute force" method I'm using to find
anagrams?
If anyone has any thoughts or input, I'd be really grateful!
Thanks in advance
Andrew
Code: (dictionaryList is the name of the Dictionary object which
contains the list of words)
-----------------------------------------------------------
Function CheckWordExists(ByVal wordToCheck As String) As Long
'Function takes a word to find in the dictionary
'If found, will return the value of its position in the
dictionary
'If not, will return the compliment of its nearest neighbour
'NOTE- All values start at 1 - ie the first word in the
dictionary is word 1, etc
' This is so that if a word cannot be found, we never return a
-0 where the first word
' is the closest match.
Dim firstWord As String, lastWord As String
Dim firstVal As Double, lastVal As Double
wordToCheck = wordToCheck.ToLower
firstVal = 0
lastVal = dictionaryList.Count - 1
firstWord = dictionaryList.Item(CLng(firstVal)).ToLower
lastWord = dictionaryList.Item(CLng(lastVal)).ToLower
If wordToCheck >= firstWord And wordToCheck <= lastWord Then
Do Until firstWord = wordToCheck Or lastWord = wordToCheck
'Check if first and last vals are already adjacent -
we have no match
If (lastVal - firstVal) < 1 Then Exit Do
If firstWord.CompareTo(wordToCheck) < 0 Then 'first
word is before wordtocheck
If lastWord.CompareTo(wordToCheck) > 0 Then
'Last word is after wordtocheck
lastVal = firstVal + ((lastVal - firstVal) /
2) 'Move last word back to check again
Else 'Move first and last words up to check again
Dim diff As Double
diff = lastVal - firstVal
firstVal = lastVal
lastVal += diff
End If
Else 'First word is after wordtocheck
'Move firstval up by half difference between first
and last vals
firstVal = firstVal + (lastVal - firstVal) / 2
End If
firstWord =
dictionaryList.Item(CLng(firstVal)).ToLower
lastWord = dictionaryList.Item(CLng(lastVal)).ToLower
Loop
End If
'By this point, we are as close as possible to a match
If firstWord = wordToCheck Then
Return CLng(firstVal) + 1
ElseIf lastWord = wordToCheck Then
Return CLng(lastVal) + 1
Else
Dim i As Integer
On Error Resume Next
Do Until (i = Math.Min(firstWord.Length,
Math.Min(lastWord.Length, wordToCheck.Length)) - 1)
'Find whether firstWord or lastWord is the closest
match
'Do this by comparing each, letter by letter, with
wordToCheck
'until one is further away than the other.
'Stop if we run out of letters to compare!
If
Math.Abs(String.CompareOrdinal(wordToCheck.Chars(i),
firstWord.Chars(i))) < _
Math.Abs(String.CompareOrdinal(wordToCheck.Chars(i),
lastWord.Chars(i))) Then
Return -firstVal - 1
ElseIf
Math.Abs(String.CompareOrdinal(wordToCheck.Chars(i),
firstWord.Chars(i))) > _
Math.Abs(String.CompareOrdinal(wordToCheck.Chars(i),
lastWord.Chars(i))) Then
Return -lastVal - 1
End If
i = i + 1
Loop
Return -firstVal - 1 'By this point the two words are
identical to the given word,
'simply having more letters, which differ. Just return the
first one - no point checking further
End If
End Function
--------------------
Public Function ScrambleWord(ByVal OriginalWord As String) As
String
Dim NewWord(OriginalWord.Length - 1) As Char
Dim rtnString As String = String.Empty
Dim newPos As Integer
Randomize()
For i As Integer = 0 To OriginalWord.Length - 1
Do
newPos = Int(Rnd() * (OriginalWord.Length))
Loop Until NewWord(newPos) = Nothing
NewWord(newPos) = OriginalWord.Chars(i)
Next
For i As Integer = 0 To OriginalWord.Length - 1
rtnString &= NewWord(i)
Next
Return rtnString
End Function
-----------------------------------------------
Public Function FindAnagram(ByVal OriginalWord As String) As
String
Dim testWord As String = String.Empty
Dim i As Integer
Do
i = i + 1
If i > 100000 Then Return "-" 'Run 362880 checks, then
give up (value is 9 factorial)
testWord = Me.ScrambleWord(OriginalWord)
Loop Until Me.CheckWordExists(testWord) > 0 AndAlso
testWord.ToLower <> OriginalWord.ToLower
Return testWord
End Function
----------------------------------------------