Most popular word

  • Thread starter Thread starter Umby
  • Start date Start date
U

Umby

Hi all,
I need a function to get the most frequent word in a cell. The length of the
output must be at least 3.
e.g. if a cell contents was "Vick rushed for 145 yards. Michael Vick handled
things in regulation. established in 1881 ; Yards Gear for the Holidays car
yards in crisis" , the output should be "yards".

Any idea?
Thank you.

Umby
 
Umby

Try this

Function MFW(Rng As Range) As String

Dim arrWords As Variant
Dim RangeText As String
Dim i As Long
Dim CurrCount As Long
Dim MaxCount As Long
Dim MaxWord As String

RangeText = UCase(Rng.Text)

RangeText = Replace(RangeText, ".", "")
RangeText = Replace(RangeText, ",", "")
RangeText = Replace(RangeText, ";", "")
RangeText = Replace(RangeText, ":", "")

arrWords = Split(RangeText, " ")

For i = LBound(arrWords) To UBound(arrWords)
If Len(arrWords(i)) >= 3 Then
CurrCount = (Len(RangeText) - _
Len(Replace(RangeText, arrWords(i), ""))) / Len(arrWords(i))
If CurrCount > MaxCount Then
MaxWord = arrWords(i)
MaxCount = CurrCount
End If
End If
Next i

MFW = MaxWord

End Function

Note that I had to strip out punctuation because "yards." is not the same as
"yards". I don't make any representation that the ones I elimate will be
sufficient. Also, I converted everything to caps.
 
thats a nice little function, however, if more than word appears with
the same frequency, it only returns the last one found with the
maxvalue
 
bdcrisp

Quite right. Except I think it returns the first one, not the last. If you
want a semicolon delimited list of the most frequent words, you would have
to adjust the function to this

Function MFW(Rng As Range) As String

Dim arrWords As Variant
Dim RangeText As String
Dim i As Long
Dim CurrCount As Long
Dim MaxCount As Long
Dim MaxWord As String

RangeText = UCase(Rng.Text)

RangeText = Replace(RangeText, ".", "")
RangeText = Replace(RangeText, ",", "")
RangeText = Replace(RangeText, ";", "")
RangeText = Replace(RangeText, ":", "")

arrWords = Split(RangeText, " ")

For i = LBound(arrWords) To UBound(arrWords)
If Len(arrWords(i)) >= 3 Then
CurrCount = (Len(RangeText) - _
Len(Replace(RangeText, arrWords(i), ""))) / Len(arrWords(i))
If CurrCount >= MaxCount Then
If InStr(1, MaxWord, arrWords(i) & ";") = 0 Then
MaxWord = MaxWord & arrWords(i) & ";"
End If
MaxCount = CurrCount
End If
End If
Next i

MFW = Left(MaxWord, Len(MaxWord) - 1)

End Function

--
Dick Kusleika
MVP - Excel
www.dicks-clicks.com
Post all replies to the newsgroup.

bdcrisp said:
thats a nice little function, however, if more than word appears with
the same frequency, it only returns the last one found with the
maxvalue


------------------------------------------------



~~Now Available: Financial Statements.xls, a step by step guide to
creating financial statements
 
Thank you very much!
Umby

-----Original Message-----
Umby

Try this

Function MFW(Rng As Range) As String

Dim arrWords As Variant
Dim RangeText As String
Dim i As Long
Dim CurrCount As Long
Dim MaxCount As Long
Dim MaxWord As String

RangeText = UCase(Rng.Text)

RangeText = Replace(RangeText, ".", "")
RangeText = Replace(RangeText, ",", "")
RangeText = Replace(RangeText, ";", "")
RangeText = Replace(RangeText, ":", "")

arrWords = Split(RangeText, " ")

For i = LBound(arrWords) To UBound(arrWords)
If Len(arrWords(i)) >= 3 Then
CurrCount = (Len(RangeText) - _
Len(Replace(RangeText, arrWords(i), ""))) / Len(arrWords(i))
If CurrCount > MaxCount Then
MaxWord = arrWords(i)
MaxCount = CurrCount
End If
End If
Next i

MFW = MaxWord

End Function

Note that I had to strip out punctuation because "yards." is not the same as
"yards". I don't make any representation that the ones I elimate will be
sufficient. Also, I converted everything to caps.


--
Dick Kusleika
MVP - Excel
www.dicks-clicks.com
Post all replies to the newsgroup.

cell. The length of
the Michael Vick
handled for the Holidays
car


.
 
"Umby" <[email protected]> ha scritto nel messaggio | Hi all,
| I need a function to get the most frequent word in a cell. The length of the
| output must be at least 3.
| e.g. if a cell contents was "Vick rushed for 145 yards. Michael Vick handled
| things in regulation. established in 1881 ; Yards Gear for the Holidays car
| yards in crisis" , the output should be "yards".
|
| Any idea?

in microsoft.public.it.office.excel
thread: cercare parole
date: dic 10, 2003

Public Function MFWord(buf1 As Range)
Dim buf2 As Object, buf3 As Object
Dim smax As String, nmax As Long
Set regexp = CreateObject("VBScript.RegExp")
regexp.Global = True
regexp.IgnoreCase = True
regexp.Pattern = "(\w{3,})"
Set buf2 = regexp.Execute(buf1.Text)
For Each occ In buf2
regexp.Pattern = occ & "\b|" & occ & "$"
Set buf3 = regexp.Execute(buf1.Text)
If buf3.Count > nmax Then smax = occ: nmax = buf3.Count
Next
MsgBox smax & " (" & nmax & ")"
MFWord = smax
End Function
.f
fernando cinquegrani
Microsoft MVP
http://www.prodomosua.it
 
Thank you for your function, however it doesn't consider
some character.
e.g. if cell value = "mip@it mip@it mip@it win win" ,
output = "mip"

Regards
Umby
 
"Umby" <[email protected]> ha scritto nel messaggio | Thank you for your function, however it doesn't consider
| some character.
| e.g. if cell value = "mip@it mip@it mip@it win win" ,
| output = "mip"
|

Public Function MFWord(buf1 As Range)
Dim buf2 As Variant, n as long
Dim smax As String, nmax As Long
Dim occ As Variant, acc As Variant
Set regexp = CreateObject("VBScript.RegExp")
regexp.Global = True
regexp.IgnoreCase = True
regexp.Pattern = "([a-z_&@]{3,})"
Set buf2 = regexp.Execute(buf1.Text)
For Each occ In buf2
n = 0
For Each acc In buf2
If occ = acc Then n = n + 1
If n > nmax Then smax = occ: nmax = n
Next
Next
'MsgBox smax & " (" & n & ")"
MFWord = smax
End Function

the pattern now includes
the characters from a to z (a-z)
and the symbols _, & and @
.f
fernando cinquegrani
Microsoft MVP
http://www.prodomosua.it
 
Thank you for your function, however it doesn't consider
some character.
e.g. if cell value = "mip@it mip@it mip@it win win" ,
output = "mip" ...
...

Rewrite the function to be more flexible. Also eliminate the MsgBox call in a
function - never a good idea.


Function mcw( _
textstr As String, _
Optional sep As String = " ", _
Optional matchpat As String = "\b\w{3,}\b", _
Optional caseinsens As Boolean = True _
) As String
'-------------------------------------------------------------
Dim wc() As Long, m As Object, mc As Object, re As Object
Dim k As Long, n As Long, wcmax As Long, t As String

t = textstr

Set re = CreateObject("VBScript.RegExp")
re.Pattern = matchpat
re.IgnoreCase = caseinsens
re.Global = True
Set mc = re.Execute(textstr)

If mc.Count = 0 Then Exit Function

ReDim wc(1 To mc.Count)

For Each m In mc
n = n + 1
k = Len(t)
re.Pattern = "\b" & m.Value & "\b"
t = re.Replace(t, "")
wc(n) = (k - Len(t)) / Len(m.Value)
If wc(n) > wcmax Then wcmax = wc(n)
Next m

For k = 1 To n
If wc(k) >= wcmax Then mcw = mcw & sep & mc.Item(k - 1).Value
Next k

Erase wc
Set m = Nothing
Set mc = Nothing
Set re = Nothing

mcw = Mid(mcw, Len(sep) + 1)
End Function


Change the second argument to use a word separator other than space. Change the
third and fourth arguments to use different definitions of 'word' as specified
by VBScript regular expressions.
 
"Harlan Grove" <[email protected]> ha scritto nel messaggio
| Rewrite the function to be more flexible.

correct

| Also eliminate the MsgBox call in a
function - never a good idea.

'MsgBox smax & " (" & n & ")"
see the original thread in microsoft.public.it.office.excel

| Change the second argument to use a word separator other than space.

how? " ", ",",";","!","?","."......

| Change the third and fourth arguments to use different definitions of 'word' as specified
| by VBScript regular expressions.

in VBScript regular expressions there is only one definition of 'word'.
parentheses denotes a 'subexpression'.
.f
 
fernando cinquegrani said:
"Harlan Grove" <[email protected]> ha scritto . . . ....
| Change the third and fourth arguments to use different definitions of
|'word' as specified by VBScript regular expressions.

in VBScript regular expressions there is only one definition of 'word'.
parentheses denotes a 'subexpression'.

I don't mean a word as VBScript would define the term (\b\w+\b), I mean a
word as the user wants to define it. My udf defaults to \b\w{3,}\b which is
clost to the VBScript definition, but if the user wants only letters, then
s/he could use mcw(somestring,,"\b[A-Za-z]{3,}\b").
 
Back
Top