ciara said:
I have an Excel file filled with sentences. What I need
Excel to do is compare each sentence with the one in the
row above it, then extract the individual words that match between
these and put these words in another column. I've found out lots of
stuff on how to return the location (in
numbers) of matching strings, but I need the actual
words. Any suggestions?
This isn't something Excel or any other spreadsheet is good at. This could
be done in 3 lines of awk or Perl code. But if you insist on using Excel to
do this, you'll need to use VBA to do this only because Excel provides no
other mechanism to concatenate an *arbitrary* number of strings in a single
expression.
If you were to use a defined name Seq referring to =ROW(INDIRECT("1:1024")),
then you could generate an array of all words in one sentence (A2) also
occurring (at least as a substring) in another sentence (A1) using
=IF(COUNTIF(A1,"*"&MID(A2,SMALL(IF(MID(" "&A2,Seq,1)=" ",Seq),
ROW(INDIRECT("1:"&(LEN(A2)-LEN(SUBSTITUTE(A2," ",""))+1)))),
SMALL(IF(MID(A2&" ",Seq,1)=" ",Seq),
ROW(INDIRECT("1:"&(LEN(A2)-LEN(SUBSTITUTE(A2," ",""))+1))))
-SMALL(IF(MID(" "&A2,Seq,1)=" ",Seq),
ROW(INDIRECT("1:"&(LEN(A2)-LEN(SUBSTITUTE(A2," ",""))+1)))))&"*"),
MID(A2,SMALL(IF(MID(" "&A2,Seq,1)=" ",Seq),
ROW(INDIRECT("1:"&(LEN(A2)-LEN(SUBSTITUTE(A2," ",""))+1)))),
SMALL(IF(MID(A2&" ",Seq,1)=" ",Seq),
ROW(INDIRECT("1:"&(LEN(A2)-LEN(SUBSTITUTE(A2," ",""))+1))))
-SMALL(IF(MID(" "&A2,Seq,1)=" ",Seq),
ROW(INDIRECT("1:"&(LEN(A2)-LEN(SUBSTITUTE(A2," ",""))+1))))),"")
Yes, that's all one big, ugly formula. There's so much redundancy in it that
a udf may actually be faster to recalc.
Function CommonWords(x As String, y As String) As String
Dim i As Long, j As Long, k As Long, m As Long, n As Long
Dim w As String
m = Len(x)
n = Len(y)
If m > n Then
'swap x and y to iterate through shorter string (x)
w = x
x = y
y = w
k = m
m = n
n = k
End If
For i = 1 To m
If Mid(x, i, 1) Like "[A-Za-z]" Then
j = i
Do While Mid(x, i, 1) Like "[A-Za-z]"
i = i + 1
Loop
w = Mid(x, j, i - j)
k = InStr(1, y, w)
If k <= 0 Then
'do nothing - expedient to use this
ElseIf k = 1 And i - j = n Then
CommonWords = CommonWords & " " & w
ElseIf k = 1 And Mid(y, i - j + 1, 1) Like "[!A-Za-z]" Then
CommonWords = CommonWords & " " & w
ElseIf k + i - j > n And Mid(y, k - 1, 1) Like "[!A-Za-z]" Then
CommonWords = CommonWords & " " & w
ElseIf Mid(y, k - 1, i - j + 2) Like "[!A-Za-z]" & w & "[!A-Za-z]"
Then
CommonWords = CommonWords & " " & w
End If
End If
Next i
CommonWords = LTrim(CommonWords)
End Function