Code to remove all duplicate words in a document?


gil

Hi All,

I'm seeking VBA code to automatically remove all duplicate words in a document. The duplicates could be two or more identical words back to back, or repeats anywhere in the document.

Cheers,
Gil
 
As I think about it, I guess it shouldn't be that difficult. Code could look like "Select next word, assign as variable, delete all of that variable, type variable into 2nd document, return to first document, repeat."

Maybe I'll give it a try. It may already be in Word in a much more facile fashion.
:)
Gil
Asheville, NC


Gil Carter, MD, JD

 
Hi Gil,

For rather small documents, the logic below is all right, I think, but it is very slow:

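Sub Test333()
    ' Compares every word with every other word (quadratic), hence slow.
    Dim oWrd1 As Range
    Dim oWrd2 As Range
    For Each oWrd1 In ActiveDocument.Words
        For Each oWrd2 In ActiveDocument.Words
            ' Same text (Word's "word" may include a trailing space) ...
            If oWrd1.Text = oWrd2.Text Then
                ' ... but not the very same range in the document.
                If Not oWrd1.IsEqual(oWrd2) Then
                    oWrd2.Delete
                End If
            End If
        Next oWrd2
    Next oWrd1
End Sub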

For larger documents, this version can be more than 100 times faster: ;-)

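Sub Test333x()
    Dim rWrd As Range
    Dim sTmp As String
    Dim rTmp As Range
    Set rTmp = ActiveDocument.Range
    For Each rWrd In ActiveDocument.Words
        sTmp = rWrd.Text
        ' Search only from the end of the current word to the end of the document.
        rTmp.Start = rWrd.End
        rTmp.End = ActiveDocument.Range.End
        With rTmp.Find
            ' Find settings persist between searches, so set them explicitly.
            .ClearFormatting
            .Replacement.ClearFormatting
            .MatchWildcards = False
            .Wrap = wdFindStop ' stay inside rTmp so the first occurrence survives
            .Text = sTmp
            .Replacement.Text = ""
            .Execute Replace:=wdReplaceAll
        End With
    Next rWrd
End Sub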
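
A different technique, offered only as a rough sketch and not as the method above: remember each word already seen in a dictionary, collect the later repeats, and delete them back to front in one go. The macro name RemoveDuplicateWordsOnce is made up for illustration, and the sketch assumes Windows (it uses Scripting.Dictionary), that upper and lower case should count as the same word, and that only "words" containing an unaccented letter or a digit should be considered.

```vba
Sub RemoveDuplicateWordsOnce()
    ' Hypothetical name; a sketch under the assumptions stated above.
    Dim seen As Object          ' Scripting.Dictionary, Windows only
    Dim dupes As New Collection ' ranges of repeated words
    Dim rWrd As Range
    Dim key As String
    Dim i As Long

    Set seen = CreateObject("Scripting.Dictionary")

    For Each rWrd In ActiveDocument.Words
        ' Drop the trailing space and ignore case (both are assumptions).
        key = LCase$(Trim$(rWrd.Text))
        ' Skip punctuation and paragraph marks, which Word also counts as words.
        If key Like "*[a-z0-9]*" Then
            If seen.Exists(key) Then
                dupes.Add rWrd.Duplicate
            Else
                seen.Add key, True
            End If
        End If
    Next rWrd

    ' Delete from the end so the earlier text is not disturbed.
    For i = dupes.Count To 1 Step -1
        dupes(i).Delete
    Next i
End Sub
```

Because nothing is deleted while the Words collection is being walked, each word is visited once. Try it on a copy of the document first.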
--

Greetings from Bavaria, Germany

Helmut Weber, MVP WordVBA

Vista Small Business, Office XP
 
Greeting to you Helmut, from North Carolina, USA

You MVPs are always expanding my brains. They're gonna fall out some day. :)

Thanks! I'll play with it. It seems to be working. Mind boggling. :)

Gil
 
A wildcard replace of
(<[a-zA-Z]@>) \1
with
\1
will remove back-to-back duplicates, but each pass collapses pairs
only, so you would have to run it multiple times to remove a run of
more than two instances.
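
The repeated runs could be automated along these lines. This is a minimal sketch, not a tested solution: the macro name RemoveAdjacentDuplicates and the cap of 50 passes are arbitrary choices for illustration. It relies on Find.Execute returning False once nothing is found.

```vba
Sub RemoveAdjacentDuplicates()
    ' Hypothetical name; repeats the wildcard replace until no pair is left.
    Dim rng As Range
    Dim found As Boolean
    Dim passes As Long

    Do
        Set rng = ActiveDocument.Range
        With rng.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = "(<[a-zA-Z]@>) \1"
            .Replacement.Text = "\1"
            .MatchWildcards = True
            .Forward = True
            .Wrap = wdFindStop
            .Format = False
            found = .Execute(Replace:=wdReplaceAll)
        End With
        passes = passes + 1
    Loop While found And passes < 50 ' safety cap, an arbitrary assumption
End Sub
```

Wildcard searches are case sensitive, so "The the" would be left alone.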

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP

My web site www.gmayor.com

<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 
Hi Gil,

you might take into consideration
that "For Each" over Words processes
what Word thinks is a word.

That is different from what humans consider to be a word.

E.g., if a word is not followed by a delimiter
such as a punctuation mark, Word regards the trailing
space as part of the word.

So look at the following text:

dog dog.

It consists of the three words:
"dog " and "dog" and "."

not to speak of the paragraph mark,
which is a word as well.
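
To see what Word counts as a word in a given document, something like the following could be run. It is only a small sketch: the macro name ListWords and the "<para>" placeholder are made up for illustration. Each item is printed in brackets to the Immediate window (Ctrl+G in the VBA editor) so that trailing spaces become visible.

```vba
Sub ListWords()
    ' Hypothetical name; prints every item of the Words collection.
    Dim rWrd As Range
    For Each rWrd In ActiveDocument.Words
        ' Show paragraph marks as a visible placeholder instead of a line break.
        Debug.Print "[" & Replace(rWrd.Text, vbCr, "<para>") & "]"
    Next rWrd
End Sub
```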

--

Greetings from Bavaria, Germany

Helmut Weber, MVP WordVBA

Vista Small Business, Office XP
 
Thank you, Helmut. Your code really worked pretty reasonably. There were a few minor workarounds ... but it worked. :)
Gil
 