Mining an array

  • Thread starter Thread starter Johnny Daly
  • Start date Start date
J

Johnny Daly

Hi,

I have an array of data with various combinations of two-
letter IDs for each observation. For instance, observation
1 has the following 6 IDs "FO CH GM HO VW MI" (in that
format), observation 2 has just 1 ID "GM", observation 3
has 4 IDs "NI VW MI CH" and so forth. In theory, there is
no limit for how many IDs an observation can have, but
glancing down the list, it seems that no observation has
more than 10 IDs. Is there a method of generating a list
of all the IDs that appear in the list, so that this list
would only show each ID once?

Thanks!
 
This is an adaptation of John Walkenbach's tip
http://j-walk.com/ss/excel/tips/tip47.htm

Output is sent to the immediate window. TestUniques generates and array
similar to the one you describe. The routine uses Split, so it runs in
xl2000 or later.


Option Explicit
' This example is based on a tip by J.G. Hussey,
' published in "Visual Basic Programmer's Journal"

Sub TestUniques()
Dim varr As Variant
Dim i As Long, j As Long
Dim sStr As String
Dim varr1 As Variant
ReDim varr(1 To 10)
For i = 1 To 10
sStr = ""
For j = 1 To Int(Rnd * 10 + 1)
sStr = sStr & Chr(Int(Rnd * 26 + 65)) _
& Chr(Int(Rnd * 26 + 65)) & " "
Next
varr(i) = Trim(sStr)
Next
varr1 = Uniques(varr)
For i = LBound(varr1) To UBound(varr1)
Debug.Print varr1(i)
Next
End Sub

Function Uniques(varr)
Dim AllCells As Variant, varr1 As Variant
Dim varr2 As Variant, varr3 As Variant
Dim NoDupes As New Collection
Dim i As Long, j As Long, k As Long
Dim Swap1, Swap2, Item

' The items are in A1:A105
varr1 = varr
' The next statement ignores the error caused
' by attempting to add a duplicate key to the collection.
' The duplicate is not added - which is just what we want!
On Error Resume Next
For i = LBound(varr1) To UBound(varr1)
varr3 = Split(varr1(i))
For k = LBound(varr3) To UBound(varr3)
NoDupes.Add varr3(k), CStr(varr3(k))
Next
' Note: the 2nd argument (key) for the Add method must be a string
Next i

' Resume normal error handling
On Error GoTo 0

' Sort the collection (optional)
For i = 1 To NoDupes.Count - 1
For j = i + 1 To NoDupes.Count
If NoDupes(i) > NoDupes(j) Then
Swap1 = NoDupes(i)
Swap2 = NoDupes(j)
NoDupes.Add Swap1, before:=j
NoDupes.Add Swap2, before:=i
NoDupes.Remove i + 1
NoDupes.Remove j + 1
End If
Next j
Next i

' Add the sorted, non-duplicated items to a ListBox
ReDim varr2(1 To NoDupes.Count)
i = 1
For Each Item In NoDupes
varr2(i) = Item
i = i + 1
Next Item

Uniques = varr2
End Function
 
Hi Tom,
Thank you very much for your help! The macro seems to work
(no error messages), however, I get no output!? I pasted a
sample dataset into A1:A105. Let's say I want to get my
output array in Sheet2 cell A1. How would I do that. Also,
should the function code reside in a VBA module and should
the macro procedure reside in the code module for Sheet1?
Again, thank you very much!
Henrik

Here is a are the first 105 observations in my dataset:

FO CH GM HO VW MI
GM
GM NI
FO
GM
CH GM
GM IS SU
GM CH
GM
GM CH
GM
GM
GM TO CH
GM CH
GM IS
GM
FO HO HY GM MA
GM
CH GM IS SU SA
GM
GM
GM
GM
GM IS NI
FO GM DC
GM
FO VO
FO GM
FO GM
GM
GM
FO GM IS
GM CH
FO HY
FO
FO
NI
NI SM
RO FO NI
FO
GM
FO GM CH SA HO
FO
CH NI TO
FO
GM IS
GM
GM
GM
GM
GM IS
GM
GM HO
GM
GM
GM
FO DC
FO
FO
GM IS VW MB
FO GM
FO GM
FO
GM
FO
FO GM
GM
FO
GM FO
KI GM DW MA
FO
GM
GM FO
GM
GM
FO GM
FO GM
GM CH
NI
NI
NI
KI NI
GM
NI
NI
CH NI
GM VW HY SA PO
NI
GM
FO
HO NI
GM
FO GM
GM CA
FO
SA GM
GM CH
GM
GM
FO
GM HU
GM
CH GM MI
FO
CH GM
 
As I said, the output goes to the Immediate window in the VBE
(view=>Immediate Window). I have modified the testuniques to do what you
describe.

If you want it somewhere else:

Sub TestUniques()
Dim varr As Variant
Dim i As Long, j As Long
Dim sStr As String
Dim varr1 As Variant, Varr2 as Variant
Dim rw as long
'ReDim varr(1 To 10)
'For i = 1 To 10
' sStr = ""
' For j = 1 To Int(Rnd * 10 + 1)
' sStr = sStr & Chr(Int(Rnd * 26 + 65)) _
' & Chr(Int(Rnd * 26 + 65)) & " "
' Next
' varr(i) = Trim(sStr)
'Next
With Worksheets("Sheet1")
varr = .Range("A1").Resize(105,1).Value
End With
Redim varr2(1 to 105)
for i = 1 to 105
varr2(i) = varr(i,1)
Next
varr1 = Uniques(varr2)
rw = 1
For i = LBound(varr1) To UBound(varr1)
' Debug.Print varr1(i)
worksheets("Sheet2").Cells(rw,1).Value = varr1(i)
Next
End Sub

The above code and the Uniques function should be in a general module.
Neither should be in a sheet module

Input data in Sheet1, A1:A105

Output in Sheet2, starting in A1
 
The program works, but it seems to permutate all the
possible 2 letter combination rather than giving me all
unique 2 letter IDs!? Is there a way to fix this?
Thanks,
Henrik
 
I ran it with your test data and got
CA 1
CH 15
DC 2
DW 1
FO 35
GM 73
HO 5
HU 1
HY 3
IS 8
KI 2
MA 2
MB 1
MI 2
NI 15
PO 1
RO 1
SA 4
SM 1
SU 2
TO 2
VO 1
VW 3

the numbers were produced with a countif formula
=COUNTIF(Sheet1!$A$1:$A$105,"*"&A1&"*")

It appears that all of the combinations are in your original data - so I
don't see any permuting going on.

Worked fine for me. I left one thing out of the testuniques macro - to
increment the row counter when it is writing the data to sheet2. Here is
the revision:

Sub TestUniques()
Dim varr As Variant
Dim i As Long, j As Long
Dim sStr As String
Dim varr1 As Variant, Varr2 As Variant, Varr5 As Variant
Dim rw As Long
'ReDim varr(1 To 10)
'For i = 1 To 10
' sStr = ""
' For j = 1 To Int(Rnd * 10 + 1)
' sStr = sStr & Chr(Int(Rnd * 26 + 65)) _
' & Chr(Int(Rnd * 26 + 65)) & " "
' Next
' varr(i) = Trim(sStr)
'Next
With Worksheets("Sheet1")
varr = .Range("A1").Resize(105, 1).Value
End With
ReDim Varr2(1 To 105)
For i = 1 To 105
Varr2(i) = varr(i, 1)
Next i
varr1 = Uniques(Varr2)
rw = 1
For i = LBound(varr1) To UBound(varr1)
' Debug.Print varr1(i)
Worksheets("Sheet2").Cells(rw, 1).Value = varr1(i)
rw = rw + 1 ' <=== added line
Next
End Sub
 
Back
Top