Random selection of records based on specifc value

  • Thread starter Thread starter Ixtreme
  • Start date Start date
I

Ixtreme

I would like to create sample files based on the total number of
unique values in column A. I use a vlookup value to calculate the
required sample records I need.
Suppose I have column A containing the following information:

apples
peers
oranges
apples
peers
etc etc.

Then I calculate the unique values:
apples = 200
peers = 450
oranges = 50

then I calculate:
total samples needed = 100

then I calculate the samples needed based on the 'weight':
apples = (200 / 700) * 100 = 29
peers = (450 / 700) * 100 = 64
oranges = (50 / 700) * 100 = 7

Here comes the problem:

Through vba I automatically want to select 29 random 'apples' rows, 64
random peers rows and 7 random oranges rows and copy them to sheet2.

Mark
 
Ixtreme said:
Through vba I automatically want to select 29 random
'apples' rows, 64 random peers rows and 7 random
oranges rows and copy them to sheet2.

Perhaps you can make use of the following, at least as a start.

This code assumes that each row of data consists of 3 columns: A, B and C.
And there are 1000 rows of data starting in row 2.

Note: Although the selection of each apple, orange and pear row is random,
the output consists of all apple samples, then all orange samples, then all
pear samples. If you want those shuffled, some additional work is needed.

-----

Option Explicit

Sub genData()
Const nSample As Long = 100
Dim s(1 To nSample, 1 To 3), v
Dim n As Long, i As Long
Dim nApple As Long, nOrange As Long, nPear As Long
Dim sApple As Long, sOrange As Long, sPear As Long

' input data
v = Sheets("Sheet1").Range("a2:c1001")
n = UBound(v, 1)
ReDim apple(1 To n) As Long
ReDim orange(1 To n) As Long
ReDim pear(1 To n) As Long
nApple = 0: nOrange = 0: nPear = 0
For i = 1 To n
Select Case v(i, 1)
Case "apple"
nApple = nApple + 1
apple(nApple) = i
Case "orange"
nOrange = nOrange + 1
orange(nOrange) = i
Case "pear"
nPear = nPear + 1
pear(nPear) = i
End Select
Next

' select random data
If nSample > nApple + nOrange + nPear Then
MsgBox "error"
Exit Sub
End If
Randomize
sApple = Int(nApple / n * nSample)
sOrange = Int(nOrange / n * nSample)
sPear = nSample - sApple - sOrange
doSelect sApple, apple, nApple, v, s, 0
doSelect sOrange, orange, nOrange, v, s, sApple
doSelect sPear, pear, nPear, v, s, sApple + sOrange
Sheets("sheet2").Range("a1", "c" & nSample) = s
End Sub


Private Sub doSelect(ByVal nSample As Long, myRow0() As Long, _
ByVal nRow As Long, v, s(), ByVal i As Long)
Dim j As Long, x As Long, r As Long
ReDim myRow(1 To nRow) As Long
If nSample <= 0 Then Exit Sub
For j = 1 To nRow: myRow(j) = myRow0(j): Next
Do
x = 1 + Int(nSample * Rnd)
r = myRow(x)
i = i + 1
s(i, 1) = v(r, 1)
s(i, 2) = v(r, 2)
s(i, 3) = v(r, 3)
If x < nSample Then myRow(x) = myRow(nSample)
nSample = nSample - 1
Loop Until nSample = 0
End Sub
 
Thanks for your answer. It is not exactly what I meant. I did some
clean up and think that I found a better setup. I have now 2 sheets.
Sheet 'Data" which contains all my data where the values in column [A]
are the record id's (the apples, pears, oranges etc etc).
On sheet "Stats" I have a few formulas that show me:
A The total rows used in Sheets "Data"
B A list of unique values from column [A] of the "Data" sheet.
Can be any number.
C A required sample needed for each item listed in [C]
A B C
200 rows apple 10
pear 20
.... ....
etc etc

The Data sheet looks like this
apple bla bla test test
orange sdjsdjh sdkj
apple dfjkdf dfkjdf
pear jsdkjs lkslksd
.....
etc etc

What I would like is: randomly select (the required number of sampes
[C]) rows from the "Data" sheet for each unique value I have listed in
column . So I need some for each loop I quess that first checks how
many unique items there are and then copy the correct number of
required samples from the "Data" sheet into the "Stats" sheet or a new
sheet.
 
Ixtreme said:
On sheet "Stats" I have a few formulas that show me:
A The total rows used in Sheets "Data"
B A list of unique values from column [A] of the
"Data" sheet. Can be any number.
C A required sample needed for each item listed in [C] [....]
The Data sheet looks like this
apple bla bla test test [....]
What I would like is: randomly select (the required
number of sampes [C]) rows from the "Data" sheet for
each unique value I have listed in column .


Try the following macro. Make any appropriate changes to cell references.

The macro assumes that Data!A:A has at least one empty cell after the last
valid column. Otherwise, Stats!A2 should contain the number of valid data
columns, just as Stats!A1 contains the number of valid data rows.

It also assumes that the list of key (what you variously call "record ids"
or "unique values") is followed by at least one empty cell, or a cell with
an error, or a cell with the null string. That should give some flexibility
in how you create the list of keys.

The random results are copied starting in A1 in a new worksheet inserted
before the "Data" worksheet. The macro makes no effort to copy formats or
adjust column widths in the new worksheet.

-----

Option Explicit

Sub genSample()
Dim dSheet As Worksheet, sSheet As Worksheet
Dim nRow As Long, nCol As Long, nKey As Long
Dim nSelect As Long
Dim i As Long, d As Long, k As Long, r As Long
Dim c As Long
Dim data, myKey

' input data and stats
Set dSheet = Sheets("Data")
Set sSheet = Sheets("Stats")
nRow = sSheet.Range("a1")
With dSheet
nCol = .Range("a1").End(xlToRight).Column
data = .Range("a1").Resize(nRow, nCol)
End With
With sSheet
nKey = .Range("b1").End(xlDown).Row
myKey = .Range("b1").Resize(nKey, 2)
For i = 1 To nKey
If IsError(myKey(i, 1)) Then Exit For
If myKey(i, 1) = "" Then Exit For
nSelect = nSelect + myKey(i, 2)
Next
nKey = i - 1
End With
If nSelect = 0 Or nSelect > nRow Then
MsgBox "error 1": Exit Sub
End If

' generate random results
ReDim res(1 To nSelect, 1 To nCol)
Randomize
For i = 1 To nSelect
d = 1 + Int(nRow * Rnd)
For k = 1 To nKey
If myKey(k, 1) = data(d, 1) Then Exit For
Next
If k <= nKey Then
If myKey(k, 2) > 0 Then
r = r + 1
For c = 1 To nCol
res(r, c) = data(d, c)
Next
myKey(k, 2) = myKey(k, 2) - 1
End If
End If
If d < nRow Then
For c = 1 To nCol
data(d, c) = data(nRow, c)
Next
End If
nRow = nRow - 1
Next

' write results to new worksheet
dSheet.Select
Sheets.Add
Range("a1").Resize(nSelect, nCol) = res
End Sub
 
Errata....

joeu2004 said:
For i = 1 To nSelect

Sorry. I failed to notice that the loop does not always generate the
correct number of samples.

See the attached macro. The for-loop is replaced by a do-loop.

-----

Option Explicit

Sub genSample()
Dim dSheet As Worksheet, sSheet As Worksheet
Dim nRow As Long, nCol As Long, nKey As Long
Dim nSelect As Long
Dim i As Long, d As Long, k As Long, r As Long
Dim c As Long
Dim data, myKey

' input data and stats
Set dSheet = Sheets("Data")
Set sSheet = Sheets("Stats")
nRow = sSheet.Range("a1")
With dSheet
nCol = .Range("a1").End(xlToRight).Column
data = .Range("a1").Resize(nRow, nCol)
End With
With sSheet
nKey = .Range("b1").End(xlDown).Row
myKey = .Range("b1").Resize(nKey, 2)
nSelect = 0
For i = 1 To nKey
If IsError(myKey(i, 1)) Then Exit For
If myKey(i, 1) = "" Then Exit For
nSelect = nSelect + myKey(i, 2)
Next
nKey = i - 1
End With
If nSelect = 0 Or nSelect > nRow Then
MsgBox "error 1": Exit Sub
End If

' generate random results
ReDim res(1 To nSelect, 1 To nCol)
Randomize
r = 0
Do
d = 1 + Int(nRow * Rnd)
For k = 1 To nKey
If myKey(k, 1) = data(d, 1) Then Exit For
Next
If k <= nKey Then
If myKey(k, 2) > 0 Then
r = r + 1
For c = 1 To nCol
res(r, c) = data(d, c)
Next
myKey(k, 2) = myKey(k, 2) - 1
If r = nSelect Then Exit Do
End If
End If
If d < nRow Then
For c = 1 To nCol
data(d, c) = data(nRow, c)
Next
End If
nRow = nRow - 1
Loop

' write results to new worksheet
dSheet.Select
Sheets.Add
Range("a1").Resize(nSelect, nCol) = res
End Sub
 
Back
Top