Recode genetic data

  • Thread starter Thread starter sedm1000
  • Start date Start date
S

sedm1000

This forum was great for my last problem, so here is hoping again...

I have a large amount of genetics data - genotypes to analyse. The dat
is outputed into a standard spreadsheet as assigned by the genotypin
software, but the analysis must be performed by unix software wit
limitations on the number of alleles present.

Essentially, this means that I must convert a collection of numbers t
their lowest sequential form - i.e."9 9 12 8 4" to "3 3 4 2 1". Th
conversion must take into account all the numbers in a column (actuall
two adjacent columns) to find intermediate numbers that are no
present and can be excluded.

Hopefully that is clear enough for somebody to figure out how t
achieve this.

Thanks in advance
 
sedm,
I must convert a collection of numbers to
their lowest sequential form - i.e."9 9 12 8 4" to "3 3 4 2 1".

Copy the code below (all three functions and the test macro) into a code
module, and call it from a worksheet using, for example:

=ReduceNumbers(A1& " " &B1)

If A1 has 9 9 12 8 4 and B1 has 9 12 8 3 5, it will return the single string

5 5 6 4 2 5 6 4 1 3
The conversion must take into account all the numbers in a column (actually
two adjacent columns) to find intermediate numbers that are not
present and can be excluded.

Bernie
MS Excel MVP

Option Explicit

Sub TestFunction()
MsgBox ReduceNumbers("9 9 12 8 4")
End Sub

Function ReduceNumbers(mystr As String)
Dim myVal As Variant
Dim i As Integer

myVal = Split(mystr, " ")
myVal = UniqueMembers(myVal)
myVal = ArrayBubbleSort(myVal, True)
mystr = " " & mystr & " "
mystr = Replace(mystr, " ", " ")
For i = LBound(myVal) To UBound(myVal)
mystr = Replace(" " & mystr & " ", " " & myVal(i) & " ", " " & i & " ", 1,
Len(mystr))
Next i
ReduceNumbers = Application.Trim(mystr)
End Function


Function UniqueMembers( _
myArray As Variant) As Variant
Dim i As Integer
Dim j As Integer
Dim myRet() As Variant
Dim Matched As Boolean

ReDim myRet(1 To 1)
myRet(1) = myArray(LBound(myArray))
For i = LBound(myArray) + 1 To UBound(myArray)
Matched = False
For j = 1 To UBound(myRet)
If myArray(i) = myRet(j) Then Matched = True
Next j
If Not Matched Then
ReDim Preserve myRet(1 To UBound(myRet) + 1)
myRet(UBound(myRet)) = myArray(i)
End If
Next i

UniqueMembers = myRet

End Function


Function ArrayBubbleSort( _
myArray As Variant, _
Optional Ascending As Boolean) _
As Variant
Dim myTemp As Variant
Dim myInt() As Variant
Dim i As Integer
Dim j As Integer

ReDim myInt(LBound(myArray) To UBound(myArray))
For i = LBound(myArray) To UBound(myArray)
myInt(i) = Val(Trim(myArray(i)))
Next i
'Do the sort
For i = LBound(myInt) To UBound(myInt) - 1
For j = i + 1 To UBound(myInt)
If Ascending Then
If myInt(i) > myInt(j) Then
myTemp = myInt(j)
myInt(j) = myInt(i)
myInt(i) = myTemp
End If
Else
If myInt(i) < myInt(j) Then
myTemp = myInt(j)
myInt(j) = myInt(i)
myInt(i) = myTemp
End If
End If
Next j
Next i

'Return the array
ArrayBubbleSort = myInt
End Function
 
Sorry - Watch the wrapping on these lines

mystr = Replace(" " & mystr & " ", " " & myVal(i) & " ", " " & i & " ", 1,
Len(mystr))

Change them to

mystr = Replace(" " & mystr & " ", " " & myVal(i) & " ", " " & i & " ", _
1, Len(mystr))

HTH,
Bernie
MS Excel MVP

Bernie Deitrick said:
sedm,
I must convert a collection of numbers to
their lowest sequential form - i.e."9 9 12 8 4" to "3 3 4 2 1".

Copy the code below (all three functions and the test macro) into a code
module, and call it from a worksheet using, for example:

=ReduceNumbers(A1& " " &B1)

If A1 has 9 9 12 8 4 and B1 has 9 12 8 3 5, it will return the single string

5 5 6 4 2 5 6 4 1 3
The conversion must take into account all the numbers in a column (actually
two adjacent columns) to find intermediate numbers that are not
present and can be excluded.

Bernie
MS Excel MVP

Option Explicit

Sub TestFunction()
MsgBox ReduceNumbers("9 9 12 8 4")
End Sub

Function ReduceNumbers(mystr As String)
Dim myVal As Variant
Dim i As Integer

myVal = Split(mystr, " ")
myVal = UniqueMembers(myVal)
myVal = ArrayBubbleSort(myVal, True)
mystr = " " & mystr & " "
mystr = Replace(mystr, " ", " ")
For i = LBound(myVal) To UBound(myVal)
mystr = Replace(" " & mystr & " ", " " & myVal(i) & " ", " " & i & " ", 1,
Len(mystr))
Next i
ReduceNumbers = Application.Trim(mystr)
End Function


Function UniqueMembers( _
myArray As Variant) As Variant
Dim i As Integer
Dim j As Integer
Dim myRet() As Variant
Dim Matched As Boolean

ReDim myRet(1 To 1)
myRet(1) = myArray(LBound(myArray))
For i = LBound(myArray) + 1 To UBound(myArray)
Matched = False
For j = 1 To UBound(myRet)
If myArray(i) = myRet(j) Then Matched = True
Next j
If Not Matched Then
ReDim Preserve myRet(1 To UBound(myRet) + 1)
myRet(UBound(myRet)) = myArray(i)
End If
Next i

UniqueMembers = myRet

End Function


Function ArrayBubbleSort( _
myArray As Variant, _
Optional Ascending As Boolean) _
As Variant
Dim myTemp As Variant
Dim myInt() As Variant
Dim i As Integer
Dim j As Integer

ReDim myInt(LBound(myArray) To UBound(myArray))
For i = LBound(myArray) To UBound(myArray)
myInt(i) = Val(Trim(myArray(i)))
Next i
'Do the sort
For i = LBound(myInt) To UBound(myInt) - 1
For j = i + 1 To UBound(myInt)
If Ascending Then
If myInt(i) > myInt(j) Then
myTemp = myInt(j)
myInt(j) = myInt(i)
myInt(i) = myTemp
End If
Else
If myInt(i) < myInt(j) Then
myTemp = myInt(j)
myInt(j) = myInt(i)
myInt(i) = myTemp
End If
End If
Next j
Next i

'Return the array
ArrayBubbleSort = myInt
End Function
 
sedm,

Below is code for a simpler User-Defined-Function than the one I posted
yesterday, used like

=Ordered(A1 & " " & B1)

for example.

HTH,
Bernie
MS Excel MVP

Function Ordered(inCell As String) As String
Dim i As Integer
Dim myCount As Integer
Dim myMax As Integer
Dim myVals As Variant

Ordered = Replace(inCell, " ", " ")
Ordered = " " & Ordered & " "

myVals = Split(inCell, " ")

myMax = myVals(0)

For i = 1 To UBound(myVals)
If myVals(i) > myMax Then myMax = myVals(i)
Next i

myCount = 1
For i = 1 To myMax
If InStr(1, Ordered, " " & i & " ") > 0 Then
Ordered = Replace(Ordered, " " & i & " ", _
" " & myCount & " ", 1, Len(Ordered))
myCount = myCount + 1
End If
Next i

End Function
 
The Analysis ToolPak function GCD can be used successively to find the
greatest common divisor of all the numbers.

The connection of this problem to genetics is not clear to me.

Jerry
 
Thanks greatly for your help...

(Genetics connection is reducing the magnitiude of the allele data fo
the mathematically challenged linkage software to handle).

Cheers
 
Hi,

Have tried and played with the macros posted, with limite
success. They both have a couple of flaws, maybe down to m
specifications...

identical numbers in adjacent columns are not processed properly:

1 6
3 2
5 5

becomes

1 2
2 1
1 5

but should be...

1 5
3 2
4 4

Seems that neither macros can take into account all the numbers of th
two columns, and get confused by the same number appearing in adjacen
cells.

I`m trying to work through this t the moment, but any input would b
appreciated. Just to recap, ideally:


A B
1 5
3 9
7 7

would become

A B C D
1 5 1 3
3 9 2 5
7 7 4 4



Cheers for any help..
 
sedm,

Your explanation is much clearer this time.

Copy and paste the code below into a code module.

If your string are in cells A1:B100, then select C1:D100, enter

=Ordered(A1:B100)

and press Ctrl-Shift-Enter. This enters the formula as an array formula.

Limits: If your numbers in the string can be higher than 100, then change

mySubs(1 To 100)

to

mySubs(1 To HighestNumberYouHave)

Also, the total length of all strings in all the cells is limited to 32K or
so. If you have more than that, the code can be rewritten to take the
values in chunks - you never said how many cells, how long the strings are,
etc.

HTH,
Bernie
MS Excel MVP


Function Ordered(inCells As Range) As Variant
Dim i As Integer
Dim j As Integer
Dim k As Integer
Dim myTemp As String
Dim myArr() As String
Dim myInput As String
Dim myCount As Integer
Dim myMax As Integer
Dim myVals As Variant
Dim mySubs(1 To 100) As Integer

ReDim myArr(1 To 2, 1 To inCells.Rows.Count)

For j = 1 To inCells.Cells.Count
myInput = myInput & " " & inCells(j).Value
Next j

myInput = " " & myInput & " "

myVals = Split(Application.Trim(myInput), " ")

myMax = myVals(0)

For i = 1 To UBound(myVals)
If myVals(i) > myMax Then myMax = myVals(i)
Next i

myCount = 1
For i = 1 To myMax
If InStr(1, myInput, " " & i & " ") > 0 Then
mySubs(i) = myCount
myCount = myCount + 1
End If
Next i

For i = 1 To 2
For j = 1 To inCells.Rows.Count
myVals = Split(inCells(j, i).Value, " ")
myTemp = ""
For k = 0 To UBound(myVals)
myTemp = myTemp & " " & mySubs(myVals(k))
Next k
myArr(i, j) = myTemp
Next j
Next i

Ordered = Application.Transpose(myArr)
End Function
 
Thanks, that works almost perfectly now.

Minor quibble, is there a way for it to handle (at least ignore)? No
big deal, can just blank and replace after running it ...

Cheers
 
Sedm,

Certainly will be easy, but I need to know if you mean to ignore ? marks
within your strings, or as whole cell values.

Bernie
 
Bernie said:
*Sedm,

Certainly will be easy, but I need to know if you mean to ignore
marks
within your strings, or as whole cell values.

Bernie
*
[/QUOTE]

Cheers - it was just cell values with the value "0"...but as I said
this seems to have been fixed by changing the mySubs to "(0 to 100).

Doesn`t seem to have made any errors anywhere, but if there is a
obvious mistake with this alteration, please let me know...

Thanks
 
Sedm,

That is fine, as long as you are happy with zeroes staying zeroes and, if
there aren't any zeroes, not having ones become zeroes.

Your redimensioning of that array simply sets the 'zeroth' element to zero.

So, if you don't have zeroes, do you want ones to become zeroes? If you
don't want to change the zeroes to 1s or, in the case of missing zeroes, 1s
to zeroes.

HTH,
Bernie
MS Excel MVP


Cheers - it was just cell values with the value "0"...but as I said,
this seems to have been fixed by changing the mySubs to "(0 to 100).

Doesn`t seem to have made any errors anywhere, but if there is an
obvious mistake with this alteration, please let me know...

Thanks.
[/QUOTE]
 
Zeros are fine as zeros - they are dealt with correctly by th
processing software. Ones are important, so shouldn`t be zeroed, bu
should remain as ones...:cool:

Thanks for all your help
 
Back
Top