Palindromes and repeats

  • Thread starter Thread starter Luciano Paulino da Silva
  • Start date Start date
L

Luciano Paulino da Silva

Dear all,
Some time ago (http://groups.google.com.br/group/
microsoft.public.excel.worksheet.functions/browse_thread/thread/
6b068321053a5c90/c6dcff10540e4bc2?q=palindromes+excel+bernie&lnk=ol&)
Bernie Deitrick helped me to solve a problem related to palindromes
and repeats detection on a string of letters. At present, I need
perform some change on that macros in order to detect non-redundant
palindromes and repeats. In this way, for the sequence bellow my
solution it would be:

QGAGAAAAAAAAGGAGQGG

13 Palindromes detected
GAG
AGA
GAAAAAAAAG
AA
AAA
AAAA
AAAAA
AAAAAA
AAAAAAA
AAAAAAAA
AGGA
GG
GQG 1 3

Now the solution it would be:

QGAGAAAAAAAAGGAGQGG

13 Non-redundant Palindromes detected

GAG
GAAAAAAAAG
GG

The big palindromes should be preferred in the occurrences.
Thanks in advance,
Luciano
 
I forgot to say that it would be selected the palindromes as big as
detected from the order they appear in the text. So, if a big
palindrome is detected the next one will be the following after it in
the string even when there are a palindrome inside it. Similar
procedure will be necessary perform with repeats.
Thanks in advance,
Luciano
 
I forgot to say that it would be selected the palindromes as big as
detected from the order they appear in the text. So, if a big
palindrome is detected the next one will be the following after it in
the string even when there are a palindrome inside it. Similar
procedure will be necessary perform with repeats.
Thanks in advance,Luciano
 
Dear all,
Some time ago (http://groups.google.com.br/group/
microsoft.public.excel.worksheet.functions/browse_thread/thread/
6b068321053a5c90/c6dcff10540e4bc2?q=palindromes+excel+bernie&lnk=ol&)
Bernie Deitrick helped me to solve a problem related to palindromes
and repeats detection on a string of letters. At present, I need
perform some change on that macros in order to detect non-redundant
palindromes and repeats. In this way, for the sequence bellow my
solution it would be:

QGAGAAAAAAAAGGAGQGG

13 Palindromes detected
GAG
AGA
GAAAAAAAAG
AA
AAA
AAAA
AAAAA
AAAAAA
AAAAAAA
AAAAAAAA
AGGA
GG
GQG     1       3

Now the solution it would be:

QGAGAAAAAAAAGGAGQGG

13 Non-redundant Palindromes detected

GAG
GAAAAAAAAG
GG

The big palindromes should be preferred in the occurrences.
Thanks in advance,
Luciano

I'm not quite sure what you mean by "Non-redundant". I take it to mean
something like "maximal non-overlapping"

I didn't modify Bernie Deitrick's code (so it might have a different
input/output) but came up with the following:

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Function Reverse(S As String) As String
Dim i As Long, n As Long
n = Len(S)
For i = 1 To n
Reverse = Reverse & Mid(S, n - i + 1, 1)
Next i
End Function

Function FindMaxInitPalin(SearchString As String) As String
'Given a string it returns the longest initial segment which is a
palindrome
Dim i As Long, j As Long
Dim init As String
For i = 1 To Len(SearchString)
init = Mid(SearchString, 1, i)
If init = Reverse(init) Then j = i
Next i
FindMaxInitPalin = Mid(SearchString, 1, j)
End Function

Function FindMaxPalins(SearchString As String) As Variant
'returns a variant array consisting of maximal nonoverlapping
palindromes
Dim Palindromes As Variant
Dim InputString As String
Dim chomp As String
Dim n As Long

InputString = SearchString
Do While Len(InputString) > 0
chomp = FindMaxInitPalin(InputString)
InputString = Mid(InputString, Len(chomp) + 1)
If Len(chomp) > 1 Then
If IsEmpty(Palindromes) Then
ReDim Palindromes(1 To 1)
Palindromes(1) = chomp
Else
n = UBound(Palindromes)
ReDim Preserve Palindromes(1 To n + 1)
Palindromes(n + 1) = chomp
End If
End If
Loop
FindMaxPalins = Palindromes
End Function

Sub Main()
Dim SearchString As String
Dim Results As Variant
Dim i As Long

SearchString = InputBox("Enter a string")
Results = FindMaxPalins(SearchString)
If IsEmpty(Results) Then
Range("A1").Value = "No palindromes found"
Else
Range("A1").Value = UBound(Results) & " maximal palidromes
found: "
For i = 1 To UBound(Results)
Range("A1").Offset(i).Value = Results(i)
Next i
End If
End Sub


''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

The function FindMaxPalins is the heart of the code. Main() is a
simple driver program that dumps the results in column A and can of
course be modified to put the output somewhere else. On your sample
input I get:

4 maximal palidromes found:
GAG
AAAAAAAA
GG
GQG

I notice that you didn't include GQG in your desired output. Was that
an oversight on your part? If not, I really don't know what you mean
by "nonredundant."

Hope that helps

-John Coleman
 
I'm not quite sure what you mean by "Non-redundant". I take it to mean
something like "maximal non-overlapping"

I didn't modify Bernie Deitrick's code (so it might have a different
input/output) but came up with the following:

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''­'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Function Reverse(S As String) As String
    Dim i As Long, n As Long
    n = Len(S)
    For i = 1 To n
        Reverse = Reverse & Mid(S, n - i + 1, 1)
    Next i
End Function

Function FindMaxInitPalin(SearchString As String) As String
    'Given a string it returns the longest initial segment which is a
palindrome
    Dim i As Long, j As Long
    Dim init As String
    For i = 1 To Len(SearchString)
        init = Mid(SearchString, 1, i)
        If init = Reverse(init) Then j = i
    Next i
    FindMaxInitPalin = Mid(SearchString, 1, j)
End Function

Function FindMaxPalins(SearchString As String) As Variant
'returns a variant array consisting of maximal nonoverlapping
palindromes
    Dim Palindromes As Variant
    Dim InputString As String
    Dim chomp As String
    Dim n As Long

    InputString = SearchString
    Do While Len(InputString) > 0
        chomp = FindMaxInitPalin(InputString)
        InputString = Mid(InputString, Len(chomp) + 1)
        If Len(chomp) > 1 Then
            If IsEmpty(Palindromes) Then
                ReDim Palindromes(1 To 1)
                Palindromes(1) = chomp
            Else
                n = UBound(Palindromes)
                ReDim Preserve Palindromes(1 To n + 1)
                Palindromes(n + 1) = chomp
            End If
        End If
    Loop
    FindMaxPalins = Palindromes
End Function

Sub Main()
    Dim SearchString As String
    Dim Results As Variant
    Dim i As Long

    SearchString = InputBox("Enter a string")
    Results = FindMaxPalins(SearchString)
    If IsEmpty(Results) Then
        Range("A1").Value = "No palindromes found"
    Else
        Range("A1").Value = UBound(Results) & " maximal palidromes
found: "
        For i = 1 To UBound(Results)
            Range("A1").Offset(i).Value = Results(i)
        Next i
    End If
End Sub

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''­'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''­''''''''''''''

The function FindMaxPalins is the heart of the code. Main() is a
simple driver program that dumps the results in column A and can of
course be modified to put the output somewhere else. On your sample
input I get:

4 maximal palidromes found:
GAG
AAAAAAAA
GG
GQG

I notice that you didn't include GQG in your desired output. Was that
an oversight on your part? If not, I really don't know what you mean
by "nonredundant."

Hope that helps

-John Coleman- Hide quoted text -

- Show quoted text -

I looked at my post and saw that google had introduced a couple of
line breaks that would prevent proper parsing. In case you can't
figure it out, I removed the offending comments and rewrote the code
slightly at the end to get shorter lines:

Function Reverse(S As String) As String
Dim i As Long, n As Long
n = Len(S)
For i = 1 To n
Reverse = Reverse & Mid(S, n - i + 1, 1)
Next i
End Function

Function FindMaxInitPalin(SearchString As String) As String
Dim i As Long, j As Long
Dim init As String
For i = 1 To Len(SearchString)
init = Mid(SearchString, 1, i)
If init = Reverse(init) Then j = i
Next i
FindMaxInitPalin = Mid(SearchString, 1, j)
End Function

Function FindMaxPalins(SearchString As String) As Variant
Dim Palindromes As Variant
Dim InputString As String
Dim chomp As String
Dim n As Long

InputString = SearchString
Do While Len(InputString) > 0
chomp = FindMaxInitPalin(InputString)
InputString = Mid(InputString, Len(chomp) + 1)
If Len(chomp) > 1 Then
If IsEmpty(Palindromes) Then
ReDim Palindromes(1 To 1)
Palindromes(1) = chomp
Else
n = UBound(Palindromes)
ReDim Preserve Palindromes(1 To n + 1)
Palindromes(n + 1) = chomp
End If
End If
Loop
FindMaxPalins = Palindromes
End Function

Sub Main()
Dim SearchString As String
Dim Results As Variant
Dim i As Long, n As Long

SearchString = InputBox("Enter a string")
Results = FindMaxPalins(SearchString)
If IsEmpty(Results) Then
Range("A1").Value = "No palindromes found"
Else
n = UBound(Results)
Range("A1").Value = n & " maximal palidromes found: "
For i = 1 To n
Range("A1").Offset(i).Value = Results(i)
Next i
End If
End Sub

-John Coleman
 
I looked at my post and saw that google had introduced a couple of
line breaks that would prevent proper parsing. In case you can't
figure it out, I removed the offending comments and rewrote the code
slightly at the end to get shorter lines:

Function Reverse(S As String) As String
    Dim i As Long, n As Long
    n = Len(S)
    For i = 1 To n
        Reverse = Reverse & Mid(S, n - i + 1, 1)
    Next i
End Function

Function FindMaxInitPalin(SearchString As String) As String
    Dim i As Long, j As Long
    Dim init As String
    For i = 1 To Len(SearchString)
        init = Mid(SearchString, 1, i)
        If init = Reverse(init) Then j = i
    Next i
    FindMaxInitPalin = Mid(SearchString, 1, j)
End Function

Function FindMaxPalins(SearchString As String) As Variant
    Dim Palindromes As Variant
    Dim InputString As String
    Dim chomp As String
    Dim n As Long

    InputString = SearchString
    Do While Len(InputString) > 0
        chomp = FindMaxInitPalin(InputString)
        InputString = Mid(InputString, Len(chomp) + 1)
        If Len(chomp) > 1 Then
            If IsEmpty(Palindromes) Then
                ReDim Palindromes(1 To 1)
                Palindromes(1) = chomp
            Else
                n = UBound(Palindromes)
                ReDim Preserve Palindromes(1 To n + 1)
                Palindromes(n + 1) = chomp
            End If
        End If
    Loop
    FindMaxPalins = Palindromes
End Function

Sub Main()
    Dim SearchString As String
    Dim Results As Variant
    Dim i As Long, n As Long

    SearchString = InputBox("Enter a string")
    Results = FindMaxPalins(SearchString)
    If IsEmpty(Results) Then
        Range("A1").Value = "No palindromes found"
    Else
        n = UBound(Results)
        Range("A1").Value = n & " maximal palidromes found: "
        For i = 1 To n
            Range("A1").Offset(i).Value = Results(i)
        Next i
    End If
End Sub

-John Coleman- Hide quoted text -

- Show quoted text -

Upon further reflection I realized that my algorithm was inefficient
in that Reverse() was starting from scratch at each function call
whereas in the way that it was actually used in FindMaxInitPalin(),
successive calls to Reverse() simply created strings that were one
character longer. I thus ditched Reverse() and modified FindMaxInit()
to keep a running copy of the reflection of initial segments. The
resulting code proved to be hundreds of times quicker when I tried it
on randomly generated strings of length 5000. If your usage is limited
to strings that you are typing by hand you may not notice the
difference, but if you are looping through large collections of
strings you might. Further optimizations are possible- for example, I
didn't attempt to exploit the fact that you can sometimes recognize
non-palindromes when slightly more than half of a string is scanned.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Revised code:

Function FindMaxInitPalin(SearchString As String) As String
Dim i As Long, j As Long
Dim init As String
Dim reflection As String
For i = 1 To Len(SearchString)
init = Mid(SearchString, 1, i)
reflection = Mid(init, i) & reflection
If init = reflection Then j = i
Next i
FindMaxInitPalin = Mid(SearchString, 1, j)
End Function

Function FindMaxPalins(SearchString As String) As Variant
Dim Palindromes As Variant
Dim InputString As String
Dim chomp As String
Dim n As Long

InputString = SearchString
Do While Len(InputString) > 0
chomp = FindMaxInitPalin(InputString)
InputString = Mid(InputString, Len(chomp) + 1)
If Len(chomp) > 1 Then
If IsEmpty(Palindromes) Then
ReDim Palindromes(1 To 1)
Palindromes(1) = chomp
Else
n = UBound(Palindromes)
ReDim Preserve Palindromes(1 To n + 1)
Palindromes(n + 1) = chomp
End If
End If
Loop
FindMaxPalins = Palindromes
End Function

Sub Main()
Dim SearchString As String
Dim Results As Variant
Dim i As Long, n As Long

SearchString = InputBox("Enter a string")
Results = FindMaxPalins(SearchString)
If IsEmpty(Results) Then
Range("A1").Value = "No palindromes found"
Else
n = UBound(Results)
Range("A1").Value = n & " maximal palidromes found: "
For i = 1 To n
Range("A1").Offset(i).Value = Results(i)
Next i
End If
End Sub
 
Upon further reflection I realized that my algorithm was inefficient
in that Reverse() was starting from scratch at each function call
whereas in the way that it was actually used in FindMaxInitPalin(),
successive calls to Reverse() simply created strings that were one
character longer. I thus ditched Reverse() and modified FindMaxInit()
to keep a running copy of the reflection of initial segments. The
resulting code proved to be hundreds of times quicker when I tried it
on randomly generated strings of length 5000. If your usage is limited
to strings that you are typing by hand you may not notice the
difference, but if you are looping through large collections of
strings you might. Further optimizations are possible- for example, I
didn't attempt to exploit the fact that you can sometimes recognize
non-palindromes when slightly more than half of a string is scanned.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Revised code:

Function FindMaxInitPalin(SearchString As String) As String
    Dim i As Long, j As Long
    Dim init As String
    Dim reflection As String
    For i = 1 To Len(SearchString)
        init = Mid(SearchString, 1, i)
        reflection = Mid(init, i) & reflection
        If init = reflection Then j = i
    Next i
    FindMaxInitPalin = Mid(SearchString, 1, j)
End Function

Function FindMaxPalins(SearchString As String) As Variant
    Dim Palindromes As Variant
    Dim InputString As String
    Dim chomp As String
    Dim n As Long

    InputString = SearchString
    Do While Len(InputString) > 0
        chomp = FindMaxInitPalin(InputString)
        InputString = Mid(InputString, Len(chomp) + 1)
        If Len(chomp) > 1 Then
            If IsEmpty(Palindromes) Then
                ReDim Palindromes(1 To 1)
                Palindromes(1) = chomp
            Else
                n = UBound(Palindromes)
                ReDim Preserve Palindromes(1 To n + 1)
                Palindromes(n + 1) = chomp
            End If
        End If
    Loop
    FindMaxPalins = Palindromes
End Function

Sub Main()
    Dim SearchString As String
    Dim Results As Variant
    Dim i As Long, n As Long

    SearchString = InputBox("Enter a string")
    Results = FindMaxPalins(SearchString)
    If IsEmpty(Results) Then
        Range("A1").Value = "No palindromes found"
    Else
        n = UBound(Results)
        Range("A1").Value = n & " maximal palidromes found: "
        For i = 1 To n
            Range("A1").Offset(i).Value = Results(i)
        Next i
    End If
End Sub

Dear John Coleman,
Thank you very much for your fantastic code. It is almost perfect for
my situation.
However, I need that instead of a same detected palindrome is being
repeated all the time it appear in the string, it should be shown once
a time in A column, and that B column should be shown the frequence of
occcurrences of that specific palindrome. Do you know some way to do
that?
Thank you,
Luciano
 
Dear John Coleman,
Thank you very much for your fantastic code. It is almost perfect for
my situation.
However, I need that instead of a same detected palindrome is being
repeated all the time it appear in the string, it should be shown once
a time in A column, and that B column should be shown the frequence of
occcurrences of that specific palindrome. Do you know some way to do
that?
Thank you,
Luciano


Dear Luciano,

This should work:

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Function FindMaxInitPalin(SearchString As String) As String
Dim i As Long, j As Long
Dim init As String
Dim reflection As String
For i = 1 To Len(SearchString)
init = Mid(SearchString, 1, i)
reflection = Mid(init, i) & reflection
If init = reflection Then j = i
Next i
FindMaxInitPalin = Mid(SearchString, 1, j)
End Function

Function FindMaxPalins(SearchString As String) As Variant
Dim Palindromes As Object
Dim InputString As String
Dim chomp As String
Dim n As Long

Set Palindromes = CreateObject("Scripting.Dictionary")
InputString = SearchString
Do While Len(InputString) > 0
chomp = FindMaxInitPalin(InputString)
InputString = Mid(InputString, Len(chomp) + 1)
If Len(chomp) > 1 Then
If Not Palindromes.Exists(chomp) Then
Palindromes.Add chomp, 1
Else
Palindromes.Item(chomp) = Palindromes.Item(chomp) + 1
End If
End If
Loop
If Palindromes.count > 0 Then Set FindMaxPalins = Palindromes
End Function

Sub Main()
Dim SearchString As String
Dim Results As Variant
Dim i As Long, n As Long
Dim palindrome As Variant

On Error GoTo no_palindromes
SearchString = InputBox("Enter a string")
Set Results = FindMaxPalins(SearchString)
n = Results.count
Range("A1").Value = n & " distinct palindromes found"
Range("A2").Value = "Palindrome"
Range("B2").Value = "Frequency"
For Each palindrome In Results.Keys
i = i + 1
Range("A2").Offset(i).Value = palindrome
Range("B2").Offset(i).Value = Results.Item(palindrome)
Next palindrome
Exit Sub
no_palindromes:
Range("A1").Value = "No palindromes found"
End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Also, there is no need to stick to an inputbox for entering the
string. It is easy enough to modify main() so that it gets its input
from a cell

Out of curiosity, what sort of things are you doing with this code?
Seems interesting.

-John Coleman
 
DearLuciano,

This should work:

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Function FindMaxInitPalin(SearchString As String) As String
    Dim i As Long, j As Long
    Dim init As String
    Dim reflection As String
    For i = 1 To Len(SearchString)
        init = Mid(SearchString, 1, i)
        reflection = Mid(init, i) & reflection
        If init = reflection Then j = i
    Next i
    FindMaxInitPalin = Mid(SearchString, 1, j)
End Function

Function FindMaxPalins(SearchString As String) As Variant
    Dim Palindromes As Object
    Dim InputString As String
    Dim chomp As String
    Dim n As Long

    Set Palindromes = CreateObject("Scripting.Dictionary")
    InputString = SearchString
    Do While Len(InputString) > 0
        chomp = FindMaxInitPalin(InputString)
        InputString = Mid(InputString, Len(chomp) + 1)
        If Len(chomp) > 1 Then
            If Not Palindromes.Exists(chomp) Then
                Palindromes.Add chomp, 1
            Else
                Palindromes.Item(chomp) = Palindromes.Item(chomp) + 1
            End If
        End If
    Loop
    If Palindromes.count > 0 Then Set FindMaxPalins = Palindromes
End Function

Sub Main()
    Dim SearchString As String
    Dim Results As Variant
    Dim i As Long, n As Long
    Dim palindrome As Variant

    On Error GoTo no_palindromes
    SearchString = InputBox("Enter a string")
    Set Results = FindMaxPalins(SearchString)
    n = Results.count
    Range("A1").Value = n & " distinct palindromes found"
    Range("A2").Value = "Palindrome"
    Range("B2").Value = "Frequency"
    For Each palindrome In Results.Keys
        i = i + 1
        Range("A2").Offset(i).Value = palindrome
        Range("B2").Offset(i).Value = Results.Item(palindrome)
    Next palindrome
    Exit Sub
no_palindromes:
    Range("A1").Value = "No palindromes found"
End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Also, there is no need to stick to an inputbox for entering the
string. It is easy enough to modify main() so that it gets its input
from a cell

Out of curiosity, what sort of things are you doing with this code?
Seems interesting.

-John Coleman

Dear John Coleman,
Thank you very much for your help. Let me try explain the aim of the
present study. Let me first introduce myself. My name is Luciano
Paulino da Silva, I`m a scientific researcher in the area of
bionanotechnology. In some of our studies we have characterized the
sequences of proteins and peptides that are formed by strings of 20
possible aminoacid residues commonly represented by letters. We always
needs a lot of statistical and mathematical comparisons among strings.
In this way, we have for some situations even thousands of these
strings and we are interested in analyze them from databanks. Some
time ago, I realized that some of these protein sequences have inside
them the presence of palindromes and repeats and such structures are
very important for their biological functions. In this way, this code
and others are important to test some situations in order to in some
days perform these analyses in large scale. In this way, if it would
be your interest I would be pleased if you accept collaborate with us.
Concerning your code, at present we need some additional things like:
1-The string shoud be typed in A1 cell instead use an inputbox;
2-In addition to palindromes, an identitical approach will have to be
performed to repeats in the strings;

Thanks in advance,
Luciano
 
DearLuciano,

This should work:

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Function FindMaxInitPalin(SearchString As String) As String
    Dim i As Long, j As Long
    Dim init As String
    Dim reflection As String
    For i = 1 To Len(SearchString)
        init = Mid(SearchString, 1, i)
        reflection = Mid(init, i) & reflection
        If init = reflection Then j = i
    Next i
    FindMaxInitPalin = Mid(SearchString, 1, j)
End Function

Function FindMaxPalins(SearchString As String) As Variant
    Dim Palindromes As Object
    Dim InputString As String
    Dim chomp As String
    Dim n As Long

    Set Palindromes = CreateObject("Scripting.Dictionary")
    InputString = SearchString
    Do While Len(InputString) > 0
        chomp = FindMaxInitPalin(InputString)
        InputString = Mid(InputString, Len(chomp) + 1)
        If Len(chomp) > 1 Then
            If Not Palindromes.Exists(chomp) Then
                Palindromes.Add chomp, 1
            Else
                Palindromes.Item(chomp) = Palindromes.Item(chomp) + 1
            End If
        End If
    Loop
    If Palindromes.count > 0 Then Set FindMaxPalins = Palindromes
End Function

Sub Main()
    Dim SearchString As String
    Dim Results As Variant
    Dim i As Long, n As Long
    Dim palindrome As Variant

    On Error GoTo no_palindromes
    SearchString = InputBox("Enter a string")
    Set Results = FindMaxPalins(SearchString)
    n = Results.count
    Range("A1").Value = n & " distinct palindromes found"
    Range("A2").Value = "Palindrome"
    Range("B2").Value = "Frequency"
    For Each palindrome In Results.Keys
        i = i + 1
        Range("A2").Offset(i).Value = palindrome
        Range("B2").Offset(i).Value = Results.Item(palindrome)
    Next palindrome
    Exit Sub
no_palindromes:
    Range("A1").Value = "No palindromes found"
End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Also, there is no need to stick to an inputbox for entering the
string. It is easy enough to modify main() so that it gets its input
from a cell

Out of curiosity, what sort of things are you doing with this code?
Seems interesting.

-John Coleman

And another thing that I forgot to say is that in addition to the
frequency, it is necessary a column C containing the number of letters
in the palindrome found listed in each line.
Thank you
Luciano
 
Dear John Coleman,
Thank you very much for your help. Let me try explain the aim of the
present study. Let me first introduce myself. My name is Luciano
Paulino da Silva, I`m a scientific researcher in the area of
bionanotechnology. In some of our studies we have characterized the
sequences of proteins and peptides that are formed by strings of 20
possible aminoacid residues commonly represented by letters. We always
needs a lot of statistical and mathematical comparisons among strings.
In this way, we have for some situations even thousands of these
strings and we are interested in analyze them from databanks. Some
time ago, I realized that some of these protein sequences have inside
them the presence of palindromes and repeats and such structures are
very important for their biological functions. In this way, this code
and others are important to test some situations in order to in some
days perform these analyses in large scale. In this way, if it would
be your interest I would be pleased if you accept collaborate with us.
Concerning your code, at present we need some additional things like:
1-The string shoud be typed in A1 cell instead use an inputbox;
2-In addition to palindromes, an identitical approach will have to be
performed to repeats in the strings;

Thanks in advance,
Luciano- Hide quoted text -

- Show quoted text -

Dear Luciano,

I suspected that your interests were biological. I am a mathematician
who likes to program as a hobby (and sometimes as an aid to research
or teaching). I have a friend in the biology department who sought my
aid a while back in writing a program which generated random protein
strings with certain properties, and I recognized G,A,Q from your
original post as possibly being abbreviations for glycine, etc. I am
on sabbatical this semester, so have some time on my hands and would
be willing to help you out some (but am currently writing a paper - so
I don't have unlimitted time), though for more serious analysis you
should probably recruit a specialist in bioinformatics who understands
string processing and pattern matching much more than I do. Feel free
to email me if you want. My address is jcoleman followed by @ followed
by franciscan dot edu (don't want to make life too easy for robot
spammers).

What do you mean by repeats? Do you mean strings like GAQGAQ (where
the repeated sequence is adjacent) or GAQCGAQ (where the repeated
sequence can be separated by intervening letters)? And how would you
handle GAQGAQGAQ? Also - do you want all repeats, or do you want
repeats which are in some sense maximal (similar to the palindrome
code I wrote)? Please clarify.

I wrote a new main() to take its input from A1 and report lengths in
column C. The other two functions in the code were unchanged , so just
delete main() and replace it with:


Sub Main()
Dim SearchString As String
Dim Results As Variant
Dim i As Long, n As Long
Dim palindrome As Variant
Dim R As Range 'output anchor

On Error GoTo no_palindromes
Set Results = FindMaxPalins(Range("A1").Value)
Set R = Range("A3")
n = Results.count
R.Value = n & " distinct palindromes found"
R.Offset(1).Value = "Palindrome"
R.Offset(1, 1).Value = "Frequency"
R.Offset(1, 2).Value = "Length"
For Each palindrome In Results.Keys
i = i + 1
R.Offset(i + 1).Value = palindrome
R.Offset(i + 1, 1).Value = Results.Item(palindrome)
R.Offset(i + 1, 2).Value = Len(palindrome)
Next palindrome
Exit Sub
no_palindromes:
R.Value = "No palindromes found"
End Sub

Hope that helps

-John Coleman
 
DearLuciano,

I suspected that your interests were biological. I am a mathematician
who likes to program as a hobby (and sometimes as an aid to research
or teaching). I have a friend in the biology department who sought my
aid a while back in writing a program which generated random protein
strings with certain properties, and I recognized G,A,Q from your
original post as possibly being abbreviations for glycine, etc. I am
on sabbatical this semester, so have some time on my hands and would
be willing to help you out some (but am currently writing a paper - so
I don't have unlimitted time), though for more serious analysis you
should probably recruit a specialist in bioinformatics who understands
string processing and pattern matching much more than I do. Feel free
to email me if you want. My address is jcoleman followed by @ followed
by franciscan dot edu (don't want to make life too easy for robot
spammers).

What do you mean by repeats? Do you mean strings like GAQGAQ (where
the repeated sequence is adjacent) or GAQCGAQ (where the repeated
sequence can be separated by intervening letters)? And how would you
handle GAQGAQGAQ? Also - do you want all repeats, or do you want
repeats which are in some sense maximal (similar to the palindrome
code I wrote)? Please clarify.

I wrote a new main() to take its input from A1 and report lengths in
column C. The other two functions in the code were unchanged , so just
delete main() and replace it with:

Sub Main()
    Dim SearchString As String
    Dim Results As Variant
    Dim i As Long, n As Long
    Dim palindrome As Variant
    Dim R As Range 'output anchor

    On Error GoTo no_palindromes
    Set Results = FindMaxPalins(Range("A1").Value)
    Set R = Range("A3")
    n = Results.count
    R.Value = n & " distinct palindromes found"
    R.Offset(1).Value = "Palindrome"
    R.Offset(1, 1).Value = "Frequency"
    R.Offset(1, 2).Value = "Length"
    For Each palindrome In Results.Keys
        i = i + 1
        R.Offset(i + 1).Value = palindrome
        R.Offset(i + 1, 1).Value = Results.Item(palindrome)
        R.Offset(i + 1, 2).Value = Len(palindrome)
    Next palindrome
    Exit Sub
no_palindromes:
    R.Value = "No palindromes found"
End Sub

Hope that helps

-John Coleman

Dear John Coleman,
Thak you very much for your code. It is perfect for this investigation
approach.
That`s nice, I`m a biologist but among my graduate students there are
a mathematician because your area is very important to our studies.
That`s interesting. I have also a smal VBA-based program to generate
random polypeptides and this is one of our interests.
I will prepare a small text telling you our interests in some string
processing and you will be able to tell us if could help us.

Yes, the repeat sequence could be adjacent or separated by intervening
letters. Both situations are possible on protein sequences. In the
case of GAQGAQGAQ it would be 3 GAQ repeats, using the same logic of
the palindromes. Yes, it is similar to the palindromes, and even some
palindromes are repeats but not all repeats are palindromes...
Thanks in advance,
Luciano
 
Dear John Coleman,
Thak you very much for your code. It is perfect for this investigation
approach.
That`s nice, I`m a biologist but among my graduate students there are
a mathematician because your area is very important to our studies.
That`s interesting. I have also a smal VBA-based program to generate
random polypeptides and this is one of our interests.
I will prepare a small text telling you our interests in some string
processing and you will be able to tell us if could help us.

Yes, the repeat sequence could be adjacent or separated by intervening
letters. Both situations are possible on protein sequences. In the
case of GAQGAQGAQ it would be 3 GAQ repeats, using the same logic of
the palindromes. Yes, it is similar to the palindromes, and even some
palindromes are repeats but not all repeats are palindromes...
Thanks in advance,
Luciano- Hide quoted text -

- Show quoted text -

Dear Luciano,

I noticed a potential problem with the code that I've written for you.
The problem is it handles overlapping palindromes. For example, if you
input ABBACCABCC, my code first detects the ABBA then starts looking
for palindromes in the remaining string (CCABCC). That is how I
interpreted you in your second post when you said "I forgot to say
that it would be selected the palindromes as big as detected from the
order they appear in the text. So, if a big palindrome is detected the
next one will be the following after it in the string even when there
are a palindrome inside it. " But notice that this way of handling
things misses BACCAB which begins in the third position since it
overlaps with ABBA. Similar questions will arise in repeat detection -
so I want to clarify things before proceeding further. What would you
like the palindrome output to be in the case of ABBACCABCC?

-John
 
DearLuciano,

I noticed a potential problem with the code that I've written for you.
The problem is it handles overlapping palindromes. For example, if you
input ABBACCABCC, my code first detects the ABBA then starts looking
for palindromes in the remaining string (CCABCC). That is how I
interpreted you in your second post when you said "I forgot to say
that it would be selected the palindromes as big as detected from the
order they appear in the text. So, if a big palindrome is detected the
next one will be the following after it in the string even when there
are a palindrome inside it. " But notice that this way of handling
things misses BACCAB which begins in the third position since it
overlaps with ABBA. Similar questions will arise in repeat detection -
so I want to clarify things before proceeding further. What would you
like the palindrome output to be in the case of ABBACCABCC?

-John

Dear John,
Thank you for your attention. You are correct and I have realized it
some hours ago. We must always detect the biggest palindrome linearly
according to their occurrence on the string following by smallest
palindromes excluding that palindromes that were inside the initially
detected palindrome. In this way, the output for your example it would
be:

Input: ABBACCABCC

Output:

BACCAB
CC

But anycase it is possible in the future that we require test other
situations. Thanks in advance!
Luciano
 
[snip]
Dear John,
Thank you for your attention. You are correct and I have realized it
some hours ago. We must always detect the biggest palindrome linearly
according to their occurrence on the string following by smallest
palindromes excluding that palindromes that were inside the initially
detected palindrome. In this way, the output for your example it would
be:

Input: ABBACCABCC

Output:

BACCAB
CC

But anycase it is possible in the future that we require test other
situations. Thanks in advance!
Luciano

So you *don't* want ABBA as part of the output?

What would the output be for BAABABBACCABCC?

-John
 
[snip]


Dear John,
Thank you for your attention. You are correct and I have realized it
some hours ago. We must always detect the biggest palindrome linearly
according to their occurrence on the string following by smallest
palindromes excluding that palindromes that were inside the initially
detected palindrome. In this way, the output for your example it would
be:
Input: ABBACCABCC


But anycase it is possible in the future that we require test other
situations. Thanks in advance!
Luciano

So you *don't* want ABBA as part of the output?

What would the output be for BAABABBACCABCC?

-John

Dear John,
Since ABBA woukd be smaller than BACCAB, at present it would not be
detected.
In your example:
Input: BAABABBACCABCC

Output:
BAAB
BACCAB
CC
 
[snip]
DearLuciano,
I noticed a potential problem with the code that I've written for you.
The problem is it handles overlapping palindromes. For example, if you
input ABBACCABCC, my code first detects the ABBA then starts looking
for palindromes in the remaining string (CCABCC). That is how I
interpreted you in your second post when you said "I forgot to say
that it would be selected the palindromes as big as detected from the
order they appear in the text. So, if a big palindrome is detected the
next one will be the following after it in the string even when there
are a palindrome inside it. " But notice that this way of handling
things misses BACCAB which begins in the third position since it
overlaps with ABBA. Similar questions will arise in repeat detection -
so I want to clarify things before proceeding further. What would you
like the palindrome output to be in the case of ABBACCABCC?
-John
Dear John,
Thank you for your attention. You are correct and I have realized it
some hours ago. We must always detect the biggest palindrome linearly
according to their occurrence on the string following by smallest
palindromes excluding that palindromes that were inside the initially
detected palindrome. In this way, the output for your example it would
be:
Input: ABBACCABCC
Output:
BACCAB
CC
But anycase it is possible in the future that we require test other
situations. Thanks in advance!
Luciano
So you *don't* want ABBA as part of the output?
What would the output be for BAABABBACCABCC?

Dear John,
Since ABBA woukd be smaller than BACCAB, at present it would not be
detected.
In your example:
Input: BAABABBACCABCC

Output:
BAAB
BACCAB
CC

Dear John,
I was thinking about the possibility that we could perform the choice
of a rule before perform palindromes or repeats detection. Since
detection of all palindromes including that inside other palindromes,
and maximal palindromes, and all other rules we could think about. Do
you think that it would be possible?
Thanks in advance,
Luciano
 
[snip]
DearLuciano,
I noticed a potential problem with the code that I've written for you.
The problem is it handles overlapping palindromes. For example, if you
input ABBACCABCC, my code first detects the ABBA then starts looking
for palindromes in the remaining string (CCABCC). That is how I
interpreted you in your second post when you said "I forgot to say
that it would be selected the palindromes as big as detected from the
order they appear in the text. So, if a big palindrome is detected the
next one will be the following after it in the string even when there
are a palindrome inside it. " But notice that this way of handling
things misses BACCAB which begins in the third position since it
overlaps with ABBA. Similar questions will arise in repeat detection -
so I want to clarify things before proceeding further. What would you
like the palindrome output to be in the case of ABBACCABCC?
-John
Dear John,
Thank you for your attention. You are correct and I have realized it
some hours ago. We must always detect the biggest palindrome linearly
according to their occurrence on the string following by smallest
palindromes excluding that palindromes that were inside the initially
detected palindrome. In this way, the output for your example it would
be:
Input: ABBACCABCC
Output:
BACCAB
CC
But anycase it is possible in the future that we require test other
situations. Thanks in advance!
Luciano
So you *don't* want ABBA as part of the output?
What would the output be for BAABABBACCABCC?

Dear John,
Since ABBA woukd be smaller than BACCAB, at present it would not be
detected.
In your example:
Input: BAABABBACCABCC

Output:
BAAB
BACCAB
CC- Hide quoted text -

- Show quoted text -

Dear Luciano,

Ok - how does the following algorithm sound? First find the longest
palindrome in the string, taking the leftmost if there is a tie
(BACCACB in the above example). Remove it from the string, leaving 0,1
or 2 remaining substrings (BAABAB and CC in the above example). Repeat
this process on these substrings until only substrings of length < 2
remain. Note that for input ABAB this would output ABA but not BAB.

-John
 
[snip]
DearLuciano,
I noticed a potential problem with the code that I've written foryou.
The problem is it handles overlapping palindromes. For example, if you
input ABBACCABCC, my code first detects the ABBA then starts looking
for palindromes in the remaining string (CCABCC). That is how I
interpreted you in your second post when you said "I forgot to say
that it would be selected the palindromes as big as detected fromthe
order they appear in the text. So, if a big palindrome is detected the
next one will be the following after it in the string even when there
are a palindrome inside it. " But notice that this way of handling
things misses BACCAB which begins in the third position since it
overlaps with ABBA. Similar questions will arise in repeat detection -
so I want to clarify things before proceeding further. What wouldyou
like the palindrome output to be in the case of ABBACCABCC?
-John
Dear John,
Thank you for your attention. You are correct and I have realized it
some hours ago. We must always detect the biggest palindrome linearly
according to their occurrence on the string following by smallest
palindromes excluding that palindromes that were inside the initially
detected palindrome. In this way, the output for your example it would
be:
Input: ABBACCABCC
Output:
BACCAB
CC
But anycase it is possible in the future that we require test other
situations. Thanks in advance!
Luciano
So you *don't* want ABBA as part of the output?
What would the output be for BAABABBACCABCC?
-John
Dear John,
Since ABBA woukd be smaller than BACCAB, at present it would not be
detected.
In your example:
Input: BAABABBACCABCC
Output:
BAAB
BACCAB
CC- Hide quoted text -
- Show quoted text -

DearLuciano,

Ok - how does the following algorithm sound? First find the longest
palindrome in the string, taking the leftmost if there is a tie
(BACCACB in the above example). Remove it from the string, leaving 0,1
or 2 remaining substrings (BAABAB and CC in the above example). Repeat
this process on these substrings until only substrings of length < 2
remain. Note that for input ABAB this would output ABA but not BAB.

-John

Dear John,
Perfect, it is exactly this! It sounds good. After that we could think
other algorithms (rules).
Thanks in advance,
Luciano
 
[snip]
Dear John,
Perfect, it is exactly this! It sounds good. After that we could think
other algorithms (rules).
Thanks in advance,
Luciano- Hide quoted text -

Dear Luciano,
See if this works for you. As before, it uses Main() as the driver
sub, so you should delete all of my old code or create a new workbook
and put this in a new code module so as
to avoid possible name conflicts. I've tried it on a number of strings
and it seems to work as intended, but you should give it a full
battery of tests as well in case I overlooked something.

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Dim palindromes As Object
Dim centers As Variant
Dim SearchString As String
Dim PalinCount As Long
Dim anchor As Range

Sub Initialize()
'initializes module-level variables
'centers contains, for each possible
'center of a palindrome (which may fall
'between two characters), the indices and
'length of the largest palindrome centered there.
'For arithmetical convience, centers will
'be represented by integers between 3 and
'2n-1. palindromes will be a dictionary
'object to store found palindromes and
'their frequencies

Dim i As Long, j As Long, k As Long, n As Long

Set palindromes = CreateObject("Scripting.Dictionary")
Set anchor = Range("A3")
SearchString = Range("A1").Value
n = Len(SearchString)
PalinCount = 0
ReDim centers(3 To 2 * n - 1, 1 To 3)

For i = 3 To 2 * n - 1
If i Mod 2 = 0 Then
'searching for palindromes like BAB
j = i / 2 - 1
k = j + 2
Else
'searching for palindromes like BAAB
j = Int(i / 2)
k = j + 1
End If
Do While j >= 1 And k <= n And _
Mid(SearchString, IIf(j > 0, j, 1), 1) = _
Mid(SearchString, k, 1)
'(kludge for lack of short-cicuit evaluation in VBA)
j = j - 1
k = k + 1
Loop
'last pass through the loop caused trouble, so correct:
j = j + 1
k = k - 1
'(this has weird effect of making j > k (j = k+1)
'if center cuts between two characters but no
'palindrome was found. Nevertheless, this gives
'correct length of 0 in this case, so ok)
centers(i, 1) = k - j + 1 'length
centers(i, 2) = j
centers(i, 3) = k
Next i
End Sub
Sub FindLargest(j As Long, k As Long)
'finds largest palindrome in range j-k
'after finding leftmost such
'it recursively calls itself on left and right remainders,
'printing the string if needed between calls

Dim i As Long, MinI As Long, MaxI As Long
Dim delta As Long
Dim MaxLen As Long, MaxStart As Long, MaxEnd As Long
Dim NewJ As Long, NewK As Long
Dim palindrome As String

MinI = 2 * j + 1
MaxI = 2 * k - 1

For i = MinI To MaxI
If centers(i, 1) > 1 Then
'first contract radius of palindrome
'to make it fit in j-k if needed
If centers(i, 2) < j Or centers(i, 3) > k Then
delta = j - centers(i, 2)
If delta < centers(i, 3) - k Then
delta = centers(i, 3) - k
End If
centers(i, 2) = centers(i, 2) + delta
centers(i, 3) = centers(i, 3) - delta
centers(i, 1) = centers(i, 3) - centers(i, 2) + 1
End If
If centers(i, 1) > MaxLen Then
MaxLen = centers(i, 1)
MaxStart = centers(i, 2)
MaxEnd = centers(i, 3)
End If
End If
Next i

If MaxLen <= 1 Then Exit Sub
'else
'first process any string remaining to left
NewK = MaxStart - 1
NewJ = j
If NewK - NewJ > 0 Then FindLargest NewJ, NewK
'process current node
palindrome = Mid(SearchString, MaxStart, MaxLen)
If Not palindromes.Exists(palindrome) Then
'found new palindrome - display it
PalinCount = PalinCount + 1
anchor.Offset(PalinCount) = palindrome
palindromes.Add palindrome, 1
Else
palindromes.Item(palindrome) = palindromes.Item(palindrome) + 1
End If
'now process any remaining right substring
NewJ = MaxEnd + 1
NewK = k
If NewK - NewJ > 0 Then FindLargest NewJ, NewK
End Sub

Sub Main()
Dim i As Long, palindrome As String

If Len(Range("A1").Value) < 2 Then
MsgBox "A string of length > 1 must be entered in A1"
Exit Sub
End If
Initialize
anchor.CurrentRegion.ClearContents
FindLargest 1, Len(SearchString)
If palindromes.count = 0 Then
anchor.Value = "No palindromes found"
Else
anchor.Value = "Palindrome"
anchor.Offset(0, 1) = "Frequency"
anchor.Offset(0, 2) = "Length"
For i = 1 To PalinCount
palindrome = anchor.Offset(i).Value
anchor.Offset(i, 1).Value = _
palindromes.Item(palindrome)
anchor.Offset(i, 2).Value = Len(palindrome)
Next i
End If
End Sub


''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Hope this works for you.

-John
 
Back
Top