Need Help - "Circular" string comparaison

  • Thread starter Thread starter Rogers
  • Start date Start date
R

Rogers

I want to compare strings of numbers that have a circular boundary
condition. This means that the string is arranged in a loop without an
end-of-string. The comparaison of two strings now becomes a different
operation than with regular strings because the circular string can be
"rotated", like this:

1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4

So under the "circular string comparator", all the above strings would be
equal.

I need to do a lot of those comparaisons so optimization is required. What
is the fastest implementation possible in VB.NET? The strings have lenths
ranging from 2 to ~100.

Any ideas?

Thanks in advance for your inputs,

Alain
 
Off the top of my head ( and you have probably already thought about this ).

Pseudo Code
--------------


'Slow Linear Method
For Each String in Array Of Strings
For 1 to String.Length
Check for search String
If Not Found Then Rotate String Else Break Loop

A faster method would may be to take the front or end portion of the string
and search for it and then rotate left or right one character at a time and
repeat the search adding one more letter of the search string each time
until the match is complete. Dont have time to work the code out for you but
this would work as well.

Cheers - OHM
 
Is this what you call 'String' theory ?, your not writing a DNA sequencer
are you ?

Regards OHM
 
In method two, you would only have to rotate the string up to the maximum
length of the search string but flip the search string once one direction
has been exhausted.

This is a very intriguing problem.

OHM
 
w if you split the string into an array of strings (based on the space char)
order that and compare the ordered arrays?

eric
 
I want to compare strings of numbers that have a circular boundary
condition. This means that the string is arranged in a loop without an
end-of-string. The comparaison of two strings now becomes a different
operation than with regular strings because the circular string can be
"rotated", like this:

1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4

So under the "circular string comparator", all the above strings would be
equal.

I need to do a lot of those comparaisons so optimization is required. What
is the fastest implementation possible in VB.NET? The strings have lenths
ranging from 2 to ~100.

Any ideas?

Consider doubling the string.

i.e. 1 2 3 4 5 1 2 3 4 5
You will note that all rotations exist in this string. And if you
incorporate a length check, you get the desired result. Leading to:

Function StringMatch(find As String,source As String) As Boolean
If Len(find) = Len(source) And _
(source & source).IndexOf(find) <> -1 Then
Return True
Else
Return False
End If
End Function

Rgds,
 
Yesh, I thought of that but it is possible that a match will occur because
of doubling hence giving you an incorrect match. If the strings were like
the example, then I grant you it would work, but it is not robust because of
it.


OHM
 
I think the original poster needs to comment before Im going to put more
effort into this.

OHM
 
Yesh, I thought of that but it is possible that a match will occur because
of doubling hence giving you an incorrect match. If the strings were like
the example, then I grant you it would work, but it is not robust because of
it.

Can you give me an example of a string that will fail?
 
my 2 cents ;)
if the order is always like this i think your sollution will be the fastest,
but the question is will it be?
"13254" and "12345" for example, if these are supposed to be equal 2 there
is a problem (and even if this is the case i think it would be best to use
your function first yust for the speed if it matches)

eric
 
Can you give me an example of a string that will fail?

By that I mean an example of a string that will cause the algorithm to
fail. If you do find one, I'd be very surprised!

I think you're confusing this with packing and encrpytion algorithms
(the old "ABBAB problems") etc. But they aren't relevant to a match,
as we consider only strings where the lengths are equal. i.e. The
result of the original string appended to itself is a string with the
following properties:

1) All rotations are included, and
2) The zero-rotated string is included twice (at the beginning, and at
the end), and
3) No other characters.

For a search string of size N, there is an infinite number of
rotations. However, there is exactly N-1 distinct rotations.

Remember that,

i Mod N == (i+N) Mod N, where i is the index of rotation.

Given a double string, S, we can see that

S(i Mod N) == S( (i+N) Mod N)

Hence there exists a rotation of the original string in S at every
position. The caveat of the lengths being equal (of the find text and
the source text) means that we never need consider strings that arrive
after N/2.

And that's why it works!

Rgds,
 
my 2 cents ;)
if the order is always like this i think your sollution will be the fastest,
but the question is will it be?
"13254" and "12345" for example, if these are supposed to be equal 2 there
is a problem (and even if this is the case i think it would be best to use
your function first yust for the speed if it matches)

Although they contain the same digits, 13254 is not a _rotation_ of
12345. I've been trying to think of a more Englishy way of describing
the properties of the rotated string...

Write down the original string.
Mark the last character.
Place a piece of paper to the left of the first character.
REPEAT
Write the character next to the edge of the paper at the end of the
string.
Move the paper to the right by one character.
(You now have the next rotation showing)
UNTIL the paper arrives at the last-character mark.

You have now seen ALL rotations of the string. And when you remove the
paper, you will see the original string appended to itself.


As you said, the algotithm provided is probably the fastest way to
perform the action in VB.NET. The "IndexOf" function could actually be
refined drastically. e.g. Introduce a KMP search. However, the hit
you'd take for implementing that in VB for short strings would
actually be far costlier than using the lowest-level function
available.

Rgds,
 
In the original post the lengths of all the string are the same, however,
the poster did not make this distinction, he only says that the bounderies
of the search and non existant ( A Circle ). This means for different length
strings an error could occur when using string doubling.. Here is an example
of a false match

String One
ABCEFG ( Double this becomes ABCDEFGABCDEFG )
String 2
GAB

GAB is found in the doubled string.

If however, the string length are allways equal and the rotation is the only
thing which changes, then the string doubling works just fine.

OHM
 
In the original post the lengths of all the string are the same, however,
the poster did not make this distinction, he only says that the bounderies
of the search and non existant ( A Circle ). This means for different length
strings an error could occur when using string doubling.. Here is an example
of a false match

String One
ABCEFG ( Double this becomes ABCDEFGABCDEFG )
String 2
GAB

Are you suggesting that "GAB" could in any way be considered a
rotation of "ABCEFG"???! The fact that "GAB" is a substring of the
cyclic buffer is completely irrelevant.
GAB is found in the doubled string.

In order for the input string to even be considered as a rotation of
the original string THEY BOTH MUST BE OF THE SAME LENGTH. The
algorithm takes care of that by making sure that they are. QED.
 
_Andy_
Are you suggesting that "GAB" could in any way be considered a
rotation of "ABCEFG"???! The fact that "GAB" is a substring of the
cyclic buffer is completely irrelevant.

No I am not suggesting this is the case, but that the poster did not specify
this to be the case. In fact the post is open to interpretation, it was
ambiguous at best.
In order for the input string to even be considered as a rotation of
the original string THEY BOTH MUST BE OF THE SAME LENGTH. The
algorithm takes care of that by making sure that they are. QED.


Yes Yes, I see what you are saying and in fact I also said you were correct
if this assumption stands, by the way, you dont need to capitalise youre
sentences in order for me to read them, I can read them just as well in
lower or proper case.

OHM
 
_Andy_


No I am not suggesting this is the case, but that the poster did not specify
this to be the case. In fact the post is open to interpretation, it was
ambiguous at best.

Ah, OK. I'll give you that. A cyclic buffer/string (aka circular)
means something very specific to me, as does rotation.
Yes Yes, I see what you are saying and in fact I also said you were correct
if this assumption stands, by the way, you dont need to capitalise youre
sentences in order for me to read them, I can read them just as well in
lower or proper case.

Sorry, I thought I was stating the bleedin' obvious... But if there
are different interpretations of the requirements, that may not be so.
:)

WMST RGDS, ;)
 
Thx Andy. This is a very good idea. However in my specific implementsion
this could lead to false positives:

String: 1 1 3 3
Double it: 1 1 3 3 1 1 3 3
IndexOf (3 3 1 3) = 3

Good thinking though, very clever :o)
 
Thx Andy, Good thinking. However in my implementation this algorithm could
lead to false positive:

Source: 1 1 3 3 -> Double 1 1 3 3 1 1 3 3
Compare: 3 3 1 1
IndexOf(3 3 1 1) <> -1

Good idea anyways, clever :o)

Alain
 
Thanks you all for your help. What I implemented so far is listed below.
Despite the brute force approach, it's still acceptable till the strings are
about 1 K long. I didn't barther comparing character per character. I
think the bulk of the CPU cycles are being spent in the string manipulation.
I could Split the string to an array but the string comes with no space and
unlike VB6 inplementation of the Split function, .NET doensn't split string
if the split character is an empty string (ie. Spilt("12345","") =
Array(0)="12345"). I can't find a way to split a string without separators.
Stupid implementation!

Anyways I don't think I can avoid goint to an Array representation of the
string since the real requirement really has 2D string with circular
bourdary condition (Spherical String) and 3D as well (Hyperspherical
Strings).

AL


Public Function CircStrComp(ByRef Str1 As String, ByRef Str2 As String)
As Boolean
' Using plain Strings
Dim TmpStr1 As String = Str2
Dim x As Integer

For x = 1 To TmpStr1.Length
If Str1 = TmpStr1 Then
Return True
Else
' Rotate String
TmpStr1 = TmpStr1.Remove(0, 1) & TmpStr1.Chars(0)
End If
Next
End Function
 
Thx Andy. This is a very good idea. However in my specific implementsion
this could lead to false positives:

String: 1 1 3 3
Double it: 1 1 3 3 1 1 3 3
IndexOf (3 3 1 3) = 3

Eh? No it isn't. IndexOf 3313 = -1
Good thinking though, very clever :o)

The algorithm supplied sovles your problem, does it not? It has been
challenged, but no evidence has been provided to show that it doesn't
work. I have tried to demonstrate how it solves the problem, but even
that hasn't been read.

In summary, this is a very trivial problem (a common one at that) and
I'm a little bemused by it causing so much confusion. Perhaps it's the
alcohol, perhaps it's the fact that I have just watched the most
"Boring Rugby" in my life... and beating the Aussies. <g> ;)

Rgds,
 
Back
Top