Intersect of 2 strings

  • Thread starter Thread starter Mark Rogers
  • Start date Start date
kndg said:
Your comments was very helpful.
Here is the bug-fix version.
Any inputs is highly appreciated.

Looks lots better. You do seem to have addressed both of the issues I
mentioned.

That said, note that you have an implicit third inner loop now. Your
code is equivalent to this (mostly…I added a couple of things):

for (int i = 0; i < str1.Length; i++)
{
int cchLeft1 = str1.Length - i;

// Can't do better than what we've got…bail!
if (cchLeft1 <= matchLength)
{
break;
}

for (int j = 0; j < str2.Length; j++)
{
int cchLeft2 = str2.Length - j;

if (cchLeft2 <= matchLength)
{
break;
}

int cchMax = Math.Min(cchLeft1, cchLeft2);
int ich1 = i;

for (int ich2 = j; ich2 < j + cchMax; ich1++, ich2++)
{
if (str2[ich2] != str1[ich1])
{
break;
}
}

if (ich1 - i > matchLength)
{
matchIndex = j;
matchLength = ich1 - i;
}
}
}

(Actually, not quite equivalent, because in rewriting your code, I found
another bug: you only updated matchIndex and matchLength if you found a
non-equal character; if it happened that the intersection was the entire
second string, you'd never hit the code to set matchIndex and
matchLength. I moved the check out of the inner-most loop – the new
third loop – so it's always evaluated, even if you never find a
non-equal character).

(I also added a minor optimization so that the code doesn't waste time
looking for more matches if it's already found one at least as large as
is possible to find in the remaining characters).

I think that writing the third loop explicitly, and in fact changing the
second loop from a while to a for so that it matches the other two,
helps the code flow a bit better. YMMV. :)

I'm pleased to see that there was in fact a three-loop solution (at
least, I think there is…I haven't been able to find anything fatally
wrong with this latest solution), but I was really hoping I had been
wrong and that a two-loop one was going to come out of this. :)

Of course, the fact remains that the original problem statement is
poorly defined. As we can see from the code you posted most recently
and my modifications, some string pairs may well have two or more
"intersections" of the same length, but of course we can return only one
(cue "Highlander" music… :) ). The whole thing seems kind of arbitrary
to me, at least approaching the problem this way (there are other
definitions of "intersection" that may make more sense).

Pete
 
Peter said:
[...]
(Actually, not quite equivalent, because in rewriting your code, I found
another bug: you only updated matchIndex and matchLength if you found a
non-equal character; if it happened that the intersection was the entire
second string, you'd never hit the code to set matchIndex and
matchLength. I moved the check out of the inner-most loop – the new
third loop – so it's always evaluated, even if you never find a
non-equal character).

Actually, I got it covered. It is just outside my while-loop. But I hate
it very much because it just duplicating what I wrote inside the loop.
Yours is much better.
(I also added a minor optimization so that the code doesn't waste time
looking for more matches if it's already found one at least as large as
is possible to find in the remaining characters).

Yeah, I had missed that.
I think that writing the third loop explicitly, and in fact changing the
second loop from a while to a for so that it matches the other two,
helps the code flow a bit better. YMMV. :)

I'm pleased to see that there was in fact a three-loop solution (at
least, I think there is…I haven't been able to find anything fatally
wrong with this latest solution), but I was really hoping I had been
wrong and that a two-loop one was going to come out of this. :)

:) but I think a two-loop is nearly impossible. Besides, this code
already fast enough to do the bidding.
Of course, the fact remains that the original problem statement is
poorly defined. As we can see from the code you posted most recently
and my modifications, some string pairs may well have two or more
"intersections" of the same length, but of course we can return only one
(cue "Highlander" music… :) ). The whole thing seems kind of arbitrary
to me, at least approaching the problem this way (there are other
definitions of "intersection" that may make more sense).

Yes, agreed. But I will leave that to the OP to do the modification.
I think it is not that hard...

Anyway, a great discussion.
Thanks for your comments and advices (and your patience also).
It is always a good pleasure to have the code reviewed by a more
experience developer.

Cheers.
 
kndg said:
Peter said:
[...]
(Actually, not quite equivalent, because in rewriting your code, I
found another bug: you only updated matchIndex and matchLength if you
found a non-equal character; if it happened that the intersection was
the entire second string, you'd never hit the code to set matchIndex
and matchLength. I moved the check out of the inner-most loop – the
new third loop – so it's always evaluated, even if you never find a
non-equal character).

Actually, I got it covered. It is just outside my while-loop. But I hate
it very much because it just duplicating what I wrote inside the loop.

Oh, I see. That's what I get for focusing on just the inner loop. :)
[...]
Anyway, a great discussion.
Thanks for your comments and advices (and your patience also).
It is always a good pleasure to have the code reviewed by a more
experience developer.

I enjoyed it. It's always a pleasure to be able to say something vague
and have someone skilled do the hard work of turning it into code. :)

Pete
 
kndg said:
I haven't look closely at your code (looks cool), but it failed (as I
understand from the OP's requirement) for below strings;

string str1 = "You've given an example of a string";
string str2 = "I'm trying to extract an Intersect";

I suppose it would return " an " but instead it give an empty string.

This amply demonstrates the trouble with the problem statement -- you and I
read the same post and came to two very different conclusions as to what
"intersect" means!
 
Back
Top