Looking for List comparison algorithm

  • Thread starter Thread starter Ben Fidge
  • Start date Start date
B

Ben Fidge

I have a requirement to compare two lists, generating a third which is a
combination of both.

With the first list as the reference list and the second as the test
list, I want the each of the result list's elements to contain a moniker
denoting whether that element:

1..Exists only in the reference list

or

2..Exists in boths lists

or

3...Exists only in the test list

In particular, I would like to use the most efficient algorithm possible
using the least amount of iterations and the least amount of
comparisons.

Any ideas?

Ben Fidge
 
Ben Fidge said:
I have a requirement to compare two lists, generating a third which is a
combination of both.

With the first list as the reference list and the second as the test
list, I want the each of the result list's elements to contain a moniker
denoting whether that element:

1..Exists only in the reference list

or

2..Exists in boths lists

or

3...Exists only in the test list

In particular, I would like to use the most efficient algorithm possible
using the least amount of iterations and the least amount of
comparisons.

Any ideas?

Do the elements have any kind of natural ordering? Does the order of
the final list matter? How big are the lists? Can you put the lists
into maps (mapping each element to itself to make lookup easier)? Do
you know whether two elements which are in both lists will actually be
references to the same object, or will they just display object
equality (i.e. calling Object.Equals)?
 
Hi Jon,

Thanks for your response. Here's my reply to your questions:

Q1 - "Do the elements have any kind of natural ordering?"
A1 - They are sorted alphabetically in ascending order by name.

Q2 - "Does the order of the final list matter?"
A2 - It would nice to have them sorted alphabetically at the end. This
can be done afterwards however.

Q3 - "How big are the lists?"
A3 - They could potentially contain a couple of thousand items each,
although several hundred is more likely.

Q4 - "Can you put the lists into maps (mapping each element to itself to
make lookup easier)?"
A4 - The ref and test lists are readonly collections of readonly
objects. The result list contains objects that reference either the ref
item, test item or both.

Q5 - "Do you know whether two elements which are in both lists will
actually be references to the same object, or will they just display
object equality (i.e. calling Object.Equals)?"
A5 - The objects in the ref and test lists are distinct and are compared
by the string names.

Thanks for your help,

Ben
 
Ben Fidge said:
Thanks for your response. Here's my reply to your questions:

Q1 - "Do the elements have any kind of natural ordering?"
A1 - They are sorted alphabetically in ascending order by name.

Q2 - "Does the order of the final list matter?"
A2 - It would nice to have them sorted alphabetically at the end. This
can be done afterwards however.

Q3 - "How big are the lists?"
A3 - They could potentially contain a couple of thousand items each,
although several hundred is more likely.

Q4 - "Can you put the lists into maps (mapping each element to itself to
make lookup easier)?"
A4 - The ref and test lists are readonly collections of readonly
objects. The result list contains objects that reference either the ref
item, test item or both.

Q5 - "Do you know whether two elements which are in both lists will
actually be references to the same object, or will they just display
object equality (i.e. calling Object.Equals)?"
A5 - The objects in the ref and test lists are distinct and are compared
by the string names.

Right. The lists being sorted makes things a *lot* easier. Basically,
you keep to indexes - one into the reference list and one into the test
list.

If you think about the lists being like this:

Test Ref

A
B
C
D
E E
F
G
H

etc an algorithm should present itself - you fetch the first entry of
each list, and find which one is "lower" - then keep adding entries
from that list until you find one which is equal to or greater than the
first entry of the other list - at which point you swap, etc.

That's a little bit vague, but hopefully gives you some idea of how to
proceed. If not, let me know and I'll write up some code for you.
 
Hi Jon,

Thanks for your response. Here's my reply to your questions:

Q1 - "Do the elements have any kind of natural ordering?"
A1 - They are sorted alphabetically in ascending order by name.

Q2 - "Does the order of the final list matter?"
A2 - It would nice to have them sorted alphabetically at the end. This
can be done afterwards however.

Q3 - "How big are the lists?"
A3 - They could potentially contain a couple of thousand items each,
although several hundred is more likely.

Q4 - "Can you put the lists into maps (mapping each element to itself to
make lookup easier)?"
A4 - The ref and test lists are readonly collections of readonly
objects. The result list contains objects that reference either the ref
item, test item or both.

Q5 - "Do you know whether two elements which are in both lists will
actually be references to the same object, or will they just display
object equality (i.e. calling Object.Equals)?"
A5 - The objects in the ref and test lists are distinct and are compared
by the string names.

Thanks for your help,

Ben
 
Jon,

Thanks a lot. That's very similar to what I concluded last night while
pondering on the subject. I didn't manage to get it to work before
giving up, so with your infinite wisdom at hand, I'll have another stab
tonight.

Thanks

Ben
 
Ben Fidge said:
I would be very interested to see how you approach this as my implementation
is returning duplicate entries in the result list. I found an example in
Java at the following site:

http://www.cs.sjsu.edu/faculty/fecteau/cs46b/ListMerge.html

with the actual Java code as follows:

http://www.cs.sjsu.edu/faculty/fecteau/cs46b/mainCS46B.html

If you have time, could I take you up on the offer of some sample code.

Sure. This example doesn't keep track of which element came from where, but
that's very easy to do by making each element of the new list an object
containing the original element and an enum value.

Let me know if you have any problems with the code below:

using System;
using System.Collections;

public class Test
{
public static void Main()
{
ArrayList a = new ArrayList (new object[]
{"ant", "attack", "bee",
"car", "frog",
"joke", "rage", "tree"});
ArrayList b = new ArrayList (new object[]
{"bar", "car", "ear",
"got", "joke", "smoke"});

IList merged = Merge (a, b);

foreach (string x in merged)
{
Console.WriteLine (x);
}
}

static IList Merge (IList a, IList b)
{
IList ret = new ArrayList();

IEnumerator enumA = a.GetEnumerator();
IEnumerator enumB = b.GetEnumerator();

enumA.Reset();
enumB.Reset();

bool aValid, bValid;
IComparable aElt = GetNext (enumA, out aValid);
IComparable bElt = GetNext (enumB, out bValid);

while (aValid && bValid)
{
int comp = aElt.CompareTo (bElt);
if (comp < 0)
{
ret.Add (aElt);
aElt = GetNext (enumA, out aValid);
}
else if (comp==0)
{
// No need to add both aElt and bElt,
// as they're the same
ret.Add (aElt);
aElt = GetNext (enumA, out aValid);
bElt = GetNext (enumB, out bValid);
}
else
{
ret.Add (bElt);
bElt = GetNext (enumB, out bValid);
}
}

while (aValid)
{
ret.Add (aElt);
aElt = GetNext (enumA, out aValid);
}

while (bValid)
{
ret.Add (bElt);
bElt = GetNext (enumB, out bValid);
}
return ret;
}

static IComparable GetNext (IEnumerator enumerator,
out bool valid)
{
valid = enumerator.MoveNext();
return valid ? (IComparable) enumerator.Current : null;
}
}
 
Jon,

Thanks a lot, that certainly clarified things for me. I made a few
tweaks however and managed to combine the two trailing while loops into
the main loop.

This was done by replacing the "and" in the loop termination condition
with an "or". Then having an if..if..else in the middle.

So..

iRefIndex = 1; // We are using COM collections
iRefCount = RefCollection.Count;
iTestIndex = 1;
iTestCount = TestCollection.Count;

while (iRefIndex <= iRefCount || iTestIndex <= iTestCount) {
if (iRefIndex > iRefCount) {
// We have an extra item in TestCollection
iTestIndex++;
}
else if (iTestIndex > iTestCount) {
// We have an extra item in RefCollection
iRefIndex++;
}
else {
// Do comparison logic here
string sRefItem = RefCollection.Item(iRefIndex);
string sTestItem = TestCollection.Item(iTestIndex);
int iCompare = sRefItem.CompareTo(sTestItem);

if (iCompare == 0) {
// We have a matching pair
iRefIndex++;
iTestIndex++;
}
else if (iCompare < 0) {
// We have added an item in RefCollection
iRefIndex++;
}
else {
// We have added an item in TestCollection
iTestIndex++
}
}
}

Simply really once it becomes clear. Thanks for your help Jon, please
remain on standby for my next algorithm 'angover!

Ben
 
Back
Top