Hi, I have an ArrayList in C# and I need to remove duplicate groups of data
in it. After the ArrayList is populated, each group of data is 5 elements
in order (0-4, 5-9, 10-14, ...): name (string), number (string),
user (string), startdate (datetime), enddate (datetime).
I would say this is a great example of how _not_ to store your data, as
well as of the hazards of using a type-agnostic data structure like
ArrayList.
I think it's a really bad idea to store data items in a collection in the
way you've done it, for a variety of reasons, but mainly because a
collection ought to have a direct mapping between a single data type and
an element in the collection. There's no way to describe what your
collection actually contains, because each element is different from 80%
of the other elements.
As far as your specific question goes...
I would first suggest that you stop using an ArrayList like that. Create
a new data structure that contains each of the data elements you're
referencing, and then store that data structure in a collection. Then,
make sure the new data structure implements IComparable, so that you can
do things like sorting and comparing for equality.
Then you can just store the data in a List<>, sorting the list after
you've received all of the data and removing the duplicates with a linear
scan through the list. Yet another alternative would be to use a
SortedList<>, adding items only if they already aren't in the list. This
avoids having to sort at the end, but of course you have the overhead of
inserting elements as you go along.
Alternatively, you can use a Dictionary<> containing your data, with the
data structure itself being its own key. That provides a fast and
convenient way to ensure that you only ever have one instance of any given
group of data. You'd have to override the Object.GetHashCode() method for
your new data structure though, basing your hash code on the data in the
object.
A dictionary would be faster than the sorted list, with the trade-off that
it requires you also implement GetHashCode(), and of course you don't get
a sorted list of data in the end (but that may not be needed anyway).
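As a sketch of the Dictionary<> approach (the DataItem fields here are
trimmed down for illustration; the point is the Equals()/GetHashCode()
overrides, which let the dictionary detect duplicate groups):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    // Illustrative two-field version of the data structure
    struct DataItem
    {
        public readonly string Name;
        public readonly int Number;

        public DataItem(string name, int number)
        {
            Name = name;
            Number = number;
        }

        public override bool Equals(object obj)
        {
            if (!(obj is DataItem))
            {
                return false;
            }
            DataItem other = (DataItem)obj;
            return Name == other.Name && Number == other.Number;
        }

        public override int GetHashCode()
        {
            // Combine the hash codes of the fields; any reasonable
            // combination will do
            return Name.GetHashCode() ^ Number.GetHashCode();
        }
    }

    static void Main()
    {
        Dictionary<DataItem, object> seen = new Dictionary<DataItem, object>();
        DataItem[] items = {
            new DataItem("a", 1), new DataItem("a", 1), new DataItem("b", 2)
        };

        foreach (DataItem di in items)
        {
            if (!seen.ContainsKey(di))
            {
                seen.Add(di, null);   // the key is all we care about
            }
        }

        Console.WriteLine(seen.Count);  // 2 -- the duplicate was dropped
    }
}
```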
If for some reason you _must_ have this data in an ArrayList that looks
like what you've described, I would still do the above, but then convert
the results back to an ArrayList as needed. You could either brute-force
the problem by scanning the ArrayList considering five elements at a time
(easy to write, but terrible performance), or manually sort the list,
again using five elements at a time for the sort (harder to write, but
good performance). But the framework won't give you any help there, and
while it's not hard to implement your own sort, it would be a pain
(especially given the "five elements at a time" requirement that would
lead you to have to do that in the first place) and certainly much more
trouble than just creating a new structure that implements IComparable.
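If you do need the flat ArrayList back at the end, the conversion might
look something like this (DataItem and its fields are illustrative
stand-ins for whatever structure you define):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class Program
{
    // Illustrative stand-in for whatever structure you define
    struct DataItem
    {
        public string Name;
        public int Number;
        public string User;
        public DateTime StartDate;
        public DateTime EndDate;
    }

    // Flatten each DataItem back into the five-elements-per-group
    // ArrayList layout described in the question
    static ArrayList ToFlatArrayList(List<DataItem> list)
    {
        ArrayList result = new ArrayList();
        foreach (DataItem di in list)
        {
            result.Add(di.Name);
            result.Add(di.Number);
            result.Add(di.User);
            result.Add(di.StartDate);
            result.Add(di.EndDate);
        }
        return result;
    }

    static void Main()
    {
        List<DataItem> list = new List<DataItem>();
        list.Add(new DataItem { Name = "a", Number = 1, User = "u",
            StartDate = DateTime.MinValue, EndDate = DateTime.MaxValue });

        ArrayList flat = ToFlatArrayList(list);
        Console.WriteLine(flat.Count);  // 5 -- one group of five elements
    }
}
```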
If you have a data structure like this:
struct DataItem : IComparable<DataItem>
{
    public readonly string Name;
    public readonly int Number;
    public readonly string User;
    public readonly DateTime StartDate;
    public readonly DateTime EndDate;

    public DataItem(string name, int number, string user,
        DateTime startDate, DateTime endDate)
    {
        Name = name;
        Number = number;
        User = user;
        StartDate = startDate;
        EndDate = endDate;
    }

    // Must be a public instance method to satisfy IComparable<DataItem>
    public int CompareTo(DataItem diOther)
    {
        // Compare field by field, returning at the first difference
        int compareResult = Name.CompareTo(diOther.Name);
        if (compareResult != 0)
        {
            return compareResult;
        }

        compareResult = Number.CompareTo(diOther.Number);
        if (compareResult != 0)
        {
            return compareResult;
        }

        compareResult = User.CompareTo(diOther.User);
        if (compareResult != 0)
        {
            return compareResult;
        }

        compareResult = StartDate.CompareTo(diOther.StartDate);
        if (compareResult != 0)
        {
            return compareResult;
        }

        return EndDate.CompareTo(diOther.EndDate);
    }
}
Then you can write code like this:
void AddDataItem(SortedList<DataItem, object> list, DataItem di)
{
    if (!list.ContainsKey(di))
    {
        // All we really care about is the key, so don't bother
        // with a non-null value
        list.Add(di, null);
    }
}
Or, not doing any of the duplicate-removing work until the end (i.e. using
a List<>):
// Do this for each new data item
void AddDataItem(List<DataItem> list, DataItem di)
{
    list.Add(di);
}

// Once you've got all the data, do this
void RemoveDuplicates(List<DataItem> list)
{
    if (list.Count > 0)
    {
        list.Sort();

        int idi = 1;
        DataItem di = list[0];

        while (idi < list.Count)
        {
            // Remove the current element if it duplicates the previous
            // one; otherwise advance to it
            if (di.CompareTo(list[idi]) == 0)
            {
                list.RemoveAt(idi);
            }
            else
            {
                di = list[idi++];
            }
        }
    }
}
Some notes:
* The above isn't optimized. For example, you could improve the
removal performance by going backwards. But you said you've only got
hundreds of elements, and they come from a web service, so it seems
unlikely you'd get enough of a performance improvement to make it worth
obfuscating the code here.
* At least in the case of the List<> class, you can sort without the
elements implementing IComparable, as long as you provide a comparison
delegate at the time you do the sort. So strictly speaking you don't need
to implement IComparable for your data structure (the CompareTo() method
could just be passed directly to the Sort() method as the comparer
method). Since the code has to go _somewhere_ and since there are other
potential benefits to implementing IComparable, I prefer doing so. I
think that using the Sort() overloads that take comparers is more useful
for data types you don't have control over.
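For example, a sort using a comparison delegate instead of IComparable
might look like this (the fields are illustrative only):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    // Illustrative structure with no IComparable implementation
    struct DataItem
    {
        public string Name;
        public int Number;
    }

    static void Main()
    {
        List<DataItem> list = new List<DataItem>();
        list.Add(new DataItem { Name = "b", Number = 2 });
        list.Add(new DataItem { Name = "a", Number = 1 });

        // Supply the comparison inline rather than implementing
        // IComparable on the type itself
        list.Sort(delegate(DataItem x, DataItem y)
        {
            int result = x.Name.CompareTo(y.Name);
            return result != 0 ? result : x.Number.CompareTo(y.Number);
        });

        Console.WriteLine(list[0].Name);  // "a" sorts first
    }
}
```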
Pete