Sorting lists in .Net - why it sucks

  • Thread starter Thread starter nightwatch77
  • Start date Start date
N

nightwatch77

Today I stumbled upon a very interesting 'feature' of .net 2.0
My application uses a generic SortedList class for storing a sorted
list of objects (on a DateTime key) - that is, SortedList<DateTime,
object>. However, the program kept crashing for some time until I
debugged it to find out that it crashes on inserting data to the
sorted list. Problem was that I was inserting two items with the same
date (the same key).
Hey, how do you think, should sorted list just crash when there is a
duplicate value? Normal lists are expected to store duplicate values,
but some great inventor at Microsoft has made it better. They have
documented this behavior, but why they assume sorting shouldn't work
for duplicate entries in the list? Normal sorting algorithms are just
fine with that.
Of course, I had to use non-generic ArrayList instead with custom
IComparer for sorting.
And there goes another surprise - ArrayList.Sort implements an
unstable sort algorithm. That is, for equal objects, it does not
guarantee that after sorting they will keep the original order. Of
course this is not documented at all. And of course causes problems -
in my case it would make sequential messages with timestamp come in
wrong order.

I don't know what sort algorithm did Microsoft use, but simple
QuickSort implementation is a stable algorithm and has no problem with
sorting duplicated values. When Microsoft recruits any developer, they
ask him to implement some list sorting algorithm during job interview,
so I thought all developers in Microsoft are capable of implementing
it correctly.

Best regards,
RG
 
For info, Wintellect (http://www.wintellect.com/PowerCollections.aspx) has
written a set of generic collection classes that support what you are
looking for (OrderedBag<T>).

From Jeffrey Richter Book:
---------------------------
At Microsoft's request, Wintellect has produced the "Power Collections
Library" to bring some of the C++ Standard Template Library's collection
classes to the CLR programmer.

- José
 
nightwatch77 said:
Today I stumbled upon a very interesting 'feature' of .net 2.0
My application uses a generic SortedList class for storing a sorted
list of objects (on a DateTime key) - that is, SortedList<DateTime,
object>. However, the program kept crashing for some time until I
debugged it to find out that it crashes on inserting data to the
sorted list. Problem was that I was inserting two items with the same
date (the same key).
Hey, how do you think, should sorted list just crash when there is a
duplicate value? Normal lists are expected to store duplicate values,
but some great inventor at Microsoft has made it better. They have
documented this behavior, but why they assume sorting shouldn't work
for duplicate entries in the list? Normal sorting algorithms are just
fine with that.
Of course, I had to use non-generic ArrayList instead with custom
IComparer for sorting.
And there goes another surprise - ArrayList.Sort implements an
unstable sort algorithm. That is, for equal objects, it does not
guarantee that after sorting they will keep the original order. Of
course this is not documented at all. And of course causes problems -
in my case it would make sequential messages with timestamp come in
wrong order.

I don't know what sort algorithm did Microsoft use, but simple
QuickSort implementation is a stable algorithm and has no problem with
sorting duplicated values. When Microsoft recruits any developer, they
ask him to implement some list sorting algorithm during job interview,
so I thought all developers in Microsoft are capable of implementing
it correctly.

Best regards,
RG

You should create your own class that encapsulates a DateTime object and
implements IComparable. You just need to handle the equals case as you see
fit. Here is an example of what I mean:

namespace SortedListTest
{
public class ComparableDateTime : IComparable
{
private DateTime mDT;
public int CompareTo ( object obj )
{
if (mDT.Ticks < ((ComparableDateTime) obj).mDT.Ticks) return -1;
return 1;
}

public ComparableDateTime ( long milliseconds )
{
mDT = new DateTime ( milliseconds );
}
}

class Program
{
static void Main ( string [ ] args )
{
SortedList<ComparableDateTime, string> msl;
msl = new SortedList<ComparableDateTime, string> ();

msl.Add ( new ComparableDateTime ( 123456786 ), "string one" );
msl.Add ( new ComparableDateTime ( 123456786 ), "string two" );

Console.Out.WriteLine ( );

foreach (string s in MySortedList.Values)
{
Console.WriteLine ( s );
}

Console.Out.WriteLine ( );
Console.ReadLine ();
}
}
}
 
When you're done with the venom, this is clearly documented:

http://msdn2.microsoft.com/en-us/library/ms132319.aspx
"Every key in a SortedList<(Of <(TKey, TValue>)>) must be unique."

(equally, the exception tells you very clearly what the problem is)

With regards to unstable sort, this too is clearly documented:
http://msdn2.microsoft.com/en-us/library/b0zbh7b6.aspx
"This implementation performs an unstable sort; that is, if two
elements are equal, their order might not be preserved."

I might be mistaken but I seem to recall that the LINQ sort is stable.
Likewise in like there is an ILookup<TKey,TValue> for duplicated keys,
but the default implementation (Lookup<TKey, TValue>) is immutable. It
took me only a few moments to write a mutable ILookup<TKey, TValue>
implementation, however.

Marc
 
I don't know what sort algorithm did Microsoft use, but simple
QuickSort implementation is a stable algorithm and has no problem with
sorting duplicated values.

I assume by "simple" you mean "using extra memory rather than sorting
in place". The more efficient in-place quick sort isn't stable.

Jon
 
Hi,

- Family Tree Mike, you probably didn't notice I have implemented
IComparer, thanks anyway.
- Marc, I know Microsoft has documented the weird SortedList behavior,
but does it add more sense to the API?
- Jon, you're right, in-place quicksort is unstable, but anyway, there
is a stable version.
Still, the sorting functionality is very essential to a collections
API, just like a good hashtable implementation. It should be done
better and programmers should have the option to select sorting
algorithm.
best regards
RG
 
I guess what I was too subtle about was, that by

1. writing an IComparable (not IComparer) class wrapping DateTime,
2. that the comparison routine never returns equal,

the code does not crash on equal keys.
 
I guess what I was too subtle about was, that by

1. writing an IComparable (not IComparer) class wrapping DateTime,
2. that the comparison routine never returns equal,

the code does not crash on equal keys.

Mike, your solution is very simple but doesn't help a bit because it
doesn't guarantee stable sorting. Equal timestamps is something I have
to handle in a certain way, that is I have to get them in the same
order as in input, and your IComparable just makes sure two timestamps
are never equal - that solves some problem, but not mine.
 
To restate - the LINQ sort is stable:
http://technet.microsoft.com/en-us/library/bb549422.aspx
"This method performs a stable sort; that is, if the keys of two
elements are equal, the order of the elements is preserved."

If .NET 3.5 isn't an option, then hacky though it sounds, perhaps
consider adding a member to keep track of the original insertion order
(just an int), and then use that as a secondary after your primary
sort condition(s) - i.e. the following will give you a stable sort
(assuming you give each Foo incremental Sequence values, perhaps using
something in the ctor):

class Foo {
public int Sequence { get; set; }
public double Value { get; set; }
}
public class FooComparer : IComparer<Foo> {
int Compare(Foo x, Foo y) {
// primary sort
int result = x.Value.CompareTo(y.Value);
// backup for stability
if (result == 0) {
result = x.Sequence.CompareTo(y.Sequence);
}
return result;
}
}

Yes it is duplication...

Third option: use an external library.

Marc
 
Back
Top