.NET 2.0 performance bug in ArrayList.Sort

  • Thread starter Thread starter Guest
  • Start date Start date
Jon said:
I believe I've read before that the ArrayList.Sort pivoting strategy
has changed between versions, which makes some data slower and some
data faster. I think that if it's faster "on average" and the
speed-up of some is comparable to the slow-down of others, that it's
a reasonable change - it sounds like you're just unlucky in your data.

Though that's IMHO irrelevant here, as ArrayList.Sort simply calls
Array.Sort(items... ) under the hood, which means that it should
resolve in teh same results as a call to Array.Sort()

(with the exception ofcourse that the call to Array.Sort in this
particular example isn't equal to the ArrayList.Sort due to the
differences in parameters and therefore can't be compared.)

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 
Alex said:
I am aware of that - I mentioned boxing/unboxing overheads. The
issues are: a) its a LOT slower than in .NET 1.1 <-- this alone
makes it unacceptable b) its DRAMATICALLY slower with some data

NO. There's no boxing involved but you ignore the fact that the
Array.Sort() call YOU make is completely different from the
Array.Sort() call the ArrayList.Sort() makes as the ArrayList uses a
general comparer, not a string specific comparer!
I can't say for sure what's flawed but something certainly is, its no
so much slower performance of ArrayList.Sort that bothers me, its
much slower performance than same code for .NET 1.1,

This is logical, as the comparers for strings etc. have changed in
..NET 2.0.

I've a tight schedule so I won't run your code, though if you pass a
string specific comparer to the ArrayList.Sort call, you'll likely see
that the results are vastly improving.

That's what I was trying to illustrate. Your calls show different
results, and that's understandable because they're actually doing
completely different things, however you insist that they're the same.
Well, they're not.

FB

ps: majestic12... the old amiga group?


--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 
Frans Bouma said:
Though that's IMHO irrelevant here, as ArrayList.Sort simply calls
Array.Sort(items... ) under the hood, which means that it should
resolve in teh same results as a call to Array.Sort()

(with the exception ofcourse that the call to Array.Sort in this
particular example isn't equal to the ArrayList.Sort due to the
differences in parameters and therefore can't be compared.)

Yup - unless you provide the same parameters (the important one being
the IComparer), at which point you get the same results - if you use
Comparer.Default, you get far, far more comparisons.

For there to be more comparisons, of course, presumably the comparisons
must be returning different results, although I can't see why. Hmm.
 
Alex,

I made this test to see what it was with equal objects (of course in VB in
my case)

Your file is sorted. If I turn that file around I get a quit normal figurs.
However as it is already sorted, it takes an extreme long time. This I call
a bug.

\\\
Public Class Test
Public Shared Sub Main()
Dim b As New ArrayList
Dim sr As New IO.StreamReader("C:\slow_data.txt")
Dim line As String = sr.ReadLine
Do While line IsNot Nothing
b.Add(line)
line = sr.ReadLine
Loop
Dim alist As New ArrayList
' For i As Integer = 0 To b.Count - 1
For i As Integer = b.Count - 1 To 0 Step -1
alist.Add(b(i))
Next
Dim afixed(alist.Count - 1) As Object
alist.CopyTo(afixed)
Dim start As Integer = Environment.TickCount
alist.Sort()
Dim AListtime As Integer = Environment.TickCount - start
start = Environment.TickCount
Array.Sort(afixed)
Dim Afixedtime As Integer = Environment.TickCount - start
MessageBox.Show("Arraylist = " _
& AListtime & vbCrLf & "fixed = " & Afixedtime)
End Sub
End Class
///

Cor

Alex Chudnovsky said:
I have come across with what appears to be a significant performance bug in
.NET 2.0 ArrayList.Sort method when compared with Array.Sort on the same
data. Same data on the same CPU gets sorted a lot faster with both methods
using .NET 1.1, that's why I am pretty sure its a (rather serious) bug.
Below
you can find C# test case that should allow you to reproduce this error,
to
run it you will need to put 2 data files into current directory where
executable is or just change filename pathes accordingly, the data files
can
be obtained from here:

fast_data.txt:
http://majestic12.co.uk/files/other/dotnet2bug/fast_data.txt
slow_data.txt:
http://majestic12.co.uk/files/other/dotnet2bug/slow_data.txt

The data are strings (URLs) of about similar size in bytes and number.

The following are the console outputs from code on the same machine (AMD
x2
3800, 2 GB RAM, XP Pro SP2):

VS2003 .NET 1.1 (with SP1) run:

-----------------------------------------------------------
Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 234 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 250 msec
-----------------------------------------------------------

Note that sorting times are almost exactly the same here, so all good in
.NET 1.1 .

-----------------------------------------------------------
VS2005 .NET 2.0 run:

Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 1531 msec
Time to sort strings in string[] array is: 187 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 703 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------------------

Notice that on the same machine with the same data sorting times are MUCH
slower in ArrayList.Sort, and particularly for the "slow_data.txt" file,
Array.Sort times are actually better, so I am not complaining there, but
clearly ArrayList sorts are seriously flawed - this appears to be data
dependent and by that I don't mean size of the data or number of strings:
I
have got lots of such data files and about every 10th of them is 10-20
times
slower than the other even though it has got about the same number of
lines
and bytesize.

Note: I am aware of boxing/unboxing overheads when dealing with
ArrayLists,
however in this case the slowdown is really bad comparing to .NET 1.1 and
it
appears to be data dependent - I am getting it on about 10% of my dataset
from which I have selected 2 examples (slow and fast) to demonstrate that
its
a very serious performance issue.

Here is the code that should allow you to replicate the problem:

//////////////////////////////////////////////////////////////////////////
using System;
using System.Collections;
using System.IO;

/*
This is a test case of what appears to be a major performance issue in
ArrayList.Sort method for strings
in .NET 2.0 - it appears to be data dependant as some similarly sized
data files have got a lot less
performance penalty when sorting them using Array.Sort method on
string[] array.

The kicker: both versions run fast in .NET 1.1

Author: Alex Chudnovsky <[email protected]>
Date: 28 Apr 2006
*/


namespace Majestic12
{
/// <summary>
/// Tests sorting performance of Array.Sort of string[] versus
ArrayList.Sort of the same strings
/// It appears that in .NET 2.0 in some cases ArrayList will take a LOT
more time to do the sorting
/// </summary>
class SlowArrayListSortTest
{
static void Main(string[] args)
{
// load strings from file: assumed to be in the same place as
the executable
TestFile("slow_data.txt"); // <--- this data file has got 10
times slower
TestFile("fast_data.txt"); // <--- more reasonable 2 times
slower

Console.WriteLine("Press ENTER to exit");
Console.ReadLine();
}

static void TestFile(string sFile)
{
FileStream oFS=File.OpenRead(sFile);

ArrayList oLines=new ArrayList();

StreamReader oSR=new StreamReader(oFS);

while(oSR.BaseStream.Position<oSR.BaseStream.Length)
{
oLines.Add(oSR.ReadLine());
}

oFS.Close();

Console.WriteLine("Loaded {0} lines from file
{1}",oLines.Count,sFile);

// now copy same strings into string array for speed
comparisons
string[] sLines=new string[oLines.Count];

for(int i=0;i<sLines.Length;i++)
sLines=(string)oLines;

DateTime oTime=DateTime.Now;
oLines.Sort();
Console.WriteLine("Time to sort strings in ArrayList is: {0}
msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);

oTime=DateTime.Now;
Array.Sort(sLines);
Console.WriteLine("Time to sort strings in string[] array is:
{0} msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);
}
}
}

//////////////////////////////////////////////////////////////////////////
 
Thanks for the feedback!

The files that I sort are indeed partially sorted - in some cases they _may_
be fully sorted (but I don't know that so I have to sort them), and of course
I can and did change my sorting code to avoid this bug, that's not the
problem: I already have workaround (use Array.Sort), the only reason I posted
this is for the benefit of others in hope to replicate this issue and bring
Microsoft's attention to get it fixed: there is just no excuse for sorter to
take so much more time than in .NET 1.1, hence it affects legacy software
that may not be changed due to lack of source, thus fix is clearly needed in
the .NET runtime.

Frans Bouma:
Your calls show different results, and that's understandable because
they're actually doing completely different things, however you insist that
they're the same. Well, they're not.

Try running same code under .NET 1.1 - you will see that ArrayList.Sort
performance there just fine, however it is a dog in .NET 2.0, thus I _am_
doing the _same_ things by executing software that was well debugged and
worked fine under .NET 1.1, however it was slowed down to crawl (which is
what made me investigate) under .NET 2.0. Workaround is not an issue - I
already have done it before I posted test case, my motivations for doing it
explained above.

Markus: my applications deals with billions of urls that need to be sorted,
those files are partially sorted elsewhere, and in some cases _may_ be fully
sorted, I however do not have such guarantee thus need to do another sort
(this will go away but not immediately), in any case I said many times
workaround is NOT an issue: I already have one, thanks!

I actually think I know where the bug may be: on first sight it appears that
its Comparer.Default that's at fault here: with it even Array.Sort takes long
time, however I think the issue may be in the actual sorting algorithm that
involves custom comparer in principle: I have changed code to do Array.Sort
with my own comparer and the sorting is still slow: number of comparisons
made is 147,408,713 <--- this is way too many and is primary reason for
slowdown: excessive comparisons. Updated source code for this test can be had
here:
http://majestic12.co.uk/files/other/dotnet2bug/program.cs

I think this allows to narrow down problem to the actual issue in Array.Sort
with custom Comparer - perhaps ArrayList was changed to use Default Comparer
rather than just do a call to sort like it (probably) did in .NET 1.1, which
is what made ArrayList slow, but in reality problem is deeper. The
interesting thing is that Array.Sort with null comparer is not affected, this
leads me to believe that there are ought to be 2 sorting paths, one of which
(no comparer present) is correct, however the other one is not correct for
cases when data is partially or fully sorted. _BOTH_ code paths should use
the same logic, the only difference between them should be call to supplied
custom Comprarer. Indeed, if one with null comparer does job very fast then
its no excuse for the other one to fail so miserably in this case. Hence, the
problem is in the runtime code.

I am inclined to post a simplified test case as a .NET 2 bug in Array.Sort
with custom comparer provided (and by extention all ArrayList Sorts). Does
anyone disagree with my conclusions and next action, and can someone point me
in the best direction where this bug report will actually be taken advantage
of?
 
I've located what is happening. When you call Array.Sort(string) it is
actually calling Array<string>.Sort(string).

This uses the ArraySortHelper<string>.Default.Sort method.

On the other hand, ArrayList is calling Array.Sort which in turn seems
to be using Array+SortObjectHelper.QuickSort which is where the slow
down is occuring.

The only differerce to these two calls is that Array+SortObjectHelper
makes three extra comparisons to decide upon whether the left, middle,
or right value should be used as a pivot.

Tried a test running the same quick sort object helper uses but without
the call to GetPivotValue and it runs in the same time the other sorts
did (140ms compared to 13,281ms).

I then tried a test running the comparisons used in get pivot but always
returning the middle value as the pivot (the same way ArraySortHelper
works) and the method still ran at around 140ms.

Seems to have something with it's choice of pivot as if you only ever
pivot around the middle index each time it works as you would expect
(this is infact the same behavior as ArraySortHelper).

The string comparisons have no effect as both sort methods are using the
same comparisons.

The only difference I found was this choice of pivot. Leaving everything
else in there, but having it always use the middle index as the pivot
produced the expected results.

Must be something about the pattern of your data that just happens to
cause it to fall into an O(n2) pattern.

Was looking at the pattern here:
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

But the quicksort used doesn't seem to quite be median of three, and
your data doesn't follow that pattern. Strangly the bad data in this
case is infact the more sorted data. There is one spike though, this
element is near the start and is moved to near the end, could this be
causing the quicksort to break down?
 
Chris said:
Must be something about the pattern of your data that just happens to
cause it to fall into an O(n2) pattern.

I agree that its data dependent and indeed ironically the slowdown seems
to happen with partially or fully sorted data (one might naively assume
it would make secondary sorting faster, not slower): in my case I do not
know in advance if data is fully or partially sorted, thus have to sort
again, however this is not really relevant - more importantly if it
works fast in .NET 1.1 then it should not be much slower in .NET 2.0.

If _all_ other .NET sorting methods (and its quicksort everywhere,
right?) were producing the same slow result on the same data, then I
could have accepted it, however there is really no justification for
such poor performance that appears to happen only if you specify
non-null comparer: this seems to lead to another code path (rightfully
so, since it allows to avoid check for custom comparer inside loop), but
that codepath is not as optimal as the other one.

In effect I assert here that the following code will use different code
paths and one of them is broken:

1) Array.Sort(Items,0,Items.Length,null);

// specify any comparer even your own and performance is much slower
// due to more comparisons
2) Array.Sort(Items,0,Items.Length,Comparer.Default);

Note here: we can stop talking about ArrayList since the problem seems
to be located in Array.Sort with non-null comparer, which I presume is
not null in case of ArrayList.Sort, but since we pass string[] directly
to Array.Sort here it means that your theory about
Array+SortObjectHelper.QuickSort does not (or should not?) apply.

I see no reason why mere presense of custom comparer (so long as it
compares in the same way) should make any difference on sorting.

I know that source code of .NET 1.1 is available and can be checked, but
can the same be done for .NET 2.0, anyone got any pointers?


regards,

Alex
 
again, however this is not really relevant - more importantly if it
works fast in .NET 1.1 then it should not be much slower in .NET 2.0.

This seems to be a limitation to the quick sort algorithm in general.
Might be the old algorithm they were using worked slower on average with
other datasets? You're just unluck to now have a dataset that the new
version doesn't like. You could have had a dataset that sorted slow in
the 1.1 implementation and now sorts fast.

My only trouble with this, is if the new version is on average faster
why does the generic methods still sort using an old algorithm and
didn't use the new one?

If _all_ other .NET sorting methods (and its quicksort everywhere,
right?) were producing the same slow result on the same data, then I
could have accepted it, however there is really no justification for
such poor performance that appears to happen only if you specify
non-null comparer: this seems to lead to another code path (rightfully
so, since it allows to avoid check for custom comparer inside loop), but
that codepath is not as optimal as the other one.

In effect I assert here that the following code will use different code
paths and one of them is broken:

1) Array.Sort(Items,0,Items.Length,null);

// specify any comparer even your own and performance is much slower
// due to more comparisons
2) Array.Sort(Items,0,Items.Length,Comparer.Default);

not quite Array.Sort(string[] items) actually calls:
Array<string>.Sort(string[] items)
which in turn calls
Array<string>.Sort(items, 0, items.Length-1, Comparer<string>.Default)

Using Comparer.Default stops the call to Array.Sort(string[] items) from
being changed to the generic Array<string>.Sort(string[] items)

Hence why it is using a different code path as Array<string>.Sort and
Array.Sort use different quick sort algorithims.

Array.Sort uses Array+SortObjectHelper to sort the list
Array<T>.Sort uses ArraySortHelper to sort the list

The performance difference between the two lies in their choice of pivot
value.

ArraySortHelper just takes, (left + ((right - left) / 2), thus always
the middle value.

Array+SortObjectHelper compares the left value, middle value and right
value to decide which to use (median of three?).
I see no reason why mere presense of custom comparer (so long as it
compares in the same way) should make any difference on sorting.

It isn't directly, its how the compiler deals with generic types and
automagicaly making a call to Array<T> when it recognises the
parameters.

I know that source code of .NET 1.1 is available and can be checked, but
can the same be done for .NET 2.0, anyone got any pointers?

Reflector can look at both .NET 1.1 and .NET 2.0
 
Its getting hotter, or "better" and "better" if you like :)

I know some of you were not happy about two things in my original test case:
a) that my strings were already sorted (why sort again?)
b) possible changes in string comparisons in .NET 2

Ok, lets look at more generic example, using Array.Sort on integers in a
self contained test that does not rely on external data, here is output
from new test that only uses Array.Sort on integer array with numbers
that were generated in the test itself:

-----------------------------------------------
Running under .NET v2.0.50727.42
Sorting 1000000 elements, already sorted: NO
Time to sort with null: 46 msec
Time to sort with default comparer: 46 msec
Time to sort with custom comparer: 9734 msec, compares: 20572897
-----------------------------------------------

Note: already sorted here means that numbers were actually sorted but in
reverse order, ie descending, where as out sort rearranges them to be in
ascending order.

See how long it took to sort them in case of custom defined comparer
which actually does nothing else but calls method of Comparer.Default?

Now, here is indeed very interesting output from when you run it under
..NET 1.1:

-----------------------------------------------
Running under .NET v1.1.4322.2032
Sorting 1000000 elements, already sorted: NO
Time to sort with null comparer: 203 msec
Time to sort with default comparer: 218 msec
Time to sort with custom comparer: 4531 msec, compares: 19000036
-----------------------------------------------

Its also slow, albeit not as slow, in .NET 1.1!!! 8-/

This actually explains why I had some fairly slow running sorts with
custom comparers. Profiler appears to show in both cases that most of
time is spend in Array+SorterGenericArray.QuickSort(int left, int
right), or more specifically inside children of that method that I can't
see, but its not actually the custom comparison function that accounts
for less than 10% of the sorting time.

The key here is not the comparisons per se - but actual number of calls
to them: its very very very high, why would it be so unless that special
"generic array sort" code path is flawed?

Full code (includes earlier string test) here:
http://majestic12.co.uk/files/other/dotnet2bug/program.cs

Comment out #define DOTNET2 if you want it to run under .NET 1.1.

regards,

Alex
 
Chris said:
The performance difference between the two lies in their choice of pivot
value.

ArraySortHelper just takes, (left + ((right - left) / 2), thus always
the middle value.
Array+SortObjectHelper compares the left value, middle value and right
value to decide which to use (median of three?).

But why have different choice of pivot values? It seems that
ArraySortHelper's way of doing things is better performance wise. IMO, a
call to a Sort routine should yield same performance for same data: just
because its generic object should not mean that potentially performance
of application can be degraded so badly.

I just posted message that seems to be fairly close to what you were
saying about those different sorters.

regards,

Alex
 
Interesting, I added a call to the generic compariter as well:

oTime = DateTime.Now;
Array.Sort(iData, 0, iData.Length, new
ComparerTest<int>());
Console.WriteLine("Time to sort with custom<int>
comparer: {0} msec, compares: {1}",
(DateTime.Now.Ticks - oTime.Ticks) /
TimeSpan.TicksPerMillisecond, ComparerTest<int>.iComparisonsNum);
CheckSortedOrder(iData);


and:


class ComparerTest<T> : IComparer<T> {
public static int iComparisonsNum = 0;

public int Compare(T a, T b) {
iComparisonsNum++;
return Comparer<T>.Default.Compare(a, b);
}
}


The results were:

Running under .NET v2.0.50727.42
Sorting 1000000 elements, already sorted: NO
Time to sort with null comparer: 46 msec
Time to sort with default comparer: 62 msec
Time to sort with custom<int> comparer: 796 msec, compares: 19000036
Time to sort with custom comparer: 9437 msec, compares: 20572897
 
Apparently relevant source code that is available here:
http://dotnet.di.unipi.it/content/sscli/docs/doxygen/fx/bcl/array_8cs-source.html

Shows interesting internal call:

[MethodImplAttribute(MethodImplOptions.InternalCall)]
private static extern int TrySZSort(Array keys, Array items, int
left,int right);

It is presumably very fast and is only called by if Comparer is Default
or null.

Perhaps what we are having here is very poor performance of .NET code
versus very highly optimised sorting routine that is called in most
cases, but not when comparer is present.

What I think may have changed in .NET 2.0 is that this high performance
method was not called in all cases where it was in .NET 1.1 :-/


regards,

Alex
 
-----------------------------------------------
Running under .NET v2.0.50727.42
Sorting 1000000 elements, already sorted: NO
Time to sort with null: 46 msec
Time to sort with default comparer: 46 msec
Time to sort with custom comparer: 9734 msec, compares: 20572897
-----------------------------------------------

I've located it, it seems with primative arrays it seems to call
TrySZSort

Running under .NET v2.0.50727.42
Sorting 1000000 elements, already sorted: YES
Time to sort with null comparer: 46 msec
Time to sort with default comparer: 46 msec
Time to sort with custom<int> comparer: 328 msec, compares: 0
Time to sort with custom comparer: 8937 msec, compares: 20572880
Time to sort with TrySZSort: 62 msec, retval = True


And the code:

oTime = DateTime.Now;
bool could_sort = (bool)
typeof(Array).InvokeMember("TrySZSort",
System.Reflection.BindingFlags.Static |
System.Reflection.BindingFlags.InvokeMethod |
System.Reflection.BindingFlags.Public |
System.Reflection.BindingFlags.NonPublic,
null, null, new object[] { iData, null, 0,
iData.Length - 1 });
Console.WriteLine("Time to sort with TrySZSort: {0}
msec, could_sort = {1}",
(DateTime.Now.Ticks - oTime.Ticks) /
TimeSpan.TicksPerMillisecond, could_sort);
if (could_sort) Checkcould_sortOrder(iData);



So now we have three code paths,

Array.Sort -> Array+SorterObjectArray
Array<T>.Sort -> ArraySortHelper<T>
Array.TrySZSort

Array.Sort will call TrySZSort so long as:
comparer != null && comparer != Comparer.Default

TrySZSort must be using native code and looking at the raw memory values
to avoid having to box it into an object.

From
http://discuss.develop.com/archives/wa.exe?A2=ind0009d&L=dotnet&T=0&F=P&S=&P=33045

An SZArray is just an array with only one dimension and the lower bound
is 0.
 
Chris said:
So now we have three code paths,

Array.Sort -> Array+SorterObjectArray
Array<T>.Sort -> ArraySortHelper<T>
Array.TrySZSort

There are seems to be 3 code paths in Array.Sort (non generic version):

1) TrySZSort - fastest, native code
2) Array+SorterObjectArray: slow .NET, but still not the worst case as
no GetValue/SetValue functions are used: only called if .NET can get
what appears to be pointer to that array of some kind, using the
following code: Object[] objKeys = keys as Object[];
3) Array+SorterGenericArray: last resort - slowest .NET version that
seems to be called in our case

It may well be that the reason it happens in ArrayList.Sort due to
non-Default comparer used which forces it to use slowest path, IMO, no
big reason for change of behavior to that present in .NET 1.1

All in all I am now getting better understanding how this sorting is
done, not sure now if I should submit a bug or not :-/

Alex
 
Chris said:
So now we have three code paths,

Array.Sort -> Array+SorterObjectArray
Array<T>.Sort -> ArraySortHelper<T>
Array.TrySZSort

There are seems to be 3 code paths in Array.Sort (non generic version):

1) TrySZSort - fastest, native code
2) Array+SorterObjectArray: slow .NET, but still not the worst case as
no GetValue/SetValue functions are used: only called if .NET can get
what appears to be pointer to that array of some kind, using the
following code: Object[] objKeys = keys as Object[];
3) Array+SorterGenericArray: last resort - slowest .NET version that
seems to be called in our case

It may well be that the reason it happens in ArrayList.Sort due to
non-Default comparer used which forces it to use slowest path, IMO, no
big reason for change of behavior to that present in .NET 1.1

All in all I am now getting better understanding how this sorting is
done, not sure now if I should submit a bug or not :-/

Alex

The trouble is implementing sorting whilst keeping it generic.
Unfortunatly the best sorting method often depends upon the data and how
sorted the data is already.

My only question is the choice of which value should be for the pivot.
This seems to be what slows it down the most with your original data.
Why don't both sorts use the same method of choosing a pivot? At least
your sort would then be consitantly slow :)
 
Chris said:
My only question is the choice of which value should be for the pivot.
This seems to be what slows it down the most with your original data.

But the issue does not appear to be the pivot - its that native call is
not made for some reason in ArrayList.Sort in .NET 2.0 (at least for
strings): perhaps that secret native sorting method is also smarter at
pivots, we don't really know (and I am not desperate enough to get into
disassembly).

Its tempting for me to leave it as is :-/

regards,

Alex
 
But the issue does not appear to be the pivot - its that native call is
not made for some reason in ArrayList.Sort in .NET 2.0 (at least for
strings): perhaps that secret native sorting method is also smarter at
pivots, we don't really know (and I am not desperate enough to get into
disassembly).

Its tempting for me to leave it as is :-/

regards,

Alex

That would probaly be because the string is not a primative object. I'd
guess that the TrySZSort is capable of sorting primatives such as
short/int/long/char/etc.
 
Chris said:
That would probaly be because the string is not a primative object. I'd
guess that the TrySZSort is capable of sorting primatives such as
short/int/long/char/etc.

TrySZSort seems to work on strings: that's why (I think) sorting is fast
when you use Array.Sort on string array, what I think happens is that
ArrayList.Sort in .NET will specify Comparer that is not Default, this
makes Array.Sort method to avoid using TrySZSort because it can't run
with custom comparers, thus forcing down the route of much slower native
..NET implementation, and it seems that its the slowest (Generic sorter)
that actually runs rather than 2nd slowest (Object sorter).

It also raises question as to why would TrySZSort be _that_ much faster,
sure optimal use of registers etc, but perhaps it has got some
algorithmic changes that make it faster than the code that we can see?

The issue is kind of still here: strings loaded into ArrayList will
appear to get very different sorting treatment in .NET 2.0 rather than
that in .NET 1.1. If that's the true reason then I wonder whether it was
totally necessary to specify non-default comparer in ArrayList. It still
feels like we have not reached total clarity as to what happens inside :-/

Alex
 
I am not sure where you are coming up with this ..
not quite Array.Sort(string[] items) actually calls:
Array<string>.Sort(string[] items)

It seems it is being sorted by an unmanaged method called TrySZSort before
trying a managed sort.

Here is the relevant reflectorred code.

if ((length > 1) && (((comparer != Comparer.Default) && (comparer !=
null)) || !Array.TrySZSort(keys, items, index, (index + length) - 1)))
{
object[] objArray1 = keys as object[];
object[] objArray2 = null;
if (objArray1 != null)
{
objArray2 = items as object[];
}
if ((objArray1 != null) && ((items == null) || (objArray2 !=
null)))
{
Array.SorterObjectArray array1 = new
Array.SorterObjectArray(objArray1, objArray2, comparer);
array1.QuickSort(index, (index + length) - 1);
}
else
{
Array.SorterGenericArray array2 = new
Array.SorterGenericArray(keys, items, comparer);
array2.QuickSort(index, (index + length) - 1);
}
}


If the comparer is null or default it should be falling out to the unmanaged
code..

What is really interesting (and I included it with the test program I sent
in) is the call ...

oLines.Sort(0, oLines.Count - 1, null);

arraylist.sort calls .. Array.Sort(this._items, index, count, comparer);

Thus producing the exact same array.sort call I showed earlier which was
fast, yet it comes out as slow ...

Cheers,

Greg





Chris Chilvers said:
again, however this is not really relevant - more importantly if it
works fast in .NET 1.1 then it should not be much slower in .NET 2.0.

This seems to be a limitation to the quick sort algorithm in general.
Might be the old algorithm they were using worked slower on average with
other datasets? You're just unluck to now have a dataset that the new
version doesn't like. You could have had a dataset that sorted slow in
the 1.1 implementation and now sorts fast.

My only trouble with this, is if the new version is on average faster
why does the generic methods still sort using an old algorithm and
didn't use the new one?

If _all_ other .NET sorting methods (and its quicksort everywhere,
right?) were producing the same slow result on the same data, then I
could have accepted it, however there is really no justification for
such poor performance that appears to happen only if you specify
non-null comparer: this seems to lead to another code path (rightfully
so, since it allows to avoid check for custom comparer inside loop), but
that codepath is not as optimal as the other one.

In effect I assert here that the following code will use different code
paths and one of them is broken:

1) Array.Sort(Items,0,Items.Length,null);

// specify any comparer even your own and performance is much slower
// due to more comparisons
2) Array.Sort(Items,0,Items.Length,Comparer.Default);

not quite Array.Sort(string[] items) actually calls:
Array<string>.Sort(string[] items)
which in turn calls
Array<string>.Sort(items, 0, items.Length-1, Comparer<string>.Default)

Using Comparer.Default stops the call to Array.Sort(string[] items) from
being changed to the generic Array<string>.Sort(string[] items)

Hence why it is using a different code path as Array<string>.Sort and
Array.Sort use different quick sort algorithims.

Array.Sort uses Array+SortObjectHelper to sort the list
Array<T>.Sort uses ArraySortHelper to sort the list

The performance difference between the two lies in their choice of pivot
value.

ArraySortHelper just takes, (left + ((right - left) / 2), thus always
the middle value.

Array+SortObjectHelper compares the left value, middle value and right
value to decide which to use (median of three?).
I see no reason why mere presense of custom comparer (so long as it
compares in the same way) should make any difference on sorting.

It isn't directly, its how the compiler deals with generic types and
automagicaly making a call to Array<T> when it recognises the
parameters.

I know that source code of .NET 1.1 is available and can be checked, but
can the same be done for .NET 2.0, anyone got any pointers?

Reflector can look at both .NET 1.1 and .NET 2.0
 
Hey guys I have already submitted a bug report based upon this ..

the exact behavior I submitted is what you are getting at now ...

Array.Sort(array,0,length, null) //fast
Array.Sort(array,0,length,DefaultComparer)//slow
ArrayList.Sort(0,length,null) //should generate same call as first (fast
generates slow call)

Cheers,

Greg
 
Back
Top