Jay,
The maximum size of a typical queue for this application ranges from 0 to
60000 with most of the queues (note that there are several) running from
4000 - 6000, but I have absolutely no guarantees of this as the source
queue max sizes aren't under my control. I also have a time constraint on
the input portion so the inputs must be added as quickly as possible.
Combine this with the fact that I load the queues about 10 times faster
than they can be emptied and I was seeing long pauses (on the order of
several minutes) while the internal arrays were being reallocated and the
GC was running with both Queue<T> and List<T>. Creating a queue or list
with an initial size caused the application to stall when a queue or list
was first being created.
It turns out that emulating a queue using LinkedList<T> was very quick and
the O(1) insertion to the end and removal from the front made a huge
difference in the application performance. I did notice last night that I
had a ~25 minute pause for a full GC pass, which by the way dropped the
memory footprint of this application from 90Mb to 20Mb. This GC pass
occurred well after the queues had been loaded so the time constraint on
the input portion of my application was no longer applicable.
By the way, by rewriting to use multiple queues and threads, I took the
runtime of this application, which is a one-way DFS between two Samba
servers using a Windows Server as the controller, from over 24 hours
(attempting to read input the entire time) to about 7 hours (with only the
first 90 minutes reading input). I expect further performance gains now
that I finally have a complete pass and I don't have to copy gigabytes of
files over a T1.
Mike.
Jay B. Harlow said:
Michael,
If you know the maximum/typical size of the queue you can use its
constructor that sets the initial capacity:
http://msdn2.microsoft.com/en-us/library/35bz1ab7.aspx
I would recommend setting the initial capacity first no matter which data
type you use Queue(Of T) or List(Of T). Of course LinkedList(Of T)
doesn't have a capacity ;-)
I would also try Queue(Of T) with varying capacities before reverting to
an alternate data type.
--
Hope this helps
Jay B. Harlow
.NET Application Architect, Enthusiast, & Evangelist
T.S. Bradley -
http://www.tsbradley.net
Michael D. Ober said:
Actually, I ended up switching to LinkedList<T> to emulate the
Enqueue/Dequeue methods.. My application generates queues of a few
items to thousands of items. It appears that the slight overhead of
Linked Lists is worth the benefit of never reallocating an array.
Mike.
Thanks - I'll switch to List<T>. My application needs the quicker
reallocation.
Mike.
"Carl Daniel [VC++ MVP]"
"Carl Daniel [VC++ MVP]"
Michael D. Ober wrote:
When calling Enqueue, the internal array may need to be reallocated.
My question is by how much? In the old MFC array classes, you could
tell MFC how many additional elements to add to the array when it
needed to reallocate, which greatly boosted performance relative to
adding 1 element at a time.
Unfortunately, the .NET documentation fails to make any mention of
the reallocation strategy for any of the array-based generic
containers. I'd suggest doing some experimentation to try to measure
the insert performance as the queue gets larger, or use .NET
Reflector to examine the IL behind the queue class to determine how
it grows.
Hopefully, it grows exponentially (by a factor of 1.5 - 2) because
that's been shown to give the best overall performance for a
dynamically sized array.
Taking a quick look with reflector, it's clear that Queue<T> grows
linearly - whenever the internal array is full, it's increased in size
by 4 elements. That gives O(n^2) performance on average if nothing is
ever removed from the queue. Since you typically wouldn't expect a
queue to have a large number of elements, linear growth is probably a
reasonable choice for most applications.
List<T>, on the other hand, uses exponential growth (factor of 2), so
it will have amortized constant insert time as it grows.
-cd