Ameliorations in the Collections framework in .NET 2

  • Thread starter Thread starter Arnaud Debaene
  • Start date Start date
A

Arnaud Debaene

Hello all,

One area where the .Net framework 1.1 is really poor is Collections : We've
got very few choice concerning the containers and their characteristics (for
example, ArrayList, Queue and Stack are the only "non-dictionary" avalaible
containers, and only Stack has some complexity specifications).

The worst thing however is the IEnumerator interface, which doesn't allow
for insertion, deletion or even *modification* of an item during
enumeration.
This makes IEnumerator (and therefore foreach in C#) almost unusable in most
cases, which may be why most programmers still use "for" loops, with the
obvious danger of index "off by one" error, etc...
The only workaround I found on the net is
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncscol/html/csharp01212002.asp,
but this is mostly unacceptable (do a copy of the whole collection - even a
shallow copy - for each enumeration! By Jove!!! How awfull!)


..NET 2 is already a great step forward with strongly typed generics
collections and the addition of a *few* more containers (LinkedList,
SortedDictionnary, and... that's all!). However, it seems there is still no
formal compexity specifications for all operations on the containers...
More important, the IEnumerator is still limited to reading the container :
The Generic.IEnumerator doc states that modifying a container during
enumeration is "undefined behaviour" (gee, they begin to borrow the C++
standardese language!), but for all the *provided* containers, modifying the
container during enumeration is invalid. For example, the
LinkedList.Enumerator's doc states :
"An enumerator remains valid as long as the collection remains unchanged. If
changes are made to the collection, such as adding, modifying, or deleting
elements, the enumerator is irrecoverably invalidated and its behavior is
undefined."
Heeks! What's the advantage of a doubly linked list over an array if you
cannot insert or delete items in the middle of the container at will ??

Compared with the STL, where each container defines when and how iterators
are invalidated, the IEnumerator model is extremly poor and almost unusable
as soon as you do non-trivial manipulations on the containers. Does anyone
know wether some ameliorations will be provided in this area, and if so,
when ? (I do not think we will see anything new in Visual 2005 Gold, but
perhaps after...). Short of that, I am pondering about implementing my own
collections framework, more inspired by the STL or other libraries; However,
only Microsoft can provide the "top-level", much usefull, facilites such as
foreach...

Arnaud
MVP - VC
 
Arnaud said:
The worst thing however is the IEnumerator interface, which doesn't allow
for insertion, deletion or even modification of an item during enumeration.
This makes IEnumerator (and therefore foreach in C#) almost unusable in
most cases, which may be why most programmers still use "for" loops, with
the obvious danger of index "off by one" error, etc...

I'm with you in your assessment of the problem - I would very much like
these restrictions to be lifted. Just wanted to quickly comment on '...
which may be why most programmers still use "for" loops ...' - I estimate
I use foreach about 95% of the time, if not more often, and this is the
same in most of the .NET code I've seen written by other people. The loops
I write that don't need to make any changes to the list's content
outnumber the ones that do by far, so I definitely find the foreach
statement extremely useful in its current form - not that it couldn't be
even more useful, but that would also raise the level of complexity when
implementing collections or custom iterators.


Oliver Sturm
 
I wouldn't expect much as this is likely "by design". The last time I saw
something about that the explanation was that updating the collection you
are enumerating is like "cutting the branch on which you are sit"...

I didn't take the time to dive into gory details but for example if you
remove the current node from a linked list, it's seem quite logical to
update it's next/previous pointer and the enumerator doesn't have any more
the information allowing to jump to the next element. You'll likely find
those details by Googling...
 
Patrice said:
I wouldn't expect much as this is likely "by design". The last time I saw
something about that the explanation was that updating the collection you
are enumerating is like "cutting the branch on which you are sit"...

I didn't take the time to dive into gory details but for example if you
remove the current node from a linked list, it's seem quite logical to
update it's next/previous pointer and the enumerator doesn't have any more
the information allowing to jump to the next element. You'll likely find
those details by Googling...

The answer is quite simple : when deleting an item from a linked list,
only those iterators/enumerators pointing to the deleted node are
invalidated. The "delete" method can return an iterator/enumerator
pointing to the next element remaining after the deletetiion. This is
what std::list::erase does in C++.

Arnaud
MVP - VC
 
Arnaud,

Invalidation of the enumerator is a non issue for me most of the time
since I rarely need to modify a collection during enumeration. It's
more likely for me that I fall back on a normal for loop when I need to
know the position of the item in the enumeration. Of course, you
cannot use a normal for loop on IDictionary or anything other than
IList or an array so we're left with declaring and incrementing our own
loop counters manually in that case.

The SortedList quicksorts the entire collection on every insert and
delete operation making it frightfully slow. I've asked this before
and I'll ask it again. Why, oh why, did they not provide a red black
tree implementation?

Where's the priority queue?

Has anyone found the ListDictionary or HybridDictionary useful? I've
never used either one. Yeah, I know why they're there, it's just that
a normal Hashtable has proven satisfactory in all cases for me.

Brian
 
Brian said:
Arnaud,

Invalidation of the enumerator is a non issue for me most of the time
since I rarely need to modify a collection during enumeration.
Don't you ever do operations of the kind "delete from this container
all items that are equals to said:
It's
more likely for me that I fall back on a normal for loop when I need to
know the position of the item in the enumeration. Of course, you
cannot use a normal for loop on IDictionary or anything other than
IList or an array so we're left with declaring and incrementing our own
loop counters manually in that case.
Yep, and we all know that manual loop counters are dangerous and a
source of infinite trouble, especially if we add or remove items from
the container during enumeration.

The SortedList quicksorts the entire collection on every insert and
delete operation making it frightfully slow. I've asked this before
and I'll ask it again. Why, oh why, did they not provide a red black
tree implementation?
You've got it in .NET 2 with SortedDictionnary, but nothing for "non
dictionnary" containers:-( What really miss here is the equivalent of
std::set and std::multi_set in C++...

Where's the priority queue?

Has anyone found the ListDictionary or HybridDictionary useful? I've
never used either one. Yeah, I know why they're there, it's just that
a normal Hashtable has proven satisfactory in all cases for me.

For small data sets, I would say that list dictionnary is faster than
Hadhtable (no need to calculate hashs....)

Arnaud
MVP - VC
 
Don't you ever do operations of the kind "delete from this container
all items that are equals to <X> or that satisfy <Y> predicate" ?

Sure, but for me it just doesn't happen very much. I do see where
you're coming from though.
Yep, and we all know that manual loop counters are dangerous and a
source of infinite trouble, especially if we add or remove items from
the container during enumeration.

Also, it means the counter must be scoped outside of the foreach
construct. It seems clunky when I only intend for it to be used inside
the loop.
You've got it in .NET 2 with SortedDictionnary, but nothing for "non
dictionnary" containers:-( What really miss here is the equivalent of
std::set and std::multi_set in C++...

That's definitely a nice addition.
For small data sets, I would say that list dictionnary is faster than
Hadhtable (no need to calculate hashs....)

I'm sure it is. The documentation even says so. It's just that the
performance gain, which appears be to marginal at best, isn't a very
convincing reason in most cases. If the data set grew unexpectedly
then the performance loss would be painful if using a ListDictionary.

The HybridDictionary is a little more understandable. But for me, I've
never had a situation where I can proudly assert that performance is of
the utmost importance, but concede that the data set size, though small
in most cases, is in fact unbounded.
 
Arnaud said:
The worst thing however is the IEnumerator interface, which doesn't
allow for insertion, deletion or even *modification* of an item during
enumeration.

Well, I don't see that as a problem. Look at the name of the interface -
it is for *enumerating*. This is interface programming, each interface
has a behaviour and in the case oif IEnumerable the behaviour is clear -
it gives a serial access to the data in the collection.
which may be why most programmers still use "for"
loops, with the obvious danger of index "off by one" error, etc...
The only workaround I found on the net is
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncscol/html/csharp01212002.asp,
but this is mostly unacceptable (do a copy of the whole collection -
even a shallow copy - for each enumeration! By Jove!!! How awfull!)

If you want to change the collection why would you want to use an
interface that enumerates objects in the collection? I understand the
issue you are trying to raise, but the problem is that you are trying to
use the interface to do something that it is specifically not supposed
to do.

Richard
 
Richard said:
Well, I don't see that as a problem. Look at the name of the
interface - it is for *enumerating*. This is interface programming, each
interface
has a behaviour and in the case oif IEnumerable the behaviour is
clear - it gives a serial access to the data in the collection.

I know this is the interface specification.... and what I said is that this
specification is mostly inadequate for no trivial work on a container. But
this is not a problem in itself : The problem is that there is *nothing
else* available ;-)

If you want to change the collection why would you want to use an
interface that enumerates objects in the collection? I understand the
issue you are trying to raise, but the problem is that you are trying
to use the interface to do something that it is specifically not supposed
My point is that there is *no* easy way to modify a container during
enumeration, and that there should be one, because it is quite a common
requirement, and it is quite easy to get it wrong when doing it "by hand"
over and over again. I am not talking about a bug here, but about a lack in
the framework API...

Arnaud
MVP - VC
 
Back
Top