Inconsistent derivation for generic collections

  • Thread starter Thread starter Michi Henning
  • Start date Start date
M

Michi Henning

Looking at List<T> and LinkedList<T>, we have:

public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>,
IList, ICollection, IEnumerable
public class LinkedList<T> : ICollection<T>, IEnumerable<T>,
ICollection,
IEnumerable, ISerializable,
IDeserializationCallback

Interesting that LinkedList<T> derives from ISerializable and
IDeserializationCallback, but List<T> does not. Why the inconsistency?

But, more interstingly, both LinkedList<T> and List<T> derived from
ICollection<T> and from ICollection.

Now compare that to Queue<T> and Stack<T>:

public class Queue<T> : IEnumerable<T>, ICollection, IEnumerable
public class Stack<T> : IEnumerable<T>, ICollection, IEnumerable

These derive from ICollection, but *not* from ICollection<T>.

So, I can pass a List<T> or a LinkedList<T> as ICollection<T>, but I
*cannot* pass Queue<T> or Stack<T> as ICollection<T>.

I can pass all four collection concrete types as ICollection, but that
has other drawbacks:

- It incurs the cost of boxing and unboxing.

- ICollection<T> has an Add() method, but ICollection does *not* have
an Add() method. So, if I pass collections as ICollection, I cannot
add to the collections in the callee.

Why these gratuitous differences?

Cheers,

Michi.
 
Interesting that LinkedList<T> derives from ISerializable and
IDeserializationCallback, but List<T> does not. Why the inconsistency?

Implementing ISerializable is just a way to implement your own
serialization logic. Why that's required should probably be seen as an
implementation detail. List<T> is still Serializable, so for you as a
consumer of the class it shouldn't matter.

- ICollection<T> has an Add() method, but ICollection does *not* have
an Add() method. So, if I pass collections as ICollection, I cannot
add to the collections in the callee.

Why these gratuitous differences?

See

http://blogs.msdn.com/kcwalina/archive/2005/09/23/Collections.aspx
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=110995


Mattias
 
Michi said:
[...]
Now compare that to Queue<T> and Stack<T>:

public class Queue<T> : IEnumerable<T>, ICollection, IEnumerable
public class Stack<T> : IEnumerable<T>, ICollection, IEnumerable

These derive from ICollection, but *not* from ICollection<T>.

So, I can pass a List<T> or a LinkedList<T> as ICollection<T>, but I
*cannot* pass Queue<T> or Stack<T> as ICollection<T>.

I can pass all four collection concrete types as ICollection, but that
has other drawbacks:

- It incurs the cost of boxing and unboxing.

In what way do you mean?
- ICollection<T> has an Add() method, but ICollection does *not* have
an Add() method. So, if I pass collections as ICollection, I cannot
add to the collections in the callee.

Why these gratuitous differences?

IMHO, you should leave the word "gratuitous" out of your posts,
especially when asking this sort of question. You are asking the
question, which means you obviously don't know the reason the
differences exist, which means you have no knowledge that would allow
you to accuse the differences of being "gratuitous".

That sort of phrasing simply colors your posts as prejudiced and signals
to others that they may not have much luck trying to answer your
questions, because you've already apparently made up your mind. That,
of course, discourages people from replying at all.

Now, that said...

In this particular case, there are very specific differences between
ICollection and ICollection<>. You've even noted one, but the more
general observation is that an ICollection<> is read/write while an
ICollection is read-only.

Because a Queue<> and a Stack<> have a very specific structure and it is
a design error (*) to modify those structures without going through
their defined accessors (enqueue/dequeue, push/pop), those classes don't
implement the ICollection<> interface. Since ICollection is read-only,
it's safe for them to implement that interface.

Notes:

(*) It's my opinion that modifying a queue or stack outside the intended
accessors is a design error. Others may disagree (in particular, I know
that Jon Skeet wrote a queue-like class that he calls a queue but which
allows for rearranging of the members according to priority; while we
don't disagree on the utility of such a class, we disagree on what you
should call it :) ). However, it seems to me that the designers of the
..NET classes would agree with me, given the implementation of the classes.

2) IMHO, where you write that a class "inherits" an interface, it would
be more correct to write that the class "implements" the interface. To
me, "inherits" implies that there's some implementation that is actually
inherited. I grant that this implication isn't 100% (see abstract
classes for example), but as a general rule inheritance is about getting
already-implemented behavior, while interfaces are about promising to
implement specific behavior.

Pete
 
On serialization, you might find this post useful. (Please put any
follow-up comments on the MSDN forum - I don't read newsgroups.)

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2067909&SiteID=1&mode=1

I think a review of serialization would help a bit. Let's look at
these types a little more closely. Here's my view of these types:

[DebuggerTypeProxy(typeof(Mscorlib_CollectionDebugView<>))]
[DebuggerDisplay("Count = {Count}")]
[Serializable()] <-------------------------
public class List<T> : IList<T>, System.Collections.IList

[Serializable()] <-------------------------
[System.Runtime.InteropServices.ComVisible(false)]
[DebuggerTypeProxy(typeof(System_CollectionDebugView<>))]
[DebuggerDisplay("Count = {Count}")]
public class LinkedList<T>: ICollection<T>,
System.Collections.ICollection, ISerializable,
IDeserializationCallback {

Remember that serializability of a type, at least for the CLR's
BinaryFormatter & SoapFormatter, is defined by the presence of the
SerializableAttribute on the type. ISerializable is more of an
implementation detail of the type, meaning that the type either has
need of some special versioning constraints around compatibility with
previous versions of its serialized format on disk (or over the
network), or that it may explicitly need some additional code to run
on deserialization to restore internal invariants. Sometimes these
needs can be met by marking fields as optionally serializable, but
ISerializable's more general methods were necessary.

IDeserializationCallback takes this a step further, saying that
deserialization of this type is so special, we must explicitly run
some code after the rest of the object graph has been deserialized.
As an example of where this is necessary, consider a hash table that
must call GetHashCode on all keys within the hash table. Note that we
can't persist out the return value of GetHashCode, since hash
functions are likely to be improved from one version of a library to
another. So we must call GetHashCode on the key sometime during
deserialization, but after the individual key has been deserialized.
If the hash table is deserialized before the keys are deserialized,
then calling GetHashCode may return the wrong value.
IDeserializationCallback allows you a chance of working around simple
ordering problems, by providing a post-deserialization callback.

The only confusing thing about ISerializable and
IDeserializationCallback are that if a base type implements these
interfaces, it suggests that derived types should be serializable (ie,
they should have the SerializableAttribute on them), may need to
implement a deserialization constructor, and if they override these
methods, they may need to call the base type's implementation in the
appropriate places. The most common place where this shows up in our
library is Exception - very few developers remember the
deserialization constructor on their own exceptions (which only shows
up as a problem once you start using multiple appdomains, remoting, or
the CLR's new add-in model). But the slightly fishy serialization
design issue is that the implementation details of the type tell you
whether special serialization semantics are required, and this leaks
through in the public interface in terms of these marker interfaces &
requirements on derived types. Given that serialization of a type
isn't something to be done lightly for any type with non-trivial
internal state, this requirement on subclasses is well within reason.

Back to our collections - clearly, List<T> and LinkedList<T> are
serializable. The original poster's question might then be, how would
these type's serialization needs be different? The issue comes down
to only implementation details. For List<T>, it's basically a T[], a
count, and some other uninteresting state around enumeration.
Assuming that T is serializable (and all subclasses of T that are
stored in the List are also serializable), then List<T> can be
serialized & deserialized safely.

So what about LinkedList<T>? Unlike our hash tables, it doesn't
require hash functions with may vary with the CLR or third party
library version changes. So why do anything special here? The answer
is performance - we don't want to serialize out a list of
LinkedListNodes. We instead write out just the interesting data. Now
let's turn a critical eye to this class. Do we need the
OnDeserialization method? Probably not in this version. Does writing
out just the values instead of the complete object graph break
scenarios when a consumer of the list held references to individual
LinkedListNodes? Yes, this does indeed break the notion of
serializing out an entire object graph then being able to recover all
of it faithfully, potentially on another machine with a different
version of the CLR. But our collection designers apparently felt the
performance of serializing out linked lists efficiently outweighed the
one scenario we've (perhaps accidentally) disabled.

I hope this helps you to understand serialization more clearly.

Brian Grunkemeyer
CLR Base Class Library Team
 
IMHO, you should leave the word "gratuitous" out of your posts,
especially when asking this sort of question. You are asking the
question, which means you obviously don't know the reason the
differences exist, which means you have no knowledge that would allow
you to accuse the differences of being "gratuitous".

You are right, and I apologize.
Because a Queue<> and a Stack<> have a very specific structure and it is
a design error (*) to modify those structures without going through
their defined accessors (enqueue/dequeue, push/pop), those classes don't
implement the ICollection<> interface. Since ICollection is read-only,
it's safe for them to implement that interface.

I hear your reasoning. But what would be wrong to have Add() simply
enqueue on a queue, and push on a stack? I mean, after all, LinkedList
supports ICollection<T> and therefore supports Add(). For LinkedList,
Add() appens to the list, but it could equally well prepend, at the
same run-time cost.

So, if Add() is deemed inappropriate for queues and stacks, I think I
could mount an argument that Add() is equally inappropriate for a
LinkedList.

So, I don't think this is a particularly strong argument.

And the cost of it is rather serious, because I cannot pass List,
LinkedList, Stack, and Queue generically other than as an ICollection,
but that incurs quite a performance hit for all four of them.
2) IMHO, where you write that a class "inherits" an interface, it would
be more correct to write that the class "implements" the interface.

Fine, I'm happy to call it that. I guess I could have also said that
List derives from ICollection--the exact terminology doesn't change
the gist of the problem.

Cheers,

Michi.
 
Michi said:
I hear your reasoning. But what would be wrong to have Add() simply
enqueue on a queue, and push on a stack?

I can't say that if you're just considering Add(), that would
necessarily break the paradigm. It's slightly ambiguous, but as you
suggest is unlikely to be a huge problem.

I mean, after all, LinkedList
supports ICollection<T> and therefore supports Add(). For LinkedList,
Add() appens to the list, but it could equally well prepend, at the
same run-time cost.

I don't see how LinkedList<> is a significantly different example from
List<>. List<> also has an InsertAt() that would allow prepending. But
both List<> and LinkedList<> have a specific Add() method, and that
always adds at the end. It's not a stretch for someone to assume that
ICollection said:
So, if Add() is deemed inappropriate for queues and stacks, I think I
could mount an argument that Add() is equally inappropriate for a
LinkedList.

IMHO it's not really Add() that is the big problem.
So, I don't think this is a particularly strong argument.

You're welcome to your opinion. And I suppose that you may still find
the difference gratuitous after the explanations given. Personally, I
think that the differences are significant enough to justify the
difference in which interfaces are supported.
And the cost of it is rather serious, because I cannot pass List,
LinkedList, Stack, and Queue generically other than as an ICollection,
but that incurs quite a performance hit for all four of them.

I'm still not clear on what performance issues you're talking about. In
what way does using ICollection instead of ICollection<> incur "quite a
performance hit"?

I'm also not clear on what situation in which it would make sense to
require having a single method that can handle all four types of
collections. I'm especially suspicious of a method that treats a list
(List<> or LinkedList<>) the same as a Queue<> or Stack<> (primarily for
the same semantic reasons I've discussed already).

IMHO, you would probably have more success discussing the specific
scenario that you're trying to address. The fact is, the .NET
collection interfaces are what they are; that's not going to change. So
it's probably more productive to focus on whatever specific issue you're
trying to solve. There may actually be a reasonably appropriate
alternative to what you're trying to do now. But it's hard to say
without the specific scenario being addressed.
Fine, I'm happy to call it that. I guess I could have also said that
List derives from ICollection--the exact terminology doesn't change
the gist of the problem.

IMHO "derives from" is the same as "inherits". But no, you're
right...the exact terminology doesn't change the problem. It could
affect how well you're understood, but so far I don't think that's been
a problem either.

Pete
 
I had a look, thanks. However, that article doesn't explain why
Stack<T>
and Queue<T> don't derive from ICollection<T>, as far as I can see.

I thought Anthony Moore's comment from the second link made it pretty
clear

"This apparent incongruity was intentional and is the result of a
difference between ICollection and ICollection<T>. ICollection was
essentially just IEnumerable with a count, and as such did not add
much value to IEnumerable. The heirarchy was re-designed for the
generic namespace, so ICollection<T> represents a mutable collection
with Add and Remove methods. As such Queue does not comply with this
extended interface because it does not support unconstrained addition
and removal."


Mattias
 
I'm still not clear on what performance issues you're talking about. In
what way does using ICollection instead of ICollection<> incur "quite a
performance hit"?

ICollection incurs the cost of boxing and unboxing, whereas
I'm also not clear on what situation in which it would make sense to
require having a single method that can handle all four types of
collections. I'm especially suspicious of a method that treats a list
(List<> or LinkedList<>) the same as a Queue<> or Stack<> (primarily for
the same semantic reasons I've discussed already).

The reason I'm raising this question is that I'm unmarshaling
sequences off the wire, as part of ZeroC's Ice middleware
(www.zeroc.com). What I would like to do is allow the user to
determine the concrete collection type into which the data that
arrives over the wire will be unmarshaled. So the user gets a
LinkedList, a List, a Queue, or Stack, depending on their preference.
As part of the unmarshaling code, I have something like the following
method:

void fillWithData(ICollection<T> c)

The caller passes an empty collection, and the implemetnation of this
method reads the data off the wire and calls Add() to put the data
into the collection.

Unfortunately, I can do this only for List and LinkedList, but not for
Stack and Queue.

Alternatively, I can write the method as:

void fillWithData(ICollection c)

That works for all four collection types but, unfortunately,
ICollection don't have an Add() method, so I can't fill the sequence.

So, I'm forced to write four overloads:

void fillWithData(List<T> c)
void fillWithData(LinkedList<T> c)
void fillWithData(Queue<T> c)
void fillWithData(Stack<T> c)

Now I can do what I want, but only at substantially increased effort.

Yet another alternative is to go back to the previous signature:

void fillWithData(ICollection c)

and use downcasts or a type test to see what I'm dealing with. But the
code still is different for each collection because the four
collection types use different operation names for what, conceptually,
is the same operation: Add() for List<T>, AddLast() for LinkedList<T>,
Enqueue() for Queue<T>, and Push() for Stack<T>.

So, no matter how hard I try, I can't treat these collections
generically.

What I'm really looking for is something like C++ STL's push_back()
function, which can add elements to any collection, no matter what its
type.

Cheers,

Michi.
 
I thought Anthony Moore's comment from the second link made it pretty
clear

"This apparent incongruity was intentional and is the result of a
difference between ICollection and ICollection<T>. ICollection was
essentially just IEnumerable with a count, and as such did not add
much value to IEnumerable. The heirarchy was re-designed for the
generic namespace, so ICollection<T> represents a mutable collection
with Add and Remove methods. As such Queue does not comply with this
extended interface because it does not support unconstrained addition
and removal."

Hmmm... Queue<T> certainly coold support Add() and Remove(). For
Stack<T>, Remove() is problematic. It would probably have been better
to have a Remove() that is the inverse off Add(). In other words, if
Add() adds an element "to the end" of the collection, Remove() should
remove the element "at the end" of the collection. This would make
Add() and Remove() suitable for any collection, and make them
analogous to C++ STL push_back and pop_back() functions.

This makes me wonder about the design of ICollection<T> in general
though. I mean, what's the point of having a Remove() operation that
removes a specified element when the run-time cost is O(n) for many
collection types? This strikes me as singularly inappropriate for a
generic collection interface. In fact, efficient Remove() is
achievable only for hash tables and trees, as far as I can see. It
certainly won't be efficient for arrays, linked lists, queues, deques,
or priority queues.

A bit of a mis-design here, maybe?

What seems to be a big problem in the .NET collection classes is that
there is no concept of separating container, algorithm, and iterator.
For example, with a hash table or sorted hash table, I cannot remove
an entry *and* get at the value that was removed. Instead, I have to
first do a lookup. and then do a remove. With an iterator concept, the
lookup could return an iterator that I can pass to the remove as hint,
so the implementation can efficiently locate.

But I digress--the lack of a generic Add() operation for all the
collection types still strikes me as strange. It cetinaly makes it
quite a bit harder to work with collections generically.

Cheers,

Michi.
 
Michi said:
ICollection incurs the cost of boxing and unboxing, whereas
ICollection<T> does not.

Sorry, I'm still confused. As near as I can tell (from the rest of your
reply) you can't even use ICollection, since it's a read-only interface.
If I iterate over a collection of ints (or
other value types), that's significant.

Your example was of adding data to the collection, which ICollection
doesn't support. Are you actually also using the enumerator for the
collection to iterate through the elements? And if so, why not just use
the IEnumerable<> generic interface there? It's implemented by all four
generic classes you're talking about.

Assuming you have a very good reason for using ICollection instead of
IEnumerable<> I can see how the boxing would come into play there, but
it didn't seem to me that you were doing that.

In any case, you should measure the cost of boxing before you worry
about it. In particular, I suspect that unless _all_ you're doing is
copying data around, the boxing is a negligible part of the algorithm.
And note that by "copying data around" I don't mean across a network or
to/from a disk. Those activities have a significant i/o overhead that I
would expect to be at least an order of magnitude greater than the cost
of copying boxed data within memory and so in those cases the boxing
shouldn't be an issue either.

Have you actually measured the performance difference? Does using
ICollection for iterating really cause a significant slowdown in your
algorithm?

[...]
So, I'm forced to write four overloads:

void fillWithData(List<T> c)
void fillWithData(LinkedList<T> c)
void fillWithData(Queue<T> c)
void fillWithData(Stack<T> c)

Now I can do what I want, but only at substantially increased effort.

I'm not really convinced that the increased effort is substantial. But
I guess that's in the eye of the beholder. At the very least, you only
need three overloads, because you can handle List<> and LinkedList<>
with an ICollection<>. Just because you need the explicit overload for
Stack<> and Queue<>, that's not to say you need to use it for everything
else.

So there's a 25% reduction in your substantially increased effort right
there. No need to thank me. :)

Personally, I don't see how writing separate overloads is all that much
extra effort. Any code that is the same between the overloads you
should be able to share in a common method somewhere, and of course the
code that the framework requires to be different will have to be
different, but that should amount to one line (the line that actually
adds an item to the collection).

Heck, for that matter, the implementation of the Stack<> and Queue<>
versions could easily just create a temporary List<> instance, use the
base ICollection<> implementation you use for actual List<> and
LinkedList<> instances, and then copy the results into the Stack<> or
Queue<> instance using the push/enqueue methods. Yes, there's a
doubling of effort because of the extra copy but, again, assuming this
data is coming from somewhere (like a disk or network) that overhead
should be tiny compared to the bandwidth of the i/o source.

If that overhead is unacceptable to you, see above for an alternative
that doesn't have that overhead.

But, assuming that the effort seems too substantial, you always have the
option of implementing your own generic queue and stack classes that do
support ICollection<>. Just because the built-in classes don't do what
you want, that doesn't mean you're stuck with them.

Pete
 
Sorry, I'm still confused. As near as I can tell (from the rest of your
reply) you can't even use ICollection, since it's a read-only interface.

The example I quoted is about unmarshaling. However, I also have the
opposite, where I need to marshal the containers and serialize them
onto the wire. In this case, I have to iterate over the containers to
pull out the elements.
Your example was of adding data to the collection, which ICollection
doesn't support. Are you actually also using the enumerator for the
collection to iterate through the elements?
Yes.

And if so, why not just use
the IEnumerable<> generic interface there? It's implemented by all four
generic classes you're talking about.

IEnumerable<T> works for getting at the elements for all four
containers, yes. However, when I marshal a collection onto the wire, I
have to write the number of elements, followed by the actual elements.
If I pass the collections as IEnumreable<T>, I can no longer get at
the count of elements, because that count is part of ICollection<T>,
not IEnumerable said:
In any case, you should measure the cost of boxing before you worry
about it. In particular, I suspect that unless _all_ you're doing is
copying data around, the boxing is a negligible part of the algorithm.
And note that by "copying data around" I don't mean across a network or
to/from a disk. Those activities have a significant i/o overhead that I
would expect to be at least an order of magnitude greater than the cost
of copying boxed data within memory and so in those cases the boxing
shouldn't be an issue either.

Your expectation is incorrect. I have measured this by marshaling in
one process, sending the data via TCP/IP to another process, and
unmarshaling it there. Boxing makes transmission and receipt of
sequences of 50,000 ints slower by a factor of 5.5.
I'm not really convinced that the increased effort is substantial.

I don't need to do this only for sequences of int, but also for
sequences of bool, byte, string, etc. I have ten types overall. So,
instead of having to write ten methods, I have to write forty. I
believe that writing forty methods instead of ten is substantial extra
effort, even if the method implementations are trivial.
I guess that's in the eye of the beholder. At the very least, you only
need three overloads, because you can handle List<> and LinkedList<>
with an ICollection<>. Just because you need the explicit overload for
Stack<> and Queue<>, that's not to say you need to use it for everything
else.

So there's a 25% reduction in your substantially increased effort right
there. No need to thank me. :)

There are other reasons why I need to distinguish between List<> and
LinkedList<>, so I still need the overload.

But that's beside the point: the core problem is that we have four
containers in the framework, but two of those containers cannot be
filled generically. I see no good reason for this limitation.
Heck, for that matter, the implementation of the Stack<> and Queue<>
versions could easily just create a temporary List<> instance, use the
base ICollection<> implementation you use for actual List<> and
LinkedList<> instances, and then copy the results into the Stack<> or
Queue<> instance using the push/enqueue methods. Yes, there's a
doubling of effort because of the extra copy but, again, assuming this
data is coming from somewhere (like a disk or network) that overhead
should be tiny compared to the bandwidth of the i/o source.

Again, your assumption is incorrect. I have tried this and benchmarked
it, and it is slower by something like a factor of two, depending on
the element type. The code I'm writing is performance critical, so
this matters.
If that overhead is unacceptable to you, see above for an alternative
that doesn't have that overhead.

I'm not sure what alternative you are referring to.
But, assuming that the effort seems too substantial, you always have the
option of implementing your own generic queue and stack classes that do
support ICollection<>. Just because the built-in classes don't do what
you want, that doesn't mean you're stuck with them.

I don't see the sense of that argument. There are queue and stack
implementations in the framework. They should be usable, but are not.
Writing my own is no good because the code I'm talking about is part
of a library that application code links with. It is that application
code that wants to use stacks and queues and pass them to other
libraries, that expect them to be the .NET versions, not some
homegrown version. If I do what you suggest, the application would
have to copy between the homegrown versions and the .NET versions as
it passes these containers around. Doing this is unacceptable, both
because it is messy in the code because it incurs an unacceptable
performance penalty.

Cheers,

Michi.
 
Michi said:
[...]
And if so, why not just use
the IEnumerable<> generic interface there? It's implemented by all four
generic classes you're talking about.

IEnumerable<T> works for getting at the elements for all four
containers, yes. However, when I marshal a collection onto the wire, I
have to write the number of elements, followed by the actual elements.
If I pass the collections as IEnumreable<T>, I can no longer get at
the count of elements, because that count is part of ICollection<T>,
not IEnumerable<T>.

Surely the caller could just pass the count as a parameter. I mean, you
also could obviously enumerate the collection twice, but I don't think
that should be necessary.
Your expectation is incorrect. I have measured this by marshaling in
one process, sending the data via TCP/IP to another process, and
unmarshaling it there. Boxing makes transmission and receipt of
sequences of 50,000 ints slower by a factor of 5.5.

Could you post a concise-but-complete example of code that demonstrates
this difference?

That difference is, obviously, far larger than I would have expected.
It's large enough for me to wonder if there's not something else going
on. But it's hard to comment without seeing the code.
I don't need to do this only for sequences of int, but also for
sequences of bool, byte, string, etc. I have ten types overall. So,
instead of having to write ten methods, I have to write forty. I
believe that writing forty methods instead of ten is substantial extra
effort, even if the method implementations are trivial.

So, your overloads are not actually generic, even though that's what you
posted above? Okay.

I submit that in the time it's taken you to discuss the shortcomings of
ICollection versus ICollection<>, you could have had all forty methods
written.

Any of the "interesting" code should be shared, and so should only have
to be written ten times (once for each non-generic type you have to deal
with). The "uninteresting" code that has to be written four times
should be able to simply be copied and pasted, replacing each
non-generic type as necessary in each group of copied methods.

Only the "uninteresting" part is overhead anyway (you'd have to write
the ten versions regardless), and even estimating conservatively, it
shouldn't take more than ten minutes to do the whole thing (one minute
per copied-and-pasted block of code, including the necessary
search-and-replace operation).
There are other reasons why I need to distinguish between List<> and
LinkedList<>, so I still need the overload.

Now I'm really confused. If you need the overload anyway, then in what
way is the difference between ICollection and ICollection<> relevant _at
all_?

I mean, I see the theoretical difference. But it apparently doesn't
apply to you anyway. Even if ICollection<> was implemented by Stack<>
and Queue<> you'd have three different methods anyway (assuming that
just as List<> and LinkedList<> need to be handled separately, they each
would need to be separate from Stack said:
But that's beside the point: the core problem is that we have four
containers in the framework, but two of those containers cannot be
filled generically. I see no good reason for this limitation.

I'm sorry that even after the explanations you still cannot understand
the justification. Suffice to say, opinions obviously vary and in this
case yours is in the minority.
Again, your assumption is incorrect. I have tried this and benchmarked
it, and it is slower by something like a factor of two, depending on
the element type. The code I'm writing is performance critical, so
this matters.

And again, I don't understand. For copying a Stack<> or Queue<> to a
List<> and then processing it that way to double the time it takes to
process, _all_ of the processing time would have to be spent managing
the structure. And that's assuming that you are iterating through the
final List<> without doing anything more than the copy of each element
that is required to create the List<> (typically code does something
more interesting than just copying the element once).

There's no way that CPU and memory bandwidth is anywhere near as costly
time-wise as network or disk i/o. So again, I'm left wanting a
concise-but-complete example of code that demonstrates this difference.
I'm not sure what alternative you are referring to.


I don't see the sense of that argument.

Why not? Sometimes, one finds that the framework-supplied classes just
don't do what you want. In those cases, you generally either extend the
class (by inheritance in .NET 2.0) or write your own. That's just a
fact of life.
There are queue and stack
implementations in the framework. They should be usable, but are not.

I don't think it's helpful to say that the implementations aren't
usable. Plenty of people use them all the time. Your own use of them
is a relatively esoteric issue, and one that most people don't run into.
Writing my own is no good because the code I'm talking about is part
of a library that application code links with. It is that application
code that wants to use stacks and queues and pass them to other
libraries, that expect them to be the .NET versions, not some
homegrown version. If I do what you suggest, the application would
have to copy between the homegrown versions and the .NET versions as
it passes these containers around. Doing this is unacceptable, both
because it is messy in the code because it incurs an unacceptable
performance penalty.

I understand "messy", at least to the extent that you might think that a
simple search-and-replace of the words "Stack<" and "Queue<" with
"MyStack<" and "MyQueue<" is messy (personally, I don't think that's a
terrible requirement, but whatever).

I certainly don't understand your claim of "an unacceptable performance
penalty". The whole point would be to avoid the performance penalty you
are complaining about with respect to ICollection. There's no reason
that your own stack or queue implementation should be any slower than
the .NET implementations.

For reading from the collections, it seems to me that IEnumerable<> is a
fine solution, and completely avoids whatever performance issues using
ICollection might introduce (I'm skeptical that ICollection itself could
be responsible for such a significant difference, but that's between you
and your profiler I guess...without example code, it's impossible to
discuss the specifics).

For writing to the collections, ICollection won't work anyway, so any
performance issue is irrelevant. You can either implement a stack and
queue yourself, or you can just spend the ten minutes writing the
overloads you need.

Either way, I just don't see the big deal. It's certainly not enough of
a problem to go around saying that the Stack<> and Queue<> classes are
simply useless. You may not like the implementations, but it's just not
productive to go projecting your own frustrations onto everyone else.
Not everyone has the same difficulties you're having.

Pete
 
Surely the caller could just pass the count as a parameter. I mean, you
also could obviously enumerate the collection twice, but I don't think
that should be necessary.

Well, what's nicer?

prx.func(mySequence);

or

prx.func(mySequence, mySequence.Count);

Not exactly ergonomic, and also inconsistent with how we do this in
Java, C++, Python, and Ruby.
Could you post a concise-but-complete example of code that demonstrates
this difference?

No, not really. This is in the context of a complete client and server
written with Ice. I can assure you though that the measurements are
accurate and conscienscous. If you like to find out for yourself, grab
a copy of Ice from www.zeroc.com, and create a client and server to
benchmark. (The throughput demo is a good starting point.) The modify
the demo to use generics instead by editing the generated code.
That difference is, obviously, far larger than I would have expected.
It's large enough for me to wonder if there's not something else going
on. But it's hard to comment without seeing the code.

I can assure you that there is nothing else going on. I've taken these
measuremeants with arrays, collections that are derived from
CollectionBase, and generics.
I submit that in the time it's taken you to discuss the shortcomings of
ICollection versus ICollection<>, you could have had all forty methods
written.

I am not here to argue. I noticed a deficiency in the collection
classes and pointed it out. What I'm trying to achieve can be done
with ease in both Java and C++, but it can't be achieved with C#. That
tells me that there is something wrong with the collection classes
in .NET, not with what I'm trying to do.
Either way, I just don't see the big deal. It's certainly not enough of
a problem to go around saying that the Stack<> and Queue<> classes are
simply useless.

I don't recall having said that they are useless, only that they don't
behave like their list and linked list counterparts for generic
insertion.
You may not like the implementations, but it's just not
productive to go projecting your own frustrations onto everyone else.
Not everyone has the same difficulties you're having.

Projecting frustrations? I thought this group was about the .NET
Framework? (If I want a psychoanalyst, I already know where to find
one.)

Anyway, I can do what I want with ease in C++ and Java, but I can't do
it in C#. You keep telling me that this is my fault and has nothing to
do with C#. I guess we'll just have to disagree.

Michi.
 
Michi said:
Well, what's nicer?

prx.func(mySequence);

or

prx.func(mySequence, mySequence.Count);

I don't see any significant difference. Other than one works and the
other doesn't. I generally try to write code that works, so there is that.
Not exactly ergonomic, and also inconsistent with how we do this in
Java, C++, Python, and Ruby.

Um, whatever. There's lots of stuff in every language that doesn't have
an exact analog in other languages. I don't see that as a problem.
No, not really.

Well, there's not much else I can offer in the way of advice then.
Suffice to say, I am suspicious of the numbers, even if you did take
care in measuring them. I remain unconvinced that boxing incurs the
penalty you say it incurs, never mind that boxing appears to be a red
herring anyway (since there's no need to use the class that boxes anyway).

I don't doubt the confidence you have in your measurements. But until I
see code that reproduces the result myself, I'm going to suspect a
mistake in the measurement somewhere. A 5x difference is just too
severe to be real.
[...]
I am not here to argue. I noticed a deficiency in the collection
classes and pointed it out. What I'm trying to achieve can be done
with ease in both Java and C++, but it can't be achieved with C#. That
tells me that there is something wrong with the collection classes
in .NET, not with what I'm trying to do.

I can't speak for Java, but C++ isn't a framework. It is not in any way
comparable to .NET, which is what's being discussed here, nor is really
actually as hard as you claim to do what you want using .NET or C#
(depending on the specific approach you want to take).
I don't recall having said that they are useless, only that they don't
behave like their list and linked list counterparts for generic
insertion.

You wrote (and I quoted, though you snipped it out of your reply):
There are queue and stack implementations in the framework. They should be usable, but are not.

How can something not be usable, and yet also not be useless?

It seems to me that you definitely did write that the classes were useless.
Projecting frustrations? I thought this group was about the .NET
Framework?

It is. But your comments aren't really so much about .NET Framework so
much as they are about how much you don't like .NET Framework and how
much you think everyone else shouldn't like it either.

If you had written simply that .NET Framework isn't meeting your needs,
fine. But you're making absolute statements about things that you feel
are flaws in .NET Framework. You are projecting.
[...]
Anyway, I can do what I want with ease in C++ and Java, but I can't do
it in C#. You keep telling me that this is my fault and has nothing to
do with C#. I guess we'll just have to disagree.

I suppose. I disagree with your characterization of me having placed
blame somewhere, but I admit that nothing I can write about .NET appears
to have any effect on influencing your opinion.

Perhaps I should have taken your initial hints of being prejudiced (see
"gratuitous") to heart and ignored the thread altogether.

I have pointed out a number of alternatives, none of which are really
all that hard. If you don't find them suitable, then I guess my attempt
to assist was pointless. I'm sorry for having wasted your time.

Pete
 
I have pointed out a number of alternatives, none of which are really
all that hard. If you don't find them suitable, then I guess my attempt
to assist was pointless. I'm sorry for having wasted your time.

FWIW, in a private communication, a couple of folks from the C# team
did confirm that Michi's gripe is legitimate and the design of
ICollection and ICollection<> leaves a lot to be desired. In fact
someone has even filed a bug about a variation of the same problem
with Microsoft and has received sympathetic response from them (but
they are unable to do anything because its constituting a breaking
change at this stage).

I have known Michi for a while -- he has enormous respect in the
middleware community. You could Google his name just for kicks (or in
comp.object.corba if you want). So if he says boxing incurs that kind
of penalty he sure as hell knows what he is talking about.
 
Dilip said:
FWIW, in a private communication, a couple of folks from the C# team
did confirm that Michi's gripe is legitimate and the design of
ICollection and ICollection<> leaves a lot to be desired.

I never said the design was perfect. I'm saying it doesn't seem to me
to be nearly as intractable as Michi claims, nor do I find the issues
being complained about to be nearly as arbitrary as the tone in the
thread implies.
[...]
I have known Michi for a while -- he has enormous respect in the
middleware community. You could Google his name just for kicks (or in
comp.object.corba if you want). So if he says boxing incurs that kind
of penalty he sure as hell knows what he is talking about.

Whatever. I don't code by reputation.

If there's really the kind of over-the-wire slowdown claimed, it should
be simple enough to reproduce. If no such code is forthcoming, I
reserve my right to be skeptical, regardless of how highly regarded a
particular individual is.

I'm happy to be shown. I'm not happy to have someone just say "trust
me, it's as bad as I say it is".

Pete
 
I never said the design was perfect. I'm saying it doesn't seem to me
to be nearly as intractable as Michi claims, nor do I find the issues
being complained about to be nearly as arbitrary as the tone in the
thread implies.

I have no problems with that. It just means you are entitled to your
opinion the same way he is entitled to his.
Whatever. I don't code by reputation.

You don't "code" by reputation? I don't know what that means.
If there's really the kind of over-the-wire slowdown claimed, it should
be simple enough to reproduce.

Why? Maybe that kind of slowdown is intertwined inside that enormous
code base that is not easily extractable. You don't have any
experience with the Ice code base nor do you have any idea how
complicated it can be. What makes you think its "simple" enough to
take it out and reproduce? If Don Box says that some obscure corner
of the WCF plumbing doesn't perform as expected would you ask him to
either produce the code and prove it or else to keep his mouth shut?
He architected it for Christ's sake -- he should know!

In any case you were already told how you can reproduce the problem
yourself. If you don't want to do it, then that's your problem.
People have better things to do with their time than establish their
honesty to you.
If no such code is forthcoming, I
reserve my right to be skeptical, regardless of how highly regarded a
particular individual is.

If someone were posting with the deliberate intention of creating FUD
you would be right. Which is why reputation precedes an individual.
If Joe Average Programmer happens across this newsgroup and starts
claiming .NET doesn't make coffee for him you are entitled to be
skeptical. If someone as reputed as Michi Henning claims that there
is a penalty incurred in *his* product that *he* created from the
ground up, then questioning that is simply being argumentative for the
sake of it.
 
Dilip said:
I have no problems with that. It just means you are entitled to your
opinion the same way he is entitled to his.

As long as opinions are stated as such, that's fine. Michi has not
qualified his comments as opinion.
You don't "code" by reputation? I don't know what that means.

I mean that I don't see the point in bringing up someone's reputation as
a way justifying some technical point. Technical points, by definition,
can be justified in an objective way without needing to consider
someone's reputation.
Why? Maybe that kind of slowdown is intertwined inside that enormous
code base that is not easily extractable.

If that's true, then the slowdown is not due solely to the boxing.

Conversely, if boxing is truly responsible for the slowdown, it would be
reproducible without all the other stuff.
You don't have any
experience with the Ice code base nor do you have any idea how
complicated it can be.

It doesn't matter. If using Ice is required to reproduce the problem,
it is hardly reasonable to blame the entire performance issue on .NET's
boxing of value types.
[...]
In any case you were already told how you can reproduce the problem
yourself. If you don't want to do it, then that's your problem.

Actually, it's not my problem. I'm not the one who is having trouble
People have better things to do with their time than establish their
honesty to you.

Who said anything about honesty? I'm not questioning anyone's honesty,
and it's insulting and misleading for you to accuse me of doing so.
[...]
If someone as reputed as Michi Henning claims that there
is a penalty incurred in *his* product that *he* created from the
ground up, then questioning that is simply being argumentative for the
sake of it.

I have never said that I doubt he has noticed a penalty. What I said I
doubt is that the penalty can be blamed on boxing.

As far as being "argumentative for the sake of it" goes, well...I find
both Michi's and your own behavior to fall squarely into that category.
You don't even have a horse in this race, and Michi continues to
discount perfectly reasonable alternatives suggested to him (including
the most obvious one, pointing out that simply writing all of the
overloads he needs is NOT really all that much work).

Pete
 
Dilip wrote:
You don't even have a horse in this race,

That much is certainly true. I guess my objective was to just inform
you that I did converse with a few Microsoft folks and they did admit
that for Michi's particular use-case, the way ICollection and friends
are designed today, solutions _are_ a bit awkward. Thats not to say a
workaround is impossible, I am just saying there is no straightforward
way to do it when other languages seem to not suffer from this
problem.

I should have stopped there. But I guess I got carried away. Michi
certainly doesn't need _me_ to defend himself. He is quite capable of
running circles around me!
 
Back
Top