What interface for "ISet"

  • Thread starter Thread starter Helge Jensen
  • Start date Start date
H

Helge Jensen

I've got some data that has Set structure, that is membership, insert
and delete is fast (O(1), hashing). I can't find a System.Collections
interface that matches the operations naturally offered by Sets.

- ICollection cannot decide containment
- IList promises indexability by the natural numbers, which is not
achievable (since i hash elements, not sort them).
- IDictionary is definatly not setlike.

Although I can, of course, define my own "ISet" is there really no
".NET" interface that matches the Set datastructure?

interface ISet: ICollection {
void Add(Object);
void Remove(Object);
bool Contains(Object);
}

If i define my own ISet, I will introduce all the problems of
cooperation with other libraries, which will use other names for the Set
interface.

Any ideas for nice solutions to this problem?
 
I guess I'm not seeing the problem with deriving from IList or, at least,
creating a Set object that simply contains an IList object and doesn't
expose its interfaces.
If i define my own ISet, I will introduce all the problems of
cooperation with other libraries, which will use other names for the Set
interface.

Is there any reason that you believe that your namespace wouldn't take care
of any "problems" with cooperation with other libraries?


--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
 
Nick said:
I guess I'm not seeing the problem with deriving from IList or, at least,
creating a Set object that simply contains an IList object and doesn't
expose its interfaces.

The basic property of IList is indexability by the natural numbers,
0,...,list.Count.

I am doing an efficient representation of networks, insertion, deletion
and membership must be decidable very efficiently. Equality can be
rather expensive to decide for the nodes, but a good hasing function exists.

The best approach (based on the actual data in the network) is to use
hashing, which gives me (amortized worst-case) O(1) operation on all
these operations, where at most 1-2 equals operations would get done.

Using a sorted list would yield (worst-case) O(logN)-operation for
membership and (worst-case) O(n) insert and delete operations (moving
all the other elements).

While the sorted-list solution would provide me with a data-structure to
fit an existsing System.Collections interface, it would be a rather bad
move for performance... especially since I don't have any real use for
the indexing, only for containment, insert and delete.

So, the underlying data-structure can only implement the indexing
operation (this[int i] { get; }) by iterating all elements, counting as
it goes along. It would be a very bad idea for me to give users the
impression that my data-structure should be used as a IList.
Is there any reason that you believe that your namespace wouldn't take care
of any "problems" with cooperation with other libraries?

Cooperation problems would not stem from clashing of names.

In C# (and JAVA, python, C++, ...) there are interface definition for
the most common datastructures.

This establishes a uniformity of interface for differing libraries: I
can pass an IDictionary from one independant library to another. So
datastructures with the same properties would be useable across libraries.

It becomes natural for libraries to choose the language-supplied
interface for communicating datastructures that matches these interfaces
(IDictionary, IList, Enumerator in C# -- Set, Map, Iterator ... in JAVA
-- std::set<T>, std::map<T,R>, std::iterator_traits<T>::iterator in C++).

The (very basic) set datastructure, whose operations are: contains,
insert, remove and iteration, have language-standard-library presence in
most other languages than C#.

I was (in my naivity) hoping, that there was some .NET data-structure
interface corresponding to the set data-structure properties somewhere
in .NET, and that I had just not seen it. Alternately, perhaps someone
has a hint as to the best way to choose to present my set to .NET users
not nessesarily having knowledge of the explicit representation of set's
in my library.

For now, I have chosen to define ISet: ICollection in my own namespace,
but having it convertible to IList.

So:
- those having knowledge of the ISet interface can use the
Contains,Insert and Delete opreations.
- those not knowing the ISet can either of:
- use it as ICollection
- cast it to a (very badly performing) IList

Of course, any user can always "copy-contruct" an IList from the
ICollection interface of my ISet if they wish an indexable copy of the
ISet, using (for example) new ArrayList(set); But i wouldn't want to do
that with the multi-million node networks :)
 
Is there some reason you don't use a HashTable collection, instead of basic
IList?


--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
Helge Jensen said:
Nick said:
I guess I'm not seeing the problem with deriving from IList or, at least,
creating a Set object that simply contains an IList object and doesn't
expose its interfaces.

The basic property of IList is indexability by the natural numbers,
0,...,list.Count.

I am doing an efficient representation of networks, insertion, deletion
and membership must be decidable very efficiently. Equality can be
rather expensive to decide for the nodes, but a good hasing function exists.

The best approach (based on the actual data in the network) is to use
hashing, which gives me (amortized worst-case) O(1) operation on all
these operations, where at most 1-2 equals operations would get done.

Using a sorted list would yield (worst-case) O(logN)-operation for
membership and (worst-case) O(n) insert and delete operations (moving
all the other elements).

While the sorted-list solution would provide me with a data-structure to
fit an existsing System.Collections interface, it would be a rather bad
move for performance... especially since I don't have any real use for
the indexing, only for containment, insert and delete.

So, the underlying data-structure can only implement the indexing
operation (this[int i] { get; }) by iterating all elements, counting as
it goes along. It would be a very bad idea for me to give users the
impression that my data-structure should be used as a IList.
Is there any reason that you believe that your namespace wouldn't take care
of any "problems" with cooperation with other libraries?

Cooperation problems would not stem from clashing of names.

In C# (and JAVA, python, C++, ...) there are interface definition for
the most common datastructures.

This establishes a uniformity of interface for differing libraries: I
can pass an IDictionary from one independant library to another. So
datastructures with the same properties would be useable across libraries.

It becomes natural for libraries to choose the language-supplied
interface for communicating datastructures that matches these interfaces
(IDictionary, IList, Enumerator in C# -- Set, Map, Iterator ... in JAVA
-- std::set<T>, std::map<T,R>, std::iterator_traits<T>::iterator in C++).

The (very basic) set datastructure, whose operations are: contains,
insert, remove and iteration, have language-standard-library presence in
most other languages than C#.

I was (in my naivity) hoping, that there was some .NET data-structure
interface corresponding to the set data-structure properties somewhere
in .NET, and that I had just not seen it. Alternately, perhaps someone
has a hint as to the best way to choose to present my set to .NET users
not nessesarily having knowledge of the explicit representation of set's
in my library.

For now, I have chosen to define ISet: ICollection in my own namespace,
but having it convertible to IList.

So:
- those having knowledge of the ISet interface can use the
Contains,Insert and Delete opreations.
- those not knowing the ISet can either of:
- use it as ICollection
- cast it to a (very badly performing) IList

Of course, any user can always "copy-contruct" an IList from the
ICollection interface of my ISet if they wish an indexable copy of the
ISet, using (for example) new ArrayList(set); But i wouldn't want to do
that with the multi-million node networks :)
 
Helge,

I mean you have a long reply about why you are not using the SortedList and
do not mention that it is a dictionary, and now that Nick propose the
hashtable you tell that it is not right because it is a dictionary.

For me that did look inconsequent, that is my only contribution.

Cor
 
Helge said:
Although I can, of course, define my own "ISet" is there really no
".NET" interface that matches the Set datastructure?
If you define a set abstraction as a list-collection that contains no
duplicate elements then the answer is no, the System.Collections
namespace does not have such a class. However you don not need a new
ISet interface to define the contract for a set. The ICollection
interface defines the members a set abstraction must implement.
You can easily implement a set class by leveraging the ArrayList class
as the internal datastructure. Below is a simple implementation of a set
I developed some time ago:

using System;
using System.Collections;

namespace Objectware.Collections {
public class Set : ICollection {
private ArrayList innerList=new ArrayList();
public Set() {}

public void CopyTo(Array array) {
CopyTo(array, 0);
}

public void CopyTo(Array array, int index) {
innerList.CopyTo(array, index);
}

public int Count {
get { return innerList.Count; }
}

public object SyncRoot {
get { return innerList.SyncRoot; }
}

public bool IsSynchronized {
get { return innerList.IsSynchronized; }
}

public int Add(object value) {
if (!Contains(value)) {
return innerList.Add(value);
} else return -1;
}
public void AddAll(ICollection collection) {
foreach (object item in collection) {
Add(item);
}
}
public void Remove(object value) {
innerList.Remove(value);
}
public void RemovedAll(ICollection collection)
{
foreach (object item in collection)
{
Remove(item);
}
}
public bool RetainAll(ICollection collection) {
ArrayList newList=new ArrayList(innerList);
foreach (object item in collection) {
newList.Remove(item);
}
int oldCount=innerList.Count;
innerList=newList;
return oldCount!=innerList.Count;
}

public bool Contains(object value) {
return innerList.Contains(value);
}
public bool ContainsAll(ICollection collection) {
foreach (object item in collection) {
if (!Contains(item)) return false;
}
return true;
}

public void Clear() {
innerList.Clear();
}

public IEnumerator GetEnumerator() {
return innerList.GetEnumerator();
}
}
}

Anders Norås
http://dotnetjunkies.com/weblog/anoras/
 
Cor said:

Thanks for your reply.
I mean you have a long reply about why you are not using the SortedList and
do not mention that it is a dictionary, and now that Nick propose the
hashtable you tell that it is not right because it is a dictionary.

Maybe, I haven't been clear enough on this, since all 3 who have been
kind enough to answer have (I think) misunderstood me.

I have a datastructure, implemented and all... I am *not* trying to
select a .NET System.Collections *implementation* for my data-structure.
An implementation exists, which satisfies my needs (written by me).

It runs very fast and have the following properties:

- It supports: Contain(item), Add(item), Remove(item), Count and
GetEnumerator()
- Contains, Add, and Remove are O(1) operations
- Enumeration has no specific order

Now, back in my very first computer-science-class, data-strutures with
these properties had a name: "Set". This is a standard computer-science
term for data-structures with the properties above (well, not
nessesarily O(1)-time-bounds on Add, Remove and Contains, but
"fast"~=O(logN)).

I was wondering how to make this new data-struture most easily usable
from other parts of .NET.

Since a Set is a standard computer-science description of datastrutures
with the properties above, I expected there to be a ISet in
System.Collections which I could declare the preferred way to formulate
sets in C#/.NET. Just like there is an IList, that decribes the standard
"List" behaviour and an IDictionary, that describes the standard
dictionary, or map datastruture. But no such luck.

Now, my questions were(are):

1) Have I overlooked some interface, stragely named, but covering the
Set properties?
- I don't think so at this point :)

2) Why is there no such interface?
- It is a commonly known data-struture
- It is damned usefull at times (explains why it's commonly known :)
- A parallel exists in most other languages standard-library: C++
(std::set<T>, JAVA: java.util.Set, ...)

3) What is the best thing that I can do, since no such interface exists?
- Obviously, my datastructure fullfills the ICollection interface
- None of the other interfaces seems to fit
- Maybe someone out there had just the same problem as me, but had
just found a nice way to make it kind'a-work

BTW: With regard to the name "SortedList", that name is a bit
misleading, since it does not expose List behaviour at all, but
dictionary behaviour (I realize that it is named SortedList because the
implementation uses a sorted list as an implementation), but is an
ArrayList implementing IList via a list of arrays?...
 
Helge,

I always try too avoid academic discussions about used words and now I am in
that.

Back in my computer history the Set was I thought used to put a memory value
in an accumulator register (better any register). (Could be as well visa
versa). So meaning of words have changed for me in time.

With this not giving any opinion about your messages before you think that.
At a certain moment I started avoiding this kind of discussions, I became to
pragmatic for that.

However maybe Nick can tell you more about your message.

Cor
 
Hello Helge,

I could have replied to your reply, but I wanted to reply to the long post
so that I can refer to your text.

Let me first restate the problem so that you can tell me if you and I are on
the same page. You have implemented a Collection type that you have called
Set. You recognize that the methods and behavior of this class are typical
to the mathematical concept of a Set, where it is important not only to add
and delete members but also to determine membership. You did not find a
Collection object that performed this function for you, and you wanted to
know (a) if it were there, how could you have missed it? (b) why isn't it
there, since a Set is fundamental to so many mathematical concepts? and (c)
is there a simple workaround that you missed?

Did I capture your questions correctly?

I suggest that you could use a Hashtable and your reply was that your
requirement is not for a Dictionary. You also suggest that users of your
collection type may use the underlying properties to move around, which
tells me that you are intending to create, in effect, a component that, in
your view, is useful to a broad array of people whom you are unlikely to
meet. While I don't know where this may be, that consideration adds a few
constraints to your implementation, because you want to make sure that all
exposed interfaces make sense.

OK... so I looked again at your text. You said that you want a collection
with three abilities: add, delete, and member, where the member method has
to run quickly.
I am doing an efficient representation of networks, insertion, deletion
and membership must be decidable very efficiently. Equality can be
rather expensive to decide for the nodes, but a good hasing function
exists.

From this, I gather that you expect and plan to use a hashing function not
only to determine membership but also to determine equality. E.G. The hash
value is the objects identity. Let me know if I am off here. Therefore,
two objects are equal if their hash value is equal.

Now, who determines the hash value of an object. Let's look at an
inheritance tree, where you have a basic INode interface. From INode, you
inherit various network members (I'm guessing: computer, resource, person,
etc... like a directory). So you will have various different items that you
want to store in your Set collection. While I do not know your hash
function, in general, I would consider it a good practice to define the
interface for the hash function at INode (or above) and to implement the
hash function in the object itself.

Therefore, each object needs to provide its own hash function. So, not only
do you need to have a generic Set collection, but you need to have a generic
GetHashCode method defined in the inheritance tree that your "hashable
objects" can use. Also, Hash functions do not, by their nature, guarantee
that the hash value is absolutely unique. Their promise is that the value
is evenly distributed across a wide spectrum of values, and that two objects
that are the same will produce the same hash function. However, two objects
that are different could, potentially, produce the same hash function.
Therefore, you also need to override the Equal() method of your hashable
objects, so that your generic collection has a good way to compare items.

Let's look again at the criteria: you need a collection that can support
add, delete, membership, generate-hash, and equality. Let me turn that
around: ANY collection that can efficiently support add, delete, membership,
and can appropriatly delegate the generate-hash and equality methods can be
considered to be a valid and useful collection for Set.

Look at the second sentence carefully in the above paragraph. This is the
key to understanding why you didn't find the Set collection when you were
looking right at it. The HashTable collection completely fills the criteria
stated in the second collection.

Insert is called HashTable.Add
Delete is called HashTable.Delete
Membership is called HashTable.Contains
Generate-Hash is called Object.GetHashCode
Equality is called Object.Equals

Note: the overridable interface for generate-hash and equality are methods
on a the Object base class. Therefore, you don't have to worry about
multiple inheritance issues when defining your override methods... every
class you create is derived from Object, whether you know it or not. You
should see this topic on how to create your hash function:
http://msdn.microsoft.com/library/en-us/cpref/html/frlrfsystemobjectclassgethashcodetopic.asp

Finally, you complain that you don't have a dictionary and therefore you
cannot use a hashtable collection because it is implemented using the
dictionary collection base class (DictionaryBase). Do you care how the
collection is implemented, as long as the implementation is efficient? The
Dictionary collection IS a hash table. It isn't a sorted list.
Implementing the Dictionary as a sorted list requires a different base class
(HybridList).

Therefore, I assert categorically that .Net did not leave out the Set
collection. It just used a different name. Not only does the collection
exist, but it is highly efficient. You can easily prove this by creating a
simple performance test using your objects. Create a large list of your
objects, and run each collection through its paces: inserting, deleting, and
determining membership. To be fair, the hash function must produce widely
distributed values and both collections must use the same hash function.
Given that criteria, I will put $20 on the table right now that the .Net
HashTable object is just as efficient or more than your collection class.

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
Helge Jensen said:
Nick said:
I guess I'm not seeing the problem with deriving from IList or, at least,
creating a Set object that simply contains an IList object and doesn't
expose its interfaces.

The basic property of IList is indexability by the natural numbers,
0,...,list.Count.

I am doing an efficient representation of networks, insertion, deletion
and membership must be decidable very efficiently. Equality can be
rather expensive to decide for the nodes, but a good hasing function exists.

The best approach (based on the actual data in the network) is to use
hashing, which gives me (amortized worst-case) O(1) operation on all
these operations, where at most 1-2 equals operations would get done.

Using a sorted list would yield (worst-case) O(logN)-operation for
membership and (worst-case) O(n) insert and delete operations (moving
all the other elements).

While the sorted-list solution would provide me with a data-structure to
fit an existsing System.Collections interface, it would be a rather bad
move for performance... especially since I don't have any real use for
the indexing, only for containment, insert and delete.

So, the underlying data-structure can only implement the indexing
operation (this[int i] { get; }) by iterating all elements, counting as
it goes along. It would be a very bad idea for me to give users the
impression that my data-structure should be used as a IList.
Is there any reason that you believe that your namespace wouldn't take care
of any "problems" with cooperation with other libraries?

Cooperation problems would not stem from clashing of names.

In C# (and JAVA, python, C++, ...) there are interface definition for
the most common datastructures.

This establishes a uniformity of interface for differing libraries: I
can pass an IDictionary from one independant library to another. So
datastructures with the same properties would be useable across libraries.

It becomes natural for libraries to choose the language-supplied
interface for communicating datastructures that matches these interfaces
(IDictionary, IList, Enumerator in C# -- Set, Map, Iterator ... in JAVA
-- std::set<T>, std::map<T,R>, std::iterator_traits<T>::iterator in C++).

The (very basic) set datastructure, whose operations are: contains,
insert, remove and iteration, have language-standard-library presence in
most other languages than C#.

I was (in my naivity) hoping, that there was some .NET data-structure
interface corresponding to the set data-structure properties somewhere
in .NET, and that I had just not seen it. Alternately, perhaps someone
has a hint as to the best way to choose to present my set to .NET users
not nessesarily having knowledge of the explicit representation of set's
in my library.

For now, I have chosen to define ISet: ICollection in my own namespace,
but having it convertible to IList.

So:
- those having knowledge of the ISet interface can use the
Contains,Insert and Delete opreations.
- those not knowing the ISet can either of:
- use it as ICollection
- cast it to a (very badly performing) IList

Of course, any user can always "copy-contruct" an IList from the
ICollection interface of my ISet if they wish an indexable copy of the
ISet, using (for example) new ArrayList(set); But i wouldn't want to do
that with the multi-million node networks :)
 
Cor Ligthert said:
I always try too avoid academic discussions about used words and now I am in
that.

Back in my computer history the Set was I thought used to put a memory value
in an accumulator register (better any register). (Could be as well visa
versa). So meaning of words have changed for me in time.

It's not that the meaning has *changed* - it's that the word "set" has
multiple meanings. As a verb, it's used for setting the value of
something. As a noun it's an unordered collection.

(Those are just the computer science type definitions - I believe
"set" is one of the words with the most definitions in most English
language dictionaries.)
 
Jon,

I knew this.

The translation to Dutch from the English "Set" is "Zet" and as well with
mostly the same different meaning.

We also use by instance for a very English habit "theezetten" I think that
is not the same in English, when I would translate it I would say. "Making
tea" while you could probably think on a "tea-set". Of course are not all
the meanings equal there are probably some more in English and some more in
Dutch.

My only part in the message was that because of the reason I mentioned (real
I did that in (far) past) however as well my explanation above where I have
easily the change to make a mistakes about meanings of English words.
Because I don't know that meaning or take easily a meaning from a word in
another language (not only Dutch).

And trying to make it even more clear, who is using as one of the most the
word "DataSet" in this newsgroups and tells what it is, maybe I change that
now and will tell that it is a set of datatables and relations instead of a
collection.

However thanks for the attention.

Cor
 
Nick,

I don't know if you read my answer too Jon, however before you understand me
wrong.

I made and sent that message before I readed yours.

In that message I mention the "DataSet", that has not a relation too your
message (although reading your message quick I saw some relations).

Cor
 
My only part in the message was that because of the reason I mentioned
(real I did that in (far) past) however as well my explanation above where
I have easily the change to make a mistakes about meanings of English
words. Because I don't know that meaning or take easily a meaning from a
word in another language (not only Dutch).
Don't deal in academically discussions when it is about meanings of English
words.

Cor
 
Nick said:
Hello Helge,

I could have replied to your reply, but I wanted to reply to the long post
so that I can refer to your text.

Let me first restate the problem so that you can tell me if you and I are on

Thank you. I replied in length another place in the thread with some
additional information, since I gather that I have not been too clear
the first time: news://msnews.microsoft.com:119/[email protected]

Thank you for the very long and elaborate reply, unfortunatly we seem to
have a slight misunderstanding about what I was asking for. Please
accept my apologies for not formulating that clearly enough.
the same page. You have implemented a Collection type that you have called
Set. You recognize that the methods and behavior of this class are typical
to the mathematical concept of a Set, where it is important not only to add
and delete members but also to determine membership.

Correct. I have implemented a collection-type, with the properties of a
mathematical Set. The actual class is called "HashSet".

You did not find a
Collection object that performed this function for you, and you wanted to
know (a) if it were there, how could you have missed it? (b) why isn't it
there, since a Set is fundamental to so many mathematical concepts? and (c)
is there a simple workaround that you missed?
Did I capture your questions correctly?

Not really, I have the implementation, what I am missing is a suitable
interface that defined the operations of a mathematical set.

Like System.Collections contains an interface called IList to ease
cooperation between components that exposes parts or data-structures
that can be thought of as having list-structure without divulging into
the details of how that list-structure is implemented, I was kind of
hoping there would be an ISet fulfilling the same purpose for the Set
datastructure as IList does for lists, and IDictionary does for
dictionaries/maps.

Different parts of code could then easily pass set-structured data to
each-other without knowing the exact implementation or having to use
adapters between each others definitions of the set-interface.

I realize that not every datastructure in the world can have an
interface in System.Collections, but I was rather surprised sets weren't
included.

The standard-library for most languages (at-least: C++, JAVA, ML,
python) have a definition of (in whatever is the appropriate parallel in
that language to "interface") sets. I'm really not interested in the
implementation, I have that.
I suggest that you could use a Hashtable and your reply was that your
requirement is not for a Dictionary.

I am not looking for an implementation, I have an implementation. That
implementation does not implement a dictionary (it can be thought of as
a function into {true,false}, but it's not *convinient* (or common
practice) to formulate the behaviour of mathematical sets as dictionaries.

OK... so I looked again at your text. You said that you want a collection
with three abilities: add, delete, and member, where the member method has
to run quickly.

I think I said I *had* a collection, where these member methods run quickly.
exists.

From this, I gather that you expect and plan to use a hashing function not
only to determine membership but also to determine equality. E.G. The hash
value is the objects identity. Let me know if I am off here. Therefore,
two objects are equal if their hash value is equal.

No, I use the hashing function to hash the nodes down to "buckets", and
search linerly in the appropriate bucket. If the buckets grow larger
than some amount (currently log^2 N), I rehash over a larger set of buckets.

This is a standard way to implement (near-)constant-time set operations
for hashable members, and probably very similar to the implementation of
Hashtable.
Now, who determines the hash value of an object. Let's look at an
inheritance tree, where you have a basic INode interface. From INode, you
inherit various network members (I'm guessing: computer, resource, person,
etc... like a directory). So you will have various different items that you
want to store in your Set collection. While I do not know your hash
function, in general, I would consider it a good practice to define the
interface for the hash function at INode (or above) and to implement the
hash function in the object itself.

I Use a IHashCodeProvider, passed to the HashSet constructor to
calculate the hash.

The actual hashcode provider implementation uses only information
obtainable via the INode interface, so i do not have a problem with
implementations having to provide proper implementation of GetHashCode()

Using a IHashCodeProvider alos allows me to easily experiment with
different hashing functions on the same objects.
Therefore, each object needs to provide its own hash function. So, not only
do you need to have a generic Set collection, but you need to have a generic
GetHashCode method defined in the inheritance tree that your "hashable
objects" can use. Also, Hash functions do not, by their nature, guarantee
that the hash value is absolutely unique. Their promise is that the value
is evenly distributed across a wide spectrum of values, and that two objects
that are the same will produce the same hash function. However, two objects
that are different could, potentially, produce the same hash function.
Therefore, you also need to override the Equal() method of your hashable
objects, so that your generic collection has a good way to compare items.

I am aware what a hash-function is, this is not the first time I have
implemented high-performance datastructures. It is, however, the first
time I have done it in C#.

I do not need to override the Equals method, because I compare members
using an IComparator, passed in the Hashset constructor. The actual
IComparator implementation used uses only data obtainable via the INode
interface; that exposes a rather lengthy unique ID; thus the expensive
equality coparison.

The is *astoundingly* similar to the way the
System.Collections.Hashtable works ;) except it doesn't map keys to
values, it just decides membership, thereby halving memory-consumption.
Let's look again at the criteria: you need a collection that can support
add, delete, membership, generate-hash, and equality. Let me turn that
around: ANY collection that can efficiently support add, delete, membership,
and can appropriatly delegate the generate-hash and equality methods can be
considered to be a valid and useful collection for Set.

I have such a collection implementation. I just wish there was a name
for the properties it has, instead of a name for the specific
implementation, lists have "IList", dictionaries (or maps, as they are
often called in other languages) have "IDictionary", but apparently sets
didn't cut it to get a System.Collections.ISet...
Look at the second sentence carefully in the above paragraph. This is the
key to understanding why you didn't find the Set collection when you were
looking right at it. The HashTable collection completely fills the criteria
stated in the second collection.

1) Hashtable is not a description of properties, it's an implementation
(of IDictionary)

2) IDictionaries are not (naturally) sets

3) While a mathematical set *can* be thought of as a dictionary from
elements into true/false, that is not a very common thing to do. And
using "just the keys" of a hashtable as a set is a hack; it can at times
be a good hack, but it's not good for making people understand what they
can do: "what do you mean i should say foo.Add(item, null) to insert
into the set?".

4) Of course one can implement a HashSet by using an IDictionary, and
just defining that "all the entries which have keys are members, and you
are not supposed to look at the values they map to". But I would rather
just have:

interface ISet: ICollection {
void Add(Object item);
void Remove(Object item);
bool Contains(Object item);
}

Which formulates the properties of a mathematical set very nicely.

As i said, I have ended up just declaring this interface myself, in my
own namespace, which solves my immedeate itch: to just have to explain
to my users what they can do with the data they have been given, which
is: set-operations, without tying me down to a specific implementation.

It allows me to talk about the properties of the data instead of the
representation of the data.
Insert is called HashTable.Add
Delete is called HashTable.Delete
Membership is called HashTable.Contains

Hashtable is not a set, I cannot new HashTable().Add(item), since a
Hashtable maps keys to values, So it would be .Add(item, null).
Generate-Hash is called Object.GetHashCode
Equality is called Object.Equals

HashSet, (and Hashtable) construtors accept an IHashCodeProvider and
IComparator, so I am not bound to using the objects GetHashCode() and/or
Equals() for hashing/comparison, even *if* I used Hashtable as an
implementation for my sets.
Finally, you complain that you don't have a dictionary and therefore you
cannot use a hashtable collection because it is implemented using the
dictionary collection base class (DictionaryBase). Do you care how the
collection is implemented, as long as the implementation is efficient? The
Dictionary collection IS a hash table. It isn't a sorted list.
Implementing the Dictionary as a sorted list requires a different base class
(HybridList).

I have never said anything about DictionaryBase, or claimed that I was
unable to use the Hashtable for implementation of a set, what I did
claim was that a Hashtable (or more generally: IDictionary) does not fit
the description of a set: a set does not map keys to values, it simply
decides containment.
Therefore, I assert categorically that .Net did not leave out the Set

you better have a catch clause for that assertion :)

A dictionary and a set are not the same. The claim that a set is just a
dictionary where you "only look at the keys" doesn't really help.
collection. It just used a different name. Not only does the collection
exist, but it is highly efficient.

I do not doubt the efficiency of the Hashtable implementation. I known
how hashtables work, and have implemented many different data-structures
for specific applications before: red-black-, AVL-, splay-trees, and
some space-efficient sets and maps for use in code analysis, which
provides runtime-tradeability between memory and cpu utilization, while
still being able to guarantee the asymptotic performance of the
operations on the structures.
You can easily prove this by creating a
simple performance test using your objects. Create a large list of your
objects, and run each collection through its paces: inserting, deleting, and
determining membership. To be fair, the hash function must produce widely
distributed values and both collections must use the same hash function.
Given that criteria, I will put $20 on the table right now that the .Net
HashTable object is just as efficient or more than your collection class.

Out of curiosity, i did a small (totally unscientific) test: HashSet is
my first-draft set implementation, looping 1000 times over inserting and
deleting the ints from 0 to 100000. Of course you could start twiddling
with loadfactors and stuff like that, and probably come up with a
reverse result.

Hashtableset is:

public class HashtableSet: Hashtable, ISet
{
public HashtableSet(int size): base(size) {}
void ISet.Add(Object item) { base.Add(item, null); }
void ISet.Remove(Object item) { base.Remove(item); }
bool ISet.Contains(Object item) { return base.ContainsKey(item); }
}


Both ISet implementations are given an initial capacity argument of
100000/3.

Inserting and removing 100000 elements: 1000 times
graphutil.HashSet: 00:00:28.6111408
graphutil.HashtableSet: 00:00:39.7571680
 
Anders said:
Helge Jensen wrote:
If you define a set abstraction as a list-collection that contains no
duplicate elements then the answer is no, the System.Collections
namespace does not have such a class. However you don not need a new
ISet interface to define the contract for a set. The ICollection
interface defines the members a set abstraction must implement.

Problem is, ICollection is not the set contract. It doesn't give me a
way short of iteration to test for containment. While that may be
alright for small sets and a small number of tests for containment, it's
not really a good-thing, and *much* fatser solutions exists.
You can easily implement a set class by leveraging the ArrayList class
as the internal datastructure. Below is a simple implementation of a set
I developed some time ago:
namespace Objectware.Collections {

This is certainly a Set. However unsorted list implementations are very
inefficient, in effect you have to search the entire list to determine
Contais.

The set operations (Add, Remove, Contains) can be implemented much
faster if you have access to either an ordering (log N), or hashing
(constant time).

Also, you could use specialization instead of composition for the
implementation:

public class ArrayListSet: ArrayList
{
public override int Add(Object item)
{
if ( Contains(item) )
return -1;
else
return base.Add(item);
}
public override void AddRange(ICollection items) {
foreach (Object item in items)
Add(item);
}

You would probably be better off using either SortedList or Hashtable
for it's implementation (as suggested by Nick Malik in another thread to
me), if your elements can be ordered or hashed.

I suspect the below would work for both SortedList and Hashtable
implementations of Set's. It's the GetEnumerator i'm worried about,...
Does does HashtableSet call Hashtable.GetEnumerator?, probably not but
you'd better make a unit-test :)

public class HashtableSet: Hashtable, ISet, IEnumerable
{
public HashtableSet(int size): base(size) {}
void ISet.Add(Object item) { base.Add(item, null); }
void ISet.Remove(Object item) { base.Remove(item); }
bool ISet.Contains(Object item) { return base.ContainsKey(item); }
IEnumerator IEnumerable.GetEnumerator() { return
base.Keys.GetEnumerator(); }
}

Alternately you can do implementation by composition, like you did with
the list.
 
Hello Helge,

I apologize for not understanding your requirements. You didn't need an
efficient implementation of a set. You wanted an interface that was defined
in the framework.

As you probably guessed, I am powerless to modify the framework. (I'm not
sure I'd agree with you about the need to.)

If you'd like to see a new interface added to the framework, send a note to
MSWish and cross your fingers.

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
 
4) Of course one can implement a HashSet by using an IDictionary, and
just defining that "all the entries which have keys are members, and you
are not supposed to look at the values they map to". But I would rather
just have:

interface ISet: ICollection {
void Add(Object item);
void Remove(Object item);
bool Contains(Object item);
}

Out of curiosity, does the v2 ICollection<T> satisfy your requirements?

its current definition(not 100% sure its up to date, the newest beta may
have changed it somewhat):

public interface ICollection<T> : IEnumerable<T>{ // Methods
void Add(T item);
void Clear();
bool Contains(T item);
void CopyTo(T[] array, int arrayIndex);
bool Remove(T item);

// Properties
int Count { get; }
bool IsReadOnly { get; }
}

While I would like to see set semantics put in stone, it seems to me this
generic ICollection suits the needs and a forward looking generic ISet
interface would be little more than a marker.
 
Back
Top