MulticastDelegate.GetHashCode returns unusable hashes in .NET 2.0

Guest

The behavior of the function MulticastDelegate.GetHashCode() has changed from
.NET 1.1 to 2.0.
With 1.1 it returns different hashes if the methods stored in the delegate are
different. This is as expected, because the delegates are completely different
and should also deliver different hashes.
With 2.0 this behavior has changed. Delegates with different methods deliver
the same hashes, so identifying the delegates by this hash will now fail...
...very annoying, because there is no exception or compilation error; the
application only fails when it deals with the hashes.

I'm starting to lose confidence in .NET 2.0, because this is the second bug I
have found in only three days of trying to use my 1.1 project with 2.0. When
will I find the next issue, and will I really find them all before the next
release of my package?
 
Stonesource said:
The behavior of the function MulticastDelegate.GetHashCode() has changed from
.NET 1.1 to 2.0.
With 1.1 it returns different hashes if the methods stored in the delegate are
different. This is as expected, because the delegates are completely different
and should also deliver different hashes.
With 2.0 this behavior has changed. Delegates with different methods deliver
the same hashes, so identifying the delegates by this hash will now fail...
...very annoying, because there is no exception or compilation error; the
application only fails when it deals with the hashes.

I'm starting to lose confidence in .NET 2.0, because this is the second bug I
have found in only three days of trying to use my 1.1 project with 2.0. When
will I find the next issue, and will I really find them all before the next
release of my package?

This is not a bug.

Same things MUST hash to the same value.
Different things DO NOT have to hash to different values.

The bug is in your code.
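
To illustrate the point (just a sketch - the Handler delegate and the two methods are made up): even if two delegates to different methods happened to return the same hash code, a Hashtable would still keep them apart, because it uses Equals to distinguish keys.

using System;
using System.Collections;

class HashContractDemo
{
    delegate void Handler();

    static void First() { }
    static void Second() { }

    static void Main()
    {
        Handler a = new Handler(First);
        Handler b = new Handler(Second);

        // The hash codes may or may not collide - either is allowed.
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());

        // The delegates are not Equal, so the Hashtable treats them as
        // two distinct keys; nothing gets overwritten.
        Hashtable table = new Hashtable();
        table[a] = "first";
        table[b] = "second";
        Console.WriteLine(table.Count);   // 2
        Console.WriteLine(table[a]);      // first
        Console.WriteLine(table[b]);      // second
    }
}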
 
Stonesource said:
The behavior of the function MulticastDelegate.GetHashCode() has changed from
.NET 1.1 to 2.0.
With 1.1 it returns different hashes if the methods stored in the delegate are
different. This is as expected, because the delegates are completely different
and should also deliver different hashes.
With 2.0 this behavior has changed. Delegates with different methods deliver
the same hashes, so identifying the delegates by this hash will now fail...

Why does that mean it "fails"?
...very annoying, because there is no exception or compilation error; the
application only fails when it deals with the hashes.

Then you have a bug. You should not rely on GetHashCode giving
different answers for different values - you should only rely on it
giving the same answer for the same value. It's more useful if the
return values are different, of course, but an implementation of
GetHashCode() which just returned 0 every time would be valid.
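
For example, something like this (a hypothetical Point class, purely a sketch) would be a perfectly legal override - lookups still give the right answers, because Equals decides the matches; it just turns every lookup into a linear scan of one bucket:

class Point
{
    readonly int x, y;
    public Point(int x, int y) { this.x = x; this.y = y; }

    public override bool Equals(object obj)
    {
        Point other = obj as Point;
        return other != null && other.x == x && other.y == y;
    }

    // Valid per the GetHashCode contract (equal points return equal codes),
    // but terrible for performance: every Point lands in the same bucket.
    public override int GetHashCode()
    {
        return 0;
    }
}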
I'm starting to lose confidence in .NET 2.0, because this is the second bug I
have found in only three days of trying to use my 1.1 project with 2.0. When
will I find the next issue, and will I really find them all before the next
release of my package?

Well, this one is a bug in your app rather than in the framework. What
was the other "bug"?
 
Jon Skeet said:
Why does that mean it "fails"?

According to the MSDN documentation 'GetHashCode is suitable for use in
hashing algorithms and data structures like a hash table'. That is what fails
now: One cannot identify the delegate by the hash it returns. So adding the
delegate to a hash table using this hash will possibly overwrite a previously
added entry.
Then you have a bug. You should not rely on GetHashCode giving
different answers for different values - you should only rely on it
giving the same answer for the same value. It's more useful if the
return values are different, of course, but an implementation of
GetHashCode() which just returned 0 every time would be valid.

That's wrong.
Just have a look at string.GetHashCode(). If you could not rely on the
returned hash, you could never use a string as a key in hash tables!
Look again in the MSDN documentation. There it is stated that you cannot
rely on the DEFAULT implementation of GetHashCode, but derived
implementations 'must override GetHashCode with an implementation that
returns a unique hash code'. MulticastDelegate.GetHashCode() is a derived
implementation.
Well, this one is a bug in your app rather than in the framework.

It is a bug in the framework. A hash that cannot be used as a hash is no
hash...
What was the other "bug"?

The other bug is related to the new IPC channel. In short: it is not
possible to use the channel with multiple clients on the same server in the
same way as before with the TCP channel. With IPC every client must have a
different 'portName', so you cannot deliver a common config file for the
clients as before.
 
[cut]
That's wrong.
Just have a look at string.GetHashCode(). If you could not rely on the
returned hash, you could never use a string as a key in hash tables!

I'm afraid that you are wrong and Jon is right.

The obvious proof is that there are only 2^32 hashcodes and there are an
infinite number of strings so it is clearly impossible to associate every
string with a unique hashcode.
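
If you don't want to take the counting argument on trust, a rough sketch like this will brute-force a collision for you (which strings it finds depends on the framework's hash function):

using System;
using System.Collections.Generic;

class FindCollision
{
    static void Main()
    {
        // Keep hashing distinct strings until two of them share a hash code.
        // With only 2^32 possible codes, the birthday bound says this
        // typically happens after tens of thousands of strings.
        Dictionary<int, string> seen = new Dictionary<int, string>();
        for (long i = 0; ; i++)
        {
            string candidate = "value-" + i;
            int hash = candidate.GetHashCode();
            string earlier;
            if (seen.TryGetValue(hash, out earlier))
            {
                Console.WriteLine("\"{0}\" and \"{1}\" both hash to {2}",
                                  earlier, candidate, hash);
                return;
            }
            seen[hash] = candidate;
        }
    }
}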

I suggest that you read up on hashtable implementation.

The reason that you SHOULD override GetHashCode is that if you don't the
performance will be poor.
In particular if GetHashCode always returns a constant then it will be
equivalent to linear search.

The default implementation will be particularly poor in .NET because unlike
C++ the framework can't just use the address of the object since this is
liable to change.
 
[snip]
The obvious proof is that there are only 2^32 hashcodes and there are an
infinite number of strings so it is clearly impossible to associate every
string with a unique hashcode.

So you are saying that using Hashtable is basically unreliable.
I think this discussion is starting to get quite academic (2^32!)...
I suggest that you read up on hashtable implementation.

The reason that you SHOULD override GetHashCode is that if you don't the
performance will be poor.
In particular if GetHashCode always returns a constant then it will be
equivalent to linear search.

The default implementation will be particularly poor in .NET because unlike
C++ the framework can't just use the address of the object since this is
liable to change.

Please read the documentation for object.GetHashCode():

[...]
The default implementation of GetHashCode does not guarantee uniqueness or
consistency; therefore, it must not be used as a unique object identifier for
hashing purposes. Derived classes must override GetHashCode with an
implementation that returns a unique hash code. For best results, the hash
code must be based on the value of an instance field or property, instead of
a static field or property.
[...]

There are only MUSTs, NO SHOULDs.
And this is what I initially wanted: to get a unique hash code for
MulticastDelegate (as in .NET 1.1).

To turn this discussion to a better end: what would you suggest to solve my
problem?
I have a couple of delegates, holding references to different functions.
I have to store these delegates in a hashtable.
Today I use MulticastDelegate.GetHashCode() to identify the delegate (in
1.1) or MulticastDelegate.Method.Name (in 2.0).
Is this not allowed anymore? Do I have to change all the parts of my code
where hashes or strings or ... are used as keys in hashtables to GUIDs?

Best regards,
Stefan.
 
Stonesource said:
According to the MSDN documentation 'GetHashCode is suitable for use in
hashing algorithms and data structures like a hash table'. That is what fails
now: One cannot identify the delegate by the hash it returns. So adding the
delegate to a hash table using this hash will possibly overwrite a previously
added entry.

No it won't, because hashtables don't work that way.
That's wrong.
Just have a look at string.GetHashCode(). If you could not rely on the
returned hash, you could never use a string as a key in hash tables!

Yes you could, because hashtables don't work the way you think they do.

It's perfectly possible for two different strings to return the same
hashcode - otherwise you couldn't have more than 2^32 different
possible strings!
Look again in the MSDN documentation. There it is stated that you cannot
rely on the DEFAULT implementation of GetHashCode, but derived
implementations 'must override GetHashCode with an implementation that
returns a unique hash code'. MulticastDelegate.GetHashCode() is a derived
implementation.

The documentation in MSDN is wrong, and *has* to be wrong by the pigeon
hole principle.

Consider the strings "0", "1", "2" etc, including the negative numbers,
covering every single integer in the range of Int32. According to you
(and the documentation) each of those must give a unique hash code.

Now, I hope you can see that that would use up all the possible hash
codes - so what hash code could "x" give?
It is a bug in the framework. A hash that cannot be used as a hash is no
hash...

It can be used as a hash. It can't be used as an identifier. There's a
difference.
The other bug is related to the new IPC channel. In short: it is not
possible to use the channel with multiple clients on the same server in the
same way as before with the TCP channel. With IPC every client must have a
different 'portName', so you cannot deliver a common config file for the
clients as before.

And is that definitely a .NET bug, or just a limitation in IPC itself?
 
Stonesource said:
So you are saying that using Hashtable is basically unreliable.

No. We're saying that you don't understand hashtables properly.

Hashtables use hashcodes to quickly find the set of possible matching
keys. They then use Equals to determine which of the keys is *actually*
involved.
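
Very roughly - this is only a conceptual sketch, not the real Hashtable source - a lookup does something like this:

using System.Collections.Generic;

static class BucketSketch
{
    // The hash code only narrows the search down to one bucket;
    // Equals then decides which entry (if any) actually matches.
    // Colliding keys simply share a bucket - they never overwrite
    // one another.
    public static object Lookup(object key,
                                List<KeyValuePair<object, object>>[] buckets)
    {
        int index = (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
        foreach (KeyValuePair<object, object> entry in buckets[index])
        {
            if (entry.Key.Equals(key))
                return entry.Value;
        }
        return null;
    }
}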
I think this discussion is starting to get quite academic (2^32!)...

In that case you've missed the point. Do you think the strings should
know what values are already in the hashtable? How many possible
different strings do you think there are?

I suggest you try to imagine what an implementation of
System.Int64.GetHashCode would look like if your idea about hashcodes
was correct. There are 2^64 different Int64 values, and 2^32 possible
return values from GetHashCode. I await your suggested implementation
with interest.
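
For what it's worth, one standard approach (not necessarily the framework's exact code) is to fold the two 32-bit halves together - which, by construction, means plenty of distinct longs share a hash code:

static class Int64HashSketch
{
    // XOR the high and low 32-bit halves. Every long gets a hash code,
    // but by the pigeonhole principle many distinct values collide:
    // for example, 0L and 0x100000001L both map to 0 here.
    public static int Hash(long value)
    {
        return unchecked((int)value ^ (int)(value >> 32));
    }
}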
Please read the documentation for object.GetHashCode():

Yes, the documentation is wrong. I'll make an appropriate complaint and
try to get it fixed. That happens sometimes - in the .NET v1.1
documentation, for instance, System.Decimal was talked about as a fixed
point type, when it's actually a floating point type.

To turn this discussion to a better end: what would you suggest to solve my
problem?
I have a couple of delegates, holding references to different functions.
I have to store these delegates in a hashtable.

As keys or values?
Today I use MulticastDelegate.GetHashCode() to identify the delegate (in
1.1) or MulticastDelegate.Method.Name (in 2.0).
Is this not allowed anymore? Do I have to change all the parts of my code
where hashes or strings or ... are used as keys in hashtables to GUIDs?

Just use the delegates themselves as keys. They implement Equals
appropriately, so it'll be fine.
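
Something along these lines (only a sketch - Handler, First and Second are made-up names):

using System;
using System.Collections.Generic;

class DelegateKeyDemo
{
    delegate void Handler();

    static void First() { }
    static void Second() { }

    static void Main()
    {
        // The delegates themselves are the keys. Equals compares target and
        // method, and GetHashCode is consistent with Equals, so no separate
        // identifier (hash, name, GUID) is needed.
        Dictionary<Handler, string> registry = new Dictionary<Handler, string>();
        registry[new Handler(First)] = "handles First";
        registry[new Handler(Second)] = "handles Second";

        // A delegate created later for the same method still finds its entry.
        Console.WriteLine(registry[new Handler(First)]);   // handles First
    }
}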
 
Stonesource said:
[snip]
The obvious proof is that there are only 2^32 hashcodes and there are an
infinite number of strings so it is clearly impossible to associate every
string with a unique hashcode.

So you are saying that using Hashtable is basically unreliable.
I think this discussion is starting to get quite academic (2^32!)...

No - it works perfectly well if you follow the rules.

Re 2^32 being academic - would it make it simpler if I were to ask how you
propose to generate a unique int for every long?
I suggest that you read up on hashtable implementation.

The reason that you SHOULD override GetHashCode is that if you don't the
performance will be poor.
In particular if GetHashCode always returns a constant then it will be
equivalent to linear search.

The default implementation will be particularly poor in .NET because unlike
C++ the framework can't just use the address of the object since this is
liable to change.

Please read the documentation for object.GetHashCode():

[...]
The default implementation of GetHashCode does not guarantee uniqueness or
consistency; therefore, it must not be used as a unique object identifier for
hashing purposes. Derived classes must override GetHashCode with an
implementation that returns a unique hash code. For best results, the hash
code must be based on the value of an instance field or property, instead of
a static field or property.
[...]

Don't tell me that you are shocked to find that Microsoft don't document
stuff properly.

Fortunately most programmers have done enough computer science to know how
hash tables work, so they just ignore the rubbish about "unique" hash codes.
There are only MUSTs, NO SHOULDs.
And this is what I initially wanted: to get a unique hash code for
MulticastDelegate (as in .NET 1.1).

To turn this discussion to a better end: what would you suggest to solve my
problem?
I have a couple of delegates, holding references to different functions.
I have to store these delegates in a hashtable.
Today I use MulticastDelegate.GetHashCode() to identify the delegate (in
1.1) or MulticastDelegate.Method.Name (in 2.0).
Is this not allowed anymore? Do I have to change all the parts of my code
where hashes or strings or ... are used as keys in hashtables to GUIDs?

Best regards,
Stefan.

I still don't understand what your problem is:

You CAN use delegates as keys in hashtables BUT you must not alter them
after adding.
Two delegates with the same target and invocation lists will be Equals() and
will have identical hash codes. They will not necessarily be
Object.ReferenceEquals().
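
For example (a sketch with a made-up Handler delegate):

using System;

class DelegateEqualityDemo
{
    delegate void Handler();

    static void Work() { }

    static void Main()
    {
        Handler a = new Handler(Work);
        Handler b = new Handler(Work);   // separate object, same target and method

        Console.WriteLine(a.Equals(b));                         // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // True
        Console.WriteLine(object.ReferenceEquals(a, b));        // False
    }
}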
 
Thank you both for the discussion.
I agree with you - using the hash as an identifier is wrong. I think I have to
check my code for that...

Some notes:
Initially I did not complain about the hash itself, but about the difference in
MulticastDelegate.GetHashCode() between 1.1 and 2.0. In 1.1 I get different
hashes for delegates with different methods; in 2.0 I get the same hash for
every delegate, regardless of which method is internally referenced... So at
least the implementation of that function has changed with 2.0. And if you say
'this is a feature, not a bug': how can I know that this has changed?
You also say one should not rely on the MS documentation. Would you please
give me a hint where I can find all of MS's source code, so I can look at what
is really done before using it ;-)
Regarding the IPC problem: if this is a limitation of the IPC channel and it
is NEVER and NOWHERE documented, I call it a bug. MS always places emphasis
on the fact that IPC is the perfect substitute for remoting applications
using TCP where the server and clients run only on the same machine. (BTW: the
TCP channel also has a very annoying bug, but that is another story...)

Thank you again,
Stefan.
 
Stonesource said:
Thank you both for the discussion.
I agree with you - using the hash as an identifier is wrong. I think I have to
check my code for that...

Some notes:
Initially I did not complain about the hash itself, but about the difference in
MulticastDelegate.GetHashCode() between 1.1 and 2.0. In 1.1 I get different
hashes for delegates with different methods; in 2.0 I get the same hash for
every delegate, regardless of which method is internally referenced... So at
least the implementation of that function has changed with 2.0. And if you say
'this is a feature, not a bug': how can I know that this has changed?

You shouldn't care - that's the point. For instance, the implementation
of String.GetHashCode has also changed between 1.1 and 2.0 - but so
long as hashcodes are only used within the same process, and only in
terms of "if the hashcodes are different, the objects are different",
it shouldn't matter to you.
You also say one should not rely on the MS documentation. Would you please
give me a hint where I can find all of MS's source code, so I can look at what
is really done before using it ;-)

The docs are usually reliable, but there are bound to be problems,
basically. It's a shame MS doesn't open up the implementation of the
framework, but until they do you have to rely on the documentation
unless it violates reason (which in this case it does) or experience
(ditto).
Regarding the IPC problem: if this is a limitation of the IPC channel and it
is NEVER and NOWHERE documented, I call it a bug. MS always places emphasis
on the fact that IPC is the perfect substitute for remoting applications
using TCP where the server and clients run only on the same machine. (BTW: the
TCP channel also has a very annoying bug, but that is another story...)

I don't know enough about IPC to say for sure, but hopefully someone
could comment in a networking group.

Jon
 
[snip]
It's a shame MS doesn't open up the implementation of the
framework, but until they do you have to rely on the documentation
unless it violates reason (which in this case it does) or experience
(ditto).

It isn't a shame. It is a GOOD thing. Relying on implementation rather than
(admittedly incorrect) documentation is what caused the OP to go wrong in
the first place and has caused MS lots of expensive grief in the past.

Imagine how he would have complained if he had seen the implementation:
"MulticastDelegate.GetHashCode() used to return 42 (for example) but now it
does completely different stuff with targets and methods and everything!"
 
Nick Hounsome said:
It isn't a shame. It is a GOOD thing. Relying on implementation rather than
(admittedly incorrect) documentation is what caused the OP to go wrong in
the first place and has caused MS lots of expensive grief in the past.

How would the implementation being open have made that situation worse?
Imagine how he would have complained if he had seen the implementation:
"MulticastDelegate.GetHashCode() used to return 42 (for example) but now it
does completely different stuff with targets and methods and everything!"

The effect would have been exactly the same thing.

The difference is that on platforms where the source is available
(Java, for example) it's simple to debug through the source when
necessary.
 
Jon Skeet said:
How would the implementation being open have made that situation worse?

For you and me it wouldn't, but it has been proved time after time that people
WILL write code that relies on implementation detail rather than
specification and then complain when MS changes the implementation and blame
MS rather than themselves. Furthermore, if these people develop a successful
product which then fails because of the "bug", then millions of end users
have a bad impression of MS.

I don't even like MS and I feel sympathy for them :-)
The effect would have been exactly the same thing.

No. Irrational as it may be, I believe that he would have been even more
convinced that it was a bug.
The difference is that on platforms where the source is available
(Java, for example) it's simple to debug through the source when
necessary.

That is only useful if you understand that it is an error to rely on the
implementation rather than the spec and the OP obviously did not.

Another problem is that it would only encourage the efficiency freaks who
would go through every implementation and rewrite their code to maximally
exploit a method's (current) implementation.
 
Nick Hounsome said:
For you and me it wouldn't, but it has been proved time after time that people
WILL write code that relies on implementation detail rather than
specification and then complain when MS changes the implementation and blame
MS rather than themselves. Furthermore, if these people develop a successful
product which then fails because of the "bug", then millions of end users
have a bad impression of MS.

But that's exactly what happened here.
No. Irrational as it may be, I believe that he would have been even more
convinced that it was a bug.

I'm not sure that "very convinced it's a bug" is significantly
different from "convinced it's a bug", to be honest. I agree that there
may be a slightly more uphill battle, but I don't think it would
actually be a really significant difference.
That is only useful if you understand that it is an error to rely on the
implementation rather than the spec and the OP obviously did not.

But he didn't, despite not having the source code for the implementation
in the first place.
Another problem is that it would only encourage the efficiency freaks who
would go through every implementation and rewrite their code to maximally
exploit a method's (current) implementation.

I'd be more than willing to take the trade-off there, having been stung
by bugs in the past which can be worked round once one understands the
implementation.

(Besides, every so often, there's a situation where rewriting the code
to take advantage of a current implementation is the *right* thing to
do - if you have very tight control over the platform version, etc.
It's rare, but it happens.)
 