Something better than String.GetHashCode

buu

I'm using the GetHashCode method of the String object, but sometimes it gives
me duplicate values for different strings.
This always happens when the strings contain numbers.

Is there a better or newer GetHashCode function, or some other way to do
better hashing?
 
buu said:
I'm using the GetHashCode method of the String object, but sometimes it gives
me duplicate values for different strings.

Yes, it will.

buu said:
This always happens when the strings contain numbers.

Is there a better or newer GetHashCode function, or some other way to do
better hashing?

Hashing will always give duplicate values when there are more potential
"source" values than "hash" values (as there clearly are in the case of
strings hashing to any fixed size hash).
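
The counting argument can be sketched with a toy example. ToyHash below is a hypothetical 8-bit hash written only for this illustration (it is not how String.GetHashCode works): because it can return just 256 values, any 257 distinct strings must contain at least one colliding pair. The same reasoning applies to a 32-bit hash code, only with far more strings needed.

```csharp
using System;
using System.Collections.Generic;

// Toy 8-bit hash, hypothetical and for illustration only: it can return
// just 256 distinct values, so any 257 distinct strings must collide.
static byte ToyHash(string s)
{
    int h = 17;
    foreach (char c in s)
    {
        h = unchecked(h * 31 + c);
    }
    return unchecked((byte)h);
}

var seen = new Dictionary<byte, string>();
bool found = false;

// 257 distinct inputs, 256 possible outputs: a collision is guaranteed.
for (int i = 0; i < 257 && !found; i++)
{
    string s = "item" + i;
    byte h = ToyHash(s);
    if (seen.TryGetValue(h, out var earlier))
    {
        Console.WriteLine($"\"{earlier}\" and \"{s}\" both hash to {h}");
        found = true;
    }
    else
    {
        seen[h] = s;
    }
}
```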

If you want to use a hash for integrity checking etc you should use the
classes derived from System.Security.Cryptography.HashAlgorithm - but
if it's just for a hash table or similar structure, then GetHashCode
should be absolutely fine.
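
For the integrity-checking case, a minimal sketch using SHA256, which is one of the classes derived from System.Security.Cryptography.HashAlgorithm. The input string is made up for the example. Collisions still exist in principle here too, since the digest has a fixed size, but they are not feasible to find in practice.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical input, for illustration only.
byte[] data = Encoding.UTF8.GetBytes("John Smith, 1 Main Street");

// SHA256 derives from HashAlgorithm; ComputeHash returns the digest bytes.
using SHA256 sha = SHA256.Create();
byte[] digest = sha.ComputeHash(data);

// Print the 32-byte digest as a hex string.
Console.WriteLine(BitConverter.ToString(digest).Replace("-", ""));
```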

What's your actual use case?

--
Jon Skeet
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon_skeet
C# in Depth: http://csharpindepth.com
 
I have a collection of strings (names and addresses together), and I found
that the best way to search it was to create a hash code of each string and
use that as the key in a dictionary.
It's good because it's fast, but it sometimes gives me duplicate hash
codes...
 
You should use the string itself as the key in the dictionary. That's
the whole point of dictionaries! The dictionary is going to take the
hash code and use that itself (without assuming it's going to be
unique) - why do you feel the need to take a hash yourself to start
with?
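
A minimal sketch of the difference, under stated assumptions: ToyHash is a deliberately weak, hypothetical stand-in for a hash function (it is not String.GetHashCode), and the two entries are made-up data chosen so that they collide under it. Keying by the hash loses an entry; keying by the string itself does not.

```csharp
using System;
using System.Collections.Generic;

// Deliberately weak stand-in hash (hypothetical): the sum of the characters,
// so any two strings with the same characters in a different order collide.
static int ToyHash(string s)
{
    int sum = 0;
    foreach (char c in s)
    {
        sum += c;
    }
    return sum;
}

string a = "Ann, 12 Elm St";
string b = "Ann, 21 Elm St";

// Keyed by the hash: the second entry silently overwrites the first.
var byHash = new Dictionary<int, string>();
byHash[ToyHash(a)] = a;
byHash[ToyHash(b)] = b;

// Keyed by the string itself: the dictionary hashes internally and then
// compares keys for equality, so colliding strings stay separate entries.
var byString = new Dictionary<string, string>();
byString[a] = "customer 1";
byString[b] = "customer 2";

Console.WriteLine(byHash.Count);            // 1
Console.WriteLine(byString.Count);          // 2
Console.WriteLine(byString.ContainsKey(a)); // True
```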
 
Buu,

I don't know whether you use this in some kind of batch process. If not,
remember that any interaction with a user will cost much more time than you
can gain by optimizing the methods you describe here. A user will almost
certainly not notice the time you gain this way.

Cor
 
You're right...
but what if I want to check whether some name and address exists, and for
that 'hash key' there are entries in the dictionary which have the same key
but different content?
 
Then you need to have a dictionary which goes the other way. Basically
you'll need a dictionary for each kind of key you want to look things
up by.
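
One way to read "a dictionary for each kind of key" is sketched below. The record shape (a name plus an address), the sample data, and the names byName and byAddress are all assumptions made for the example; it also assumes each name and each address is unique.

```csharp
using System;
using System.Collections.Generic;

// Made-up sample records: (Name, Address) pairs.
var people = new List<(string Name, string Address)>
{
    ("Ann Lee", "12 Elm St"),
    ("Bob Ray", "7 Oak Ave"),
};

// One dictionary per kind of key you want to look things up by.
var byName = new Dictionary<string, (string Name, string Address)>();
var byAddress = new Dictionary<string, (string Name, string Address)>();

foreach (var p in people)
{
    byName[p.Name] = p;
    byAddress[p.Address] = p;
}

Console.WriteLine(byName["Ann Lee"].Address);  // 12 Elm St
Console.WriteLine(byAddress["7 Oak Ave"].Name); // Bob Ray
```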

Jon
 