Effective hashing


Farouche

Hi

I would like some suggestions on how to effectively compute a hash value
for a "collection" of simple Field objects:


internal class Field
{
    private string _fieldName;
    private object _fieldValue;

    public Field(string fieldName, object fieldValue)
    {
        this._fieldName = fieldName;
        this._fieldValue = fieldValue;
    }

    public int GetHashcode()
    {
        return ????????
    }
}

The hash code for a Field must of course be computed from both Name and
Value.


These Field objects are stored in a class derived from Hashtable:


internal class FieldGroup : Hashtable
{

    public int GetHashCode()
    {
        string key;
        Field fld;

        foreach (int key in this.Keys) {
            fld = this(key);
            ?????????
        }
    }
}


FieldGroup objects are filled with Field objects in this way:


FieldGroup fg = new FieldGroup();
string aName;
object aValue;
fg(aName) = new Field(myName, myValue);


I would like a suggestion on the most effective way to calculate a unique
hash code for such a FieldGroup object, based on the Field objects it contains.




Thanks in advance

Farouche
 
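Farouche said:
public int GetHashcode()
{

XOR the hash codes of the name and the value (and mark the method as an
override, spelled GetHashCode, or it will never be called through Object):

public override int GetHashCode()
{
    return this._fieldName.GetHashCode()
         ^ this._fieldValue.GetHashCode();
}
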
Farouche said:
internal class FieldGroup : Hashtable
{

public int GetHashCode()
{
string key;
Field fld;

foreach (int key in this.Keys) {
fld = this(key);
?????????
}
}

this(key)? Don't you mean this[key]? And I don't think your foreach
loop works either (key is declared twice, once as string and once as int),
but I think I get the point.

If the ordering doesn't matter (it shouldn't for a hashtable, unless it's
a linked one, which yours isn't), you can use XOR here too, over the keys
and the fields:

public override int GetHashCode() {
    int hashCode = 0;
    foreach ( Object key in Keys )
        hashCode = hashCode ^ key.GetHashCode() ^ this[key].GetHashCode();
    return hashCode;
}
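
For anyone who wants to try this out, here is a self-contained sketch of the recompute-on-demand version. It enumerates the entries directly instead of looking each key up again, and it assumes null values should hash to 0. HashOf is a hypothetical helper name, not something from the framework.

```csharp
using System;
using System.Collections;

internal class FieldGroup : Hashtable
{
    // Hypothetical helper: treat null as hash 0 so null values do not throw.
    private static int HashOf(object o)
    {
        return o == null ? 0 : o.GetHashCode();
    }

    // XOR is commutative and associative, so the order in which the
    // entries are enumerated (or were inserted) does not affect the result.
    public override int GetHashCode()
    {
        int hashCode = 0;
        foreach (DictionaryEntry entry in this)
            hashCode ^= HashOf(entry.Key) ^ HashOf(entry.Value);
        return hashCode;
    }
}
```

Two groups filled with the same entries in a different order should then report the same hash code. If FieldGroup objects are ever compared or used as keys themselves, Equals would need a matching override as well.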

Instead of recomputing the hash for every call to GetHashCode() you can
have the hash as a member:

protected int _hashCode = 0;

public override Object this[Object key] {
    get { return base[key]; }
    set {
        // note: cunningly removed 2 times xor with key.GetHashCode()
        // if key already exists
        if ( ContainsKey(key) )
            _hashCode ^= base[key].GetHashCode();   // xor the old value out
        else
            _hashCode ^= key.GetHashCode();         // new key: xor it in
        base[key] = value;
        _hashCode ^= value.GetHashCode();           // xor the new value in
    }
}

public override void Add(Object key, Object value) {
    if ( ContainsKey(key) )
        return;
    base.Add(key, value);
    _hashCode ^= key.GetHashCode() ^ value.GetHashCode();
}

public override void Remove(Object key) {
    if ( ContainsKey(key) ) {
        Object value = base[key];
        base.Remove(key);
        // xor-ing a second time cancels the earlier contribution
        _hashCode ^= key.GetHashCode() ^ value.GetHashCode();
    }
}

and just have GetHashCode return it:

public override int GetHashCode() { return _hashCode; }

This is of course only worth it if you expect to actually invoke
GetHashCode() often; otherwise every insert and remove pays for
bookkeeping that nobody uses.

And this is just a sketch: you need to work on it to make value == null
work. I suggest assigning a hash code of 0 to null values, or simply
refusing inserts of null values.
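
The same null handling applies to Field itself. Below is a minimal sketch, assuming a null name or value should contribute 0 to the hash. The Equals override is my addition: objects that compare equal must return the same hash code, so it compares exactly the members the hash is built from.

```csharp
using System;

internal class Field
{
    private readonly string _fieldName;
    private readonly object _fieldValue;

    public Field(string fieldName, object fieldValue)
    {
        _fieldName = fieldName;
        _fieldValue = fieldValue;
    }

    // Assumption: a null name or value contributes 0 to the hash.
    public override int GetHashCode()
    {
        int nameHash = _fieldName == null ? 0 : _fieldName.GetHashCode();
        int valueHash = _fieldValue == null ? 0 : _fieldValue.GetHashCode();
        return nameHash ^ valueHash;
    }

    // Equal fields must produce equal hash codes, so Equals compares
    // the same two members that GetHashCode uses.
    public override bool Equals(object obj)
    {
        Field other = obj as Field;
        return other != null
            && string.Equals(_fieldName, other._fieldName)
            && object.Equals(_fieldValue, other._fieldValue);
    }
}
```

With this, new Field("age", 42) and a second new Field("age", 42) compare equal and hash alike, and a Field with a null value no longer throws.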

Note that this approach has bad properties if a key and its value have the
same hash code, or if you insert the same value twice: the two
contributions cancel each other out. For example:

FieldGroup fg = new FieldGroup();
int h1 = fg.GetHashCode();
fg[1] = 1;
int h2 = fg.GetHashCode();

will make h1 == h2, because the key and the value both hash to 1 and
1 ^ 1 == 0.
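
One possible way around the cancellation, sketched here as an assumption rather than something proposed earlier in this thread: mix each key/value pair into a single per-entry hash first, then XOR the entries together. The multiplier 31 and the names MixedFieldGroup and EntryHash are illustrative choices only.

```csharp
using System;
using System.Collections;

internal class MixedFieldGroup : Hashtable
{
    // Hypothetical per-entry mix: a key and a value with the same hash
    // code no longer cancel each other out.
    private static int EntryHash(object key, object value)
    {
        int k = key == null ? 0 : key.GetHashCode();
        int v = value == null ? 0 : value.GetHashCode();
        unchecked { return k * 31 + v; }   // overflow wraps around on purpose
    }

    // Entries are still combined with XOR, so order remains irrelevant.
    public override int GetHashCode()
    {
        int hashCode = 0;
        foreach (DictionaryEntry entry in this)
            hashCode ^= EntryHash(entry.Key, entry.Value);
        return hashCode;
    }
}
```

With this mix, the entry fg[1] = 1 contributes 1 * 31 + 1 = 32 instead of 0. Two different entries that happen to produce the same per-entry hash would still cancel, so this narrows the problem rather than removing it.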

Note that to update it on remove you simply xor again... xor has NICE
properties ;)

Farouche said:
I would like a suggestion on the most effective way to calculate a unique
hash code for such a FieldGroup object based on its contained Field objects.

You probably cannot make it unique. 2^32 is a pretty small number, and any
hash that small has a good number of collisions. Inserting randomly, you
would expect the first collision after roughly 2^16 inserts; it's the
"birthday paradox" :)
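
To put a number on the birthday estimate, here is a small sketch using the standard approximation p ≈ 1 - exp(-n(n-1) / (2N)) for n uniformly random values out of N possibilities. Real hash codes are not perfectly uniform, so treat it as a best case. The names Birthday and CollisionProbability are hypothetical.

```csharp
using System;

internal static class Birthday
{
    // Approximate probability that n uniformly random values drawn from
    // `buckets` possibilities contain at least one duplicate:
    //   p ≈ 1 - exp(-n(n-1) / (2 * buckets))
    public static double CollisionProbability(double n, double buckets)
    {
        return 1.0 - Math.Exp(-n * (n - 1.0) / (2.0 * buckets));
    }
}
```

Under that approximation, Birthday.CollisionProbability(65536, 4294967296.0) comes out near 0.39, and the probability passes one half at roughly 77,000 values.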

Farouche said:
Thanks in advance

Watch out, I may just have wasted your time :)
 