Object identity

Thread starter: Stephan Keil

Stephan Keil

Hi all,

I am a novice with .NET and I am wondering if there is something like an
"identity value" of an object. I mean something like the object's address
in C++ or C, i.e. a fixed unique value per object, which can be used e.g.
to put objects in an associative container (a hash value is not an
alternative, as it is not fixed during the object's lifetime).
I know that the garbage collector moves objects around in memory, so
the GC pointer cannot be used (at least as long as it is not "pinned"). Is
there anything like a fixed pointer in .NET?

Thx & regards,

Stephan
 
object.GetHashCode() -
"serves as a hash function for a particular type, suitable for use in
hashing algorithms and data structures like a hash table"
 
Stephan,
i.e. a fixed unique value per object, which can be used e.g.
to put objects in an associative container

Can't you store the object reference itself?

I know that the garbage collector moves objects around in the memory, so
the gc pointer cannot be used (at least as long as it is not "pinned").

When that happens all references are updated.


Mattias
 
Wow, that's a lot of answers in a very short time. Thanks to all.
Can't you store the object reference itself?

It's a general question; I don't have a particular problem to solve. But in
C++, e.g., I sometimes use an STL set of object pointers to track which
objects I have already seen (an STL set is an associative container, which is
typically implemented as a sorted, balanced binary tree for efficient
lookup and manipulation).
E.g. (sorry for posting C++ code):

#include <set>

class Y { /* ... */ };

class X {
    std::set<Y*> m_processed; // keep track of already processed Ys
public:
    // ...
    void ProcessY(Y* y) { // process the given Y object, if not already done
        if (m_processed.find(y) != m_processed.end()) {
            return; // already processed
        }
        // ... process y ...
        m_processed.insert(y); // remember y
    }
};

How, e.g., does serialization work in .NET? During serialization, the
already serialized objects must be remembered by identity in some data
structure to prevent cycling (or they must be marked, which is only
possible with runtime support). How is _efficient_ identity lookup possible
here?

- Stephan
 
object.GetHashCode() -

Now, after re-reading the documentation of GetHashCode(), I am totally
confused :-o
I will open another thread for this.

Let me describe, with an example, the kind of problem I would like to
tackle: Suppose you have a container with, say, 1 million object
references. Now you get another object reference, and the job is to
efficiently find out whether this particular object is already contained
(by identity) in the container (side note: the objects could have changed
their state after they were inserted into the container; as I understand
it, this makes GetHashCode() useless for this problem).

You could iterate over the whole container and check each element for
identity equality. This results in up to 1 million comparisons!

If there were something like a fixed object identity value with a total order
(like pointers in C++), you could sort the container by that value and
do a binary search. The "contained" check would then be possible with
approx. 20 comparisons (log2 of 1 million is about 19.9)!
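For comparison, here is a minimal C++ sketch of that idea, using object addresses as the fixed identity: collect pointers to objects (owned here by `std::unique_ptr`), sort them once with `std::less` (which guarantees a total order over pointers), then answer "is this exact object contained?" with `std::binary_search`. The element type `Obj` and the function names are hypothetical, and the comparison counter exists only to make the logarithmic cost visible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

struct Obj { int state = 0; };  // hypothetical element type; its state may change freely

// Wraps std::less so we can count how many comparisons a search performs.
struct CountingLess {
    std::size_t* count;
    bool operator()(const Obj* a, const Obj* b) const {
        ++*count;
        return std::less<const Obj*>{}(a, b);  // total order over pointers
    }
};

// Collects the identities (addresses) and sorts them once.
std::vector<const Obj*> BuildIdentityIndex(const std::vector<std::unique_ptr<Obj>>& objects) {
    std::vector<const Obj*> ids;
    ids.reserve(objects.size());
    for (const auto& p : objects) {
        ids.push_back(p.get());
    }
    std::sort(ids.begin(), ids.end(), std::less<const Obj*>{});
    return ids;
}

// True if this exact object (not merely an equal one) is in ids;
// also reports the number of comparisons used.
bool ContainsIdentity(const std::vector<const Obj*>& ids, const Obj* candidate,
                      std::size_t& comparisons) {
    comparisons = 0;
    return std::binary_search(ids.begin(), ids.end(), candidate,
                              CountingLess{&comparisons});
}
```

With one million objects a lookup needs only about 20 comparisons, and changing an object's `state` after the index is built does not affect the result, since only addresses are compared. The sketch relies on C++ objects never moving in memory, which is exactly the property the question notes .NET does not offer for unpinned objects.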

Thus, a fixed object identity could be of great value, but I have to
admit that I don't know how the .NET framework could make such an identity
available without overhead.

- Stephan
 
If these are your own classes, use a Guid to uniquely identify them and
initialize the Guid at the time the object is constructed. If you
override the GetHashCode() method to generate a hash code based on the
value of the Guid, all the standard .NET container classes will function
as expected.
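C++ has no standard Guid type, so the following minimal sketch swaps in a process-wide atomic counter for Ted's Guid: each object receives an id when it is constructed, and a hash set of pointers hashes and compares by that id rather than by the object's contents. The class `Tracked` and the functors `IdHash`/`IdEqual` are hypothetical, and the sketch assumes C++17 (for the `inline` static member).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical class whose identity is fixed at construction time,
// playing the role of a Guid initialized in the constructor.
class Tracked {
public:
    Tracked() : id_(next_id_.fetch_add(1)) {}
    Tracked(const Tracked&) = delete;             // a copy would duplicate the id
    Tracked& operator=(const Tracked&) = delete;

    std::uint64_t id() const { return id_; }
    std::string state;  // mutable contents; not part of the identity

private:
    static inline std::atomic<std::uint64_t> next_id_{0};
    const std::uint64_t id_;
};

// Hash and equality based only on the fixed id, never on `state`.
struct IdHash {
    std::size_t operator()(const Tracked* t) const {
        return std::hash<std::uint64_t>{}(t->id());
    }
};
struct IdEqual {
    bool operator()(const Tracked* a, const Tracked* b) const {
        return a->id() == b->id();
    }
};

using TrackedSet = std::unordered_set<const Tracked*, IdHash, IdEqual>;
```

Changing `state` after insertion does not disturb the lookup, while a different object with identical contents is, by design, not found; that is the trade-off Lasse describes further down for identity values unrelated to the contents.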

-Ted
 
If these are your own classes, use a Guid to uniquely identify them

What if not? Sorry for being pertinacious, I just want to know if .NET has
something to offer to generally solve these issues (e.g. how does the .NET
serialization mechanism prevent cycles?).

- Stephan
 
Hello Stephan,

It uses the reference to identify the objects, as Mattias stated in an
earlier post.
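The general technique of identifying objects by reference during a graph walk can be sketched in C++ terms. This is not .NET's serializer; it is a minimal, hypothetical example: each object's address is looked up in a hash map, the first visit assigns an id and writes the object, and any later visit writes only a back-reference, so cycles terminate. `Node`, `WriteNode` and `Serialize` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical object-graph node; `refs` may point back to earlier nodes.
struct Node {
    std::string name;
    std::vector<const Node*> refs;
};

using IdTable = std::unordered_map<const Node*, std::size_t>;

void WriteNode(const Node* n, IdTable& ids, std::vector<std::string>& out) {
    auto it = ids.find(n);  // identity lookup by address, expected O(1)
    if (it != ids.end()) {
        out.push_back("ref #" + std::to_string(it->second));  // seen before
        return;
    }
    const std::size_t id = ids.size();  // ids are assigned in visit order
    ids.emplace(n, id);
    out.push_back("#" + std::to_string(id) + " " + n->name);
    for (const Node* r : n->refs) {
        WriteNode(r, ids, out);
    }
}

std::vector<std::string> Serialize(const Node& root) {
    IdTable ids;
    std::vector<std::string> out;
    WriteNode(&root, ids, out);
    return out;
}
```

For two nodes `a` and `b` where `a` references `b` and itself, and `b` references `a`, serializing `a` writes each node once and the two back-edges as references to id 0.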
 
Hi Vadym,
Do not rely on the hash code. Most of the time it is OK, but if, for example,
you have a long (64 bits), multiple values will map to the same 32-bit hash,
and it cannot be guaranteed that the same won't happen for other objects.
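This follows from counting alone: there are 2^64 possible `long` values but only 2^32 possible 32-bit hash codes, so some distinct values must share a hash. The minimal sketch below uses a simple fold of the high and low 32-bit halves as an assumed example hash (the name `FoldHash` is hypothetical) and exhibits such collisions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed example hash: XOR the low and high 32-bit halves of a 64-bit value.
std::uint32_t FoldHash(std::uint64_t value) {
    return static_cast<std::uint32_t>(value) ^ static_cast<std::uint32_t>(value >> 32);
}
```

For instance, `0` and `0x0000000100000001` both hash to `0`. Since any mapping from 64-bit to 32-bit values must collide somewhere, a hash code can narrow a search but never by itself prove identity or equality; a hash table always confirms a candidate with an equality test.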

hth

guy
 
I think you're going to have a problem however you want to do this.

Let me just talk my way through this and I'll try to explain what I mean.

First, you need to add all those objects to the container in such a way
that they can be quickly retrieved. Let's say you have something of an
"identity" value for each of those objects.

This identity value would either have to be:
- based on the values in the object
- a unique value not related to the contents of the object

Let's go with the first option and let's use the hash code of the object
as returned by GetHashCode() as this value. If, after adding an object
to a hash table, you modify the object in such a way that the hash code
returned by GetHashCode() for that object is now different from the one
used when the object was added, then yes, you have a problem.

In this context, you cannot modify the values of the keys in such a way
that the hash code changes. Put differently, the keys should be
immutable; if that is not enforceable by the compiler/runtime, then at least
you should treat them as such and not change them.

The bonus of this is that if you later construct a wholly new object
with the same values internally as an object already present in the hash
table, then those two objects should return the same hash code and thus
you could easily detect that the object values are already in another
object in your container.

This is, as I see it, what you want.
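The first option, a hash derived from the contents of an immutable key, can be sketched in C++ like this (the `Point` type and `PointHash` functor are hypothetical). Because the members are `const`, the hash cannot change after insertion, and a freshly constructed object with the same values is found.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical immutable key: members are const, so its hash cannot change.
struct Point {
    const int x;
    const int y;
    bool operator==(const Point& other) const { return x == other.x && y == other.y; }
};

// Content-based hash combining both members.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        return std::hash<int>{}(p.x) * 31u + std::hash<int>{}(p.y);
    }
};
```

Inserting `Point{1, 2}` into a `std::unordered_set<Point, PointHash>` and then looking up a separately constructed `Point{1, 2}` succeeds, because equal contents produce equal hashes and pass the equality test.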

Now, let's go with the other option. Make each object have a unique
value unrelated to the contents of the object.

This would of course make it possible for you to change the contents of
the object without messing up the hash table, as the hash code you would
use in the hash table is still the same as the original one.

However, when you later construct a new object with the same values
as an existing object in the hash table, this new object gets a new,
unique value that will not be found in the hash table. Or, if it is
found, chances are it's not the object you're interested in, i.e., another
object just happens to have that particular hash code.

Another option would be to base the hash code on the values from the
object and then cache it so that subsequent calls to GetHashCode() would
return the same value even if the contents of the object have changed.

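This is also problematic regarding a hash table, as it will use an
equality test once it finds the hashed values in the table to determine
which one in particular you want, and if the contents have changed, that
test compares against the new contents, so a lookup with an object holding
the original values will fail.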

Basically, it all comes down to one thing: the keys in the hash table
should never change. If they do then you need to take the key+value out
of the hash table and re-add it with the new key. Anything else won't work.
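In C++17, this "take it out and re-add it with the new key" step can be done without copying the value: extract the map node, change its key while it is outside the table, and insert it back. A minimal sketch; the function name `RekeyEntry` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Moves the entry stored under old_key to new_key. Returns false, leaving the
// table unchanged, if old_key is absent or new_key is already present.
bool RekeyEntry(std::unordered_map<std::string, int>& table,
                const std::string& old_key, const std::string& new_key) {
    if (table.count(new_key) != 0) {
        return false;
    }
    auto node = table.extract(old_key);  // removes the entry from the table
    if (node.empty()) {
        return false;
    }
    node.key() = new_key;                // safe: the node is no longer in the table
    table.insert(std::move(node));       // hashed again under the new key
    return true;
}
```

The key is never modified while the entry is inside the table, which is exactly the rule stated above.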
 
Some typos. I still need to regulate my coffee intake it seems :P

Lasse Vågsæther Karlsen wrote:
In this context, you cannot modify the values of the keys in such a way
that the hash code changes. Put differently, the keys should be
immutable, if not enforcable then at least you should treat them as such

... if not enforceable by the compiler/runtime, then at least ...
and not use them.

... and not change them.
<snip>
 