serialization and circular references

  • Thread starter: Eric Eggermann

Eric Eggermann

I'm having a problem with really large file sizes when serializing the
classes that describe my little document. There are some circular references
which result in the same object getting written to disk multiple times. Now
I'm using just basic serialization as described in MSDN. Clearly, I need to
stop serializing these parent references, but I do need to re-instate them
to the proper objects when de-serializing. Can anyone help me with a
strategy to do that?

TIA,
Eric Eggermann
 
References to parents are difficult. Generally you should null out all
parent references before you serialize, serialize the object tree, and then
restore the references after you deserialize, because the deserialized
parent obviously won't be the same instance that was serialized.

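Such as:

    foreach (myDoodaa o in myObjectTree)
        o.Parent = null;   // drop the back-references before saving
    // serialize...

then:

    // Deserialize returns object, so a cast is required
    MyObjectTree myObjectTree = (MyObjectTree)bf.Deserialize(..);
    foreach (myDoodaa o in myObjectTree)
        o.Parent = this;   // restore the back-references

etc.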

If you have references to items in a deeply nested tree which refer to other
items in the tree you'll need to have some other way of identifying and
reconstructing the references such as keeping all the objects in a flat list
as well and referring to their index in that list or something.
--
Bob Powell [MVP]
C#, System.Drawing
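
The flat-list idea above could look roughly like the sketch below. It is only an illustration: the Item class and the FlatIndex helper are hypothetical names, not anything from the poster's code, and it assumes every parent is itself a node of the same tree.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type; Parent is the back-reference we do not want to persist.
class Item
{
    public string Name;
    public Item Parent;
    public List<Item> Children = new List<Item>();
    public Item(string name) { Name = name; }
}

static class FlatIndex
{
    // Walk the tree depth-first and append every node to one flat list.
    public static List<Item> Flatten(Item root)
    {
        var flat = new List<Item>();
        var stack = new Stack<Item>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            Item current = stack.Pop();
            flat.Add(current);
            foreach (Item child in current.Children) stack.Push(child);
        }
        return flat;
    }

    // Describe each Parent reference by its position in the flat list (-1 = no parent).
    public static int[] ParentIndices(List<Item> flat)
    {
        var position = new Dictionary<Item, int>();
        for (int i = 0; i < flat.Count; i++) position[flat[i]] = i;

        var indices = new int[flat.Count];
        for (int i = 0; i < flat.Count; i++)
            indices[i] = flat[i].Parent == null ? -1 : position[flat[i].Parent];
        return indices;
    }

    // After loading, turn the stored indices back into object references.
    public static void RestoreParents(List<Item> flat, int[] indices)
    {
        for (int i = 0; i < flat.Count; i++)
            flat[i].Parent = indices[i] < 0 ? null : flat[indices[i]];
    }
}

static class Program
{
    static void Main()
    {
        var root = new Item("root");
        var child = new Item("child") { Parent = root };
        root.Children.Add(child);

        List<Item> flat = FlatIndex.Flatten(root);
        int[] indices = FlatIndex.ParentIndices(flat);

        foreach (Item item in flat) item.Parent = null; // state that would be saved
        FlatIndex.RestoreParents(flat, indices);        // state after loading

        Console.WriteLine(ReferenceEquals(child.Parent, root)); // True
    }
}
```

In a real save format the flat list and the index array would be what gets written out, with the Parent fields left out of the stream.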

 
[snip]
> If you have references to items in a deeply nested tree which refer to other
> items in the tree you'll need to have some other way of identifying and
> reconstructing the references such as keeping all the objects in a flat list
> as-well and referring to their index in that list or something.

Thanks Bob. Yeah, there's a tree, but it's not all that deep. Just three
levels, not counting two collection classes. Objects only ever refer to the
one above, skipping the collection classes.

Anyway, I've been thinking about doing the custom serialization thing, and
since the references are all in a 'straight line' so to speak, each object
can restore the parent properties of its field objects, provided it knows
when to do this. How can I tell when my top-level object is deserialized?

Eric Eggermann
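
One possible answer to the "when" question is the IDeserializationCallback interface, whose OnDeserialization method the formatter calls after the whole graph has been rebuilt. The sketch below is a guess at how that might fit, not code from the thread: FlashSet and Card borrow the type names mentioned later in the discussion, but their members are invented here, and it assumes the .NET Framework, since BinaryFormatter is obsolete and disabled in current .NET releases.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
class Card
{
    [NonSerialized] public FlashSet Parent; // not written to the stream
    public string Text;
}

[Serializable]
class FlashSet : IDeserializationCallback
{
    public List<Card> Cards = new List<Card>();

    public void Add(Card card)
    {
        card.Parent = this;
        Cards.Add(card);
    }

    // Called by the formatter once the entire graph has been deserialized,
    // so the Cards list is fully populated at this point.
    public void OnDeserialization(object sender)
    {
        foreach (Card card in Cards) card.Parent = this;
    }
}

static class Program
{
    // Serialize the root once, then read it back from the same buffer.
    public static FlashSet RoundTrip(FlashSet set)
    {
        var bf = new BinaryFormatter();
        using (var ms = new MemoryStream())
        {
            bf.Serialize(ms, set);
            ms.Position = 0;
            return (FlashSet)bf.Deserialize(ms);
        }
    }

    static void Main()
    {
        var set = new FlashSet();
        set.Add(new Card { Text = "front" });

        FlashSet copy = RoundTrip(set);

        // The parent link was never in the stream; the callback rebuilt it.
        Console.WriteLine(ReferenceEquals(copy.Cards[0].Parent, copy));
    }
}
```

Each level of the tree can implement the same callback for its own children, which matches the 'straight line' layout described above.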
 
Eric,
> There are some circular references
> which result in the same object getting written to disk multiple times.
Are you writing a single graph or multiple graphs to your file? In other
words, are you calling Formatter.Serialize once or are you calling
Formatter.Serialize multiple times for a given stream?

If you call Formatter.Serialize multiple times, the serializer cannot track
references to the same object across calls, so it will serialize that object
multiple times. I find it's easier & better to serialize the root graph and
call Formatter.Serialize a single time!
> Clearly, I need to
> stop serializing these parent references, but I do need to re-instate them
If you have a tree, all you need to do is serialize the root of the tree and
the entire tree will be serialized. If 50% of the nodes of the tree refer to
a single common object, this single common object will be serialized once.

For places where I do not want to serialize 'parent references' yet still
want to re-establish them on deserialization, I implement the ISerializable
interface and serialize an identifier that can be used to look up the
'parent reference' when I deserialize. Another option is implementing the
IObjectReference interface, as described in Part 2 of the following
articles. This is useful for objects that implement the Singleton pattern.

http://msdn.microsoft.com/msdnmag/issues/02/04/net/
http://msdn.microsoft.com/msdnmag/issues/02/07/net/
http://msdn.microsoft.com/msdnmag/issues/02/09/net/

Note I find all three articles invaluable when working with .NET
serialization.

Hope this helps
Jay
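
The difference between one Serialize call and several can be checked with a small experiment along these lines. This is only a sketch with made-up types (Shared, Node), and it assumes the .NET Framework, since BinaryFormatter is obsolete and disabled in current .NET releases.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
class Shared { public string Name = "common"; }

[Serializable]
class Node
{
    public Shared Common; // several nodes point at the same Shared instance
    public Node Next;
}

static class Program
{
    // One Serialize call for the whole graph: the formatter tracks object
    // identity, so the shared object is written once and stays shared.
    public static bool SharedAfterSingleCall(Node root)
    {
        var bf = new BinaryFormatter();
        using (var ms = new MemoryStream())
        {
            bf.Serialize(ms, root);
            ms.Position = 0;
            var copy = (Node)bf.Deserialize(ms);
            return ReferenceEquals(copy.Common, copy.Next.Common);
        }
    }

    // Two Serialize calls on the same stream: each call is a separate graph,
    // so the shared object is written twice and comes back as two instances.
    public static bool SharedAfterTwoCalls(Node root)
    {
        var bf = new BinaryFormatter();
        using (var ms = new MemoryStream())
        {
            bf.Serialize(ms, root);
            bf.Serialize(ms, root.Next);
            ms.Position = 0;
            var first = (Node)bf.Deserialize(ms);
            var second = (Node)bf.Deserialize(ms);
            return ReferenceEquals(first.Common, second.Common);
        }
    }

    static void Main()
    {
        var shared = new Shared();
        var root = new Node { Common = shared, Next = new Node { Common = shared } };

        Console.WriteLine(SharedAfterSingleCall(root)); // identity preserved
        Console.WriteLine(SharedAfterTwoCalls(root));   // identity lost across calls
    }
}
```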
 
Jay B. Harlow said:
> Eric,
> Are you writing a single graph or multiple graphs to your file? In other
> words are you calling Formatter.Serialize once or are you calling
> Formatter.Serialize multiple times, for a given stream?
>
> If you call Formatter.Serialize multiple times, the serializer cannot track
> references to the same object, so it will serialize that object multiple
> times. I find its easier & better to serialize the root graph and call
> Formatter.Serialize a single time!
> ...

Thanks a lot Jay,
I've sort of solved my problem, but your post showed me where I was
wrong in diagnosing it. I was writing one graph, and did have all the refs
in a straight line down the tree. I'd first noticed the problem when using
serialization to create a deep copy of an object in the center of my tree.
It was not the parent itself being copied too many times. Rather, all the
siblings of my object were being pulled in through the parent reference,
which pushed the file sizes up WAAAY too much and made the copy process take
a very long time.

I fixed (hacked) the problem by marking all parent refs in the tree
non-serializable, re-instating the parent reference at the end of the clone
method, then implementing SaveToFile, and FromFile methods in my root
object, and then re-instating the refs again at the end of the FromFile
method. So it is working as expected.

When I save the whole tree, the files are still way too big, but the size
increases in proper proportions, and not exponentially, so that problem
belongs under another subject, and of course, I'll give tightening up the
size a good whack before posting again.

Thanks for the help.

Eric
 
Eric,
How many nodes have the parent ref?
> I fixed (hacked) the problem by marking all parent refs in the tree
> non-serializable, re-instating the parent reference at the end of the clone
> method, then implementing SaveToFile, and FromFile methods in my root
> object, and then re-instating the refs again at the end of the FromFile
> method. So it is working as expected.
I would suggest you put the NonSerializedAttribute on each parent ref you do
not want serialized. Of course when you deserialize these references will be
null.

    [Serializable]
    class Node
    {
        [NonSerialized]
        SomeClass parent;       // skipped by the formatter; null after deserialization

        SomeOtherClass data;    // serialized along with the Node
    }

In the above, contents of the data field will be serialized with the class,
while parent will not.

Alternatively I would suggest you check out the ISerializable interface, as
I suspect that will be a 'cleaner' implementation than what you are
currently doing.

Hope this helps
Jay
 
Jay B. Harlow said:
> Eric,
> How many nodes have the parent ref?

Almost all, and they point up one level. So under the root, there are 4 levels.
I looked for ways to do away with the parent entirely, but that isn't really
possible, without moving some routines out of the class they logically
belong in.

I'm going to have a good look at ISerializable, and I may have to make big
changes to my model anyway.

Thanks,

Eric
 
Eric,
I misstated my question.

How many node types have the parent ref?

What I'm asking is: do you have one or two node types? If so, implementing
ISerializable would not be that much effort. However, if you have 50 or 60
distinct node types, ISerializable may be more effort, even with a base
class that has "just" the parent ref in it.

Hope this helps
Jay
 
Jay B. Harlow said:
> Eric,
> I misstated my question.
>
> How many node types have the parent ref?

Still 4. Yeah, I can implement it. It's not so much. The model looks like
this
FlashSet (root)
    CardsCollection
        Card (ref to FlashSet)
            Panel (ref to Card)
                ElementsCollection (ref to Panel)
                    Element (ref to Panel)

Element is a base class for two other types.
So it's not such a big deal. Think I'll use ISerializable anyway, and then I
can easily see which bits are taking up the most space.

Eric
 