Serialization takes too long. Any suggestions?

  • Thread starter: Guest

I have binary data files that I read and write through a stream created
by serialization/deserialization. These data files are very large, and
deserializing the stream takes too much time.

I have already implemented the SerializationInfo/StreamingContext
constructors and GetObjectData, and I also use OnDeserialization, but
deserializing the stream still takes too much time. Can anyone suggest
something else that can be done to speed up deserialization?

I am now at the point of breaking up the full serialization/deserialization
by not adding/getting some of the lower-level objects and instead reading
and writing those in some additional flat binary files. If I do this, can I
simply perform this work from within the SerializationInfo constructor and
the GetObjectData method? Has anyone tried this? Would it be faster if I took
this out of the flat binary files and moved it all to SQL Server?
 
You'll have to explain more about these "files" and what they are serialized
from.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

What You Seek Is What You Get.
 
The binary file is a serialization. It contains objects containing objects,
5 levels deep. The various objects contain ArrayLists of objects, Hashtables
of objects and, of course, value fields. Serialization and deserialization
work fine. It is mainly deserialization that takes too long; serialization
is a bit slow too, but not terribly bad. A test file is about 31 MB, and the
files will assuredly get bigger. If I can somehow speed up deserialization,
I would have no problem. I am already using the SerializationInfo
constructor and the GetObjectData method to deserialize and serialize, and I
am using OnDeserialization to do anything else I may need to do (which is
very little) after deserialization. Can something else be done to speed up
deserialization?

What I am thinking of doing is serializing only a portion of the "call
graph" (I think that is the correct term). The part of the "call graph" I do
not serialize I am going to place in a different binary data file that is
not loaded and unloaded using serialization techniques. If I have to do
this, do you think it is OK to place the calls that store the other data
within the GetObjectData method? I will place the corresponding data-loading
code in the OnDeserialization method, but will the OnDeserialization methods
be called only after all the SerializationInfo constructors for the entire
"call graph" have been called and all the objects created?


--
Ed Reyes
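
The flat-file idea described above can be pictured with a small, language-neutral sketch. It is written in Python, not .NET, and `pickle` only stands in for a general-purpose binary formatter. The record layout (one integer id plus one double value) and the names `RECORD`, `write_flat`, and `read_flat` are assumptions for illustration; the thread never shows the real classes.

```python
import pickle
import struct

# Assumed layout: 4-byte int id + 8-byte double value, little-endian, no padding.
RECORD = struct.Struct("<id")

def write_flat(records):
    """Pack the records back to back with no per-object metadata."""
    return b"".join(RECORD.pack(rec["id"], rec["value"]) for rec in records)

def read_flat(blob):
    """Rebuild the records by walking the buffer in fixed-size steps."""
    return [{"id": i, "value": v} for i, v in RECORD.iter_unpack(blob)]

records = [{"id": i, "value": i * 0.5} for i in range(1000)]
flat = write_flat(records)          # fixed-size records only
graph = pickle.dumps(records)       # generic serializer, writes structure per object
restored = read_flat(flat)
```

The flat form trades flexibility for size: it cannot express shared references or varying types, which is why it would suit only the simple leaf objects at the bottom of the graph.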



 
I would remove anything that is not needed. Maybe serialize to a file and
review the data.
Adjust scope (make things private/internal/friend when you can). Binary
serialization includes privates...
Avoid any duplicate data.

Another thing to watch for is any exceptions raised/caught during
serialization/deserialization; this can really slow it down.
Your transport protocol makes a difference too; some use compression.

That's all I can think of.

Schneider
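
As a rough illustration of "remove anything that is not needed", the sketch below writes only the source data and rebuilds a derived lookup table after loading. It is Python rather than .NET, and the class `Catalog` and its methods `get_state` and `from_state` are hypothetical names chosen to mirror, in spirit only, the roles of GetObjectData and the deserialization constructor.

```python
import pickle

class Catalog:
    """Hypothetical type holding source data plus a derived lookup table."""

    def __init__(self, items):
        self.items = list(items)
        # Derived data: can always be recomputed from self.items.
        self.index = {name: pos for pos, name in enumerate(self.items)}

    def get_state(self):
        """Choose what gets written: only the source data."""
        return {"items": self.items}

    @classmethod
    def from_state(cls, state):
        """Rebuild the object; __init__ recomputes the index."""
        return cls(state["items"])

catalog = Catalog(f"item-{n}" for n in range(500))
lean = pickle.dumps(catalog.get_state())
full = pickle.dumps({"items": catalog.items, "index": catalog.index})
restored = Catalog.from_state(pickle.loads(lean))
```

Whether recomputing beats reading depends on how expensive the derived data is to rebuild, so this is worth measuring rather than assuming.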



 
I'm not sure what to recommend. At 31 MB of data, serialization is going to
take some time. It will take less if the objects are less complex. For
example, it will take less time to serialize/deserialize a single 31 MB
Image than a Collection of 31 1 MB Images. I also noticed that you're using
ArrayLists and Hashtables. Using strongly-typed Collections would probably
improve the speed of serialization/deserialization, as it would not entail
the use of reflection as much during the process. And Hashtables are
definitely going to slow down performance. An MSDN Magazine online article
(http://msdn.microsoft.com/msdnmag/issues/02/07/net/) states, regarding
binary serialization of Hashtables:

"Occasionally, you will design a type that requires complete control over
how it is serialized and deserialized. The System.Collections.Hashtable type
is just such a type. When serialized, a Hashtable object and all the objects
it references must be written to the stream. Upon deserialization, a new
Hashtable object must be constructed, all the objects managed by the
Hashtable object must be constructed, and all the object references must be
set correctly. The problem is that hash codes are not guaranteed to be the
same for the newly deserialized objects. So deserializing a Hashtable
requires that all the objects be deserialized first and then each of the
objects must be manually added to the Hashtable object using each object's
new hash code value."

That's about all I can think of.

--
Kevin Spencer
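
The point in the quoted passage, that a hash table has to be refilled once its keys have been reconstructed, can be seen in a few lines. This is a Python analogy, not .NET code: a plain `object()` hashes by identity rather than content, so a deserialized copy is a different key from the original.

```python
import pickle

key = object()                 # hash is tied to identity, not to content
table = {key: "payload"}

# Serialize the key and the table together so they stay linked.
loaded_key, loaded_table = pickle.loads(pickle.dumps((key, table)))

same_object = loaded_key is key                   # a new object was built
found_with_new_key = loaded_key in loaded_table   # table was refilled on load
found_with_old_key = key in loaded_table          # old identity is unknown to it
```

The lookup with the reconstructed key succeeds only because the table was repopulated after the key existed, which is the extra work the article describes.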


 

Two suggestions:

1. Try to split the data in multiple partitions that are
loaded/deserialized on demand.

2. Don't use serialization at all. I don't believe that serialization is
optimized for mass data storage. Change your persistence implementation to
something like db4o (www.db4o.com). It can handle object hierarchies and
searching in your data better.

my 0.02$,
Wolfgang
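
The first suggestion, partitions loaded on demand, might look roughly like the sketch below. It is Python rather than .NET; the class `PartitionedStore`, the partition names, and the one-file-per-partition layout are all assumptions made for illustration, not something proposed in the thread.

```python
import pickle
import tempfile
from pathlib import Path

class PartitionedStore:
    """Keeps each partition in its own file and loads it on first access."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self._cache = {}   # partition name -> data already loaded
        self.loads = 0     # counts real disk reads, for illustration only

    def _path(self, name):
        return self.directory / f"{name}.bin"

    def save(self, name, records):
        with open(self._path(name), "wb") as f:
            pickle.dump(records, f)
        self._cache.pop(name, None)  # drop any stale in-memory copy

    def get(self, name):
        if name not in self._cache:  # deserialize only when first needed
            with open(self._path(name), "rb") as f:
                self._cache[name] = pickle.load(f)
            self.loads += 1
        return self._cache[name]

with tempfile.TemporaryDirectory() as tmp:
    store = PartitionedStore(tmp)
    store.save("part-a", [1, 2, 3])
    store.save("part-b", [4, 5, 6])
    first = store.get("part-a")
    again = store.get("part-a")      # served from the cache, no second read
    loads_after_two_reads = store.loads
```

Only the partitions that are actually touched pay the deserialization cost, so startup time no longer grows with the total size of the data.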