Fast deserialisation of strings from byte[]

Guest · Sep 5, 2005

I have an application that performs custom deserialisation of object state
from byte arrays. This happens very regularly, so it needs to be fast. In
addition, most of the strings repeat, meaning I'm deserialising the same
sequence of bytes repeatedly, giving the same output string. Let's ignore
the text encoding method, as it's not relevant to my question.

Right now, I'm using BinaryReader.ReadString() which gives the correct
result, however it creates a new instance of System.String for each byte
sequence. What I'd really like is to detect the repeated byte sequence, and
return a reference to an existing deserialised version.

A colleague put me onto string.Intern, but this won't help as by the time
I'm calling that method, I've already allocated the string.
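
To make the objection concrete, here is a minimal sketch of the decode-then-intern approach. The helper name DecodeAndIntern and the choice of UTF-8 are assumptions for illustration only; the thread deliberately leaves the encoding open.

```csharp
using System;
using System.Text;

static class InternDemo
{
    // Hypothetical helper: decodes the bytes, then interns the result.
    public static string DecodeAndIntern(byte[] buffer, int offset, int count)
    {
        // The allocation happens here, before Intern is ever called.
        string fresh = Encoding.UTF8.GetString(buffer, offset, count);

        // Intern returns the pooled instance; on a repeat, 'fresh' is
        // immediately garbage.
        return string.Intern(fresh);
    }

    static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("EURUSD");
        string a = DecodeAndIntern(data, 0, data.Length);
        string b = DecodeAndIntern(data, 0, data.Length);
        Console.WriteLine(object.ReferenceEquals(a, b)); // True
    }
}
```

The callers do end up sharing one instance, but every call still pays for a temporary string, so the pressure on generation 0 is unchanged. Interned strings also tend to stay alive for the life of the runtime, which may not suit values that are only known at runtime.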

Note that these strings are very short lived. After deserialisation, they
will be processed and (for the most part) garbage collected before they get
promoted to generation 1. This happens several thousand times a second under
normal conditions, giving the garbage collector (what I assume is) a lot of
work. I'm seeing the classic sawtooth pattern in a heap timeline but with
very high frequency.

I'd like to know whether this is a situation in which I can improve
performance. I can envisage some sort of structure (perhaps a Trie) that
homes in on the stored string as we progress through the byte sequence.
However this structure cannot be pre-populated (the strings will be
determined at runtime).
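
A rough sketch of what such a trie might look like follows. The class name ByteTrieStringCache, the 256-way child arrays and the use of UTF-8 are all assumptions made for illustration, not a measured or recommended design.

```csharp
using System;
using System.Text;

// Hypothetical byte trie: one node per byte of every sequence seen so far.
sealed class ByteTrieStringCache
{
    private sealed class Node
    {
        public Node[] Children; // 256 slots, allocated on first use
        public string Value;    // set once a sequence ending here has been decoded
    }

    private readonly Node _root = new Node();

    public string GetOrAdd(byte[] buffer, int offset, int count)
    {
        Node node = _root;
        for (int i = offset; i < offset + count; i++)
        {
            if (node.Children == null) node.Children = new Node[256];
            Node next = node.Children[buffer[i]];
            if (next == null)
            {
                // Only reached the first time this prefix is seen.
                next = new Node();
                node.Children[buffer[i]] = next;
            }
            node = next;
        }

        // Decode (and allocate) only on the first sighting of the sequence.
        if (node.Value == null)
            node.Value = Encoding.UTF8.GetString(buffer, offset, count);
        return node.Value;
    }
}

static class TrieDemo
{
    static void Main()
    {
        var cache = new ByteTrieStringCache();
        byte[] data = Encoding.UTF8.GetBytes("abcabc");
        string first = cache.GetOrAdd(data, 0, 3);  // decodes "abc"
        string second = cache.GetOrAdd(data, 3, 3); // finds the stored "abc"
        Console.WriteLine(object.ReferenceEquals(first, second)); // True
    }
}
```

On a repeat, the lookup walks existing nodes and allocates nothing. The cost is memory: each 256-slot child array is large, and this sketch never evicts anything, so it only makes sense if the set of distinct strings stays small.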

The big question is: do the benefits of reducing string allocation justify
the overhead in finding a stored string? This, no doubt, depends upon the
implementation.

There may also be knock-on benefits from knowing that strings with the same
value are identical objects (e.g. using object.ReferenceEquals rather than
object.Equals), but this is secondary.

This seems to me a great performance question. I hope others find it as
interesting as I do and will share their ideas and experience.

Regards,

Drew Noakes.

Cor Ligthert [MVP] · Sep 5, 2005

Drew,

After reading your question twice, I think the answer is in the first
section of your question.

A string in .NET is never mutable. A new one is always built, even for the
slightest change.

The only "string-like" alternative is a StringBuilder, which is a kind of
collection of characters; maybe that can help you.

http://msdn.microsoft.com/library/d...ml/frlrfsystemtextstringbuilderclasstopic.asp

Be aware that the description there is wrong: there cannot be a mutable
string. The remarks section states it correctly.

I hope this helps,

Cor

Guest · Sep 5, 2005

Hi Cor,

Thanks for your prompt response. I'm aware of the behaviour of strings with
regard to mutability, but this issue is different. Perhaps I didn't explain
myself clearly enough. I simply do not want to instantiate two different
string objects that have the same value.

Therefore, when I'm stepping through the byte[], the first time I see a
given pattern I would create the string and store it. The next time I see
the same pattern, I'll return a reference to the string I have stored. This
avoids the overhead of having two strings on the heap that have identical
values.

Bear in mind that I'm talking about doing this many many times a second, to
a point where I believe there is a performance gain to be reaped from this
added complexity.

Regards,

Drew.

Cor Ligthert [MVP] · Sep 5, 2005

Drewnoakes,

This sounds like a job for the Hashtable (dictionary), although I doubt that
it will give you any benefit.

The byte pattern goes in the key and the string in the value.

http://msdn.microsoft.com/library/d...frlrfSystemCollectionsHashtableClassTopic.asp

I hope it helps anyway.

Cor

Guest · Sep 5, 2005

Hi Cor,

If I use a Hashtable, I must create a new byte[] which in turn is another
object allocation. I wish to achieve this lookup without allocating any
object on the heap.

Drew.

Cor Ligthert [MVP] · Sep 5, 2005

Drewnoakes,

Are you sure of that? The hashtable holds objects, not values.

http://msdn.microsoft.com/library/d...rfsystemcollectionshashtableclassaddtopic.asp

I hope this helps,

Cor

Guest · Sep 6, 2005

Hi Cor,

Keying a hash table on byte[] will not reduce my memory overhead. Besides,
I still have to allocate an object (byte[] is an object, not a value-type)
before I can look up the string in the hashtable.
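
There is a further wrinkle with keying on byte[], illustrated by the small sketch below (the class and method names are made up for the example). Arrays inherit Object's reference-based Equals and GetHashCode, so a second array with the same contents does not find the entry stored under the first.

```csharp
using System;
using System.Collections;

static class ByteArrayKeyDemo
{
    // Returns whether a different array with equal contents finds the entry.
    public static bool FoundByEqualContents()
    {
        var table = new Hashtable();
        byte[] key1 = { 1, 2, 3 };
        byte[] key2 = { 1, 2, 3 }; // same contents, different object

        table[key1] = "stored string";

        // Lookup uses reference identity for arrays, so key2 misses.
        return table.ContainsKey(key2);
    }

    static void Main()
    {
        Console.WriteLine(FoundByEqualContents()); // False
    }
}
```

Making this work would need a custom comparer over the array contents, and it would still require a freshly allocated byte[] for each lookup, which is the allocation being avoided here.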

Drew.

.neter · Sep 6, 2005

Cor said:
Drewnoakes,

Are you sure of that? The hashtable holds objects, not values.

http://msdn.microsoft.com/library/d...rfsystemcollectionshashtableclassaddtopic.asp

I hope this helps,

Cor

Well, a raw byte[] cannot usefully be used as a key, because arrays are
compared by reference rather than by content. But you can calculate an
integer hash value for the byte sequence and use that hash as a key in a
normal Hashtable.
But as someone has previously said, this will not necessarily improve
the performance.
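
A hedged sketch of the hash-as-key idea follows. The class name HashedStringCache, the FNV-1a hash and the UTF-8 decoding are assumptions for illustration. Two details go beyond what is said above. Different byte sequences can share a hash, so each entry keeps a copy of its bytes and verifies them before returning. The sketch also uses the generic Dictionary<int, ...> (.NET 2.0 and later), because a non-generic Hashtable would box the int key on every lookup, which is itself an allocation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

sealed class HashedStringCache
{
    private sealed class Entry
    {
        public byte[] Bytes;  // private copy of the sequence, used to verify a hit
        public string Value;  // the decoded string
        public Entry Next;    // next entry sharing the same hash, if any
    }

    private readonly Dictionary<int, Entry> _entries = new Dictionary<int, Entry>();

    // 32-bit FNV-1a over the given range; no allocation.
    private static int Hash(byte[] buffer, int offset, int count)
    {
        unchecked
        {
            uint hash = 2166136261;
            for (int i = offset; i < offset + count; i++)
            {
                hash ^= buffer[i];
                hash *= 16777619;
            }
            return (int)hash;
        }
    }

    private static bool Matches(byte[] stored, byte[] buffer, int offset, int count)
    {
        if (stored.Length != count) return false;
        for (int i = 0; i < count; i++)
            if (stored[i] != buffer[offset + i]) return false;
        return true;
    }

    public string GetOrAdd(byte[] buffer, int offset, int count)
    {
        int hash = Hash(buffer, offset, count);
        Entry head;
        _entries.TryGetValue(hash, out head);

        // Hit path: walk the (usually one-element) chain without allocating.
        for (Entry e = head; e != null; e = e.Next)
            if (Matches(e.Bytes, buffer, offset, count)) return e.Value;

        // Miss path: copy the bytes and decode once.
        byte[] copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);
        Entry added = new Entry();
        added.Bytes = copy;
        added.Value = Encoding.UTF8.GetString(buffer, offset, count);
        added.Next = head;
        _entries[hash] = added;
        return added.Value;
    }
}

static class HashedCacheDemo
{
    static void Main()
    {
        var cache = new HashedStringCache();
        byte[] data = Encoding.UTF8.GetBytes("abcabc");
        string first = cache.GetOrAdd(data, 0, 3);
        string second = cache.GetOrAdd(data, 3, 3);
        Console.WriteLine(object.ReferenceEquals(first, second)); // True
    }
}
```

Every lookup now reads each byte twice, once to hash and once to verify. Whether that beats simply allocating short-lived strings can only be settled by measuring under the real workload.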

Best regards
RG
