C# Dictionary size limit


orsula

Hi Guys,

I have a class A composed of several string and int members.
I would like to manage a huge amount (several thousands) of A objects
in a dictionary where each object has its unique key.

Is there any known limitation or best practice that you think I should
consider?

Thanks,
Orsula.z
 

Jon Skeet [C# MVP]

I have a class A composed of several string and int members.
I would like to manage a huge amount (several thousands) of A objects
in a dictionary where each object has its unique key.

Is there any known limitation or best practice that you think I should
consider?

"Several thousand" is actually reasonably small. Basically if you've
got enough room in memory for the objects and a bit of overhead, the
dictionary should be fine.

Jon
 

Cor Ligthert[MVP]

Orsula,

Be aware that, as with every system collection, you are only adding
references to the objects, not the values themselves, so the dictionary
itself can in fact be very small.
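For example (a minimal sketch - the A class here is just a placeholder):

using System;
using System.Collections.Generic;

class A
{
    public int Value;
}

class ReferenceDemo
{
    static void Main()
    {
        A a = new A { Value = 1 };

        Dictionary<string, A> map = new Dictionary<string, A>();
        map.Add("k", a);

        // The dictionary stores a reference to the object, not a copy of it,
        // so a change made through the original reference shows up via the dictionary.
        a.Value = 99;
        Console.WriteLine(map["k"].Value); // 99
    }
}

The per-entry cost inside the dictionary is just the key plus one object reference; the bulk of the memory lives in the objects themselves.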

Cor
 

orsula

"Several thousand" is actually reasonably small. Basically if you've
got enough room in memory for the objects and a bit of overhead, the
dictionary should be fine.

Jon

Thanks (again :) ) Jon
 

Arne Vajhøj

orsula said:
I have a class A composed of several string and int members.
I would like to manage a huge amount (several thousands) of A objects
in a dictionary where each object has its unique key.

Is there any known limitation or best practice that you think I should
consider?

No.

My understanding is that you would be able to stuff 2GB/16B = 125 million
entries into a Dictionary<string,A> on 32-bit Windows.

Arne
 

Jon Skeet [C# MVP]

No.

My understanding is that you would be able to stuff 2GB/16B = 125 million
entries into a Dictionary<string,A> on 32-bit Windows.

Not quite. For one thing I think there's an upper limit of about 1GB
per object (including the entries array) in the CLR. I've also found
that for some reason (using a Dictionary<int,int> to avoid creating
any extra objects) the memory per entry averages around 40 bytes
rather than 16. No idea why at the moment - 16 would make sense to me
as well, although there's bound to be *some* extra overhead for the
buckets etc. Possibly alignment issues? Hmm...
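For reference, the kind of measurement I mean is roughly this (only a sketch - GC.GetTotalMemory figures are approximate, and the per-entry number depends on where the dictionary last resized):

using System;
using System.Collections.Generic;

class MemoryPerEntry
{
    static void Main()
    {
        const int count = 1000000;

        long before = GC.GetTotalMemory(true);

        // int keys and values mean no extra per-entry objects are allocated,
        // so the difference is (roughly) just the dictionary's own arrays.
        Dictionary<int, int> map = new Dictionary<int, int>();
        for (int i = 0; i < count; i++)
        {
            map[i] = i;
        }

        long after = GC.GetTotalMemory(true);
        Console.WriteLine("~{0:F1} bytes per entry", (after - before) / (double) count);

        GC.KeepAlive(map); // keep the dictionary alive until after the measurement
    }
}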

Jon
 

Cor Ligthert [MVP]

A little bit difficult, because then all pairs would be referencing null,
and that is not possible.

Cor
 

Brian Gideon

Not quite. For one thing I think there's an upper limit of about 1GB
per object (including the entries array) in the CLR. I've also found
that for some reason (using a Dictionary<int,int> to avoid creating
any extra objects) the memory per entry averages at around 40 bytes
rather than 16. No idea why at the moment - 16 would make sense to me
as well, although there's bound to be *some* extra overhead for the
buckets etc. Possibly alignment issues? Hmm...

Jon

I just Reflector'd it. In addition to the key and the value the hash
code (int) and a next pointer (int) are also stored on each entry.
That would account for 16B. There is also an int[] array that is used
for indexing the entries array. That gets us to 20B. The resizing
algorithm uses the first prime number greater than twice the
hashtable's current size as the new size. That would easily get you to 40.
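Paraphrasing what Reflector shows (a sketch of the layout, not the actual BCL source):

// Each slot in the entries array looks roughly like this on 32-bit:
struct Entry<TKey, TValue>
{
    public int hashCode;  // cached hash code of the key          - 4 bytes
    public int next;      // index of the next entry in the chain - 4 bytes
    public TKey key;      // 4 bytes for an int key or a reference
    public TValue value;  // 4 bytes for an int value or a reference
}

// The dictionary also keeps an int[] buckets array of the same length as the
// entries array, which is where the extra 4 bytes per slot (16B -> 20B) come from.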

Since we're discussing very large dictionaries I noticed that the
prime table is preloaded up to 7,199,369 (which is not to say that all
primes up to that value are included). Once it reaches that size the
next resize operation will start using the naive algorithm for
primality testing, but will *probably* yield 14,398,753, 28,797,523,
and 57,595,063. That gives us jumps to 288, 576, and 1152 MB
respectively. If the 1GB limit is indeed in play for arrays then we
might expect ~28 million to be the upper limit. Pure speculation.
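A back-of-the-envelope sketch of those jumps (using a naive trial-division check rather than the BCL's actual code, so the exact primes may differ slightly from the values above):

using System;

class GrowthEstimate
{
    // Naive trial-division primality test - roughly the kind of check the
    // resize path falls back to once the preloaded prime table runs out.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
        {
            if (n % i == 0) return false;
        }
        return true;
    }

    // First prime strictly greater than n.
    static int NextPrimeAbove(int n)
    {
        int candidate = n + 1;
        while (!IsPrime(candidate)) candidate++;
        return candidate;
    }

    static void Main()
    {
        int size = 7199369; // the last preloaded prime mentioned above

        for (int step = 0; step < 3; step++)
        {
            size = NextPrimeAbove(size * 2);
            // ~20 bytes per slot: 16 for the entry itself plus 4 for the bucket index.
            Console.WriteLine("{0:N0} slots -> roughly {1:N0} MB", size, size * 20.0 / 1e6);
        }
    }
}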
 

Jon Skeet [C# MVP]

I just Reflector'd it.  In addition to the key and the value the hash
code (int) and a next pointer (int) are also stored on each entry.
That would account for 16B.  There is also an int[] array that is used
for indexing the entries array.  That gets us to 20B.

For some reason I didn't think that the int[] was the same length as
the main array - but I see that it is.

The resizing algorithm uses the first prime number greater than twice the
hashtable's current size as the new size. That would easily get you to 40.

Ah, I've just worked out a flaw in my test - I was only outputting the
result when the amount of memory changed, which would always give the
worst case. Printing out the best case as well shows around 20 bytes,
which now makes sense.

Since we're discussing very large dictionaries I noticed that the
prime table is preloaded up to 7,199,369 (which is not to say that all
primes up to that value are included).  Once it reaches that size the
next resize operation will start using the naive algorithm for
primality testing, but will *probably* yield 14,398,753, 28,797,523,
and 57,595,063. That gives us jumps to 288, 576, and 1152 MB
respectively.  If the 1GB limit is indeed in play for arrays then we
might expect ~28 million to be the upper limit.  Pure speculation.

Don't forget that the 1GB limit applies to individual objects AFAIK -
so only the 16 bytes per entry of the "main" array. The int[] is
separate. 57,595,063*16 = 921 521 008, which should be okay. However,
you've also got to have enough room for the old version as well, so
your whole process needs 1.5x as much memory, unless you preallocate
that capacity.
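That is, something along these lines (a sketch - pick whatever expected capacity suits your data):

using System.Collections.Generic;

class Preallocate
{
    static void Main()
    {
        const int expected = 10000000;

        // Passing the expected capacity up front makes the dictionary allocate its
        // arrays once at (roughly) the right size, so it never has to hold the old
        // and the new arrays at the same time during a resize.
        Dictionary<int, int> map = new Dictionary<int, int>(expected);

        for (int i = 0; i < expected; i++)
        {
            map[i] = i;
        }
    }
}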

Unfortunately I don't know the exact details of all the limits, so I
won't analyse this any further, for fear of giving the wrong
information.

Jon
 

Brian Gideon

Don't forget that the 1GB limit applies to individual objects AFAIK -
so only the 16 bytes per entry of the "main" array. The int[] is
separate. 57,595,063*16 = 921 521 008, which should be okay. However,
you've also got to have enough room for the old version as well, so
your whole process needs 1.5x as much memory, unless you preallocate
that capacity.

Ah yes. Good points. Also, for the sake of full disclosure, that
57,595,063 value was based on a quick statistical test for primality, so
it may not even be prime for all I know. I'm sure there is a prime
somewhere in that ballpark, so it's probably not going to skew the
numbers too much.

Unfortunately I don't know the exact details of all the limits, so I
won't analyse this any further, for fear of giving the wrong
information.

Understood.
 

Brian Gideon

A little bit difficult, because then all pairs should be referencing to null
and that is not possible.

Cor

Well, as long as TValue is a reference type, the value could be null. I
haven't actually tested that because I've never done it, but
it should be possible according to the documentation.
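For example (a quick sketch, just to show that a null value compiles and behaves as expected):

using System;
using System.Collections.Generic;

class NullValueDemo
{
    static void Main()
    {
        // TValue is string here - a reference type - so null is a valid value.
        // (A null key is not allowed, but a null value is.)
        Dictionary<string, string> map = new Dictionary<string, string>();
        map.Add("present but null", null);

        Console.WriteLine(map.ContainsKey("present but null")); // True
        Console.WriteLine(map["present but null"] == null);     // True
    }
}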
 
