Not quite. For one thing, I think there's an upper limit of about 1GB
per object (including the entries array) in the CLR. I've also found
that for some reason (using a Dictionary<int,int> to avoid creating
any extra objects) the memory per entry averages around 40 bytes
rather than 16. No idea why at the moment - 16 would make sense to me
as well, although there's bound to be *some* extra overhead for the
buckets etc. Possibly alignment issues? Hmm...
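
Something like this is the sort of test I mean - a sketch rather than
exactly what I ran, and the numbers wobble a bit between runs:

using System;
using System.Collections.Generic;

class DictionaryMemoryTest
{
    static void Main()
    {
        const int Count = 1000000;

        // Force full collections so the before/after delta is meaningful.
        long before = GC.GetTotalMemory(true);
        Dictionary<int, int> dict = new Dictionary<int, int>();
        for (int i = 0; i < Count; i++)
        {
            dict[i] = i;
        }
        long after = GC.GetTotalMemory(true);

        // Comes out around 40 rather than the 16 you'd expect.
        Console.WriteLine("Bytes per entry: {0}",
                          (after - before) / (double) Count);
        GC.KeepAlive(dict);
    }
}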
Jon
I just Reflector'd it. In addition to the key and the value, the hash
code (an int) and a "next" index (an int) are also stored on each
entry. That would account for 16B. There is also an int[] buckets
array used to index into the entries array. That gets us to 20B.
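
Roughly what it looks like in Reflector - field names from memory, so
treat this as a sketch rather than the exact BCL source:

// Inside Dictionary<TKey, TValue>, approximately:
public class DictionarySketch<TKey, TValue>
{
    private struct Entry
    {
        public int hashCode;  // cached hash of the key (4B)
        public int next;      // index of the next entry in the chain, or -1 (4B)
        public TKey key;      // 4B when TKey is int
        public TValue value;  // 4B when TValue is int
    }

    private int[] buckets;    // one int per slot, indexing into entries (4B per slot)
    private Entry[] entries;  // the flat array the 16B structs live in
}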
The resizing algorithm takes the first prime number greater than twice
the hashtable's current size as the new size, so right after a resize
the capacity is always a bit more than double the count. Paying 20B
per slot at a load factor of roughly a half would easily get you to
40 per stored entry.
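
As back-of-the-envelope arithmetic (mine, not anything decompiled):

using System;

class PerEntryArithmetic
{
    static void Main()
    {
        const int bytesPerSlot = 16 + 4;     // Entry struct + bucket slot
        const double loadAfterResize = 0.5;  // count/capacity just after doubling
        // You pay for roughly twice as many slots as you have entries:
        Console.WriteLine(bytesPerSlot / loadAfterResize); // 40
    }
}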
Since we're discussing very large dictionaries, I noticed that the
prime table is preloaded up to 7,199,369 (which is not to say that all
primes up to that value are included). Once it reaches that size, the
next resize operation will start using the naive primality-testing
algorithm, but will *probably* yield 14,398,753, 28,797,523, and
57,595,063. At 20B per slot, that gives us jumps to 288, 576, and
1152 MB respectively. If the 1GB limit is indeed in play for arrays
then we might expect ~28 million entries to be the upper limit. Pure
speculation.
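
If you want to check that speculation, something along these lines
reproduces the sequence, with my own trial division standing in for
whatever the BCL actually does:

using System;

class GrowthSimulation
{
    // Naive trial-division primality test, like the fallback described above.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int i = 3; (long) i * i <= n; i += 2)
        {
            if (n % i == 0) return false;
        }
        return true;
    }

    static void Main()
    {
        int size = 7199369; // last prime in the preloaded table
        for (int step = 0; step < 3; step++)
        {
            // Grow to the first prime greater than twice the current size.
            int candidate = size * 2 + 1; // 2 * size is even, so start odd
            while (!IsPrime(candidate))
            {
                candidate += 2;
            }
            size = candidate;
            // 20B per slot: 16B Entry + 4B bucket.
            Console.WriteLine("{0:N0} slots -> ~{1:N0} MB",
                              size, size * 20L / 1e6);
        }
    }
}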