Weak Dictionary / Cache

  • Thread starter Thread starter Todor Todorov
  • Start date Start date
T

Todor Todorov

Hi,

I am trying to implement a weak dictionary (more like a lookup table) for
caching purposes. The idea is that when you put an object in it, both the
key and the value (and the key-value pair) are eligible for garbage
collection.

I am used to Smalltalk, and the way this is implemented in Smalltalk is to
use a key-value pair with a weak reference to the value object and strong
reference to the key object. Smalltalk also allows the system to notify YOU
(the dictionary) when a specific 3rd. object (the value object) gets
finalized. That way you can clean up any unwanted objects - or more
precisely, remove the pair object, thus making the pair and key objects
eligible for GC once the value object has been collected.

Some assumptions about my .Net objects.
* The type of value objects for the dictionary is well known.
* The key and value pair 1 to 1. Each value object knows its key.
* I can extend the value object class.
* A value object will be stored in only one cache dictionary.

I've been thinking of implementing the weak dictionary in the following way:

class WeakTable : ...
{
private Dictionary<object, Item> innerDictionary;
void Add(Item item)
{
this.innerDictionary.Add(item.Key, new WeakReference(item);
item.Owner = this;
}
Item Get(object key)
{
return this.innerDictionary[key].Target;
}
}
class Item
{
public object Key;
public WeakTable Owner;
void Finalize()
{
this.Owner.RemoveKey(this.Key);
}
}

The basic idea here is:
1. To use WeakReference as the middle-man inside the weak dictionary, so the
reference to the value object will be weak.
2. To have the value object remove itself from the dictionary when it's
GC'ed, so the pair and key objects can be GC'ed as well.

Of course the version above is very simplified without checks etc. More
methods have to be implemented as well as enumerator that handles the
middle-man WeakReference object correctly.

My questions:
- Does this make sense?
- What kind of GC related race conditions should I wary about?
- Is there anything special GC related that I've missed?
 
- Does this make sense?

It looks like you are looking to rely on the GC to maintain your cache
for you. While being lazy is always fun, I think the amount of
dependency on the GC internals you are injecting into your application
is unhealthy, and is bound to give you problems, most of which will be
trying to track down rare error situations. Your cache will function
very differently in debug and release mode, for example, since the GC
is kept on a tight leash when debugging. Also, if void Finalize() is
really meant to be a finalizer, then I think you're violating
several .NET best practices, and also making life much harder for the
runtime.

Would it really be much extra work writing your own garbage collector,
searching for trash only in your cache?
 
We've used this approach in Smalltalk for over 10 years now without
problems.

Why do you think I should write my own GC when I can have it for free? In
waht way can I do it better than the CLR?

Also, which best practeces am I violating? Wear referencing part of the CLR,
why not use it? Also, except for timing and threading, what's different in
the GC between debug and release?
 
We've used this approach in Smalltalk for over 10 years now without
problems.

I don't think the approach itself isn't problematic. Weak tables is a
classic implementation feature, and I personally think it's elegant as
long as it's used as lazy lookup management. This said,
In waht way can I do it better than the CLR?

That depends on what you mean by "it". If you're basing a cache on the
GC, you have no flexibility. Depending on what you're caching, you
might want to dispose resources based on other criteria than just
those the GC has.
Also, which best practeces am I violating?

Well, if you're using this as an alternative to disposing objects (and
not finalizing them), then I think you're violating the idea of
disposition and suppression of finalization, which takes workload off
the GC, and provides less surprises at runtime. But if you're just
holding simple data in your table, and using the table to keep
possible references to stuff that would normally be out of reach
because its eligibility for collection, then it's a typical
WeakReference scenario, as far as I understand it. (I was under the
impression you wanted us to be the devil's advocate here.)
 
- What kind of GC related race conditions should I wary about?
- Is there anything special GC related that I've missed?

Not strictly GC related, but since you'll have to employ some sort of
synchronization on your actual table (whether it's a hashtable or a
dictionary), and you'll need access to the table in the finalizer, you
may want to be careful not to cause the GC to lock unnecessarily. For
example,

~Item() {
lock (Owner.SyncRoot) {
Owner.RemoveKey(Key);
}
}

Locking inside the finalizer could possibly hamper performance. I'm
not aware of any specific considerations here, but maybe someone else
is. I've never seen Smalltalk, so I don't know if it handles
synchronization on weak tables implicitly.
 
Locking inside the finalizer could possibly hamper performance. [...]

To see what I mean, try the sample implementation below. If you remove
the lock() around the blocks that manipulate the inner dictionary, you
get race sporadic conditions that result in, for example, index out of
range exceptions in the internals of the inner dictionary. I don't
know if you consider this a GC related race condition. I haven't
tested how much the locks hamper performance, since I can't
spontaneously think of a lock free implementation. Since you don't
know the GC details, you have to assume you have to lock. And another
interesting thing to note is the first count printout. In my tests, it
varies between 1500 and 15000. Since you don't know the runtime
details, you can't really assume anything at all here. These are two
things I consider speaking against this type of cache making sense in
the CLR, but YMMV if your implementation has more lax needs.

class Program {
static void Main(string[] args) {
WeakDictionary<string, Item> weakDictionary =
new WeakDictionary<string, Item>();
for (int i = 0; i < 20000; i++) {
Item item = new Item();
item.X = i;
weakDictionary.Add(i.ToString(), item);
}
Console.WriteLine(weakDictionary.Count);
Console.ReadLine();
GC.Collect();
GC.WaitForPendingFinalizers();
Console.WriteLine(weakDictionary.Count);
Console.ReadLine();
}
}
class WeakDictionary<TKey, TValue> {
private Dictionary<TKey, WeakReference> weakRefDict =
new Dictionary<TKey, WeakReference>();
public object SyncRoot {
get {
return ((IDictionary)weakRefDict).SyncRoot;
}
}
public void Add(TKey Key, TValue Value) {
lock (((IDictionary)weakRefDict).SyncRoot) {
WeakItem<TKey, TValue> weakItem =
new WeakItem<TKey, TValue>();
weakItem.Value = Value;
weakItem.Key = Key;
weakItem.OwningWeakDict = this;
WeakReference weakReference =
new WeakReference(weakItem);
weakRefDict.Add(Key, weakReference);
}
}
public void Remove(TKey Key) {
lock (((IDictionary)weakRefDict).SyncRoot) {
weakRefDict.Remove(Key);
}
}
public int Count {
get {
return weakRefDict.Count;
}
}
}
class WeakItem<TKey, TValue> {
public TKey Key;
public TValue Value;
public WeakDictionary<TKey, TValue> OwningWeakDict;
~WeakItem() {
OwningWeakDict.Remove(Key);
}
}
class Item {
public int X;
}
 
Well Tomten,

the race conditions you are talking about are not specific to GC, but
general to any multithreaded environment.

About the GC details, Jeff Richter has a chapter about GC in his book
(http://www.amazon.com/CLR-via-Second-Pro-Developer/dp/0735621632/) -
recommended reading! Most important things:

1. A GC will suspend ALL threads.

2. There are two CLR variants:
- Workstation: Desktop apps and single CPU machines.
- Server: Server apps (IIS, MSSQL etc.) on multiple CPU machines.
On server CLR, each CPU has a heap (for fater allocation) and each CPU
executes a GC thread (for simultaneous GC'ing).

3. Finalizable objects are put in a finalization reachable queue (by the
GC).

4. A DEDICATED THREAD executes the finalizers on objects in the finalization
reachable queue (after the GC cycle has finished). So! Race conditions can
occur as with any other multithreaded app!

5. The downsides of finalization:
- Allocation is slightly more expensive (the CLR will need to track
finalizable objects).
- Finalizers must be fast and avoid blocking.
- Objects get promoted (due to the natire of finalization, and the
finalization queue / finalization reachable queue) => 2 or more GC cycles
are required to reclaim memory
- Objects refered by a finalizable object must survive as well, thus the
life of those objects is prolonged as well.
- You never know when finalization happens! That's OK with me.

Conclusion:
The dedicated thread that executes the finalizers sets some restrictions. No
thread local storage, exceptios, avoiding waits deadlocks etc. Coding the
weak dictionary must be carefull to avoid deadlocks. That's all.

I have an even bigger problem. Since the app is distributed, some objects
are proxies to web services. It's no use locking those, because only the
proxy will be locked, not the real object. So my app will have to be
tollerant and live without locking and synchronization :-(

-- Todor


UL-Tomten said:
Locking inside the finalizer could possibly hamper performance. [...]

To see what I mean, try the sample implementation below. If you remove
the lock() around the blocks that manipulate the inner dictionary, you
get race sporadic conditions that result in, for example, index out of
range exceptions in the internals of the inner dictionary. I don't
know if you consider this a GC related race condition. I haven't
tested how much the locks hamper performance, since I can't
spontaneously think of a lock free implementation. Since you don't
know the GC details, you have to assume you have to lock. And another
interesting thing to note is the first count printout. In my tests, it
varies between 1500 and 15000. Since you don't know the runtime
details, you can't really assume anything at all here. These are two
things I consider speaking against this type of cache making sense in
the CLR, but YMMV if your implementation has more lax needs.

class Program {
static void Main(string[] args) {
WeakDictionary<string, Item> weakDictionary =
new WeakDictionary<string, Item>();
for (int i = 0; i < 20000; i++) {
Item item = new Item();
item.X = i;
weakDictionary.Add(i.ToString(), item);
}
Console.WriteLine(weakDictionary.Count);
Console.ReadLine();
GC.Collect();
GC.WaitForPendingFinalizers();
Console.WriteLine(weakDictionary.Count);
Console.ReadLine();
}
}
class WeakDictionary<TKey, TValue> {
private Dictionary<TKey, WeakReference> weakRefDict =
new Dictionary<TKey, WeakReference>();
public object SyncRoot {
get {
return ((IDictionary)weakRefDict).SyncRoot;
}
}
public void Add(TKey Key, TValue Value) {
lock (((IDictionary)weakRefDict).SyncRoot) {
WeakItem<TKey, TValue> weakItem =
new WeakItem<TKey, TValue>();
weakItem.Value = Value;
weakItem.Key = Key;
weakItem.OwningWeakDict = this;
WeakReference weakReference =
new WeakReference(weakItem);
weakRefDict.Add(Key, weakReference);
}
}
public void Remove(TKey Key) {
lock (((IDictionary)weakRefDict).SyncRoot) {
weakRefDict.Remove(Key);
}
}
public int Count {
get {
return weakRefDict.Count;
}
}
}
class WeakItem<TKey, TValue> {
public TKey Key;
public TValue Value;
public WeakDictionary<TKey, TValue> OwningWeakDict;
~WeakItem() {
OwningWeakDict.Remove(Key);
}
}
class Item {
public int X;
}
 
Hello Todor,

We have implemented a similar cache to handle a set of globally reused
bitmaps. We did not use finalization as the trigger, and I would not
recommend that technique. I would not make the cached values nor the
key-value pair objects finalizable.

The WeakReference is the safest way. It is true that you may have some dead
key-value pairs and WeakReferences in your table at any given point, but
these are small objects.

Lets ignore multi-threading issues (use a ThreadStaticAttribute or a lock).
The "value" in the key-value pair stored is (or contains) a WeakReference.
Using this technique you do have to explicitly clean the dictionary
periodically. What is the trigger for the cleaning?

When we search our dictionary using the key, and find the corresponding
WeakReference contains a collected reference (it is now null), we flag the
dictionary to be cleaned. The dead entry must be fixed immediately, but you
don't have to clean the entire dictionary at that point. You could wait for
a certain number of dead entries to be found before triggering a larger
cleaning.

We used reference counting as well to trigger more aggressive clean up. The
reference counts were incremented and decremented by the objects using the
cached bitmaps. The WeakReference was to handle the case where Dispose is
not explicitly called. In this case the reference counts are incorrect, and
the cached bitmap never gets a reference count of 0. However, if all its
references become garbage, eventually it becomes eligible for GC, and the
WeakReference is set to null.

So the end user can use Dispose for aggressive cache cleaning, or forget
about Dispose, and we clean lazily.

Regards,
Frank Hileman

check out VG.net: http://www.vgdotnet.com
Animated vector graphics system
Integrated Visual Studio graphics editor
 
I have an even bigger problem. Since the app is distributed, some objects
are proxies to web services. It's no use locking those, because only the
proxy will be locked, not the real object.

As long as all consumers are locking on the same thing, it doesn't
matter what it is, does it? For example, automatic locking in the CLR
locks on different things according to scenario, but the mechanism is
still valid. In some situations you may also want different locks for
different operations on the same complex object (read/write locking is
one example, even if it's perhaps more deadlock prone), which means
the locks are completely arbitrary and require consumers to follow
arbitrary rules.

What's the strategy behind putting a proxy to a web service in a weak
cache?
 
Hello Todor,

From your post, my understanding on this issue is: you want to implement a
cache mechanism where you maintain the weak references to the cached
objects in a Dictionary. If I'm off base, please feel free to let me know.

After reading your sample codes, I hope to discuss the following three
points with you.

1. the Finalize() method in the class Item should be replaced with ~Item()
IL | C++/CLI | C#
| VB
Dispose | ~ClassName | Dispose | Dispose
Finalize | !ClassName | ~ClassName | Finalize
According to the table above, the finalize method in MSIL corresponds to
~ClassName in C#. Therefore, your Finalize() should be replaced with
~Item(), otherwise, 'this.Owner.RemoveKey(this.Key)' will never be called.
For a more detailed view of .net GC, you can refer to the page:
http://www.bluebytesoftware.com/blog/2005/04/08/DGUpdateDisposeFinalizationA
ndResourceManagement.aspx

2. Weak Reference can be used to implement cache, as this blog described:
http://blogs.msdn.com/johan/archive/2007/04/26/finalizers-and-weak-reference
s.aspx. But because the finalization is expensive (see
http://blogs.msdn.com/cbrumme/archive/2004/02/20/77460.aspx) , I hope to
discuss with you about an improvement of your cache mechanism:
In my opinion, you can remove your finalization method and let GC to
release the expired objects. When a cached object is released by GC, the
corresponding WeakReference's Target will be set to Null automatically.
Whenever you call GetItem(key) method in WeakTable, it should check if the
Target of the key's value (WeakReference ) is null. If it is null, then the
key/value pair should be removed from innerDictionary and return Null for
the GetItem method.

3. As Tomten said, lock (Owner.SyncRoot) { ¡­ } is necessary to avoid that
GC locks unnecessarily. In your last message, you said that your app is
distributed, some objects in the cache are proxies to web services. In
fact, web service is stateless, there is no real object in server side,
thus I think those proxy objects still need to be locked to avoid the
exceptions if you are still using the .NET finalization method. Of course,
if you accept my suggestion in point 2 and remove your finalization, then I
think the lock is unnecessary.

Sincerely,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

==================================================
For MSDN subscribers whose posts are left unanswered, please check this
document: http://blogs.msdn.com/msdnts/pages/postingAlias.aspx

Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notif
ications. If you are using Outlook Express/Windows Mail, please make sure
you clear the check box "Tools/Options/Read: Get 300 headers at a time" to
see your reply promptly.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
What's the strategy behind putting a proxy to a web service in a weak

The strategy is object identity!

Meaning, if I request an object (via the web service) and later request the
same object again (each object has an unique ID), I would like to get
reference to the same object, if it exists on the client.
 
When a cached object is released by GC, the
corresponding WeakReference's Target will be set to Null automatically.
Whenever you call GetItem(key) method in WeakTable, it should check if the
Target of the key's value (WeakReference ) is null. If it is null, then
the
key/value pair should be removed from innerDictionary and return Null for
the GetItem method.

Yes! But who will release the key-value pair object inside the dictionary
and the WeakReference object? The dictionary will grow with garbage
containing a lot of entries for GC'ed objects (weak references with target
== null).

This is a common problem. The solution, found in many Smalltalk
implementations is called Ephemerons. Here is a paper that describs the
issue and the solution "Ephemerons: a new finalization mechanism"
(http://portal.acm.org/citation.cfm?id=263733&coll=Portal&dl=ACM&CFID=31581603&CFTOKEN=33582611).
BTW, Ephemerons were invented by Geroge Bosworth for Visual Smalltalk.
Geroge Bosworth later become one of the chief architects behind the GC in
..Net.


My original questions was, how do I clean up the dictionary for garbage
key-value pair objects and weak reference objects. Once the object has been
collected, the key-value pair and the weak reference objects are no longer
needed. I want those to be collectable as well. Without ephemerons, the only
solutions I see is to have each object explicitely clean up in the
dictionary.

-- Todor
 
We have implemented a similar cache to handle a set of globally reused
bitmaps. We did not use finalization as the trigger, and I would not
recommend that technique. I would not make the cached values nor the
key-value pair objects finalizable.

Why are people so affraid of finalization? We've used this technique for 15
years now (outside of the .Net worls) without any problems. If you
understand what happens, I see no risk in it.

Downside, performance hit, but compared to doing a call via soap over the
internet or other bar programming, this is cheap. I am also of the oppinion
that object allocation and garbage collection is cheap. We are not talking
millions of objects. We are talking about 1000s. I believe that MS have the
resources and have hired the best brains to write the object allocation and
garbage collection.
 
Why are people so affraid of finalization? We've used this technique for 15
years now (outside of the .Net worls) without any problems. If you
understand what happens, I see no risk in it.

Well, risk is not really the issue. What will happen in your scenario
has been thoroughly described in this thread, except for Frank
Hileman's version, which I think you should consider, unless you are
daring and want to take a shot at implementing ephemerons in .NET.

That said, object allocation in .NET is cheaper than a [profanity
removed -Ed]. However, I think the reason people advise against using
them here is because it's not a pattern .NET was designed for. If you
need cleanup, the Dispose pattern I was talking about in my first
reply is usually suggested, because it doesn't interfere with the GC,
but still achieves about the same thing. You're right about 1000
objects in a weak table using locking inside the finalizers isn't
going to sink an application, but since you haven't shared that much
about what you want to do, the discussion is kinda theoretical, and in
theory, this is not recommended. Why not try out all three versions
(IDispose, finalizer, Hileman), now that you've gotten enough feedback
to get started?
I believe that MS have the resources and have hired the best brains to write
the object allocation and garbage collection.

"Belief is the psychological state in which an individual is convinced
of the truth of a proposition."
 
Hi Todor,

In .net finalization has costs and complexities that are best avoided. It
prevents the GC from doing a good job and there is more potential to create
bugs within finalization code.

We use finalization when necessary, but the only classes that should have a
finalizer in .net is a simple wrapper around an unmanaged resource. Any
other finalizers add cost without benefit.

There is a performance whitepaper somewhere on MSDN that discusses the
limited cases when finalizers are needed. There is also an old "dispose
pattern" that incorrectly uses a finalizer.

WeakReference was designed for your scenario. Even if you have only 1000s of
objects, you wish to have the dictionary cleaned, and WeakReference enables
you to do that without reference counters.

Regards,
Frank
 
Hello Todor,
Why are people so afraid of finalization? We've used this technique for 15
years now (outside of the .Net worlds) without any problems. If you
understand what happens, I see no risk in it.

In fact, we usually do not recommend 'finalization' because:
(http://msdn.microsoft.com/msdnmag/issues/1100/GCI/default.aspx)
#1. Finalizable objects get promoted to older generations, which increases
memory pressure and prevents the object's memory from being collected when
the garbage collector determines the object is garbage.
#2. Finalizable objects take longer to allocate.
#3. Forcing the garbage collector to execute a Finalize method can
significantly hurt performance. Remember, each object is finalized. So if I
have an array of 10,000 objects, each object must have its Finalize method
called.
#4. Finalizable objects may refer to other (non-finalizable) objects,
prolonging their lifetime unnecessarily.
#5. You have no control over when the Finalize method will execute. The
object may hold on to resources until the next time the garbage collector
runs.
#6. When an application terminates, some objects are still reachable and
will not have their Finalize method called. For instance, Finalize methods
are not called for unreachable objects when an application exits so that
the application may terminate quickly. Of course, all operating system
resources will be reclaimed, but any objects in the managed heap are not
able to clean up gracefully.
#7. The runtime doesn't make any guarantees as to the order in which
Finalize methods are called. It is strongly recommended that Finalize
methods not access any inner, member objects.

Whether the factors above will have great impact on the application will
largely depend on how the finalization is used and other architecture of
the application. In this scenario, I do not think #1, #4, #5, #6, #7
influence your implementation of ephemerons. Thus, the finalization is
acceptable if the performance is OK.

Based on my research, some Smalltalks do not support finalization of
objects directly due to the complexity of GC's rules. They have only weak
references and implement finalization with it: Any object which is not
directly reachable through a strong pointer chain is garbage collected, and
any weak references are "nilled". The weakly referencing objects which
suffer bereavements, are informed, and it is up to them to perform
finalization actions on behalf of the objects that they have lost. This is
typically achieved by having a copy of the finalizable objects, and using
them as 'executors'. This approach makes garbage collection simpler, but is
inefficient and requires more complex Smalltalk classes to support it.

Tomten and Hileman's solutions follow the pattern of .NET, therefore, I
think you may try these options and see whether they work better.

Sincerely,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

=================================================
When responding to posts, please "Reply to Group" via your newsreader
so that others may learn and benefit from your issue.
=================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
Hello Todor,

Would you mind letting me know the result of the suggestions? If you need
further assistance, feel free to let me know. I will be more than happy to
be of assistance.

Have a great day!

Sincerely,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

=================================================
When responding to posts, please "Reply to Group" via your newsreader
so that others may learn and benefit from your issue.
=================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
Back
Top