Hashtables and concurrency


Mark

When the MSDN docs say
"Hashtable is thread safe for use by multiple reader threads and a single
writing thread ... which allows for lock-free reads provided that the writers
are serialized to the Hashtable"

how is that accomplished? If the readers are flipping through the table,
even though the operation is quick, don't they need to lock it if there's a
writer out there?

Also, if the hash table in question is a member of a larger class and the
"writer" may swap in a whole new table periodically and the readers are all
"lock-free" I presume that using Interlocked.Exchange<Hashtable>() would be
sufficient to make sure the readers don't hit a pothole?

Also also, it seems like the change in behavior between Hashtable and
generics would complicate concurrency. Hashtable has the behavior that
lookups that aren't found return null; Dictionary<> throws an exception,
necessitating a 2-step lookup.

e.g.
object foo = myClass.Hashtable["foo"];
if (foo == null) Console.WriteLine("Not found");

vs

object foo = myClass.Dictionary["foo"]; // throws KeyNotFoundException if not found
or
if (myClass.Dictionary.ContainsKey("foo")) foo = myClass.Dictionary["foo"];

From the external point of view, lookup is an atomic operation for Hashtable
but not for Dictionary. If someone swaps out the Hashtable with an
interlocked exchange, the Hashtable reader will still work with the one in
hand. If someone swaps out a Dictionary, you could catch a reader between
the ContainsKey and the lookup.

Thanks
Mark
 
Mark said:
When the MSDN docs say
"Hashtable is thread safe for use by multiple reader threads and a single
writing thread ... which allows for lock-free reads provided that the writers
are serialized to the Hashtable"

how is that accomplished? If the readers are flipping through the table,
even though the operation is quick, don't they need to lock it if there's a
writer out there?
The Hashtable guarantees that this:

object value = hashtable[key];

is lock-free and will return a valid value (that is, neither garbage nor a
value that was never inserted). That is all. It does so with a lock-free
algorithm that spins in a loop until the table is stable.

If by "flipping through" you mean "enumerating": as the MSDN mentions,
enumerating is intrinsically thread-unsafe. External locking is required.
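A minimal sketch of that external locking, using the table's SyncRoot (any
agreed-upon lock object will do, as long as the writers take the same lock
while the enumeration runs):

lock (hashtable.SyncRoot)
{
    foreach (DictionaryEntry entry in hashtable)
    {
        Console.WriteLine("{0} = {1}", entry.Key, entry.Value);
    }
}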
Also, if the hash table in question is a member of a larger class and the
"writer" may swap in a whole new table periodically and the readers are all
"lock-free" I presume that using Interlocked.Exchange<Hashtable>() would be
sufficient to make sure the readers don't hit a pothole?
This has nothing to do in principle with hashtables, readers or writers.
Also, reference assignment is atomic, so I'm not sure what the .Exchange()
would be good for, unless it's part of some algorithm that uses two
hashtables for interesting purposes.

This is in any case not how Hashtable is implemented, if that's what you're
assuming.
Also also, it seems like the change in behavior between Hashtable and
generics would complicate concurrency. Hashtable has the behavior that
lookups that aren't found return null; Dictionary<> throws an error,
necessitating a 2-step lookup.
.TryGetValue().
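For example, assuming myClass.Dictionary is a Dictionary<string, object>, the
two-step lookup collapses into a single call:

object foo;
if (!myClass.Dictionary.TryGetValue("foo", out foo))
{
    Console.WriteLine("Not found");
}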

From the external point of view, lookup is an atomic operation for Hashtable
but not for Dictionary. If someone swaps out the Hashtable with an
interlocked exchange, the Hashtable reader will still work with the one in
hand. If someone swaps out a Dictionary, you could catch a reader between
the ContainsKey and the lookup.
Well, of course, and if a reader did multiple reads in a row, it could see
different dictionaries, whether you're exchanging dictionaries or not (some
other client might alter it). If you need atomic access for multiple-step
operations, locking is inevitable, even if individual operations are atomic.
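If the tables are only ever replaced wholesale and never modified in place,
another option (a sketch, assuming myClass.Dictionary is a
Dictionary<string, object> field) is to read the field once and run every
step against that local copy:

Dictionary<string, object> snapshot = myClass.Dictionary; // capture the reference once
object foo;
if (snapshot.TryGetValue("foo", out foo))
{
    // the existence check and the value both come from the same table,
    // even if another thread swaps in a new one in the meantime
}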
 
Thanks Jeroen...

Jeroen Mostert said:
Mark said:
how is that accomplished? If the readers are flipping through the table,
even though the operation is quick, don't they need to lock it if there's a
writer out there?
The Hashtable guarantees that this:

object value = hashtable[key];

is lock-free and will return a valid value (that is, neither garbage nor a
value that was never inserted). That is all. It does so with a lock-free
algorithm that spins in a loop until the table is stable.

For my immediate purposes I'll just take it on faith and use it, but I'm
curious how it detects the table is stable... I take it this also applies to
generic collections...
If by "flipping through" you mean "enumerating": as the MSDN mentions,
enumerating is intrinsically thread-unsafe. External locking is required.

No, "flipping through" was just imprecise terminology. I was thinking of
the hash lookup as a multi-step process (getting the hash, finding the
bucket, finding the item in the bucket).

I understand that enumerations have to be locked.
This has nothing to do in principle with hashtables, readers or writers.
Also, reference assignment is atomic, so I'm not sure what the .Exchange()
would be good for, unless it's part of some algorithm that uses two
hashtables for interesting purposes.

Sorry I didn't explain more fully. We have some hashtables that are fed
from database queries. Sometimes a new table is loaded from the db, so we
load the whole thing up in a side copy to avoid holding the lock all that
time. Then at the end we have something like
lock (myClass.Hashtable)
{
    Hashtable = newtable;
}

and the readers lock the table when they read from it to make sure
everyone's on the same page.

I was thinking that
Interlocked.Exchange<Hashtable>(ref myClass.Hashtable, newtable);

would take care of the exchange without requiring the readers to lock.
.TryGetValue().

Great! Sorry, I missed that method in the docs before.

Thanks
Mark
 
Mark said:
Jeroen Mostert said:
Mark said:
how is that accomplished? If the readers are flipping through the table,
even though the operation is quick, don't they need to lock it if there's a
writer out there?
The Hashtable guarantees that this:

object value = hashtable[key];

is lock-free and will return a valid value (that is, neither garbage nor a
value that was never inserted). That is all. It does so with a lock-free
algorithm that spins in a loop until the table is stable.

For my immediate purposes I'll just take it on faith and use it, but I'm
curious how it detects the table is stable... I take it this also applies to
generic collections...
It does so by checking a "version" field, which is updated by any operation
that alters the table. The algorithm looks (very roughly) like this:

int version;
int bucketNumber = getBucketNumber(key);
again:
do {
    version = this.version;
    bucket = buckets[bucketNumber];
} while (this.writerBusy || version != this.version);
if (bucketMatchesHashAndBucketOk(bucket)) {
    return bucket.value;
} else {
    goto again;
}

I'm omitting the finer details (the actual algorithm is quite intricate);
you can check for yourself with a tool like Reflector. The basic idea is
that the individual operations are atomic and the chain of operations is
broken by intermittent updates only in ways that are detectable.
No, "flipping through" was just imprecise terminology. I was thinking of
the hash lookup as a multi-step process (getting the hash, finding the
bucket, finding the item in the bucket).
The idea is that if any intermediate step fails, the operation is retried.
Obviously, the expectation is that it won't have to be retried often.
We have some hashtables that are fed from database queries. Sometimes a
new table is loaded from the db, so we load the whole thing up in a side
copy to avoid holding the lock all that time. Then at the end we have
something like
lock (myClass.Hashtable)
{
    Hashtable = newtable;
}

and the readers lock the table when they read from it to make sure
everyone's on the same page.
The lock has an effect, but not the one you're thinking of. Reference
assignment is atomic, so if it were just one thread replacing the reference
and other threads using it for atomic operations, you wouldn't need locks to
begin with.

However, without either locking or marking the field "volatile", you're
subject to optimization issues. The compiler or the processor cache could
hold a reference to any "old" value indefinitely, meaning that readers
running on different threads never see the updated value. In this sense, the
lock *is* necessary, just not for the reason you think it is.
I was thinking that
Interlocked.Exchange<Hashtable>(ref myClass.Hashtable, newtable);
You don't actually need to specify the type argument; the compiler can infer
it (generic overloads are chosen in preference to ones using "object").
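That is, this is equivalent (assuming myClass.Hashtable is a field, which
the ref parameter of Interlocked.Exchange requires anyway):

Interlocked.Exchange(ref myClass.Hashtable, newtable);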
would take care of the exchange without requiring the readers to lock.
To my understanding (which is not complete by any means, so I'd love any
corrections from the hardcore concurrency crowd) that is *not* the case. It's
no better than simple assignment. The reason it's no better is that, as I
mentioned, reference assignment is atomic to begin with, and the readers are
still subject to optimization problems if they don't do any locking. In
theory, a single thread could maintain a reference to the old value
indefinitely if it's not forced to re-read it either by means of locking, or
marking the field "volatile", or Thread.VolatileRead().

Marking the field "volatile" would do. In that case, you don't need any
locks. The other solutions are all more complicated.
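A minimal sketch of that approach (the class and member names here are made
up for illustration, and the tables are assumed to be replaced wholesale,
never edited in place):

using System.Collections;

class LookupCache
{
    // volatile ensures readers always re-read the field and see the
    // most recently published table
    private volatile Hashtable table = new Hashtable();

    // single writer: build the new table off to the side, then publish it
    public void Reload(Hashtable newTable)
    {
        this.table = newTable; // reference assignment is atomic
    }

    // lock-free reader
    public object Lookup(object key)
    {
        Hashtable current = this.table; // capture once
        return current[key];            // null if the key isn't present
    }
}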
 
Hi Mark,

I agree with Jeroen that "default thread-safety" of HashTable is added by
the built-in locking mechanism in its Insert and Remove methods. You can
find that it will call begin/end criticalsection (through reflector). And
for the indexer code, it also checks the hashtable's editing status and
version info to make sure the data read is up to date.

Sure, there are many other cases where you will need to apply your own custom
external locking, such as enumerating the hash collection or accessing
multiple items via several separate calls.

As for the following code you mentioned:

==========
lock (myClass.Hashtable)
{
    Hashtable = newtable;
}
==========

Is the "myClass.Hashtable" the object you want to prevent from concurrent
access? If so, I'm afraid lock on it won't gurantee that no other thread
can access it. If you want to ensure that all threads access a certain
object sequentially(synchornized), you need to make sure that all those
threads use the same accessing methods(which include lock /unlock block).
e.g.

void MethodToAccess()
{
    lock (syncObject)   // a shared object that every thread locks on
    {
        // do work here
    }
}

and the object on which to lock is not necessarily the object you want to
ensure synchronized in code.
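For example (a sketch, with made-up member names), both the writer that swaps
in a new table and the readers lock on the same private object rather than on
the Hashtable reference itself, which is about to be replaced:

using System.Collections;

class MyClass
{
    private readonly object sync = new object(); // dedicated lock object
    private Hashtable table = new Hashtable();

    public void Reload(Hashtable newTable)
    {
        lock (sync) { table = newTable; }
    }

    public object Lookup(object key)
    {
        lock (sync) { return table[key]; }
    }
}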

Here is a good article on threading in C# that covers
synchronization & concurrency topics:

http://www.albahari.com/threading/part3.html

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


 
Steven said:
I agree with Jeroen that "default thread-safety" of HashTable is added by
the built-in locking mechanism in its Insert and Remove methods. You can
find that it will call begin/end criticalsection (through reflector).

Actually, it calls Thread.Begin/EndCriticalRegion(). This has nothing to do
with locking, although it's an easy enough mistake to make. It's to signal
the runtime host that the thread's entering a critical region, and that
termination could lead to instability of other threads (because it will
leave the hashtable in an inconsistent state). This is usually, but not
always, done when using a custom approach to locking.

The insertion and removal code are lock-free, just like the reader. Brian
Grunkemeyer explains the design (and the critical region) in the last
section of this entry:
http://blogs.msdn.com/bclteam/archive/2005/06/14/429181.aspx
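In other words, the usage is just a marker around the state-mutating code,
roughly like this (a sketch, not the actual Hashtable source; InsertIntoBuckets
is a hypothetical helper):

Thread.BeginCriticalRegion();
try
{
    // mutating the shared structure here; if the host aborted the thread
    // in this window, the structure could be left inconsistent, and the
    // marker tells the host (e.g. SQL Server hosting the CLR) about that risk
    InsertIntoBuckets(key, value); // hypothetical helper
}
finally
{
    Thread.EndCriticalRegion();
}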
 
Thread.Begin/EndCriticalRegion(). This has nothing to do with locking,
although it's an easy enough mistake to make.

I don't understand what you mean by this. Care to explain further?

--

Regards,
Alvin Bruney [MVP ASP.NET]

[Shameless Author plug]
The O.W.C. Black Book, 2nd Edition
Exclusively on www.lulu.com/owc $19.99
 
Alvin said:
I don't understand what you mean by this. Care to explain further?
It's in the piece you cut away?

And in the MSDN, of course.
http://msdn.microsoft.com/library/system.threading.thread.begincriticalregion.aspx

Alright, when I said it had "nothing to do with locking", that was too
strong -- the point was that these methods do not establish any sort of lock
or critical section by themselves, which was what Steven implied. In
particular, they are not to be confused with the
Enter/LeaveCriticalSection() functions. Indeed, Hashtable uses them to
demarcate its use of a lock-free mechanism for concurrency.
 
Thanks for the followup and further explanation Jeroen.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


 
sorry, I misread the post. I saw criticalregion and read "criticalsection"
which was what confused me in the message. blame it on lack of sleep.

--

Regards,
Alvin Bruney [MVP ASP.NET]

[Shameless Author plug]
The O.W.C. Black Book, 2nd Edition
Exclusively on www.lulu.com/owc $19.99
 