ThreadPool and AppDomains and DeadLock - HELP!

  • Thread starter: Paul Wardle

Paul Wardle

We have a plug-in architecture which dynamically loads plug-ins into their
own AppDomains. The plug-ins are multithreaded and use the Thread Pool for
asynchronous communications, timers etc. They typically use the lock keyword
to synchronize access to objects that are used from different threads.

It is very important that the system is reliable. However, a customer
has reported that the system becomes unstable when he runs 60 instances of
one plug-in. We have investigated the problem and found that the plug-in
deadlocks Thread Pool threads under certain circumstances. All threads in
the Thread Pool eventually become locked and the system dies.

The plug-ins are not necessarily written by our development team; we need to
ensure that the system is as robust as possible. The system needs to be able
to recover from the deadlock. It can unload the AppDomain, but this does not
unblock the Thread Pool threads. Is there any way we can get .NET to unblock
arbitrary Thread Pool threads when an AppDomain is unloaded?

Thanks in advance

Paul
 
> [...] Is there any way we can get .NET to unblock arbitrary Thread Pool
> threads when an AppDomain is unloaded?

Generally, the only way to "unblock" a deadlocked thread is to simply
terminate either it or the one it's waiting for. If there's truly
deadlock, you have at least one thread that is holding a resource required
by another thread, but which is blocked waiting to get a resource held by
that other thread. At least one of those threads needs to simply be
removed from the deadlock cycle, and the only general purpose mechanism
supported directly by the OS is to terminate it.

If you wanted to complicate the design, you could theoretically create an
arbitrator that handles all resource locking, detects deadlock, and
reassigns resources to deal with that. However, doing that in a safe,
bug-free way is non-trivial. It seems to me that it would be better to just
impose on the plug-ins the requirement that they be written correctly and
not deadlock, and when you find one that does, exclude that plug-in from
use until the developer of the plug-in has shown that they have a version
that's fixed to not deadlock.

As a compromise, you could instead try to detect deadlock conditions and
abort the deadlocked thread(s). Not knowing the design of
your application, it's hard to say how you might do that. But if you have
some mechanism by which the top-level application can monitor the activity
of the plug-ins, then you might be able to, for example, assume a plug-in
has deadlocked if its thread doesn't show any signs of activity for some
period of time. At that point, you could just terminate the threads that
the plug-in is using (I believe that the ThreadPool will just recreate new
threads to account for that, to ensure the usual minimum idle threads
requirement).

Pete
 
Hello Paul,

Do you use the asynchronous "Begin..." methods, or do you just queue work
to the thread pool directly?

---
WBR, Michael Nemtsev [.NET/C# MVP].
My blog: http://spaces.live.com/laflour
Team blog: http://devkids.blogspot.com/

"The greatest danger for most of us is not that our aim is too high and we
miss it, but that it is too low and we reach it" (c) Michelangelo

 
> If you wanted to complicate the design, you could theoretically create an
> arbitrator that handles all resource locking, detects deadlock, and
> reassigns resources to deal with that. However, doing that in a safe,
> bug-free way is non-trivial.

I had already thought about that. This would require a rewrite of all the
plug-ins that we already have - which is not trivial.

> It seems to me that it would be better to just impose on the plug-ins the
> requirement that they be written correctly and not deadlock, and when you
> find one that does, exclude that plug-in from use until the developer of
> the plug-in has shown that they have a version that's fixed to not
> deadlock.

Yes, we believe we have fixed the problem with this particular plug-in.
However, it is very hard to ensure that there are no multithreading issues. We
do code reviews and a degree of stress testing etc., but we cannot guarantee
that we have found all of the problems. If a customer's site goes down, it
takes a lot of resources to find out what has actually happened.

> As a compromise, you could instead try to detect deadlock conditions and
> abort the deadlocked thread(s). Not knowing the design of
> your application, it's hard to say how you might do that. But if you have
> some mechanism by which the top-level application can monitor the activity
> of the plug-ins, then you might be able to, for example, assume a plug-in
> has deadlocked if its thread doesn't show any signs of activity for some
> period of time. At that point, you could just terminate the threads that
> the plug-in is using (I believe that the ThreadPool will just recreate new
> threads to account for that, to ensure the usual minimum idle threads
> requirement).

Our application queues "commands" for each plug-in. The plug-ins execute
each command in turn. Each command sends progress notifications while it is
executing and a final notification once it completes. If a command "goes
quiet" for a period of time then the system assumes it is stuck and the
command is aborted. This is what happens when a bug causes a plug-in to
deadlock, but aborting the command does not free the deadlocked thread.
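
A minimal sketch of the "goes quiet" check described above might look like the following. The names `CommandWatchdog`, `ReportProgress`, `Complete`, and `FindStalled` are hypothetical and are not taken from the poster's system. Note that the sketch only identifies stalled commands; it does nothing to free a blocked thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: records the last time each command reported progress
// and lists the commands that have been silent for longer than a timeout.
public sealed class CommandWatchdog
{
    private readonly object _sync = new object();
    private readonly Dictionary<string, DateTime> _lastSeen =
        new Dictionary<string, DateTime>();
    private readonly TimeSpan _timeout;

    public CommandWatchdog(TimeSpan timeout)
    {
        _timeout = timeout;
    }

    // Called when a command starts or sends a progress notification.
    public void ReportProgress(string commandId, DateTime nowUtc)
    {
        lock (_sync) { _lastSeen[commandId] = nowUtc; }
    }

    // Called when a command completes, so it is no longer monitored.
    public void Complete(string commandId)
    {
        lock (_sync) { _lastSeen.Remove(commandId); }
    }

    // Returns the ids of commands that have been quiet longer than the timeout.
    public List<string> FindStalled(DateTime nowUtc)
    {
        var stalled = new List<string>();
        lock (_sync)
        {
            foreach (KeyValuePair<string, DateTime> entry in _lastSeen)
            {
                if (nowUtc - entry.Value > _timeout)
                {
                    stalled.Add(entry.Key);
                }
            }
        }
        return stalled;
    }
}
```

The caller passes the current time in explicitly, which keeps the class easy to test; a host would presumably call `FindStalled(DateTime.UtcNow)` from its own monitoring loop.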

The commands are executed asynchronously on the ThreadPool. As I said
before, the plug-ins use asynchronous communications and timers so we
currently have no way of knowing which threads are being used for that
plug-in as the thread allocation is essentially random.

During comms, we have a layer between the driver and the ThreadPool which we
can modify to track which thread processes the received data. However, we
have no wrappers for the timers - this could be the thing that we need to
introduce so we can also track their execution.
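
One way such a timer wrapper could look is sketched below. `PluginThreadTracker`, `Wrap`, and `ThreadsFor` are hypothetical names, and the plug-in id is assumed to be a plain string. The sketch ignores the cross-AppDomain question entirely (static state is per AppDomain) and does not handle nested callbacks for the same plug-in on one thread, so treat it as a starting point only.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch: remembers which threads are currently running
// callbacks on behalf of each plug-in.
public static class PluginThreadTracker
{
    private static readonly object Sync = new object();
    private static readonly Dictionary<string, HashSet<Thread>> Active =
        new Dictionary<string, HashSet<Thread>>();

    // Wraps a timer callback so the executing thread is registered for the
    // duration of the call and removed afterwards, even if the callback throws.
    public static TimerCallback Wrap(string pluginId, TimerCallback inner)
    {
        return state =>
        {
            Enter(pluginId);
            try { inner(state); }
            finally { Leave(pluginId); }
        };
    }

    // Snapshot of the threads currently inside callbacks for one plug-in.
    public static List<Thread> ThreadsFor(string pluginId)
    {
        lock (Sync)
        {
            HashSet<Thread> set;
            return Active.TryGetValue(pluginId, out set)
                ? new List<Thread>(set)
                : new List<Thread>();
        }
    }

    private static void Enter(string pluginId)
    {
        lock (Sync)
        {
            HashSet<Thread> set;
            if (!Active.TryGetValue(pluginId, out set))
            {
                set = new HashSet<Thread>();
                Active[pluginId] = set;
            }
            set.Add(Thread.CurrentThread);
        }
    }

    private static void Leave(string pluginId)
    {
        lock (Sync)
        {
            HashSet<Thread> set;
            if (Active.TryGetValue(pluginId, out set))
            {
                set.Remove(Thread.CurrentThread);
            }
        }
    }
}
```

A plug-in would then create its timers with something like `new Timer(PluginThreadTracker.Wrap("plugin-1", OnTick), null, 1000, 1000)`, where `OnTick` stands in for the plug-in's existing handler.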

It's a shame that .NET does not automatically free the blocked thread when
the AppDomain is unloaded. I can understand why - how does it know that it
is safe to free the thread? It is interesting that if the thread blocks on a
lock (or Monitor.Enter) the thread is not freed, but if you are waiting on
an event (such as ManualResetEvent) then it is (not sure whether it throws a
ThreadAbortException here).

It looks like any solution here is non-trivial :(

Thanks for your help

Paul
 
We use code like this:

ThreadPool.QueueUserWorkItem(new WaitCallback(this.ProcessItem), command);

We also use BeginInvoke to execute code on a background thread, and our event
handlers for timers and comms also get processed on ThreadPool threads.

Paul
 
You've just run into one of the biggest problems with using the ThreadPool.
I've personally run into this, and been forced to recode quite a bit to work
around it.
http://www.coversant.com/dotnetnuke/Default.aspx?tabid=88&EntryID=8

From here, you're probably going to need to create your own Thread Pool (or
pull one down off the web), and make sure that plug-ins share your threads,
and don't use the built-in pool.

> The commands are executed asynchronously on the ThreadPool. As I said
> before, the plug-ins use asynchronous communications and timers so we
> currently have no way of knowing which threads are being used for that
> plug-in as the thread allocation is essentially random.

That sounds (to me) like you want a custom thread pool running threads that
you own.
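
As a rough illustration of a pool "running threads that you own", here is a minimal sketch. `OwnedThreadPool` and its members are hypothetical names, and the design (a fixed number of workers draining one queue) is an assumption rather than anything described in this thread. It only covers work the host queues itself; timer and asynchronous I/O callbacks would still arrive on the built-in ThreadPool.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch of a small pool whose worker threads belong to the host.
public sealed class OwnedThreadPool
{
    private readonly object _sync = new object();
    private readonly Queue<Action> _work = new Queue<Action>();
    private readonly List<Thread> _workers = new List<Thread>();
    private bool _shutdown;

    public OwnedThreadPool(int workerCount, string name)
    {
        for (int i = 0; i < workerCount; i++)
        {
            var worker = new Thread(Run);
            worker.IsBackground = true;     // do not keep the process alive
            worker.Name = name + "-" + i;   // makes the owner visible in a debugger
            _workers.Add(worker);
            worker.Start();
        }
    }

    // The host always knows exactly which threads serve this pool.
    public IList<Thread> Workers
    {
        get { return _workers.AsReadOnly(); }
    }

    // Queues an item and wakes one waiting worker to run it.
    public void Enqueue(Action item)
    {
        lock (_sync)
        {
            if (_shutdown) throw new InvalidOperationException("Pool is shut down.");
            _work.Enqueue(item);
            Monitor.Pulse(_sync);
        }
    }

    // Lets the workers exit once the queue has drained.
    public void Shutdown()
    {
        lock (_sync)
        {
            _shutdown = true;
            Monitor.PulseAll(_sync);
        }
    }

    private void Run()
    {
        while (true)
        {
            Action item;
            lock (_sync)
            {
                while (_work.Count == 0)
                {
                    if (_shutdown) return;
                    Monitor.Wait(_sync);
                }
                item = _work.Dequeue();
            }
            try { item(); }
            catch (Exception) { /* a real pool would log this */ }
        }
    }
}
```

With one such pool per plug-in, a deadlocked plug-in would tie up only its own workers, and the host would know which threads those are.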
 
Hello Chris Mullins [MVP],

Exactly, that's what I was trying to find out with my question to the OP.

It seems that he has some architectural flaws that exhaust the ThreadPool.

---
WBR, Michael Nemtsev [.NET/C# MVP].
 