On Jon Skeet's threadpool

  • Thread starter Thread starter David Levine
  • Start date Start date
D

David Levine

Jon,

I've taken a look at it and these comments are based on a code-inspection
but I did not verify the correctness if its operation by using it in a
sample.

The ctor for CustomThreadPool increments instanceCount before the
lock(staticLock)...why not inside of the lock?

The MinThreads and MaxThread properties do not validate the values...I
thought you were going to add some logic to detect and handle this.

The routines OnBeforeWorkItem and OnAfterWorkItem do not have a try-catch
that surrounds the callback. In addition, these routines are invoked from
method ExecuteWorkItem. This uses a try-catch around job.Invoke, but these
two routines are unprotected by a try-catch (there is a try-finally). The
routine ExecuteWorkItem is also not surrounded with a try-catch; this means
that if either throws an exception it will not be logged or otherwise
detected by logic in the Threadpool. I don't think the threadpool will be
in an indeterminate state, but at a minimum the cause of the unexpected
thread termination will not be recorded.

WorkerThreadLoop is much more readable; good job of refactoring.

The OnException method - I like to at least generate some trace output.
Exceptions should always be treated as abnormal occurrences, and some kind
of breadcrumbs should be left behind. As the note says, it could even log it
to the event log....I think that's an excellent idea. You could add
something at the end of the method so that if no callback handles it a
default action is taken.

Question about Monitor: In AddWorkItem the first action taken if the
queueCount is zero is to pulse the queueLock before adding the workItem to
the queue; why not pulse it after adding it to the queue? Also, in the path
where queueCount is greater then zero it never pulses the queue at all - why
is this not needed? (I think it's not needed but I want to check.) I notice
that you implement the worker priority here and start the thread if needed
after the item is added to the queue - looks good. As you noted, a binary
search may be useful here (low on the priority list).

I did not actually setup a sample and run it through its paces. A test
fixture/integration/unit test would be useful to validate it. Testing a
multi-threaded class is tricky but unit testing would at least verify the
non-timing related logic. If nothing else, others might find such a test jig
useful - could even turn into a little product.

On the whole it looks good.

Dave
PS: Have you had a chance to look at the exception document I sent?
 
David Levine said:
I've taken a look at it and these comments are based on a code-inspection
but I did not verify the correctness if its operation by using it in a
sample.

The ctor for CustomThreadPool increments instanceCount before the
lock(staticLock)...why not inside of the lock?

It's inside the static lock but outside the stateLock. At least in the
version of the code I'm looking at:

public CustomThreadPool()
{
lock (staticLock)
{
instanceCount++;
lock (stateLock)
{
name = "CustomThreadPool-"+instanceCount;
}
}
}

Whether or not I should increment the instance count in the constructor
where the caller specifies the name is an open question.
The MinThreads and MaxThread properties do not validate the values...I
thought you were going to add some logic to detect and handle this.

Doh - forgot to add that to the list of ToDo items. Thank you. I should
be able to do that fairly soon - it's very simple code.
The routines OnBeforeWorkItem and OnAfterWorkItem do not have a try-catch
that surrounds the callback. In addition, these routines are invoked from
method ExecuteWorkItem. This uses a try-catch around job.Invoke, but these
two routines are unprotected by a try-catch (there is a try-finally). The
routine ExecuteWorkItem is also not surrounded with a try-catch; this means
that if either throws an exception it will not be logged or otherwise
detected by logic in the Threadpool. I don't think the threadpool will be
in an indeterminate state, but at a minimum the cause of the unexpected
thread termination will not be recorded.

Indeed - this is my current exception non-handling strategy - I let
everything outside the work item itself propagate up. I expect anyone
who is interested in "out of band" exceptions to put their own
UnhandledException handler in, I guess. (Without that, the system
threadpool never shows you any exceptions...)
WorkerThreadLoop is much more readable; good job of refactoring.
:)

The OnException method - I like to at least generate some trace output.
Exceptions should always be treated as abnormal occurrences, and some kind
of breadcrumbs should be left behind. As the note says, it could even log it
to the event log....I think that's an excellent idea. You could add
something at the end of the method so that if no callback handles it a
default action is taken.
Yup.

Question about Monitor: In AddWorkItem the first action taken if the
queueCount is zero is to pulse the queueLock before adding the workItem to
the queue; why not pulse it after adding it to the queue?

It makes no odds - the worker thread will have to wait until the
"adding" thread has released queueLock anyway.
Also, in the path where queueCount is greater then zero it never
pulses the queue at all - why is this not needed? (I think it's not
needed but I want to check.)

The worker thread will only wait if the count is 0, so we only need to
Pulse if we're coming out of that state and want to wake any waiting
workers.
I notice that you implement the worker priority here and start the
thread if needed after the item is added to the queue - looks good.

Phew :) I've noted in the documentation that if MinThreads is 0,
there's a possible race condition which means that a job added just at
the same time as the last thread quits may not be done. I don't *think*
it ends up being a problem if MinThreads isn't 0 though.
As you noted, a binary search may be useful here (low on the priority
list).

Very low :) (I'm expecting priorities to be a fairly rarely used
feature, and I optimise the case where we just need to put the item on
the end.)

The trouble with binary search is I find it one of those places where
off-by-one errors are easy to make...
I did not actually setup a sample and run it through its paces. A test
fixture/integration/unit test would be useful to validate it. Testing a
multi-threaded class is tricky but unit testing would at least verify the
non-timing related logic. If nothing else, others might find such a test jig
useful - could even turn into a little product.

Absolutely. Unit testing is one of those things I've always thought is
a good idea, but not done nearly enough of :( As you point out, it's a
real pain for multi-threading code, but I should make an effort some
time. I readily admit I've done very little testing of this code, which
is a pretty poor state of affairs.
On the whole it looks good.

Thanks very much. I'm happier with it than I was.
PS: Have you had a chance to look at the exception document I sent?

Not yet, I'm afraid. I should have quite a lot of free time in November
though, so I may even get round to it :)
 
It's inside the static lock but outside the stateLock. At least in the
version of the code I'm looking at:
Yes, my bad, I missed it.
Whether or not I should increment the instance count in the constructor
where the caller specifies the name is an open question.

I missed that too. I'd say to increment it, otherwise you wont have an
accurate count of threadpool instances. It doesn't matter in terms of
functionality, but it might be misleading.
Indeed - this is my current exception non-handling strategy - I let
everything outside the work item itself propagate up. I expect anyone
who is interested in "out of band" exceptions to put their own
UnhandledException handler in, I guess. (Without that, the system
threadpool never shows you any exceptions...)

I'd argue the other way. Exceptions that are not caught do not propagate up
anywhere; once it escapes the threadpool thread that's as far as it goes,
and from there it turns into an unhandled exception (as you state). At this
point you are at the mercy of the unhandled exception policy - this
currently ignores it, but in the future it might shutdown the app (I do
realize it's hard to argue in favor of designing for future compatiblity or
for features that don't exist). There may be an application UE handler, and
it might not know what to make of it.

Also, the exception itself is not wrapped in another exception that could
provide additional context to help the user/developer figure out what went
wrong. If the original exception was something generic like a NullReference,
then when it turns into a UE it might leave the user completely mystified as
to where the exception occurred and what went wrong. Instead, how about
something like this...

try
{
OnBeforeWorkItem()
}
catch(Exception ex)
{
if ( !OnException(ex) ) // was it handled?
throw new Exception("Error occurred before WorkItem was executed for
yada yada yada.",ex);
}

This implies that the method OnException returns a value indicating if the
user handled the exception. My error message sucks and should be improved -
something more informative could be used, but I hope you see where I am
going with this.

The other issue is the specific exception type that is thrown. For
catch-wrap-throw scenarios I wrote a method called CloneException that
creates a new exception of the same type and returns it, so the LOC could
instead be:
throw CloneException("msg goes here", innerException);
This uses the same exception type, so this mechanism provides additional
context without transforming the original exception type.

You could also define an exception type, i.e. class JonsThreadpoolException
: Exception{}, and use that to add specific information to the exception
object. That way the UE handler could look for exceptions of that type and
have a better chance of handling it appropriately.

Again, a lot of this boils down to what the intended goal is. I am
uncomfortable with a strategy that deliberately uses UE to communicate
results...it feels unclean, but perhaps that's just me.
It makes no odds - the worker thread will have to wait until the
"adding" thread has released queueLock anyway.

Ah, I see.
Phew :) I've noted in the documentation that if MinThreads is 0,
there's a possible race condition which means that a job added just at
the same time as the last thread quits may not be done. I don't *think*
it ends up being a problem if MinThreads isn't 0 though.
Interesting race possibility. If that truly is a problem then you could use
some additional synchronization to detect and prevent this. Or another
possibility is to always create at least one thread and never let the thread
count drop to zero - then the race condition should never occur.

One additional thought - it would be useful to flow the security context to
the work item so it could be used during the callback; it would be nice if
you could use GetCompressedStack and SetCompressedStack for this. Some
caveats - I've never tried this, and I am not familiar enough with this
aspect of security to know what the side effects are, but it is something to
consider. Also, I've read this: "their use is restricted to libraries signed
with the Microsoft strong name", so you may not be able to use it at all.
See article
http://msdn.microsoft.com/msdnmag/issues/04/11/NETMatters/default.aspx for
info on this and other issues. It seems that Whidbey provides a much richer
API for doing that sort of stuff.

Absolutely. Unit testing is one of those things I've always thought is
a good idea, but not done nearly enough of :( As you point out, it's a
real pain for multi-threading code, but I should make an effort some
time. I readily admit I've done very little testing of this code, which
is a pretty poor state of affairs.

Unit testing is one of those things I always plan to do more of then I
actually do...it requires great discipline to slow down , stop writing the
code I really want to write, and write the code that tests the real code.

I usually do it when I am forced to. e.g. I wrote a test jig to simulate the
environment that exists when a custom action is called during installation
under an MSI file. What a pain...
Not yet, I'm afraid. I should have quite a lot of free time in November
though, so I may even get round to it :)
Goodo; thanks.
 
[Finally got round to replying to this.]

David Levine said:
I missed that too. I'd say to increment it, otherwise you wont have an
accurate count of threadpool instances. It doesn't matter in terms of
functionality, but it might be misleading.

Yup. Changed appropriately.
I'd argue the other way. Exceptions that are not caught do not propagate up
anywhere; once it escapes the threadpool thread that's as far as it goes,
and from there it turns into an unhandled exception (as you state). At this
point you are at the mercy of the unhandled exception policy - this
currently ignores it, but in the future it might shutdown the app (I do
realize it's hard to argue in favor of designing for future compatiblity or
for features that don't exist). There may be an application UE handler, and
it might not know what to make of it.

Also, the exception itself is not wrapped in another exception that could
provide additional context to help the user/developer figure out what went
wrong. If the original exception was something generic like a NullReference,
then when it turns into a UE it might leave the user completely mystified as
to where the exception occurred and what went wrong. Instead, how about
something like this...

try
{
OnBeforeWorkItem()
}
catch(Exception ex)
{
if ( !OnException(ex) ) // was it handled?
throw new Exception("Error occurred before WorkItem was executed for
yada yada yada.",ex);
}

This implies that the method OnException returns a value indicating if the
user handled the exception. My error message sucks and should be improved -
something more informative could be used, but I hope you see where I am
going with this.

Sort of. What if OnException throws an exception though? Do I try to
catch that, too? It ends up being pretty nasty.
The other issue is the specific exception type that is thrown. For
catch-wrap-throw scenarios I wrote a method called CloneException that
creates a new exception of the same type and returns it, so the LOC could
instead be:
throw CloneException("msg goes here", innerException);
This uses the same exception type, so this mechanism provides additional
context without transforming the original exception type.

Doesn't that rely on there being an appropriate constructor for the
type of exception which has been thrown?
You could also define an exception type, i.e. class JonsThreadpoolException
: Exception{}, and use that to add specific information to the exception
object. That way the UE handler could look for exceptions of that type and
have a better chance of handling it appropriately.

That's certainly a possibility - I'm not sure it feels entirely right,
but I'll keep thinking.
Again, a lot of this boils down to what the intended goal is. I am
uncomfortable with a strategy that deliberately uses UE to communicate
results...it feels unclean, but perhaps that's just me.

No, I know what you mean.
Interesting race possibility. If that truly is a problem then you could use
some additional synchronization to detect and prevent this. Or another
possibility is to always create at least one thread and never let the thread
count drop to zero - then the race condition should never occur.

I think actually the way to fix it is to have the bit of code which
reduces the number of threads in with the decision about whether or not
the thread should die, all effectively atomic using the state lock. In
other words, you can't detect that a thread exists if it's already
decided to quit. I haven't tried to engineer that yet, admittedly :)
One additional thought - it would be useful to flow the security context to
the work item so it could be used during the callback; it would be nice if
you could use GetCompressedStack and SetCompressedStack for this. Some
caveats - I've never tried this, and I am not familiar enough with this
aspect of security to know what the side effects are, but it is something to
consider. Also, I've read this: "their use is restricted to libraries signed
with the Microsoft strong name", so you may not be able to use it at all.
See article
http://msdn.microsoft.com/msdnmag/issues/04/11/NETMatters/default.aspx for
info on this and other issues. It seems that Whidbey provides a much richer
API for doing that sort of stuff.

I'll have a look. Once you start getting into the security aspects,
there's an *awful* lot more work that should be done.
Unit testing is one of those things I always plan to do more of then I
actually do...it requires great discipline to slow down , stop writing the
code I really want to write, and write the code that tests the real code.
Yup.

I usually do it when I am forced to. e.g. I wrote a test jig to simulate the
environment that exists when a custom action is called during installation
under an MSI file. What a pain...
:)


Goodo; thanks.

Still not done this yet, btw - but now I've got a job offer and the
concert I'm directing is this Friday/Saturday, so it's on my list of
things to do next week.
 
Jon Skeet said:
[Finally got round to replying to this.]

It's been a while :-)
Sort of. What if OnException throws an exception though? Do I try to
catch that, too? It ends up being pretty nasty.

You could write the method OnException so that it catches-wraps-throws the
original exception in a new exception object, must the same as if it had
returned false. It's ok to throw a new exception from within the catch
block so long as it accurately reflects the problem; i.e. wraps the inner
exception, or if that was handled but a new problem occurred, then the new
exception accurately reflects it.

Handling all the cases with proper handling does get tedious, but I believe
it's preferrable to the alternative.
Doesn't that rely on there being an appropriate constructor for the
type of exception which has been thrown?
I search for the particular .ctor I want and if that fails I use the generic
exception type. Here's the code I use, perhaps you could check it for
flaws. I don't check for a null exception argument because that is
explicitly an error (bug) by the caller.

static private System.Exception CloneExceptionPrivate(string
message,System.Exception ex)
{
Type t = ex.GetType();
try
{
Type[] types = new Type[2] { typeof(string),typeof(System.Exception) };
object[] args = new object[] {message,ex};
ConstructorInfo ci = t.GetConstructor(types);
if ( ci != null )
{
object o;
o = t.Assembly.CreateInstance(
t.FullName,false,BindingFlags.CreateInstance,null,args,null,null );
return (System.Exception)o;
}
}
catch(Exception uhOh)
{
SwallowException(uhOh,"Unable to CloneException of type {0}.",t );
}
// if here it was unable to create an instance of the same type. Create a
generic exception
// and return that.
return new System.Exception(message,ex);
} // CloneExceptionPrivate

I think actually the way to fix it is to have the bit of code which
reduces the number of threads in with the decision about whether or not
the thread should die, all effectively atomic using the state lock. In
other words, you can't detect that a thread exists if it's already
decided to quit. I haven't tried to engineer that yet, admittedly :)
Synchronizing all that correctly would be a chore :-) Might be easier to
always leave one thread running if that solves the problem.

I'll have a look. Once you start getting into the security aspects,
there's an *awful* lot more work that should be done.

Agreed. Definitely non-trivial to cover all the cases.
Still not done this yet, btw - but now I've got a job offer and the
concert I'm directing is this Friday/Saturday, so it's on my list of
things to do next week.
I hope all goes well for you.

regards,
Dave
 
Back
Top