thread pooling and short lived threads

David Levine said:
Sorry if I wasn't clear. I meant that I had not analyzed the relationship
between the stateLock and the queueLock to verify that the lock order was
guaranteed if you used the property instead of the field. I agree that you
need a certain lock order, or a single lock.

Right. I'll have a look and see whether I can combine them without too
much impact. (The event lock is a simple one, fortunately - I don't
need to access anything else within it, basically.)
 
David Levine said:
I stated it badly...by "system" I was referring to the managed process. I
was unaware of the 1000 IOCP threads/process; is this a Windows limit or a
CLR limit?

Both are CLR managed pools, so it's a CLR limit.
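For reference, the CLR's own pool limits can be queried directly (available since .NET 1.1; the actual figures vary by runtime version and machine). A minimal sketch:

```csharp
using System;
using System.Threading;

class PoolLimits
{
    static void Main()
    {
        int workerThreads, completionPortThreads;
        // Reports the CLR-imposed ceilings for the managed process.
        ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);
        Console.WriteLine("Max worker threads: " + workerThreads);
        Console.WriteLine("Max I/O completion port threads: " + completionPortThreads);
    }
}
```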

Willy.
 
David Levine said:
Agreed. I feel this is one of those issues that does not have a single
"correct" answer.


That would work. Another option is to add the name to the workitem, and then
set the name of the thread to that value immediately before invoking the
worker callback. Since the only legitimate use of the thread name is for
diagnostic/instrumentation purposes this seems to me like a good place to
set the name. That way the thread takes on the identity of the worker item
being processed. You could set the name to idle when it has no more items
left to process.

Unfortunately, that doesn't work due to .NET only allowing you to set
the thread name once :(
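The set-once behaviour can be demonstrated in a few lines (the thread names here are arbitrary):

```csharp
using System;
using System.Threading;

class ThreadNameDemo
{
    static void Main()
    {
        Thread t = Thread.CurrentThread;
        t.Name = "WorkItem-1";            // first assignment succeeds
        try
        {
            t.Name = "WorkItem-2";        // second assignment throws
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("Thread.Name has set-once semantics");
        }
    }
}
```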
Yes, but that isn't any more onerous than handling the add/remove as you are
already doing. I think something along the lines of the following would work
(caveat - this is completely untested and off the top of my head).

bool _runningList = false;
object _unsubscribeLock = new object();
ArrayList _unsubscriptionList = new ArrayList();

public event ExceptionHandler WorkerException
{
<snip>
    remove
    {
        lock (eventLock)
        {
            exceptionHandler -= value;
            if (_runningList)
            {
                lock (_unsubscribeLock)
                {
                    // add the delegate to a temporary list; the invoking
                    // code honours these unsubscriptions mid-run
                    _unsubscriptionList.Add(value);
                }
            }
        }
    }
}

Unfortunately it's not that simple, for the reasons you state later.

Given that all of this is entirely different to the way .NET treats
event handlers the rest of the time, I think I'll go for something like
this:

1) Implement the idea of cancelling further processing of the event - I
like that.

2) Use the normal .NET semantics of "whatever was subscribed when the
event is fired, use that" - don't try to react to changes during the
event.

3) Use the normal .NET semantics of an exception stopping further
processing.

4) Add some sort of exception logging for this and the other events.
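Point 2 above (snapshot semantics) can be sketched as follows; the field and event names are illustrative, not necessarily those of the actual pool:

```csharp
using System;

public delegate void ExceptionHandler(Exception ex);

public class WorkerSketch
{
    private readonly object eventLock = new object();
    private ExceptionHandler exceptionHandler;   // backing field for the event

    public event ExceptionHandler WorkerException
    {
        add    { lock (eventLock) { exceptionHandler += value; } }
        remove { lock (eventLock) { exceptionHandler -= value; } }
    }

    public void OnWorkerException(Exception ex)
    {
        // Copy the delegate under the lock, then invoke outside it:
        // whatever was subscribed when the event fires is used, and
        // handlers can subscribe/unsubscribe (even from another thread)
        // without deadlocking against the raise.
        ExceptionHandler snapshot;
        lock (eventLock) { snapshot = exceptionHandler; }
        if (snapshot != null)
            snapshot(ex);   // an exception here stops further processing
    }
}
```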

I would also change the lock used to synchronize the invocation list from
the event subscription so that the user can unsubscribe while processing the
exception even if called on a different thread. As it is you may get into a
deadlock if the user makes a blocking call to another thread that tries to
unsubscribe from the event while blocking in the worker callback.

Not with the current code. I don't hold the event lock while executing
the callback.
Note: This is just a suggestion - this would not work for the final solution
because it does not account for multiple exceptions that may be processed
from multiple worker threads for multiple work items. So it really needs a
queue per worker thread.

Yup. That's where things get tricky. I think it *could* be implemented
using a very clever linked list to track the whole invocation list
(never actually combining any delegates) but I don't think it's worth
it, to be honest.
I haven't seen that class yet. Perhaps you could copy the documentation over
to this class.

I'll certainly revise all the documentation when I've got the semantics
above implemented - and probably implement and document the same
semantics in ThreadController.
My take on non-CLSCompliant exceptions is...let the runtime barf if you get
one (IOW, don't catch it). The reason is that as far as I know the only way
you can even throw a non-CLS-compliant exception is to do it directly in
IL...I don't know of any language that will even let you generate one. You
can't get one of those even from interop because the runtime catches and
translates all exceptions from the interop layer. So if you do get one
you're probably in an indeterminate state anyway and there are no
guarantees about what will happen next. I'd rather exit the app than keep
running under those conditions.

Yup, sounds good to me.
This is definitely an area I'm relatively unhappy about in .NET. If a
non-compliant exception is thrown, but you want to keep going anyway,
there's not a lot you can do - there isn't an exception to pass to any
handlers.

Agreed. I understand why they designed it in, but the problem is that they
did not give us a complete story about what it means, how it should be
handled, and what to expect if it is not handled.
Yup.
I take your point about not just swallowing it wholesale - but if I
call SwallowException, what should a library do in this case? It can't
know about debugging/logging etc, and we're already in an OnException
case here - I don't want to go down a "OnExceptionInExceptionHandler"
route :)

Advice welcome here.

I feel that it is ok to do at least some minimal tracing - heck, the runtime
itself will generate tracing when a listener is subscribed. I'd rather have
a trace facility built into the threadpool that the user can opt out of at
runtime. These kinds of breadcrumbs are invaluable when troubleshooting
nasty bugs. There could be a static property called UseDebugOutput that by
default is set to true. When it is true it could execute some logic like
this.

static void DebugOutput(string format, params object[] args)
{
    if (UseDebugOutput)
    {
        string msg = string.Format(format, args);
        if (System.Diagnostics.Debugger.IsAttached)
            System.Diagnostics.Debug.WriteLine(msg);
        else
            System.Diagnostics.Trace.WriteLine(msg);
    }
}

Right. Presumably that means the code should be compiled with TRACE and
DEBUG defined, right? Otherwise those calls are removed entirely, I
believe. That's the other thing about libraries - having a debug
version and a separate release version is often a pain, IMO :(
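To be precise about why: Debug.WriteLine and Trace.WriteLine are marked [Conditional("DEBUG")] and [Conditional("TRACE")] respectively, so calls to them are stripped by the compiler unless the matching symbol is defined. A library can hang its own tracing hook on the same mechanism; the names below are hypothetical:

```csharp
using System;
using System.Diagnostics;

class PoolTrace
{
    public static bool UseDebugOutput = true;   // runtime opt-out

    // Calls to this method vanish entirely from callers compiled
    // without the TRACE symbol defined.
    [Conditional("TRACE")]
    public static void DebugOutput(string format, params object[] args)
    {
        if (UseDebugOutput)
            Trace.WriteLine(string.Format(format, args));
    }
}
```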

<snip>

[Exception handler getting parameters]
I don't think that matters much because if an exception occurs it is already
going down a sub-optimal code path anyway.

But the problem is, you've got to keep the parameters around *whether
or not* an exception is thrown, just in case an exception is thrown.
I was thinking about logging it to the event log itself. There are some
exceptions that are more setup or configuration type errors (setting the
MinThreadPoolSize to an invalid value) that should be obviously apparent,
but some runtime problems are extremely tricky to capture, let alone
diagnose and fix. In the case of a threadpool, because unhandled exceptions
go unnoticed there is a real danger that the application won't even realize
that an exception occurred and it could wind up hanging without having a
clue what went wrong. This could happen if the app did not subscribe to the
exception event. You could even filter the logging so that it only logged if
the exception event was not subscribed to.
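A minimal sketch of that filtered fallback (Windows-only, and the source name is hypothetical; registering an event log source normally needs elevated rights the first time):

```csharp
using System;
using System.Diagnostics;

class FallbackLogger
{
    // Log to the Application event log only when nobody is subscribed
    // to the pool's exception event - the case where the failure would
    // otherwise go completely unnoticed.
    public static void LogUnobserved(Exception ex, bool eventHasSubscribers)
    {
        if (eventHasSubscribers)
            return;
        EventLog.WriteEntry("CustomThreadPool",   // hypothetical source name
                            "Unhandled work item exception: " + ex,
                            EventLogEntryType.Error);
    }
}
```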

Right, that's not a bad idea. I haven't used the event log much - I'll
investigate.
Yup. Or passing a magic cookie that represents the work item (it could be
generated and returned to the caller when the work item is queued up).

I think I prefer the idea of the caller optionally providing it - they
can then give something useful if they want to, and just ignore it if
they're not going to use the facility anyway.
This is perhaps a policy question. I think those sorts of exceptions
represent bugs in the threadpool class itself rather than a user error. In
that case I am not sure what should happen as it means that the threadpool
class has detected an internal error and it may not be able to continue to
operate.

I think letting the exception propagate is probably the way forward -
it's a shame I can't detect whether or not AppDomain.UnhandledException
has a handler attached already...
Yes, this part of the operation is very tricky. I think there is a fair bit
of work to do to ensure that it operates correctly.

I've got a freeware threadpool class that I got from a MSFT blog over a year
ago - I'll forward it to your email. It is structured very differently from
yours but it may give you some ideas. To be honest I find yours much easier
to understand :-).

Received - I'll have a look. I think the problem is that there's a
potential race condition just about whatever you do. I think I'll try
to fix it so it creates threads in fewer cases, but when in doubt it
should create one. I suspect the logic is "when I add the item to the
queue, was it empty before? If so, start a thread if none are running.
Otherwise, start a thread if fewer than the maximum number of threads
are running."
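That decision logic might be sketched as follows (class and field names hypothetical, and the shutdown/race handling deliberately simplified):

```csharp
using System.Collections;
using System.Threading;

public class PoolSketch
{
    private readonly Queue queue = new Queue();
    private readonly object queueLock = new object();
    private int runningThreads;
    private readonly int maxThreads = 10;   // illustrative limit

    public void QueueWorkItem(object item)
    {
        bool startThread;
        lock (queueLock)
        {
            bool wasEmpty = queue.Count == 0;
            queue.Enqueue(item);
            // Queue was empty: start a thread only if none are running.
            // Queue was non-empty: start one if below the maximum.
            startThread = wasEmpty ? runningThreads == 0
                                   : runningThreads < maxThreads;
            if (startThread)
                runningThreads++;
        }
        if (startThread)
            new Thread(new ThreadStart(WorkerLoop)).Start();
    }

    private void WorkerLoop()
    {
        while (true)
        {
            object item;
            lock (queueLock)
            {
                if (queue.Count == 0) { runningThreads--; return; }
                item = queue.Dequeue();
            }
            // process item here...
        }
    }
}
```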

Very definitely personal - I love it :-) It makes it easy for me to tell the
difference between a field and a property.

I do that with case instead - apart from constants which are in Pascal
case.
This is definitely non-trivial to get right...

Yup. I think I can probably factor out the "work out how long to wait
for" code, which would help.

Yes, setting it immediately before and after each worker item callback
should do the trick. I think you want to do both to ensure that the NT
thread scheduler treats each thread the same.

I don't think I want to set it before the worker item callback itself -
I want to set it before the "before work item" event is raised, so that
if that decides to change the priority, it can. I'll set it to the
default again after the "after work item" event is raised. Sound
reasonable?
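A sketch of that per-item dispatch sequence (event and method names assumed, not the pool's actual API):

```csharp
using System;
using System.Threading;

public class DispatchSketch
{
    public event EventHandler BeforeWorkItem;
    public event EventHandler AfterWorkItem;

    public void RunItem(ThreadStart item)
    {
        // Set the default *before* raising BeforeWorkItem, so a handler
        // that wants a different priority for this item can override it.
        Thread.CurrentThread.Priority = ThreadPriority.Normal;
        if (BeforeWorkItem != null) BeforeWorkItem(this, EventArgs.Empty);
        try
        {
            item();   // runs at whatever priority the handler chose
        }
        finally
        {
            if (AfterWorkItem != null) AfterWorkItem(this, EventArgs.Empty);
            Thread.CurrentThread.Priority = ThreadPriority.Normal;  // reset
        }
    }
}
```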
I've used classes that implement this. The basic idea is that multiple
threads want to acquire a lock on a synchronization primitive, such as a
mutex. Rather than acquire/release it directly, the mutex is wrapped by a
class that initializes the mutex and which takes as an argument the base
priority that all threads attempting to acquire the lock must be at. When a
thread attempts to get the lock the mutex wrapper ensures that the thread is
executing at the correct priority level and then attempts to acquire the
lock. After releasing the lock it sets the priority back to the original
level (there's obviously more to it than that, but that's the basic idea).

This ensures that all threads that own the mutex are running at a base
priority level such that a thread that normally runs at a low priority level
will instead run at an elevated level. This prevents the system from
suspending the thread that owns the mutex in favor of a third (or more)
thread that is running at a level higher than the base level of the low
priority thread but below the priority level of the mutex.
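A rough sketch of such a wrapper (a hypothetical class, simplified to a monitor rather than a kernel mutex, and ignoring recursion and error handling):

```csharp
using System;
using System.Threading;

// Priority-ceiling lock: any thread holding the lock runs at least at
// the configured base priority, avoiding the inversion scenario above.
public class PriorityLock
{
    private readonly object sync = new object();
    private readonly ThreadPriority ceiling;

    public PriorityLock(ThreadPriority ceiling) { this.ceiling = ceiling; }

    public IDisposable Acquire()
    {
        Thread t = Thread.CurrentThread;
        ThreadPriority original = t.Priority;
        if (original < ceiling)
            t.Priority = ceiling;      // boost before taking the lock
        Monitor.Enter(sync);
        return new Releaser(this, original);
    }

    private class Releaser : IDisposable
    {
        private readonly PriorityLock owner;
        private readonly ThreadPriority original;
        public Releaser(PriorityLock owner, ThreadPriority original)
        { this.owner = owner; this.original = original; }

        public void Dispose()
        {
            Monitor.Exit(owner.sync);
            Thread.CurrentThread.Priority = original;  // restore on release
        }
    }
}
```

Usage is then `using (prioLock.Acquire()) { ... }`, so the boost and restore bracket the critical section even if it throws.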

Right. I can see how that works, but I'm not sure
I think a magic cookie may be the answer here. Just return a number that is
associated with the work item and return it from the Queueing call.

Again, I think I prefer the idea of the client providing the
identification, but after that it's fine.
Fair enough. I've read a bit about it on Brumme's weblog but I have not yet
seen the docs on it.

Right. I'll post again when I've implemented the changes and uploaded
them.
 
Unfortunately, that doesn't work due to .NET only allowing you to set
the thread name once :(

Oops, right - darn it. I really do wonder what the justification was for
that decision - I can't think of a good reason why the name of a thread has
set-once semantics.

Unfortunately it's not that simple, for the reasons you state later.

Given that all of this is entirely different to the way .NET treats
event handlers the rest of the time, I think I'll go for something like
this:

1) Implement the idea of cancelling further processing of the event - I
like that.

2) Use the normal .NET semantics of "whatever was subscribed when the
event is fired, use that" - don't try to react to changes during the
event.

3) Use the normal .NET semantics of an exception stopping further
processing.

4) Add some sort of exception logging for this and the other events.

All that sounds reasonable to me. After I sent out my last msg I started
thinking about the code snippet I had sent and realized there were a number
of problems with it...it's the sort of code you really want to think about
and not just whip up in 10 minutes.
Yup. That's where things get tricky. I think it *could* be implemented
using a very clever linked list to track the whole invocation list
(never actually combining any delegates) but I don't think it's worth
it, to be honest.

Agreed. If that ever became a requirement then the entire worker item/thread
mechanism may need to be reworked.

I remembered something that I missed the first time around - you can catch
the ThreadAbortException and reset it yourself (assuming you have sufficient
security).
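A minimal demonstration of that reset (this is .NET Framework-era behaviour; ResetAbort requires ControlThread permission):

```csharp
using System;
using System.Threading;

class AbortDemo
{
    static void Worker()
    {
        try
        {
            Thread.Sleep(Timeout.Infinite);
        }
        catch (ThreadAbortException)
        {
            // Without this call the abort is automatically re-raised at
            // the end of the catch block and the thread dies anyway.
            Thread.ResetAbort();
        }
        Console.WriteLine("Thread survived the abort");
    }

    static void Main()
    {
        Thread t = new Thread(new ThreadStart(Worker));
        t.Start();
        Thread.Sleep(100);   // let the worker block
        t.Abort();
        t.Join();
    }
}
```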

Right. Presumably that means the code should be compiled with TRACE and
DEBUG defined, right? Otherwise those calls are removed entirely, I
believe. That's the other thing about libraries - having a debug
version and a separate release version is often a pain, IMO :(

Good point. You can always define those constants even for a release build
if necessary.
<snip>

[Exception handler getting parameters]
I don't think that matters much because if an exception occurs it is already
going down a sub-optimal code path anyway.

But the problem is, you've got to keep the parameters around *whether
or not* an exception is thrown, just in case an exception is thrown.

Refactoring the code might provide a way to avoid that, but it probably
isn't worth the effort. I usually don't worry too much about the performance
and memory penalties associated with exceptions because once you accept that
after you have an exception the performance in that code path is horrible,
then the extra bit of horribleness isn't all that much worse. Unless you
have LOTS of exceptions occurring at a high rate, in which case the
application probably is doomed anyway.

I think it requires care in laying it all out, but I wouldn't let the
possible memory pressure stop me from providing some potentially extremely
useful functionality.

Another possibility to consider is to change the arguments in the work item
from an array to a single object. The caller can still provide an object
that is an array, but it wouldn't force everyone to allocate an array when a
single object will suffice.



I think letting the exception propagate is probably the way forward -
it's a shame I can't detect whether or not AppDomain.UnhandledException
has a handler attached already...

You could always add your own handler onto the chain, but I've not convinced
myself that this would be all that useful.
Received - I'll have a look. I think the problem is that there's a
potential race condition just about whatever you do. I think I'll try
to fix it so it creates threads in fewer cases, but when in doubt it
should create one. I suspect the logic is "when I add the item to the
queue, was it empty before? If so, start a thread if none are running.
Otherwise, start a thread if fewer than the maximum number of threads
are running."

That is exactly the type of logic that it would take, and while there are
ways to eliminate the race it involves a rather complicated handshaking
using two or more events - it isn't worth it in this case. I think a simpler
mechanism will suffice.

<snip>

I don't think I want to set it before the worker item callback itself -
I want to set it before the "before work item" event is raised, so that
if that decides to change the priority, it can. I'll set it to the
default again after the "after work item" event is raised. Sound
reasonable?

Yes that ought to do it. It needs to be set before the 1st callback occurs
and reset after the last callback has completed.

Cheers,
Dave
 
David Levine said:
All that sounds reasonable to me.

Having rejigged the implementation for most of the stuff we've talked
about, I've now actually just got exceptions from the events
propagating up. That means I don't have to swallow *any* exceptions,
which is quite nice. I still need to work out exactly what to do if an
exception occurs in the work item and there are no handlers available,
but that's a separable piece of work.
After I sent out my last msg I started
thinking about the code snippet I had sent and realized there were a number
of problems with it...it's the sort of code you really want to think about
and not just whip up in 10 minutes.
Absolutely.


I remembered something that I missed the first time around - you can catch
the ThreadAbortException and reset it yourself (assuming you have sufficient
security).

Mmm... Not entirely sure I should be doing that though.
[Exception handler getting parameters]
That's a nice idea. The disadvantage is that the parameters couldn't be
garbage collected until after a job completed or failed. I don't know
how significant a disadvantage that would be.

I don't think that matters much because if an exception occurs it is already
going down a sub-optimal code path anyway.

But the problem is, you've got to keep the parameters around *whether
or not* an exception is thrown, just in case an exception is thrown.

Refactoring the code might provide a way to avoid that, but it probably
isn't worth the effort.

I don't see how it could though. Before calling the delegate, you've
got to make the decision about whether or not you're going to keep the
parameters around. If you are, you've got to keep them until the
delegate has finished executing. You can't make that decision at the
point of time when the work item throws an exception.
I usually don't worry too much about the performance
and memory penalties associated with exceptions because once you accept that
after you have an exception the performance in that code path is horrible,
then the extra bit of horribleness isn't all that much worse. Unless you
have LOTS of exceptions occurring at a high rate, in which case the
application probably is doomed anyway.

But the point is that the penalty is incurred for the non-exception
case as well...
I think it requires care in laying it all out, but I wouldn't let the
possible memory pressure stop me from providing some potentially extremely
useful functionality.

I've ended up making it optional. The work item class is now a separate
public class which publishes its parameter array, ID, priority and
whether or not parameters are preserved, allowing those to be set at
construction time. (It also allows the calling code to specify whether
the parameter array should be cloned at construction time or not.)
Another possibility to consider is to change the arguments in the work item
from an array to a single object. The caller can still provide an object
that is an array, but it wouldn't force everyone to allocate an array when a
single object will suffice.

An array has to be allocated either way, for the delegate invocation.
I've now added the params modifier so that the creation of the array
doesn't need to occur in client code though.

One thing neither of us noticed when talking about the cancelling and
prioritising was that Queue doesn't allow you to look at anything other
than the start of the queue, or add things anywhere other than the end.
So I've had to write a RandomAccessQueue collection which is most
efficient at normal queue operations, but allows other access too. I've
written it this morning, but given that I'm ill and it's about 500
lines long, I think I'll need to review it carefully before trusting it
too much. Writing code fast like that is energising but error-prone!
 
Having rejigged the implementation for most of the stuff we've talked
about, I've now actually just got exceptions from the events
propagating up. That means I don't have to swallow *any* exceptions,
which is quite nice. I still need to work out exactly what to do if an
exception occurs in the work item and there are no handlers available,
but that's a separable piece of work.

I'm not sure I'm following what's happening there, but that's ok for
now...when you get the code to a point you're happy with it make it
available and I'll take another look at it.

Mmm... Not entirely sure I should be doing that though.

What I'm thinking is that a callback routine may have a ThreadAbort injected
into it, but since it is a threadpool thread it is really owned by the
threadpool class, so it ought to log it and possibly reset it. Also,
depending on the semantics offered by the class, you may want to save a copy
of the exception object and make it available later, either by someone
requesting a copy of it, or rethrowing the same (or a wrapped) exception
object but on a different thread. This is similar to what the async
callbacks do - when an exception occurs on another thread the exception is
saved. Later, when the code calls EndInvoke the exception is rethrown.
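That async pattern looks like this in miniature (.NET Framework-era delegate BeginInvoke/EndInvoke; exception type and message arbitrary):

```csharp
using System;

class EndInvokeDemo
{
    delegate void Work();

    static void Throws() { throw new InvalidOperationException("boom"); }

    static void Main()
    {
        Work w = new Work(Throws);
        // The exception occurs on a threadpool thread and is saved...
        IAsyncResult ar = w.BeginInvoke(null, null);
        try
        {
            // ...and rethrown here, on the calling thread.
            w.EndInvoke(ar);
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine("Caught on calling thread: " + e.Message);
        }
    }
}
```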
I don't see how it could though. Before calling the delegate, you've
got to make the decision about whether or not you're going to keep the
parameters around. If you are, you've got to keep them until the
delegate has finished executing. You can't make that decision at the
point of time when the work item throws an exception.

I think you are more concerned about this than I am. But wouldn't you have
to save them until the callback had completed running anyway? I think I am
missing something here.
<snip>
I've ended up making it optional. The work item class is now a separate
public class which publishes its parameter array, ID, priority and
whether or not parameters are preserved, allowing those to be set at
construction time. (It also allows the calling code to specify whether
the parameter array should be cloned at construction time or not.)

I'd like to see the code before I comment on this.

An array has to be allocated either way, for the delegate invocation.
I've now added the params modifier so that the creation of the array
doesn't need to occur in client code though.
I was thinking about using the params keyword as well. I don't quite see why
the object context itself needs to be an array - I assume you have a
different purpose in mind than I do.
One thing neither of us noticed when talking about the cancelling and
prioritising was that Queue doesn't allow you to look at anything other
than the start of the queue, or add things anywhere other than the end.
So I've had to write a RandomAccessQueue collection which is most
efficient at normal queue operations, but allows other access too. I've
written it this morning, but given that I'm ill and it's about 500
lines long, I think I'll need to review it carefully before trusting it
too much. Writing code fast like that is energising but error-prone!

I love the smell of fresh code in the morning :-) Best let it simmer for a
day or two, then add a little seasoning before letting anyone have a taste
of it :-)

I think a random access queue is a useful construct.

Dave
 
David Levine said:
I'm not sure I'm following what's happening there, but that's ok for
now...when you get the code to a point you're happy with it make it
available and I'll take another look at it.

Yup. Should be later today.

Basically the only exceptions I specifically catch are those within the
work items themselves. Anything else can go to the AppDomain's
unhandled exception event.
What I'm thinking is that a callback routine may have a ThreadAbort injected
into it, but since it is a threadpool thread it is really owned by the
threadpool class, so it ought to log it and possibly reset it. Also,
depending on the semantics offered by the class, you may want to save a copy
of the exception object and make it available later, either by someone
requesting a copy of it, or rethrowing the same (or a wrapped) exception
object but on a different thread. This is similar to what the async
callbacks do - when an exception occurs on another thread the exception is
saved. Later, when the code calls EndInvoke the exception is rethrown.

Hmm. That's a possibility. I'm not sure about resetting thread aborts
though - that seems to me the kind of thing I shouldn't be doing, at
least not without being told to. Maybe I could make it an option.
I think you are more concerned about this than I am. But wouldn't you have
to save them until the callback had completed running anyway? I think I am
missing something here.

No, you don't need to save them. If you put them into just a local
variable, the JIT can easily notice when nothing else is going to read
them.
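The detail being relied on here: the GC can treat a local as unreachable after its last use, even while the method is still running, so a reference held only in a local does not keep the parameters alive for the whole callback. A sketch of the distinction (names hypothetical):

```csharp
using System;

class LifetimeSketch
{
    static object[] savedParameters;   // field: keeps parameters reachable

    public static void RunWithoutSaving(Delegate d, object[] parameters)
    {
        d.DynamicInvoke(parameters);
        // 'parameters' is a local; after its last use above the JIT can
        // report it dead and the array becomes collectible mid-method.
    }

    public static void RunAndSave(Delegate d, object[] parameters)
    {
        savedParameters = parameters;  // now reachable until cleared,
        d.DynamicInvoke(parameters);   // whether or not an exception occurs
    }
}
```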
I'd like to see the code before I comment on this.

Understandable :)
I was thinking about using the params keyword as well. I don't quite see why
the object context itself needs to be an array - I assume you have a
different purpose in mind than I do.

I'm calling Delegate.Invoke, which uses an object array as its way of
passing parameters. I can't get away from that (unless I special case
specific parameter sets, which feels ugly to me) so I think
params object[] is fine. The only time it's a problem is when you want
to pass null or an object[] as the sole parameter, at which point you
need to cast that to object (otherwise it'll be viewed as the parameter
array itself).
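The gotcha in that last sentence, concretely (the method name is arbitrary):

```csharp
using System;

class ParamsGotcha
{
    public static int CountArgs(params object[] args)
    {
        return args == null ? -1 : args.Length;
    }

    static void Main()
    {
        object[] arr = new object[] { 1, 2, 3 };

        Console.WriteLine(CountArgs(arr));           // 3: used as the parameter array
        Console.WriteLine(CountArgs((object) arr));  // 1: one argument that is an array
        Console.WriteLine(CountArgs((object) null)); // 1: one null argument
        Console.WriteLine(CountArgs(null));          // -1: the array itself is null
    }
}
```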
I love the smell of fresh code in the morning :-) Best let it simmer for a
day or two, then add a little seasoning before letting anyone have a taste
of it :-)

I'll look over it again today and then publish it. The more eyes that
look at it, the better :)
I think a random access queue is a useful construct.

Likewise. Of course, that means someone else has probably already done
it. Currently the code has a few "TODO: This is too hard to think about
right now" bits, but only where efficiency is concerned. Basically I
should be using Array.Copy to do things in a bulky way where I'm
currently not, but that can improve over time (and it only occurs with
the ThreadPool when you cancel an item or insert one with a different
priority so it ends up in the middle of the queue).
 
I'll look over it again today and then publish it. The more eyes that
look at it, the better :)

Let me know when you have it ready; I'll save all my comments until then.

Dave
 
David Levine said:
Let me know when you have it ready; I'll save all my comments until then.

It's up now, in fact. I've also added the ability to make worker
threads foreground or background in the same way as thread priorities.
(So you can have threads that are background for most of the time, but
set them to foreground in the BeforeWorkItem if you want, to make sure
the CLR doesn't decide to quit half way through an item.)
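The mechanism behind that is Thread.IsBackground, which (unlike Name) can be toggled at any time. A minimal sketch of the pattern described:

```csharp
using System;
using System.Threading;

class BackgroundDemo
{
    static void WorkItem()
    {
        // Foreground for the duration of the item: the CLR will not
        // shut down while any foreground thread is still running...
        Thread.CurrentThread.IsBackground = false;
        Thread.Sleep(50);   // stand-in for real work
        // ...then back to background while idle, so the pool does not
        // keep the process alive on its own.
        Thread.CurrentThread.IsBackground = true;
    }

    static void Main()
    {
        Thread worker = new Thread(new ThreadStart(WorkItem));
        worker.IsBackground = true;   // background by default
        worker.Start();
        worker.Join();
    }
}
```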
 
It'll take me a few days to get to it...things are crazy at work right now.
I'll post back on this thread when I've had a chance to look at it.

PS: I took a quick glance and there have been some significant changes.
 
David Levine said:
It'll take me a few days to get to it...things are crazy at work right now.
I'll post back on this thread when I've had a chance to look at it.

Yup, no problem. I appreciate you looking at it at all, whenever that
happens to be :)
PS: I took a quick glance and there have been some significant changes.

Oh yes :) I don't think there's anything significant that we haven't
discussed though, apart from the background thread bit. The main loop
is refactored quite a bit, which should help readability.
 
I started a new thread on this.

Jon Skeet said:
Yup, no problem. I appreciate you looking at it at all, whenever that
happens to be :)


Oh yes :) I don't think there's anything significant that we haven't
discussed though, apart from the background thread bit. The main loop
is refactored quite a bit, which should help readability.
 