Threading issues and question

JMZ

I have a .Net service that processes messages contained in disk files using 1
or more threads. We run this on a server with 4 CPUs. We allow it to have up
to 8 active threads processing messages from a synchronized blocking queue.
The threads are long-running custom threadpool threads. Each thread can be in
one of several states - running, suspended, stopping, stopped. These are
logical states and not the states associated with the actual ThreadState
property of the thread. We use AutoResetEvents to manage them.
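
The logical states described above can be sketched roughly as follows. This is an illustration in Python rather than the service's .Net code; the `Worker` class, its method names, and the counter standing in for message processing are all hypothetical. Note also that Python's `threading.Event` stays signalled until cleared, unlike an AutoResetEvent, so the sketch only approximates the original mechanism.

```python
import threading
from enum import Enum


class WorkerState(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"
    STOPPING = "stopping"
    STOPPED = "stopped"


class Worker:
    """Hypothetical worker whose logical state is tracked separately
    from the state of the underlying OS thread."""

    def __init__(self, name):
        self.state = WorkerState.SUSPENDED
        self.iterations = 0                 # stand-in for processed messages
        self._resume = threading.Event()    # set => worker may run
        self._stop = threading.Event()      # set => worker must exit
        self._thread = threading.Thread(target=self._loop, name=name)

    def start(self):
        self._thread.start()                # thread starts parked (suspended)

    def resume(self):
        self.state = WorkerState.RUNNING
        self._resume.set()

    def suspend(self):
        self._resume.clear()
        self.state = WorkerState.SUSPENDED

    def stop(self):
        self.state = WorkerState.STOPPING
        self._stop.set()
        self._resume.set()                  # wake the thread if it is parked
        self._thread.join()
        self.state = WorkerState.STOPPED

    def _loop(self):
        while True:
            self._resume.wait()             # park here while suspended
            if self._stop.is_set():
                break
            self.iterations += 1            # stand-in for one message
            self._stop.wait(0.01)           # brief pause; returns early on stop
```

A suspended worker blocks on the event instead of polling, and `stop()` always sets both events so a parked thread cannot miss the shutdown request.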

The service is scheduled using an Alarm component that reads its schedule
from the database. In each logical schedule, the number of threads and the
next schedule start are assigned. The Alarm is set to the next schedule start
time and raises an event which is handled by the method that reads the
schedule from the database. Each time this method runs, it sets the number of
running threads, along with some other associated data, and sets the next
Alarm.

Each thread is created at startup of the service, and contains many object
instances, including datasets and dataviews of static data.

The main dispatching thread is responsible for looking for the message
files, enqueuing them, and pausing whenever the number of threads is set to
zero. We use zero during times we want processing to stop for a specified
period, such as when we run nightly ETL. These periods are also part of the
schedule in the database. The schedule consists of all the periods we want to
change processing parameters for a week. Each week, the schedule repeats.
During off hours, we take advantage by scheduling more threads. During
business hours, we schedule fewer threads to reduce the performance impact to
an associated web application that shares its database.
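
A weekly repeating schedule of this kind can be looked up with a sorted list of period start times. The sketch below is in Python and is only an illustration: the `SCHEDULE` values, the minutes-from-Monday representation, and the `current_period` function are assumptions, not the service's actual database layout.

```python
from bisect import bisect_right

MINUTES_PER_WEEK = 7 * 24 * 60

# Hypothetical schedule: (minutes after Monday 00:00, worker count),
# sorted by start time. The whole list repeats every week.
SCHEDULE = [
    (0, 8),              # Monday 00:00, off hours: 8 workers
    (8 * 60, 2),         # Monday 08:00, business hours: 2 workers
    (18 * 60, 8),        # Monday 18:00, off hours again
    (23 * 60, 0),        # Monday 23:00, nightly ETL: processing paused
    (25 * 60, 8),        # Tuesday 01:00, processing resumes
]


def current_period(schedule, minute_of_week):
    """Return (worker_count, minutes_until_next_change)."""
    minute_of_week %= MINUTES_PER_WEEK
    starts = [start for start, _ in schedule]
    # Index of the last period starting at or before now; -1 wraps to
    # the final period, which carries over from the previous week.
    i = bisect_right(starts, minute_of_week) - 1
    count = schedule[i][1]
    next_start = schedule[(i + 1) % len(schedule)][0]
    wait = (next_start - minute_of_week) % MINUTES_PER_WEEK
    return count, wait or MINUTES_PER_WEEK
```

The second return value is what an alarm would be set to; a worker count of zero corresponds to the paused periods.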

We have done significant performance testing to ensure this methodology does
in fact produce the best performance. The .Net ThreadPool, in our testing,
did not perform well. Also, creating a new thread for each new message is
extremely slow because of the large number of objects it must create. The
custom threadpool performs very well at 1 to 2 seconds per message.

Occasionally, the service terminates unexpectedly with no event log entries
or any other events that could be associated with it. Sometimes it stops during
a schedule change. Most of the time, we end up with one or more messages in
an inconsistent state, which we have to then recover.

My questions are as follows:

1. With a custom threadpool, should I be recycling the threads at some
interval, creating all new objects and datasets, etc?

2. Are there any other factors I should be considering that I have not
touched on above?

Thank you very much for any help.
 
JMZ said:
[...]
Each thread is created at startup of the service, and contains many object
instances, including datasets and dataviews of static data.

[...]
We have done significant performance testing to ensure this methodology does
in fact produce the best performance. The .Net ThreadPool, in our testing,
did not perform well. Also, creating a new thread for each new message is
extremely slow because of the large number of objects it must create. The
custom threadpool performs very well at 1 to 2 seconds per message.

Occasionally, the service terminates unexpectedly with no event log entries
or any other events that could be associated with it. Sometimes it stops during
a schedule change. Most of the time, we end up with one or more messages in
an inconsistent state, which we have to then recover.

My questions are as follows:

1. With a custom threadpool, should I be recycling the threads at some
interval, creating all new objects and datasets, etc?

No, there should be no need to reset a thread, ever. They don't
deteriorate in any way unless your own code is flawed.

2. Are there any other factors I should be considering that I have not
touched on above?

Well:

– Your comment about "Each thread…contains many object instances"
doesn't make any sense, unless you are using thread-local storage for
those objects and they are not visible to any other thread. Normally,
object data and thread instances are orthogonal concepts.

– It's not clear from your post why you think the built-in thread
pool performs poorly. It is actually a reasonably efficient way to deal
with things, but it's certainly possible to misuse. From your
description it seems that you don't really need a thread pool, but
rather just a producer/consumer design with multiple consumers running
concurrently.

– An unexpected termination could be any number of things, but most
likely you've got an important thread throwing an exception and taking the
process down with it. Impossible to say for sure without a
concise-but-complete code example that reliably demonstrates the problem.

Pete
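
For reference, a producer/consumer design with several consumers, each reusing its own objects, might look roughly like the sketch below. It is written in Python purely as an illustration; `consumer`, `run`, `drain`, the doubling "processing" step, and the rule that negative numbers are bad messages are all hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel that tells a consumer to exit


def consumer(work, results, errors):
    # Per-thread state, built once and reused for every message
    # (a stand-in for the per-thread datasets mentioned in the post).
    local_state = {"processed": 0}
    while True:
        item = work.get()
        if item is STOP:
            break
        try:
            if item < 0:
                raise ValueError(f"bad message {item}")
            local_state["processed"] += 1
            results.put(item * 2)           # stand-in for real processing
        except Exception as exc:
            # Record the failure so one bad message cannot end the thread.
            errors.put((threading.current_thread().name, repr(exc)))


def drain(q):
    """Collect everything currently in a queue into a list."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def run(messages, consumer_count=4):
    work, results, errors = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=consumer, args=(work, results, errors),
                         name=f"consumer-{i}")
        for i in range(consumer_count)
    ]
    for t in threads:
        t.start()
    for m in messages:                      # the single producer
        work.put(m)
    for _ in threads:                       # one sentinel per consumer
        work.put(STOP)
    for t in threads:
        t.join()
    return sorted(drain(results)), drain(errors)
```

Nothing here is shared between consumers except the queues, so the per-thread objects need no locking.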
 
Thanks Peter.

Yes, each thread has its own local objects which it reuses for each message
it processes.

With the .Net threadpool, we would have to make all the objects each message
needs global and manage them so that no two threads can use them
concurrently, aside from marking the public APIs Synchronized. While we did
not attempt this scenario per se because of the massive rewrite it would need,
it's hard to believe it could possibly perform better than under 2 seconds per
message, given that threads may not be available under all conditions.

The model is actually producer (1)/consumer (1 to 8). The producer is in the
service's main thread created at startup. From there, it creates the pool of
8 threads. Each of the threads creates its own objects and then enters a
loop, waiting to be made active if it is suspended, and if active, waiting
for messages to show up in the queue.

Each thread also accesses a synchronized list of Strings. When a thread
dequeues a message, it performs some basic processing and then converts it
into a standard XML format. Once there, it extracts 2 Strings from the XML
and checks the list for them, concatenated (synchronized of course). If the
String is not found, it is placed on the list and the critical section ends.
The message then continues processing. If the String IS found on the list,
the thread pauses and loops, checking the list at specified intervals until
it no longer finds the String on the list. When the String is no longer on
the list, the thread continues as stated earlier. The goal, of course, is for
no two threads to process messages with the same concatenated String value
simultaneously. This is to prevent duplicate records from being created in the
database. Because the List object was always used in the service, even before
it started failing, I doubt that it is causing any problems.

Now, I realize that some big unhandled exception is occurring, but all the
code is enclosed in Try..Catch blocks.

We cannot reliably recreate the problem because we do not know how to
trigger it. And believe me, we have tried.

I forgot to mention that this defect only started after changing to
producer/consumer from strictly using a new thread for each message with no
object reuse.
 
Yes, but we haven't been able to reproduce the problem.

I realize that it's very difficult to help troubleshoot this problem without
having the code and data, but I appreciate all your comments and suggestions.
If I am able to gather more data around this issue, I will post back here.

Again, thank you.

Michael Ober said:
Can you run this inside a debugger?

Mike Ober.

 
* JMZ wrote, On 6-1-2010 21:27:
[...]

Have you tried handling the following two events to see if you can find the problem:
http://msdn.microsoft.com/en-us/library/system.windows.forms.application.threadexception.aspx
http://msdn.microsoft.com/en-us/library/system.appdomain.unhandledexception.aspx
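
The general idea behind such events is a process-wide, last-chance handler that records an exception no Try..Catch block caught. The sketch below shows the same idea in Python using `threading.excepthook` (Python 3.8 or later); it is an analogy, not the .Net API, and `log_unhandled`, `faulty_worker`, and the `captured` list are hypothetical names. One difference to keep in mind: in Python an unhandled exception ends only the thread that raised it, whereas in .Net 2.0 and later it normally ends the whole process, so there the handler is mainly a chance to log before the service goes down.

```python
import logging
import threading

log = logging.getLogger("service")
captured = []  # kept only so the sketch can be checked; a service would log


def log_unhandled(args):
    # args carries exc_type, exc_value, exc_traceback and thread.
    name = args.thread.name if args.thread else "<unknown>"
    captured.append((name, args.exc_type.__name__, str(args.exc_value)))
    log.critical(
        "unhandled exception in %s", name,
        exc_info=(args.exc_type, args.exc_value, args.exc_traceback),
    )


# Install the hook once, at startup, before any worker thread is created.
threading.excepthook = log_unhandled


def faulty_worker():
    # Stand-in for code that escapes every local try/except.
    raise RuntimeError("message left in inconsistent state")


t = threading.Thread(target=faulty_worker, name="worker-3")
t.start()
t.join()
```

Writing the thread name and full traceback to a log from a handler like this is often the only record left when a service disappears without an event log entry.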

Second, it seems like a very difficult way of trying to keep duplicate
objects from being created. I am not completely sure I understand what
you're doing exactly, but first of all, why search through a
concatenated list of strings? Isn't it easier to just search for the
single string instances?

I read things like, wait in a loop until the string becomes available...
You could just use a WaitHandle and signal every time a string is being
removed to wake up those threads waiting, then let them check for the
string they're waiting for and then just wait for the handle again. You
might even be able to create a waithandle for every string you're
waiting for, so that you can signal the next in line that you're ready...
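
A signal-based version of that check could look something like the sketch below. It is in Python and uses a condition variable rather than a WaitHandle, so it shows the shape of the idea rather than the .Net code; `KeyGate`, `acquire`, and `release` are hypothetical names. In .Net, Monitor.Wait and Monitor.PulseAll would play a similar role.

```python
import threading


class KeyGate:
    """Allows only one thread at a time to work on a given key."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_use = set()        # keys currently being processed

    def acquire(self, key):
        with self._cond:
            # Sleep until no other thread holds this key. wait() releases
            # the lock while blocked and re-acquires it before returning.
            while key in self._in_use:
                self._cond.wait()
            self._in_use.add(key)

    def release(self, key):
        with self._cond:
            self._in_use.discard(key)
            # Wake all waiters; each re-checks whether its own key is free.
            self._cond.notify_all()
```

A worker would call `acquire(key)` after building the concatenated string and `release(key)` in a finally block once the message is done, so the key is freed even if processing throws. Waiting threads then wake only when some key is released instead of polling at intervals.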

Lastly, 2 seconds seems like an awfully long time for what you're
doing. Of course we don't know how big these files are, what they
contain and how you continue after parsing... but in my book 2 seconds
can be a very long time. So my real question is, can't you just handle
the messages one by one and speed up the total time it takes to handle a
message, before trying to do them in parallel with all this additional work?

Jesse
 