threading - suspend, resume, abort

Keith Langer

I have an application which consists of a main work thread and
multiple threads which each maintain a TCP socket.

When a configuration change occurs, all activity on the socket threads
must be halted. If a socket is no longer in the configuration, that
thread must be aborted and the socket must be closed. After the
configuration is complete, the remaining threads must be resumed.

I currently have the following architecture:

1) Suspend all threads
2) Abort threads which are no longer required and close their sockets
3) Resume all remaining threads.

If I suspend a thread and then abort it, CPU usage jumps to 100%. I
must first resume the thread and then abort it. This doesn't seem
right, so I'm wondering if I'm missing something here.

I also found in MSDN that Microsoft does not recommend using these
methods. If that's the case, then what is the proper technique?

thanks,
Keith
 
I would do it very differently:

Instead of interrupting all the threads and aborting some of them brutally
when the config changes, I would do the following:

Every thread must have some kind of loop like

    while (ReadRequest()) {
        DispatchRequest();
        WriteResponse();
    }

(this is pseudo-code, of course).

In the ReadRequest, I would interrogate the configuration manager, with a
call like configManager.IsAllowed(connectionInfo), and terminate the loop if
the method returns false.

If the configuration changes, I would call a method like
configManager.ChangeConfig(configInfo).

Then, you only have to solve two problems:

* Make sure that your configuration manager is correctly synchronized. The
methods IsAllowed and ChangeConfig should be synchronized on a common lock
so that a thread cannot call IsAllowed while another thread is in the middle
of executing ChangeConfig.

* Use asynchronous rather than synchronous reads in ReadRequest, so that you
can terminate the threads that are no longer allowed but are blocked
waiting for input. The idea is to call BeginRead and then wait on the
AsyncWaitHandle of the returned IAsyncResult. In your config manager, you
should set a flag in all the connections that need to be terminated and
signal their AsyncWaitHandle. This will allow you to terminate these threads
without having to wait for them to receive more input.
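
To illustrate the first point, here is a minimal sketch of a configuration manager whose two methods share one lock. It is written in Java rather than .NET (Java's synchronized plays the same role as C#'s lock or VB's SyncLock). The class name, the set of endpoint strings and the method signatures are assumptions made for the example, not the original poster's code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical configuration manager. Both methods synchronize on the same
// monitor (this), so isAllowed can never run in the middle of changeConfig.
final class ConfigManager {
    private Set<String> allowedEndpoints = new HashSet<>();

    // Called by the main thread when the configuration changes.
    public synchronized void changeConfig(Set<String> newEndpoints) {
        // Copy the input so later changes by the caller cannot leak in.
        allowedEndpoints = new HashSet<>(newEndpoints);
    }

    // Called by each socket thread at the top of its request loop.
    public synchronized boolean isAllowed(String endpoint) {
        return allowedEndpoints.contains(endpoint);
    }
}
```

A socket thread would call isAllowed with its own endpoint before each read and leave its loop when the answer is false.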

With this strategy, you don't need any of the thread control methods
(Abort, Suspend, Resume); you only use synchronization primitives (lock,
wait). This is much cleaner and more efficient.

Also, you should avoid "Suspend" as much as possible, except if you are
going to resume the thread or abort it immediately afterwards. If you
Suspend a thread and don't resume it or abort it immediately afterwards, you
block its execution at an unknown point, and you may very well block it in a
piece of code where it has acquired some locks. This may prevent other
threads (that are not suspended) from acquiring these locks, and you run the
risk of a general deadlock (at least until the thread is resumed).
So, be very careful with "Suspend" because it interferes in dangerous ways
with the lock/wait synchronization pattern.

Bruno.
 
Each socket's work thread should regularly check whether or not it
needs to pause/restart/shutdown. Usually this is just a case of
changing the condition of an otherwise endless loop.
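
A minimal sketch of that idea, in Java rather than .NET: the worker's loop condition is a flag that another thread can clear. The class and method names are hypothetical, and the sleep is only a stand-in for the real read/dispatch/write work.

```java
// Hypothetical socket worker: the loop runs until another thread clears
// the flag, so the worker always stops itself at a safe point.
final class SocketWorker implements Runnable {
    // volatile so that a write from the main thread is seen by the worker.
    private volatile boolean running = true;
    private final Thread thread = new Thread(this, "socket-worker");

    public void start() {
        thread.start();
    }

    @Override
    public void run() {
        while (running) {
            try {
                Thread.sleep(5); // stand-in for one request/response cycle
            } catch (InterruptedException e) {
                // Woken early; the loop condition decides whether to exit.
            }
        }
        // Safe place to close the socket: no request is in flight here.
    }

    // Asks the worker to stop, then waits up to timeoutMillis for it to exit.
    // Returns true if the worker thread has finished.
    public boolean stopAndJoin(long timeoutMillis) {
        running = false;
        thread.interrupt(); // wakes the worker if it is sleeping
        try {
            thread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }
}
```

A pause request could be handled the same way with a second flag, although pausing may not be needed at all once the shared data is protected by a lock.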
 
Thanks guys. I took the approach of setting a flag from the main thread
which directs the socket thread to abort or suspend. Now the socket
thread handles these actions when it is safe to do so.

According to MSDN, the Abort method only executes when it is safe, so
I'm guessing that at the very least it waits until all locks are
released. Is this not the case? After testing it, I found that the
abort did take a few seconds.

One more question - if the main thread can read or write to a hashtable
and the socket thread reads from that hashtable, is there any danger of
an incomplete write operation while the socket thread is accessing an
item by its key? I am not using enumeration.

thanks,
Keith
 
If you have a flag, why do you still need to call abort or suspend?

My assumption was that you would set the flag on the threads that need to be
aborted, and only on these threads.

Then, the threads that don't have the flag set should continue running, they
don't need to be suspended.

And the threads that have the flag set should terminate by testing the flag.
You should test the flag in your loops (my assumption was that you only had
one main loop in every thread, but your code may be a bit more complex) and
the thread will terminate. The only difficulty here is terminating threads
that may be waiting for input and thus won't test the flag until they
receive new input. My suggestion to use asynchronous reads was a way to
handle this case.

Calling Abort will not wait for locks to be released (that might never
happen); instead it raises a special exception (ThreadAbortException) in the
target thread, and the locks are released as that exception propagates. So,
you are OK here.

It seems to me that you are using a strange strategy to handle concurrent
access on a shared resource (your config data). Instead of protecting the
shared resource with monitors (using the C# lock keyword to synchronize the
methods that access the config data), you are interrupting all threads when
you need to modify the shared data, and you are resuming them after the
shared data has been modified. Your approach has several **major** problems:

* Performance: stopping and resuming all threads costs a lot more than
acquiring a monitor. Also, you interrupt everything every time you need to
modify the shared data. If you used the monitor strategy instead, the other
threads would be able to continue while the config is being modified, at
least as long as they don't try to read the config (then they will block on
the monitor until the modification is complete).

* Correctness of your logic and risks of deadlock. The threads will get
suspended and resumed at random points in their execution. If the config is
stored in a hash table and if the threads use hash table lookup logic to
retrieve it, you run the risk of interrupting a thread in the middle of the
lookup operation. For example, the thread may have computed the bucket index
but may not have obtained the bucket itself. If the main thread interrupts
at this specific point and triggers a rehash of the table, the thread that
does the lookup will do unpredictable things when it gets resumed (because
the hash table will contain a different bucket at the index that was
computed before the interruption).
And, things are even worse if you synchronize the hash table. In this case,
the thread that does the lookup will acquire the monitor on the hash table
to do the lookup. Then the main thread will suspend it in the middle of the
lookup. And then the main thread will try to modify the hash table and will
block forever because the table's monitor is acquired by a thread that is
suspended (suspending a thread does not release the monitors that it owns).
Your server will be dead!

So, the answer to your last question is: Yes, you will be in serious trouble
if you use this strategy and if you share a hashtable between your main
thread and the socket thread.

So, I strongly encourage you to avoid suspend and abort and to code
everything with monitors.
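
As a rough sketch of the monitor approach for the shared hashtable, here is a small device table in which every read and every write takes the same lock. It is Java rather than .NET, and the class name, the String keys and the String stand-in for the device object are all assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical device table shared by the main thread and the socket threads.
// Every access goes through the same lock, so a lookup can never overlap
// with an insert or a removal.
final class DeviceTable {
    private final Object lock = new Object();
    private final Map<String, String> devices = new HashMap<>();

    // Main thread: add or replace a device.
    public void put(String key, String device) {
        synchronized (lock) {
            devices.put(key, device);
        }
    }

    // Main thread: delete a device.
    public void remove(String key) {
        synchronized (lock) {
            devices.remove(key);
        }
    }

    // Socket thread: returns the device, or null if it has been deleted.
    public String find(String key) {
        synchronized (lock) {
            return devices.get(key);
        }
    }
}
```

The lock is held only for the duration of one table operation, so the socket threads keep running while the configuration changes and block briefly only if they hit the table at the same moment.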

Bruno
 
Keith Langer said:
According to MSDN, the Abort method only executes when it is safe, so
I'm guessing that at the very least it waits until all locks are
released. Is this not the case? After testing it, I found that the
abort did take a few seconds.

No, that is not correct. The link below is to the Abort documentation, where
it makes it clear that this is not the case. You may be experiencing a delay
because the thread you are aborting may actually be executing unmanaged
code, and the abort exception will not be raised until after the thread
returns to managed code.

http://msdn.microsoft.com/library/d...rlrfsystemthreadingthreadclassaborttopic2.asp

As Bruno pointed out, using Suspend/Resume is not the way you should be
doing this.
 
The flag is something I just added so that instead of the main thread
aborting the socket thread, the socket thread aborts itself. You are
right that there is one loop (wouldn't an async call create another
thread?). There is essentially a main loop and an inner "retry" loop,
and in both places I check for the abort flag and throw an exception if
it's true. The main loop handles this exception by closing the socket
and aborting its own thread. On the outer loop, I check for the
"suspend" flag, but it might not even be necessary to suspend
activities.

As far as config operations interfering with the socket operations, this
does not happen regularly. There are two types of changes which could
affect the socket thread -

1) A change to the socket address or port, which causes the existing
thread to be aborted and a new one to be created.

2) The addition, modification, or deletion of a device on the socket. In
this case, the only risk is that the socket thread is processing a
message for a device that has just been deleted. Each message contains
a key which references the source device. Before sending each message,
a reference to the device is obtained (via a hashtable lookup) so that
it can validate the response through the device object. If the device
does not exist (because the main thread removed it from the hashtable),
an exception is thrown and the messages for that device are not sent.

Obtaining the reference to the device is the main operation that could
be stepped on by the configuration thread (if the device is deleted).
I've thought about using a queue to passively inform the socket thread
of device additions and deletions, but I'm wondering if this is
overkill. I think there is still risk with this approach, since I
either have to pass a reference to the device through the queue, or I
would have to pass a key and then access the device from a global
collection (which could also be modified by the main thread).

Any thoughts?
Keith
 
Bruno,

I removed the suspend flag (but kept the abort flag) and added locks to
the hashtable when the socket thread reads from it or when the main
thread removes from it. Does this approach sound better?

Keith
 
Bruno,

One more question - is there much additional overhead to a synclock if
two threads never actually lock the object at the same time?
 
Hi Keith,

See my comments inline...

Keith Langer said:
The flag is something I just added so that instead of the main thread
aborting the socket thread, the socket thread aborts itself. You are
right that there is one loop (wouldn't an async call create another
thread?). There is essentially a main loop and an inner "retry" loop,
and in both places I check for the abort flag and throw an exception if
it's true. The main loop handles this exception by closing the socket
and aborting its own thread. On the outer loop, I check for the
"suspend" flag, but it might not even be necessary to suspend
activities.

Sounds good.

Keith Langer said:
As far as config operations interfering with the socket operations, this
does not happen regularly. There are two types of changes which could
affect the socket thread -

1) A change to the socket address or port, which causes the existing
thread to be aborted and a new one to be created.

2) The addition, modification, or deletion of a device on the socket. In
this case, the only risk is that the socket thread is processing a
message for a device that has just been deleted. Each message contains
a key which references the source device. Before sending each message,
a reference to the device is obtained (via a hashtable lookup) so that
it can validate the response through the device object. If the device
does not exist (because the main thread removed it from the hashtable),
an exception is thrown and the messages for that device are not sent.

Obtaining the reference to the device is the main operation that could
be stepped on by the configuration thread (if the device is deleted).
I've thought about using a queue to passively inform the socket thread
of device additions and deletions, but I'm wondering if this is
overkill. I think there is still risk with this approach, since I
either have to pass a reference to the device through the queue, or I
would have to pass a key and then access the device from a global
collection (which could also be modified by the main thread).

Sounds good too. I would go for the simple hashtable solution.
The queue just makes things more complex.

Whether you use a central hashtable lookup or a queue, you need
synchronization because you will have more than one thread accessing a
common data structure.

Keith Langer said:
Any thoughts?

A bit more in response to your other replies...
 
Keith Langer said:
Bruno,

I removed the suspend flag (but kept the abort flag) and added locks to
the hashtable when the socket thread reads from it or when the main
thread removes from it. Does this approach sound better?

Sounds good. Do you still have calls to Thread.Suspend/Resume/Abort, or did
you manage to make everything work with only the abort flag and the locks
that protect the hash table? If you managed to get rid of
Suspend/Resume/Abort, you are definitely on the right track.
 
Keith Langer said:
Bruno,

One more question - is there much additional overhead to a synclock if
two threads never actually lock the object at the same time?

Yes, there is some overhead, but it is not dramatic, and a lock that is
never contended is the cheap case. Locks are heavily used in the OS to
handle multi-threaded access to shared resources, so you should not worry
about it in this case.

IMO, locks are only a real source of performance degradation in the
following situations:

1) when they are used defensively to protect data structures that will
**not** be shared by threads 99% of the time. The typical example of this is
the Java JDK 1.1 where all the methods of the collection classes (Vector,
Hashtable) were synchronized.
99% of the time, these collections are allocated by one thread, manipulated
only by this thread and then released, so all the collection related code
was penalized because of a very defensive design choice.

2) when a real lock (a lock that is really needed to protect a real shared
object) is acquired in a tight loop that does not contain a wait. In this
case, the code should be re-analyzed and the lock should be moved around
the loop rather than inside.
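
A minimal sketch of the difference in point 2, in Java rather than .NET (class and method names are made up for the example): the first method takes the lock once per element, the second takes it once for the whole batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared list, filled in two different ways.
final class Batcher {
    private final Object lock = new Object();
    private final List<Integer> shared = new ArrayList<>();

    // Lock inside the loop: one acquire/release per element.
    public void addEachLocked(int[] values) {
        for (int v : values) {
            synchronized (lock) {
                shared.add(v);
            }
        }
    }

    // Lock around the loop: one acquire/release for the whole batch.
    public void addBatchLocked(int[] values) {
        synchronized (lock) {
            for (int v : values) {
                shared.add(v);
            }
        }
    }

    public int size() {
        synchronized (lock) {
            return shared.size();
        }
    }
}
```

The trade-off is that the second form holds the lock for the whole loop, so other threads wait longer for it. That is fine for a short, tight loop, but not for a loop that blocks or waits while holding the lock.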

But otherwise, synclocks are really the way to go. They allow you to control
multi-thread access in a very clean way and you should not worry about the
perf impact. If you have to share a resource among threads, use a synclock,
that's the way to go!

Bruno.
 