All Your Thread Are Belong to Us

  • Thread starter Thread starter C# Learner
  • Start date Start date
C

C# Learner

Hi all,

I have a nasty bug in my multi-threaded app which I think is being
caused by "race conditions" with multiple threads. I'm getting data
corruption.

The app is an HTTP server I'm making just for fun. I noticed the bug
yesterday night when performing a little stress testing on it. When
under stress, I find that a particular variable sometimes has an
incorrect value.

I'm not quite a threading expert, so I may be making a stupid mistake...

How can I go about getting help with this? I'd probably have to give
someone the whole project to get help, so they can observe the problems
themselves, I guess...
 
C# Learner said:
Hi all,

I have a nasty bug in my multi-threaded app which I think is being caused
by "race conditions" with multiple threads. I'm getting data corruption.

The app is an HTTP server I'm making just for fun. I noticed the bug
yesterday night when performing a little stress testing on it. When under
stress, I find that a particular variable sometimes has an incorrect
value.

I'm not quite a threading expert, so I may be making a stupid mistake...

How can I go about getting help with this? I'd probably have to give
someone the whole project to get help, so they can observe the problems
themselves, I guess...

First and foremost, is that variable accessible from *one* place or is it
directly accessed in multiple places?
If its the latter and the variable isn't used as an out or ref parameter,
you might be able to turn the variable into a property temporarily and input
some logging to figure out why things are going wrong as well as placing a
breakpoint there.

Its probably a race condition, but it could also be some subtle bug on a
rarely crossed codepath you havn't noticed yet(such as a casing difference
causing an invalid assignment, etc).
 
C# Learner said:
I have a nasty bug in my multi-threaded app which I think is being
caused by "race conditions" with multiple threads. I'm getting data
corruption.

The app is an HTTP server I'm making just for fun. I noticed the bug
yesterday night when performing a little stress testing on it. When
under stress, I find that a particular variable sometimes has an
incorrect value.

I'm not quite a threading expert, so I may be making a stupid mistake...

How can I go about getting help with this? I'd probably have to give
someone the whole project to get help, so they can observe the problems
themselves, I guess...

I have an unfinished threading tutorial which may still be of some
help:

http://www.pobox.com/~skeet/csharp/multithreading.html

Let me know if it raises any further questions.
 
Daniel said:

Hi Daniel,

Thanks for the reply -- it made me take a step back and think at a
higher level.

After several hours, I've finally fixed the two bugs that were causing
the problems.

The first bug was that I was calling ArrayList.Remove instead of
ArrayList.RemoveAt. I was passing an 'int' as a parameter to Remove,
when the ArrayList was holding only references to instances of class
Connection. Doh!

I'm kicking myself now because of this. What seems weird here, though,
is that ArrayList.Remove doesn't throw an exception when the passed
reference doesn't exist in the list.

The other bug was a race condition which I solved by locking a
particular block of code that I hadn't given enough prior thought to.

I appreciated your reply.

I'm gonna have to let my brain rest for awhile now.

See ya.
 
C# Learner said:
Daniel said:

Hi Daniel,

Thanks for the reply -- it made me take a step back and think at a higher
level.

After several hours, I've finally fixed the two bugs that were causing the
problems.

The first bug was that I was calling ArrayList.Remove instead of
ArrayList.RemoveAt. I was passing an 'int' as a parameter to Remove, when
the ArrayList was holding only references to instances of class
Connection. Doh!
Ooops, Generics would have helped in this case, ;).
I'm kicking myself now because of this. What seems weird here, though, is
that ArrayList.Remove doesn't throw an exception when the passed reference
doesn't exist in the list.

It is strange, but as it stands ArrayList.Remove is pretty much a fire and
forget method. Most of the existing collectinos are *very* forgiving. This
may not stand in the face of generics. For example, Dictionary<T,U>
currently throws an exception when a key doesn't exist instead of returning
null(there is a discussion on this which I'll post a link to tomorrow, to
tired to find it right now). I

Its really hard to say which is better, virtually any answer you get is
always specific to what you are doing and if you think hard enough, you can
usually come up with an argument for hte other way of doing things.
The other bug was a race condition which I solved by locking a particular
block of code that I hadn't given enough prior thought to.
Multithreading is a demon like that, its generally teh stuff you don't
expect more so than what you do, especially when you start evolving a code
base that is only multi-thread safe where it is(was) possible for multiple
threads to reach. As the code grows, you sometimes add race conditions or
deadlocks where you didn't figure they were possible when you designed your
locking code.
 
Jon said:
I have an unfinished threading tutorial which may still be of some
help:

http://www.pobox.com/~skeet/csharp/multithreading.html

Let me know if it raises any further questions.

I read it earlier, in fact. :-) I bookmarked it the other day when you
mentioned it, then decided to read it during this bug crisis.

I did notice that it's unfinished because I didn't see an explanation
for why you created a new variable just for the purposes of locking
(countLock), but it's a very nice, easy-to-understand tutorial. It did
help, too, as I'm still pretty new to locking, and it cleared up some
doubts I had about it.

My problems seem to be fixed now (as detailed in my reply to Daniel), so
things are looking okay for now. I'll double-check all my changes
later on, though, to be sure...

Thanks.
 
C# Learner said:
I read it earlier, in fact. :-) I bookmarked it the other day when you
mentioned it, then decided to read it during this bug crisis.

Cool :)
I did notice that it's unfinished because I didn't see an explanation
for why you created a new variable just for the purposes of locking
(countLock), but it's a very nice, easy-to-understand tutorial. It did
help, too, as I'm still pretty new to locking, and it cleared up some
doubts I had about it.

Excellent. If you have any suggestions as to how to improve it (other
than to finish it!) please let me know.
My problems seem to be fixed now (as detailed in my reply to Daniel), so
things are looking okay for now. I'll double-check all my changes
later on, though, to be sure...

Glad you're sorted. I find there's a particular satisfaction in
squashing threading bugs. Maybe it's because they're usually so hard to
find!
 
Daniel O'Connell [C# MVP] wrote:

[...]
Ooops, Generics would have helped in this case, ;).

Yeah. :-D
It is strange, but as it stands ArrayList.Remove is pretty much a fire and
forget method. Most of the existing collectinos are *very* forgiving. This
may not stand in the face of generics. For example, Dictionary<T,U>
currently throws an exception when a key doesn't exist instead of returning
null(there is a discussion on this which I'll post a link to tomorrow, to
tired to find it right now). I

Alright. :-)

G'night.

[...]
 
Jon Skeet [C# MVP] wrote:

[...]
Excellent. If you have any suggestions as to how to improve it (other
than to finish it!) please let me know.

I don't remember noticing anything that I felt could've been improved.
In any case, I'll read it again later once I've rested my brain, and
I'll post here if I find anything.
Glad you're sorted. I find there's a particular satisfaction in
squashing threading bugs. Maybe it's because they're usually so hard to
find!

Yeah, it's a real weight off my shoulders. :-)
 
Jon,

I do have a comment on the tutorial. When talking about deadlocks you
suggest that programmers take care to take multiple locks in the same order.
That is a very difficult rule to enforce as you note, I would say you should
not acquire more than one lock at a time. If you need to lock two resources
at the same time than you need to control that with a single lock (and I
mean a single lock, there will be no way to lock just one resource). That's
lot easier to enforce than the order in which locks can be acquired.

Jerry
 
Jerry Pisk said:
I do have a comment on the tutorial. When talking about deadlocks you
suggest that programmers take care to take multiple locks in the same order.
That is a very difficult rule to enforce as you note, I would say you should
not acquire more than one lock at a time. If you need to lock two resources
at the same time than you need to control that with a single lock (and I
mean a single lock, there will be no way to lock just one resource). That's
lot easier to enforce than the order in which locks can be acquired.

I don't think it always makes sense though. For a start, locks can be
in different classes, so one combined lock may well be hard to contrive
elegantly (as it would be shared between two classes).

I'll have a think about it though - maybe suggest it as an alternative
approach.
 
Well, actually locking only one object at a time is merely a special case of
always locking in the same order: There is only one possible order for one
object ;-)

However, I do think the deadlocks section could mention more pitfalls: for
example, you do mention that Monitor.Wait does release and reacquire a
lock - chances are good that locks are in wrong order after that...
Or: Some methods (like Control.Invoke or anything else that uses a windows
message queue) contain "implicit locks"; If the main thread is waiting for a
worker thread while that thread issues a Control.Invoke call, the call will
deadlock.

And I think a "how do I shut down my worker threads cleanly"-section would
be very helpful;

But that's enough critic: I wished I'd have read that article before I
started multithreaded programming...

Niki

Jerry Pisk said:
Jon,

I do have a comment on the tutorial. When talking about deadlocks you
suggest that programmers take care to take multiple locks in the same order.
That is a very difficult rule to enforce as you note, I would say you should
not acquire more than one lock at a time. If you need to lock two resources
at the same time than you need to control that with a single lock (and I
mean a single lock, there will be no way to lock just one resource). That's
lot easier to enforce than the order in which locks can be acquired.

Jerry
 
Jerry Pisk said:
Jon,

I do have a comment on the tutorial. When talking about deadlocks you
suggest that programmers take care to take multiple locks in the same order.
That is a very difficult rule to enforce as you note, I would say you should
not acquire more than one lock at a time. If you need to lock two resources
at the same time than you need to control that with a single lock (and I
mean a single lock, there will be no way to lock just one resource). That's
lot easier to enforce than the order in which locks can be acquired.

Of course, you will avoid deadlocks if you replace two locks by a single
one, BUT there is also a good chance that you will reduce the level of
concurrency (and thus the throughput of your server if you are writing a
multithreaded server). So, IMO, the right way to go is to follow Jon's
avice, i.e., keep several locks and acquire them always in the same order.

Let's consider the following situation:
* You have two locks L1 and L2
* The code controlled by L1 is low level logic that executes very fast (does
not perform any I/O) and it is called from many places in your code (not
just from code controlled by L2).
* The code controlled by L2 is higher level logic that executes more slowly
(performs some I/O for example) and it calls code controlled by L1 (L1 is
always acquired after L2 to prevent deadlocks).

If you replace L1 and L2 by a single lock, you will end up waiting for L2
when you call the low level code controlled by L1. So operations that are
not I/O bound will have to wait for I/O completion on other operations. This
is very bad and it can have a dramatic impact on the throughput of a
multi-threaded server!

So, multithreading is really a subtle issue. Writing "correct" code, i.e.,
code that does not deadlock and that accesses shared data in a disciplined
way, is already difficult, but writing "efficient" code, i.e., code that
maximizes concurrent execution, is even more difficult. You need to
elaborate a good "locking strategy" if you want to take full advantage of
multi-threading (of course, I am talking about writing multi-threaded
servers, there are other scenarios that are not so demanding).

Bruno.
 
Niki Estner said:
Well, actually locking only one object at a time is merely a special case of
always locking in the same order: There is only one possible order for one
object ;-)

However, I do think the deadlocks section could mention more pitfalls: for
example, you do mention that Monitor.Wait does release and reacquire a
lock - chances are good that locks are in wrong order after that...

No! If your code paths guarantee that the locks are always acquired in the
same order, the will always be, even if there is wait that releases the
locks and reacquires them (they will reacquired in the order in which they
were acquired before entering the wait).
If this was not the case, Wait and Pulse would be completely unusable!
Or: Some methods (like Control.Invoke or anything else that uses a windows
message queue) contain "implicit locks"; If the main thread is waiting for a
worker thread while that thread issues a Control.Invoke call, the call will
deadlock.

Yes, you have to be careful with the locks placed by the framework, and in
general with "callbacks" (events, virtual methods overriden by code that you
don't control). You should try to issue your callbacks from code sections
that are not protected by locks to avoid deadlocks. This is not always easy
to achieve.

Bruno.
And I think a "how do I shut down my worker threads cleanly"-section would
be very helpful;

But that's enough critic: I wished I'd have read that article before I
started multithreaded programming...

Niki

Jerry Pisk said:
Jon,

I do have a comment on the tutorial. When talking about deadlocks you
suggest that programmers take care to take multiple locks in the same order.
That is a very difficult rule to enforce as you note, I would say you should
not acquire more than one lock at a time. If you need to lock two resources
at the same time than you need to control that with a single lock (and I
mean a single lock, there will be no way to lock just one resource). That's
lot easier to enforce than the order in which locks can be acquired.

Jerry
Daniel),
 
Bruno Jouhier said:
...

No! If your code paths guarantee that the locks are always acquired in the
same order, the will always be, even if there is wait that releases the
locks and reacquires them (they will reacquired in the order in which they
were acquired before entering the wait).

Where is that information from? The Docs clearly have a different position:
"Wait releases the lock for the specified object only; if the caller is the
owner of locks on other objects, these locks are not released" (copied from
MSDN Article - Monitor.Wait Method)
Let's assume we have two locks, L1 and L2:
Thread A aquires L1, acquires L2, then waits for L1 (accoring to the docs
releasing L1, not L2)
Thread B acquires L1, tries to acquire L2 -> deadlock (Thread B is waiting
for thread A, and noone else could Pulse L1, so they both wait forever)
If this was not the case, Wait and Pulse would be completely unusable!

Not at all: calling Wait while holding multiple locks isn't good style
anyway; Shared resources should be locked for as short as possible.
Yes, you have to be careful with the locks placed by the framework, and in
general with "callbacks" (events, virtual methods overriden by code that you
don't control).

There's nothing wrong about callbacks - synchronized containers or older COM
objects will have the same problems in ordinary function calls.
Reading from a socket might cause a deadlock too, even across
application-boundaries, without any callbacks in them. (which doesn't mean
locking socket access would be a bad thing to do)
You should try to issue your callbacks from code sections
that are not protected by locks to avoid deadlocks. This is not always easy
to achieve.

If you want that a callback is only called by one thread at a time, call it
in a lock;
Otherwise, don't.
That's it.

Niki
 
Jon said:
[...]
Excellent. If you have any suggestions as to how to improve it (other
than to finish it!) please let me know.
[...]

Reading through a Subversion tutorial has given me an idea.

It goes something like this:

"Suppose we have two co-workers, Harry and Sally. They each decide to
edit the same file at the same time. If Harry saves his changes first,
then it's possible that (a few moments later) Sally could accidentally
overwrite them with her own new version of the file. Any changes Harry
made won't be present in Sally's newer version of the file, because she
never saw Harry's changes to begin with."

Perhaps this, used as an analogy, could help newbies to understand race
conditions. What do you think?
 
Niki Estner said:
Well, actually locking only one object at a time is merely a special case of
always locking in the same order: There is only one possible order for one
object ;-)

However, I do think the deadlocks section could mention more pitfalls: for
example, you do mention that Monitor.Wait does release and reacquire a
lock - chances are good that locks are in wrong order after that...

Not sure how that could cause a deadlock at the moment.
Or: Some methods (like Control.Invoke or anything else that uses a windows
message queue) contain "implicit locks"; If the main thread is waiting for a
worker thread while that thread issues a Control.Invoke call, the call will
deadlock.

Yup - I'm going to mention that in a section about Control.Invoke.
Another problem with Control.Invoke is doing something like:

lock (foo)
{
bar = newValue;
Invoke (SomethingWhichReadsBarHavingLockedFoo);
}

(if you see what I mean).
And I think a "how do I shut down my worker threads cleanly"-section would
be very helpful;

Yup, that's coming later.
But that's enough critic: I wished I'd have read that article before I
started multithreaded programming...

Glad it's of some use. I really *must* get it finished though.
 
C# Learner said:
Reading through a Subversion tutorial has given me an idea.

It goes something like this:

"Suppose we have two co-workers, Harry and Sally. They each decide to
edit the same file at the same time. If Harry saves his changes first,
then it's possible that (a few moments later) Sally could accidentally
overwrite them with her own new version of the file. Any changes Harry
made won't be present in Sally's newer version of the file, because she
never saw Harry's changes to begin with."

Perhaps this, used as an analogy, could help newbies to understand race
conditions. What do you think?

Possibly. I can't remember right now how I've explained it at all. The
difficulty is getting the balance right between just enough analogies
to get the point across, and giving two many analogies which start
confusing things :)
 
Jon Skeet said:
Not sure how that could cause a deadlock at the moment.

Having read your example to Bruno, I see what you mean. I guess that
means only Waiting on the "innermost" lock.
 
Jon Skeet [C# MVP] wrote:

[...]
Possibly. I can't remember right now how I've explained it at all.

By the way, I think your current explanation of it is fine. The analogy
idea just came to me while reading something totally unrelated.
The
difficulty is getting the balance right between just enough analogies
to get the point across, and giving two many analogies which start
confusing things :)

Yeah, I see your point.
 
Back
Top