The .NET blocking problem, thread pools, and other interesting stuff

  • Thread starter: David Sworder

David Sworder

Hi,

I'm developing an application that will support several thousand
simultaneous connections on the server-side. I'm trying to maximize
throughput. The client (WinForms) and server communicate via a socket
connection (no remoting, no ASP.NET). The client sends a message to the
server that contains some instructions and the server responds in an
asynchronous fashion. In other words, the client doesn't block while waiting
for the request. The request might be processed on the server in a split
second or it might take a few minutes. In any case, the client doesn't wait
around for the response. It happily goes about servicing the user and when
the server gets around to responding, the client makes that response data
available to the user.

Now let's look at what's happening on the server. The server basically
receives a message from the client on an I/O thread-pool thread and does as
much as it can to process that request. Sometimes the server logic realizes
that it must talk to another server to process the request. In that case,
the thread-pool thread on the server does NOT block. It simply sends its
request asynchronously to another server and then the thread returns to the
pool. When the secondary server does its thing, it'll notify the main server
via an existing socket connection, an I/O thread from the .NET thread pool
is assigned the task of reading this information, and this thread eventually
sends the results back to the client. Again, my desire is to eliminate
thread-blocking on the server whenever possible since thread-pool threads
are a limited resource (25/CPU, I believe). Make sense so far? ok...
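The receive-and-return pattern described above can be sketched with Socket.BeginReceive. This is a hypothetical sketch in modern C# syntax; the class and event names are illustrative, and real code would need error handling and message framing:

```csharp
using System;
using System.Net.Sockets;
using System.Text;

// Hypothetical sketch of the non-blocking receive loop described above.
// Each completed receive runs briefly on an I/O pool thread, raises an
// event, re-arms the receive, and returns the thread to the pool.
class AsyncReceiver
{
    private readonly Socket socket;
    private readonly byte[] buffer = new byte[4096];

    public event Action<string> MessageReceived;

    public AsyncReceiver(Socket connectedSocket)
    {
        socket = connectedSocket;
        // Post the first receive; this call does not block.
        socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None, OnReceive, null);
    }

    private void OnReceive(IAsyncResult ar)
    {
        int n = socket.EndReceive(ar);
        if (n == 0) return; // peer closed the connection

        // Real code would frame messages; here one receive == one message.
        MessageReceived?.Invoke(Encoding.UTF8.GetString(buffer, 0, n));

        // Re-arm for the next message and give the thread back to the pool.
        socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None, OnReceive, null);
    }
}
```

No thread is dedicated to the connection between messages; the callback only borrows a pool thread for the duration of the handling.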

...but now I'm in a bit of a pickle [not literally, that's just an
expression]. I'm now required to have the server respond to a certain type
of client request that requires that a SQL Server database be contacted. The
request might call for a SELECT statement or INSERT/UPDATE/DELETE or for a
stored proc to be executed. So I thought "no problem" -- I'll just use
ADO.NET. The problem though is that ADO.NET is synchronous and "blocking" by
nature. There is no "BeginFill()" for example to asynchronously fill a
DataSet. What are the implications of this? Well, for starters, this means
that whenever my server is waiting for a reply from ADO.NET, the thread-pool
thread that is making the call just blocks. This might not sound like such a
big deal but on a two CPU machine that has 50 threads in the pool, I could
easily encounter a situation where all 50 threads are blocked waiting for
ADO.NET responses -- which means that there are no threads left to process
incoming requests. Even *simple* requests that don't require database access
must sit grudgingly in the queue because all of the thread-pool threads are
sitting idly in a blocked state waiting for ADO.NET calls to return! Not
good!

Now you might say, "Listen, brother...Just use delegates and
BeginInvoke() to simulate asynchronous behavior for your database calls."
This won't work because BeginInvoke() simply places the call on the
thread-pool queue, and a pool thread then makes the call synchronously. You might
say, "My friend... Use the .NET API command that increases the number of
threads in the pool." This doesn't solve the problem though and it's hackish
and it implies that I know better than MSFT how many threads should be in
the pool [which I don't].
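That BeginInvoke objection can be seen directly. The sketch below uses ThreadPool.QueueUserWorkItem, which for this purpose behaves the same way as a delegate's BeginInvoke: the "asynchronous" call still occupies a pool thread for the entire blocking operation (the Sleep is a stand-in for a blocking ADO.NET call; names are illustrative):

```csharp
using System;
using System.Threading;

// Demonstrates that queuing a synchronous call (as delegate BeginInvoke
// also does) just moves the blocking onto a thread-pool thread.
static class PoolDemo
{
    public static bool BlocksAPoolThread()
    {
        bool onPoolThread = false;
        var done = new ManualResetEvent(false);

        ThreadPool.QueueUserWorkItem(_ =>
        {
            // We are now on a pool thread...
            onPoolThread = Thread.CurrentThread.IsThreadPoolThread;
            Thread.Sleep(100); // ...which blocks here for the whole "DB call"
            done.Set();
        });

        done.WaitOne();
        return onPoolThread;
    }
}
```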

What other approach might I take? Perhaps I should create a bunch of
threads manually and use these for database access? It's ugly, but at least
I'm not wasting my valuable thread pool threads which are used for
processing of incoming requests.

I'm going to give MSFT the benefit of the doubt that they made ADO.NET a
synchronous blocking animal for good reason... and they're right, I suppose,
because 99% of the time it makes sense to make ADO.NET calls in a
synchronous fashion. But in my case, hopefully you'll see my plight and have
a clever workaround for me!

Peace be with you!

David
 
What other approach might I take? Perhaps I should create a bunch of
threads manually and use these for database access? It's ugly, but at least
I'm not wasting my valuable thread pool threads which are used for
processing of incoming requests.

I think this is probably exactly the approach to take. The threadpool
job items are meant to be quick things, not long-running work. If you
create a second threadpool, only serviced by a few threads, you should
be able to load work to be done onto that pool, and it will get
processed "sooner or later" - and in the meantime, you've got your
*normal* threadpool ready to receive requests and perform any quick
operations.

Just MHO, of course.
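A minimal version of that second pool might look like this -- a hypothetical sketch, just a shared queue drained by a few manually created threads, so the blocking ADO.NET calls never touch the main thread pool:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// A minimal dedicated "DB pool": a fixed handful of manually created
// threads draining a shared job queue.
class DbWorkQueue
{
    private readonly Queue<Action> jobs = new Queue<Action>();
    private readonly object gate = new object();

    public DbWorkQueue(int workers)
    {
        for (int i = 0; i < workers; i++)
        {
            var t = new Thread(Worker);
            t.IsBackground = true;
            t.Start();
        }
    }

    public void Enqueue(Action job)
    {
        lock (gate)
        {
            jobs.Enqueue(job);
            Monitor.Pulse(gate); // wake one idle worker
        }
    }

    private void Worker()
    {
        while (true)
        {
            Action job;
            lock (gate)
            {
                while (jobs.Count == 0)
                    Monitor.Wait(gate);
                job = jobs.Dequeue();
            }
            job(); // the blocking ADO.NET work happens here, off the main pool
        }
    }
}
```

With something like this, a request handler hands the database work to Enqueue() and returns its pool thread immediately.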
 
I think this is probably exactly the approach to take. The threadpool
job items are meant to be quick things, not long-running work. If you
create a second threadpool, only serviced by a few threads, you should
be able to load work to be done onto that pool, and it will get
processed "sooner or later" - and in the meantime, you've got your
*normal* threadpool ready to receive requests and perform any quick
operations.

Thanks, Jon... and I'm sure you can anticipate my next question: How
many threads in this mini-pool?

I'm thinking that since .NET seems to set the default maximum SQL
connection pool size to 50, it probably wouldn't make sense to exceed 50
threads. The idea of having 50 extra threads hanging around kind of bums me
out. Maybe I should take the same approach that the thread pool takes: If
one of my DB threads isn't used for say 2 minutes, let that thread die a
painless death. If my server process goes for several minutes without
processing any requests that require database access, these threads will all
die. If I need more threads later, I can create them -- up to a max of 50.
What do you think?

David
 
David Sworder said:
Thanks, Jon... and I'm sure you can anticipate my next question: How
many threads in this mini-pool?

I would stick with "not many at all" - like maybe 5.
I'm thinking that since .NET seems to set the default maximum SQL
connection pool size to 50, it probably wouldn't make sense to exceed 50
threads. The idea of having 50 extra threads hanging around kind of bums me
out. Maybe I should take the same approach that the thread pool takes: If
one of my DB threads isn't used for say 2 minutes, let that thread die a
painless death. If my server process goes for several minutes without
processing any requests that require database access, these threads will all
die. If I need more threads later, I can create them -- up to a max of 50.
What do you think?

Yes, your threadpool should definitely have more configuration etc
available than the built-in one. I'd suggest ramping up threads slowly
(so that if there's a long stream of fairly short-lived jobs, they all
end up being done by the same single thread) and ramping them down
slowly too (having one die per minute, say). I wouldn't go up as far as
50 threads though - if you've got 50 concurrent database requests going
at the same time, that's unlikely to help throughput in the long run.

Don't forget that you have much more knowledge of how your threadpool
is going to be used than .NET has of its own one - because you're the
only one who's going to be putting jobs on it. I suspect the thread
limit in the .NET threadpool is significantly higher than it needs to
be for most uses, but it's high for robustness. Just a guess though!
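The "ramp down slowly" idea falls out of Monitor.Wait's timeout overload: a worker that waits out its idle period without being pulsed simply retires. A hypothetical sketch (the class name and its bare-bones growth policy are illustrative; a real version would cap the worker count and add threads as the queue backs up):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Sketch of a queue whose workers die after sitting idle past a timeout.
class IdleQueue
{
    private readonly Queue<Action> jobs = new Queue<Action>();
    private readonly object gate = new object();
    private readonly TimeSpan idle;
    public int Workers; // current live worker count (for observation)

    public IdleQueue(TimeSpan idleTimeout) { idle = idleTimeout; }

    public void Enqueue(Action job)
    {
        lock (gate)
        {
            jobs.Enqueue(job);
            // Lazily create a worker; a fuller version would cap the count
            // and only add threads when the queue backs up.
            if (Workers == 0)
            {
                Workers++;
                new Thread(Worker) { IsBackground = true }.Start();
            }
            Monitor.Pulse(gate);
        }
    }

    private void Worker()
    {
        while (true)
        {
            Action job;
            lock (gate)
            {
                while (jobs.Count == 0)
                {
                    // Wait returns false if the timeout expired with no work:
                    // this thread dies a painless death.
                    if (!Monitor.Wait(gate, idle)) { Workers--; return; }
                }
                job = jobs.Dequeue();
            }
            job();
        }
    }
}
```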
 
Yes, your threadpool should definitely have more configuration etc
available than the built-in one. I'd suggest ramping up threads slowly
(so that if there's a long stream of fairly short-lived jobs, they all
end up being done by the same single thread) and ramping them down
slowly too (having one die per minute, say). I wouldn't go up as far as
50 threads though - if you've got 50 concurrent database requests going
at the same time, that's unlikely to help throughput in the long run.

Thanks for the comments, Jon.

On a somewhat related issue, what do you recommend as an optimum size
for a *connection* pool? In my last message, I stated that the default max
size of a connection pool in .NET is 50. I was wrong. It is actually 100!
What is Microsoft trying to tell us by setting this value so high, and how
does this reconcile with your suggestion of having no more than 5 threads in
my personal thread-pool for database access?

Regarding MSFT's magic number of 100, are they saying:
a) SQL Server can't effectively handle more than 100 simultaneous
connections.

or...

b) A *client* of SQL Server shouldn't have more than 100
simultaneous connections open because assuming that each connection runs on
its own thread, the context-switching penalty outweighs the benefits of
concurrency.

or...

c) If a middle tier machine needs to make more than 100 simultaneous
connections to SQL Server, it's time to scale out and get a second middle
tier machine to handle the load.

or

d) something else entirely?

Don't get me wrong -- I don't foresee a situation where 100+
simultaneous database connections would be necessary. I'm just trying to
understand the logic MSFT is using. If I understand their way of thinking, I
can better decide for myself how many simultaneous DB connections I should
be allowing.
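For reference, the 100 figure is the documented default of the "Max Pool Size" connection-string keyword, and it can be overridden per connection string (server and database names below are hypothetical):

```csharp
// Pool limits are set per connection string; these values are illustrative.
// "Max Pool Size" defaults to 100 when not specified.
string connStr =
    "Data Source=myServer;Initial Catalog=myDb;Integrated Security=SSPI;" +
    "Min Pool Size=5;Max Pool Size=100";
```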

David
 
David Sworder said:
On a somewhat related issue, what do you recommend as an optimum size
for a *connection* pool? In my last message, I stated that the default max
size of a connection pool in .NET is 50. I was wrong. It is actually 100!
What is Microsoft trying to tell us by setting this value so high, and how
does this reconcile with your suggestion of having no more than 5 threads in
my personal thread-pool for database access?

Hmm... no idea! I suppose it means you can keep connections open to
multiple databases, even if you're not going to use them all at once.
Regarding MSFT's magic number of 100, are they saying:

<snip>

I really wouldn't like to second-guess them very much on this. I don't
have a lot of experience in this area, and it's the kind of topic where
you really need people who know what they're talking about.
 