William Stacey said:
| Also, if you have say 600 IOCP threads processing at once, there's
| really no way to do other async operations - you can't use the
| .Net thread pool, as it doesn't have enough threads. You certainly
| don't want to spin up hundreds of
| your own threads, so pragmatically it's normally best to do things
| synchronously within the IOCP callback.
But if you're returning after beginning the new async operation, then you're
releasing the thread and another IOCP thread (or the same one) will handle the
new callback. So you keep going. Am I wrong? Thanks for the links.
Before I get into too much detail, there is one thing I want to mention -
the use cases I'm talking about below are all geared to optimizing the
performance of the entire system, not the performance of any single user
connected to the system. Most of the "kick off an async operation" arguments
do this so that a user's request into the system is performed that much
faster (which makes sense: if you can do things in parallel, the user
always appreciates that). You don't actually save any work on the host CPUs
by doing things async; you just gain a bit of parallelism on a particular
user's request.
Optimizing for a single user (by using async calls) actually degrades the
overall performance of the system, though. The system still has to wait
for the async tasks to complete, and doing so sucks up more threads, more
memory, and more context switches. It almost always requires allocating
another Event, waiting on it, and (potentially) having your thread put
briefly to sleep, all of which are fairly expensive operations. This means
that (in a high-load case) user 1 had his operation completed slightly
faster, but user 13192 didn't even get to connect to the system. It's also
less predictable how long an operation will take with all the async calls in
there, since context switching gets unpredictable under high load.
The scenario I keep seeing (boiled down to code after the list) is:
0 - You get the Socket.BeginRead callback, and get your data. You're now on
an IOCP thread.
1 - Perform an async operation (say, lookup MX or SRV records in the DNS).
Pass in a delegate for the callback.
2 - While that operation is under way, your IOCP thread keeps going, doing
whatever else it can.
3 - Eventually the IOCP thread hits a WaitHandle and has to sync up with the
async operation you kicked off.
.... But
4 - Although the async operation you kicked off seemed like it was async, it
hasn't actually run yet, because 600 other IOCP threads are kicking off the
same operation and the .Net threadpool is way past the point of starvation.
5 - So now your IOCP thread is stuck hanging around for a very long time
(much longer than it needed to).
6 - If you goofed just a little bit, your IOCP thread will be deadlocked,
and eventually your whole app will stop responding.
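Boiled down, the pattern looks something like this (hostName, buffer, and
ParseMessage are made-up placeholders, not real API):

    using System;
    using System.Net;
    using System.Net.Sockets;

    class Handler
    {
        private string hostName = "example.com";  // placeholder
        private byte[] buffer = new byte[4096];   // placeholder receive buffer

        private void OnReceive(IAsyncResult ar)
        {
            Socket socket = (Socket)ar.AsyncState;
            int bytesRead = socket.EndReceive(ar);  // step 0: on an IOCP thread

            // Step 1: kick off the "async" DNS lookup.
            IAsyncResult dns = Dns.BeginResolve(hostName, null, null);

            // Step 2: keep the IOCP thread busy with whatever else it can do.
            ParseMessage(buffer, bytesRead);

            // Step 3: sync up. If the threadpool is starved (step 4), the
            // lookup hasn't even started, so the IOCP thread hangs here
            // (step 5) - or, if you goofed, deadlocks (step 6).
            dns.AsyncWaitHandle.WaitOne();
            IPHostEntry entry = Dns.EndResolve(dns);
        }

        private void ParseMessage(byte[] data, int count) { /* placeholder */ }
    }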
At the end of the day, with a very high load of relatively light transactions
(which seems to be the common scenario), the async callbacks happening on
threadpool threads just kill everything. With 1000 IOCP threads and 25
threadpool threads (or 50, or 100), the operations simply can't happen fast
enough. Even if you get things just right, threadpool starvation is still far
too likely, as so many things in the .Net framework make use of the pool.
In the end, I've found it easier to just do everything
synchronously once you hit your IOCP callback. This makes for simpler code,
easier debugging, far fewer context switches, and (hopefully you won't need
it) far, far easier crashdump analysis.
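In code, that just means doing the work inline (same placeholders as before):

    using System;
    using System.Net;
    using System.Net.Sockets;

    class Handler
    {
        private string hostName = "example.com";  // placeholder
        private byte[] buffer = new byte[4096];   // placeholder receive buffer

        private void OnReceive(IAsyncResult ar)
        {
            Socket socket = (Socket)ar.AsyncState;
            int bytesRead = socket.EndReceive(ar);

            // Do everything inline on the IOCP thread: no extra Event,
            // no wait, no dependency on the starved threadpool.
            IPHostEntry entry = Dns.Resolve(hostName);
            ParseMessage(buffer, bytesRead);

            // Hand the socket straight back to the completion port.
            socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None,
                                new AsyncCallback(OnReceive), socket);
        }

        private void ParseMessage(byte[] data, int count) { /* placeholder */ }
    }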
Just to complicate things, here are some of the other architectures that
I've tried:
1 - As soon as I get a valid data chunk off a socket, I stick it into a
queue for later processing, then put the socket back into BeginRead mode.
A pool of worker threads [usually a custom threadpool, as too many other
things steal threads from the .Net Threadpool] pulls data off the queue and
processes it. This approach seemed like the best candidate for a while, but
thread context switching absolutely destroys the performance. All sorts of
other issues arise as well, such as how many worker threads to use, how
to manage thread affinity on multi-proc systems, how to restart threads that
get hung, etc. It turns out that on a large, high-availability production
system this is all very difficult.
This case has another interesting side effect - pulling the data out of the
socket (in pure Win32 land) and into managed code, where it sits until I can
process it from the queue, means the heap fragments that much faster. I've
found it best to leave data in the socket until I'm actually ready to
process it.
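For what it's worth, the shape of that first approach was roughly this
(buffer and ProcessChunk are placeholders, and the real worker pool had far
more plumbing for restarts, affinity, etc.):

    using System;
    using System.Collections;
    using System.Net.Sockets;
    using System.Threading;

    class QueuedHandler
    {
        private byte[] buffer = new byte[4096];  // placeholder receive buffer
        private Queue workQueue = new Queue();   // guarded by its SyncRoot

        private void OnReceive(IAsyncResult ar)
        {
            Socket socket = (Socket)ar.AsyncState;
            int bytesRead = socket.EndReceive(ar);

            // Copy the chunk into managed memory and queue it - these
            // copies sitting around are what fragments the heap.
            byte[] chunk = new byte[bytesRead];
            Buffer.BlockCopy(buffer, 0, chunk, 0, bytesRead);
            lock (workQueue.SyncRoot)
            {
                workQueue.Enqueue(chunk);
                Monitor.Pulse(workQueue.SyncRoot);
            }

            // Put the socket straight back into read mode.
            socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None,
                                new AsyncCallback(OnReceive), socket);
        }

        // Each thread in the custom worker pool runs this loop; every
        // hand-off from an IOCP thread to a worker is a context switch.
        private void WorkerLoop()
        {
            while (true)
            {
                byte[] chunk;
                lock (workQueue.SyncRoot)
                {
                    while (workQueue.Count == 0)
                        Monitor.Wait(workQueue.SyncRoot);
                    chunk = (byte[])workQueue.Dequeue();
                }
                ProcessChunk(chunk);
            }
        }

        private void ProcessChunk(byte[] data) { /* placeholder */ }
    }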
2 - Choking the number of "running" IOCP threads. As soon as data came in
off the socket, I would block on a semaphore so that only a predetermined
number of IOCP threads were actually active at any one time. This seemed to
work well for a while, but ended up having so many weird side effects that
it was abandoned. It wasn't uncommon during load tests to see 15 threads
active and 985 blocked in the semaphore, which caused strange things to
happen.
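The throttled version looked roughly like this (I'm using
System.Threading.Semaphore from .Net 2.0 as a stand-in for the semaphore we
actually had; MaxActive and ProcessChunk are placeholders):

    using System;
    using System.Net.Sockets;
    using System.Threading;

    class ThrottledHandler
    {
        private const int MaxActive = 15;  // placeholder cap
        private Semaphore throttle = new Semaphore(MaxActive, MaxActive);
        private byte[] buffer = new byte[4096];  // placeholder receive buffer

        private void OnReceive(IAsyncResult ar)
        {
            Socket socket = (Socket)ar.AsyncState;
            int bytesRead = socket.EndReceive(ar);

            // Under load, most IOCP threads pile up right here - the
            // "15 active, 985 blocked" picture from the load tests.
            throttle.WaitOne();
            try
            {
                ProcessChunk(buffer, bytesRead);
            }
            finally
            {
                throttle.Release();
            }
        }

        private void ProcessChunk(byte[] data, int count) { /* placeholder */ }
    }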