Frans Bouma
Hello,
I have an application that processes thousands of files each day. The
filenames and various related file information are retrieved, related
filenames are associated and placed in a linked list within a single
object, and that object is then placed on a stack (this cuts down thread
creation and deletion roughly by a factor of 4). I create up to 12
threads, each of which processes a single object off the stack. I use a
loop with the boolean condition stack.Count > 0. Inside it I check each
thread to see if it is alive. If it is not, I create a new thread with a
new object off the stack, which is passed as the constructor parameter
for a new threaded object. If the thread is alive, the loop merely goes
on to check the status of the next thread in line. This is a big process
and running the CPU at 100% is not an issue; I would just like to
optimize my threading code in order to make my application faster and
more efficient. The ThreadPool class does not seem like a good option
for my needs, as my threads will be constantly processing throughout
their lifetime. I think that my constant polling of threads could
definitely be replaced with something like a thread callback upon
completion of its processing. How can I further reduce the threading
overhead? Would it be better to just reset all the variables in a thread
and pass it a new stack object, rather than creating a new thread to
replace the dead one? My code, while reliable so far, could easily be
simplified and improved upon.
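For illustration only, the thread-reuse idea asked about above could look
roughly like the sketch below. It is written in Python rather than .NET
to keep it short, and the names process, worker and run_workers are made
up for the example. It assumes all objects are pushed before the threads
start, so an empty stack means the work is finished.

```python
import queue
import threading

def process(batch):
    # Hypothetical stand-in for the real per-object work.
    return len(batch)

def worker(work, results):
    # Each thread lives for the whole run and keeps pulling objects,
    # so nothing has to poll for dead threads or recreate them.
    while True:
        try:
            batch = work.get_nowait()
        except queue.Empty:
            return  # everything was queued up front, so empty means done
        results.put(process(batch))

def run_workers(batches, thread_count=12):
    work = queue.LifoQueue()  # LIFO order, to mirror the stack
    for batch in batches:
        work.put(batch)
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(work, results))
               for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # blocks until every worker has drained the stack
    return [results.get_nowait() for _ in range(results.qsize())]
```

The main thread simply joins the workers instead of looping over them to
check whether they are alive.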
I wonder whether the threads are really the bottleneck, since accessing a
file system from 2 or more threads simultaneously slows read processing
down terribly because the hard drive's head has to keep seeking back and
forth.
What I would do is set up a queue-like mechanism for file requests and
one thread that reads all the files in sequential order and returns them
to the processing threads. This way your disk activity is streamlined.
So you can, for example, start 4 or 5 threads which request files. These
are loaded, and your loader thread goes to sleep until a thread puts
another request in the queue. 20 or so threads per process on a single
CPU is probably the maximum, since IIS is optimized for 20 threads per
CPU.
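A rough sketch of that loader-thread arrangement follows, in Python
rather than .NET for brevity. The function names (loader, processor,
run_with_loader) and the per-worker reply queue are assumptions made for
the example, not part of any existing code, and "processing" is reduced
to recording each file's size.

```python
import queue
import threading

def loader(requests):
    # The only thread that touches the disk; it reads one file at a time.
    while True:
        item = requests.get()       # sleeps here until a request arrives
        if item is None:            # sentinel: no more requests are coming
            return
        path, reply = item
        try:
            with open(path, "rb") as f:
                reply.put(f.read())
        except OSError as exc:
            reply.put(exc)          # hand the error back to the requester

def processor(paths, requests, sizes):
    reply = queue.Queue(maxsize=1)  # private channel for this worker
    for path in paths:
        requests.put((path, reply))
        data = reply.get()          # wait for the loader to deliver
        if isinstance(data, bytes):
            sizes[path] = len(data)  # hypothetical stand-in for real work

def run_with_loader(path_groups):
    requests = queue.Queue()
    sizes = {}
    load_thread = threading.Thread(target=loader, args=(requests,))
    load_thread.start()
    workers = [threading.Thread(target=processor, args=(g, requests, sizes))
               for g in path_groups]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    requests.put(None)  # stop the loader once all workers are done
    load_thread.join()
    return sizes
```

Each entry in path_groups becomes one processing thread, so passing 4 or
5 groups gives the 4 or 5 requesting threads mentioned above, while all
reads still go through the single loader.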
FB