Basic Threading question


frankiespark

Hello all,

I was perusing the internet for information on threading when I came
across this group. Since there seems to be a lot of good ideas and
useful info I thought I'd pose a question.

Threading is a new concept for me to implement. Here is my problem.

I have a system that receives xml files and records their file
locations in a database. I can potentially receive thousands,
sometimes hundreds of thousands, of files per day. When files are
received and stored in a folder on the server I need another
application to read in the paths from the database, locate, process,
and save each xml file. I want to create a windows service that can
read in a list from the database and assign work to multiple threads
in order to achieve greater performance. But, I am not sure where to
begin and I am having option paralysis. Do I need to create the
threads manually like:

Dim worker As New Thread(AddressOf Something)
worker.Start()

Do I need to use the thread pool? The BackgroundWorker control? I have
seen a lot of examples. What I'd like is if someone could make a
research recommendation based on my scenario if possible. I realize
this is probably a basic question about a complex issue so any
feedback to get me thinking would be good.

Much appreciated.
 
Frankie,

It's my understanding that threading does not increase performance but makes an interactive program
more responsive while it is running a lengthy process.

If your job is going to run as a service, it has no GUI. So why have a threaded app?

Flomo
 
Multithreading will improve performance when used correctly. It is
not only for UI work; web servers such as IIS are a good example of
this. I recommend using the thread pool and queuing work, since the
workload can be variable. The BackgroundWorker control uses the
thread pool, but you'll get better control if you queue things into
the thread pool yourself. With that said, the BackgroundWorker
control is fairly easy to use for someone just getting into
multithreading. I don't recommend creating threads manually since the
cost of creating threads is very high.
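
A minimal sketch of queuing work into the thread pool yourself, assuming each file can be processed independently. The module name, the ProcessFile routine, and the file paths are hypothetical placeholders; in the service the paths would come from the database. The counter and event are only there so the demo can wait for the queued items to finish.

```vbnet
Imports System
Imports System.Threading

Module ThreadPoolSketch
    ' Number of queued work items that have not finished yet.
    Private pending As Integer
    ' Signalled when the last work item finishes.
    Private ReadOnly allDone As New ManualResetEvent(False)

    ' Hypothetical stand-in for "locate, process, and save one XML file".
    Private Sub ProcessFile(ByVal state As Object)
        Dim filePath As String = DirectCast(state, String)
        Try
            Console.WriteLine("Processing {0} on pool thread {1}", _
                              filePath, Thread.CurrentThread.ManagedThreadId)
        Catch ex As Exception
            ' Catch here: an unhandled exception on a pool thread ends the process.
            Console.WriteLine("Failed on {0}: {1}", filePath, ex.Message)
        Finally
            If Interlocked.Decrement(pending) = 0 Then allDone.Set()
        End Try
    End Sub

    Sub Main()
        ' Hypothetical paths; the real list would be read from the database.
        Dim paths As String() = {"a.xml", "b.xml", "c.xml"}
        pending = paths.Length
        For Each p As String In paths
            ' The pool decides how many of these run at the same time.
            ThreadPool.QueueUserWorkItem(AddressOf ProcessFile, p)
        Next
        allDone.WaitOne()
        Console.WriteLine("All files processed")
    End Sub
End Module
```

The service would repeat this for each batch it reads, so no thread is created or destroyed per file.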
 
The thread pool has the additional advantage of limiting the number of
simultaneously running threads; it queues additional work items and starts
them in first-in, first-out order. This allows the framework to adjust for system
resources as well as preventing too many threads from running at one time,
which can slow the entire system down via excessive thread context
switching. Although there are other situations where doing your own
threading is the way to go, this one is definitely a thread pool scenario.

Mike Ober.
 
Frankie,

It's my understanding that threading does not increase performance but makes an interactive program
more responsive while it is running a lengthy process.

If your job is going to run as a service, it has no GUI. So why have a threaded app?

Flomo

There are a couple of scenarios for threading. One is to improve user
experience in a win form app as you have mentioned. The other is for
performance scalability, as I am after in my example.
 
Multithreading will improve performance when used correctly. It is
not only for UI work; web servers such as IIS are a good example of
this. I recommend using the thread pool and queuing work, since the
workload can be variable. The BackgroundWorker control uses the
thread pool, but you'll get better control if you queue things into
the thread pool yourself. With that said, the BackgroundWorker
control is fairly easy to use for someone just getting into
multithreading. I don't recommend creating threads manually since the
cost of creating threads is very high.

OK. This is pretty much what I expected. The way to do it would be to
create a class containing a subset of the Db records -- the paths to
the xml files. Pass this in to a subroutine as an object for
processing. When the thread completes, do a callback on the worker
method. Meanwhile, I can get another subset from the Db and repeat the
process. Sound good? Anything else I should know? Thanks for the
reply!

Frank
 
The thread pool has the additional advantage of limiting the number of
simultaneously running threads; it queues additional work items and starts
them in first-in, first-out order. This allows the framework to adjust for system
resources as well as preventing too many threads from running at one time,
which can slow the entire system down via excessive thread context
switching. Although there are other situations where doing your own
threading is the way to go, this one is definitely a thread pool scenario.

Mike Ober.

This is consistent with my reading. Always helpful to get a second or
third opinion when approaching a new concept. Thanks.
 
Frankie,

It's my understanding that threading does not increase performance but
makes an interactive program more responsive while it is running a
lengthy process.

Threading can improve an application's performance - after all some
processes use very little CPU and thus you can run multiple threads to
maximize CPU usage.
 
(e-mail address removed) wrote:
I have a system that receives xml files and records their file
locations in a database. I can potentially receive thousands,
sometimes hundreds of thousands, of files per day. When files are
received and stored in a folder on the server I need another
application to read in the paths from the database, locate, process,
and save each xml file. I want to create a windows service that can
read in a list from the database and assign work to multiple threads
in order to achieve greater performance. But, I am not sure where to
begin and I am having option paralysis. Do I need to create the
threads manually like:


Take a look at the FileSystemWatcher class - it can monitor a folder for
changes. Once it detects a change, you can fire off a thread to process the
change.

A couple other options:

1. Submit the XMLs to a web service and process the files immediately.
Since web services are executed by IIS/ASP.NET, this approach is extremely scalable.

2. Perhaps look at using MSMQ? MSMQ can handle a large volume of incoming
requests and hold them for you until your applications are ready to process
the data. MSMQ is quite easy to use ... and very reliable.
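
A rough sketch of the FileSystemWatcher idea mentioned at the top of this post, handing each new file to the thread pool. The folder C:\incoming, the module name, and the ProcessFile routine are hypothetical; the folder must already exist or the constructor throws.

```vbnet
Imports System
Imports System.IO
Imports System.Threading

Module WatcherSketch
    ' Raised by the watcher when a matching file appears in the folder.
    Private Sub OnCreated(ByVal sender As Object, ByVal e As FileSystemEventArgs)
        ' Hand the file to the thread pool so the event handler returns quickly.
        ThreadPool.QueueUserWorkItem(AddressOf ProcessFile, e.FullPath)
    End Sub

    ' Hypothetical stand-in for the real XML processing.
    Private Sub ProcessFile(ByVal state As Object)
        Console.WriteLine("Would process {0}", DirectCast(state, String))
    End Sub

    Sub Main()
        ' Hypothetical drop folder; replace with the real one.
        Dim watcher As New FileSystemWatcher("C:\incoming", "*.xml")
        AddHandler watcher.Created, AddressOf OnCreated
        watcher.EnableRaisingEvents = True
        Console.WriteLine("Watching; press Enter to stop.")
        Console.ReadLine()
        watcher.Dispose()
    End Sub
End Module
```

Two caveats: the Created event can fire before the sender has finished writing the file, and the watcher's internal buffer (InternalBufferSize) can overflow under bursts, which is reported through its Error event.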
 
You can be sure that a process takes more throughput time when threading is
used than when not.
 
Never use an end-user program to learn.

Cor

There are a couple of scenarios for threading. One is to improve user
experience in a win form app as you have mentioned. The other is for
performance scalability, as I am after in my example.
 
But only reading what you want to read is not always the best.

80% of the answers are against threading in your situation. It seems you
pick the one that fits you.

Cor
 
You can be sure that a process takes more throughput time when threading is
used than when not.

OK. If I understand your point you are simply stating that there is
some overhead involved in creating threads. Fair enough. But if I can
split the work over multiple threads (with some reasonable limit on
the number of threads created) then I assume an overall performance
gain.
 
But only reading what you want to read is not always the best.

80% of the answers are against threading in your situation. It seems you
pick the one that fits you.

Cor

If you can suggest an alternate strategy I am all ears. And, can you
explain why 80% of the answers are against threading in my scenario?
 
(e-mail address removed) wrote:


Take a look at the FileSystemWatcher class - it can monitor a folder for
changes. Once it detects a change, you can fire off a thread to process the
change.

A couple other options:

1. Submit the XMLs to a web service and process the files immediately.
Since web services are executed by IIS/ASP.NET, this approach is extremely scalable.

2. Perhaps look at using MSMQ? MSMQ can handle a large volume of incoming
requests and hold them for you until your applications are ready to process
the data. MSMQ is quite easy to use ... and very reliable.

I have used FileSystemWatcher and MSMQ in various other projects. I
will take a quick look again at FileSystemWatcher, but while I think
this is perfectly ok for low volume, I am uneasy with it in large
volume. MSMQ is fine. But it is another layer and the Db is already a
"queue" of sorts. Access to MQ is fast but my app can only access one
MQ message at a time as far as I know. I could create multiple queues,
however... IIS is something for me to look into. Thanks for your
suggestion. This is great you have me thinking about some other
scenarios. Cheers.
 
(e-mail address removed) wrote:
I have used FileSystemWatcher and MSMQ in various other projects. I
will take a quick look again at FileSystemWatcher, but while I think
this is perfectly ok for low volume, I am uneasy with it in large
volume.

That's true - FileSystemWatcher may not scale to high volumes.

MSMQ is fine. But it is another layer and the Db is already a
"queue" of sorts.

Depending on the queuing mechanism you're using in the database, it may
not be threadsafe (i.e. 2 threads might end up processing one record).
Thus SQL Server 2005 introduced Service Broker (a queuing service for
SQL Server), which solves this issue.

Access to MQ is fast but my app can only access one
MQ message at a time as far as I know.

Yes, but this is where multi-threading comes into play ;-) You can have
multiple threads monitoring the queue. Since MSMQ is threadsafe you
don't have to worry about two threads pulling the same message twice out
of the queue.

Also depending on how long it takes to process a single file - you might
not need multi-threading. Since the queue is persistent and acts as a
buffer, you can process the message at leisure ... and catch up during
lulls in transmission.

I could create multiple queues,
however... IIS is something for me to look into. Thanks for your
suggestion. This is great you have me thinking about some other
scenarios. Cheers.

You can get very fancy with your application. For example, I have a
similar application at the moment, we do it this way:

Web Service --> Queue <---> Service to Process incoming requests

Using a web service provides a simpler, standardized way to submit data
to the queue (not everyone talks to MSMQ). Also, web services can be
scaled horizontally via load balancing. The queue can be clustered or
scaled horizontally too. Multiple back end services can be installed to
process the queues. So in a sense you can scale such a solution multiple
ways to increase throughput.
 
Why not just use separate programs?



If you can suggest an alternate strategy I am all ears. And, can you
explain why 80% of the answers are against threading in my scenario?
 
Frankie,
In addition to the other comments:

I've written similar applications.

Remember that threading in a service is most effective if you can
effectively use the processors (multi-core or hyper threaded CPUs or
multiple-CPU computers) or you have a lot of waiting on I/O. For example
your catalog thread is waiting to read a file, a second catalog thread could
be processing a file. While threading in an interactive app (Windows Forms)
is most effective if you avoid blocking the UI thread, so the user perceives
your app as being responsive.

Only use New Thread if you are only ever creating one or two threads. For
example one thread to receive files and a second thread to process files.
Don't explicitly create a thread to process each file. Creating & destroying
threads is expensive, plus managing those threads can be a pain. The Thread
Pool creates a fixed # of threads and only creates a new thread if needed;
then it reuses them for requests. Further the Thread Pool will scale based
on the number of processors available (multi-core or hyper threaded CPUs or
multiple-CPU computers) the more processors available the more threads
available in the pool.

The BackgroundWorker control is intended for Windows Forms application so
your form can easily start a background process; it would not work as
expected in a Service (no Windows Forms message pump).

Instead I would recommend using the ThreadPool directly, or indirectly via
{delegate}.BeginInvoke, where {delegate} is an instance of a delegate type.
(NOTE: Don't confuse this with Windows Forms Control.BeginInvoke.)

For example the receive thread could use ThreadPool.QueueUserWorkItem to
start a catalog process for each file. The thread pool would ensure that an
effective number of catalog processes run at one time. Instead of
ThreadPool.QueueUserWorkItem you could use {delegate}.BeginInvoke; however,
be certain to call EndInvoke for every BeginInvoke (QueueUserWorkItem has no
matching End call to make).

Something like (not fully tested):

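Imports System

Public Delegate Sub DoWork(ByVal file As String)

' Note: delegate BeginInvoke is supported on .NET Framework, not .NET Core.
Module CatalogWorker
    ' Start one asynchronous call per file; each runs on a thread pool thread.
    Sub Main()
        Dim worker As DoWork = AddressOf ProcessFile
        For Each file As String In New String() {"a", "b", "c"}
            ' Pass the delegate as state so the callback can call EndInvoke on it.
            worker.BeginInvoke(file, AddressOf EndProcess, worker)
        Next
        Console.ReadLine() ' Keep the process alive while the pool threads run.
    End Sub

    Private Sub ProcessFile(ByVal file As String)
        ' Locate, process, and save the XML file here.
    End Sub

    ' Called on a pool thread when ProcessFile completes.
    Private Sub EndProcess(ByVal ar As IAsyncResult)
        Dim worker As DoWork = DirectCast(ar.AsyncState, DoWork)
        Try
            worker.EndInvoke(ar) ' Rethrows any exception raised in ProcessFile.
        Catch ex As Exception
            Log(ex)
        End Try
    End Sub

    ' Placeholder logger; substitute your own.
    Private Sub Log(ByVal ex As Exception)
        Console.Error.WriteLine(ex)
    End Sub
End Module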
 
I have a system that receives xml files and records their file
locations in a database. I can potentially receive thousands,
sometimes hundreds of thousands, of files per day. When files are
received and stored in a folder on the server I need another
application to read in the paths from the database, locate, process,
and save each xml file.

So - spot a file, pick it up, go process it. Spot another file ...

Ideal candidate for threading, since each job is isolated from every
other. The less your threads have to talk to one another, the happier
(i.e. faster) they'll be.

This "database" bit is a bit worrying, though, because that's going to
force the threads to "fight" to get at the database itself. Could lead
to some contention that will slow things down.

Given the volumes that you have here, you can't go spawning a new thread
for every file; work with a set of 10 or so Threads and each one will
run fairly well. Try the same thing with 1000 Threads and watch your
machine spin itself into the ground. :-)

I want to create a windows service that can read in a list from the
database and assign work to multiple threads in order to achieve
greater performance.

Sounds good. The service's main Thread acts as a marshaller, handing
out work to the other Threads that do the real work.

But, I am not sure where to begin and I am having option paralysis.
Do I need to create the threads manually like:

Dim worker As New Thread(AddressOf Something)
worker.Start()

You need to have the Thread "callback" to the main service when they've
finished their job - that way, the main service doesn't have to "poll"
them to see if they're still busy. Polling just slows things down.

Do I need to use the thread pool?

Depends on how long each job takes. The pool is intended for tasks that
run and die off very quickly, so that the Threads are available for
something else to pick up and use. If a job takes 20 minutes, spin up
your own Threads.

The BackgroundWorker control?

No. If this were a Forms app launching all these threads then yes,
because the BackgroundWorker control is built to marshal the callbacks
from other Threads back onto the UI (Windows) Thread because Forms
Controls aren't Thread-safe.

HTH,
Phill W.
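
The "set of 10 or so Threads" arrangement described above might look roughly like the sketch below, assuming the main thread loads a batch of paths and the workers drain it. The module name, file names, and batch size are hypothetical, and the worker count of 10 is just the figure from this post, not a tuned value.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading

Module FixedWorkersSketch
    ' Shared batch of file paths, guarded by workLock.
    Private ReadOnly work As New Queue(Of String)()
    Private ReadOnly workLock As New Object()
    Private processed As Integer

    ' Each worker pulls paths until the batch is empty, then exits.
    Private Sub WorkerLoop()
        Do
            Dim filePath As String = Nothing
            SyncLock workLock
                If work.Count = 0 Then Exit Do
                filePath = work.Dequeue()
            End SyncLock
            ' Hypothetical stand-in for the real XML processing.
            Console.WriteLine("Thread {0} processing {1}", _
                              Thread.CurrentThread.ManagedThreadId, filePath)
            Interlocked.Increment(processed)
        Loop
    End Sub

    Sub Main()
        ' Hypothetical batch; the service would read these paths from the database.
        For i As Integer = 1 To 100
            work.Enqueue("file" & i.ToString() & ".xml")
        Next

        Const WorkerCount As Integer = 10
        Dim workers As New List(Of Thread)()
        For n As Integer = 1 To WorkerCount
            Dim t As New Thread(AddressOf WorkerLoop)
            t.IsBackground = True
            t.Start()
            workers.Add(t)
        Next

        ' Wait for every worker to drain the batch before fetching the next one.
        For Each w As Thread In workers
            w.Join()
        Next
        Console.WriteLine("Processed {0} files", processed)
    End Sub
End Module
```

Because each path is dequeued under the lock, no two workers pick up the same file, and only the dequeue itself is serialized.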
 
Use the ThreadPool. It has two components - one for event handling and one
for application tasks. The application task portion of the thread pool is
designed for long running threads.

Mike Ober.
 