Need pseudocode/code for maintaining X number of threads

  • Thread starter: HK

HK

With VB.NET 2005 and a Windows Form, running on a dual-CPU box, I need to
take a recordset (e.g. 100,000 records) and spawn a thread to handle an
internet XML transaction routine for each of the records. This is a nice
use of threading because those internet requests go against 3rd-party
servers that often have about 1 second of latency, so handling them with
multiple threads is the fastest way to get through all the records in the
recordset. I'd like to set a variable/constant that controls the number
of threads that are allowed to run simultaneously and have the code keep
spawning threads up to that number. Then I can play with that number and
find out what number of threads works best. But I'm no threading expert
and I don't know what new options I have with the .NET 2.0 framework. I also
can't find any example that shows threads being repeatedly spawned up to a
certain number in an effort to work through a single recordset. Obviously
each thread needs to communicate back to the main routine when done (fire an
event??), and that main routine would then need to take the next record and
throw it to another thread if the number of running threads is less than the
maximum. Can someone please help me with the pseudocode for this or, better
yet, point me to a real code example that does this?
 
HK,

The chance that you will find a threading sample that uses a recordset is
almost equal to 0.001%.

If you first load the data into a DataTable, your chance will be higher.

You also have to know which protocol you are using. By default, HTTP 1.1 will
not handle more than 2 connections to the same server at the same time (this
is tweakable).

So besides the threading you have a lot of other problems to solve. Making
your program work first without threading will probably be a much better
approach. Adding threading afterwards is usually not the big challenge.

Just my thought,

Cor
 
I don't understand your thinking that I won't get much benefit. I see it
differently but maybe I'm wrong. Maybe my spec wasn't clear.

HTTP connection limits aside, I would go FROM this:
-----------------------------
Do
    send XML request
    get XML response 1 second later after internet latency
Until All 100,000 Records Exhausted
-----------------------------

TO this:

-----------------------------
Do
    if thread available
        spawn Task with this record
    end if
Until All 100,000 Records Exhausted

Sub Task
    send XML request
    get XML response 1 second later after internet latency
    communicate that thread is finished back to main loop
End Sub
-----------------------------


Most of the time spent on each task is waiting for internet latency. If I
have 10 worker bees (threads) all taking the next record in the recordset, I
would probably have 10 responses every second rather than 1 response every
second and I could go through my 100,000 records 10x faster. Bandwidth is
not a bottleneck. Seems like an ideal threading app, and the companies on
the other end have agreed. But tell me if I'm wrong, of course.

I just need better pseudocode than what I wrote above, because I don't know
how to track which thread is available and how to then spawn off the next one
with the next record.
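The 10x estimate above can be checked with a little arithmetic. This is only a back-of-the-envelope sketch in Python (not VB.NET), and it assumes that latency dominates each request, that the remote servers really do handle the requests concurrently, and that no connection limit gets in the way. The function name `estimated_seconds` is made up for illustration.

```python
import math

def estimated_seconds(records, latency_s, workers):
    """Rough total time if each worker handles its share of records back to back.

    Assumes every request takes exactly latency_s and workers never wait
    on each other; real numbers will be worse.
    """
    requests_per_worker = math.ceil(records / workers)
    return requests_per_worker * latency_s

if __name__ == "__main__":
    for workers in (1, 10):
        hours = estimated_seconds(100_000, 1.0, workers) / 3600
        print(f"{workers} worker(s): about {hours:.1f} hours")
```

Under those assumptions, 100,000 records at 1 second each take roughly 27.8 hours one at a time and roughly 2.8 hours with 10 workers.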
 
HK,

I did not say it was not a good idea; however, I think you first have to get
it working without the threads. Adding the threads afterwards is usually not
the hard part.

This is a good use for a queue, by the way. However, try it first without
that, and try to use a DataTable instead of a recordset; then you don't have
to use ADO.

Cor
 
Let me see if I understand what you are trying to do:

You have 100,000 records on the local side. For each record you have to
make a web request. You're thinking you can make 10 web requests at once
in order to speed up the process.

Well, I understand why you think threads would speed this up, but you'd
be better off sending more than one record per request. I'm not sure if you
have access to the server-side stuff, but you should design your XML to
handle multiple requests in one send. Then you send all 100,000 requests
in one go and only incur the latency once.

Chris
 
The companies on the other side don't allow that. I don't control their
servers and they have already told me that other customers get around this
with threading and send multiple requests simultaneously.

So yes, I do plan to send 10 web requests simultaneously, each in its own
thread.

Can anyone help with code or pseudocode? Again, the part that troubles me
is communicating that the thread is done (I think that's best served by
raising an event) and then having some loop of code that says "do I have any
free threads? yes? ok, you take this record and you take that one. now
I'm at 10 active threads and need to wait for another few to come free".
That looping sounds a bit challenging but surely this concept of
"maintaining 10 active threads until done" has been done many times before
and thus I'm posting hoping for some code or pseudocode that does that part.
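One common way to express "do I have a free thread? If not, wait until one comes free" is a counting semaphore instead of an event plus a polling loop. The following is a minimal sketch in Python rather than VB.NET, meant only to show the shape of the loop. `process_record` is a hypothetical stand-in for the XML request/response routine, and the sleep simulates the remote latency.

```python
import threading
import time

MAX_THREADS = 10  # the tunable cap on simultaneously running threads

def process_record(record, results, lock, latency):
    """Hypothetical stand-in for the XML request/response round trip."""
    time.sleep(latency)      # simulates waiting on the remote server
    with lock:               # results is shared between threads, so guard it
        results.append(record)

def run_all(records, max_threads=MAX_THREADS, latency=0.01):
    slots = threading.Semaphore(max_threads)  # one permit per allowed thread
    lock = threading.Lock()
    results = []
    threads = []

    def task(record):
        try:
            process_record(record, results, lock, latency)
        finally:
            slots.release()  # "this thread is done": frees a slot for the main loop

    for record in records:
        slots.acquire()      # blocks while max_threads tasks are still running
        t = threading.Thread(target=task, args=(record,))
        t.start()
        threads.append(t)

    for t in threads:        # wait for the last few tasks to finish
        t.join()
    return results

if __name__ == "__main__":
    done = run_all(range(100))
    print(len(done), "records processed")
```

The main loop never has to ask which thread is free. It only asks whether a slot is free, and each finishing task hands its slot back, so changing `MAX_THREADS` is enough to experiment with different thread counts.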
 
I do have it working in VB6 and need to do a project conversion to .NET 2.0.
That is why I keep asking someone here to help with the pseudocode, or real
code, for the piece that gives me trouble: signaling that a thread is done and
keeping 10 threads active until all the records are processed.
 
HK,

As I already wrote, have a look at a "queue"; it is made for this kind of
problem.

Cor
 
HK,

You are asking for too much code. For that you really have to hire somebody;
what you ask for is not just a snippet.

And as I wrote, you really have no chance of finding even parts of that with
a recordset. The recordset is almost never used anymore in .NET.

Cor
 
I asked for a few lines of pseudocode or for someone to point me to a real
code sample. Neither request is "too much" IMO.

The recordset concept is the lesser concern here but thanks for the tip.
 
Hi,

I would recommend avoiding worker threads for this kind of work - you
can use the asynchronous methods (BeginXxx/EndXxx) available for most
communication classes (e.g. HttpWebRequest.BeginGetResponse, etc.)

You can write it like this (pseudocode):

// start at most 10 asynchronous requests
function doRequests()
{
    lock (requests)
    {
        for (i = 0; i < 10 && requests.Available; i++)
            beginProcessRequest(requests.Dequeue(), onRequestComplete);
    }

    ... wait for the requests to complete
}

function beginProcessRequest(request, callback)
{
    ... do something that processes your request asynchronously and calls
    ... callback when it completes
}

function onRequestComplete()
{
    lock (requests)
    {
        ... do whatever you need to do
        // each completed request starts the next one, so at most 10 are in flight
        if (requests.Available)
            beginProcessRequest(requests.Dequeue(), onRequestComplete);
    }
}
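For comparison, here is a rough sketch of the same idea (asynchronous I/O with a cap on requests in flight) using Python's asyncio rather than the .NET BeginXxx/EndXxx methods. It is not a translation of the pseudocode above: it uses a semaphore instead of chained callbacks, and `send_request` is a hypothetical stand-in that only simulates latency.

```python
import asyncio

async def send_request(record, latency):
    """Hypothetical stand-in for an asynchronous XML request."""
    await asyncio.sleep(latency)  # simulated latency; no thread is blocked here
    return f"response for {record}"

async def process_all(records, max_in_flight=10, latency=0.01):
    limit = asyncio.Semaphore(max_in_flight)  # caps concurrent requests

    async def one(record):
        async with limit:         # waits here while the cap is reached
            return await send_request(record, latency)

    # gather returns the results in the same order as the inputs
    return await asyncio.gather(*(one(r) for r in records))

def run(records, max_in_flight=10, latency=0.01):
    return asyncio.run(process_all(records, max_in_flight, latency))

if __name__ == "__main__":
    print(run(range(3)))
```

No worker threads are created at all here; a single thread interleaves the waiting requests, which is the main appeal of the asynchronous approach for latency-bound work.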

If you insist on the thread approach, you can do this:

function doRequests()
{
    // spawn a fixed number of workers; each one runs until the queue is empty
    for (i = 0; i < 10; i++)
        spawnWorker(worker);

    ... wait for the requests to complete
}

function worker()
{
    for (;;)
    {
        // hold the lock only while taking the next request off the queue
        lock (requests)
        {
            if (!requests.Available)
                return;

            request = requests.Dequeue();
        }

        ... process your request
    }
}
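The worker approach above could look roughly like this as runnable code. This is a sketch in Python, not VB.NET; it relies on Python's thread-safe `queue.Queue` in place of the explicit lock, and the sleep is a placeholder for the real request, so treat the names and the fake "ok" result as assumptions.

```python
import queue
import threading
import time

def worker(requests, results, latency):
    """Pull records until the queue is empty, then exit."""
    while True:
        try:
            record = requests.get_nowait()  # thread-safe dequeue, no explicit lock
        except queue.Empty:
            return                          # nothing left, so this worker ends
        time.sleep(latency)                 # placeholder for the real XML request
        results.put((record, "ok"))

def do_requests(records, num_workers=10, latency=0.01):
    requests = queue.Queue()
    for record in records:                  # fill the queue before starting workers
        requests.put(record)
    results = queue.Queue()

    threads = [
        threading.Thread(target=worker, args=(requests, results, latency))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:                       # wait for the requests to complete
        t.join()

    # all workers have exited, so qsize() is exact at this point
    return [results.get() for _ in range(results.qsize())]

if __name__ == "__main__":
    print(len(do_requests(range(100))), "records processed")
```

Because the workers pull their own work, the main routine does not need to be told when an individual thread finishes; it only waits for all of them at the end.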

If you have any more questions, don't hesitate to ask :)

Stefan
 