Thread Exercise for the gurus

  • Thread starter: Alvin Bruney

Alvin Bruney

I have an array list of queries. The arraylist is variable, anywhere from 10
to 10000 or more. I'd like to spin threads to take chunks of 500 queries out
of that array list, no more than 10 threads (context switching reasons). if
it's less than 500 i spin only one thread.

I am having trouble building an efficient, clean implementation of this.
Clean means, i don't want to loop from 500 to 1000 if i only have 650 items,
i would like to stop at 650. Efficient means i don't want the code to be
messy - like what i have now. readable. I don't need the thread code part, i
need the looping mechanism. I have it done, but it's not clean, tight or
efficient. I know it can be done better. Efficiency also means efficiently
locking and releasing the datastructure.

See if you can help, otherwise i'll run with what i have. (I can't think
clearly now, the ax just fell here and we lost some very good folk)
 
It's not clear whether you're pre-assigning the queries to the threads, or
whether each thread goes back to the original list each time it completes a
query.

Of course the tidiest way would be to pre-assign the queries to the thread,
but this may not achieve what you are wanting, if say one thread works
through its queries a lot faster than another (for some reason).

Greg
 
How much performance do you hope to gain here? I mean, the processor
can only execute so many instructions per second and splitting things
up into multiple threads is just adding more instructions to be
processed.

Unless you have a UI that you need to keep active, or you're processing
things that are asynchronous (database, file reads/writes, web service
calls), threading isn't going to help you much.

If this is a process of just looping through some calculations and
doing them one-by-one, you're probably just better off doing them
synchronously.

-c
 
I stack an array of 2000 queries in an array list. If i don't use threads,
it takes about 30 seconds. Boss don't like that, he wants to draw blood from
stone. So I figure i can spin 4 threads at 500 queries each at the same time
which would give him the performance he is looking for albeit at the cost of
cpu i know.
 
I can't use the thread pool because i need to join and wait on all threads
to complete.
 
The queries are already written and stacked in an arraylist. I just want to
efficiently pick out blocks of 500 to submit at the same time. for ex. if i
had 2000 records, i would spin 4 threads where each thread's work would be
to insert 500 records. The reasoning here is 500 at the same time among four
threads should be faster than 2000 on the main thread. We've got cpu
resources to burn.

i'd like to pre-assign - to answer your question. so each thread gets a
block of 500 and works it until it is done and then waits on the other
threads to complete.

after that i have a stored proc thing lined up if he still aint satisfied
with the results.
 
Ah, ok. That's good then.

So just create (#queries)/500 + 1 queues and loop
through the array list and throw the various references
into the various queues and then spawn (#queries)/500 + 1
(or 10, which ever is less) threads and assign the
queues to the threads.
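
A quick sketch of that count, in case it helps. Note that (#queries)/500 + 1 gives 5 for exactly 2000 queries, so this version assumes a ceiling division instead, which gives 4. The names ThreadPlanner, ChunkSize and MaxThreads are made up for illustration; the 500 and 10 come from the original post.

```csharp
using System;

static class ThreadPlanner
{
    // Figures taken from the original post: 500 queries per chunk, at most 10 threads.
    public const int ChunkSize = 500;
    public const int MaxThreads = 10;

    // Number of threads (and queues) to create for a given number of queries.
    public static int ThreadCount(int queryCount)
    {
        if (queryCount <= 0) return 0;

        // Ceiling division: 650 queries -> 2 chunks, 2000 queries -> 4 chunks.
        int chunks = (queryCount + ChunkSize - 1) / ChunkSize;

        // Never exceed the cap, however many chunks there are.
        return Math.Min(chunks, MaxThreads);
    }
}
```

With more than 5000 queries the cap applies, so each of the 10 queues simply ends up holding more than 500 items.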

Each thread can open a connection and execute its query
and then close the connection (the connections are pooled,
so there's no real performance hit) and then start over
again. I don't know, I suppose you could leave the connection
open, but when you "close" the connection, the connection
sends a reset command to SQL which resets all data, temp tables,
transactions, etc and makes it all nice and clean for the
next statement. It's quite fast, actually.

What part are you having the most problems with?

-c
 
One simple way would be:

o Work out how many threads you'll need (divide by 500 and limit to 10)
o Create a new list for each thread
o Round-robin copy items from the old list to the new one
o Set the threads going

The copying shouldn't take long, and as you allocate them in a round-
robin fashion it'll be fair however many threads you have, without
working out any complicated boundary conditions. Once the lists are
written out, you'll need a memory barrier on each of the threads to
make sure that they all get all of the data, but after that there's no
interference between them, so they can each grab items off their own
list without worrying about the other threads.
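
The round-robin copy described above might look roughly like this. It is only a sketch: RoundRobin and Distribute are invented names, and it assumes the queries are strings held in a generic list rather than an ArrayList.

```csharp
using System;
using System.Collections.Generic;

static class RoundRobin
{
    // Deals the queries out to threadCount separate lists, one item at a time.
    public static List<List<string>> Distribute(IList<string> queries, int threadCount)
    {
        if (threadCount < 1) throw new ArgumentOutOfRangeException("threadCount");

        // One private list per thread; nothing is shared once the copy is done.
        var lists = new List<List<string>>(threadCount);
        for (int t = 0; t < threadCount; t++)
            lists.Add(new List<string>());

        // Item i goes to list i % threadCount, so list sizes differ by at most one.
        for (int i = 0; i < queries.Count; i++)
            lists[i % threadCount].Add(queries[i]);

        return lists;
    }
}
```

There are no boundary cases to work out here: 650 queries over 2 threads gives two lists of 325, and an uneven count just leaves some lists one item longer than others.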
 
You should be able to use a semaphore to accomplish the same purpose. If you
know in advance how many total queries there are you can set the semaphore
to that value and then as each work item completes you pulse the semaphore.
When the last one completes the thread waiting wakes up.

If the number of items can change during processing then this approach needs
to be modified, but then, so does the other approach.

I mention this because the system thread pool already takes into account all
those factors you wanted to handle; efficient waiting and division of
workload, balancing the number of threads against the number of cpus,
autodecay and terminate, etc.
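
As a rough illustration of waiting on thread pool work: the sketch below uses a counter and a ManualResetEvent rather than a semaphore proper, which is a substitution on my part but achieves the same "wake up when the last item completes" effect. PooledBatch, RunAll and the execute callback are hypothetical names.

```csharp
using System;
using System.Threading;

static class PooledBatch
{
    // Queues one thread pool work item per query and blocks until all have finished.
    // The execute callback is assumed to handle its own exceptions.
    public static void RunAll(string[] queries, Action<string> execute)
    {
        if (queries.Length == 0) return;

        int pending = queries.Length; // work items still outstanding
        using (var allDone = new ManualResetEvent(false))
        {
            foreach (string query in queries)
            {
                string q = query; // per-iteration copy for the closure
                ThreadPool.QueueUserWorkItem(delegate
                {
                    try
                    {
                        execute(q);
                    }
                    finally
                    {
                        // The last item to finish wakes the waiting thread.
                        if (Interlocked.Decrement(ref pending) == 0)
                            allDone.Set();
                    }
                });
            }

            allDone.WaitOne(); // the calling thread waits here
        }
    }
}
```

This only works when the total is known before any work is queued, as noted above.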
 
i got all that but there is an issue with the thread lists. is this list a
static list? if it is, then there will be threading corruption and
contention as users begin to save data at the same time. i can lock and
release but i think that performance will begin to degrade when the app
starts to scale up. i'd like to avoid the static approach for that reason.
are there any other approaches to having each thread maintain its own list,
possibly internal to each thread, that isn't static?
 
This is what i have currently
int lowerBound = 0, upperBound = 499;

if(list.Count < 500)
{
    WriteToDataBase(list);
}
else
{
    while(upperBound < list.Count)
    {
        //assign quotas
        for(int index = lowerBound; index < upperBound; index++)
            targetArray.Add(list[index]);

        //spin a thread to work
        ThreadStart ThreadDelegate = new ThreadStart(ThreadWork.ThreadDoWork);
        Thread saveThread = new Thread(ThreadDelegate);
        saveThread.Start();

        //get ready for more
        lowerBound = upperBound;
        upperBound+= 500;
        targetArray.Clear();
    }
}

but ThreadDoWork has no access to targetArray. Threadpool would have made it
easier since targetArray could be passed in, but it doesn't give me a way to
wait for the work to finish, and all the writes must be done before the main
thread can move on. For the reasons pointed out earlier in this thread, i
think a static variable to hold targetArray would be problematic and less
efficient.
 
Alvin Bruney said:
i got all that but there is an issue with the thread lists. is this list a
static list?

What do you mean by "is this list a static list"? You've got your
initial list with all the entries on, and then each thread has its own
list. None of them need references in static variables, if that's what
you were thinking.

Alvin Bruney said:
if it is, then there will be threading corruption and
contention as users begin to save data at the same time. i can lock and
release but i think that performance will begin to degrade when the app
starts to scale up. i'd like to avoid the static approach for that reason.
are there any other approaches to having each thread maintain its own list,
possibly internal to each thread, that isn't static?

That's what I'd been suggesting... I can write some sample code if you
want.
 
This is what I'd do:

Make a class for the threads, and have a property to which you assign the
queries to be processed.

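class QueryProcessor
{
    // The queries this instance (and therefore this thread) will process.
    private string[] queries;
    public string[] Queries { set { queries = value; } }

    // Entry point for the thread; matches the ThreadStart signature.
    public void Go()
    {
        ...
    }
}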

or something like that...
For each thread create a new instance of that class...

QueryProcessor queryProcessor = new QueryProcessor();
queryProcessor.Queries = something;
ThreadStart myThreadDelegate = new ThreadStart(queryProcessor.Go);
Thread myThread = new Thread(myThreadDelegate);
myThread.Start();

that should sort out any corruption.
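
Putting that together with the chunking and the wait at the end, a fuller sketch could look like the following. ChunkWorker, ChunkRunner, RunInChunks and the execute callback are names invented here, and the way the chunk size grows to respect the thread cap is my assumption about what is wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// One instance per thread; each instance owns its own copy of a chunk.
class ChunkWorker
{
    private readonly List<string> queries;
    private readonly Action<string> execute;

    public ChunkWorker(List<string> queries, Action<string> execute)
    {
        this.queries = queries;
        this.execute = execute;
    }

    // Thread entry point: works through this worker's private chunk only.
    public void Go()
    {
        foreach (string query in queries)
            execute(query);
    }
}

static class ChunkRunner
{
    // Splits 'all' into contiguous chunks, runs one thread per chunk,
    // waits for every thread, and returns how many threads were used.
    public static int RunInChunks(List<string> all, int chunkSize, int maxThreads,
                                  Action<string> execute)
    {
        // Grow the chunk size if needed so that at most maxThreads chunks exist.
        int size = Math.Max(chunkSize, (all.Count + maxThreads - 1) / maxThreads);

        var threads = new List<Thread>();
        for (int start = 0; start < all.Count; start += size)
        {
            // The last chunk stops at the end of the list (e.g. at 650, not 1000).
            int count = Math.Min(size, all.Count - start);
            var worker = new ChunkWorker(all.GetRange(start, count), execute);

            var thread = new Thread(new ThreadStart(worker.Go));
            thread.Start();
            threads.Add(thread);
        }

        // Block until every chunk has been fully processed.
        foreach (Thread thread in threads)
            thread.Join();

        return threads.Count;
    }
}
```

Because GetRange copies each chunk into a new list owned by one worker, nothing static or shared is involved, and a list shorter than 500 naturally gets a single thread.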

Greg
