Global data concurrent access ?


Kunal

Hi friends,

I have some global structure in my application. The data within this is
read by some reader threads. There are about 10 new threads created per
second. Threads do some processing and die. All threads process based
on data read from the global structure.

This global structure is populated from a set of files at application
startup, which can be modified at run-time. There is also the option to
apply the modified files at run-time. At the time the modified files
are read into the application, I need to block access to this global
structure from the 'reader' threads. Threads do not modify this
structure.

What is the best/efficient way to achieve this task - lock / mutex /
wait - and how can this be done ? Any ideas are welcome.

Thanks n Regards,
Kunal
 
Kunal said:
There are about 10 new threads created per second. Threads do some
processing and die. All threads process based on data read from the
global structure.

You want to abandon this architecture as quickly as you can. The cost of
creating and destroying threads is very high, and at 10 threads per second,
your application is spending 95% of its time creating threads, and 5%
doing the work you want done.

You should:
1 - Use the System ThreadPool. If your work isn't I/O related (hitting SQL,
making a web service call, reading/writing to a file) this is what you want
to do. If your work is I/O centric, don't do this.

2 - Create your threads, but keep them around. When a thread is done doing
its work, have it go look in a "Work queue" for more work to do. This is a
good way to go in general.
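For what it's worth, here is a minimal sketch of option 1, assuming the per-request work is CPU-bound. The names PoolSketch, Process and RunAll are made up for illustration and are not from the original post; ThreadPool.QueueUserWorkItem and CountdownEvent are standard .NET types.

```csharp
using System;
using System.Threading;

static class PoolSketch
{
    // Hypothetical stand-in for the real per-request processing.
    public static int Process(int item)
    {
        return item * 2;
    }

    // Queues 'count' work items on the system thread pool instead of
    // creating a new thread for each one, then waits for all of them.
    public static int RunAll(int count)
    {
        int sum = 0;
        using (var done = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                int item = i; // copy the loop variable for the closure
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Interlocked.Add(ref sum, Process(item));
                    done.Signal();
                });
            }
            done.Wait();
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(RunAll(10));
    }
}
```

The pool reuses a small set of threads, so the cost of creating and destroying a thread is not paid for every request. Option 2 has the same shape, except that you own the threads and the queue yourself.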
This global structure is populated from a set of files at application
startup, which can be modified at run-time. There is also the option to
apply the modified files at run-time. At the time the modified files
are read into the application, I need to block access to this global
structure from the 'reader' threads. Threads do not modify this
structure.

The pattern you're looking for is a ReaderWriterLock. In my other post, I
used a Monitor (which in C# looks like 'lock()'). A ReaderWriterLock is very
similar in terms of methodology though.
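A rough sketch of the reader/writer pattern, under a few assumptions: it uses ReaderWriterLockSlim (the newer class in System.Threading) rather than ReaderWriterLock itself, it pretends the global structure is a string dictionary, and the names GlobalConfig, Get and Reload are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper around the "global structure" from the question.
static class GlobalConfig
{
    static readonly ReaderWriterLockSlim rw = new ReaderWriterLockSlim();
    static Dictionary<string, string> data = new Dictionary<string, string>();

    // Any number of reader threads can hold the read lock at the same time.
    public static string Get(string key)
    {
        rw.EnterReadLock();
        try
        {
            string value;
            return data.TryGetValue(key, out value) ? value : null;
        }
        finally
        {
            rw.ExitReadLock();
        }
    }

    // The writer copies the new data outside the lock, then blocks readers
    // only for the short time it takes to swap the reference.
    public static void Reload(IDictionary<string, string> fresh)
    {
        var copy = new Dictionary<string, string>(fresh);
        rw.EnterWriteLock();
        try
        {
            data = copy;
        }
        finally
        {
            rw.ExitWriteLock();
        }
    }

    static void Main()
    {
        Reload(new Dictionary<string, string> { { "mode", "v1" } });
        Console.WriteLine(Get("mode"));
        Reload(new Dictionary<string, string> { { "mode", "v2" } });
        Console.WriteLine(Get("mode"));
    }
}
```

Parsing the modified files before taking the write lock keeps the time readers are blocked short. If a reader needs several values that must come from the same version of the data, it should hold the read lock across all of those reads rather than calling Get repeatedly.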
 
Chris Mullins said:
You want to abandon this architecture as quickly as you can. The cost of
creating and destroying threads is very high, and at 10 threads per second,
your application is spending 95% of its time creating threads, and 5%
doing the work you want done.

While I agree that creating a lot of threads is reasonably expensive, I
don't think it's quite as bad as all that.

On my laptop, creating and starting 500 threads takes about 120-170ms.
(I haven't got the energy to work up a good benchmark - this is about
as crude as they come.)

Assuming linear scaling (and it should actually be better than that,
because with only 10 threads at a time there'll be less context
switching) that would suggest that 10 threads would take less than 4ms
to start - i.e. under 1% of the time. Yes, it's a very crude benchmark
- but I do think 95% is higher than reality.
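Something along these lines reproduces that kind of crude measurement. This is a sketch, not the code actually used for the figures above; the class and method names are made up, and the numbers will vary a lot by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class ThreadStartBenchmark
{
    // Times how long it takes to create and start 'count' threads that
    // do no work. Joining happens after the stopwatch has stopped.
    public static TimeSpan TimeThreadStarts(int count)
    {
        var threads = new Thread[count];
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            threads[i] = new Thread(() => { });
            threads[i].Start();
        }
        sw.Stop();

        foreach (var t in threads)
        {
            t.Join();
        }
        return sw.Elapsed;
    }

    static void Main()
    {
        TimeSpan elapsed = TimeThreadStarts(500);
        Console.WriteLine("500 threads: {0:F1} ms", elapsed.TotalMilliseconds);
        // Linear scaling, as assumed in the post: 10 threads is 1/50 of 500.
        Console.WriteLine("Scaled to 10 threads: {0:F2} ms",
            elapsed.TotalMilliseconds * 10 / 500);
    }
}
```

With the 120-170ms quoted above, the scaled figure comes out at 2.4-3.4ms per second of wall time, which is where "less than 4ms" and "under 1%" come from.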

Just to reiterate though, I totally agree that the OP should move away
from that architecture ASAP :)
 
I try hard not to let reality intrude on my generalizations...

I guess the big question for the OP is "how much work are you doing in each
thread?". It still may be trivial relative to the time it takes to start the
thread.

.... but yea, starting and stopping threads are like exceptions. Everyone
says (even me, obviously) that it's horrible, but we all forget just how
good horrible can be. :)
 
Chris Mullins said:
I try hard not to let reality intrude on my generalizations...
:)

I guess the big question for the OP is "how much work are you doing in each
thread?". It still may be trivial relative to the time it takes to start the
thread.
True.

... but yea, starting and stopping threads are like exceptions. Everyone
says (even me, obviously) that it's horrible, but we all forget just how
good horrible can be. :)

And of course the reality of that changes every year as hardware gets
cheaper. Just as with exceptions, of course, excessive starting and
stopping of threads is a symptom of an architecture which should be
looked at, but probably won't actually kill performance in itself.

Next in line: how expensive is making a database connection these days?
:)
 
Jon said:
Next in line: how expensive is making a database connection these days?

That may have been a rhetorical question, but I happen to have just
written a little test to figure out how pooled versus unpooled
connections performed. The code below is executing the same simple
stored procedure 1000 times, from a 3.4GHz Xeon box to a 3.6GHz Xeon
box about four network hops away.

No pooling 1000 iterations. Elapsed time: 00:00:12.9217096
Pooling 1000 iterations. Elapsed time: 00:00:01.0624864

That's about 13ms per round-trip for a non-pooled connection, and 1ms
per round-trip for a pooled connection (I was trying to prove that pooling
didn't produce an order of magnitude performance improvement, and I was
obviously wrong).

using System;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;

namespace PoolSpeedTest
{
    class Program
    {
        // "XXX" values are placeholders, elided in the original post.
        static string server = "XXX";
        static string database = "XXX";
        static string connStringPooling =
            "Data Source={0};Initial Catalog={1};Integrated Security=True";
        static string connStringNoPooling =
            "Data Source={0};Initial Catalog={1};Integrated Security=True;Pooling=false";

        static void Main(string[] args)
        {
            TwoByOne(1000);
            Console.ReadLine();
        }

        private static void TwoByOne(int iterations)
        {
            // Preload: warm up both code paths before timing anything.
            for (int i = 0; i < 10; i++)
            {
                using (SqlConnection conn = new SqlConnection(
                    String.Format(connStringPooling, server, database)))
                {
                    DoSomething(conn);
                }
            }

            for (int i = 0; i < 10; i++)
            {
                using (SqlConnection conn = new SqlConnection(
                    String.Format(connStringNoPooling, server, database)))
                {
                    DoSomething(conn);
                }
            }

            // Timed run without pooling.
            Stopwatch timer = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                using (SqlConnection conn = new SqlConnection(
                    String.Format(connStringNoPooling, server, database)))
                {
                    DoSomething(conn);
                }
            }
            timer.Stop();
            Console.WriteLine(String.Format(
                "No pooling {0} iterations. Elapsed time: {1}", iterations, timer.Elapsed));

            // Timed run with pooling (the default).
            timer = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                using (SqlConnection conn = new SqlConnection(
                    String.Format(connStringPooling, server, database)))
                {
                    DoSomething(conn);
                }
            }
            timer.Stop();
            Console.WriteLine(String.Format(
                " Pooling {0} iterations. Elapsed time: {1}", iterations, timer.Elapsed));
        }

        // Opens the connection, runs the stored procedure and reads every row.
        private static void DoSomething(SqlConnection conn)
        {
            using (SqlCommand command = new SqlCommand("XXX", conn))
            {
                command.CommandType = CommandType.StoredProcedure;
                conn.Open();
                // CloseConnection: disposing the reader also closes the connection.
                using (IDataReader dr =
                    command.ExecuteReader(CommandBehavior.CloseConnection))
                {
                    while (dr.Read())
                    {
                        string x = dr["XXX"].ToString();
                    }
                }
            }
        }
    }
}
 
Michael Petrotta said:
That may have been a rhetorical question, but I happen to have just
written a little test to figure out how pooled versus unpooled
connections performed. The code below is executing the same simple
stored procedure 1000 times, from a 3.4GHz Xeon box to a 3.6GHz Xeon
box about four network hops away.

No pooling 1000 iterations. Elapsed time: 00:00:12.9217096
Pooling 1000 iterations. Elapsed time: 00:00:01.0624864

That's about 13ms per round-trip for a non-pooled connection, and 1ms
per round-trip for a pooled connection (I was trying to prove that pooling
didn't produce an order of magnitude performance improvement, and I was
obviously wrong).

Out of interest, what happens to the figures if:

1) You're on the same box?
2) You're only one network hop away?

My guess is that 2) won't be terribly different, but 1) may well
decrease the difference significantly.
 
Jon said:
Out of interest, what happens to the figures if:

1) You're on the same box?

No pooling 1000 iterations. Elapsed time: 00:00:06.0936720
Pooling 1000 iterations. Elapsed time: 00:00:00.2499968

2) You're only one network hop away?

No pooling 1000 iterations. Elapsed time: 00:00:12.7683600
Pooling 1000 iterations. Elapsed time: 00:00:00.2904176

Interesting. The results are repeatable, but it's not a perfect test
(in particular, the client for #2 is a much slower laptop).

If I take the results at face value, it seems to say that network speed
affects pooled connections (makes sense; there's not much connection
setup and teardown with pooled connections. It's just the time taken
to get the query and results over the wires).
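
To put the three runs side by side, here is a small sketch that turns the elapsed times posted in this thread into per-call figures and a speedup ratio. The timings are copied from the posts; the class and method names are invented for the example.

```csharp
using System;
using System.Globalization;

static class PoolingRatios
{
    // Average milliseconds per call, given an elapsed time in the
    // hh:mm:ss.fffffff form printed by the test program.
    public static double PerCallMs(string elapsed, int iterations)
    {
        TimeSpan span = TimeSpan.Parse(elapsed, CultureInfo.InvariantCulture);
        return span.TotalMilliseconds / iterations;
    }

    static void Report(string label, string noPooling, string pooling)
    {
        double slow = PerCallMs(noPooling, 1000);
        double fast = PerCallMs(pooling, 1000);
        Console.WriteLine("{0}: {1:F2} ms unpooled, {2:F2} ms pooled, {3:F1}x",
            label, slow, fast, slow / fast);
    }

    static void Main()
    {
        Report("Four hops", "00:00:12.9217096", "00:00:01.0624864");
        Report("Same box", "00:00:06.0936720", "00:00:00.2499968");
        Report("One hop", "00:00:12.7683600", "00:00:00.2904176");
    }
}
```

The pooled per-call time grows with network distance while the unpooled time is dominated by connection setup, which fits the reading above. Bear in mind the caveat about the slower laptop in the one-hop run.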

I think the speed of my laptop is affecting the non-pooled test; it's
taking time to have ADO.NET set up and tear down the connection.
Unfortunately, our network architecture is such that it's hard to have
two powerful desktops closely connected.

(The speed of pooled connections in general surprised me, when I first
ran this test. I'd assumed network latency would give me round-trip
times around 10-20ms. Pings to the server are also returning
sub-millisecond times. Remember modems and their 200ms pings?)

Michael
 
Michael Petrotta said:
No pooling 1000 iterations. Elapsed time: 00:00:06.0936720
Pooling 1000 iterations. Elapsed time: 00:00:00.2499968


No pooling 1000 iterations. Elapsed time: 00:00:12.7683600
Pooling 1000 iterations. Elapsed time: 00:00:00.2904176

Interesting. The results are repeatable, but it's not a perfect test
(in particular, the client for #2 is a much slower laptop).

Wow. Just shows how wrong intuition can be!
If I take the results at face value, it seems to say that network speed
affects pooled connections (makes sense; there's not much connection
setup and teardown with pooled connections. It's just the time taken
to get the query and results over the wires).
Yup.

I think the speed of my laptop is affecting the non-pooled test; it's
taking time to have ADO.NET set up and tear down the connection.
Unfortunately, our network architecture is such that it's hard to have
two powerful desktops closely connected.

Fair enough - thanks for taking the time to run the tests at all!
(The speed of pooled connections in general surprised me, when I first
ran this test. I'd assumed network latency would give me round-trip
times around 10-20ms. Pings to the server are also returning
sub-millisecond times. Remember modems and their 200ms pings?)

It's quite incredible how fast some things can change while others stay
the same. Where are the 1TB+ cheap, fast static memory chips that we've
been waiting for for so long? That's what I think will *really*
transform computing...
 