Storing data from multiple clients

  • Thread starter Thread starter Massimo
  • Start date Start date
M

Massimo

Hi to all.

I'm developing a network server application which will be constantly
receiving data from mobile GPRS clients and storing them in a centralized
database (probably SQL Server or Oracle); from the server's point of view,
the clients will just be standard TCP/IP clients, sending data using a
custom protocol I'll design. The data will be sent to the server in
messages, which will contain very small amounts of data (a bunch of floating
point numbers), but will be sent quite often, every second or even faster.
The number of client is not pre-determined, but the application needs to be
able to scale up to hundreds of clients. The server won't do any processing
on the incoming data, just add a timestamp and the client's ID and INSERT a
new row in a table.

I'm going to use async sockets to handle the connections from the clients,
so the data storing actions will be triggered by incoming socket data and
will be handled by the socket callback methods; I think this is much better
than using worker threads to wait on the sockets, since the number of
simultaneous connections can potentially become quite high (am I right on
this?). Now, my question is: which is the best way to do an INSERT on the DB
every time a data message arrives from a connected socket?

I (obviously) don't want to manually open and close a DB connection every
time; I could open one when the server starts and do all of my data access
using that one, but I don't know how this method works when using a
multithreaded application; if I have to use some form of mutual exclusion
here, then the single DB connection will become a bottleneck.
I've heard about using connection pooling, but I don't actually know how it
works and if this is the ideal technique for this scenario.

Can someone please help? :-)

Thanks


Massimo
 
Massimo

I cannot answer for the async sockets. I've never used them; but for the database:

The .NET Connection classes can not be used by multiple trheads. You have to use one Connection instance for each thread.

Opening a connection will reuse the connection on the SQL Server side if possible, so the SQL Server will create it only once (Connection Pooling). Opening and closing the connection will be done fast.

We have seen, that the most time will be used by inserting the data into the table of SQL Server. Depending on the servers load and the available hardware, you could assume 20..50 ms for every insert. To get a more accurate time, try it out with the planned or similar hardware.
Use stored procedures for the insert; they have better performance than using sql statements.

Try to reduce roundtrips to the database server. You could collect lets say 10 messages from the clients and send them as a bunch to a stored procedure on the database server.

Try to decouple the insert from the TCP sockets, because you never know, how much time it will take. You could use Microsoft Message Queue for that.

Furthermore try to design your application for scale out. That means that you can do the inserts on multiple SQL/Oracle servers.


I hope that helps
Greetings from Switzerland
Thomas
 
"Thomas Weingartner" <[email protected]> ha scritto nel
messaggio

You gave me some good points, thanks.
The .NET Connection classes can not be used by multiple trheads.
You have to use one Connection instance for each thread.

That's the problem: when the framework uses its own worker threads to
process async socket callbacks, you don't have any control on how many
threads there are and which one of them is used. To do as you suggest, I'd
have to manually spawn worker threads and have each one of them manage one
or more connected sockets.
I could also have each callback create a connection object, open it, use it
and then close it, but I think this would be *really* a bottleneck.
Another option would be to have only one connection object and use a mutex
to have only one thread at a time access it, but I think this would create a
big bottleneck too.
Opening a connection will reuse the connection on the SQL Server side if
possible, so the SQL Server will create it only once (Connection Pooling).
Opening and closing the connection will be done fast.

Of course, but only when using the same connection object(s).
We have seen, that the most time will be used by inserting the data
into the table of SQL Server. Depending on the servers load and
he available hardware, you could assume 20..50 ms for every insert.
To get a more accurate time, try it out with the planned or similar
hardware.

I know this; the actual INSERT query is not my main concern here... I'm
concerned about how to handle connections from the application side.
Use stored procedures for the insert; they have better performance
than using sql statements.

But they only make sense when using one specific DBMS; The application will
need to support different ones.
Try to reduce roundtrips to the database server. You could collect
lets say 10 messages from the clients and send them as a bunch
to a stored procedure on the database server.

Try to decouple the insert from the TCP sockets, because you never know,
how much time it will take. You could use Microsoft Message Queue for
that.

That's a really good point; I think maybe MSMQ is too much for this, but an
internal (synchronized) FIFO buffer which would be filled by the socket
callbacks and periodically emptied by a worker thread which moves the data
to the DB could be a good design choice. This could also let me see if a
long queue of unstored data builds up (which would mean the DB can't handle
all those queries), or temporarily store data if the DB is down/unreachable
for a while. I should think more about this, thanks :-)
Furthermore try to design your application for scale out. That means
that you can do the inserts on multiple SQL/Oracle servers.

It's not planned to use many DB server (there are license costs here...).
Maybe my server will be deployed in a load balancing configuration, but even
in this case every instance will store data on the same DB.


Massimo
 
Massimo

I have to clear some of my proposals:

That's the problem: when the framework uses its own worker threads to
process async socket callbacks, you don't have any control on how many
threads there are and which one of them is used. To do as you suggest, I'd
have to manually spawn worker threads and have each one of them manage one
or more connected sockets.
I could also have each callback create a connection object, open it, use it
and then close it, but I think this would be *really* a bottleneck.
Another option would be to have only one connection object and use a mutex
to have only one thread at a time access it, but I think this would create a
big bottleneck too.

We have written a customized DataAdapter, that handles the connections internally on a per thread basis. That means the connection will be created for a specific thread. We've got a big application (>1 million loc) written in C# and there are running about 700 queries (average) every second against a SQL Server 2000. For every query we will create a connection object, open it, run the query and close it. The only bottleneck is the database server itself!

I understand your distrust; try it out with a little sample project and measure the time spent to create, open and close the connection object.
Of course, but only when using the same connection object(s).

Thats not true. Connection pooling will be done with the same "configuration". That means with the same data in the connection string and Windows identity when integrated security is used.
You should read the chapter "Using Connection Pooling" in the VS.NET 2005 help.
That's a really good point; I think maybe MSMQ is too much for this, but an
internal (synchronized) FIFO buffer which would be filled by the socket
callbacks and periodically emptied by a worker thread which moves the data
to the DB could be a good design choice. This could also let me see if a
long queue of unstored data builds up (which would mean the DB can't handle
all those queries), or temporarily store data if the DB is down/unreachable
for a while. I should think more about this, thanks :-)

It's very easy to use MSMQ. With your own FIFO buffer, you have to handle the synchronisation by yourself. Of course you even need a worker thread with MSMQ.

I post some sample code here (I left exception handling away). To use this code, you have to install MSMQ from your Windows Components (even unter Windows 2000/XP):

using System;
using System.Messaging;

public class AutoCreateMessageQueue
{
private string m_queueName;
private MessageQueue m_msgQ;

public AutoCreateMessageQueue(string queueName)
{
m_queueName = queueName;
}

public void Initialize()
{
// Create the queue if it does not already exist
if (!MessageQueue.Exists(m_queueName))
{
m_msgQ = MessageQueue.Create(m_queueName);
}
else
{
m_msgQ = new MessageQueue(m_queueName);
}
}

/// <summary>
/// Deletes all the messages contained in the queue.
/// </summary>
public void Purge()
{
m_msgQ.Purge();
}

/// <summary>
/// Adds an object to the nontransactional queue and specifies a label for the message.
/// </summary>
public void Add(string message, string label)
{
m_msgQ.Send(message, label);
}

/// <summary>
/// Receives the first message available in the queue. This call is synchronous, and
/// blocks the current thread of execution until a message is available.
/// </summary>
/// <returns>Returns the content of the message. The caller is responsible to interpret the contents.</returns>
public object GetBlocking()
{
Message myMessage = m_msgQ.Receive();
return myMessage.Body;
}
}


...
// socket side initializing
AutoCreateMessageQueue queue = new AutoCreateMessageQueue(@".\private$\DbInsertData");
queue.Initialize();
queue.Purge(); // cleanup old messages in queue

...
// socket side adding a message
string data = GetData(your parameter to assemble a string);
queue.Add(data, data.Id); // you could serialize your data class directly

...
// Worker thread initializing
AutoCreateMessageQueue queue = new AutoCreateMessageQueue(@".\private$\DbInsertData");
queue.Initialize();

...
// Worker thread reading the message and inserting in db
string data = (string)queue.GetBlocking()
InsertDataIntoDb(data);
...

That way, you could dynamically setup more worker threads to insert data into the database. The synchronization is done completely by MSMQ.


Good luck
Thomas
 
"Thomas Weingartner" <[email protected]> ha scritto nel
messaggio
For every query we will create a connection object, open it, run
the query and close it. The only bottleneck is the database server itself!

I understand your distrust; try it out with a little sample project and
measure the time spent to create, open and close the connection object.

Ok, I'll give it a try.
It just doesn't seem so obvious that having to create and open connections
on the fly is almost as efficient as using permanent connections :-)
It's very easy to use MSMQ. With your own FIFO buffer, you have
to handle the synchronisation by yourself. Of course you even need
a worker thread with MSMQ.

I post some sample code here (I left exception handling away).
[CUT]

That way, you could dynamically setup more worker threads to
insert data into the database. The synchronization is done completely
by MSMQ.

That's quite nice. Thanks for the tip.


Massimo
 
That way, you could dynamically setup more worker threads
to insert data into the database.

A question about this: does it make any sense to use N worker threads
instead of one, if they're sending the data to the same database? Maybe if
the database is running on a SMP machine, so it *can* actually process more
queries simultaneously; but what if it's running on a single CPU? Can it
better optimize the queries if they're coming from two or more client
connections instead of one?


Massimo
 
Massimo

In my opinion it makes sense, because scalable database servers work
with multiple threads too. The inserts could be processed in parallel
either using multiple hard disks to write the data or by buffering the
disk writes in the disk caches.

The queries from one client can only be processed serial.

Thomas
 
In my opinion it makes sense, because scalable database servers work
with multiple threads too. The inserts could be processed in parallel
either using multiple hard disks to write the data or by buffering the
disk writes in the disk caches.

The queries from one client can only be processed serial.

I was thinking the same.

About MSMQ: it's actually fairly easy to use, but I don't think it's the
best choice for an in-memory buffer used by a single application; it stores
its data on disk, and also its general-purpose architecture surely adds some
overhead. I think going with a System.Collections.Generic.Queue<> and some
lock objects could be a lot better.


Massimo
 
Massimo
About MSMQ: it's actually fairly easy to use, but I don't think it's the
best choice for an in-memory buffer used by a single application; it
stores its data on disk, and also its general-purpose architecture
surely adds some overhead. I think going with a
System.Collections.Generic.Queue<> and some lock objects could be a lot
better.

You are right.

MSMQ is more flexible about scaling out on another machine,
transactional handling etc. If you don't use these features, your choice
will be better.

Thomas
 
Back
Top