Using a message queue as a buffer

  • Thread starter: Massimo

Massimo

I'm developing a server application which will, basically, receive data from
some TCP/IP clients and store it in a database; it's being developed
using .NET 2.0/C#.
The size of each data packet will be small (a few bytes), but the
application needs to scale well and handle hundreds of clients,
each sending several readings per second.

The application will have two main sets of worker threads with a buffer
between them (a classic producer-consumer model): some threads will acquire
data from the clients and store it in the buffer, while others will
take data from the buffer and actually perform the database queries. This
structure reduces the workload on the socket-reading side and lets socket
reads complete quickly, instead of waiting for a DB query to run
after each read; it will also help in the (unlucky) case that the DB server
can't be reached for a while.
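For comparison, the in-memory buffer described above can be sketched with nothing beyond .NET 2.0 APIs (Queue<T> plus Monitor). The class name BlockingBuffer and the fixed capacity are assumptions made for illustration. Unlike a recoverable MSMQ queue, anything still in this buffer is lost if the process stops.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical in-memory producer-consumer buffer (.NET 2.0 APIs only).
// Socket-reading threads call Put; DB-writing threads call Take.
public class BlockingBuffer<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly object sync = new object();
    private readonly int capacity;

    public BlockingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    // Blocks while the buffer is full (e.g. the DB is unreachable),
    // so memory use stays bounded instead of growing forever.
    public void Put(T item)
    {
        lock (sync)
        {
            while (items.Count >= capacity)
                Monitor.Wait(sync);
            items.Enqueue(item);
            // Producers and consumers wait on the same monitor, so wake all of them
            Monitor.PulseAll(sync);
        }
    }

    // Blocks until an item is available.
    public T Take()
    {
        lock (sync)
        {
            while (items.Count == 0)
                Monitor.Wait(sync);
            T item = items.Dequeue();
            Monitor.PulseAll(sync); // a slot is free: wake waiting producers
            return item;
        }
    }

    public int Count
    {
        get { lock (sync) { return items.Count; } }
    }
}
```

PulseAll is used instead of Pulse because both kinds of waiters share one monitor; a single Pulse could wake another producer when a consumer was the one that needed waking.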

Now, the question. Someone suggested that I use a message queue as my buffer,
because it's easy to program and handles thread synchronization
automatically. There also seems to be an added bonus: if my server stops
or crashes for some reason before storing all the buffered data
in the DB, the queue will still be available on restart, with all
queued messages still there.
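Check one detail against the MSMQ documentation before relying on that bonus. On a non-transactional queue, messages are sent as "express" messages by default and are kept only in memory. They survive a restart only if they are marked recoverable. Messages sent within a transaction are always recoverable. The sketch below assumes a private queue at a hypothetical path and a hypothetical Reading class. System.Messaging is .NET Framework only.

```csharp
using System;
using System.Messaging; // .NET Framework only: reference System.Messaging.dll

// Hypothetical data object; BinaryMessageFormatter requires [Serializable]
[Serializable]
public class Reading
{
    public int ClientId;
    public DateTime Timestamp;
    public double Value;
}

public static class RecoverableQueue
{
    // Opens (creating if needed) a non-transactional private queue whose
    // messages are written to disk, so they survive a restart or crash.
    public static MessageQueue Open(string path) // e.g. @".\Private$\readings" (hypothetical)
    {
        if (!MessageQueue.Exists(path))
            MessageQueue.Create(path); // non-transactional
        MessageQueue queue = new MessageQueue(path);
        queue.Formatter = new BinaryMessageFormatter();
        // Default is false: express messages live in memory only
        queue.DefaultPropertiesToSend.Recoverable = true;
        return queue;
    }
}
```

A producer thread would then call queue.Send(reading), and a consumer would call (Reading)queue.Receive().Body.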

This seems to be a really good suggestion, but I have a couple of concerns
about this.

First one: performance. Since MSMQ is a general-purpose system, it will
surely have some overhead compared with an in-memory buffer; it also
stores its data on disk, which will introduce even more overhead. How does
MSMQ behave when used for this purpose? If its performance penalty is
low compared to the time needed for a DB query, I don't care
much about it; but if it's significantly high, this could be an issue. Also,
how does it behave in terms of disk storage? Does it create log files that
need to be purged periodically (as SQL Server does)?

Second one: data format. My program will acquire many different kinds of
data from clients: integers, floating point numbers, text, even images;
sometimes, a single block of data will be composed of combinations of these
(e.g. two floating point numbers, or three integers and some text, and so
on). These blocks of data will be managed as single objects by the
application, and will need to be passed from socket-reading threads to
DB-writing threads. This is quite easy to do with an in-memory buffer, since
objects (and pointers!) are just the same across different threads in the
same program; but what will happen when sending one of these objects to a
queue and receiving it? Will it still be valid, even if internally it
includes, say, a string or an image (which are pointers to other objects)?
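In practice, the message formatter serializes the whole object graph (strings, arrays, image bytes) into the message body. The receiver then deserializes a brand-new, independent copy, so no pointers cross the queue. MSMQ's XmlMessageFormatter is XML-based, and BinaryMessageFormatter uses binary serialization. The sketch below demonstrates the same copy semantics with XmlSerializer from System.Xml.Serialization, so it runs without MSMQ. The DataBlock class is hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical data block mixing the kinds of fields described above
public class DataBlock
{
    public DateTime Timestamp;
    public int[] Counters;
    public string Text;
    public byte[] Image; // raw image bytes; XmlSerializer writes them as base64

    public DataBlock() { } // XmlSerializer needs a public parameterless constructor
}

public static class CopySemanticsDemo
{
    // Serialize to bytes, as a formatter does when writing a message body
    public static byte[] ToBytes(DataBlock block)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(DataBlock));
        using (MemoryStream stream = new MemoryStream())
        {
            serializer.Serialize(stream, block);
            return stream.ToArray();
        }
    }

    // Rebuild a new object from the bytes, as the receiving side does
    public static DataBlock FromBytes(byte[] body)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(DataBlock));
        using (MemoryStream stream = new MemoryStream(body))
        {
            return (DataBlock)serializer.Deserialize(stream);
        }
    }
}
```

The round-tripped object has equal contents but different references, which is exactly what a consumer thread or process gets from the queue.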

Thanks for any help/suggestion :-)


Massimo
 
Here are some links.

http://www.eggheadcafe.com/articles/20041204.asp

http://www.codeproject.com/csharp/survivingpoisonmessages.asp

The queue would be good for what you're doing.

Performance is good. You can test it yourself, but compared to DB connectivity,
the queue's overhead is a drop in the bucket.

You need to research transactional queues. That will answer some of your
pros-and-cons questions.

On your second question:
if the object is serializable, then you'll be fine. Look at Peter's
article. You'll basically drop any assembly DLLs you need into the /bin
directory of the Windows service used to monitor the queue.


This can also help:
http://msmvps.com/blogs/manoj/archive/2005/10/16/70979.aspx


This book
http://www.apress.com/book/supplementDownload.html?bID=279&sID=2257

has "framework" code for creating queue logic. It's for .NET 1.1; I don't know if
there is a 2.0 version of the book/code yet.
 
> Here are some links.

Thanks for the link, many good points.
> The queue would be good for what you're doing.

I was under the same assumption :-)
> Performance is good. You can test it yourself, but compared to DB connectivity,
> the queue's overhead is a drop in the bucket.

That's precisely the answer I was looking for.

What about disk storage? Do you know anything about its architecture and how
it behaves when processing tons of small messages? SQL Server (and any other
transaction-based storage system) would build *huge* log files here.

> You need to research transactional queues. That will answer some of your
> pros-and-cons questions.

Yes, I was also wondering what to do if a DB query fails after the message
is removed from the queue. However, since I don't particularly care about
message ordering (each data block has its own timestamp), I think I can just
send the message back to the queue.
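With a transactional queue there is an alternative to re-sending. The consumer receives the message inside a MessageQueueTransaction, does the DB insert, and commits only if the insert succeeded. Aborting the transaction leaves the message on the queue. This is a minimal sketch that assumes a hypothetical queue path and a hypothetical InsertIntoDatabase delegate standing in for the real ADO.NET code. If an insert fails every time (a "poison" message, the subject of the survivingpoisonmessages article linked earlier), the message keeps coming back, so a real consumer needs a retry limit. The DB write is also not enlisted in this MSMQ-internal transaction, so a crash between the DB commit and tx.Commit() can redeliver an already-stored block. The per-block timestamp can help detect that duplicate. MessageQueueTransactionType.Automatic with a distributed transaction is the heavier alternative.

```csharp
using System;
using System.Messaging; // .NET Framework only

[Serializable]
public class Reading // hypothetical data object
{
    public int ClientId;
    public DateTime Timestamp;
    public double Value;
}

public class TransactionalConsumer
{
    private readonly MessageQueue queue;

    // Hypothetical stand-in for the real DB insert
    public Action<Reading> InsertIntoDatabase = delegate(Reading r) { };

    public TransactionalConsumer(string path) // e.g. @".\Private$\readings_tx" (hypothetical)
    {
        if (!MessageQueue.Exists(path))
            MessageQueue.Create(path, true); // true = transactional queue
        queue = new MessageQueue(path);
        queue.Formatter = new BinaryMessageFormatter();
    }

    public void Send(Reading reading)
    {
        queue.Send(reading, MessageQueueTransactionType.Single); // one-message transaction
    }

    // Processes one message; returns false if the queue stayed empty for the timeout.
    public bool ProcessOne(TimeSpan timeout)
    {
        MessageQueueTransaction tx = new MessageQueueTransaction();
        tx.Begin();
        try
        {
            Message message = queue.Receive(timeout, tx);
            InsertIntoDatabase((Reading)message.Body);
            tx.Commit(); // only now is the message really removed from the queue
            return true;
        }
        catch (MessageQueueException ex)
        {
            tx.Abort();
            if (ex.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
                return false; // nothing to read
            throw;
        }
        catch
        {
            tx.Abort(); // DB failed: the message stays on the queue
            throw;
        }
        finally
        {
            tx.Dispose();
        }
    }
}
```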
> If the object is serializable, then you'll be fine.

It should be, since I'm using only framework-provided data types.
Will sending to the queue throw an exception if an unserializable object is
sent?
> Look at Peter's article. You'll basically drop any assembly DLLs you need
> into the /bin directory of the Windows service used to monitor the queue.

In this scenario, the producer and the consumer are all threads of a single
application; this should avoid any issue between them.
What about the queue itself? Does MSMQ need to be told (somehow) what kind
of objects it's processing?
> This book
> http://www.apress.com/book/supplementDownload.html?bID=279&sID=2257
>
> has "framework" code for creating queue logic. It's for .NET 1.1; I don't know if
> there is a 2.0 version of the book/code yet.

.NET has a whole namespace of messaging objects (System.Messaging); basic
use is quite easy. Maybe many of these details have already been addressed
by the framework developers.


Massimo
 
See inline:


Massimo said:
> Thanks for the link, many good points.


> I was under the same assumption :-)


> That's precisely the answer I was looking for.

> What about disk storage? Do you know anything about its architecture and how
> it behaves when processing tons of small messages? SQL Server (and any other
> transaction-based storage system) would build *huge* log files here.

I don't know the specifics of this one, but I don't think there are a lot of
transaction logs.

I tested my queue at 5000-10000 items. I pounded it with .Send() calls and it
handled them.
I don't envision ever having more than 1000 items in there, so the
5000-10000 test was good enough for me.
I default to transactional queues in my coding practices.
Using Peter's startup work, I created an email-sending ICommand object.
I slammed the queue with 8000 messages, 8 threads sending 1000 each. I
think that took about 90 seconds.
The queue took it, and my Windows service processed them all. I think it
took about 2 hours to send them all out.
It is the way to go to queue up work, instead of failing under overload.
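A similar measurement can be reproduced on the target machine. The sketch below has N threads each send M small messages to a local private queue and times the run. The names and path are hypothetical. Each thread opens its own MessageQueue instance because Send is not listed among the thread-safe members in the MessageQueue documentation. Running it once with recoverable set to false and once with true gives a rough idea of what the disk writes Massimo asked about actually cost.

```csharp
using System;
using System.Diagnostics;
using System.Messaging; // .NET Framework only
using System.Threading;

// Rough send-throughput test: 'threads' threads each send
// 'messagesPerThread' small messages to a local private queue.
public static class QueueStressTest
{
    public static TimeSpan Run(string path, int threads, int messagesPerThread, bool recoverable)
    {
        if (!MessageQueue.Exists(path))
            MessageQueue.Create(path);

        Thread[] workers = new Thread[threads];
        Stopwatch watch = Stopwatch.StartNew();
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(delegate()
            {
                // One MessageQueue instance per thread
                using (MessageQueue queue = new MessageQueue(path))
                {
                    queue.Formatter = new BinaryMessageFormatter();
                    queue.DefaultPropertiesToSend.Recoverable = recoverable;
                    for (int i = 0; i < messagesPerThread; i++)
                        queue.Send(new byte[16]); // a payload of "a few bytes"
                }
            });
            workers[t].Start();
        }
        foreach (Thread worker in workers)
            worker.Join();
        watch.Stop();
        return watch.Elapsed;
    }
}
```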

> Yes, I was also wondering what to do if a DB query fails after the message
> is removed from the queue. However, since I don't particularly care about
> message ordering (each data block has its own timestamp), I think I can just
> send the message back to the queue.


> It should be, since I'm using only framework-provided data types.
> Will sending to the queue throw an exception if an unserializable object is
> sent?

Do a search on BinaryFormatter and MSMQ.
There is also an XmlMessageFormatter and an ActiveXMessageFormatter (if you put
objects in from VB6).
The Windows service that tries to deserialize the object will fail, especially
if you use the ICommand type of thing that Peter uses.
> In this scenario, the producer and the consumer are all threads of a single
> application; this should avoid any issue between them.
> What about the queue itself? Does MSMQ need to be told (somehow) what kind
> of objects it's processing?

Peter uses everything as an ICommand. That's why his approach is very clean
and good OO.
The queue is just an object holder. It doesn't care.
The thing putting messages in... and the thing (the Windows service) reading
them out need to know what they're handling, i.e., what kinds of objects they're
dealing with.
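On the reading side, "knowing what it's handling" typically means telling the formatter which types it may deserialize. With XmlMessageFormatter, that means passing the target types to its constructor. The sketch below uses two hypothetical message types and a hypothetical queue path. It dispatches on the runtime type of the deserialized body.

```csharp
using System;
using System.Messaging; // .NET Framework only

// Hypothetical message types: public, with public fields and an implicit
// parameterless constructor, as XML serialization requires
public class Reading { public int ClientId; public double Value; }
public class TextEntry { public int ClientId; public string Text; }

public static class TypedQueue
{
    public static MessageQueue Open(string path) // e.g. @".\Private$\typed" (hypothetical)
    {
        if (!MessageQueue.Exists(path))
            MessageQueue.Create(path);
        MessageQueue queue = new MessageQueue(path);
        // The queue just stores bytes; this formatter lists the types the reader expects
        queue.Formatter = new XmlMessageFormatter(new Type[] { typeof(Reading), typeof(TextEntry) });
        return queue;
    }

    // Dispatch on the runtime type of the deserialized body
    public static string Describe(Message message)
    {
        object body = message.Body;
        Reading reading = body as Reading;
        if (reading != null) return "reading from client " + reading.ClientId;
        TextEntry entry = body as TextEntry;
        if (entry != null) return "text: " + entry.Text;
        return "unknown";
    }
}
```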


> .NET has a whole namespace of messaging objects (System.Messaging); basic
> use is quite easy. Maybe many of these details have already been addressed
> by the framework developers.

Yeah. My approach to this framework idea has been to
use the Factory design pattern and let it handle the transactional or
non-transactional queue decisions for me.
That's one of the places where taking the time to create a generic framework
object can help.
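The factory itself isn't shown in the thread. The following is a hypothetical sketch of the idea: callers ask for a sender by path, and the factory checks MessageQueue.Transactional to decide which Send overload to use. IQueueSender, QueueSenderFactory and the two sender classes are invented names.

```csharp
using System;
using System.Messaging; // .NET Framework only

// Hypothetical abstraction: callers never see whether the queue is transactional
public interface IQueueSender
{
    void Send(object body);
}

public static class QueueSenderFactory
{
    public static IQueueSender Create(string path, bool transactionalIfNew)
    {
        if (!MessageQueue.Exists(path))
            MessageQueue.Create(path, transactionalIfNew);
        MessageQueue queue = new MessageQueue(path);
        queue.Formatter = new BinaryMessageFormatter();
        if (queue.Transactional)
            return new TransactionalSender(queue);
        return new PlainSender(queue);
    }

    private class TransactionalSender : IQueueSender
    {
        private readonly MessageQueue queue;
        public TransactionalSender(MessageQueue queue) { this.queue = queue; }
        // Single: MSMQ wraps each Send in its own internal transaction
        public void Send(object body) { queue.Send(body, MessageQueueTransactionType.Single); }
    }

    private class PlainSender : IQueueSender
    {
        private readonly MessageQueue queue;
        public PlainSender(MessageQueue queue)
        {
            this.queue = queue;
            queue.DefaultPropertiesToSend.Recoverable = true; // persist non-transactional messages to disk
        }
        public void Send(object body) { queue.Send(body); }
    }
}
```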
 
> I don't know the specifics of this one, but I don't think there are a lot of
> transaction logs.

> I tested my queue at 5000-10000 items. I pounded it with .Send() calls and it
> handled them.
> I don't envision ever having more than 1000 items in there, so the
> 5000-10000 test was good enough for me.
> I default to transactional queues in my coding practices.
> Using Peter's startup work, I created an email-sending ICommand object.
> I slammed the queue with 8000 messages, 8 threads sending 1000 each. I
> think that took about 90 seconds.
> The queue took it, and my Windows service processed them all. I think it
> took about 2 hours to send them all out.
> It is the way to go to queue up work, instead of failing under overload.

Ok, so it seems this can scale quite well. Good.
> Do a search on BinaryFormatter and MSMQ.
> There is also an XmlMessageFormatter and an ActiveXMessageFormatter (if you put
> objects in from VB6).
> The Windows service that tries to deserialize the object will fail, especially
> if you use the ICommand type of thing that Peter uses.

I'm not using any complex architecture here, just sending a couple of
objects containing data to be inserted into the DB; these classes are just
data structures, with no code in them. And the producer and the consumer
will be threads of the same process, or maybe (I'm still thinking about it)
two processes using the same assembly DLLs that define the data objects.
The only thing I don't know is how, when and why exactly an object becomes
non-serializable...
> Peter uses everything as an ICommand. That's why his approach is very clean
> and good OO.

But it's just too complex for my needs :-)
> The queue is just an object holder. It doesn't care.
> The thing putting messages in... and the thing (the Windows service) reading
> them out need to know what they're handling, i.e., what kinds of objects they're
> dealing with.

They will.


Massimo
 
At the most basic level:

[Serializable]
public class MySerializableClass { /* your fields */ }

Just mark it with this attribute.

See here for more info:
http://www.ondotnet.com/pub/a/dotnet/2002/08/26/serialization.html

If you're using VB.NET, there is an additional gotcha.
VB.NET does not allow applying the <NonSerialized> attribute to
events (you can in C# by using the field: attribute target).
As a result, there is no simple way of telling the runtime not to serialize
the event fields.
This results in serializing objects that you didn't expect, producing a
larger stream.
If the object handling the events is not serializable, then the
serialization process will throw an exception.
(See http://www.codeproject.com/vb/net/serializevbclasses.asp for more info
and a workaround.)
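In C#, the field: attribute target mentioned above applies [NonSerialized] to the hidden delegate field behind a field-like event. This keeps subscribers (such as a non-serializable form) out of the stream. The sketch uses a hypothetical class. Its helper inspects the NotSerialized metadata flag directly instead of running a formatter, so the effect can be checked in isolation.

```csharp
using System;
using System.Reflection;

// Hypothetical data class with an event
[Serializable]
public class ObservableReading
{
    public int ClientId;
    public string Text;

    // Marks the compiler-generated delegate field behind the event as not serialized
    [field: NonSerialized]
    public event EventHandler Changed;

    // A plain field excluded the ordinary way
    [NonSerialized]
    private object cachedThumbnail;

    public object Thumbnail
    {
        get { return cachedThumbnail; }
        set { cachedThumbnail = value; }
    }

    public void Update(string text)
    {
        Text = text;
        EventHandler handler = Changed;
        if (handler != null) handler(this, EventArgs.Empty);
    }

    // True if the named instance field carries the NotSerialized metadata flag
    public static bool IsNotSerialized(string fieldName)
    {
        FieldInfo field = typeof(ObservableReading).GetField(
            fieldName, BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        return field != null && (field.Attributes & FieldAttributes.NotSerialized) != 0;
    }
}
```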


But if you simply have some ints, strings or basic objects, then just mark it
as [Serializable] and that should be it.

...

Sounds like you're off to the races...
 
> [Serializable]
> public class MySerializableClass
>
> Just mark it with this attribute.
> [...]
>
> But if you simply have some ints, strings or basic objects, then just mark it
> as [Serializable] and that should be it.

Ok, thanks.


Massimo
 