Beginner help on a service to synchronise two databases

Colin Peters · Jun 9, 2009

My requirement is to have a windows service that monitors 2 data stores.
When there’s a change in one it should be written to the other. One
data store is a local DB, and the other is remote and is accessed
through a http-based interface. The data stores aren’t mirrors of each
other; they just contain some common data that needs to be synchronised.
Each data store already exposes a method for determining if there is
anything to be synchronised.

My initial thought is that the service launches a thread that will poll
each data store at regular intervals. (This would be done in a thread
because the “queries” can take many seconds, and I want the service to
still respond to system message during this time). If it discovers
updates to be made, it will then write them to the other data store. The
thing is, the remote data store can be quite slow, so I’ll need to have
different polling frequencies for each end. Also, I want to ensure that
the last transaction with a particular store is finished before I launch
the next one. For example, I read from localStore, and determine that an
update to remoteStore needs to be made. During the time that the
remoteStore is being updated I don’t want to perform a query on
remoteStore; there's no point, because it's busy doing its' update.

Clearly some kind of workflow/queuing mechanism is required that can
respect my wish to have at most 1 connection to each data store. And
maybe some well-established framework for keeping track of what changes
are pending/timed out etc. Does C#/.Net provide any interesting
namespaces and classes to prevent me having to hand-roll my own
implementation?

Just a few links or topic names would be great.

Thank you

Gregory A. Beamer · Jun 15, 2009

My requirement is to have a windows service that monitors 2 data
stores. When there’s a change in one it should be written to the
other. One data store is a local DB, and the other is remote and is
accessed through a http-based interface. The data stores aren’t
mirrors of each other; they just contain some common data that needs
to be synchronised. Each data store already exposes a method for
determining if there is anything to be synchronised.

My initial thought is that the service launches a thread that will
poll each data store at regular intervals. (This would be done in a
thread because the “queries” can take many seconds, and I want the
service to still respond to system message during this time). If it
discovers updates to be made, it will then write them to the other
data store. The thing is, the remote data store can be quite slow, so
I’ll need to have different polling frequencies for each end. Also, I
want to ensure that the last transaction with a particular store is
finished before I launch the next one. For example, I read from
localStore, and determine that an update to remoteStore needs to be
made. During the time that the remoteStore is being updated I don’t
want to perform a query on remoteStore; there's no point, because it's
busy doing its' update.

Clearly some kind of workflow/queuing mechanism is required that can
respect my wish to have at most 1 connection to each data store. And
maybe some well-established framework for keeping track of what
changes are pending/timed out etc. Does C#/.Net provide any
interesting namespaces and classes to prevent me having to hand-roll
my own implementation?

Just a few links or topic names would be great.

There are a couple of things you can try:

1. If there is anyway to "hard wire" the network connection (VPN?), SQL
Server replication is your best option, as it can work both ways, if
necessary.

2. Another option, which has a learning curve, is to set up a queue for
insert. Service Broker (SQL 2005 or newer) is a great way to set this
up. You would queue on both sides with a service that inserts and sends
to a queue on the other side. If two ways, you end up with two queues on
each side, or some logic in the service to ensure a second item has not
been inserted (check insert for record before inserting).

3. Set up a trigger to insert to the other machine on insert in one
table. Same for updates and deletes.

If both sides, however, you have a potential issue with numbering of
items (options 2 & 3). If using row ids (unique identifier or GUID),
there is no real issue, as the network card is included in the
algorithm. If Identity Insert, you will have to set aside different
identities for each machine to make sure two inserts at the same time do
not clash.

Setting up the HTTP is not difficult either, as you can set up an HTTP
endpoint in SQL Server.

Without knowing more about the constraints between the databases, I
cannot tell you which of the options will actually work for you.
Replication is the easiest, if you can set the plumbing up correctly.

--
Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
| Think outside the box! |
*******************************************

Colin Peters · Jun 16, 2009

Gregory said:
There are a couple of things you can try:

1. If there is anyway to "hard wire" the network connection (VPN?), SQL
Server replication is your best option, as it can work both ways, if
necessary.

Thanks for the reply. The thing is, I don't know what format the second
data store is; it's just accessed through SOAP style calls, so
replication isn't going to work. This isn't SQLServer to SQLServer, or
even SQLServer to XML or some other known format.

In the meantime I've started to investigate the Producer Consumer
pattern. C#/.Net has some great support for problems involving threading.

Thanks for your reply.

Beginner help on a service to synchronise two databases

Colin Peters

Gregory A. Beamer

Colin Peters