Multiple concurrent isolated transactions on ADO Dataset

  • Thread starter Thread starter Mark Roberts
  • Start date Start date
M

Mark Roberts

Hi,

I have been thinking about a problem for some time, and i'd like to throw it
open to the newsgroups in case someone has come across a similar situation.

I have a C# .Net 2.0 web application with a very large server-side dataset
(i.e. can only be 1 in memory).

I have multiple users that wish to make changes to the dataset, and each
user's should be batched up and only visible to themselves until they're
committed, at which point they become part of the main dataset.

I am thinking of have one dataset per user as well as the main dataset, and
recording changes into the user-based dataset.

This is ok, but it adds a lot of complexity when querying the data, since
queries are essentially across two datasets.

Ideally, i'd like to have multiple transaction layers on the main dataset,
one for each user that isolates their changes, but permits dataset searching
to work in the context of their transaction until it's committed or rolled
back.


Many thanks!
Mark.
 
What database do you use? You describe scenario that is easily handled by
practically any database. You said that amount of data is huge - creating
per-user copies will kill your server - think about memory allocation and
copying. Is it not possible to operate directly on database?

Peter
 
Hi

No, it's not really possible to operate directly on the database
unfortunately - the changes that the user can make will be over a long time
(i.e. could be 1-2 hours, maybe more)

I don't really want to attempt to keep a database transaction open for that
long :)

Regards,

Mark.
 
Hi Mark,

So, you want an in-memory database... with isolated transactions etc...
I wouldn't go for this solution unless you find something you can buy and
use - it's technically quite complex.
I would be aware wether your solution scales out and what happens if server
processes get recycled etc.

Do you really need that large dataset in memory? Can't you fetch the data
from the database on a per-request basis and let the database worry about
the ACI(D) ?
The database caches data in memory as well and scales... if you want, you
can get it on the same machine as your web server (well, if you want).

My experience with this big in-memory datasets is that people consider some
kind of "performance" somehow a bit to much. But often we could find
alternative solutions fetching data directly from the database and keeping
some kind of data cached (which is mostly readonly and therefore
transactionally not critically).
It's dangerous to consider all the data of an application as "the same".
Some is transactional, others is more or less just configuration information
or static data (like some metainformation etc.) that can easily be cached.

Transactional caches is what some ORMs use and for my understanding, it's
really complex (event though the theory isn't that crazy, in most of the
cases it just doesn't make sense to implement that unless it's your business
to build this kind of things)

The idea of having some scaling, transactional in-memory database that can
be distributed over nodes is intersting - if someone knows about or has
experience in the .net area with that, please post a message.

just my thoughts about your post

cheers

Chris
 
You don't need to keep the connection open

Why don't you use optimistic or pesimistic offline locking? (the optmisitc
one is more or less included in the dataadapter)
 
Well, i have a situation where it's fairly critical for changes made to the
data to be only visible to the person making the changes (until commit),
whilst the unchanged version should be visible to all other users... until
they make changes.

When the changes are committed, it's a first come, first served. I.e. it's
possible for person B to have their changes rejected as a result of person
A's changes.

I'm using a command pattern to implement the updates, inserts and deletes,
and stacking them up for the final commit...

Scenario:
Imagine a plane with 200 seats, and person A is allocated to seat 171, but
wants to be allocated to seat 89... the changes necessary re-allocate him
would be to deallocate from 171 and allocate to 89. Say the user was
interupted on the phone between deallocating from 171 and re-allocating to
89, and at that same time, another user allocated someone to 89 and another
to 171... Person A would now be without a seat on the plane.

Sorry for silly example :)

Mark.
 
I still don't see the problem

for you're scenario, you take out the data you need from the database change
it and write it back. You just don't need to write back temporary results
while you're in a business transaction.

Person A want's to change the seat:
- he checks the current status of seats and sees 89 is not taken yet, so
he changes his local data: he removes him self from 171 and places him on
89 - just in the small dataset that contains the seat reservation relevant
to him
- when he comits, after his coffe break and phone calls, optimistic
concurrency is applied
- he won't be rejected on removing him from seat 171 because the
customer only talked to one travel agent
- he will maybe fail to change to seat 89 (be cause first come first
serve or optmistic concurrency detected modifications on that "row" - or how
ever this is done...) then the whole sysem transaction is rolled back which
consists of removing him from 171 and placing him on 89 (but it runs only
for milliseconds)

if it's web app there might not even be a dataset involved
- the guy just says he want to change the booking, sees a list of
available seats, clicks on one and the system will change the booking - if
the list he saw was 2 hours old chances are high that the system detects
concurrency vialoations (optimistic once in that case) and again the system
transaction is rolled back...

maybe you should try to get your datasets smaller and fill them on demand
with data. I mean, it doesn't make sense to fill the dataset with all flight
reservation informations for all the flights of the next 12 months when a
customer is only interested in one particular flight. Then you're master
dataset is suddenly the database and the datasets you have with the chances
contain more than the changes but not all the data but only the one relevant
to the user or the use cases that user is working in.

Since it's likely that this client is human it's natural that he will have a
cetain limitation on data he can work on in one particular use case or maybe
a series of use cases. So, I still think you can get rid of the big caching
dataset.

cheers

chris
 
Thanks for the reply Chris :)

Well, your suggestion is along the lines of what i'm trying to do:

I have command objects that broadly have "Do", "Undo" and "GetData" methods
on them.

When a command is called (e.g. Deallocate), the GetData is called, and the
command object populates a user-specific dataset with the necessary data
from the central shared dataset, then the "Do" does the work on it.

Then, the allocate command would do similar. The commands are stacked up
(there could be many of them), so that when the user "Commits", they can be
run again, but this time, the data is retrieved directly from the database,
and the commands are run consequitively; all wrapped withing a database
transaction which succeeds or fails.

This is quite a common pattern, i believe, but it's not perfect: whenever a
query is performed to display data to the user, it must take into account
the changes he's made to his personal dataset... so if there's one dataset
with all the data, and one with his changes, the query has to run over them
both. A simple "GetAllocations" query then becomes a lot more complex.

Another suggestion could be to build up a stack of SQL statements (inserts,
deletes, updates etc.) that represent the commands, and then begin a
transaction and apply them all in order each time a query needs to be made,
to get the correct state of the database for the user, then rollback the
transaction. (Obviously, when the user presses "Commit", the transaction is
committed). There is some appeal to this, but it still doesn't seme right to
have to re-create transactions each time a view of pre-commit-user-specific
changes is required.


Regards,
Mark.
 
Ok, I can imagine such scenario:
User loads data with seats in plane. He wants to change several seats, let's
say
seat 89->141
141->56
78->79
and so on.
meanwhile, other user also changes seats on this plane: 66->141, 34->56,
56->43 and so on.
Do you think it is reasonable to keep those users unaware of changes they do
to each other?
In my opinion each change of seats should be wrapped in one short
transaction and view should be updated after it. If not, there is huge
possibility that complex transaction that you would want to commit at the
end of work of each user will fail - and their time would be wasted.

Peter
 
I can see your point, but in my case, the time taken between a deallocation
and an allocation is too great to warrant one short transaction, also, there
are a great many of them (i.e. many deallocations, then potentially many
re-allocations) - it's almost being used to model a situation. The user can
then make use of the undo facilicity to remove any number of previous
allocations/deallocations/other commands, as part of this modelling, prior
to commit.


Regards,
Mark.
 
Back
Top