Help, Updating Access with ado.net performance problem

  • Thread starter: Jason

Jason

Hi,

I have an access database with around 500k records.
I have C++.Net code that does the normal ado.net thing:
1) open connection
2) attach adapter with sql select statement
3) fill dataset
4) loop through all rows in dataset, modify each row as desired
5) fill in the update sql statement in adapter: "Update table_name
set col = ? Where ..."
6) call adapter.update

It took more than 2 hours to finish, yes, more than 2 hours.
My program spent almost all of the time in the adapter.Update()
function call.
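
For reference, this is roughly what that flow looks like in C++/CLI syntax against the Jet OLE DB provider (a sketch only; the connection string, table name, and column names below are placeholders, not the ones from my actual database):

using namespace System;
using namespace System::Data;
using namespace System::Data::OleDb;

int main()
{
    // 1) open connection (placeholder path)
    OleDbConnection^ conn = gcnew OleDbConnection(
        "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\\data\\mydb.mdb");
    conn->Open();

    // 2) attach adapter with the SELECT statement
    OleDbDataAdapter^ adapter = gcnew OleDbDataAdapter(
        "SELECT id, col FROM table_name", conn);

    // 3) fill dataset
    DataSet^ ds = gcnew DataSet();
    adapter->Fill(ds, "table_name");

    // 4) loop through all rows, modify each one as desired
    for each (DataRow^ row in ds->Tables["table_name"]->Rows)
        row["col"] = "new value";

    // 5) supply the parameterised UPDATE the adapter should use
    OleDbCommand^ upd = gcnew OleDbCommand(
        "UPDATE table_name SET col = ? WHERE id = ?", conn);
    upd->Parameters->Add("col", OleDbType::VarWChar, 255, "col");
    upd->Parameters->Add("id", OleDbType::Integer, 4, "id");
    adapter->UpdateCommand = upd;

    // 6) push the changes back -- one UPDATE statement per modified row
    adapter->Update(ds, "table_name");

    conn->Close();
    return 0;
}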


My questions are:
1) Is there a workaround for this problem?
2) If this is a known problem, is Microsoft working on a solution? Or
is Microsoft leaving it as it is?
3) Does the same performance problem exist between ADO.NET and all
other databases as well?

Thanks in advance for your help.
 
Update sends the commands sequentially, so if you actually changed 500k
records, performance could be slow, although 2 hours for the updates alone
does, at first glance, sound like a long time...

Is there a workaround? Well, it depends on what's causing the problem in
the first place. Is this over a network? If so, what kind of muscle does
the host machine have on it? Have you checked the performance of the
machine hosting the DB? Table structure can definitely affect the speed,
and indexes, while they speed up select queries, slow down insert and
delete statements.
It's hard to know for sure without looking at the indexes.

Is this a known problem? Well, ado.net functions in a disconnected
environment as you are aware, so with tremendous flexibility comes a
performance hit. Is this the sole cause of the problem? It's hard to tell
without knowing about the above factors.

Does this happen with other DBs? I don't know the record size, but that
seems inordinately long. I just created a test app and a dummy table on a
SQL Server 2000 DB and an Oracle 9i machine. I'm only updating one field in
each respective table. Inserting the rows in the first place took just over
12 minutes on the SQL Server machine and a little over 11 minutes on the
Oracle box. Neither is high-performance server hardware. Changing
each record took under 30 seconds on both machines (the results were almost
identical because the change happens on the local DataTable). Sending back
the updates took about 5 minutes on both machines. I did have an index on
both tables when I fired the updates but not the inserts, and I only had
one field per record, but no, I think 2 hours is a bit long unless you have
multiple indices and big tables.

Updating 500k records is never going to be that fast, but 2 hours seems a
bit long. On the other hand, doing anything with Access and 500k records
over a network is probably going to be a headache.

HTH,

Bill
 
William,
thanks for your reply.

> Is this over a network? If so, what kind of muscle does
> the host machine have on it? Have you checked the performance of the
> machine hosting the DB?

It's on a local machine. My machine is over 700 MHz with over 500 MB of RAM.
That's what drives me nuts: how can it still be so slow? I can do the
same kind of updates with ODBC in VC++ 6 within maybe 10 minutes, so this
must be a problem with ADO.NET?
> Table structure can definitely affect the speed, and indexes, while they
> speed up select queries, slow down insert and delete statements.
> It's hard to know for sure without looking at the indexes.

I am not an expert on databases. What I have is one table in the Access
database, and in the table, one key column and other text columns. The
whole .mdb is only 12 megabytes.
> Updating 500k records is never going to be that fast, but 2 hours seems a
> bit long. On the other hand, doing anything with Access and 500k records
> over a network is probably going to be a headache.

Can you or someone reading this post try updating an Access database with
over 500k rows with ADO.NET on a local machine? I am just tired of
trying different things with the same result.

Thanks.
 
This is unfortunately not so much a known problem as a known feature: the
Adapter will create for you a unique update statement for each modified row
in the dataset. So if you modify 500k records, it will generate 500k update
statements and send each one _individually_ over to the database. We are
definitely aware of this current limitation and are working on improving this
model.

For now the answer you are looking for is to not use autogen for a task it
was not designed for. Autogen is a great feature to jumpstart a sample, but
for a production environment you should not be relying on it. The solution
is for you to create your own update commands; there are many good
descriptions of this in the current batch of ADO.NET books. You may want to
look at Bill Vaughn's ADO.NET book for a good description of how to do this
and a harsh critique of autogen.
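
To be concrete about terminology, by "autogen" I mean letting a command builder derive the commands from the SELECT at run time, roughly like this (a sketch, reusing the adapter and dataset names from the post above):

// Autogenerated commands: the builder inspects the SELECT (plus schema info
// it queries from the database) and builds the UPDATE/INSERT/DELETE commands
// when Update() runs.
OleDbCommandBuilder^ builder = gcnew OleDbCommandBuilder(adapter);
adapter->Update(ds, "table_name");

// The alternative is to assign your own adapter->UpdateCommand explicitly,
// which avoids the builder's extra schema round trip and gives you control
// over the WHERE clause it generates.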

Even creating your own update command and batching your update statements
manually is probably not going to give you the performance you are looking
for. You may be able to increase the performance significantly by optimizing
your batches and using transactions around each batch to avoid autocommit
penalties, but ultimately you are probably going to be better off with
something like Bulk Copy for updating that many rows.
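
A rough sketch of the transaction-per-batch idea (reusing conn, adapter, and ds from the post above; the 5,000-row chunk size is just an assumption to illustrate):

// Push the modified rows back in chunks, each chunk inside one explicit
// transaction, so Jet does not autocommit after every single UPDATE.
DataTable^ tbl = ds->Tables["table_name"];
array<DataRow^>^ changed = tbl->Select("", "", DataViewRowState::ModifiedCurrent);

int batchSize = 5000;   // assumed chunk size; tune for your data
for (int start = 0; start < changed->Length; start += batchSize)
{
    int count = Math::Min(batchSize, changed->Length - start);
    array<DataRow^>^ chunk = gcnew array<DataRow^>(count);
    Array::Copy(changed, start, chunk, 0, count);

    OleDbTransaction^ tx = conn->BeginTransaction();
    adapter->UpdateCommand->Transaction = tx;
    adapter->Update(chunk);   // still one UPDATE per row, but one commit per chunk
    tx->Commit();
}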
 
If this is a bulk database operation, then why not write the query in
Access SQL and run it there?
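
For example, if the per-row change can be expressed in SQL, one set-based statement lets Jet do the whole thing in a single pass (a sketch, assuming conn is an open OleDbConnection; the statement and names are placeholders):

// One statement instead of 500k individual UPDATEs.
OleDbCommand^ cmd = gcnew OleDbCommand(
    "UPDATE table_name SET col = 'new value' WHERE col IS NULL", conn);
int affected = cmd->ExecuteNonQuery();   // number of rows Jet changed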
 
> This is unfortunately not so much a known problem as a known feature: the
> Adapter will create for you a unique update statement for each modified row
> in the dataset. So if you modify 500k records, it will generate 500k update
> statements and send each one _individually_ over to the database. We are
> definitely aware of this current limitation and are working on improving
> this model.

Thank you, good information.
> For now the answer you are looking for is to not use autogen for a task it
> was not designed for.

No, I didn't use autogen (I am assuming autogen is the one function call
that's supposed to automatically generate update statements for you). I
called "adr->set_updatecmd(update_command)" before calling Update().
I've tried autogen; it has the same speed problem.
> You may want to look at Bill Vaughn's ADO.NET book for a good description
> of how to do this and a harsh critique of autogen.

Thanks, I'll probably check it out, or some other ADO.NET books.
We are currently considering using ADO.NET as an interface to SQL 2000;
I'll have to test whether it has the same problem as with Access.

> You may be able to increase the performance significantly by optimizing
> your batches and using transactions around each batch to avoid autocommit
> penalties, but ultimately you are probably going to be better off with
> something like Bulk Copy for updating that many rows.

I don't know how to do this in ADO.NET yet; maybe I will after reading some
books.

Thank you.
 
> If this is a bulk database operation, then why not write the query in
> Access SQL and run it there?

I need to do some business logic and then update the database. Yes, I
could probably do everything inside Access with a query or VBA, but I
am more familiar with the language and tools I have with C++.

I guess the question is: should I avoid using ADO.NET to update Access
because it's too slow?

Thx.
 