Hello Frank
Because we are comparing the performance of DataTable.Merge with our own
implementation, I looked into the code of DataTable.Merge with .NET
Reflector (http://www.aisto.com/roeder/dotnet/).
In the implementation of DataTable.Merge, I see the following code that
merges the src table into the dst table:
foreach (DataRow row2 in src.Rows)
{
    DataRow targetRow = null;
    if (ndx != null)
    {
        targetRow = dst.FindMergeTarget(row2, srcKey, ndx);
    }
    dst.MergeRow(row2, targetRow, this.preserveChanges, ndx);
}
It enumerates the source table's rows and finds the matching target row
with "dst.FindMergeTarget(row2, srcKey, ndx)" based on the primary key. In
the MergeRow function, the code then updates the rows according to their
DataRowState, as you said. Here is part of MergeRow's code:
if ((targetRow.RowState == DataRowState.Unchanged) &&
    (row.RowState == DataRowState.Unchanged))
{
    oldRecord = targetRow.oldRecord;
    newRecord = preserveChanges
        ? this.recordManager.CopyRecord(this, oldRecord, -1)
        : targetRow.newRecord;
    oldRecord = this.recordManager.CopyRecord(row.Table, row.oldRecord,
        targetRow.oldRecord);
    this.SetMergeRecords(targetRow, newRecord, oldRecord,
        DataRowAction.Change);
}
else if (row.newRecord == -1)
{
    oldRecord = targetRow.oldRecord;
    .......
I think what we need to do is rewrite this part of the code in our own
library so that it performs a real comparison of the column values instead
of relying on RowState. Comparing the column values is definitely slower
than reading the RowState, so this modification will cost some performance.
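As a rough illustration only (not part of the framework), the value
comparison could look like the sketch below. The helper name
RowValuesDiffer is hypothetical, and it assumes both tables share the same
column names and types:

using System.Data;

static class MergeComparisonSketch
{
    // Returns true if any column value differs between the target row and
    // the source row. object.Equals handles DBNull and boxed value types.
    public static bool RowValuesDiffer(DataRow targetRow, DataRow sourceRow)
    {
        foreach (DataColumn column in sourceRow.Table.Columns)
        {
            if (!object.Equals(targetRow[column.ColumnName],
                               sourceRow[column.ColumnName]))
            {
                return true;
            }
        }
        return false;
    }
}

You would call RowValuesDiffer for each matched row pair and copy the
values across only when it returns true, instead of branching on RowState.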
To delete the rows, we iterate in reverse through the target DataTable
(i.e. dt1 in dt1.Merge(dt2)) and do a Rows.Find against the source
DataTable (i.e. dt2 in dt1.Merge(dt2)). If no match is found, we call
Delete() on the row, which sets its RowState to Deleted.
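Here is a minimal sketch of that delete pass. DeleteMissingRows is a
hypothetical helper; it assumes a single-column primary key is defined on
both tables so that Rows.Find can be used:

using System.Data;

static class MergeDeleteSketch
{
    // Deletes every row in the target table that has no matching row
    // (by primary key) in the source table.
    public static void DeleteMissingRows(DataTable target, DataTable source)
    {
        // Iterate in reverse so the loop index stays valid if Delete()
        // removes a row that was still in the Added state.
        for (int i = target.Rows.Count - 1; i >= 0; i--)
        {
            DataRow targetRow = target.Rows[i];

            // Skip rows already marked Deleted; their values cannot be read.
            if (targetRow.RowState == DataRowState.Deleted)
            {
                continue;
            }

            object key = targetRow[target.PrimaryKey[0]];
            if (source.Rows.Find(key) == null)
            {
                // Delete() only flags the row as Deleted; it is physically
                // removed when AcceptChanges is called.
                targetRow.Delete();
            }
        }
    }
}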
In conclusion, the above solution introduces performance costs in two
places:
1. the additional comparison of the column values for each row in the
source DataTable;
2. the additional Find (based on the primary key) for each row in the
target DataTable.
To determine which of the above has the greater impact on performance, you
may use a .NET profiler
(http://search.live.com/results.aspx?q=.NET+profiler&form=QBLH), or simply
measure the code execution time yourself
(http://msmvps.com/blogs/omar/archiv...ng-_2600_quot_3B00_using_2600_quot_3B00_.aspx).
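For the simple timing approach, a System.Diagnostics.Stopwatch around each
candidate phase is usually enough; this is only a sketch of the idea:

using System;
using System.Diagnostics;

class TimingSketch
{
    static void Main()
    {
        // Time each candidate phase (value comparison vs. Find/Delete)
        // separately to see which one dominates.
        Stopwatch watch = Stopwatch.StartNew();

        // ... run the phase being measured here ...

        watch.Stop();
        Console.WriteLine("Elapsed: {0} ms", watch.ElapsedMilliseconds);
    }
}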
In addition, I found a relevant discussion of this issue in the newsgroup:
http://www.devnewsgroups.net/group/microsoft.public.dotnet.framework.adonet/topic38348.aspx
Could you tell me more about your current method for merging the
DataTables? Is my analysis helpful to you?
Regards,
Jialiang Ge
Microsoft Online Community Support