File compare classes ??


Uwe Kuhne

I have the following situation to solve.
A customer has an old piece of software with a proprietary DB system.
That means I can't use SQL to query the database, but I need the data in
another piece of software.
The only way to access the data is by exporting a whole table from the old
system and importing it into the SQL Server of the new system.
Because this has to be done on a regular basis (several times per day),
performance is very important.
My idea was therefore to always store two versions of the exported file, the
previous one and the current one, run a file compare, and import just the
differences.
An additional problem might be the size of the files, because the old system
contains the customer data, and an export of the customer data can be more
than 10 MB.

My question is whether there are any classes in the .NET Framework that I can
use for this fast file compare, or whether someone knows of a piece of code I
could download to do the comparison.

Different, maybe better, suggestions for solving this problem would also be
appreciated.

Many thanks for your help

Uwe
 
You are better served importing into a temporary table and finding records
with differences in SQL Server. You can certainly compare the files, but it
is difficult to write an engine that makes that comparison meaningful.
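A minimal sketch of that staging-table approach, assuming the export is a
semicolon-delimited text file and a destination table Customers keyed on
CustomerId (all names and paths below are placeholders, not part of any
existing schema):

// Bulk-load the latest export into a staging table, then use set-based
// SQL to insert new rows and remove rows that disappeared from the export.
using System;
using System.Data.SqlClient;

class StagingCompare
{
    static void Main()
    {
        using (SqlConnection conn = new SqlConnection(
            "Server=.;Database=NewSystem;Integrated Security=true"))
        {
            conn.Open();

            // Empty the staging table and load the current export into it.
            string load = @"
                TRUNCATE TABLE CustomerStaging;
                BULK INSERT CustomerStaging
                FROM 'C:\exports\customers.txt'
                WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\n');";

            // Insert rows whose key is not yet in the live table, and
            // delete live rows that no longer appear in the export.
            string diff = @"
                INSERT INTO Customers
                SELECT s.* FROM CustomerStaging s
                WHERE NOT EXISTS (SELECT 1 FROM Customers c
                                  WHERE c.CustomerId = s.CustomerId);

                DELETE c FROM Customers c
                WHERE NOT EXISTS (SELECT 1 FROM CustomerStaging s
                                  WHERE s.CustomerId = c.CustomerId);";

            new SqlCommand(load, conn).ExecuteNonQuery();
            new SqlCommand(diff, conn).ExecuteNonQuery();
        }
    }
}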

--
Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA

**********************************************************************
Think Outside the Box!
**********************************************************************
 
Thanks for the reply. Nice idea, I would never have come up with it on my
own. I'll try it and see if it works.
Uwe
 
Thanks for answering. I'll also try this idea to see if it's easier to
compare inside SQL Server.
 
Hello Uwe,

While comparing the data in SQL may be easier, as Mr. Beamer suggests, I
doubt that it would be faster.
SQL Server's overhead is high for things like this, given that the vast
majority of the records in your table will be read into SQL Server and then
deleted.
Depending on the complexity of the unique key, this is a slow process in
SQL.

The good news is: 10 MB files are not large (at least, not by SQL Server
standards).

In my opinion, you should probably follow Mr. Beamer's advice for a proof of
concept stage, to get the rest of your application working. However, for a
full production version, you should write a simple difference check app that
scans the files, as you suggested, and returns differences.

The algorithm for such an app is simple and has been coded thousands of
times over the years; this is a VERY common thing to do. A C# sketch follows
the steps below.

Prerequisite: produce your output query from the old system sorted by the
unique key values. This is the file NEWDATA.

a) Open the file OLDDATA, which contains the previous output query.
b) Open the file NEWDATA, which contains the most recent query.
c) Move through NEWDATA one record at a time, forward only. For each record,
compare it with the current record in OLDDATA.
  c1) If the record exists in OLDDATA, it still exists and is not new.
  c2) If the record in OLDDATA does not match, compare the keys. The new
  record will sort either Before or After the old record.
    c2a) If the new record sorts Before the old record, it is new. Add it
    to a NEWRECORDS file.
    c2b) If the new record sorts After the old record, the old record has
    been deleted. Add the old record to a DELETEDRECORDS file and read in
    the next old record. Restart the loop without reading a new record.
d) At the end of the loop, you have two files: NEWRECORDS and DELETEDRECORDS.
It is fairly simple to insert the new records and delete the deleted ones
using ADO calls (see the sketch after the note below). From the sound of your
volume, I doubt that you will need to "bulk load" the data.
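Here is a minimal C# sketch of that merge loop. It assumes one record per
line, sorted ascending, with the unique key as the text before the first
';'; the file names and the delimiter are assumptions for illustration:

// Single forward pass over two sorted exports, writing inserted records
// to NEWRECORDS.txt and vanished records to DELETEDRECORDS.txt.
using System;
using System.IO;

class FileDiff
{
    // Assumption: the unique key is everything before the first ';'.
    static string Key(string record)
    {
        return record.Substring(0, record.IndexOf(';'));
    }

    static void Main()
    {
        using (StreamReader oldData = new StreamReader("OLDDATA.txt"))
        using (StreamReader newData = new StreamReader("NEWDATA.txt"))
        using (StreamWriter newRecs = new StreamWriter("NEWRECORDS.txt"))
        using (StreamWriter delRecs = new StreamWriter("DELETEDRECORDS.txt"))
        {
            string oldRec = oldData.ReadLine();
            string newRec = newData.ReadLine();

            while (newRec != null || oldRec != null)
            {
                int cmp;
                if (oldRec == null)      cmp = -1;  // only new records left
                else if (newRec == null) cmp = 1;   // only old records left
                else cmp = string.CompareOrdinal(Key(newRec), Key(oldRec));

                if (cmp == 0)            // c1: record still exists
                {
                    oldRec = oldData.ReadLine();
                    newRec = newData.ReadLine();
                }
                else if (cmp < 0)        // c2a: new sorts Before old -> new record
                {
                    newRecs.WriteLine(newRec);
                    newRec = newData.ReadLine();
                }
                else                     // c2b: new sorts After old -> old deleted
                {
                    delRecs.WriteLine(oldRec);
                    oldRec = oldData.ReadLine();
                }
            }
        }
    }
}

Because both files are sorted, the comparison is a single forward pass over
each file, so even multi-megabyte exports are processed quickly.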

Note: if you are bringing across many tables from the old system to SQL
Server, you will want to do this:
1) For the very first load of data from the old system, bring across the
independent tables first (ones with no foreign keys) and then the dependent
tables.
2) For the algorithm above, do the dependent tables first and then the
independent ones. This allows a cascading delete on the source side to
result in the correct data on the destination side.
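Applying the NEWRECORDS and DELETEDRECORDS files with plain ADO.NET commands
can look like the sketch below; the table Customers, its two columns, and
the semicolon record layout are assumptions chosen for illustration:

// Insert every record from NEWRECORDS.txt and delete every key from
// DELETEDRECORDS.txt, reusing parameterized commands for each row.
using System;
using System.Data.SqlClient;
using System.IO;

class ApplyDiff
{
    static void Main()
    {
        using (SqlConnection conn = new SqlConnection(
            "Server=.;Database=NewSystem;Integrated Security=true"))
        {
            conn.Open();

            SqlCommand insert = new SqlCommand(
                "INSERT INTO Customers (CustomerId, Name) VALUES (@id, @name)",
                conn);
            insert.Parameters.Add("@id", System.Data.SqlDbType.Int);
            insert.Parameters.Add("@name", System.Data.SqlDbType.NVarChar, 100);

            SqlCommand delete = new SqlCommand(
                "DELETE FROM Customers WHERE CustomerId = @id", conn);
            delete.Parameters.Add("@id", System.Data.SqlDbType.Int);

            foreach (string line in File.ReadAllLines("NEWRECORDS.txt"))
            {
                string[] fields = line.Split(';');
                insert.Parameters["@id"].Value = int.Parse(fields[0]);
                insert.Parameters["@name"].Value = fields[1];
                insert.ExecuteNonQuery();
            }

            foreach (string line in File.ReadAllLines("DELETEDRECORDS.txt"))
            {
                delete.Parameters["@id"].Value = int.Parse(line.Split(';')[0]);
                delete.ExecuteNonQuery();
            }
        }
    }
}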

I hope this helps,
--- Nick Malik
 