SQL Bait-and-Switch

  • Thread starter Thread starter MDW
  • Start date Start date
M

MDW

OK, Access97.

I've got two tables, one with bad (redundant) data, one
with clean (unique) data. The table with the bad data is
(of course) the one in production, and I'm trying to clean
it up using the clean data.

However, the column I'm most interested in correcting is
the table's Primary Key and this table is involved in
referential integrity-type relationships with about 4
other tables. I don't want to delete those relationships
because that would mean 4 times as many updates to do.

Here's the types of inconcistancies I'm trying to
eliminate:

If the "good" table lists an ID as '12345', the "bad"
table might list it as '12345', and ALSO as '00012345'.
Both of those values might be included in relationships.
What I want to do is somehow update the '00012345' to
be '12345', and then delete one of the two duplicates. The
trick, of course, is to maintain my relationships so that
anywhere the db saw '00012345', the extra zeroes are
eliminated.

What do you think - am I chasing a white whale here? ;)
 
I've had to do something very like this in the past. You have to
approach it in stages, and you do, at some point, have to delete and
recreate relationships, but you can do it in a way that avoids too
much manual labour. I believe that Access 97 didn't have the automatic
field name change tracking feature that 2000 and 2002 has - if it
does, turn it off before you start.

1) For each Table which links to the primary key on the Table you are
cleaning up, add a new field of the same type of the primary Table's
primary key (presumably text, if it can tell the difference between
"12345" and "00012345").

2) Again for each linked Table, create an update query to update the
new field to contain _correctly_ _formatted_ keys - the calculated
field that you will use as the source for the new field should strip
the leading zeroes off keys with the "bad" format but transfer those
with the "good" format over unchanged.

3) Again for each linked Table, delete the relational link to the
primary Table, change the name of the old link field to (e.g.)
OldLink, and change the name of the field that you have just created
and populated to whatever the old link field was called. Recreate the
relational link to the renamed new link field. This should "just
work". If it doesn't, look for records in the primary Table where the
primary key is only in the "bad" format.

4) In the primary table, consolidate the information from the
corresponding "bad" and "good" records in the "good" records (if
necessary) and delete the bad ones, which will no longer have
associated records in the linked tables.

5) In each linked table, delete the old (renamed) link field.
 
If the "good" table lists an ID as '12345', the "bad"
table might list it as '12345', and ALSO as '00012345'.
Both of those values might be included in relationships.
What I want to do is somehow update the '00012345' to
be '12345', and then delete one of the two duplicates. The
trick, of course, is to maintain my relationships so that
anywhere the db saw '00012345', the extra zeroes are
eliminated.

EEeeuwwww...

My concern would be in the related tables. Might there be duplicate
data, some with 12345 and some with 00012345? Might there be
CONFLICTING data? Blindly updating or deleting related records would
be unwise in the extreme...

I don't see any *easy* way to do this. If you decide that the 12345
record is in fact valid, then you could run Update queries to update
all 00012345 records in the related tables to 12345 (since the
relational integrity is ok), *IF* the data in the related tables is in
fact ok and not duplicated. But you would not be able to use Cascade
Updates to do so, since you cannot change the main table's 00012345 to
12345 without getting a key violation.

Good luck.
 
Back
Top