Trying to remove very similar (not quite duplicate) records

h.a. collins · Apr 11, 2004

Hi Everyone:

I am working on an Access 2003 database with one table(!) holding
close to 70K records. I am trying to help a professor who uses this
"database" (really a glorified spreadsheet)to keep track of Nonprofit
Orgs throughout the state. While I understand the best course of
action would be to build a new database for her, time and
circumstances do not permit; I have been asked to help with one
particular issue they are having with similar, but not quite
duplicate, records.

I have built a select query that will find all records with duplicate
ID numbers (there are two possible ID nos. assigned by the state/fed
gov.):

SELECT INP.EIN, Count(INP.EIN) AS Expr1, INP.SOS_ID, Count(INP.SOS_ID)
AS CountOfSOS_ID
FROM INP
GROUP BY INP.EIN, INP.SOS_ID
HAVING (((Count(INP.EIN))>1)) OR (((Count(INP.SOS_ID))>1))
ORDER BY INP.EIN, INP.SOS_ID;

Unfortunately, these are not pure duplicate records: in most
instances, the Organization name is a little bit different, or the
Contact name is different. Thus, I cannot just delete one of the dupes
and keep the other. I was hoping to create a form that would allow the
client to see the records and actually choose the one she wants to
remove (using a checkbox). I then thought we could run an append query
to remove all those records in the system that have a check in the
"delete record" box. So I created another query, based on the above
one, so we could see the names, IDS, etc. for each record:

SELECT INP.EIN, INP.SOS_ID, INP.ORGNAME, INP.ADDRESS, INP.ADDRESS2,
INP.CITY, INP.ZIP, INP.ZIP2, INP.CONTACT, INP.[OLD CONTACT],
INP.STATUS, INP.[MARK FOR DELETION]
FROM INP RIGHT JOIN [01 - INP EIN Duplicates] ON INP.EIN = [01 - INP
EIN Duplicates].EIN
ORDER BY INP.EIN, INP.ORGNAME;

However, I have now learned that I cannot update the resulting records
from this query, because it is based on the first query, which uses
the group by/count functions.
Any ideas on how best to proceed?
I changed my second query to a "Make Table", it will create a new
table with these records which could then be updated/marked for
deletion. But, if I do this, I am unsure how next to proceed (that is,
update the other main table)...

Thanks in advance for any insight,
Heather

John Spencer (MVP) · Apr 13, 2004

I would try to break this down into two queries instead of the one, but you
should be able to do it in one,

SELECT INP.*
FROM INP
WHERE INP.EIN IN
(SELECT A.EIN
FROM INP as A
GROUP BY A.EIN
HAVING Count(A.EIN) > 1)
OR INP.SOS_ID IN
(SELECT B.SOS_ID
FROM INP as B
GROUP BY A.SOS_ID
HAVING Count(B.SOS_ID) > 1)

This will be updatable and if you have a field "DeleteThisOne" you can set the
field to true and then do a simple delete query once you've marked all the
records you wish to delete.

OR you could directly delete any record, just by clicking on it and selecting
delete record from the menu.

h.a. collins said:
Hi Everyone:

I am working on an Access 2003 database with one table(!) holding
close to 70K records. I am trying to help a professor who uses this
"database" (really a glorified spreadsheet)to keep track of Nonprofit
Orgs throughout the state. While I understand the best course of
action would be to build a new database for her, time and
circumstances do not permit; I have been asked to help with one
particular issue they are having with similar, but not quite
duplicate, records.

I have built a select query that will find all records with duplicate
ID numbers (there are two possible ID nos. assigned by the state/fed
gov.):

SELECT INP.EIN, Count(INP.EIN) AS Expr1, INP.SOS_ID, Count(INP.SOS_ID)
AS CountOfSOS_ID
FROM INP
GROUP BY INP.EIN, INP.SOS_ID
HAVING (((Count(INP.EIN))>1)) OR (((Count(INP.SOS_ID))>1))
ORDER BY INP.EIN, INP.SOS_ID;

Unfortunately, these are not pure duplicate records: in most
instances, the Organization name is a little bit different, or the
Contact name is different. Thus, I cannot just delete one of the dupes
and keep the other. I was hoping to create a form that would allow the
client to see the records and actually choose the one she wants to
remove (using a checkbox). I then thought we could run an append query
to remove all those records in the system that have a check in the
"delete record" box. So I created another query, based on the above
one, so we could see the names, IDS, etc. for each record:

SELECT INP.EIN, INP.SOS_ID, INP.ORGNAME, INP.ADDRESS, INP.ADDRESS2,
INP.CITY, INP.ZIP, INP.ZIP2, INP.CONTACT, INP.[OLD CONTACT],
INP.STATUS, INP.[MARK FOR DELETION]
FROM INP RIGHT JOIN [01 - INP EIN Duplicates] ON INP.EIN = [01 - INP
EIN Duplicates].EIN
ORDER BY INP.EIN, INP.ORGNAME;

However, I have now learned that I cannot update the resulting records
from this query, because it is based on the first query, which uses
the group by/count functions.
Any ideas on how best to proceed?
I changed my second query to a "Make Table", it will create a new
table with these records which could then be updated/marked for
deletion. But, if I do this, I am unsure how next to proceed (that is,
update the other main table)...

Thanks in advance for any insight,
Heather

h.a. collins · Apr 18, 2004

Thanks John--
I actually did break it into two queries. This is perfect.
Thank you again!

John Spencer (MVP) said:
I would try to break this down into two queries instead of the one, but you
should be able to do it in one,

SELECT INP.*
FROM INP
WHERE INP.EIN IN
(SELECT A.EIN
FROM INP as A
GROUP BY A.EIN
HAVING Count(A.EIN) > 1)
OR INP.SOS_ID IN
(SELECT B.SOS_ID
FROM INP as B
GROUP BY A.SOS_ID
HAVING Count(B.SOS_ID) > 1)

This will be updatable and if you have a field "DeleteThisOne" you can set the
field to true and then do a simple delete query once you've marked all the
records you wish to delete.

OR you could directly delete any record, just by clicking on it and selecting
delete record from the menu.

h.a. collins said:

Hi Everyone:

I am working on an Access 2003 database with one table(!) holding
close to 70K records. I am trying to help a professor who uses this
"database" (really a glorified spreadsheet)to keep track of Nonprofit
Orgs throughout the state. While I understand the best course of
action would be to build a new database for her, time and
circumstances do not permit; I have been asked to help with one
particular issue they are having with similar, but not quite
duplicate, records.

I have built a select query that will find all records with duplicate
ID numbers (there are two possible ID nos. assigned by the state/fed
gov.):

SELECT INP.EIN, Count(INP.EIN) AS Expr1, INP.SOS_ID, Count(INP.SOS_ID)
AS CountOfSOS_ID
FROM INP
GROUP BY INP.EIN, INP.SOS_ID
HAVING (((Count(INP.EIN))>1)) OR (((Count(INP.SOS_ID))>1))
ORDER BY INP.EIN, INP.SOS_ID;

Unfortunately, these are not pure duplicate records: in most
instances, the Organization name is a little bit different, or the
Contact name is different. Thus, I cannot just delete one of the dupes
and keep the other. I was hoping to create a form that would allow the
client to see the records and actually choose the one she wants to
remove (using a checkbox). I then thought we could run an append query
to remove all those records in the system that have a check in the
"delete record" box. So I created another query, based on the above
one, so we could see the names, IDS, etc. for each record:

SELECT INP.EIN, INP.SOS_ID, INP.ORGNAME, INP.ADDRESS, INP.ADDRESS2,
INP.CITY, INP.ZIP, INP.ZIP2, INP.CONTACT, INP.[OLD CONTACT],
INP.STATUS, INP.[MARK FOR DELETION]
FROM INP RIGHT JOIN [01 - INP EIN Duplicates] ON INP.EIN = [01 - INP
EIN Duplicates].EIN
ORDER BY INP.EIN, INP.ORGNAME;

However, I have now learned that I cannot update the resulting records
from this query, because it is based on the first query, which uses
the group by/count functions.
Any ideas on how best to proceed?
I changed my second query to a "Make Table", it will create a new
table with these records which could then be updated/marked for
deletion. But, if I do this, I am unsure how next to proceed (that is,
update the other main table)...

Thanks in advance for any insight,
Heather

Click to expand...

RCavallaro · Apr 22, 2004

Hi Heaher, Did you ever get a response on your access
question about deleting duplicate records. I have been
trying to do something similiar and am having the same
problem. It seems like if you use a "find duplicate"
query, it becomes impossible to delete the duplicate
records due to the calculation. Any help would be
appreciated!!! THANKS, Rosemary

Trying to remove very similar (not quite duplicate) records

h.a. collins

John Spencer (MVP)

h.a. collins

RCavallaro