Subject: modifying SQL criteria in a search for duplicates

  • Thread starter Thread starter Paul James
  • Start date Start date
P

Paul James

I'm trying to build a query that displays all records that have matching
strings in the first four characters of two different fields: AgentName and
StreetAddress. That is, where the first four characters of AgentName are
equal in both records, AND the first four characters of StreetAddress are
equal in both records.

Since the Find Duplicates Query Wizard creates a query that only finds exact
matches, I ran the Wizard and tried replacing those exact matches in the SQL
code created by the Wizard. I tried enclosing AgentName and StreetAddress
with the
Left() function, using 4 as a parameter. However, Access could not run a
query on that modified code and produced the error message saying that the
criteria was not part of an aggregate function.

To produce the exact matches in those two fields, the query wizard inserts
this SQL code as criteria in the AgentName field (in the QBE screen):

In (SELECT [AgentName] FROM [tblAgent] As Tmp GROUP BY
[AgentName],[AgentStreetAddress] HAVING Count(*)>1 And [AgentStreetAddress]
= [tblAgent].[AgentStreetAddress])

The entire SQL code which includes the above criteria is:

SELECT tblAgent.AgentName, tblAgent.AgentStreetAddress, tblAgent.LicenseeID,
tblAgent.AgentID, tblAgent.AgentOwnerLastName,
tblAgent.AgentOwnerFirstName2, tblAgent.AgentCity, tblAgent.AgentCloseDate,
tblAgent.AgentDenyDate, tblAgent.AgentWithdrawnDate
FROM tblAgent
WHERE (((tblAgent.AgentName) In (SELECT [AgentName] FROM [tblAgent] As Tmp
GROUP BY [AgentName],[AgentStreetAddress] HAVING Count(*)>1 And
[AgentStreetAddress] = [tblAgent].[AgentStreetAddress])))
ORDER BY tblAgent.AgentName, tblAgent.AgentStreetAddress;

Can anyone tell me how to modify this code so that instead of just
displaying records that have exact matches in both the AgentName and
AgentStreetAddress fields, the query will display all records that have
matching strings in the first four (left) characters of both those fields?

Thanks in advance,

Paul
 
Paul

Try it like:

SELECT tblAgent.AgentName, tblAgent.AgentStreetAddress, tblAgent.LicenseeID,
tblAgent.AgentID, tblAgent.AgentOwnerLastName,
tblAgent.AgentOwnerFirstName2, tblAgent.AgentCity,
tblAgent.AgentCloseDate,
tblAgent.AgentDenyDate, tblAgent.AgentWithdrawnDate
FROM tblAgent
WHERE LEFT(tblAgent.AgentName, 4)
In (SELECT LEFT(Tmp.[AgentName], 4)
FROM [tblAgent] As Tmp
GROUP BY LEFT(Tmp.[AgentName], 4) ,
LEFT(Tmp.[AgentStreetAddress],4)
HAVING Count(*)>1
AND LEFT(tmp.[AgentStreetAddress],4) =
LEFT([tblAgent].[AgentStreetAddress], 4))
ORDER BY tblAgent.AgentName, tblAgent.AgentStreetAddress;

This is untested, but should give you the idea.

Dale
 
It's displaying the exact duplicate records I was looking for, Dale. It
takes a while (20 minutes) because so much is going on in the SQL statement,
but that's ok. I'm only using this to eliminate duplicate records, and once
that's done, I won't need to keep running the query. So thank you for
putting it together for me.

One final (additional) request which I didn't think to include in my
original message:

I would like to add the condition that these records are only considered
duplicates if

1. tblAgent.LicenseeID is identical for the matching records AND
2. the three date fields are all Null: AgentCloseDate, AgentDenyDate and
AgentWithdrawnDate.

I think I know that #1 is a matter of adding GROUP BY tblAgent.LicenseeID
somewhere in the expression you wrote, and #2 will be accomplished by adding
WHERE tblAgent.AgentCloseDate Is Null AND tblAgent.AgentDenyDate Is Null AND
tblAgent.AgentWithdrawnDate Is Null. However, I don't know exactly where to
put them and where parenthesis should be added.

Is there any chance you could copy the code you wrote below and put these
additional conditions in the appropriate places?

Thanks again in advance,

Paul


Dale Fye said:
Paul

Try it like:

SELECT tblAgent.AgentName, tblAgent.AgentStreetAddress, tblAgent.LicenseeID,
tblAgent.AgentID, tblAgent.AgentOwnerLastName,
tblAgent.AgentOwnerFirstName2, tblAgent.AgentCity,
tblAgent.AgentCloseDate,
tblAgent.AgentDenyDate, tblAgent.AgentWithdrawnDate
FROM tblAgent
WHERE LEFT(tblAgent.AgentName, 4)
In (SELECT LEFT(Tmp.[AgentName], 4)
FROM [tblAgent] As Tmp
GROUP BY LEFT(Tmp.[AgentName], 4) ,
LEFT(Tmp.[AgentStreetAddress],4)
HAVING Count(*)>1
AND LEFT(tmp.[AgentStreetAddress],4) =
LEFT([tblAgent].[AgentStreetAddress], 4))
ORDER BY tblAgent.AgentName, tblAgent.AgentStreetAddress;

This is untested, but should give you the idea.

Dale

Paul James said:
I'm trying to build a query that displays all records that have matching
strings in the first four characters of two different fields: AgentName and
StreetAddress. That is, where the first four characters of AgentName are
equal in both records, AND the first four characters of StreetAddress are
equal in both records.

Since the Find Duplicates Query Wizard creates a query that only finds exact
matches, I ran the Wizard and tried replacing those exact matches in the SQL
code created by the Wizard. I tried enclosing AgentName and StreetAddress
with the
Left() function, using 4 as a parameter. However, Access could not run a
query on that modified code and produced the error message saying that the
criteria was not part of an aggregate function.

To produce the exact matches in those two fields, the query wizard inserts
this SQL code as criteria in the AgentName field (in the QBE screen):

In (SELECT [AgentName] FROM [tblAgent] As Tmp GROUP BY
[AgentName],[AgentStreetAddress] HAVING Count(*)>1 And [AgentStreetAddress]
= [tblAgent].[AgentStreetAddress])

The entire SQL code which includes the above criteria is:

SELECT tblAgent.AgentName, tblAgent.AgentStreetAddress, tblAgent.LicenseeID,
tblAgent.AgentID, tblAgent.AgentOwnerLastName,
tblAgent.AgentOwnerFirstName2, tblAgent.AgentCity, tblAgent.AgentCloseDate,
tblAgent.AgentDenyDate, tblAgent.AgentWithdrawnDate
FROM tblAgent
WHERE (((tblAgent.AgentName) In (SELECT [AgentName] FROM [tblAgent] As Tmp
GROUP BY [AgentName],[AgentStreetAddress] HAVING Count(*)>1 And
[AgentStreetAddress] = [tblAgent].[AgentStreetAddress])))
ORDER BY tblAgent.AgentName, tblAgent.AgentStreetAddress;

Can anyone tell me how to modify this code so that instead of just
displaying records that have exact matches in both the AgentName and
AgentStreetAddress fields, the query will display all records that have
matching strings in the first four (left) characters of both those fields?

Thanks in advance,

Paul
 
My apologies to the group for adding a second request for the same issue I
described in my previous message, but Dale wrote some great SQL code for me,
and I asked if he could show me where to insert an additional GROUP BY and
WHERE clause in the code he had already written. In reading it over, I'm
concerned that I didn't articulate my question properly, and so I will
restate my request here:

I'm trying to build a query that displays all records that have matching
strings in the first four characters of two different fields: AgentName and
StreetAddress. That is, where the first four characters of AgentName are
equal in both records, AND the first four characters of StreetAddress are
equal in both records.

Here's the code Dale wrote that produces exactly what I asked for:

SELECT tblAgent.AgentName, tblAgent.AgentStreetAddress, tblAgent.LicenseeID,
tblAgent.AgentID, tblAgent.AgentOwnerLastName,
tblAgent.AgentOwnerFirstName2, tblAgent.AgentCity,
tblAgent.AgentCloseDate,
tblAgent.AgentDenyDate, tblAgent.AgentWithdrawnDate
FROM tblAgent
WHERE LEFT(tblAgent.AgentName, 4)
In (SELECT LEFT(Tmp.[AgentName], 4)
FROM [tblAgent] As Tmp
GROUP BY LEFT(Tmp.[AgentName], 4) ,
LEFT(Tmp.[AgentStreetAddress],4)
HAVING Count(*)>1
AND LEFT(tmp.[AgentStreetAddress],4) =
LEFT([tblAgent].[AgentStreetAddress], 4))
ORDER BY tblAgent.AgentName, tblAgent.AgentStreetAddress;

After I looked at the results, I realized (duh) that I should have added the
following criteria to my original request:

1. In addition to matching records on the Left(,4) characters of AgentName
and StreetAddress, the matching records must have an exact match on
LicenseeID. That is, two different licensees will not have matching records
in any event.
2. The only records under consideration will be those in which the three
date fields AgentCloseDate, AgentDenyDate and
AgentWithdrawnDate are all Null.

I believe the first new criterion requires that I insert "GROUP BY
tblAgent.LicenseeID" somewhere in Dale's expression above, and the second
requires "WHERE tblAgent.AgentCloseDate Is Null AND tblAgent.AgentDenyDate
Is Null AND
tblAgent.AgentWithdrawnDate Is Null." However, I don't know where to put
these clauses in the SQL code, nor can I handle the required parenthesis.

Can anyone help with this? Dale, Gerald - are either of you out there?

Thanks to everyone for considering this.

Paul
 
Back
Top