Identifying Partial Duplicates

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

Is there anyway to identify partial duplicates in a table??

I upload data into a database on a monthly basis. Recently, the data coming to me is not "clean". For example, full serial numbers coming in may be truncated from the beginning or the end (noone knows why this is happening). Is there anyway that I can check with existing data in the table, row by row if necessary, to be sure that there are no duplicates and if there are duplicates, reveal them all in a query??

thanks for your help.
 
Lynn Arlington said:
Is there anyway to identify partial duplicates in a table??

I upload data into a database on a monthly basis. Recently, the data
coming to me is not "clean". For example, full serial numbers coming
in may be truncated from the beginning or the end (noone knows why
this is happening). Is there anyway that I can check with existing
data in the table, row by row if necessary, to be sure that there are
no duplicates and if there are duplicates, reveal them all in a
query??

thanks for your help.

If I understand your question, you could look for partial matches with a
query that performs an unequal join, along these lines:

SELECT tblImported.SerNo, tblMaster.SerNo, tblMaster.Description
FROM tblImported INNER JOIN tblMaster
ON tblMaster.SerNo Like '*' & tblImported.SerNo & '*'
ORDER By tblImported.SerNo;
 
Thanks Dirk, you got me thinking about other ways to do it.

SELECT tblGMSImport.SerialNo, GMS_Invoicing_tbl.EQMFSN
FROM GMS_Invoicing_tbl RIGHT JOIN tblGMSImport ON GMS_Invoicing_tbl.EQMFSN = tblGMSImport.SerialNo
WHERE (((tblGMSImport.SerialNo) Like "*" & [EQMFSN] & "*"))
WITH OWNERACCESS OPTION;

It's always the syntax that gets me, this is what I ended up using and it works great for pulling out exact matches

What if the serial number is supposed to be 00012345dk2

and all that is coming from the source is 5dk2

do you think there is anyway to make a match based on a partial field??

I don't believe it's possible because there is no consistency on how the data is being dropped ie. it's not always the first 5 characters and the last 3 characters. Sometimes it's only the preceding zero or sometimes its the last 4 characters, it's really bizarre and the only reason I know it's happening is when I run the cross-tab query at the end of the month, I am seeing extra rows appear values appearing on their own row when they should be summed in with other data and I am just "assuming" that these are partial serial numbers that are for that month because they have values for all other months but the one with the straggler row.

Anyways, let me know what you think...

Thanks Lynn
 
Lynn Arlington said:
Thanks Dirk, you got me thinking about other ways to do it.

SELECT tblGMSImport.SerialNo, GMS_Invoicing_tbl.EQMFSN
FROM GMS_Invoicing_tbl RIGHT JOIN tblGMSImport ON
GMS_Invoicing_tbl.EQMFSN = tblGMSImport.SerialNo
WHERE (((tblGMSImport.SerialNo) Like "*" & [EQMFSN] & "*"))
WITH OWNERACCESS OPTION;

It's always the syntax that gets me, this is what I ended up using
and it works great for pulling out exact matches

What if the serial number is supposed to be 00012345dk2

and all that is coming from the source is 5dk2

do you think there is anyway to make a match based on a partial
field??

The SQL I posted, properly translated for your table and field names,
should do that. Based on what I see in the SQL you posted above, try
this:

SELECT tblGMSImport.SerialNo, GMS_Invoicing_tbl.EQMFSN
FROM GMS_Invoicing_tbl INNER JOIN tblGMSImport
ON GMS_Invoicing_tbl.EQMFSN
Like "*" & tblGMSImport.SerialNo & "*"
WITH OWNERACCESS OPTION;
I don't believe it's possible because there is no consistency on how
the data is being dropped

I think you're wrong. Try this new SQL statement and tell me what you
get.
 
Lynn Arlington said:
Thank you so much, this is what I was trying to do. You are amazing.
I am new to queries in SQL, I like the other view better but I would
like to learn if there is more flexibility.

I agree that the regular "query by example" design view is much easier
to work with, but there are things you can do in SQL view that can't be
represented in the QBE design view.
Now, what do I put in to get it to show only the ones that are not
equal to each other [SerialNo]<>[EQMFSN]

Try this:

SELECT tblGMSImport.SerialNo, GMS_Invoicing_tbl.EQMFSN
FROM GMS_Invoicing_tbl INNER JOIN tblGMSImport
ON GMS_Invoicing_tbl.EQMFSN
Like "*" & tblGMSImport.SerialNo & "*"
WHERE GMS_Invoicing_tbl.EQMFSN <> tblGMSImport.SerialNo
WITH OWNERACCESS OPTION;
 
Thank you so much, it's working the way I was hoping.

Dirk Goldgar said:
Lynn Arlington said:
Thank you so much, this is what I was trying to do. You are amazing.
I am new to queries in SQL, I like the other view better but I would
like to learn if there is more flexibility.

I agree that the regular "query by example" design view is much easier
to work with, but there are things you can do in SQL view that can't be
represented in the QBE design view.
Now, what do I put in to get it to show only the ones that are not
equal to each other [SerialNo]<>[EQMFSN]

Try this:

SELECT tblGMSImport.SerialNo, GMS_Invoicing_tbl.EQMFSN
FROM GMS_Invoicing_tbl INNER JOIN tblGMSImport
ON GMS_Invoicing_tbl.EQMFSN
Like "*" & tblGMSImport.SerialNo & "*"
WHERE GMS_Invoicing_tbl.EQMFSN <> tblGMSImport.SerialNo
WITH OWNERACCESS OPTION;

--
Dirk Goldgar, MS Access MVP
www.datagnostics.com

(please reply to the newsgroup)
 
Back
Top