finding dupe addresses with 'near simliar' postcodes...

  • Thread starter Thread starter efandango
  • Start date Start date
E

efandango

I want to find address's with near similar postcodes; for example in the
table below, Albany Road would not be classed as near similar: whereas the
others would be because at least two of the first 2 characters are the same.

How can I do this with a query:


StreetName Postcode StreetNameID
Albany Road TW8 29853
Albany Road SE17 53290

Albemarle Street W1S 29858
Albemarle Street W1 29859

Albemarle Way EC1V 29861
Albemarle Way EC1 29862

Albert Embankment SE1 29873
Albert Embankment SE11 29874


this is my existing SQL:

SELECT tbl_Street_Names.StreetName, tbl_Street_Names.StreetNameID,
tbl_Street_Names.Postcode
FROM tbl_Street_Names
WHERE (((tbl_Street_Names.StreetName) In (SELECT [StreetName] FROM
[tbl_Street_Names] As Tmp GROUP BY [StreetName] HAVING Count(*)>1 )))
ORDER BY tbl_Street_Names.StreetName;
 
How about just modifying the subquery portion to include the left two
characters of the PostCode in the GroupBy clause:

SELECT tbl_Street_Names.StreetName,
tbl_Street_Names.StreetNameID,
tbl_Street_Names.Postcode
FROM tbl_Street_Names
WHERE tbl_Street_Names.StreetName
IN (SELECT [StreetName] FROM [tbl_Street_Names] As Tmp
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1)
ORDER BY tbl_Street_Names.StreetName;

Or, you might want to try the following, which I think will be faster:

SELECT T1.StreetName,
T1.StreetNameID,
T1.Postcode
FROM tbl_Street_Names
INNER JOIN
(SELECT [StreetName], Left(PostCode,2) as SimPost
FROM [tbl_Street_Names]
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1) as Temp
ON T1.StreetName = Temp.StreetName
AND LEFT(T1.PostCode, 2) = Temp.SimPost

You obviously will not be able to edit this in the query design window,
since that view does not allow joins like the one defined above.

--
HTH
Dale

Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.
 
the first query threw up these results:

StreetName Postcode
Abbey Road EN1
Abbey Road E15
Abbey Road NW10
Abbey Road NW6
Abbey Road NW8
Abbey Road SE2
Abbey Road TW17
Acacia Avenue HA9
Acacia Avenue TW17
Acacia Avenue HA4
Acacia Avenue TW19
Acacia Road E11
Acacia Road E17
Acacia Road EN2
Acacia Road NW8
Acacia Road SW16
Acacia Road TW12
Acacia Road W3

StreetName Postcode
Adams Row W1K
Adams Row W1


Air Street W1B
Air Street W1J
Air Street W1


Out of them, Air Street, and Adams Row are the type of results I am looking
for. eg: where a streetname is duplicated but its postcode begins with the
same first two characters, but has an extra (extaraneous) character; in the
case of Air Street, that would be the B and J. and in the case of Adams Row
it wold be the K.


Your 2nd SQl threw a JOIN error.

Dale Fye said:
How about just modifying the subquery portion to include the left two
characters of the PostCode in the GroupBy clause:

SELECT tbl_Street_Names.StreetName,
tbl_Street_Names.StreetNameID,
tbl_Street_Names.Postcode
FROM tbl_Street_Names
WHERE tbl_Street_Names.StreetName
IN (SELECT [StreetName] FROM [tbl_Street_Names] As Tmp
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1)
ORDER BY tbl_Street_Names.StreetName;

Or, you might want to try the following, which I think will be faster:

SELECT T1.StreetName,
T1.StreetNameID,
T1.Postcode
FROM tbl_Street_Names
INNER JOIN
(SELECT [StreetName], Left(PostCode,2) as SimPost
FROM [tbl_Street_Names]
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1) as Temp
ON T1.StreetName = Temp.StreetName
AND LEFT(T1.PostCode, 2) = Temp.SimPost

You obviously will not be able to edit this in the query design window,
since that view does not allow joins like the one defined above.

--
HTH
Dale

Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.



efandango said:
I want to find address's with near similar postcodes; for example in the
table below, Albany Road would not be classed as near similar: whereas the
others would be because at least two of the first 2 characters are the same.

How can I do this with a query:


StreetName Postcode StreetNameID
Albany Road TW8 29853
Albany Road SE17 53290

Albemarle Street W1S 29858
Albemarle Street W1 29859

Albemarle Way EC1V 29861
Albemarle Way EC1 29862

Albert Embankment SE1 29873
Albert Embankment SE11 29874


this is my existing SQL:

SELECT tbl_Street_Names.StreetName, tbl_Street_Names.StreetNameID,
tbl_Street_Names.Postcode
FROM tbl_Street_Names
WHERE (((tbl_Street_Names.StreetName) In (SELECT [StreetName] FROM
[tbl_Street_Names] As Tmp GROUP BY [StreetName] HAVING Count(*)>1 )))
ORDER BY tbl_Street_Names.StreetName;
 
Try this --
SELECT tbl_Street_Names.StreetName, tbl_Street_Names.Postcode,
tbl_Street_Names.StreetNameID
FROM tbl_Street_Names, tbl_Street_Names AS tbl_Street_Names_1
WHERE (((tbl_Street_Names.StreetName)=[tbl_Street_Names_1].[StreetName]) AND
((Left([tbl_Street_Names].[Postcode],2))=Left([tbl_Street_Names_1].[Postcode],2)) AND ((tbl_Street_Names.Postcode)<>[tbl_Street_Names_1].[Postcode]));

--
KARL DEWEY
Build a little - Test a little


efandango said:
the first query threw up these results:

StreetName Postcode
Abbey Road EN1
Abbey Road E15
Abbey Road NW10
Abbey Road NW6
Abbey Road NW8
Abbey Road SE2
Abbey Road TW17
Acacia Avenue HA9
Acacia Avenue TW17
Acacia Avenue HA4
Acacia Avenue TW19
Acacia Road E11
Acacia Road E17
Acacia Road EN2
Acacia Road NW8
Acacia Road SW16
Acacia Road TW12
Acacia Road W3

StreetName Postcode
Adams Row W1K
Adams Row W1


Air Street W1B
Air Street W1J
Air Street W1


Out of them, Air Street, and Adams Row are the type of results I am looking
for. eg: where a streetname is duplicated but its postcode begins with the
same first two characters, but has an extra (extaraneous) character; in the
case of Air Street, that would be the B and J. and in the case of Adams Row
it wold be the K.


Your 2nd SQl threw a JOIN error.

Dale Fye said:
How about just modifying the subquery portion to include the left two
characters of the PostCode in the GroupBy clause:

SELECT tbl_Street_Names.StreetName,
tbl_Street_Names.StreetNameID,
tbl_Street_Names.Postcode
FROM tbl_Street_Names
WHERE tbl_Street_Names.StreetName
IN (SELECT [StreetName] FROM [tbl_Street_Names] As Tmp
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1)
ORDER BY tbl_Street_Names.StreetName;

Or, you might want to try the following, which I think will be faster:

SELECT T1.StreetName,
T1.StreetNameID,
T1.Postcode
FROM tbl_Street_Names
INNER JOIN
(SELECT [StreetName], Left(PostCode,2) as SimPost
FROM [tbl_Street_Names]
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1) as Temp
ON T1.StreetName = Temp.StreetName
AND LEFT(T1.PostCode, 2) = Temp.SimPost

You obviously will not be able to edit this in the query design window,
since that view does not allow joins like the one defined above.

--
HTH
Dale

Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.



efandango said:
I want to find address's with near similar postcodes; for example in the
table below, Albany Road would not be classed as near similar: whereas the
others would be because at least two of the first 2 characters are the same.

How can I do this with a query:


StreetName Postcode StreetNameID
Albany Road TW8 29853
Albany Road SE17 53290

Albemarle Street W1S 29858
Albemarle Street W1 29859

Albemarle Way EC1V 29861
Albemarle Way EC1 29862

Albert Embankment SE1 29873
Albert Embankment SE11 29874


this is my existing SQL:

SELECT tbl_Street_Names.StreetName, tbl_Street_Names.StreetNameID,
tbl_Street_Names.Postcode
FROM tbl_Street_Names
WHERE (((tbl_Street_Names.StreetName) In (SELECT [StreetName] FROM
[tbl_Street_Names] As Tmp GROUP BY [StreetName] HAVING Count(*)>1 )))
ORDER BY tbl_Street_Names.StreetName;
 
I can see how that happened. The subquery only included the streetname,
left 2 characters of the postcode. I also realized that I left out the
alias (T1) in the FROM clause of the second query. I pasted some of your
streetnames and postcodes into a table, and ran the following, and it
returned what I think you are looking for. Try:

SELECT T1.StreetName,
T1.Postcode
FROM tbl_Street_Names T1
INNER JOIN
(SELECT [StreetName], Left(PostCode,2) as SimPost
FROM [tbl_Street_Names]
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1) as Temp
ON T1.StreetName = Temp.StreetName
AND LEFT(T1.PostCode, 2) = Temp.SimPost

I would try both this one and Karls, and test the time it takes for each of
them. Then use the one that runs quicker.

Dale

efandango said:
the first query threw up these results:

StreetName Postcode
Abbey Road EN1
Abbey Road E15
Abbey Road NW10
Abbey Road NW6
Abbey Road NW8
Abbey Road SE2
Abbey Road TW17
Acacia Avenue HA9
Acacia Avenue TW17
Acacia Avenue HA4
Acacia Avenue TW19
Acacia Road E11
Acacia Road E17
Acacia Road EN2
Acacia Road NW8
Acacia Road SW16
Acacia Road TW12
Acacia Road W3

StreetName Postcode
Adams Row W1K
Adams Row W1


Air Street W1B
Air Street W1J
Air Street W1


Out of them, Air Street, and Adams Row are the type of results I am
looking
for. eg: where a streetname is duplicated but its postcode begins with the
same first two characters, but has an extra (extaraneous) character; in
the
case of Air Street, that would be the B and J. and in the case of Adams
Row
it wold be the K.


Your 2nd SQl threw a JOIN error.

Dale Fye said:
How about just modifying the subquery portion to include the left two
characters of the PostCode in the GroupBy clause:

SELECT tbl_Street_Names.StreetName,
tbl_Street_Names.StreetNameID,
tbl_Street_Names.Postcode
FROM tbl_Street_Names
WHERE tbl_Street_Names.StreetName
IN (SELECT [StreetName] FROM [tbl_Street_Names] As Tmp
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1)
ORDER BY tbl_Street_Names.StreetName;

Or, you might want to try the following, which I think will be faster:

SELECT T1.StreetName,
T1.StreetNameID,
T1.Postcode
FROM tbl_Street_Names
INNER JOIN
(SELECT [StreetName], Left(PostCode,2) as SimPost
FROM [tbl_Street_Names]
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1) as Temp
ON T1.StreetName = Temp.StreetName
AND LEFT(T1.PostCode, 2) = Temp.SimPost

You obviously will not be able to edit this in the query design window,
since that view does not allow joins like the one defined above.

--
HTH
Dale

Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.



efandango said:
I want to find address's with near similar postcodes; for example in
the
table below, Albany Road would not be classed as near similar: whereas
the
others would be because at least two of the first 2 characters are the
same.

How can I do this with a query:


StreetName Postcode StreetNameID
Albany Road TW8 29853
Albany Road SE17 53290

Albemarle Street W1S 29858
Albemarle Street W1 29859

Albemarle Way EC1V 29861
Albemarle Way EC1 29862

Albert Embankment SE1 29873
Albert Embankment SE11 29874


this is my existing SQL:

SELECT tbl_Street_Names.StreetName, tbl_Street_Names.StreetNameID,
tbl_Street_Names.Postcode
FROM tbl_Street_Names
WHERE (((tbl_Street_Names.StreetName) In (SELECT [StreetName] FROM
[tbl_Street_Names] As Tmp GROUP BY [StreetName] HAVING Count(*)>1 )))
ORDER BY tbl_Street_Names.StreetName;
 
Well, the good news is they both work, and run exact same times 26,000
records = 2secs...

though Karl's version includes the actual Street_ID in the results, i found
that i could tweak your version by playing with the Left(PostCode,#) number
which gives me closer matches to what i am after bearing in mind postcodes
being what they are, and worse... British one's being what they are, I am
having to tweak between the first two significant characters and the first
three. So thanks muchly to both you guys for helping out.

regards

Eric



Dale Fye said:
I can see how that happened. The subquery only included the streetname,
left 2 characters of the postcode. I also realized that I left out the
alias (T1) in the FROM clause of the second query. I pasted some of your
streetnames and postcodes into a table, and ran the following, and it
returned what I think you are looking for. Try:

SELECT T1.StreetName,
T1.Postcode
FROM tbl_Street_Names T1
INNER JOIN
(SELECT [StreetName], Left(PostCode,2) as SimPost
FROM [tbl_Street_Names]
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1) as Temp
ON T1.StreetName = Temp.StreetName
AND LEFT(T1.PostCode, 2) = Temp.SimPost

I would try both this one and Karls, and test the time it takes for each of
them. Then use the one that runs quicker.

Dale

efandango said:
the first query threw up these results:

StreetName Postcode
Abbey Road EN1
Abbey Road E15
Abbey Road NW10
Abbey Road NW6
Abbey Road NW8
Abbey Road SE2
Abbey Road TW17
Acacia Avenue HA9
Acacia Avenue TW17
Acacia Avenue HA4
Acacia Avenue TW19
Acacia Road E11
Acacia Road E17
Acacia Road EN2
Acacia Road NW8
Acacia Road SW16
Acacia Road TW12
Acacia Road W3

StreetName Postcode
Adams Row W1K
Adams Row W1


Air Street W1B
Air Street W1J
Air Street W1


Out of them, Air Street, and Adams Row are the type of results I am
looking
for. eg: where a streetname is duplicated but its postcode begins with the
same first two characters, but has an extra (extaraneous) character; in
the
case of Air Street, that would be the B and J. and in the case of Adams
Row
it wold be the K.


Your 2nd SQl threw a JOIN error.

Dale Fye said:
How about just modifying the subquery portion to include the left two
characters of the PostCode in the GroupBy clause:

SELECT tbl_Street_Names.StreetName,
tbl_Street_Names.StreetNameID,
tbl_Street_Names.Postcode
FROM tbl_Street_Names
WHERE tbl_Street_Names.StreetName
IN (SELECT [StreetName] FROM [tbl_Street_Names] As Tmp
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1)
ORDER BY tbl_Street_Names.StreetName;

Or, you might want to try the following, which I think will be faster:

SELECT T1.StreetName,
T1.StreetNameID,
T1.Postcode
FROM tbl_Street_Names
INNER JOIN
(SELECT [StreetName], Left(PostCode,2) as SimPost
FROM [tbl_Street_Names]
GROUP BY [StreetName], Left(PostCode,2)
HAVING Count(*)>1) as Temp
ON T1.StreetName = Temp.StreetName
AND LEFT(T1.PostCode, 2) = Temp.SimPost

You obviously will not be able to edit this in the query design window,
since that view does not allow joins like the one defined above.

--
HTH
Dale

Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.



:

I want to find address's with near similar postcodes; for example in
the
table below, Albany Road would not be classed as near similar: whereas
the
others would be because at least two of the first 2 characters are the
same.

How can I do this with a query:


StreetName Postcode StreetNameID
Albany Road TW8 29853
Albany Road SE17 53290

Albemarle Street W1S 29858
Albemarle Street W1 29859

Albemarle Way EC1V 29861
Albemarle Way EC1 29862

Albert Embankment SE1 29873
Albert Embankment SE11 29874


this is my existing SQL:

SELECT tbl_Street_Names.StreetName, tbl_Street_Names.StreetNameID,
tbl_Street_Names.Postcode
FROM tbl_Street_Names
WHERE (((tbl_Street_Names.StreetName) In (SELECT [StreetName] FROM
[tbl_Street_Names] As Tmp GROUP BY [StreetName] HAVING Count(*)>1 )))
ORDER BY tbl_Street_Names.StreetName;
 
Back
Top