simple (?) SQL query question

  • Thread starter Thread starter gerry
  • Start date Start date
G

gerry

I have 2 SQL tables :

Data :
DataID ( key ) Lookup1 Lookup2
-------------- ------- -------
1 1 2
2 1 3
3 3 4

Lookup:
LookupId ( key ) LookupKey DataID
---------------- --------- ------
1 1 1
2 1 2

Here are 2 select statements and their respective output :

select
[Data].[DataID] ,
[Data].[Lookup1] ,
[Data].[Lookup2]
from [Data]
left join [Lookup]
on [Data].[Lookup1] = [Lookup].[DataID]
or [Data].[Lookup2] = [Lookup].[DataID]
where [Lookup].[LookupKey] = 1
order by [Data].[DataID]

DataID LookUp1 Lookup2
------ ------- -------
1 1 2
1 1 2
2 1 3



select
[Data].[DataID] ,
[Data].[Lookup1] ,
[Data].[Lookup2]
from [Data]
where
[Data].[Lookup1] in (select [DataID] from [Lookup] where [LookupKey] = 1 )
or
[Data].[Lookup2] in (select [DataID] from [Lookup] where [LookupKey] = 1 )
order by [Data].[DataID]

DataID LookUp1 Lookup2
------ ------- -------
1 1 2
2 1 3


The important difference between the 2 results being the uniqueness of the returned [Data] rows regardless of whether they matched [Lookup] on one or both of [Lookup1] / [Lookup2].

Maybe I am just having a brain fart, but it seems that I should be able to get the second results by tweaking the 1st select without requiring the 2 subqueries in the 2nd select.

Any suggestions would be appreciated.

Gerry
 
Good morning Gerry, thanks for using Microsoft Newsgroup Support Service!
My name is Ji Zhou [MSFT] and I will be working on this issue with you.

In order NOT to get the duplicated rows in the linked table, we need to use
semi-join instead of inner-join. There are two ways to use semi-join. One
is using the EXISTS keyword and the other is the DISTINCT keyword. I post
them as follows and will give a detailed explanation later:

****** POSSIBLE SOLUTIONS ******

SOLUTION1: Use exists:

select
[Data].[DataID] ,
[Data].[Lookup1] ,
[Data].[Lookup2]
from [Data]
where exists
(select * from [Lookup]
where ([Data].[Lookup1]=[Lookup].[DataID]) and
([Lookup].[LookupKey] =1 )
or
([Data].[Lookup2]=[Lookup].[DataID]) and
([Lookup].[LookupKey] =1 )
)
order by [Data].[DataID]

SOLUTION2: Use distinct:

select
distinct [Data].[DataID] ,
[Data].[Lookup1] ,
[Data].[Lookup2]
from [Data]
left join [Lookup]
on [Data].[Lookup1] = [Lookup].[DataID]
or [Data].[Lookup2] = [Lookup].[DataID]
where [Lookup].[LookupKey] = 1
order by [Data].[DataID]

****** CAUSES ****************

1.Why does the original SELECT returns duplicated rows?

In fact, those duplicated rows are different rows. We "SELECT"ed the first
three columns in the joined table and the data in those columns are exactly
same. If we SELECT four columns [Data].[DataID], [Data].[Lookup1],
[Data].[Lookup2], [Lookup].[LookupID], the following result will return:
1 1 2 1
1 1 2 2
2 1 3 1
,where you will see
1 1 2 1 and 1 1 2 2 are different rows

2.Why do the above two approaches work fine?

Both the "EXISTS" and "DISTINCT" keywords will use a semi-join implicitly.
A "semi-join" between two tables returns rows from the first table where
one or more matches are found in the second table. The difference between a
semi-join and a conventional join is that rows in the first table will be
returned at most once. Even if the second table contains two matches for a
row in the first table, only one copy of the row will be returned.

3.What about the performance of these three SQL statements?

To explorer the performance of these SQL statements, we can use the
Execution Plan command in SQL Server Management Studio. We can access it
from the Menu Query->Display Estimated Execution Plan. It will generate a
picture to show us how SQL executes the statement. With the generated
picture, we can see the two sub queries in

select
[Data].[DataID] ,
[Data].[Lookup1] ,
[Data].[Lookup2]
from [Data]
where
[Data].[Lookup1] in (select [DataID] from [Lookup] where [LookupKey]
= 1 )
or
[Data].[Lookup2] in (select [DataID] from [Lookup] where [LookupKey]
= 1 )
order by [Data].[DataID]
DataID LookUp1 Lookup2

makes extra cost when compared with the "EXISTS" and the "DISTINCT"
approaches. Using "EXISTS" or "DISTINCT" gets a better performance.

Gerry, please try my suggestions and tell me whether they are helpful to
you. If you have any future concerns or questions, please DON'T hesitate to
let me know. I will do my best to help.

Best regards,
Ji Zhou ([email protected], remove 'online.')
Microsoft Online Community Support

Delighting our customers is our #1 priority. We welcome your comments and
suggestions about how we can improve the support we provide to you. Please
feel free to let my manager know what you think of the level of service
provided. You can send feedback directly to my manager at:
(e-mail address removed).

==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/en-us/subscriptions/aa948868.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://support.microsoft.com/select/default.aspx?target=assistance&ln=en-us.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
Thanks

I knew why I was getting duplicate rows - I was just stumped as to how to
avoid them.
DISTINCT did the trick.




"Ji Zhou [MSFT]" said:
Good morning Gerry, thanks for using Microsoft Newsgroup Support Service!
My name is Ji Zhou [MSFT] and I will be working on this issue with you.

In order NOT to get the duplicated rows in the linked table, we need to
use
semi-join instead of inner-join. There are two ways to use semi-join. One
is using the EXISTS keyword and the other is the DISTINCT keyword. I post
them as follows and will give a detailed explanation later:

****** POSSIBLE SOLUTIONS ******

SOLUTION1: Use exists:

select
[Data].[DataID] ,
[Data].[Lookup1] ,
[Data].[Lookup2]
from [Data]
where exists
(select * from [Lookup]
where ([Data].[Lookup1]=[Lookup].[DataID]) and
([Lookup].[LookupKey] =1 )
or
([Data].[Lookup2]=[Lookup].[DataID]) and
([Lookup].[LookupKey] =1 )
)
order by [Data].[DataID]

SOLUTION2: Use distinct:

select
distinct [Data].[DataID] ,
[Data].[Lookup1] ,
[Data].[Lookup2]
from [Data]
left join [Lookup]
on [Data].[Lookup1] = [Lookup].[DataID]
or [Data].[Lookup2] = [Lookup].[DataID]
where [Lookup].[LookupKey] = 1
order by [Data].[DataID]

****** CAUSES ****************

1.Why does the original SELECT returns duplicated rows?

In fact, those duplicated rows are different rows. We "SELECT"ed the first
three columns in the joined table and the data in those columns are
exactly
same. If we SELECT four columns [Data].[DataID], [Data].[Lookup1],
[Data].[Lookup2], [Lookup].[LookupID], the following result will return:
1 1 2 1
1 1 2 2
2 1 3 1
,where you will see
1 1 2 1 and 1 1 2 2 are different rows

2.Why do the above two approaches work fine?

Both the "EXISTS" and "DISTINCT" keywords will use a semi-join implicitly.
A "semi-join" between two tables returns rows from the first table where
one or more matches are found in the second table. The difference between
a
semi-join and a conventional join is that rows in the first table will be
returned at most once. Even if the second table contains two matches for a
row in the first table, only one copy of the row will be returned.

3.What about the performance of these three SQL statements?

To explorer the performance of these SQL statements, we can use the
Execution Plan command in SQL Server Management Studio. We can access it
from the Menu Query->Display Estimated Execution Plan. It will generate a
picture to show us how SQL executes the statement. With the generated
picture, we can see the two sub queries in

select
[Data].[DataID] ,
[Data].[Lookup1] ,
[Data].[Lookup2]
from [Data]
where
[Data].[Lookup1] in (select [DataID] from [Lookup] where [LookupKey]
= 1 )
or
[Data].[Lookup2] in (select [DataID] from [Lookup] where [LookupKey]
= 1 )
order by [Data].[DataID]
DataID LookUp1 Lookup2

makes extra cost when compared with the "EXISTS" and the "DISTINCT"
approaches. Using "EXISTS" or "DISTINCT" gets a better performance.

Gerry, please try my suggestions and tell me whether they are helpful to
you. If you have any future concerns or questions, please DON'T hesitate
to
let me know. I will do my best to help.

Best regards,
Ji Zhou ([email protected], remove 'online.')
Microsoft Online Community Support

Delighting our customers is our #1 priority. We welcome your comments and
suggestions about how we can improve the support we provide to you. Please
feel free to let my manager know what you think of the level of service
provided. You can send feedback directly to my manager at:
(e-mail address removed).

==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/en-us/subscriptions/aa948868.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://support.microsoft.com/select/default.aspx?target=assistance&ln=en-us.
==================================================
This posting is provided "AS IS" with no warranties, and confers no
rights.
 
Back
Top