SQL performance: Nested SELECT vs. INNER JOIN

  • Thread starter Thread starter Brian
  • Start date Start date
B

Brian

Hello All -

I am wondering if anyone has any thoughts on which is better from a
performance perspective: a nested Select statement or an Inner Join.


For example, I could do either of the following:

SELECT supplier_name
FROM supplier
WHERE supplier_ID IN (SELECT supplier_ID FROM products WHERE
product_ID =22);

VS.

SELECT supplier.supplier_name
FROM supplier INNER JOIN products ON supplier.supplier_ID =
products.supplier_ID
WHERE products.product_ID = 22;


Which is faster?

Thanks.
Brian
 
On 25 Sep 2003 11:37:05 -0700 in comp.databases.ms-access,
Hello All -

I am wondering if anyone has any thoughts on which is better from a
performance perspective: a nested Select statement or an Inner Join.


For example, I could do either of the following:

(air code below, be sure to post your findings :)

Dim strSQL(1 to 2) As String
Dim i as long, j as long
Dim varStart As variant
Dim rs As Recordset
Dim db As Database

strSQL(1)="SELECT supplier_name " & _
"FROM supplier " & _
"WHERE supplier_ID IN (SELECT supplier_ID FROM products WHERE " & _
"product_ID =22);"

strSQL(2)="SELECT supplier.supplier_name " & _
"FROM supplier INNER JOIN products ON supplier.supplier_ID = " & _
"products.supplier_ID " & _
"WHERE products.product_ID = 22; "

set db = currentdb
For i = 1 to 2
varStart=now()
for j=1 to 1000
set rs=db.openrecordset(strSQL(i),dbopensnapshot)
rs.close
set rs=nothing
next j
debug.print "Method " & i & " " & DateDiff("s",varStart,Now())
Next i
set db=nothing
 
Brian said:
Hello All -

I am wondering if anyone has any thoughts on which is better from a
performance perspective: a nested Select statement or an Inner Join.


For example, I could do either of the following:

SELECT supplier_name
FROM supplier
WHERE supplier_ID IN (SELECT supplier_ID FROM products WHERE
product_ID =22);

VS.

SELECT supplier.supplier_name
FROM supplier INNER JOIN products ON supplier.supplier_ID =
products.supplier_ID
WHERE products.product_ID = 22;


Which is faster?


Try them and see! A clever optimizer should produce identical query plans.
However, a mistake that I've seen very often is where the subquery generates
multiple rows for each supplier, (for example, if you wanted to see all
suppliers that sold a certain product type). In this case query 1 is
superior, because the query processor can stop searching for rows as soon as
it finds the first one. <rant>Some beginners would use query 2 and put a
DISTINCT clause in to eliminate duplicates. Obviously, this is a bad, bad
thing to do, yet I've seen it done many times. Not only by beginners, but by
supposedly experienced developers.</rant>

BTW, I would, in any case, use EXISTS instead of IN, so query 1 would
become:

SELECT s.supplier_name
FROM supplier AS s
WHERE EXISTS
(
SELECT * FROM products AS p
WHERE p.product_ID =22
AND p.Supplier_ID = s.Supplier_ID
)

but, as I said, query 2 would be just as good.
 
Just to add to what I was syaing earlier, these two examples both use the
Northwind database to find suppliers that sell seafood products.

The right way
-------------------------------------
select s.CompanyName
from Suppliers as s
where exists
(
select * from products p
inner join categories c on p.CategoryID = c.CategoryID
where c.CategoryName = "Seafood"
and p.SupplierID=s.SupplierID
)
----------------------------------------


The wrong way
------------------------------------------------
SELECT DISTINCT Suppliers.CompanyName
FROM Suppliers INNER JOIN (Categories INNER JOIN Products ON
Categories.CategoryID = Products.CategoryID) ON Suppliers.SupplierID =
Products.SupplierID
WHERE (((Categories.CategoryName)="Seafood"))
------------------------------------------------
 
John,

Just to add to what I was syaing earlier, these two examples both use the
Northwind database to find suppliers that sell seafood products.

The right way
-------------------------------------
select s.CompanyName
from Suppliers as s
where exists
(
select * from products p
inner join categories c on p.CategoryID = c.CategoryID
where c.CategoryName = "Seafood"
and p.SupplierID=s.SupplierID
)
----------------------------------------


The wrong way
------------------------------------------------
SELECT DISTINCT Suppliers.CompanyName
FROM Suppliers INNER JOIN (Categories INNER JOIN Products ON
Categories.CategoryID = Products.CategoryID) ON Suppliers.SupplierID =
Products.SupplierID
WHERE (((Categories.CategoryName)="Seafood"))
------------------------------------------------

I find it very interesting that you would say this.

I thought it was well understood and accepted that Jet is very good at
optimizing multi-table sql statements that utilize joins, and very
poor at handling subqueries. I would kindly suggest that you test
your two example queries, because I would hazard a guess that you'll
find that the performance is the exact opposite of what you expect.

Of course, with rdbms' more generally, subqueries are fine and often
preferable in situations like this. But its not that way with Jet.

Peter Miller
____________________________________________________________
PK Solutions -- Data Recovery for Microsoft Access/Jet/SQL
Free quotes, Guaranteed lowest prices and best results
www.pksolutions.com 1.800.987.7716 1.619.839.3900
 
Peter Miller said:
John,



I find it very interesting that you would say this.

I thought it was well understood and accepted that Jet is very good at
optimizing multi-table sql statements that utilize joins, and very
poor at handling subqueries. I would kindly suggest that you test
your two example queries, because I would hazard a guess that you'll
find that the performance is the exact opposite of what you expect.

Of course, with rdbms' more generally, subqueries are fine and often
preferable in situations like this. But its not that way with Jet.


Right now I can only test in SQL Server - and it's as I said; query 1 has a
lower cost than query 2. I'd be very surprised if Jet was different. The
reason is because of how a select works, (or how it shold work). Joe Celko
has posted this several times - I'll see if I can find it if you like.

Basically, when you use SELECT DISTINCT...WHERE, (or a GROUP BY clause), the
query engine first builds a working table with all the rows satisfying the
WHERE condition. Only then can it aggregate them. This is more expensive. In
the first example, (using EXISTS), the processor only needs to test for the
existence of a single row. In other words, if you have 5000 seafood products
from the same supplier, all it needs to do is find one of them and the
EXISTS condition is satisfied. The processor can move on to the next
supplier.
 
Right now I can only test in SQL Server - and it's as I said; query 1 has a
lower cost than query 2. I'd be very surprised if Jet was different. The
reason is because of how a select works, (or how it shold work). Joe Celko
has posted this several times - I'll see if I can find it if you like.

Hey, Joe's the man, but your statement doesn't follow...

To wit:
I'd be very surprised if Jet was different.
Understood.

The reason is because of how a select works, (or how it shold work).

But that's precisely the point I was making. Jet is fine at joins,
and weak at subqueries. It's got nothing to do with what 'should' be
the case. It's been clear through the history of Jet that it is weak
in this area.
Basically, when you use SELECT DISTINCT...WHERE, (or a GROUP BY clause), the
query engine first builds a working table with all the rows satisfying the
WHERE condition. Only then can it aggregate them. This is more expensive. In
the first example, (using EXISTS), the processor only needs to test for the
existence of a single row. In other words, if you have 5000 seafood products
from the same supplier, all it needs to do is find one of them and the
EXISTS condition is satisfied. The processor can move on to the next
supplier.

....yes, all well understood. But What you'll find with jet is that
EXISTS clauses don't work well, because they rely on subqueries, and
subqueries are poorly handled by Jet.

For example, take your 'correct' example, and break it down again by
using another exists statement (ie, no joins at all, just a query with
a subquery and a nested subquery beneath the subquery.

It is my understanding that Jet actually works the full subquery (or
subqueries in this case) up from the bottom, and essentially does the
join implicitly only once the subquery has been processed (to the
extent possible). At the same time, DISTINCT is nicely optimized to
an effective TOP 1 / GROUP BY combo (ie, no processing occurs beyond
finding the first matching instance at each level). So you see little
penalty in Jet by using DISTINCT and a large penalty for using the
subquery.

As I'm sure Celko would agree, despite the obvious benefits of generic
sql that's 100% standards compliant, certain sql implementations will
have certain nuances that simply can't be ignored in real-world
database applications. Jet's poor handling of subqueries is a case in
point.

Peter Miller
____________________________________________________________
PK Solutions -- Data Recovery for Microsoft Access/Jet/SQL
Free quotes, Guaranteed lowest prices and best results
www.pksolutions.com 1.800.987.7716 1.619.839.3900
 
John,

Thank you VERY much. I was browsing some of the posts and
in reading your response I found the answer to a problem
query I have been working on. I am a novice Access/SQL
guy and your "Exists" statement in the WHERE clause is
EXACTLY what I needed.

*** didn't mean to hijack this post, but thanks ***

-dc
 
With all due respect to Joe Celko, he has never had
much time for JET...

(david)
 
Peter Miller said:
...yes, all well understood. But What you'll find with jet is that
EXISTS clauses don't work well, because they rely on subqueries, and
subqueries are poorly handled by Jet.

For example, take your 'correct' example, and break it down again by
using another exists statement (ie, no joins at all, just a query with
a subquery and a nested subquery beneath the subquery.

It is my understanding that Jet actually works the full subquery (or
subqueries in this case) up from the bottom, and essentially does the
join implicitly only once the subquery has been processed (to the
extent possible). At the same time, DISTINCT is nicely optimized to
an effective TOP 1 / GROUP BY combo (ie, no processing occurs beyond
finding the first matching instance at each level). So you see little
penalty in Jet by using DISTINCT and a large penalty for using the
subquery.

Peter, you're absolutely right. I tested this and there's no difference
whatsoever in execution time for either query running against a Jet
database. I haven't looked at the query plans to see if they are the same.
I'm pretty amazed at this. If you run the two in SQL Server you'll see a
major improvement for query 1.

Live and learn. Thanks for the correction Pater, and the explanation. That's
twice in one day I've made a claim that's incorrect, so now it's time for me
to go and have a beer.
 
david epsom dot com dot au said:
With all due respect to Joe Celko, he has never had
much time for JET...



True. This may be one of the reasons why!

I had just assumed, since the designers of Jet had chosen to implement the
exists clause, that they would have optimized it as well, (as do Oracle, DB2
and SQL Server). Otehrwise what's the point? Oh well, as I said to Peter,
live and learn.
 
I tested this and there's no difference
whatsoever in execution time for either query running against a Jet
database.

Hmm. I tested it here too, but found a 33% performance benefit using
query 2 over query 1. Did you iterate enough times to get meaningful
results? I found that using a 20,000 iteration loop on query 1
against the Northwind tables, I saw 29 seconds for query 1 and 19
seconds for query 2. If I broke query 1 down into two exists/subquery
clauses instead of just one and a join, the time required escalated to
38 seconds for 20,000 iterations.

In other words, there's very much a difference, and its decidedly
faster to join that to use subqueries.
I'm pretty amazed at this. If you run the two in SQL Server you'll see a
major improvement for query 1.

As you would with Oracle, DB2 or Sybase. Just not Jet.

Peter Miller
____________________________________________________________
PK Solutions -- Data Recovery for Microsoft Access/Jet/SQL
Free quotes, Guaranteed lowest prices and best results
www.pksolutions.com 1.800.987.7716 1.619.839.3900
 
I like to think that, at least originally, Jet was optimised
for naive users...


(david)
 
With all due respect to Joe Celko, he
has never had much time for JET...

Joe and I have crossed words (not swords) in the past, and then I decided he
was an "elitist" who didn't seem to have either much time or respect for
anything he considered a mere desktop database. He's certainly not the only
one in that category, of course.
 
Oh, and another point: Complex 'Nested Select' queries sometimes don't work
with tables linked to an ODBC datasource. The Jet queryplan builder fails
to build a valid queryplan.

(david)
 
Back
Top