SQL: Finding Data That Is In One Table But Not The Other

  • Thread starter Thread starter David Kaye
  • Start date Start date
D

David Kaye

Okay, I need to find data that is in one table but not the other. One
method would be:

SELECT artist,title FROM table1 WHERE artist&title NOT IN (SELECT
artist&title FROM table2)

I'm sure this query would work if only it would finish! The problem is that
table1 has 40,000 records and table2 has 37,000 records. Both tables have
indices on artist and title, and I also added an index on artist&title on
each table. But it STILL takes forever to run. After half an hour I gave
up.

You may wonder why I combined artist and title in the query. It's because
the Jet(Access) engine tends to run faster when combining fields. I could
also run it with the fields separated, but it still takes forever and a day
to run.

I've also tried to JOIN the tables, and while it works very well with IN it
simply won't run in a decent amount of time when using NOT IN.

Does anybody have a query solution that will run fast or at least won't run
my CPU at 50% and never finish?
 
Okay, I need to find data that is in one table but not the other. One
method would be:

SELECT artist,title FROM table1 WHERE artist&title NOT IN (SELECT
artist&title FROM table2)

I'm sure this query would work if only it would finish! The problem is that
table1 has 40,000 records and table2 has 37,000 records. Both tables have
indices on artist and title, and I also added an index on artist&title on
each table. But it STILL takes forever to run. After half an hour I gave
up.

You may wonder why I combined artist and title in the query. It's because
the Jet(Access) engine tends to run faster when combining fields. I could
also run it with the fields separated, but it still takes forever and a day
to run.

I've also tried to JOIN the tables, and while it works very well with IN it
simply won't run in a decent amount of time when using NOT IN.

Does anybody have a query solution that will run fast or at least won't run
my CPU at 50% and never finish?

A "Frustrated Outer Join" query is going to be much faster than a NOT IN. And
I have to emphatically disagree with you on the combinining fields; if you
define a multifield Index on the Artist and Title fields this should be quite
efficient:

SELECT table1.Artist, table1.Title
FROM table1 LEFT JOIN table2
ON table1.Artist = table2.Artist AND table1.Title = table2.Title
WHERE table2.Artist IS NULL AND table2.Title IS NULL;

A 40000 row table is quite modest, I've worked with tables over a million rows
with no difficulty. Proper indexing can make an enormous difference! The fact
that you're doing a subquery on a calculated (concatenated) value and testing
it against another calculated value will not use any indexes and will force
40000 full table scans in the subquery, so I'm not at all surprised it's very
slow.
--

John W. Vinson [MVP]
Microsoft's replacements for these newsgroups:
http://social.msdn.microsoft.com/Forums/en-US/accessdev/
http://social.answers.microsoft.com/Forums/en-US/addbuz/
and see also http://www.utteraccess.com
 
On Mon, 28 Nov 2011 10:15:20 -0700, John W. Vinson

[snip]
A "Frustrated Outer Join" query is going to be much faster than a NOT IN. And
^^^^^^^^^^^^^^^^^^^^^
I have never heard of this term before. A search was not too
successful.
I have to emphatically disagree with you on the combinining fields; if you
define a multifield Index on the Artist and Title fields this should be quite
efficient:

SELECT table1.Artist, table1.Title
FROM table1 LEFT JOIN table2
ON table1.Artist = table2.Artist AND table1.Title = table2.Title
WHERE table2.Artist IS NULL AND table2.Title IS NULL;

This is very opaque to me. Could you please explain how it
works? I do not follow why the IS NULLs. A URL would be fine.

[snip]

Sincerely,

Gene Wirchenko
 
On Mon, 28 Nov 2011 10:15:20 -0700, John W. Vinson

[snip]
A "Frustrated Outer Join" query is going to be much faster than a NOT IN. And
^^^^^^^^^^^^^^^^^^^^^
I have never heard of this term before. A search was not too
successful.
I have to emphatically disagree with you on the combinining fields; if you
define a multifield Index on the Artist and Title fields this should be quite
efficient:

SELECT table1.Artist, table1.Title
FROM table1 LEFT JOIN table2
ON table1.Artist = table2.Artist AND table1.Title = table2.Title
WHERE table2.Artist IS NULL AND table2.Title IS NULL;

This is very opaque to me. Could you please explain how it
works? I do not follow why the IS NULLs. A URL would be fine.
Sorry... should have explained!

The LEFT JOIN will include all records from table1, and any matching records
from table2. For those rows where there is no match in table2, the fields from
table2 will be in the recordset but with NULL values.

So if you apply a criterion of IS NULL to the joining field(s) in table2, you
will display only those records which do NOT have a match.

Try creating a query using the "Unmatched Query Wizard" - you'll see that
Access uses this very technique.
--

John W. Vinson [MVP]
Microsoft's replacements for these newsgroups:
http://social.msdn.microsoft.com/Forums/en-US/accessdev/
http://social.answers.microsoft.com/Forums/en-US/addbuz/
and see also http://www.utteraccess.com
 
On Mon, 28 Nov 2011 13:25:46 -0700, John W. Vinson

[snip]
Sorry... should have explained!

The LEFT JOIN will include all records from table1, and any matching records
from table2. For those rows where there is no match in table2, the fields from
table2 will be in the recordset but with NULL values.

So if you apply a criterion of IS NULL to the joining field(s) in table2, you
will display only those records which do NOT have a match.

Try creating a query using the "Unmatched Query Wizard" - you'll see that
Access uses this very technique.

Thank you.

That was a nasty one. I got thrown off by the table2 columns not
being in the selected columns.

Sincerely,

Gene Wirchenko
 
Back
Top