Dear Luke:
To make sure everyone understands, the original recommendation is to
use UNION ALL to remove those duplicates. That's usually fine, but
there's extra processing needed to do this.
To understand the duplicating, first let's understand where it comes
from.
There are 3 cases:
- rows in set A that have related rows in set B (the ones that would
show up in an inner join)
- rows in A not in B
- rows in B not in A
The first of these cases are the ones that show up as duplicates in a
UNION, and would be removed in a UNION ALL.
To take advantage of the better performance of the UNION, you can
simply filter out these rows from ONE of the two joins.
Assuming the two tables are joined on a common column "id" then do
this:
SELECT * FROM A LEFT JOIN B ON B.id = A.id
UNION ALL
SELECT * FROM A RIGHT JOIN B ON B.id = A.id
WHERE A.id IS NULL
The last line filters out all those common lines, allowing you to use
UNION ALL instead and making it run faster (usually - the filter takes
a little time)
Does this make sense? If you have a significant amount of data, you
can make a comparison with:
SELECT * FROM A LEFT JOIN B ON B.id = A.id
UNION
SELECT * FROM A RIGHT JOIN B ON B.id = A.id
If anybody does this, please get back with your performance results.
Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
Thanks Tom, makes sense.
I understand that the UNION of the LEFT JOIN and RIGHT JOIN would cause
duplicates. You say you remove the rows from the RIGHT JOIN that was cause
duplicates
that exist in the LEFT JOIN below. How do you achieve this?
Thanks again