WHERE clause applies to right-hand table of LEFT JOIN

  • Thread starter Thread starter Joseph Geretz
  • Start date Start date
J

Joseph Geretz

I came across a performance problem recently in my application. I'm not
using Access, the DB is Interbase, but I tracked down the problem to a
specific query. This may not be a vendor specific issue at all and if not,
perhaps you can help with this.

I have a Table, let's call it T1 which has a one to many relationship with a
secondary table, let's call it T2. Within the context of my application, the
join between these two is so common that I've created a VIEW for it:

SELECT T1.T1A,
T1.T1B,
T2.T2A,
T2.T2B
FROM T1 LEFT JOIN T2 ...

Now, at various points in my application I'm selecting * from this VIEW.
Sometimes I'm selecting WHERE T1B = 'blah'. This yields excellent
performance.

Other times though, I'm selecting WHERE T2B = 'yada'. In this case,
performance is abysmal. Now this is interesting. The field for my filter
criteria is from the right hand table of a LEFT JOIN. Now LEFT JOIN
specifies 'All the records from the table on the left and any matching
records from table on the right'. This being the case, it seems that if the
WHERE clause relates to the table on the right, ALL records from the table
T1 are gathered. Then ALL records from the table on the right are gathered.
Then from the resulting recordset, rows which don't fit the WHERE criteria
are eliminated. This explains the abysmal performance.

(But I'd have thought the plan optimizer would see the WHERE criteria
pertains to the table on the right, and use that table as the launching
point for execution.)

So I switched T1 and T2 around and placed T2 on the left. This immediately
improved performance in the problem area, but had the effect of slowing down
the other areas of the application which were previously performing
beautifully.

Ultimately, the only way I could see to address this is by creating two
VIEWS, one with T1 on the left, and one with T2 on the left, and taking care
to select from the appropriate VIEW depending on the WHERE filtering
criteria. Is there a better way of doing this with only one VIEW?

(BTW, I did try an INNER JOIN, but this resulted in mediocre performance in
all areas.)

Have you ever encountered this type of scenario? If so, how did you address
this?

Thanks for any advice which you can provide.

- Joe Geretz -
 
This may not be a vendor specific issue at all and if not,
perhaps you can help with this.

Your question is specific to Interbase but I'll try to shed some light on
you problem.
So I switched T1 and T2 around and placed T2 on the left. This immediately
improved performance in the problem area, but had the effect of slowing down
the other areas of the application which were previously performing
beautifully.

The order in which tables are specified in an OUTER JOIN is significant so
you are asking for potentially different results. The order does not matter
with an INNER JOIN but that is also a different query. It doesn't surprise
me that you different performance with these fundamentally different
queries.

Generally speaking, one addresses performance issues by creating useful
indexes. In SQL, you are describing the desired result. It's up the
optimizer to figure out the best way to obtain it based on available
indexes, statistics and search algorithms.
 
Hi Dan,

Thanks for your reply. A couple comments inline.
The order in which tables are specified in an OUTER JOIN is significant so
you are asking for potentially different results.

Yes, this is often true. However, in my case T2 is contrained by a foreign
key relationship to T1. This means that every record in T1 must have one or
more records in T2. This being the case, the exact same result set is
returned regardless of whether T1 or T2 is on the LEFT of this OUTER join.
(And an INNER JOIN would also yield the same result set.)
The order does not matter
with an INNER JOIN but that is also a different query. It doesn't surprise
me that you different performance with these fundamentally different
queries.
Generally speaking, one addresses performance issues by creating useful
indexes. In SQL, you are describing the desired result. It's up the
optimizer to figure out the best way to obtain it based on available
indexes, statistics and search algorithms.

This is what surprised me. All columns used in the WHERE clause of any of my
SELECT statements are indexed. It really threw me for a loop when some of my
SELECTS started really bogging down, until I found the common denominator
that all poorly performing queries were filtering on columns from the
right-hand table of the LEFT OUTER JOIN. I was wondering if this is typical.
It's really abysmal on the part of the Interbase execution planner. If it
would, in this case, select from the table on the right first, according to
the WHERE criteria, it would immediately narrow down the qualifying records
to about half a dozen, from a total set of thousands.

Since we're currently migrating to SQL Server, I suppose I should use the
query analyzer to see how this would behave against SQL Server. (Since I
made the change which I mentioned for Interbase, the issue is no longer
pressing.) If I get a chance to analyze this further I'll report back to the
group.

Thanks,

- Joe Geretz -
 
Yes, this is often true. However, in my case T2 is contrained by a foreign
key relationship to T1. This means that every record in T1 must have one or
more records in T2. This being the case, the exact same result set is
returned regardless of whether T1 or T2 is on the LEFT of this OUTER join.
(And an INNER JOIN would also yield the same result set.)

Which of course begs the question why you do an outer join in the first
place? All you do is limit the possibilities that the optimizer has to
optimize the query. And make the person after you who is reading the query
really confused.
 
Yes, this is often true. However, in my case T2 is contrained by a foreign
key relationship to T1. This means that every record in T1 must have one or
more records in T2. This being the case, the exact same result set is
returned regardless of whether T1 or T2 is on the LEFT of this OUTER join.
(And an INNER JOIN would also yield the same result set.)

But you may get different results with different data as demonstrated by the
example below. Like Tibor mentioned, it's always best to specify INNER JOIN
unless you have a reason to do otherwise.

CREATE TABLE T1
(
T1_PK int NOT NULL
CONSTRAINT PK_T1 PRIMARY KEY,
T1A int NOT NULL,
T1B int
)

CREATE TABLE T2
(
T2_PK int NOT NULL
CONSTRAINT PK_T2 PRIMARY KEY,
T2A int NULL
CONSTRAINT FK_T2_T1 FOREIGN KEY
(T2A) REFERENCES T1(T1_PK),
T2B int
)

CREATE INDEX T2_T2A ON T2(T2A)
CREATE INDEX T2_T2B ON T2(T2B)

INSERT INTO T1 VALUES(1,1,1)
INSERT INTO T1 VALUES(2,2,2)
INSERT INTO T2 VALUES(1,1,1)

SELECT T1.T1A,
T1.T1B,
T2.T2A,
T2.T2B
FROM T1
LEFT JOIN T2 ON T2.T2A = T1.T1A

SELECT T1.T1A,
T1.T1B,
T2.T2A,
T2.T2B
FROM T1
INNER JOIN T2 ON T2.T2A = T1.T1A
Since we're currently migrating to SQL Server, I suppose I should use the
query analyzer to see how this would behave against SQL Server. (Since I
made the change which I mentioned for Interbase, the issue is no longer
pressing.) If I get a chance to analyze this further I'll report back to the
group.

The SQL Server cost-based optimizer is very good at generating efficient
execution plans in most cases. I believe you'll be pleased with the
performance, especially if you formulate your queries properly (e.g. use
INNER JOIN) and create appropriate indexes.
 
But you may get different results with different data as demonstrated by
the
example below. Like Tibor mentioned, it's always best to specify INNER JOIN
unless you have a reason to do otherwise.

Unfortunately, I don't have the best analysis and monitoring tools for
Interbase. But I do have Windows Task Manager. Here's what I've observed:

Single database accessor scenario, monitoring IBServer.exe:

With the LEFT OUTER JOIN, selecting from the VIEW based on a filter from the
LEFT table, yields results with 2% CPU consumption. - GOOD

With the LEFT OUTER JOIN, selecting from the VIEW based on a filter from the
RIGHT table, yields results with 50-60% CPU consumption. - ABYSMAL

With the INNER JOIN, selecting from the VIEW based on a filter from EITHER
table, yields results with 15-20% CPU consumption. - FAIR

Since (In the Interbase environment at least) the LEFT OUTER JOIN performs
so much better, as long as the WHERE clause addresses columns from the left
hand table, I've arranged my VIEWs to accommodate this.
INSERT INTO T1 VALUES(1,1,1)
INSERT INTO T1 VALUES(2,2,2)
INSERT INTO T2 VALUES(1,1,1)

Your example is technically correct, but in my case this doesn't happen.
Every record in T1 must have at least one and possibly more corresponding
records in T2. (If I get a record in T1 without a corresponding match in T2,
then I have bigger problems on the data input side.)

But your points are well taken. Once we get sqarely into SQL Server, I'll
take another look at this.

Thanks to both of you for your advice,

- Joe Geretz -
 
It's really abysmal on the part of the Interbase execution planner.

Well, there are two views on that as well :)

My first experience of an Oracle database was that it took literally hours
to do simple queries that Jet would have handled in minutes. Obviously,
that engine was less good than Jet at optimising simple ad-hoc queries.

On the other hand, it was widely believed that Oracle could do a good job on
very large datasets if properly optimised by a DBA: and you can never be
sure (from one compact to the next) which execution plan Jet will choose,
and if it will be better or worse.


It used to be common to require the services of a DBA to tune a large
database.

MS has apparently put a lot of work into making SQL Server self-tuning, but
no-one has said that the result is BETTER than that achieved by a good DBA:
just cheaper and easier.

(david)
 
Back
Top