Hello Dave (Brian?)
Just out of curiosity, I created some queries in QA that are similar to
yours in construct. After looking at the execution plans resulting from
these variations I found out the following:
I ran an execution plan on this query:
Select FirstName from persons
where PersonID not In
(select personid from v_personaddresses
where statecode in ('mo','il'))
It is identical in construct to your query below just different tables
What the plan showed was that 47% of the query cost was in the sub select
and the remaining 53% of the query cost was in the top query. Ok, does not
mean much yet. But then I did an execution plan on this query:
Select FirstName from persons
where PersonID In
(select personid from v_personaddresses
where statecode in ('mo','il'))
Exact same query but without the 'NOT' statement and the cost was 69% in the
subquery and 31% in the upper query. Now this shows why the query without
the 'NOT' statement is faster. In both cases the SubQuery takes exactly the
same time because they are identical. But the 'NOT' statement added a
significant shift in the cost balance to the upper query. 22 percentage
points of cost shifted to the upper query because of the NOT statement which
can be roughly translated into a 22% increase in time since the subquery's
performance is not affected in either case.
Now there is another way to do what you want with the same results:
Select FirstName from persons
where PersonID In
(select personid from v_personaddresses
where statecode not in ('mo','il'))
Here the NOT is shifted to the where clause of the subquery. What is the
affect? 75% of the cost is now in the Subquery and only 25% of the cost is
in the upper query. I don't at the moment have access to a database large
enough for me to do any real time trials. But this latter query looks like
it will perform better than your original. Like query 2 above it is built
of two nested loop/inner joins. But has an aggregate stream instead of a
Distinct Sort to join them. The aggregate stream is much faster than the
distinct sort. All things taken into account, query 3 would give you near
the same performance of query2 and give you the rowset result you are
looking for.
Your original query has a merge join/right anti semi join instead of a
nested loop/inner join in the top query. Very expensive.
I am guessing that putting the not statement in the sub-query will show some
improvement
Oh Yeah, I am assuming this is a SQL-Server environment. If it's oracle,
well...check your execution plans using oracle tools.
Ibrahim
dave said:
Hello -
Im looking to improve the performance of the following SQL query:
SELECT name_first AS [First], name_last AS [Last], org_name AS
[Organization], person_ID as [ID]
FROM tblPerson
WHERE ( person_ID NOT IN (SELECT person_ID from
tblPersonServiceCodes WHERE service_code_ID IN ('SC1')));
Any suggestions are greatly appreciated. Note, when I remove the
"NOT" from the statement, the performance is fine.
Thanks,
Brian