SQL performance question

dave · Nov 2, 2003

Hello -

Im looking to improve the performance of the following SQL query:

SELECT name_first AS [First], name_last AS [Last], org_name AS
[Organization], person_ID as [ID]
FROM tblPerson
WHERE ( person_ID NOT IN (SELECT person_ID from
tblPersonServiceCodes WHERE service_code_ID IN ('SC1')));

Any suggestions are greatly appreciated. Note, when I remove the
"NOT" from the statement, the performance is fine.

Thanks,
Brian

William Ryan · Nov 2, 2003

Speeding up queries can take a few forms. 1) You can optimize syntax 2) YOu
can use indexing strategies. These aren't mutually exclusive, but I'm
guessing that you are referring to syntactic improvements.

In the subquery, do you always only have one value in the service_code_ID IN
value? If so, why not use =? I think using Exists instead of the First In
will speed it up too.

However, if performance is critical, I'd definitely look to use every trick
I have, including indexes. Not sure if it's already indexed...but if not,
you may want to add on. Another thing that may help, if person_ID isn't an
Int value, you may want to change it. I'm not sure what RDBMS you are
using, but Oracle and SQL Server can defintely resolve joins faster if you
have in values. If you are using SQL Server or Oracle, you probably will
want to run a trace as well as the Index Tuning Wizard and see what it tells
you.

In my experience, there are certianly things that you want to categorically
avoid, but in many instances, changes involve tradeoffs. Further, things
that work well under one context often become unmanageable under other ones.

HTH,

Bill

IbrahimMalluf · Nov 2, 2003

Hello Dave (Brian?)

Just out of curiosity, I created some queries in QA that are similar to
yours in construct. After looking at the execution plans resulting from
these variations I found out the following:

I ran an execution plan on this query:

Select FirstName from persons

where PersonID not In

(select personid from v_personaddresses

where statecode in ('mo','il'))

It is identical in construct to your query below just different tables

What the plan showed was that 47% of the query cost was in the sub select
and the remaining 53% of the query cost was in the top query. Ok, does not
mean much yet. But then I did an execution plan on this query:

Select FirstName from persons

where PersonID In

(select personid from v_personaddresses

where statecode in ('mo','il'))

Exact same query but without the 'NOT' statement and the cost was 69% in the
subquery and 31% in the upper query. Now this shows why the query without
the 'NOT' statement is faster. In both cases the SubQuery takes exactly the
same time because they are identical. But the 'NOT' statement added a
significant shift in the cost balance to the upper query. 22 percentage
points of cost shifted to the upper query because of the NOT statement which
can be roughly translated into a 22% increase in time since the subquery's
performance is not affected in either case.

Now there is another way to do what you want with the same results:

Select FirstName from persons

where PersonID In

(select personid from v_personaddresses

where statecode not in ('mo','il'))

Here the NOT is shifted to the where clause of the subquery. What is the
affect? 75% of the cost is now in the Subquery and only 25% of the cost is
in the upper query. I don't at the moment have access to a database large
enough for me to do any real time trials. But this latter query looks like
it will perform better than your original. Like query 2 above it is built
of two nested loop/inner joins. But has an aggregate stream instead of a
Distinct Sort to join them. The aggregate stream is much faster than the
distinct sort. All things taken into account, query 3 would give you near
the same performance of query2 and give you the rowset result you are
looking for.

Your original query has a merge join/right anti semi join instead of a
nested loop/inner join in the top query. Very expensive.

I am guessing that putting the not statement in the sub-query will show some
improvement

Oh Yeah, I am assuming this is a SQL-Server environment. If it's oracle,
well...check your execution plans using oracle tools.

Ibrahim

dave said:
Hello -

Im looking to improve the performance of the following SQL query:

SELECT name_first AS [First], name_last AS [Last], org_name AS
[Organization], person_ID as [ID]
FROM tblPerson
WHERE ( person_ID NOT IN (SELECT person_ID from
tblPersonServiceCodes WHERE service_code_ID IN ('SC1')));

Any suggestions are greatly appreciated. Note, when I remove the
"NOT" from the statement, the performance is fine.

Thanks,
Brian

Tom Leylan · Nov 2, 2003

IbrahimMalluf said:
Now there is another way to do what you want with the same results:

Select FirstName from persons
where PersonID In
(select personid from v_personaddresses
where statecode not in ('mo','il'))

You posted a terrific reply. I keep looking at the original though and I
think you've made an assumption that isn't necessarily the case. Let me
outline the difference and you can compare it yourself.

Trimming away the excess the original was basically as follows:

SELECT name_first, person_ID
FROM tblPerson
WHERE ( person_ID NOT IN (SELECT person_ID from
tblPersonServiceCodes WHERE service_code_ID = 'SC1' ))

If you assume that a person can have more than one service code (which I
did) then this yields all the people who don't have the SC1 service code at
all.

Yours yields all the people who have a service code other than SC1 as their
"only" service code. If they have both SC1 and SC2 your sub-query returns
the SC2 record and your top query matches that one.

If SC1 represented all the people who received a "mailing 1" and SC2
represented those that received "mailing 2" you would end up sending mailing
1 again to some people, excluding only those that "only" received mailing 1.

Is that how you see it?

Kalpesh Shah · Nov 2, 2003

dave said:
Hello -

Im looking to improve the performance of the following SQL query:

SELECT name_first AS [First], name_last AS [Last], org_name AS
[Organization], person_ID as [ID]
FROM tblPerson
WHERE ( person_ID NOT IN (SELECT person_ID from
tblPersonServiceCodes WHERE service_code_ID IN ('SC1')));

Any suggestions are greatly appreciated. Note, when I remove the
"NOT" from the statement, the performance is fine.

Thanks,
Brian

If there is 1 to 1 mapping between person & service code (1 person
having 1 service code record in tblPersonServiceCodes)

SELECT P.*
FROM tblPerson P , tblPersonServiceCodes PSC
WHERE PSC.person_ID = p.person_ID
AND PSC.service_code_ID <> 'SC1'

My suggestion is not to use, 'IN' clause unless it is a list in which
you are searching as against a single value. Although, internally it
might be getting translated intelligently

HTH
Kalpesh

Glenn Wilson · Nov 3, 2003

Why not just use a join, ?

SELECT p.name_first AS [First], p.name_last AS [Last],
p.org_name AS
[Organization], p.person_ID as [ID]
FROM tblPerson as p
join tblpersonservicecodes as sc
on p.personid = sc.personid
WHERE sc.service_code_ID NOT IN ('SC1');

SQL performance question

dave

William Ryan

IbrahimMalluf

Tom Leylan

Kalpesh Shah

Glenn Wilson