Bizarre slow query problem (again)

  • Thread starter Thread starter wizofaus
  • Start date Start date
'bout 2 million records, 9 columns.
Quite possibly - unfortunately I can't re-create the problem now to
test it!
Will be the first thing I try if I see the same problem again, though.

Could probably reproduce it - insert a 1,000,000 rows = 1, then a
handfull of rows numbered 2-10. Query it for =1, it will scan, then
query it for =2. Or something like that.

J.
 
JXStern said:
Could probably reproduce it - insert a 1,000,000 rows = 1, then a
handfull of rows numbered 2-10. Query it for =1, it will scan, then
query it for =2. Or something like that.
Not sure, but now I have another issue which does seem to point towards
parameters being a problem.

If I run a query with quite a large number of parameters which
basically

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN (@13, @14, @15...@20)

it takes over 2 seconds, but when I substitute the last 8 parameters
(13-20) with their values, i.e.

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN ('value1', 'value2', 'value3', 'value4'...'value8')

the query takes 15 milliseconds! (actually it took slightly longer -
about 400msec - on the first run - the previous query always takes ~2
seconds no matter how often I run it).

Curiously, the fewer parameters I substitute for the Key2 filter, the
slower it runs (I can't improve on the 15 msec by doing the same for
they Key1 filter), in a more or less linear relationship.

I've tried all the previously mentioned "tricks", including reindexing
the table, but no luck.

So it seems maybe SQL server has problems optimizing if there are too
many parameters.
FWIW, I did try putting the parameter values in a temporary table and
even using a join, but the total time is significantly longer than
15ms.

Again, the parameter values are under my control, so there's no risk of
SQL injection, so if there's another good reason NOT to do my own
parameter substitution in this case, I'd be interested in hearing it.
 
Not sure, but now I have another issue which does seem to point
towards parameters being a problem.

If I run a query with quite a large number of parameters which
basically

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN (@13, @14, @15...@20)

it takes over 2 seconds, but when I substitute the last 8 parameters
(13-20) with their values, i.e.

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN ('value1', 'value2', 'value3', 'value4'...'value8')

the query takes 15 milliseconds! (actually it took slightly longer -
about 400msec - on the first run - the previous query always takes ~2
seconds no matter how often I run it).

Now you've got me really intrigued. Based on everything I've ever read, IN
comparisons are supposed to be non-sargable, and therefore non-optimizable.
Someone from the SQL Server groups please correct me if I am wrong. Oh wait,
maybe you have a sufficient number of values to cause the query engine to
use a temp table and join, in which case optimization can occur.

I suggest you use SQL Profiler to determne the difference in the execution
plans for each method. If the fast query plan involves an index that is not
used in the slow query plan, you can use an index hint to force the query
engine to use that index
(http://www.sql-server-performance.com/transact_sql_where.asp)
Curiously, the fewer parameters I substitute for the Key2 filter, the
slower it runs (I can't improve on the 15 msec by doing the same for
they Key1 filter), in a more or less linear relationship.

I've tried all the previously mentioned "tricks", including reindexing
the table, but no luck.

What about the trick that involves using local variables in your batch,
instead of directly using the parameters:

declare @tmp1, @tmp2, etc.
set @tmp1=@parm1
etc.
SELECT ...(@tmp1, ...)

Personally, I would do this in a stored procedure, but you have ignored
previous advice to use stored procedures.
So it seems maybe SQL server has problems optimizing if there are too
many parameters.
FWIW, I did try putting the parameter values in a temporary table and
even using a join, but the total time is significantly longer than
15ms.

ISTR reading somewhere that with a sufficient number of values, that the
query engine does this anyways behind the scenes.
Again, the parameter values are under my control, so there's no risk
of SQL injection,

I always cringe when I hear somebody say this. Unless those values are
hard-coded into your application code, you have to be getting those values
from somewhere. If a user was involved at any point in the process that
generates those values, then you need to at least entertain the possibility
that some funny business may have occurred. Look in these articles for
"secondary sql injection". It is always a mistake to assume that your user
base is too ignorant to take advantage of these techniques:
http://mvp.unixwiz.net/techtips/sql-injection.html
http://www.sqlsecurity.com/DesktopDefault.aspx?tabid=23
http://www.nextgenss.com/papers/advanced_sql_injection.pdf
http://www.nextgenss.com/papers/more_advanced_sql_injection.pdf

The bottom line may turn out to be that you need to choose between secure
and fast.
 
Not sure, but now I have another issue which does seem to point towards
parameters being a problem.

If I run a query with quite a large number of parameters which
basically

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN (@13, @14, @15...@20)

it takes over 2 seconds, but when I substitute the last 8 parameters
(13-20) with their values, i.e.

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN ('value1', 'value2', 'value3', 'value4'...'value8')

the query takes 15 milliseconds! (actually it took slightly longer -
about 400msec - on the first run - the previous query always takes ~2
seconds no matter how often I run it).

Indeed, that is because with literals will really compile the statement
based on the actual values. The optimizer will build a kind of binary
tree. It will also remove any duplicates (when applicable). So the
execution phase will be very fast.

On the other hand, the compilation phase is relatively expensive. And if
you have hundreds of values, the optimizer start to choke.
Curiously, the fewer parameters I substitute for the Key2 filter, the
slower it runs (I can't improve on the 15 msec by doing the same for
they Key1 filter), in a more or less linear relationship.

Explanation: see above
I've tried all the previously mentioned "tricks", including reindexing
the table, but no luck.

So it seems maybe SQL server has problems optimizing if there are too
many parameters.
FWIW, I did try putting the parameter values in a temporary table and
even using a join, but the total time is significantly longer than
15ms.

It is hard to optimizer a scenario like yours. You might try something
like this:

SELECT SUM(cnt)
FROM (
SELECT COUNT(*) AS cnt FROM MyTable WHERE Key1=@0
UNION ALL
SELECT COUNT(*) AS cnt FROM MyTable WHERE Key1=@1
UNION ALL
...
) AS T

Although the optimizer might automatically come up with a query plan
that reflects this strategy, I doubt that it actually will.

Please let me know if it actually increases your query performance.

HTH,
Gert-Jan
 
Bob said:
Now you've got me really intrigued. Based on everything I've ever read, IN
comparisons are supposed to be non-sargable, and therefore non-optimizable.
Someone from the SQL Server groups please correct me if I am wrong. Oh wait,
maybe you have a sufficient number of values to cause the query engine to
use a temp table and join, in which case optimization can occur.

But that's the thing - I tried doing that explicitly myself, and it's
considerably slower.
I suggest you use SQL Profiler to determne the difference in the execution
plans for each method. If the fast query plan involves an index that is not
used in the slow query plan, you can use an index hint to force the query
engine to use that index
(http://www.sql-server-performance.com/transact_sql_where.asp)


What about the trick that involves using local variables in your batch,
instead of directly using the parameters:

declare @tmp1, @tmp2, etc.
set @tmp1=@parm1
etc.
SELECT ...(@tmp1, ...)

Personally, I would do this in a stored procedure, but you have ignored
previous advice to use stored procedures.

Actually I tried stored procs as well, and it didn't seem to be
helping. Also, can you write a stored proc to take a variable number
of parameters?
ISTR reading somewhere that with a sufficient number of values, that the
query engine does this anyways behind the scenes.


I always cringe when I hear somebody say this. Unless those values are
hard-coded into your application code, you have to be getting those values
from somewhere. If a user was involved at any point in the process that
generates those values, then you need to at least entertain the possibility
that some funny business may have occurred.

Perhaps, but in this case they are auto-generated - the user has no
control over what the actual values are (only how many there are).

At any rate, if the only thing to take into consideration here is
performance vs security, then I'm afraid performance does win. No-one
will use the application if it's a dog. And as it happens, the
database doesn't hold particularly sensitive (or irreplaceable) data
anyway.
 
Bob said:
Now you've got me really intrigued. Based on everything I've ever read,
IN comparisons are supposed to be non-sargable, and therefore
non-optimizable. Someone from the SQL Server groups please correct me if
I am wrong.

coi IN (val1, val2, ...)

is just a shortcut for

col1 = val1 OR col2 = val2 OR ...

and it's perfectly possible for the optimizer to work with IN to produce a
good plan. The main problem is that for many values, the optimization time
can exceed the real execution time by far.


--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
 
Back
Top