Bizarre slow query problem (again)

  • Thread starter Thread starter wizofaus
  • Start date Start date
W

wizofaus

I previously posted about a problem where it seemed that changing the
case of the word "BY" in a SELECT query was causing it to run much much
faster.

Now I've hit the same thing again, where basically almost any change I
make to how the query is executed (so that it still performs the same
function) causes the performance to jump from a dismal 7 or 8 seconds
to instantaneous. It's a very simple query of the form:

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0

which was running fine until a moment ago, when it suddently started
running hopelessly slowly. If change anything in the query to
lowercase (or the Min to uppercase), it runs fine again. Last time
someone suggested something about a bad plan being cached, and after a
bit of research I found the commands DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE. Sure enough, after running these, the query started
running fine again. The question is

a) why is this happening? Is it a bug in my code, or in SQL server?
b) is it worth detecting it and fixing it automatically? I.e, should I
put some code in that notices that a query is running far too slowly,
then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that
cause other problems?

Thanks
 
Uri said:
Thanks for that...amazingly enough it turned that that was exactly my
problem, although I'm using ad-hoc queries rather than stored procs. I
did some more testing, and it turned out that it was because it was
executing the same query twice, the first time with an atypical
parameter value, and the second time with a more typical one, that the
query was running so slowly. That is, executing

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 999
followed by
SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 123

Caused the second query to run absurdly slowly, because in the first
case only very few rows in the table had MyKey = 999 whereas almost
every row had MyKey = 123. After doing a DBCC FREEPROCCACHE and
swapping the queries around, they both ran fine.

In the end I ended up de-parameterizing the query just for this case,
but now I'm worried - how can I be sure that my other queries won't
suffer from the same problem? Should I never use parameters because of
this possibility?
 
In the end I ended up de-parameterizing the query just for this case,
but now I'm worried - how can I be sure that my other queries won't
suffer from the same problem? Should I never use parameters because of
this possibility?
An ability using parameters is very powerful , don't afraid using parameters
, just test it carefuly
 
Uri said:
An ability using parameters is very powerful , don't afraid using parameters
, just test it carefuly
Sure, except that the content of the database is out of my control -
this particular scenario (where nearly all the records matched a
particular key, but the query was first run against a different key)
could easily arrise in a production environment. More to the point,
I've seen no evidence that I'm getting any performance benefits from
using parameterized queries.
I suspect I will at least add a configuration option to avoid
parameterized queries (relatively straightforward, as I have a layer of
code that handles query parameters) if I see a problem like this again.
 
"de-parameterizing"? You mean changing to dynamic sql and leaving yourself
vulnerable to sql injection??
Sure, except that the content of the database is out of my control -
this particular scenario (where nearly all the records matched a
particular key, but the query was first run against a different key)
could easily arrise in a production environment. More to the point,
I've seen no evidence that I'm getting any performance benefits from
using parameterized queries.
I suspect I will at least add a configuration option to avoid
parameterized queries (relatively straightforward, as I have a layer
of
code that handles query parameters) if I see a problem like this
again.

This is a ridiculous overreaction. Problems due to parameter-sniffing are
too rare to justify eliminating the benefits of using parameters. Talk about
"throwing the baby out with te bath water".

The article showed two, no three, ways to alleviate the problems caused by
parameter sniffing and still use parameters. So what do you do? ignore the
article's advice and "de-parameterize" your query...
 
a) why is this happening? Is it a bug in my code, or in SQL server?
b) is it worth detecting it and fixing it automatically? I.e, should I
put some code in that notices that a query is running far too slowly,
then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that
cause other problems?

Pardon me but

Why did you not say which version of SQL Server you are running on?
And why not mention the version of MDAC you have installed (since you posted
to microsoft.public.data.ado)?
And why not mention something about the Primary Keys & Indexes of the table
you are querying - I take it you have an index on MyKey?
And have you checked the documentation, whitepapers, MSDN for details on
query performance?
And have you checked the query plan in Query Analyser?

Cheers

Stephen Howe
 
be EXTREMELY wary when using DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE on a production machine. They clear all cached SPs,
queries and plans. The instance is bound to run under extreme stress
for a considerable amount of time.
 
be EXTREMELY wary when using DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE on a production machine. They clear all cached SPs,
queries and plans. The instance is bound to run under extreme stress
for a considerable amount of time.
The instance? You mean it affects all databases?
In this case, I determined I'd have to do it before every single query
call, so obviously that's not practical.
 
Sure, except that the content of the database is out of my control -
this particular scenario (where nearly all the records matched a
particular key, but the query was first run against a different key)
could easily arrise in a production environment. More to the point,
I've seen no evidence that I'm getting any performance benefits from
using parameterized queries.
I suspect I will at least add a configuration option to avoid
parameterized queries (relatively straightforward, as I have a layer of
code that handles query parameters) if I see a problem like this again.

Which version of SQL Server are you using? Here is a test that you can
try to see that you can do to actually test the benefit of
parameterised queries. First create this database:

CREATE DATABASE many_sps
go
USE many_sps
go
DECLARE @sql nvarchar(4000),
@x int
SELECT @x = 1000
WHILE @x > 0
BEGIN
SELECT @sql = 'CREATE PROCEDURE abc_' + ltrim(str(@x)) +
'_sp @orderid int AS
SELECT O.OrderID, O.OrderDate, O.CustomerID, C.CompanyName,
Prodcnt = OD.cnt, Totalsum = OD.total
FROM Northwind..Orders O
JOIN Northwind..Customers C ON O.CustomerID = C.CustomerID
JOIN (SELECT OrderID, cnt = COUNT(*), total = SUM(Quantity * UnitPrice)
FROM Northwind..[Order Details]
GROUP BY OrderID) AS OD ON OD.OrderID = O.OrderID
WHERE O.OrderID = @orderid'
EXEC(@sql)
SELECT @x = @x - 1
END

(Don't worry if you don't have Northwind on your server, you are not going
to run these procedures.)

Then use F7 to get the Summary page, and navigate to the Stored Procedures
node for many_sps. Select some 200 procedures, right-click and select
Script As Create To New Query Window. Go for a cup of coffee - this will
take some time depending on your hardware.

When the operation has completed (or you have gotten tired of waiting
and killed SSMS), issue this command:

ALTER DATABASE db SET PARAMETERIZATION FORCED

Redo the scripting operation. It will now complete in five seconds.

The reason for this is that SQL Server Management Studio does not use
parameterised queries. For every procedure it scripts, Mgmt Studio
issues around five queries. All these queries makes it to the
cache that explodes, and all these queries are compiled.

When you set a database to forced parameterisation, SQL Server will
auto-parameterise all statements (with some exceptions documented in
Books Online); normally it only auto-parameterise very simple queries.
In the case of Mgmt Studio it's reallly a go-faster switch.

So dismissing caching of parameterised queries can be a serious mistake.
But it is certainly true that there are situations where parameter
sniffing can be a problem. If it is possible for you to tell in the
appliocation "this is an odd value that needs a special plan", then
you can modify the query text by adding a redudant condition like
"AND 1 = 1". Actually as you have found, changing "BY" to "by" or even
adding extra spaces help. This is because the lookup in the cache
is done on a hash without first collapsing spaces or parsing the
query text.



--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
 
Erland said:
When the operation has completed (or you have gotten tired of waiting
and killed SSMS), issue this command:

ALTER DATABASE db SET PARAMETERIZATION FORCED

Redo the scripting operation. It will now complete in five seconds.

By the way, this is something important for your application as well.
Say that a DBA finds out that your app is thrashing the cache by not
using parameterised queries, and sets the database to forced
parameterisation, you will get back the behaviour you have now.

A bettery remedy is to add OPTION (RECOMPILE) at the end of sensitive
queries. This forces a statement recompile, and the query will not be
put in cache. This means that you can still use parameterised queries
and get the other benefits of it. (Protection for SQL injection and
repsecting the user's regional settings.) You also avoid thrashing
the cache.


--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
 
Erland said:
When you set a database to forced parameterisation, SQL Server will
auto-parameterise all statements (with some exceptions documented in
Books Online); normally it only auto-parameterise very simple queries.
In the case of Mgmt Studio it's reallly a go-faster switch.

So dismissing caching of parameterised queries can be a serious mistake.
But it is certainly true that there are situations where parameter
sniffing can be a problem. If it is possible for you to tell in the
appliocation "this is an odd value that needs a special plan", then
you can modify the query text by adding a redudant condition like
"AND 1 = 1". Actually as you have found, changing "BY" to "by" or even
adding extra spaces help. This is because the lookup in the cache
is done on a hash without first collapsing spaces or parsing the
query text.
Thanks...one of the most helpful replies I've had on usenet for some
time now!

The problem is that it's pretty hard for me to know that a value is
"odd". In this case, like I said the query in this case is

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0

Where @0 is actually a value from a list that is obtained from
elsewhere (not the database). It loops through this list, calling the
same query for each one.

Now, I suppose I could first do

SELECT Count(*) FROM MyTable WHERE MyKey = @0

and determine if the number was very low, and if so, de-parameterize it
or add a space or whatever, but then this second query would
potentially suffer from the same problem.

I suppose another alternative is to build another query first

SELECT MyKey, Min(MyValue) FROM MyTable GROUP BY MyKey

then use this to obtain the minimum value for each key, but there's
only so much time I can spend rewriting queries to side-step quirky
performance issues (the application has many many ad-hoc queries,
nearly all parameterized on the same key, so they are all prone to the
same problem).

BTW, this is under SQL server 2000. I've yet to determine if this
particular problem does actually exist under 2005, which is what we're
using for the production environment. Will definitely do that before I
waste too much more time on it.
 
Thanks...one of the most helpful replies I've had on usenet for some
time now!

The problem is that it's pretty hard for me to know that a value is
"odd".

I can understand that this is not always simple. I didn't say this, in
hope it would be. :-)

However, I think I have a cure for you:
In this case, like I said the query in this case is

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0

Where @0 is actually a value from a list that is obtained from
elsewhere (not the database). It loops through this list, calling the
same query for each one.

Stop! Don't do that! The problems with query plans aside, this is an
ineffecient use of SQL Server. Get all data at once with:

SELECT t.MyKey, Min(t.MyValue)
FROM MyTable t
JOIN list_to_table(@list) f ON t.MyKey = f.value
GROUP BY t.MyKey

Where list_to_table is a table-valued function that transform the list
to a table. I have a whole bunch of such functions on
http://www.sommarskog.se/arrays-in-sql.html.


--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
 
Stephen,

You forget in the list, the version of the framework that is used. It is to
AdoNet as well you know.

:-)


Cor
 
Erland said:
I can understand that this is not always simple. I didn't say this, in
hope it would be. :-)

However, I think I have a cure for you:


Stop! Don't do that! The problems with query plans aside, this is an
ineffecient use of SQL Server. Get all data at once with:

SELECT t.MyKey, Min(t.MyValue)
FROM MyTable t
JOIN list_to_table(@list) f ON t.MyKey = f.value
GROUP BY t.MyKey

Where list_to_table is a table-valued function that transform the list
to a table. I have a whole bunch of such functions on
http://www.sommarskog.se/arrays-in-sql.html.
I agree this sort of thing would be preferable, and more efficient, but
as it is, providing I don't get the bad plan problem, it's efficient
enough as it is, and I'm wary of making too many big changes at this
point. That sort of thing I'd prefer to leave for the next version.

The thing is, I just tried exporting all the records to a CSV file,
creating a clean database with the same tables/indices, re-importing
all the records, and the same problem DOESN'T happen: I can run that
query with the "odd" value first, then with the common value, and both
queries run fast. So something else has happened to my database that
surely I must be able to reset somehow?
 
I agree this sort of thing would be preferable, and more efficient, but
as it is, providing I don't get the bad plan problem, it's efficient
enough as it is, and I'm wary of making too many big changes at this
point. That sort of thing I'd prefer to leave for the next version.

But keep in mind that the solution you have now will not scale well. If
the data in production is ten times larger than you have expected, you
will get ten times more execution time, even with the good plan.
The thing is, I just tried exporting all the records to a CSV file,
creating a clean database with the same tables/indices, re-importing
all the records, and the same problem DOESN'T happen: I can run that
query with the "odd" value first, then with the common value, and both
queries run fast. So something else has happened to my database that
surely I must be able to reset somehow?

I eavesdropped a discussion at PASS in Seattle last year, when a guy
had done extensive tests, and he could repeat a scenario that depending
on which order he loaded the same data, he would get different plans,
good or bad. I presume that part of the answer lies what exactly is in
the statistics. Normally, statistics are only samples, and if the
statistics does not well reflect the data distribution, your plans
will not always be the best.

--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
 
Erland said:
But keep in mind that the solution you have now will not scale well. If
the data in production is ten times larger than you have expected, you
will get ten times more execution time, even with the good plan.

Sure. I definitely plan on doing some query optimization and
consolidation for the next version. Your routines may well come in
handy, so thanks.
I eavesdropped a discussion at PASS in Seattle last year, when a guy
had done extensive tests, and he could repeat a scenario that depending
on which order he loaded the same data, he would get different plans,
good or bad. I presume that part of the answer lies what exactly is in
the statistics. Normally, statistics are only samples, and if the
statistics does not well reflect the data distribution, your plans
will not always be the best.
Well I found another solution - reindex the table.

I ran

dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the
problem is gone away. My main concern was that if we did see this
problem occuring in productoion databases, how could we fix it, other
than changing the code, and at least now I have such a solution, and
it's a bit less drastic than exporting and re-importing all the data
(which potentially could have taken hours).

I'm not sure whether 90 is the best parameter value here, that was just
from the MSDN article.
 
Well I found another solution - reindex the table.

I ran

dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the
problem is gone away. My main concern was that if we did see this
problem occuring in productoion databases, how could we fix it, other
than changing the code, and at least now I have such a solution, and
it's a bit less drastic than exporting and re-importing all the data
(which potentially could have taken hours).

It's recommended to run a maintenance job to reindex the table with some
frequency. The main reason for this is to prevent defragmentation. A side
effect of this is that statistics are updated with fullscan, that is all
rows are considered. That gives the optimizer more accurate information.


--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
 
Well I found another solution - reindex the table.

I ran

dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the
problem is gone away. My main concern was that if we did see this
problem occuring in productoion databases, how could we fix it, other
than changing the code, and at least now I have such a solution, and
it's a bit less drastic than exporting and re-importing all the data
(which potentially could have taken hours).

How large is your table?

Maybe a simple "update statistics" would also fix things?

J.
 
JXStern said:
How large is your table?

'bout 2 million records, 9 columns.
Maybe a simple "update statistics" would also fix things?
Quite possibly - unfortunately I can't re-create the problem now to
test it!
Will be the first thing I try if I see the same problem again, though.
 
Back
Top