Avoid sort on SELECT command

  • Thread starter Thread starter _VJ
  • Start date Start date
V

_VJ

SQL "SELECT" (via Ado.net) gives results in sorted order by default.
IOW, "ORDER BY" is turned on by default (seems to use the primary key).

I'd like to retrieve rows in the order that I originally inserted them.
Is there any way to turn off the default sort?

VJ

PS: I've hex-dumped the data file, and confirmed that the rows are not
pre-sorted inside the database.
 
In SQL the way to return a sorted result is to use the ORDER BY clause on a
SELECT statement. If you need to return rows in the order in which they were
inserted then you'll need to have a column in your table that records that
information. Example:

CREATE TABLE Something
(... creation_dt DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP)

SELECT ...
FROM Something
ORDER BY creation_dt
 
The only way to guarantee the order of a "select" result set is to use the
"order by" clause. You'll have to order by a column that indicates the
order in which the rows were inserted.

tml

Roji. P. Thomas
SQL Server Programmer

A subscription site? Think I should pay just to see if that question is
answered? (By the looks of it, it wasn't)

Anyone else have any ideas?

VJ
 
The responses are at the bottom of the page.... scroll down past all of the
garbage after the original question.

Example:

Assisted Answer from astankovic
Date: 05/10/2004 02:51PM PDT
Assisted Answer
You should never use a SELECT statement without ORDER BY clause if you
need the results to be ordered in specific order.
That is because if ORDER BY is not used the order of the rows is
arbitrary.



Assisted Answer from danblake
Date: 05/11/2004 12:39AM PDT
Assisted Answer


MS are also in the process of No guarantees of returned results setups
without the order by clause specified regardless of indexes on the base
tables or data-insertation order.

Clustered indexes will normally return an ordered sequence back, but
this is going to be no longer the case.

I recommend using the ORDER BY in every instance, you require results
back in a specific order -- if you don't do it now, its going to cause
problem ahead.

Accepted Answer from rdrunner
Date: 05/11/2004 01:17AM PDT
Accepted Answer


If there are rows being deleted then the SQL Server will fill up those
rows with data. This will directly accect the way the data is returned (in
case the SQL Server is doing a table scan) So if you have a small table with
no index (just one INT field) and say add 100 K records...

Now you delete the 1st 2000 records and add another 2000 records...
The results returned SHOULD be...

100001
100002
.....
101999
102000
2001
2002.....

But like mentioned before as long as there are no Order by clauses you
wont have a garantee that there will be a corect order...
 
If the ORDER BY is not specified SQL Server returns:
a) If the clustered key is defined - in the clustered key order
b) If clustered key is not defined - you cannot know the correct order. For
small tables it is 99% the insert order, but for big tables it is not. To be
sure you should, as David said, to add a field that will contain the needed
order. I suggest to use identity column for it.
 
If the ORDER BY is not specified SQL Server returns:
a) If the clustered key is defined - in the clustered key order

Not so. It depends on the execution plan. For example if the query is
covered by a non-clustered index then the rows may be in the non-clustered
index order. There are no gurarantees unless you use ORDER BY.
 
Here's a repro of a "simple" query that doesn't use the clustered index
order. It's good practice not to use SELECT * in production code but it
doesn't make any difference to the result.

CREATE TABLE Something (x INTEGER PRIMARY KEY CLUSTERED, y VARCHAR(10) NOT
NULL UNIQUE NONCLUSTERED)

INSERT INTO Something (x,y) VALUES (1,'Tom')
INSERT INTO Something (x,y) VALUES (2,'Dick')
INSERT INTO Something (x,y) VALUES (3,'Harry')

SELECT *
FROM Something

Result:

x y
----------- ----------
2 Dick
3 Harry
1 Tom

(3 row(s) affected)

There are other scenarios too but this is a nice straightforward example of
why you should always use ORDER BY if you require an ordered result.
 
Sergei,

Using identity column as a sorting column is not always good, depending on
design and usage of the db. For example, if the database is replicated and
each subscriber is using a different identity range, then ordering by the
identity column will not be right.
The table can also be reseeded where the increment is >1, so the identity
values generated are not always in sequence.

Raj Moloye


If the ORDER BY is not specified SQL Server returns:
a) If the clustered key is defined - in the clustered key order
b) If clustered key is not defined - you cannot know the correct order. For
small tables it is 99% the insert order, but for big tables it is not. To be
sure you should, as David said, to add a field that will contain the needed
order. I suggest to use identity column for it.
 
I thought I knew something in SQL Server :-\

Than I have another question: how to set the correct order in the update
statement?
Sometimes I use update queries, that are order-dependent, like

declare @i int
set @i = 0
update Something set @i=@i+1, x=@i

I used clustered index for such purposes (but there were no other indexes,
so it worked correctly)
 
Hi VJ,

The only thing that comes up in my mind is to add a column to your database
with the insertion date and order it by that.

Cor
 
You can't reliably control the order in which the UPDATE happens. Your
UPDATE statement is really an (unreliable) method of producing a ranking. It
generates a sequential number which you presumably want to be based on the
order of some other column(s) within the table. It's really better not to
put the ranking into the table itself because if you do so you could have to
update the entire column X again if just one row changes that affects the
ranked order.

A better option is to create the ranking when you query the table. For this
example I'll assume that you want X to follow the order of column Y.

SELECT COUNT(*) AS x, S1.y
FROM Something AS S1
JOIN Something AS S2
ON S1.y >= S2.y
GROUP BY S1.y

If you have to do the UPDATE you can use this method:

UPDATE Something
SET x =
(SELECT COUNT(*)
FROM Something AS S
WHERE y <= Something.y)

Both these methods assume that the sequence is defined by a unqiue column or
columns. If the sequence isn't unique then you can add the primary key
column(s) into the query or update to ensure that you have a well-defined
set of values on which to base the ranking.
 
CREATE TABLE Something (x INTEGER PRIMARY KEY CLUSTERED, y VARCHAR(10) NOT
NULL UNIQUE NONCLUSTERED)

INSERT INTO Something (x,y) VALUES (1,'Tom')
INSERT INTO Something (x,y) VALUES (2,'Dick')
INSERT INTO Something (x,y) VALUES (3,'Harry')

SELECT *
FROM Something

Result:

x y
----------- ----------
2 Dick
3 Harry
1 Tom

(3 row(s) affected)

There are other scenarios too but this is a nice straightforward example of
why you should always use ORDER BY if you require an ordered result.

understood; what I *don't* understand is why the result is (apparently) so
consistently 2,3,1 ... why not 1,3,2 or 3,1,2 in some cases ??
 
Because the execution plan uses the non-clustered index to retrieve the data
therefore the result is sorted by the indexed column (Y). If the data and
the schema doesn't change then the execution plan won't change either so
you'll most likely see the same result repeatedly.
 
Look at the execution plan, it shows that records are read using second
index (UNIQUE KEY on the y column). But I don't understand why SQL Server
chooses it instead of clustered index.
 
The responses are at the bottom of the page.... scroll down past all of the
garbage after the original question.

Sorry, I didn't notice that. The "Sign up to see solution" link threw me.

VJ
 
A nonclustered index also contains the clustered index key therefore all of
the data required for the query (both columns) is contained in the
nonclustered index. The query is said to be "covered" by the index.

Typically, a nonclustered, covering index is a better choice for an index
scan than a clustered index because it is usually smaller (the clustered
leaf nodes include *all* the columns of a table, not just the indexed ones)
and therefore requires less I/O. That's why the optimizer prefers to choose
a nonclustered index scan over a clustered index scan, although in my
example there's no advantage either way.
 
What I also should say is that this is another very good reason why you
shouldn't use SELECT * in production code. Listing only the required columns
gives a better chance of taking advantage of a covering index.
 
Back
Top