Product articles
SQL Prompt
Query Performance
SQL Prompt Code Analysis: INSERT INTO…

SQL Prompt Code Analysis: INSERT INTO a permanent table with ORDER BY (PE020)

Data in relational tables is never guaranteed to be returned in a particular order, so adding an ORDER BY to an INSERT INTO statement is meaningless and, pre-SQL Server 2012, could cause performance issues. If you really need to impose a particular order on rows in the table, use a Row_Number() window clause instead.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

The SQL query that is used to produce the result that is inserted into the permanent table has its order specified by an ORDER BY statement. Relational tables are not ordered, so the ORDER BY is meaningless. Use a Row_Number() window clause instead, if you need to impose a particular order on rows in the table, via a query, especially if you need to allow subsequent insertions into the table. For temporary tables or table variables that will have no subsequent insertions, the current order can be maintained via an IDENTITY field in the target table, but it is an anachronism now that there are better and more versatile ways of doing it.

SQL Prompt will raise a violation of a performance rule (PE020) when it detects use of INSERT INTO…ORDER BY, when inserting into a permanent table.

SQL Prompt detects code smell PE020 - a query that inserts into a permanent table using an ORDER BY.

It does so to catch out a mistake that is easy to make, i.e. leaving in an ORDER BY clause that is no longer needed. Before SQL Server 2012, these ORDER BY clauses could be quite costly in terms of elapsed time and resources, but now they are generally ignored by the optimizer.

Meaningless ORDER BYs

Imagine that we have the query in Listing 1.

Listing 1

The results are as follows:

OK. We like this, and the order it gives, so we attempt to insert the data into our work table, preserving the same order.

Listing 2

Now we select from the table using:

Listing 3

Of course, the contents of a table have no natural order, and so the use of ORDER BY in Listing 2 is meaningless. If we had created a clustered index, on the FullName column, to turn OurPeople1 into a table rather than a heap, then Listing 3, then we’d get a different order, the order of the clustered index.

In this example, the ORDER BY clause in Listing 2 has been entirely ignored by SQL Server. The only way to guarantee to return data in a specific order, from our dbo.OurPeople1 table is to specify that order. If you want it ordered by Person.LastName, followed by Person.FirstName then first include the sort terms (LastName and FirstName) as columns in the work table and then do…

Listing 4

You could add the PRIMARY KEY field of the original data, i.e. add the BusinessEntity_ID of the Person.Person table, do a join and then order by those missing fields, as demonstrated in Listing 5.

Listing 5

Both these solutions preserve the order, whatever subsequent changes you make to the data.

The IDENTITY Trick

However, there is sometimes a reason why you can’t reference the original data by which you did the ordering, and you find that you need to specify an order. It is possible to specify the order with a surrogate field of incrementing numbers, but you’ll experience the downside to this solution if you add or amend the data so as to affect the sort order.

There was a time, before the introduction of the Row_Number()window function, that the only way that you could provide the order of the data in a work table was via the ‘IDENTITY trick’. You insert into a temporary table, to increment the IDENTITY field in the order provided by the ORDER BY clause.

Listing 6

The results are:

That’s a lot better, but why put the IDENTITY field in a table variable? Why not just create it in the destination table? The problem is in doing insertions. The IDENTITY field is immutable, so without this intermediate stage, you wouldn’t be able to change the order, or to easily do any subsequent insertions other than to the beginning or end of your order.

Using the Row_Number() window function

Nowadays, we don’t need any of that extra work: we don’t use an ORDER BY clause at all, and we can subsequently change the order if we need to. We just use the Row_Number()window function.

Listing 7

Summary

Of all the code I’ve shown here, only Listing 2 triggers the PE020 warning. It only happens if you insert into a permanent table using a SELECT statement with an ORDER BY clause. It is simply advising you of an unnecessary ORDER BY clause. If you are deliberately seeking to preserve a certain order, on insertion, then use the Row_Number() window function in that SELECT query, within the INSERT INTO statement, instead of using the ORDER BY statement.

Tools in this post

SQL Prompt

Write, format, and refactor SQL effortlessly

Find out more