13 August 2018
13 August 2018

SQL Prompt Code Analysis: INSERT INTO a permanent table with ORDER BY (PE020)

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

The SQL query that is used to produce the result that is inserted into the permanent table has its order specified by an ORDER BY statement. Relational tables are not ordered, so the ORDER BY is meaningless. Use a Row_Number() window clause instead, if you need to impose a particular order on rows in the table, via a query, especially if you need to allow subsequent insertions into the table. For temporary tables or table variables that will have no subsequent insertions, the current order can be maintained via an IDENTITY field in the target table, but it is an anachronism now that there are better and more versatile ways of doing it.

SQL Prompt will raise a violation of a performance rule (PE020) when it detects use of INSERT INTO…ORDER BY, when inserting into a permanent table.

SQL Prompt detects code smell PE020 - a query that inserts into a permanent table using an ORDER BY.

It does so to catch out a mistake that is easy to make, i.e. leaving in an ORDER BY clause that is no longer needed. Before SQL Server 2012, these ORDER BY clauses could be quite costly in terms of elapsed time and resources, but now they are generally ignored by the optimizer.

Meaningless ORDER BYs

Imagine that we have the query in Listing 1.

Listing 1

The results are as follows:

OK. We like this, and the order it gives, so we attempt to insert the data into our work table, preserving the same order.

Listing 2

Now we select from the table using:

Listing 3

Of course, the contents of a table have no natural order, and so the use of ORDER BY in Listing 2 is meaningless. If we had created a clustered index, on the FullName column, to turn OurPeople1 into a table rather than a heap, then Listing 3, then we’d get a different order, the order of the clustered index.

In this example, the ORDER BY clause in Listing 2 has been entirely ignored by SQL Server. The only way to guarantee to return data in a specific order, from our dbo.OurPeople1 table is to specify that order. If you want it ordered by Person.LastName, followed by Person.FirstName then first include the sort terms (LastName and FirstName) as columns in the work table and then do…

Listing 4

You could add the PRIMARY KEY field of the original data, i.e. add the BusinessEntity_ID of the Person.Person table, do a join and then order by those missing fields, as demonstrated in Listing 5.

Listing 5

Both these solutions preserve the order, whatever subsequent changes you make to the data.

The IDENTITY Trick

However, there is sometimes a reason why you can’t reference the original data by which you did the ordering, and you find that you need to specify an order. It is possible to specify the order with a surrogate field of incrementing numbers, but you’ll experience the downside to this solution if you add or amend the data so as to affect the sort order.

There was a time, before the introduction of the Row_Number()window function, that the only way that you could provide the order of the data in a work table was via the ‘IDENTITY trick’. You insert into a temporary table, to increment the IDENTITY field in the order provided by the ORDER BY clause.

Listing 6

The results are:

That’s a lot better, but why put the IDENTITY field in a table variable? Why not just create it in the destination table? The problem is in doing insertions. The IDENTITY field is immutable, so without this intermediate stage, you wouldn’t be able to change the order, or to easily do any subsequent insertions other than to the beginning or end of your order.

Using the Row_Number() window function

Nowadays, we don’t need any of that extra work: we don’t use an ORDER BY clause at all, and we can subsequently change the order if we need to. We just use the Row_Number()window function.

Listing 7

Summary

Of all the code I’ve shown here, only Listing 2 triggers the PE020 warning. It only happens if you insert into a permanent table using a SELECT statement with an ORDER BY clause. It is simply advising you of an unnecessary ORDER BY clause. If you are deliberately seeking to preserve a certain order, on insertion, then use the Row_Number() window function in that SELECT query, within the INSERT INTO statement, instead of using the ORDER BY statement.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter

You may also like

  • Article

    Quick SQL Prompt tip – using the ii snippet

    SQL Prompt includes a number of snippets by default that can help you quickly write T-SQL code. These are templates of code that users use regularly. One of the more popular snippets is the “ii” snippet that helps with inserting data into a table. If I type “ii” and hit tab, I quickly get the

  • Article

    Finding code smells using SQL Prompt: Asterisk in SELECT list

    Using SELECT * FROM within IF EXISTS statements are fine, but in other contexts it causes several problems. For example, INSERT INTO…SELECT * FROM statement assumes the columns are in a particular order, and that order may not last. Also, selecting all the columns in a table when you don’t need them all can cause

  • Article

    Using multiple custom SQL formatting styles in SQL Prompt

    Usually, when coding SQL, you will want to go along with the coding style of the team you are working with. In my previous articles on SQL Prompt, I explained why a team of programmers might want to adopt a T-SQL coding standard, how SQL Prompt could make this easier, and then how to use

  • Article

    Cleaning up common T-SQL coding issues with SQL Prompt

    Some SQL coding habits are just annoying. Commas in front of column names? No way! Others are actively harmful; they’ll make your code error prone, harder for others to read and understand, and even harder to edit without making mistakes. This article covers five harmful problems that I see regularly in T-SQL code, and shows

  • Article

    SQL Prompt Code Analysis: Avoid using the ISNUMERIC Function (E1029)

    Avoid using the IsNumeric() function, because it can often lead to data type conversion errors, when importing data. SQL Prompt Code Analysis rules include an Execution rule, E1029, which will alert you to use of this function, in your T-SQL. If you’re working on SQL Server 2012 or later, it’s much better to use the Try_Convert()or

  • Forums

    SQL Prompt Forum

    Write, format, and refactor SQL effortlessly