13 August 2018

1 Comment

13 August 2018

1 Comment

SQL Prompt Code Analysis: INSERT INTO a permanent table with ORDER BY (PE020)

Data in relational tables is never guaranteed to be returned in a particular order, so adding an ORDER BY to an INSERT INTO statement is meaningless and, pre-SQL Server 2012, could cause performance issues. If you really need to impose a particular order on rows in the table, use a Row_Number() window clause instead.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

The SQL query that is used to produce the result that is inserted into the permanent table has its order specified by an ORDER BY statement. Relational tables are not ordered, so the ORDER BY is meaningless. Use a Row_Number() window clause instead, if you need to impose a particular order on rows in the table, via a query, especially if you need to allow subsequent insertions into the table. For temporary tables or table variables that will have no subsequent insertions, the current order can be maintained via an IDENTITY field in the target table, but it is an anachronism now that there are better and more versatile ways of doing it.

SQL Prompt will raise a violation of a performance rule (PE020) when it detects use of INSERT INTO…ORDER BY, when inserting into a permanent table.

SQL Prompt detects code smell PE020 - a query that inserts into a permanent table using an ORDER BY.

It does so to catch out a mistake that is easy to make, i.e. leaving in an ORDER BY clause that is no longer needed. Before SQL Server 2012, these ORDER BY clauses could be quite costly in terms of elapsed time and resources, but now they are generally ignored by the optimizer.

Meaningless ORDER BYs

Imagine that we have the query in Listing 1.

Listing 1

The results are as follows:

OK. We like this, and the order it gives, so we attempt to insert the data into our work table, preserving the same order.

Listing 2

Now we select from the table using:

Listing 3

Of course, the contents of a table have no natural order, and so the use of ORDER BY in Listing 2 is meaningless. If we had created a clustered index, on the FullName column, to turn OurPeople1 into a table rather than a heap, then Listing 3, then we’d get a different order, the order of the clustered index.

In this example, the ORDER BY clause in Listing 2 has been entirely ignored by SQL Server. The only way to guarantee to return data in a specific order, from our dbo.OurPeople1 table is to specify that order. If you want it ordered by Person.LastName, followed by Person.FirstName then first include the sort terms (LastName and FirstName) as columns in the work table and then do…

Listing 4

You could add the PRIMARY KEY field of the original data, i.e. add the BusinessEntity_ID of the Person.Person table, do a join and then order by those missing fields, as demonstrated in Listing 5.

Listing 5

Both these solutions preserve the order, whatever subsequent changes you make to the data.

The IDENTITY Trick

However, there is sometimes a reason why you can’t reference the original data by which you did the ordering, and you find that you need to specify an order. It is possible to specify the order with a surrogate field of incrementing numbers, but you’ll experience the downside to this solution if you add or amend the data so as to affect the sort order.

There was a time, before the introduction of the Row_Number()window function, that the only way that you could provide the order of the data in a work table was via the ‘IDENTITY trick’. You insert into a temporary table, to increment the IDENTITY field in the order provided by the ORDER BY clause.

Listing 6

The results are:

That’s a lot better, but why put the IDENTITY field in a table variable? Why not just create it in the destination table? The problem is in doing insertions. The IDENTITY field is immutable, so without this intermediate stage, you wouldn’t be able to change the order, or to easily do any subsequent insertions other than to the beginning or end of your order.

Using the Row_Number() window function

Nowadays, we don’t need any of that extra work: we don’t use an ORDER BY clause at all, and we can subsequently change the order if we need to. We just use the Row_Number()window function.

Listing 7

Summary

Of all the code I’ve shown here, only Listing 2 triggers the PE020 warning. It only happens if you insert into a permanent table using a SELECT statement with an ORDER BY clause. It is simply advising you of an unnecessary ORDER BY clause. If you are deliberately seeking to preserve a certain order, on insertion, then use the Row_Number() window function in that SELECT query, within the INSERT INTO statement, instead of using the ORDER BY statement.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter

You may also like

  • Article

    How to implement a SQL formatting standard using SQL Prompt v8

    Louis Davidson explains why your team needs a standard style for formatting SQL and how SQL Prompt will help make this happen.

  • Article

    When to use the SELECT…INTO statement (PE003)

    SELECT…INTO is a useful shortcut for development work, especially for creating temporary tables. However, it no longer has a clear performance advantage and should be avoided in production code. It is better to use a CREATE TABLE statement, where you can specify constraints and datatypes in advance, making it less likely that inconsistencies will sneak into the data.

  • Article

    Using a Variable-length Datatype Without Explicit Length: The Whys and Wherefores (BP007/8)

    If you declare a variable-length string , or coerce a string, without specifying its length, you can fall foul of ‘silent’ string truncation. Some developers resort to using the (MAX) specification, which is a mistake too. Phil Factor explains the dangers and then offers a workaround for the problem, when you're importing text and simply don't know the correct length of each string.

  • Article

    How to use the SQL Prompt snippet placeholders for selecting and copying text

    Phil Factor demonstrates the power of placeholders in SQL Prompt Snippets, for selecting and copying text. They make it very easy to build queries by dragging-and-dropping from the SSMS object explorer, and also to save queries as reusable views or inline table valued functions.

  • University

    Take the SQL Prompt course

    This Redgate University course takes you from installation all the way up to getting the most out of the advanced operations in SQL Prompt. As well as autocompleting your code, you’ll learn how SQL Prompt can help you with code formatting, object renaming, code analysis, and other useful tip and tricks.

  • Forums

    SQL Prompt Forum

    Write, format, and refactor SQL effortlessly

  • Chad Baldwin

    Ran into this today….But it’s on a permanent table with an IDENTITY column that is the primary key (clustered).

    It’s a table that tracks usage, I am randomly generating some sample data, so I want the records to be inserted in a certain order. So I simply used an ORDER BY and it works great.

    So is what I’m doing correct? Or is this frowned upon? Cause it doesn’t seem to go along with the examples above, but I’m still getting the PE020 error….