13 August 2018

1 Comment

13 August 2018

1 Comment

SQL Prompt Code Analysis: INSERT INTO a permanent table with ORDER BY (PE020)

Data in relational tables is never guaranteed to be returned in a particular order, so adding an ORDER BY to an INSERT INTO statement is meaningless and, pre-SQL Server 2012, could cause performance issues. If you really need to impose a particular order on rows in the table, use a Row_Number() window clause instead.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

The SQL query that is used to produce the result that is inserted into the permanent table has its order specified by an ORDER BY statement. Relational tables are not ordered, so the ORDER BY is meaningless. Use a Row_Number() window clause instead, if you need to impose a particular order on rows in the table, via a query, especially if you need to allow subsequent insertions into the table. For temporary tables or table variables that will have no subsequent insertions, the current order can be maintained via an IDENTITY field in the target table, but it is an anachronism now that there are better and more versatile ways of doing it.

SQL Prompt will raise a violation of a performance rule (PE020) when it detects use of INSERT INTO…ORDER BY, when inserting into a permanent table.

SQL Prompt detects code smell PE020 - a query that inserts into a permanent table using an ORDER BY.

It does so to catch out a mistake that is easy to make, i.e. leaving in an ORDER BY clause that is no longer needed. Before SQL Server 2012, these ORDER BY clauses could be quite costly in terms of elapsed time and resources, but now they are generally ignored by the optimizer.

Meaningless ORDER BYs

Imagine that we have the query in Listing 1.

Listing 1

The results are as follows:

OK. We like this, and the order it gives, so we attempt to insert the data into our work table, preserving the same order.

Listing 2

Now we select from the table using:

Listing 3

Of course, the contents of a table have no natural order, and so the use of ORDER BY in Listing 2 is meaningless. If we had created a clustered index, on the FullName column, to turn OurPeople1 into a table rather than a heap, then Listing 3, then we’d get a different order, the order of the clustered index.

In this example, the ORDER BY clause in Listing 2 has been entirely ignored by SQL Server. The only way to guarantee to return data in a specific order, from our dbo.OurPeople1 table is to specify that order. If you want it ordered by Person.LastName, followed by Person.FirstName then first include the sort terms (LastName and FirstName) as columns in the work table and then do…

Listing 4

You could add the PRIMARY KEY field of the original data, i.e. add the BusinessEntity_ID of the Person.Person table, do a join and then order by those missing fields, as demonstrated in Listing 5.

Listing 5

Both these solutions preserve the order, whatever subsequent changes you make to the data.

The IDENTITY Trick

However, there is sometimes a reason why you can’t reference the original data by which you did the ordering, and you find that you need to specify an order. It is possible to specify the order with a surrogate field of incrementing numbers, but you’ll experience the downside to this solution if you add or amend the data so as to affect the sort order.

There was a time, before the introduction of the Row_Number()window function, that the only way that you could provide the order of the data in a work table was via the ‘IDENTITY trick’. You insert into a temporary table, to increment the IDENTITY field in the order provided by the ORDER BY clause.

Listing 6

The results are:

That’s a lot better, but why put the IDENTITY field in a table variable? Why not just create it in the destination table? The problem is in doing insertions. The IDENTITY field is immutable, so without this intermediate stage, you wouldn’t be able to change the order, or to easily do any subsequent insertions other than to the beginning or end of your order.

Using the Row_Number() window function

Nowadays, we don’t need any of that extra work: we don’t use an ORDER BY clause at all, and we can subsequently change the order if we need to. We just use the Row_Number()window function.

Listing 7

Summary

Of all the code I’ve shown here, only Listing 2 triggers the PE020 warning. It only happens if you insert into a permanent table using a SELECT statement with an ORDER BY clause. It is simply advising you of an unnecessary ORDER BY clause. If you are deliberately seeking to preserve a certain order, on insertion, then use the Row_Number() window function in that SELECT query, within the INSERT INTO statement, instead of using the ORDER BY statement.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter

You may also like

  • Article

    Cleaning up common T-SQL coding issues with SQL Prompt

    Some SQL coding habits are just annoying. Commas in front of column names? No way! Others are actively harmful; they’ll make your code error prone, harder for others to read and understand, and even harder to edit without making mistakes. This article covers five harmful problems that I see regularly in T-SQL code, and shows

  • Article

    How to format a set of SQL scripts using SQL Prompt

    Any database developer or DBA who spends much of their working week staring at SQL code quickly becomes set in their ways. They like to see the code laid out in a very particular way. They will struggle to look at, let alone digest and understand, code formatted in a ‘foreign’ style. It’s also rare

  • Article

    How to implement a T-SQL code formatting standard using SQL Prompt

    To what extent should a team attempt to impose a uniform SQL coding format? It can sometimes seem like a futile task. All programmers have a preferred formatting style, to which they profess an unwavering attachment. Check the code of ten programmers, though, and you will likely discover at least 15 different formats, since few

  • Article

    The risks of using EXECUTE ('SQL Script')

    SQL Prompt’s code analysis rule, BP013, will alert you to use of Execute(string) to execute a batch in a string, often assembled dynamically from user input. This technique is dangerous because the parameter values are injected before the statement is parsed by SQL Server, allowing an attacker to "tag on" extra statements. Use sp_ExecuteSql instead, and validate the string inputs.

  • University

    Take the SQL Prompt course

    This Redgate University course takes you from installation all the way up to getting the most out of the advanced operations in SQL Prompt. As well as autocompleting your code, you’ll learn how SQL Prompt can help you with code formatting, object renaming, code analysis, and other useful tip and tricks.

  • Forums

    SQL Prompt Forum

    Write, format, and refactor SQL effortlessly

  • Chad Baldwin

    Ran into this today….But it’s on a permanent table with an IDENTITY column that is the primary key (clustered).

    It’s a table that tracks usage, I am randomly generating some sample data, so I want the records to be inserted in a certain order. So I simply used an ORDER BY and it works great.

    So is what I’m doing correct? Or is this frowned upon? Cause it doesn’t seem to go along with the examples above, but I’m still getting the PE020 error….