SQL Prompt Code Analysis: INSERT INTO a permanent table with ORDER BY (PE020)
Data in relational tables is never guaranteed to be returned in a particular order, so adding an ORDER BY to an INSERT INTO statement is meaningless and, pre-SQL Server 2012, could cause performance issues. If you really need to impose a particular order on rows in the table, use a Row_Number() window clause instead.
The SQL query that is used to produce the result that is inserted into the permanent table has its order specified by an ORDER
BY
statement. Relational tables are not ordered, so the ORDER
BY
is meaningless. Use a Row_Number()
window clause instead, if you need to impose a particular order on rows in the table, via a query, especially if you need to allow subsequent insertions into the table. For temporary tables or table variables that will have no subsequent insertions, the current order can be maintained via an IDENTITY
field in the target table, but it is an anachronism now that there are better and more versatile ways of doing it.
SQL Prompt will raise a violation of a performance rule (PE020) when it detects use of INSERT INTO…ORDER
BY
, when inserting into a permanent table.
It does so to catch out a mistake that is easy to make, i.e. leaving in an ORDER
BY
clause that is no longer needed. Before SQL Server 2012, these ORDER
BY
clauses could be quite costly in terms of elapsed time and resources, but now they are generally ignored by the optimizer.
Meaningless ORDER BYs
Imagine that we have the query in Listing 1.
1 2 3 4 5 |
SELECT Coalesce(Person.Title + ' ', '') + Person.FirstName + Coalesce(' ' + Person.MiddleName + ' ', ' ') + Person.LastName + Coalesce(' ' + Person.Suffix, '') AS FullName FROM AdventureWorks2016.Person.Person ORDER BY Person.LastName, Person.FirstName; |
Listing 1
The results are as follows:
OK. We like this, and the order it gives, so we attempt to insert the data into our work table, preserving the same order.
1 2 3 4 5 6 7 8 |
IF Object_Id('dbo.OurPeople1', 'U') IS NOT NULL DROP TABLE dbo.OurPeople1; CREATE TABLE OurPeople1 (FullName VARCHAR(40) NOT NULL); INSERT INTO OurPeople1 (FullName) SELECT Coalesce(Person.Title + ' ', '') + Person.FirstName + Coalesce(' ' + Person.MiddleName + ' ', ' ') + Person.LastName + Coalesce(' ' + Person.Suffix, '') AS FullName FROM AdventureWorks2016.Person.Person ORDER BY Person.LastName, Person.FirstName; |
Listing 2
Now we select from the table using:
1 |
SELECT OurPeople1.FullName FROM dbo.OurPeople1; |
Listing 3
Of course, the contents of a table have no natural order, and so the use of ORDER
BY
in Listing 2 is meaningless. If we had created a clustered index, on the FullName
column, to turn OurPeople1
into a table rather than a heap, then Listing 3, then we’d get a different order, the order of the clustered index.
In this example, the ORDER
BY
clause in Listing 2 has been entirely ignored by SQL Server. The only way to guarantee to return data in a specific order, from our dbo.OurPeople1
table is to specify that order. If you want it ordered by Person.LastName
, followed by Person.FirstName
then first include the sort terms (LastName
and FirstName
) as columns in the work table and then do…
1 2 |
SELECT OurPeople1.FullName FROM dbo.OurPeople1 ORDER BY OurPeople1.LastName, OurPeople1.FirstName; |
Listing 4
You could add the PRIMARY
KEY
field of the original data, i.e. add the BusinessEntity_ID
of the Person.Person
table, do a join and then order by those missing fields, as demonstrated in Listing 5.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
IF Object_Id('dbo.OurPeople4', 'U') IS NOT NULL DROP TABLE dbo.OurPeople4; CREATE TABLE dbo.OurPeople4 (BusinessEntityid INT NOT NULL, FullName VARCHAR(40) NOT NULL); INSERT INTO dbo.OurPeople4 (BusinessEntityid, FullName) SELECT Person.BusinessEntityID, Coalesce(Person.Title + ' ', '') + Person.FirstName + Coalesce(' ' + Person.MiddleName + ' ', ' ') + Person.LastName + Coalesce(' ' + Person.Suffix, '') FROM AdventureWorks2016.Person.Person; /* and when you want an ordered list you do this ... */ SELECT OurPeople4.FullName FROM dbo.OurPeople4 INNER JOIN AdventureWorks2016.Person.Person AS po ON OurPeople4.BusinessEntityid = po.BusinessEntityID ORDER BY po.LastName, po.FirstName; |
Listing 5
Both these solutions preserve the order, whatever subsequent changes you make to the data.
The IDENTITY Trick
However, there is sometimes a reason why you can’t reference the original data by which you did the ordering, and you find that you need to specify an order. It is possible to specify the order with a surrogate field of incrementing numbers, but you’ll experience the downside to this solution if you add or amend the data so as to affect the sort order.
There was a time, before the introduction of the Row_Number()
window function, that the only way that you could provide the order of the data in a work table was via the ‘IDENTITY
trick’. You insert into a temporary table, to increment the IDENTITY
field in the order provided by the ORDER
BY
clause.
1 2 3 4 5 6 7 8 9 10 11 12 |
IF Object_Id('dbo.OurPeople2', 'U') IS NOT NULL DROP TABLE dbo.OurPeople2; CREATE TABLE dbo.OurPeople2 (FullName VARCHAR(40) NOT NULL, TheOrder INT NOT null); DECLARE @People TABLE (TheIdentityField INT IDENTITY NOT NULL, FullName VARCHAR(40) NOT NULL, TheOrder AS TheIdentityField) INSERT INTO @People (FullName) SELECT Coalesce(Person.Title + ' ', '') + Person.FirstName + Coalesce(' ' + Person.MiddleName + ' ', ' ') + Person.LastName + Coalesce(' ' + Person.Suffix, '') AS FullName FROM AdventureWorks2016.Person.Person ORDER BY Person.LastName, Person.FirstName; INSERT INTO dbo.OurPeople2 (FullName, TheOrder) SELECT fullName, Theorder FROM @People SELECT OurPeople2.FullName FROM dbo.OurPeople2 ORDER BY TheOrder |
Listing 6
The results are:
That’s a lot better, but why put the IDENTITY
field in a table variable? Why not just create it in the destination table? The problem is in doing insertions. The IDENTITY
field is immutable, so without this intermediate stage, you wouldn’t be able to change the order, or to easily do any subsequent insertions other than to the beginning or end of your order.
Using the Row_Number() window function
Nowadays, we don’t need any of that extra work: we don’t use an ORDER
BY
clause at all, and we can subsequently change the order if we need to. We just use the Row_Number()
window function.
1 2 3 4 5 6 7 8 9 |
IF Object_Id('dbo.OurPeople3', 'U') IS NOT NULL DROP TABLE dbo.OurPeople3; CREATE TABLE dbo.OurPeople3 (FullName VARCHAR(40) NOT NULL, TheOrder INT NOT NULL); INSERT INTO dbo.OurPeople3 (FullName, TheOrder) SELECT Coalesce(Person.Title + ' ', '') + Person.FirstName + Coalesce(' ' + Person.MiddleName + ' ', ' ') + Person.LastName + Coalesce(' ' + Person.Suffix, ''), Row_Number() OVER (ORDER BY Person.LastName, Person.FirstName) FROM AdventureWorks2016.Person.Person; SELECT OurPeople3.FullName FROM dbo.OurPeople3 ORDER BY OurPeople3.TheOrder; |
Listing 7
Summary
Of all the code I’ve shown here, only Listing 2 triggers the PE020 warning. It only happens if you insert into a permanent table using a SELECT
statement with an ORDER
BY
clause. It is simply advising you of an unnecessary ORDER
BY
clause. If you are deliberately seeking to preserve a certain order, on insertion, then use the Row_Number()
window function in that SELECT
query, within the INSERT
INTO
statement, instead of using the ORDER
BY
statement.