Finding code smells using SQL Prompt: Asterisk in SELECT list (BP005)

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Using SELECT * FROM within IF EXISTS statements are fine, but in other contexts it causes several problems. For example, INSERT INTOSELECT * FROM statement assumes the columns are in a particular order, and that order may not last. Also, selecting all the columns in a table when you don’t need them all can cause unnecessary network load and query performance problems. The use of SELECT * in derived tables within a query is now less harmful to performance than it was because the optimizer can now generally restrict the columns to what is necessary, but it is still wrong, especially in the outer query. Results should always consist of just the columns you need.

SQL Prompt implements a ‘best practice’ code analysis rule (BP005), which will prompt you to replace the asterisk(*) with an explicit column, in SELECT statements.

Using SELECT * for development work

Without SELECT *, development work in SQL Server would be more painful, because there is no quicker way of finding out about the data and metadata that is being returned from a table source, meaning anything that can go into a FROM clause. This will include base tables, views, TABLESAMPLE, derived tables, joined tables and table functions.

In short, SELECT * is still indispensable for eye-balling the data. Sure, sp_help will give you information about tables and other objects, and you have SSMSs browser and, hopefully, SQL Prompt, but nothing beats seeing that SSMS data grid when you’re investigating the result of a query, table-valued function or procedure.

It used to be a popular way of getting a list of column names from tables, views or TVFs too. Nowadays, if all you need is a column list for a table or view (but not a TVF), you can just drag the Columns subfolder of the table from SSMS object explorer into the query pane.

Alternatively, within the context of the target database, in this case Adventureworks2016, you could investigate tables using:

The advantage of SELECT * is that you can copy and paste the column names of any result, regardless of it source. However, nowadays it’s better to use sys.dm_exec_describe_first_result_set for a statement or sp_describe_first_result_set for a batch. If, for example, we want to see what columns are there in the Person.Address table in AdventureWorks, we could use:

Reasons to avoid SELECT * in production code

Once you’re beyond the investigation phase, SELECT * should be replaced by an explicit column list, otherwise it can cause problems:

  • Difficult to index well: It is almost impossible to provide a covering index for SELECT * queries, and even if you were to try, it would be far harder than spelling out the columns you want explicitly.
  • Can make SELECT…INTO Crash: SELECT * when used with joined tables can, and probably will, have column names that are duplicated. You can’t use SELECT…INTO when you have a result that has duplicate column names. Changing a name in one table could create a duplicate column in a SELECT * somewhere else, and you are left scratching your head and wondering why a working routine suddenly crashes.
  • Degrades performance: It cuts across the general principle that, for performance reasons, you reduce the result as soon as possible when you are filtering and projecting data. This means that you cut down both the columns and rows to what is needed first, before you do the fancy stuff such as aggregation.
  • Not resilient to refactoring. You aren’t ensuring that the result has the same columns in the same order. Upstream modifications to the table source can change the order or number of columns. If you’re transferring that data using a SELECT * then the best result would be an error, for the consequences of data being assigned the wrong destination column can be dreadful
  • Intelligibility: It makes it more difficult to understand your code and therefore to maintain or support it. By spelling out the columns, you make your code more intelligible.

I’ll demonstrate just how dangerous SELECT * can be, if you use it in production code and then need to do some database refactoring. Here, we will make a mistake in copying sensitive information. It is frighteningly easy to do, and could cause financial irregularities without any errors being triggered. If you are of a nervous disposition please look away now.

Here are the ‘before’ and ‘after’ results….

As you can see, we have ‘unintentionally’ corrupted the data by switching the ‘to’ and ‘from’ columns, purely because we used SELECT * assuming the columns were in a particular order.

SELECT * in the application

Sometimes you’ll see long-running queries that request all columns and that originate from an application, often one that uses LINQ. Usually, it isn’t deliberate, but the developer has made a mistake, leaving out the specification of the columns, and an innocent-looking LINQ query translates into a SELECT *, or a column list that includes every column. If the WHERE clause is too general, or is even left out altogether, then this compounds the consequences, because the network is always the slowest component, and all that unnecessary data is being heaved across the network.

For example, using Adventureworks and LinqPad, you can do this in LINQ:

… which LINQ translates to this query that is what is actually executed. You’ll see that it selects all the columns…

Likewise, this expression…

…will deliver every column from every row in the entire table.

By contrast, this…

…translates to the more sensible:

Conclusions

A general code smell is to ask for more data than you need. It is almost always better, and quicker, to allow the source of the data to do the filtering for you. The use of SELECT *, which is perfectly legitimate in some circumstances, is often a sign of this more general problem. It is tempting for a developer who is competent in C# or VB, but not SQL, to download a whole row, or even an entire table, and do the filtering in more familiar territory. The extra network load and latency should, by itself, be sufficient deterrent for such a practice, but this is often mistaken for a ‘slow database’. A long column list, often listing all columns, is almost as pernicious as a SELECT *, though SELECT * has the extra risk when there is any refactoring.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter

You may also like

  • Article

    Quick SQL Prompt tip – using the CDB snippet

    SQL Prompt includes a number of snippets by default that can help you quickly write T-SQL code. These are templates of code that users use regularly. One of the more popular snippets is the ‘cdb’ snippet that helps with quickly creating a new database for development work. This can also be used to ensure production

  • Article

    SQL Prompt code analysis: avoid non-standard column aliases (ST002 and DEP021)

    There was a time when column aliases in SQL Server and Sybase were only declared by an assignment expression, and some people are still fond of the old way of doing it, in much the same way as some of us still like line-dancing, or singing madrigals dressed in tights. Nowadays, the other relational database

  • Article

    Testing the Performance of Individual SQL Statements within a Batch using SQL Prompt

    This article is all about using a SQL Prompt snippet to create and run a test harness on a batch of SQL code, to provide performance data. Behind the scenes, the snippet creates and starts an Extended Events event session, inserts and runs the text of the batch of SQL you’ve selected in SSMS, and

  • Article

    Quickly rename variables all at once with SQL Prompt

    You’re going to run into situations where someone has made poor choices in naming objects, or maybe the business logic has changed and the object name no longer makes sense. Either way, easily and quickly taking care of renaming the objects within scripts can be a pain. Let’s take this code as an example: The

  • University

    Take the SQL Prompt course

    This Redgate University course takes you from installation all the way up to getting the most out of the advanced operations in SQL Prompt. As well as autocompleting your code, you’ll learn how SQL Prompt can help you with code formatting, object renaming, code analysis, and other useful tip and tricks.

  • Forums

    SQL Prompt Forum

    Write, format, and refactor SQL effortlessly

  • Tom

    Judging by recent articles, this is not the same “Phil Factor” whose articles I have long enjoyed. I confess that I have not read this particular article – the “smells” headline turned me off before I even got started. If, in fact, this is not the same actual author, I would suggest using a different byline just for “truth in advertising” sake.

    • Phil Factor

      Tom,
      Thank you. I’m very pleased that you have ‘long enjoyed’ my articles. I hastily looked in the mirror when I read this, but I can confirm that it is indeed still me. The ‘SQL Code Smells’ byline is to indicate that this is intended to amplify the code smells in my booklet ‘SQL Code Smells’, published in Simple Talk. I would like to explain in more detail why certain patterns in SQL tend to raise eyebrows amongst experienced developers. This seems to be the best place to do it because SQL Prompt is using what I write to explain in its documentation why it underlines certain code in SSMS. I’m sorry if it seems different to what I’ve written in the past, but it is still me, being edited by the same person who has been patient enough to do so in the past.

      Phil.

      • Tom

        Terrific! I will continue to look forward to new articles and do appreciate the body of work you have produced to this point. Your articles are real bright points in the tech lit. Thank you!