Finding code smells using SQL Prompt: Asterisk in SELECT list (BP005)

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Using SELECT * FROM within IF EXISTS statements are fine, but in other contexts it causes several problems. For example, INSERT INTOSELECT * FROM statement assumes the columns are in a particular order, and that order may not last. Also, selecting all the columns in a table when you don’t need them all can cause unnecessary network load and query performance problems. The use of SELECT * in derived tables within a query is now less harmful to performance than it was because the optimizer can now generally restrict the columns to what is necessary, but it is still wrong, especially in the outer query. Results should always consist of just the columns you need.

SQL Prompt implements a ‘best practice’ code analysis rule (BP005), which will prompt you to replace the asterisk(*) with an explicit column, in SELECT statements.

Using SELECT * for development work

Without SELECT *, development work in SQL Server would be more painful, because there is no quicker way of finding out about the data and metadata that is being returned from a table source, meaning anything that can go into a FROM clause. This will include base tables, views, TABLESAMPLE, derived tables, joined tables and table functions.

In short, SELECT * is still indispensable for eye-balling the data. Sure, sp_help will give you information about tables and other objects, and you have SSMSs browser and, hopefully, SQL Prompt, but nothing beats seeing that SSMS data grid when you’re investigating the result of a query, table-valued function or procedure.

It used to be a popular way of getting a list of column names from tables, views or TVFs too. Nowadays, if all you need is a column list for a table or view (but not a TVF), you can just drag the Columns subfolder of the table from SSMS object explorer into the query pane.

Alternatively, within the context of the target database, in this case Adventureworks2016, you could investigate tables using:

The advantage of SELECT * is that you can copy and paste the column names of any result, regardless of it source. However, nowadays it’s better to use sys.dm_exec_describe_first_result_set for a statement or sp_describe_first_result_set for a batch. If, for example, we want to see what columns are there in the Person.Address table in AdventureWorks, we could use:

Reasons to avoid SELECT * in production code

Once you’re beyond the investigation phase, SELECT * should be replaced by an explicit column list, otherwise it can cause problems:

  • Difficult to index well: It is almost impossible to provide a covering index for SELECT * queries, and even if you were to try, it would be far harder than spelling out the columns you want explicitly.
  • Can make SELECT…INTO Crash: SELECT * when used with joined tables can, and probably will, have column names that are duplicated. You can’t use SELECT…INTO when you have a result that has duplicate column names. Changing a name in one table could create a duplicate column in a SELECT * somewhere else, and you are left scratching your head and wondering why a working routine suddenly crashes.
  • Degrades performance: It cuts across the general principle that, for performance reasons, you reduce the result as soon as possible when you are filtering and projecting data. This means that you cut down both the columns and rows to what is needed first, before you do the fancy stuff such as aggregation.
  • Not resilient to refactoring. You aren’t ensuring that the result has the same columns in the same order. Upstream modifications to the table source can change the order or number of columns. If you’re transferring that data using a SELECT * then the best result would be an error, for the consequences of data being assigned the wrong destination column can be dreadful
  • Intelligibility: It makes it more difficult to understand your code and therefore to maintain or support it. By spelling out the columns, you make your code more intelligible.

I’ll demonstrate just how dangerous SELECT * can be, if you use it in production code and then need to do some database refactoring. Here, we will make a mistake in copying sensitive information. It is frighteningly easy to do, and could cause financial irregularities without any errors being triggered. If you are of a nervous disposition please look away now.

Here are the ‘before’ and ‘after’ results….

As you can see, we have ‘unintentionally’ corrupted the data by switching the ‘to’ and ‘from’ columns, purely because we used SELECT * assuming the columns were in a particular order.

SELECT * in the application

Sometimes you’ll see long-running queries that request all columns and that originate from an application, often one that uses LINQ. Usually, it isn’t deliberate, but the developer has made a mistake, leaving out the specification of the columns, and an innocent-looking LINQ query translates into a SELECT *, or a column list that includes every column. If the WHERE clause is too general, or is even left out altogether, then this compounds the consequences, because the network is always the slowest component, and all that unnecessary data is being heaved across the network.

For example, using Adventureworks and LinqPad, you can do this in LINQ:

… which LINQ translates to this query that is what is actually executed. You’ll see that it selects all the columns…

Likewise, this expression…

…will deliver every column from every row in the entire table.

By contrast, this…

…translates to the more sensible:

Conclusions

A general code smell is to ask for more data than you need. It is almost always better, and quicker, to allow the source of the data to do the filtering for you. The use of SELECT *, which is perfectly legitimate in some circumstances, is often a sign of this more general problem. It is tempting for a developer who is competent in C# or VB, but not SQL, to download a whole row, or even an entire table, and do the filtering in more familiar territory. The extra network load and latency should, by itself, be sufficient deterrent for such a practice, but this is often mistaken for a ‘slow database’. A long column list, often listing all columns, is almost as pernicious as a SELECT *, though SELECT * has the extra risk when there is any refactoring.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter

You may also like

  • Article

    Avoid T-SQL Technical Debt using SQL Prompt Code Analysis

    SQL Prompt’s Code Analysis feature helps you discover code issues and hidden pitfalls during code development, as you type. It also provides tips for improving your code, and includes links to documentation that offers advice on the about each of the supported rules, so you can decide on the best course of action. This is

  • Article

    When to use the SELECT…INTO statement (PE003)

    We can use SELECT…INTO in SQL Server to create a new table from a table source. SQL Server uses the attributes of the expressions in the SELECT list to define the structure of the new table. Prior to SQL Server 2005, use of SELECT…INTO in production code was a performance ‘code smell’ because it acquired

  • Article

    How to use the SQL Prompt snippet placeholders for selecting and copying text

    There are four SQL Prompt snippet placeholders that are all about selection and copying of text: $PASTE$ Inserts the contents of the clipboard at that position. $SELECTEDTEXT$ Inserts the selected text. $SELECTIONSTART$ Indicates where you want the start of the new selection of code after you have executed the snippet $SELECTIONEND$ Specifies the end of

  • Article

    3 Results Grid features SQL Prompt brings to SSMS

    You may not have heard about it yet, but three helpful features were recently added to SQL Prompt that are activated when you right-click on the Results Grid in SQL Server Management Studio (SSMS). 1. Copy as IN clause This feature is available when you have values from a single column selected. You can select

  • University

    Take the SQL Prompt course

    This Redgate University course takes you from installation all the way up to getting the most out of the advanced operations in SQL Prompt. As well as autocompleting your code, you’ll learn how SQL Prompt can help you with code formatting, object renaming, code analysis, and other useful tip and tricks.

  • Forums

    SQL Prompt Forum

    Write, format, and refactor SQL effortlessly

  • Tom

    Judging by recent articles, this is not the same “Phil Factor” whose articles I have long enjoyed. I confess that I have not read this particular article – the “smells” headline turned me off before I even got started. If, in fact, this is not the same actual author, I would suggest using a different byline just for “truth in advertising” sake.

    • Phil Factor

      Tom,
      Thank you. I’m very pleased that you have ‘long enjoyed’ my articles. I hastily looked in the mirror when I read this, but I can confirm that it is indeed still me. The ‘SQL Code Smells’ byline is to indicate that this is intended to amplify the code smells in my booklet ‘SQL Code Smells’, published in Simple Talk. I would like to explain in more detail why certain patterns in SQL tend to raise eyebrows amongst experienced developers. This seems to be the best place to do it because SQL Prompt is using what I write to explain in its documentation why it underlines certain code in SSMS. I’m sorry if it seems different to what I’ve written in the past, but it is still me, being edited by the same person who has been patient enough to do so in the past.

      Phil.

      • Tom

        Terrific! I will continue to look forward to new articles and do appreciate the body of work you have produced to this point. Your articles are real bright points in the tech lit. Thank you!