8 June 2018

1 Comment

8 June 2018

1 Comment

Finding code smells using SQL Prompt: Asterisk in SELECT list

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Using SELECT * FROM within IF EXISTS statements are fine, but in other contexts it causes several problems. For example, INSERT INTOSELECT * FROM statement assumes the columns are in a particular order, and that order may not last. Also, selecting all the columns in a table when you don’t need them all can cause unnecessary network load and query performance problems. The use of SELECT * in derived tables within a query is now less harmful to performance than it was because the optimizer can now generally restrict the columns to what is necessary, but it is still wrong, especially in the outer query. Results should always consist of just the columns you need.

SQL Prompt implements a ‘best practice’ code analysis rule (BP005), which will prompt you to replace the asterisk(*) with an explicit column, in SELECT statements.

Using SELECT * for development work

Without SELECT *, development work in SQL Server would be more painful, because there is no quicker way of finding out about the data and metadata that is being returned from a table source, meaning anything that can go into a FROM clause. This will include base tables, views, TABLESAMPLE, derived tables, joined tables and table functions.

In short, SELECT * is still indispensable for eye-balling the data. Sure, sp_help will give you information about tables and other objects, and you have SSMSs browser and, hopefully, SQL Prompt, but nothing beats seeing that SSMS data grid when you’re investigating the result of a query, table-valued function or procedure.

It used to be a popular way of getting a list of column names from tables, views or TVFs too. Nowadays, if all you need is a column list for a table or view (but not a TVF), you can just drag the Columns subfolder of the table from SSMS object explorer into the query pane.

Alternatively, within the context of the target database, in this case Adventureworks2016, you could investigate tables using:

The advantage of SELECT * is that you can copy and paste the column names of any result, regardless of it source. However, nowadays it’s better to use sys.dm_exec_describe_first_result_set for a statement or sp_describe_first_result_set for a batch. If, for example, we want to see what columns are there in the Person.Address table in AdventureWorks, we could use:

Reasons to avoid SELECT * in production code

Once you’re beyond the investigation phase, SELECT * should be replaced by an explicit column list, otherwise it can cause problems:

  • Difficult to index well: It is almost impossible to provide a covering index for SELECT * queries, and even if you were to try, it would be far harder than spelling out the columns you want explicitly.
  • Can make SELECT…INTO Crash: SELECT * when used with joined tables can, and probably will, have column names that are duplicated. You can’t use SELECT…INTO when you have a result that has duplicate column names. Changing a name in one table could create a duplicate column in a SELECT * somewhere else, and you are left scratching your head and wondering why a working routine suddenly crashes.
  • Degrades performance: It cuts across the general principle that, for performance reasons, you reduce the result as soon as possible when you are filtering and projecting data. This means that you cut down both the columns and rows to what is needed first, before you do the fancy stuff such as aggregation.
  • Not resilient to refactoring. You aren’t ensuring that the result has the same columns in the same order. Upstream modifications to the table source can change the order or number of columns. If you’re transferring that data using a SELECT * then the best result would be an error, for the consequences of data being assigned the wrong destination column can be dreadful
  • Intelligibility: It makes it more difficult to understand your code and therefore to maintain or support it. By spelling out the columns, you make your code more intelligible.

I’ll demonstrate just how dangerous SELECT * can be, if you use it in production code and then need to do some database refactoring. Here, we will make a mistake in copying sensitive information. It is frighteningly easy to do, and could cause financial irregularities without any errors being triggered. If you are of a nervous disposition please look away now.

Here are the ‘before’ and ‘after’ results….

As you can see, we have ‘unintentionally’ corrupted the data by switching the ‘to’ and ‘from’ columns, purely because we used SELECT * assuming the columns were in a particular order.

SELECT * in the application

Sometimes you’ll see long-running queries that request all columns and that originate from an application, often one that uses LINQ. Usually, it isn’t deliberate, but the developer has made a mistake, leaving out the specification of the columns, and an innocent-looking LINQ query translates into a SELECT *, or a column list that includes every column. If the WHERE clause is too general, or is even left out altogether, then this compounds the consequences, because the network is always the slowest component, and all that unnecessary data is being heaved across the network.

For example, using Adventureworks and LinqPad, you can do this in LINQ:

… which LINQ translates to this query that is what is actually executed. You’ll see that it selects all the columns…

Likewise, this expression…

…will deliver every column from every row in the entire table.

By contrast, this…

…translates to the more sensible:

Conclusions

A general code smell is to ask for more data than you need. It is almost always better, and quicker, to allow the source of the data to do the filtering for you. The use of SELECT *, which is perfectly legitimate in some circumstances, is often a sign of this more general problem. It is tempting for a developer who is competent in C# or VB, but not SQL, to download a whole row, or even an entire table, and do the filtering in more familiar territory. The extra network load and latency should, by itself, be sufficient deterrent for such a practice, but this is often mistaken for a ‘slow database’. A long column list, often listing all columns, is almost as pernicious as a SELECT *, though SELECT * has the extra risk when there is any refactoring.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter

Related posts

Also in Hub

What is SQL Change Automation?

It is easy to deliver a small SQL Server test database without extra software tools, but once you edge into the real world you will soon need a consistent and repeatable way of automating as much as p...

Also in Product learning

Troubleshooting Blocking in SQL Server using SQL Monitor

A call comes in to the DBA from Help Desk. There is an urgent problem with a reporting application; unhappy users say that their screens are "frozen", and not because of an overactive A/C vent. Some h...

Also in SQL Prompt

SQL Prompt Hidden Gems: The SSMS Results Pane

Within SSMS, when executing a query, we can view the query results in a grid, or tabular, format (Ctrl-D), as plain text (Ctrl-T), or we can write them to a plain-text reporting (.rpt) file using Ctrl...

Also about SQL code smells

Avoid use of the MONEY and SMALLMONEY datatypes

The MONEY data type confuses the storage of data values with their display, though its name clearly suggests the sort of data it holds. It is proprietary to SQL Server and allows you to specify moneta...

Also about static code analysis

Finding code smells using SQL Prompt: TOP without ORDER BY in a SELECT statement

Using TOP in a SELECT statement without a subsequent ORDER BY clause is legal in SQL Server, but meaningless because asking for the TOP 10 rows implies that the data is guaranteed to be in a certain o...

  • Tom

    Judging by recent articles, this is not the same “Phil Factor” whose articles I have long enjoyed. I confess that I have not read this particular article – the “smells” headline turned me off before I even got started. If, in fact, this is not the same actual author, I would suggest using a different byline just for “truth in advertising” sake.

    • Phil Factor

      Tom,
      Thank you. I’m very pleased that you have ‘long enjoyed’ my articles. I hastily looked in the mirror when I read this, but I can confirm that it is indeed still me. The ‘SQL Code Smells’ byline is to indicate that this is intended to amplify the code smells in my booklet ‘SQL Code Smells’, published in Simple Talk. I would like to explain in more detail why certain patterns in SQL tend to raise eyebrows amongst experienced developers. This seems to be the best place to do it because SQL Prompt is using what I write to explain in its documentation why it underlines certain code in SSMS. I’m sorry if it seems different to what I’ve written in the past, but it is still me, being edited by the same person who has been patient enough to do so in the past.

      Phil.

      • Tom

        Terrific! I will continue to look forward to new articles and do appreciate the body of work you have produced to this point. Your articles are real bright points in the tech lit. Thank you!