17 April 2019

3 Comments

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

17 April 2019

3 Comments

The Sins of SELECT * (BP005)

If Prompt warns you of use of the asterisk, or 'star' (*), in SELECT statements, consider replacing it with an explicit column list. It will prevent unnecessary network load and query performance problems, and avoid problems if the column order changes, when inserting into a table.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

This is an update of an article first published in June 2018.

The first thing I ought to say about SELECT * is that we all use it. I use it every day for development work but hopefully it seldom escapes into production unless there is a good reason for it. Outside of development, SELECT * has only a few valid uses within a batch. Using SELECT * FROM within IF EXISTS statements is fine, as are SELECT COUNT(*), SELECT * FROM…FOR XML or SELECT * FROM…FOR JSON.

Elsewhere, a query using an asterisk as a column specifier will work fine for a while but can introduce vulnerabilities and performance problems. If you assume the column order is ‘set in stone’, by transferring data using INSERT INTOSELECT * FROM statements, then when that order changes your best hope is that it is causes errors, but at worst you’ll see the dire consequences of assigning the wrong data to a column. Also, queries that select all the columns in a table, even when the application doesn’t need them all, can cause unnecessary network load and query performance problems. It also makes queries very hard to index, leading to inefficient execution plans.

Finally, SELECT * will also make your code less readable, and harder to maintain. All of this explains why SQL Prompt implements a ‘best practice’ code analysis rule (BP005), which will prompt you to replace the asterisk(*) with an explicit list of columns, in SELECT statements.

Using SELECT * for development work

SELECT * is ideal for ad-hoc work, seeing exactly what is in a result at the point of starting to refine a query, or for debugging, but it is a development device that was intended for interactive use. Without SELECT *, development work in SQL Server would be more painful, because there is no quicker way of finding out about the data and metadata that is being returned from a table source, meaning anything that can go into a FROM clause. This will include base tables, views, TABLESAMPLE, derived tables, joined tables and table functions.

In short, SELECT * is still indispensable for eye-balling the data. Sure, sp_help will give you information about tables and other objects, and you have SSMSs browser and, hopefully, SQL Prompt, but nothing beats seeing that SSMS data grid when you’re investigating the result of a query, table-valued function or procedure.

It is still a popular way of getting a list of column names from tables, views or TVFs. However, if all you need is a column list for a table or view (but not a TVF), you can just drag the Columns subfolder of the table from SSMS object explorer into the query pane. Alternatively, within the context of the target database, in this case Adventureworks2016, you could investigate tables using:

SELECT col.name
  FROM sys.objects obj -- from all the objects (system and database)
      INNER JOIN sys.columns col --to get all the columns
          ON col.object_id = obj.object_id
  WHERE obj.object_id = OBJECT_ID('[Person].[Address]')
  ORDER BY col.column_id;

The advantage of SELECT * is that you can copy and paste the column names of any result, regardless of its source. However, nowadays it’s better to use sys.dm_exec_describe_first_result_set for a statement or sp_describe_first_result_set for a batch. If, for example, we want to see what columns are there in the Person.Address table in AdventureWorks, we could use:

SELECT f.name
  FROM sys.dm_exec_describe_first_result_set
   (N'SELECT * FROM adventureworks2016.Person.Address;', NULL, 0) AS f; 
         -- (@tsql, @Params, @include_browse_information)

Why is SELECT * bad in production code?

Once you’re beyond the investigation phase, SELECT * should be replaced by an explicit column list, otherwise it can cause problems:

  • Indigestion: There is a cost for every column of data you request, within both the database and the application. When you want a glass of water, you don’t turn on every tap in the house. When, in an application, you specify all the columns rather than just the columns you need, you will initially know, and maybe accept, the extent of the wastage. If someone subsequently extends the width of the table with more columns you get a whole lot more columns that aren’t needed, and your application will slow down with the increased memory management task.
  • Torpidity. A query that retrieves information will probably use an index, especially if your query uses a filter. SQL Server’s query optimizer will want to get the data from the covering index, if possible, rather than having to rummage around in the clustered index. If you use SELECT *, then the chances are high that no index won’t be covering. Even if you were wacky enough to cover the entire table with a non-clustered index, this will be rendered entirely useless if the table gets increased still further in size.
  • Misinterpretation: If you reference the columns by order rather than name, using INSERT INTOSELECT *, then you must hope the column order never changes. If it does, and it is possible to coerce the values into the datatype you are anticipating, then data can end up in the wrong columns without triggering an error.
  • Binding Problems. When you SELECT…INTO with a query that has been sprinkled with the asterisk fairy-dust, you can easily hit the problem of a duplicate name for a column. If you specify the columns, you will know up-front of duplicates and can alias them. If, instead, you pass such a query to an application, it has no easy way of knowing which column is the one with the correct value.
  • Maintainability: Anyone reading your code, and seeing SELECT *, will have to search the metadata to fine the names of the columns being referenced by the query. If you list them, and they are meaningful names, it will be clearer what is going on and the purpose of the query will be more obvious.

Indigestion

In applications, the worst habit I’ve seen is probably that of specifying data that isn’t needed. There is a misapprehension amongst some developers that databases are slow, but every query takes around the same amount of time. By this false logic, it makes sense to grab all you can in the one query and SELECT * means just that: ‘give me all you’ve got’. It is like a sales shopper who has had to queue and grabs more bargains off the shelf than they could possibly need or afford.

Ironically, a lot of the apparent slowness in queries turns out not to be “the database being slow” but the local memory management crumbling under the weight of unnecessary data being pulled across the network and having to be squirreled away in large objects. If you want to load data into an object from a database, it is usually better to use lazy loading, especially if the object is large. Passing data across a network is surprisingly expensive in terms of the time taken, and so it is much better to request just what you immediately need in a query. Several fast queries are faster than a single vast super-query.

Even within SQL Server, though, SELECT * will bite you. It cuts across the general principle that, for performance reasons, you reduce the result as soon as possible when you are filtering and projecting data. This means that you cut down both the columns and rows to what is needed first, before you do the fancy stuff such as aggregation. The use of SELECT * in derived tables within a query is now less harmful to performance than it was because the optimizer can now generally restrict the columns to what is necessary, but it is still wrong, especially in the outer query. Results should always consist of just the columns you need.

To demonstrate the impact, here I create in a timing harness two temporary tables, by using SELECT…INTO with a temporary table, and fill them with two million rows of data from a Directory table. In one I specify just the columns I need, and in the other one I shrug and just type * instead.

--set up
  DECLARE @log TABLE (TheOrder INT IDENTITY(1,1), 
    WhatHappened varchar(200), 
    WhenItDid  Datetime2 DEFAULT GETDATE())
  ----start of timing
  INSERT INTO @log(WhatHappened) SELECT 'Starting to time select into'--place at the start
  --end of setup
  SELECT * INTO #myTempDirectory FROM Directory;
  INSERT INTO @log (WhatHappened) 
    SELECT 'SELECT INTO with wildcard took';
     --copy and paste in anywhere
  SELECT Name, Address1, Address2, Town, City, County, Postcode, Region
    INTO #AnotherTempDirectory
    FROM Directory;
  INSERT INTO @log (WhatHappened) 
    SELECT 'SELECT INTO with fields specified took ';
     --where the routine you want to time ends
  SELECT ending.WhatHappened,
    DateDiff(ms, starting.WhenItDid, ending.WhenItDid) AS ms
    FROM @log AS starting
      INNER JOIN @log AS ending
        ON ending.TheOrder = starting.TheOrder + 1
  UNION ALL
  SELECT 'Total', DateDiff(ms, Min(WhenItDid), Max(WhenItDid)) FROM @log;
  --list out all the timings
  --tear down
  DROP TABLE #myTempDirectory;
  DROP TABLE #AnotherTempDirectory;

The version where we specified the columns took only 20% of the time of the other one, even though we only reduced the amount of data by 30%.

Putting it slightly differently, for this test, using a wildcard SELECT…INTO takes five times longer than a SELECT…INTO that returns only the required columns.

Torpidity

It is almost impossible to provide a covering index for SELECT * queries, and even if you were to try, it would be far harder than if your queries simply spelled out the required columns, explicitly.

If you are just retrieving information, SQL Server will generally use an index. For your frequent, important and expensive queries, you probably want it to have an index available from which it can retrieve all the columns requested by the query. If it doesn’t have one, it may get the available columns from the index and then lookup the other column values in the clustered index, or it may simply resort to scanning the clustered index. If the tables are big and the queries return many rows, you’ll start to notice a performance hit.

Say we wanted to search the Directory table to find all the pubs called the ‘something’ arms, in a given city. We could, do this

SELECT Name, Address1, Address2, Town, City, County, Postcode, Region FROM Directory WHERE name LIKE '% arms%' and city ='Cambridge'

Or we could do this…

SELECT * FROM Directory WHERE name LIKE '% arms%' and city ='Cambridge'

The first query might have a covering index like this:

CREATE NONCLUSTERED INDEX IX_Directory_City
  ON [dbo].[Directory] ([City])
  INCLUDE ([Address1], [Address2], [Town], [County], [Postcode], [Region])
  GO

However, no sensible indexing strategy can help much if the queries are SELECT *. The second version would only be covered by an index that included every column!

Using the same test harness as before, you’ll see that even though most of the work was searching for the word ‘arms’ within the name of all the businesses in Cambridge, it was faster to specify the columns. In the first case, the index is a covering index for this query, whereas with the SELECT * version, it is not. The execution plan still uses it, since the query returns only 19 rows in this case, but SQL Server must perform additional key lookups for each row, to return the column values that aren’t included in this index.

With more general queries on this database, the queries specifying the columns ran, on average, in two thirds of the time of those that used the same index but required additional key lookups.

Misinterpretation

With SELECT *, you aren’t ensuring that your code always returns the the same columns in the same order, which means it isn’t resilient to database refactoring. Upstream modifications to the table source can change the order or number of columns. If you’re transferring that data using an INSERT INTO…SELECT * then the best result would be an error, for the consequences of data being assigned the wrong destination column can be dreadful

I’ll demonstrate just how dangerous this can be, if you use it in production code and then need to do some database refactoring. Here, we will make a mistake in copying sensitive information. It is frighteningly easy to do and could cause financial irregularities without any errors being triggered. If you are of a nervous disposition, please look away now.

/* we create a table just for our testing */
  CREATE TABLE dbo.ExchangeRates --lets pretend we have this data
    (
    CurrencyRateDate DATETIME NOT NULL,
    AverageRate MONEY NOT NULL,
    EndOfDayRate MONEY NOT NULL,
    FromCurrency NVARCHAR(50) NOT NULL,
    FromRegion NVARCHAR(50) NOT NULL,
    ToCurrency NVARCHAR(50) NOT NULL,
    ToRegion NVARCHAR(50) NOT NULL
    );
  /* we now steal data for it from AdventureWorks next-door */
  INSERT INTO dbo.ExchangeRates
  SELECT CurrencyRate.CurrencyRateDate, CurrencyRate.AverageRate,
      CurrencyRate.EndOfDayRate, Currency.Name AS FromCurrency,
      CountryRegion.Name AS FromRegion, CurrencyTo.Name AS ToCurrency,
      CountryRegionTo.Name AS ToRegion
      FROM Adventureworks2016.Sales.CurrencyRate
        INNER JOIN Adventureworks2016.Sales.Currency
          ON CurrencyRate.FromCurrencyCode = Currency.CurrencyCode
        INNER JOIN Adventureworks2016.Sales.CountryRegionCurrency
          ON Currency.CurrencyCode = CountryRegionCurrency.CurrencyCode
        INNER JOIN Adventureworks2016.Person.CountryRegion
          ON CountryRegionCurrency.CountryRegionCode = CountryRegion.CountryRegionCode
        INNER JOIN Adventureworks2016.Sales.Currency AS CurrencyTo
          ON CurrencyRate.ToCurrencyCode = CurrencyTo.CurrencyCode
        INNER JOIN Adventureworks2016.Sales.CountryRegionCurrency AS CountryRegionCurrencyTo
          ON CurrencyTo.CurrencyCode = CountryRegionCurrencyTo.CurrencyCode
        INNER JOIN Adventureworks2016.Person.CountryRegion AS CountryRegionTo
          ON CountryRegionCurrencyTo.CountryRegionCode = CountryRegionTo.CountryRegionCode;
  GO
  /* so we start our test by creating a view to show exchange rates from equador  */
  CREATE VIEW dbo.EquadorExhangeRates
  AS
  SELECT ExchangeRates.CurrencyRateDate, ExchangeRates.AverageRate,
         ExchangeRates.EndOfDayRate, ExchangeRates.FromCurrency,
         ExchangeRates.FromRegion, ExchangeRates.ToCurrency, ExchangeRates.ToRegion
    FROM dbo.ExchangeRates
    WHERE ExchangeRates.FromRegion = 'Ecuador';
  go
  /* now we just fill a table variable with the first ten rows from the view and display them */
  DECLARE  @MyUsefulExchangeRates TABLE
    (
    CurrencyRateDate DATETIME NOT NULL,
    AverageRate MONEY NOT NULL,
    EndOfDayRate MONEY NOT NULL,
    FromCurrency NVARCHAR(50) NOT NULL,
    FromRegion NVARCHAR(50) NOT NULL,
    ToCurrency NVARCHAR(50) NOT NULL,
    ToRegion NVARCHAR(50) NOT NULL
    );
  INSERT INTO @MyUsefulExchangeRates (
    CurrencyRateDate, AverageRate, EndOfDayRate,
    FromCurrency, FromRegion,ToCurrency, ToRegion)
    SELECT * --this isn't good at all
      FROM dbo.EquadorExhangeRates;
  --disply the first ten rows from the table to see what we have
  SELECT TOP 10 UER.CurrencyRateDate, UER.AverageRate, UER.EndOfDayRate,
    UER.ToCurrency, UER.ToRegion, UER.FromCurrency, UER.FromRegion
    FROM @MyUsefulExchangeRates AS UER
    ORDER BY UER.CurrencyRateDate DESC;
  GO
  /* end of first part. Now someone decides to alter the view */
  alter VIEW dbo.EquadorExhangeRates
  AS
  SELECT ExchangeRates.CurrencyRateDate, ExchangeRates.AverageRate,
    ExchangeRates.EndOfDayRate, ExchangeRates.ToCurrency, ExchangeRates.ToRegion, ExchangeRates.FromCurrency,
    ExchangeRates.FromRegion
    FROM dbo.ExchangeRates
    WHERE ExchangeRates.FromRegion = 'Ecuador';
  GO
  /* we repeat the routine to extract the first ten rows exactly as before */
  DECLARE  @MyUsefulExchangeRates TABLE
    (
    CurrencyRateDate DATETIME NOT NULL,
    AverageRate MONEY NOT NULL,
    EndOfDayRate MONEY NOT NULL,
    FromCurrency NVARCHAR(50) NOT NULL,
    FromRegion NVARCHAR(50) NOT NULL,
    ToCurrency NVARCHAR(50) NOT NULL,
    ToRegion NVARCHAR(50) NOT NULL
    );
  INSERT INTO @MyUsefulExchangeRates(
    CurrencyRateDate, AverageRate, EndOfDayRate,
    FromCurrency, FromRegion,ToCurrency, ToRegion)
    SELECT * --bad, bad, bad
      FROM dbo.EquadorExhangeRates;
  --check that the data is the same. It isn't is it? No sir!
  SELECT TOP 10 UER.CurrencyRateDate, UER.AverageRate, UER.EndOfDayRate,
    UER.ToCurrency, UER.ToRegion, UER.FromCurrency, UER.FromRegion
    FROM @MyUsefulExchangeRates AS UER
    ORDER BY UER.CurrencyRateDate DESC;
  GO
  /* now just tidy up and tear down */
  DROP VIEW dbo.EquadorExhangeRates
  DROP TABLE  dbo.ExchangeRates

Here are the ‘before’ and ‘after’ results….

As you can see, we have ‘unintentionally’ corrupted the data by switching the ‘to’ and ‘from’ columns. Quoting the column list is extra bulk in your code. However, it performs as quickly, or even quicker than if you just airily specified all the columns with an asterisk, assuming that they remain in a particular order.

Binding Problems

When we use SELECT * with lots of joined tables we can, and probably will, have column names that are duplicated. Here is a simple query from AdventureWorks:

SELECT *
    FROM HumanResources.Employee AS e
      INNER JOIN Person.Person AS p
        ON p.BusinessEntityID = e.BusinessEntityID
      INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
        ON e.BusinessEntityID = edh.BusinessEntityID
      INNER JOIN HumanResources.Department AS d
        ON edh.DepartmentID = d.DepartmentID
    WHERE (edh.EndDate IS NULL);

And this code will reveal the column names that are duplicated:

DECLARE @SourceCode NVARCHAR(4000)=' 
  SELECT *
    FROM HumanResources.Employee AS e
      INNER JOIN Person.Person AS p
        ON p.BusinessEntityID = e.BusinessEntityID
      INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
        ON e.BusinessEntityID = edh.BusinessEntityID
      INNER JOIN HumanResources.Department AS d
        ON edh.DepartmentID = d.DepartmentID
    WHERE (edh.EndDate IS NULL);
  --'
  SELECT Count(*) AS Duplicates, name
    FROM sys.dm_exec_describe_first_result_set(@SourceCode, NULL, 1)
    GROUP BY name
    HAVING Count(*) > 1
    ORDER BY Count(*) DESC;

This will cause problems to an application trying to make sense of such a result when selecting a named column. If you try to create a temporary table from the result, using SELECT…INTO, it goes bang.

SELECT * INTO MyTempTable
    FROM HumanResources.Employee AS e
      INNER JOIN Person.Person AS p
        ON p.BusinessEntityID = e.BusinessEntityID
      INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
        ON e.BusinessEntityID = edh.BusinessEntityID
      INNER JOIN HumanResources.Department AS d
        ON edh.DepartmentID = d.DepartmentID
    WHERE (edh.EndDate IS NULL);
  Msg 2705, Level 16, State 3, Line 19
  Column names in each table must be unique. Column name 'BusinessEntityID' in table 'MyTempTable' is specified more than once.

Again, this means that your SELECT * code is fragile. If someone changes a name in one table, it could create a duplicate column in a SELECT * INTO, somewhere else, and you are left scratching your head and wondering why a working routine suddenly crashes

There is one place where SELECT * has a special meaning and can’t be replaced. This is when converting a result into JSON and you want a result that has the joined tables embedded as objects.

SELECT * 
    FROM HumanResources.Employee AS employee
      INNER JOIN Person.Person AS person
        ON person.BusinessEntityID = employee.BusinessEntityID
      INNER JOIN HumanResources.EmployeeDepartmentHistory AS history
        ON employee.BusinessEntityID = history.BusinessEntityID
      INNER JOIN HumanResources.Department AS d
        ON  history.DepartmentID = d.DepartmentID
    WHERE ( history.EndDate IS NULL) FOR JSON AUTO

This will give you… (I’m showing only the first document in the array)

[{"BusinessEntityID": 1,"NationalIDNumber": "295847284","LoginID": "adventure-works\\ken0","JobTitle": "Chief Executive Officer","BirthDate": "1969-01-29","MaritalStatus": "S","Gender": "M","HireDate": "2009-01-14","SalariedFlag": true, "VacationHours": 99, "SickLeaveHours": 69, "CurrentFlag": true, "rowguid": "F01251E5-96A3-448D-981E-0F99D789110D","ModifiedDate": "2014-06-30T00:00:00",
        "person": [{
            "BusinessEntityID": 1, "PersonType": "EM","NameStyle": false, "FirstName": "Ken","MiddleName": "J","LastName": "Sánchez","EmailPromotion": 0, "Demographics": "0<\/TotalPurchaseYTD><\/IndividualSurvey>","rowguid": "92C4279F-1207-48A3-8448-4636514EB7E2","ModifiedDate": "2009-01-07T00:00:00",
            "history": [{
                "BusinessEntityID": 1, "DepartmentID": 16, "ShiftID": 1, "StartDate": "2009-01-14","ModifiedDate": "2009-01-13T00:00:00",
                "d": [{
                    "DepartmentID": 16, "Name": "Executive","GroupName": "Executive General and Administration","ModifiedDate": "2008-04-30T00:00:00"
                }]
            }]
        }]
    }}

There is no clash here because the ModifiedDate column, for example, is encapsulated inside an object representing the source table

The equivalent FOR XML gives this:

<employee BusinessEntityID="1" NationalIDNumber="295847284" LoginID="adventure-works\ken0" 
          JobTitle="Chief Executive Officer" BirthDate="1969-01-29" MaritalStatus="S" Gender="M" HireDate="2009-01-14" SalariedFlag="1" VacationHours="99"
          SickLeaveHours="69" CurrentFlag="1" rowguid="F01251E5-96A3-448D-981E-0F99D789110D" ModifiedDate="2014-06-30T00:00:00">
 <person BusinessEntityID="1" PersonType="EM" NameStyle="0" FirstName="Ken" MiddleName="J" LastName="Sánchez" EmailPromotion="0" 
         rowguid="92C4279F-1207-48A3-8448-4636514EB7E2" ModifiedDate="2009-01-07T00:00:00">
     <Demographics>
         <IndividualSurvey
             xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey">
             <TotalPurchaseYTD>0</TotalPurchaseYTD>
         </IndividualSurvey>
     </Demographics>
     <history BusinessEntityID="1" DepartmentID="16" ShiftID="1" StartDate="2009-01-14" ModifiedDate="2009-01-13T00:00:00">
         <d DepartmentID="16" Name="Executive" GroupName="Executive General and Administration" ModifiedDate="2008-04-30T00:00:00"/>
     </history>
  </person>
</employee>

Maintainability

When laying out code, the columns you specify not only avoid mistakes in assigning values to the correct columns or variables, but they also make the code more readable. Wherever you can, it is worth spelling out the name of the columns involved just for the sake of you-in-the-future, or the poor soul who is one day tasked with maintaining your code. Sure, the code looks a bit bulkier but if a fairy appeared at your shoulder and said that your code would be more intelligible and reliable if you typed it out twice, you’d do that wouldn’t you?

SELECT * in the application

Sometimes you’ll see long-running queries that request all columns and that originate from an application, often one that uses LINQ. Usually, it isn’t deliberate, but the developer has made a mistake, leaving out the specification of the columns, and an innocent-looking LINQ query translates into a SELECT *, or a column list that includes every column. If the WHERE clause is too general, or is even left out altogether, then this compounds the consequences, because the network is always the slowest component, and all that unnecessary data is being heaved across the network.

For example, using Adventureworks and LinqPad, you can do this in LINQ:

Persons.OrderBy (p => p.BusinessEntityID).Take (100)

… which LINQ translates to this query that is what is actually executed. You’ll see that it selects all the columns…

SELECT TOP (100) [t0].[BusinessEntityID], [t0].[PersonType], [t0].[NameStyle], [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName], [t0].[Suffix], [t0].[EmailPromotion], [t0].[AdditionalContactInfo], [t0].[Demographics], [t0].[rowguid] AS [Rowguid], [t0].[ModifiedDate]
  FROM [Person].[Person] AS [t0]
  ORDER BY [t0].[BusinessEntityID]

Likewise, this expression…

from row in Persons select row

…will deliver every column from every row in the entire table.

SELECT [t0].[BusinessEntityID], [t0].[PersonType], [t0].[NameStyle], [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName], [t0].[Suffix], [t0].[EmailPromotion], [t0].[AdditionalContactInfo], [t0].[Demographics], [t0].[rowguid] AS [Rowguid], [t0].[ModifiedDate]
  FROM [Person].[Person] AS [t0]

By contrast, this…

from row in Persons.Where(i => i.LastName == "Bradley") select row.FirstName+" "+row.LastName

…translates to the more sensible:

-- Region Parameters
  DECLARE @p0 NVarChar(1000) = 'Bradley'
  DECLARE @p1 NVarChar(1000) = ' '
  -- EndRegion
  SELECT ([t0].[FirstName] + @p1) + [t0].[LastName] AS [value]
  FROM [Person].[Person] AS [t0]
  WHERE [t0].[LastName] = @p0

Conclusions

A general code smell is to ask for more data than you need. It is almost always better, and quicker, to allow the source of the data to do the filtering for you. The use of SELECT *, which is perfectly legitimate in some circumstances, is often a sign of this more general problem. It is tempting for a developer who is competent in C# or VB, but not SQL, to download a whole row, or even an entire table, and do the filtering in more familiar territory. The extra network load and latency should, by itself, be enough to deter such a practice, but this is often mistaken for a ‘slow database’. A long column list, often listing all columns, is almost as pernicious as a SELECT *, though SELECT * has the extra risk when there is any refactoring.

Share this post.

  • Tom

    Judging by recent articles, this is not the same “Phil Factor” whose articles I have long enjoyed. I confess that I have not read this particular article – the “smells” headline turned me off before I even got started. If, in fact, this is not the same actual author, I would suggest using a different byline just for “truth in advertising” sake.

    • Phil Factor

      Tom,
      Thank you. I’m very pleased that you have ‘long enjoyed’ my articles. I hastily looked in the mirror when I read this, but I can confirm that it is indeed still me. The ‘SQL Code Smells’ byline is to indicate that this is intended to amplify the code smells in my booklet ‘SQL Code Smells’, published in Simple Talk. I would like to explain in more detail why certain patterns in SQL tend to raise eyebrows amongst experienced developers. This seems to be the best place to do it because SQL Prompt is using what I write to explain in its documentation why it underlines certain code in SSMS. I’m sorry if it seems different to what I’ve written in the past, but it is still me, being edited by the same person who has been patient enough to do so in the past.

      Phil.

      • Tom

        Terrific! I will continue to look forward to new articles and do appreciate the body of work you have produced to this point. Your articles are real bright points in the tech lit. Thank you!

You may also like