{"id":650,"date":"2009-08-20T00:00:00","date_gmt":"2009-08-20T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/ten-common-sql-programming-mistakes\/"},"modified":"2021-09-29T16:22:05","modified_gmt":"2021-09-29T16:22:05","slug":"ten-common-sql-programming-mistakes","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/ten-common-sql-programming-mistakes\/","title":{"rendered":"Ten Common SQL Programming Mistakes: NULL, NOT IN, Cursors, Scalar UDFs"},"content":{"rendered":"<div class=\"spaced-bottom padded--tight scheme--lightest-grey\">\n<img decoding=\"async\" class=\"media__image\" style=\"float: left; vertical-align: middle;\" src=\"https:\/\/assets.red-gate.com\/external\/SimpleTalk\/database-devops-cog-92.png\" \/> <\/p>\n<p class=\"padded-top--tight\"><strong>DevOps, Continuous Delivery & Database Lifecycle Management<\/strong><br \/>Continuous Integration<\/p>\n<\/div>\n\r\n<p class=\"START\">Programming in SQL can be both a fun and a challenging task. Those with a background in traditional programming languages such as Java, C, C++, and VB often find it difficult to adapt to the &#8220;set-based&#8221; mindset. Even seasoned SQL developers and DBAs can get trapped in one of the many pitfalls of the SQL language. I have been there and I&#8217;m willing to bet that you have too. Mastering the fundamentals takes time, and even then some mistakes are hard to spot.<\/p>\r\n<p>My intention with this article was to highlight some of the more common mistakes that I&#8217;ve seen people make when coding SQL. The list reflects my direct experience working with different development teams and developers, performing code review sessions, and so on, as well as the issues I see every day on the newsgroups and forums. The list of common mistakes is far from exhaustive, and is presented in no particular order of severity. In fact, the list could be regarded as the result of the following query:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT TOP(10) mistake\r\nFROM CommonSQLProgrammingMistakes\r\nORDER BY CHECKSUM(NEWID());\r\n<\/pre>\r\n<p>Without further ado, here is the list:<\/p>\r\n<ol>\r\n\t<li>NULLs and the NOT IN predicate<\/li>\r\n\t<li>Functions on indexed columns in predicates<\/li>\r\n\t<li>Incorrect subquery column<\/li>\r\n\t<li>Data type mismatch in predicates<\/li>\r\n\t<li>Predicate evaluation order<\/li>\r\n\t<li>Outer joins and placement of predicates<\/li>\r\n\t<li>Subqueries that return more than one value<\/li>\r\n\t<li>Use of SELECT *<\/li>\r\n\t<li>Scalar user-defined functions<\/li>\r\n\t<li>Overuse of cursors<\/li>\r\n<\/ol>\r\n<p>The examples are presented using SQL Server&#8217;s Transact-SQL dialect, but most of the concepts are valid in any SQL implementation.<\/p>\r\n<h1>NULLs and the NOT IN predicate<\/h1>\r\n<p>One of the most common requests is to retrieve data based on some column value not included in a list of values. The following two tables illustrate the scenario. We have tables with colors and products:<\/p>\r\n<p><b>Colors table:<\/b><\/p>\r\n<pre>color\r\n----------\r\nBlack\r\nBlue\r\nGreen\r\nRed\r\nProducts table:\r\nsku\u00a0 product_description\u00a0 color\r\n---- -------------------- ------\r\n1\u00a0\u00a0\u00a0 Ball\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Red\r\n2\u00a0\u00a0\u00a0 Bike\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Blue\r\n3\u00a0\u00a0\u00a0 Tent\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\r\n\u00a0\r\n<\/pre>\r\n<p>Note that these tables do not represent a perfect design, following normalization rules and best practices. Rather, it is a simplified scenario to help illustrate this example better. In reality, the <strong>colors<\/strong> table would most likely contain a color code key column that would be referenced in the <strong>products<\/strong> table.<\/p>\r\n<p>The request is to select a list of colors that have not previously been used on products. In other words, we need to construct a query that returns only those colors for which there is no product with that color. It might seem, at first glance, that the <b>NOT<\/b> <strong>IN<\/strong> predicate provides a very intuitive way to satisfy this request, very close to how the problem would be stated in plain English:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT C.color\r\nFROM Colors AS C\r\nWHERE C.color NOT IN (SELECT P.color \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0FROM Products AS P);\r\n<\/pre>\r\n<p>You may have been expecting this query to return two rows (for &#8216;black&#8217; and &#8216;green&#8217;) but, in fact, it returns an empty result set:<\/p>\r\n<pre>color\r\n----------\r\n\u00a0\r\n(0 row(s) affected)\r\n<\/pre>\r\n<p>Obviously this is &#8216;incorrect&#8217;. What is the problem? It&#8217;s simply that SQL uses three-valued logic, driven by the existence of <strong>NULL<\/strong>, which is not a value but a marker to indicate missing (or <b>UNKNOWN<\/b>) information. When the <b>NOT<\/b> operator is applied to the list of values from the subquery, in the <strong>IN<\/strong> predicate, it is translated like this:<\/p>\r\n<pre>\"color NOT IN (Red, Blue, NULL)\"\r\n<\/pre>\r\n<p>This is equivalent to:<\/p>\r\n<pre>\"NOT(color=Red OR color=Blue OR color=NULL)\"\r\n<\/pre>\r\n<p>The expression &#8220;<b>color=NULL<\/b>&#8221; evaluates to <b>UNKNOWN<\/b> and, according to the rules of three-valued logic, <b>NOT UNKNOWN<\/b> also evaluates to <b>UNKNOWN<\/b>. As a result, all rows are filtered out and the query returns an empty set.<\/p>\r\n<p>This mistake will often surface if requirements change, and a non-nullable column is altered to allow NULLs. It also highlights the need for thorough testing. Even if, in the initial design, a column disallows NULLs, you should make sure your queries continue to work correctly with NULLs.<\/p>\r\n<p>One solution is to use the <b>EXISTS<\/b> predicate in place of <b>IN<\/b>, since <b>EXISTS<\/b> uses two-valued predicate logic evaluating to TRUE\/FALSE:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT C.color\r\nFROM Colors AS C\r\nWHERE NOT EXISTS(SELECT * \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM Products AS P\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE C.color = P.color);\r\n<\/pre>\r\n<p>This query correctly returns the expected result set:<\/p>\r\n<pre>color\r\n----------\r\nBlack\r\nGreen\r\n<\/pre>\r\n<p>Other possible solutions are as follows:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">\/* IS NOT NULL in the subquery *\/\r\nSELECT C.color\r\nFROM Colors AS C\r\nWHERE C.color NOT IN (SELECT P.color \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM Products AS P \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE P.color IS NOT NULL);\r\n\u00a0\r\n\/* EXCEPT *\/\r\nSELECT color\r\nFROM Colors\r\nEXCEPT\r\nSELECT color\r\nFROM Products;\r\n\u00a0\r\n\/* LEFT OUTER JOIN *\/\r\nSELECT C.color\r\nFROM Colors AS C\r\nLEFT OUTER JOIN Products AS P\r\n\u00a0 ON C.color = P.color\r\nWHERE P.color IS NULL;\r\n<\/pre>\r\n<p>While all solutions produce the desired results, using EXCEPT may be the easiest to understand and use. Note that the EXCEPT operator returns distinct values, which works fine in our scenario but may not be correct in another situation.<\/p>\r\n<h1>Functions on indexed columns in predicates<\/h1>\r\n<p>We often tend to write code as a direct translation of given request. For example, if we are asked to retrieve all customers whose name starts with the letter L, it feels very natural to write the query like this, using the <b>LEFT<\/b> function to return the first character of their name:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT customer_name\r\nFROM Customers\r\nWHERE LEFT(customer_name, 1) = 'L';\r\n<\/pre>\r\n<p>Alternatively, if we are asked to calculate the total sales for January 2009, we might write a query like the following, which uses the <b>DATEPART<\/b> function to extract the relevant month and year from the <b>sale_date<\/b> column:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT SUM(sale_amount) AS total_sales\r\nFROM Sales\r\nWHERE DATEPART(YEAR, sale_date) = 2009\r\n\u00a0 AND DATEPART(MONTH, sale_date) = 1;\r\n<\/pre>\r\n<p>While these queries look very intuitive, you will find that the indexes that you (of course!) have on your <b>customer_name<\/b> and <b>sale_date<\/b> columns remain unused, and that the execution plan for these queries reveal index scans.<\/p>\r\n<p>The problem arises from the fact that the index columns are being passed to a function, which the query engine must then evaluate for every single row in the table. In cases such as these, the <b>WHERE<\/b> clause predicate is deemed &#8220;non-SARGable&#8221; and the best that the query optimizer can do is perform a full index or table scan.<\/p>\r\n<p>To make sure the indexes get used, we need to avoid the use of functions on the indexed columns. In our two examples, it is a relatively simple task to rewrite the queries to use SARG-able predicates. The first requested can be expressed with this logically equivalent query:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT customer_name\r\nFROM Customers\r\nWHERE customer_name LIKE 'L%';\r\n<\/pre>\r\n<p>The equivalent for the second query is as follows:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT SUM(sale_amount) AS total_sales\r\nFROM Sales\r\nWHERE sale_date &gt;= '20090101'\r\n\u00a0 AND sale_date &lt;\u00a0 '20090201';\r\n<\/pre>\r\n<p>These two queries are most likely to utilize index seek to retrieve the data quickly and efficiently.<\/p>\r\n<p>It&#8217;s worth noting that SQL Server is getting &#8220;smarter&#8221; as it evolves. For example, consider the following query, which uses the <b>CAST<\/b> function on the indexed <b>sale_date<\/b> column:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT SUM(sale_amount) AS total_sales\r\nFROM Sales\r\nWHERE CAST(sale_date AS DATE) = '20090101';\r\n<\/pre>\r\n<p>If you run this query on SQL 2005 or earlier, you&#8217;ll see an index scan. However, on SQL Server 2008 you&#8217;ll see an index seek, despite the use of the <b>CAST<\/b> function. The execution plan reveals that the predicate is transformed into something like the following:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT SUM(sale_amount) AS total_sales\r\nFROM Sales\r\nWHERE sale_date &gt;= '20090101'\r\n\u00a0 AND sale_date &lt;\u00a0 '20090102';\r\n<\/pre>\r\n<p>However, in general, you should use SARGable predicates where possible, rather than rely on the evolving intelligence of the optimizer.<\/p>\r\n<h1>Incorrect subquery column<\/h1>\r\n<p>When writing a subquery, it is very easy to abstract yourself from the main query logic and concentrate on the subquery itself. This can lead to the innocent mistake of substituting a column from the subquery source table for a column with similar name from the main query.<\/p>\r\n<p>Let&#8217;s look at two very simple tables; one is a <b>Sales<\/b> table containing sales data, and the other is an auxiliary <b>Calendar<\/b> table that has all calendar dates and holidays (abbreviated here):<\/p>\r\n<p><b>Sales<\/b> table:<\/p>\r\n<pre>sale_date\u00a0 sale_amount\r\n---------- -----------\r\n2009-01-01 120.50\r\n2009-01-02 115.00\r\n2009-01-03 140.80\r\n2009-01-04 100.50\r\n<\/pre>\r\n<p><b>Calendar<\/b> table:<\/p>\r\n<pre>calendar_date holiday_name\r\n------------- ----------------\r\n2009-01-01\u00a0\u00a0\u00a0 New Year's Day\r\n2009-01-02\u00a0\u00a0\u00a0 NULL\r\n2009-01-03\u00a0\u00a0\u00a0 NULL\r\n2009-01-04\u00a0\u00a0\u00a0 NULL\r\n2009-01-05\u00a0\u00a0\u00a0 NULL \r\n<\/pre>\r\n<p>Our task is to retrieve sales data for holiday dates only. It seems like a trivial query to write:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT sale_date, sale_amount\r\nFROM Sales AS S\r\nWHERE sale_date IN (SELECT sale_date \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM Calendar AS C\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE holiday_name IS NOT NULL);\r\n<\/pre>\r\n<p>However, you&#8217;ll find that query simply returns all rows from the <b>Sales<\/b> table! A closer look at the query reveals that the culprit to be the <b>SELECT<\/b> list of the subquery. It accidentally references the <b>sales_date<\/b> column from the <b>Sales<\/b> table, instead of the <b>calendar_date<\/b> column from the <b>Calendar<\/b> table.<\/p>\r\n<p>If that is the case, why did we not get an error? Although the outcome was not what we expected, this is still a valid SQL statement. When using a subquery, the outer query&#8217;s columns are exposed to the inner query. Here, we unintentionally converted the self-contained subquery, to be executed once and the value passed to the outer query, to a correlated subquery, logically executed once for every row returned by the outer query.<\/p>\r\n<p>As a result, the subquery evaluates to <b>sale_date IN (sale_date)<\/b> which is always true, as long as there is at least one holiday date in the Calendar table, and so our result set returns all rows from the <b>Sales<\/b> table. Of course, the fix is easy in this case; we simply use the correct date column from the <b>Calendar<\/b> table:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT sale_date, sale_amount\r\nFROM Sales AS S\r\nWHERE sale_date IN (SELECT C.calendar_date \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM Calendar AS C\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE C.holiday_name IS NOT NULL);\r\n<\/pre>\r\n<p>This illustrates another important point: it is a best practice to prefix columns in subqueries with table aliases. For example, if we had used an alias like this:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT sale_date, sale_amount\r\nFROM Sales AS S\r\nWHERE sale_date IN (SELECT C.sale_date \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0FROM Calendar AS C\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE holiday_name IS NOT NULL);\r\n<\/pre>\r\n<p>Then this query would have resulted in an error &#8211; &#8220;Error: Invalid column name &#8216;sale_date'&#8221;.<\/p>\r\n<h1>Data type mismatch in predicates<\/h1>\r\n<p>This is another typical mistake that is sometimes hard to catch. It is very easy to mismatch data types in predicates. It could be in a stored procedure where the parameter is passed as one data type and then used in a query to filter data on a column of different data type. Another example is joining tables on columns with different data types, or simply using a predicate where data types are mismatched.<\/p>\r\n<p>For example, we may have a <b>Customers<\/b> table where the <b>last_name<\/b> column is of type <strong>VARCHAR<\/strong>:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE TABLE Customers (\r\n\u00a0customer_nbr INT NOT NULL PRIMARY KEY,\r\n\u00a0first_name VARCHAR(35) NOT NULL,\r\n\u00a0last_name VARCHAR(35) NOT NULL);\r\n<\/pre>\r\n<p>Then the following stored procedure is used to retrieve the customer information by customer last name:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE PROCEDURE GetCustomerByLastName\r\n\u00a0@last_name NVARCHAR(35)\r\nAS\r\n\u00a0SELECT first_name, last_name\r\n\u00a0FROM Customers\r\n\u00a0WHERE last_name = @last_name;\r\n<\/pre>\r\n<p>Notice here the parameter <b>@last_name<\/b> is of data type <strong>NVARCHAR<\/strong>. Although the code &#8220;works&#8221;, SQL Server will have to perform implicit conversion of the last name column to <strong>NVARCHAR<\/strong>, because <strong>NVARCHAR<\/strong> is of higher data precedence. This can result in a performance penalty. The implicit conversion is visible in the query plan as <b>CONVERT_IMPLICIT<\/b>. Based on collation, and other factors, a data type mismatch may also preclude the use of an index seek. Use of the correct data type resolves the problem:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE PROCEDURE GetCustomerByLastName\r\n\u00a0@last_name VARCHAR(35)\r\nAS\r\n\u00a0SELECT first_name, last_name\r\n\u00a0FROM Customers\r\n\u00a0WHERE last_name = @last_name;\r\n<\/pre>\r\n<p>In many cases, this mistake is the result of splitting responsibilities on the team, and having one team member design the tables and another implement stored procedures or code. Another reason could be using different data sources to join data where the join columns have different data types in the source systems. The same advice applies not only to character data type mismatches, but also to mismatches between numeric data types (like <b>INT<\/b> and <b>FLOAT<\/b>), or the mixing of numeric and character data types.<\/p>\r\n<h1>Predicate evaluation order<\/h1>\r\n<p>If you are familiar with the logical query processing order, then you may expect that a query is executed in the following order:<\/p>\r\n<ol>\r\n\t<li>FROM<\/li>\r\n\t<li>WHERE<\/li>\r\n\t<li>GROUP BY<\/li>\r\n\t<li>HAVING<\/li>\r\n\t<li>SELECT<\/li>\r\n<\/ol>\r\n<p>The sequence above outlines the logical order for executing query. Logically the <b>FROM<\/b> clause is processed first defining the source data set, next the <b>WHERE <\/b>predicates are applied, followed by <b>GROUP BY<\/b>, and so on.<\/p>\r\n<p>However, physically, the query is processed differently and the query optimizer is free to move expressions in the query plan in order to produce the most cost efficient plan for retrieving the data. This leads to a common misunderstanding that a filter in the <b>WHERE<\/b> clause is applied before the next phases are processed. In fact, a predicate can be applied much later in the physical execution plan. Also, there is no left to right order for execution of predicates. For example, if you have a <b>WHERE<\/b> clause containing &#8220;<b>WHERE x=1 AND y=2<\/b>&#8220;, there is no guarantee that &#8220;<b>x=1<\/b>&#8221; will be evaluated first. They can be executed in any order.<\/p>\r\n<p>For example, consider the following <b>Accounts<\/b> table where, in the <b>account_reference<\/b> column, Business accounts are denoted by a numeric reference and Personal accounts by a character reference:<\/p>\r\n<pre>account_nbr account_type\u00a0\u00a0\u00a0 account_reference\r\n----------- --------------- -----------------\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Personal\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 abc\r\n2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Business Basic\u00a0 101\r\n3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Personal\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 def\r\n4\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Business Plus\u00a0\u00a0 5\r\n<\/pre>\r\n<p>In general, this table indicates bad design. The <strong>account_reference<\/strong> column should be represented as two different attributes, specific to business and personal accounts and each with the correct data type (not even belonging to the same table). However, in practice, we very often have to deal with systems designed with shortcomings, where altering the design is not an option.<\/p>\r\n<p>Given the above scenario, a valid request is to retrieve all business type accounts with an account reference that is greater than 20 (assuming account reference has some meaningful numeric value for business type accounts). The query may look like this:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT account_nbr, account_reference AS account_ref_nbr\r\nFROM Accounts\r\nWHERE account_type LIKE 'Business%'\r\n\u00a0 AND CAST(account_reference AS INT) &gt; 20;\r\n<\/pre>\r\n<p>However, the query results in error:<\/p>\r\n<pre>\"Conversion failed when converting the varchar value 'abc' to data type int\"\r\n<\/pre>\r\n<p>The query fails because, as noted earlier, there is no prescribed order for executing predicates and nothing guarantees that the predicate &#8220;<b>account_type LIKE &#8216;Business%&#8217;<\/b>&#8221; will be evaluated before the predicate &#8220;<b>CAST(account_reference AS INT) &gt; 20<\/b>&#8220;. In our case, the second predicate is evaluated first resulting in a conversion error, due to the incompatible values in the <b>account_reference<\/b> column, for personal accounts.<\/p>\r\n<p>One attempt to resolve this issue might be to use a derived table (or common table expression) to filter the business type accounts first, and then apply the predicate for <b>account_reference<\/b> column:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT account_nbr, account_ref_nbr\r\nFROM (SELECT account_nbr, \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 CAST(account_reference AS INT) AS account_ref_nbr\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 FROM Accounts\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE account_type LIKE 'Business%') AS A\r\nWHERE account_ref_nbr &gt; 20;\r\n<\/pre>\r\n<p>However, this results in the exact same error because derived tables and CTEs are expanded in the query plan and a single query plan is produced, where predicates can again be pushed up or down in the plan.<\/p>\r\n<p>As indicated earlier, the problem here is a mix of bad design and misunderstanding of how SQL Server performs physical query execution. What is the solution? The best solution is to design the table correctly and avoid storing mixed data in a single column. In this case, a work around is to use a <b>CASE<\/b> expression to guarantee that only valid numeric values will be converted to INT data type:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT account_nbr, account_reference AS account_ref_nbr\r\nFROM Accounts\r\nWHERE account_type LIKE 'Business%'\r\n\u00a0 AND CASE WHEN account_reference NOT LIKE '%[^0-9]%' \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 THEN CAST(account_reference AS INT)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 END &gt; 20;\r\n<\/pre>\r\n<p>The <b>CASE<\/b> expression uses a <strong>LIKE<\/strong> pattern to check for valid numeric values (a double negation logic is used which can be translated as &#8220;there is not a single character that is not a digit&#8221;), and only for those values performs the <b>CAST<\/b>. For the rest of the values the <b>CASE<\/b> expression results in <b>NULL<\/b>, which is filtered out because <b>NULL<\/b> is not matched with any value (even with <b>NULL<\/b>).<\/p>\r\n<h1>Outer joins and placement of predicates<\/h1>\r\n<p>Outer joins are such a great tool but are also much misunderstood and abused. Some people seem to like them so much that they throw one into almost every query, regardless of whether or not it is needed!<\/p>\r\n<p>The key to correct use of outer joins is an understanding of the logical steps required to process an outer join in a query. Here are the relevant steps from the query processing phases:<\/p>\r\n<ol>\r\n\t<li>A cross join (Cartesian product) is formed for the two input tables in the <strong>FROM<\/strong> clause. The result of the Cartesian product is every possible combination of a row from the first table and a row from the second table.<\/li>\r\n\t<li>The ON clause predicates are applied filtering only rows satisfying the predicate logic.<\/li>\r\n\t<li>Any Outer rows filtered out by the predicates in step 2 are added back. Rows from the preserved table are added with their actual attribute values (column values), and the attributes (columns) from the non preserved table are set to <strong>NULL<\/strong>.<\/li>\r\n\t<li>The WHERE clause predicates are applied.<\/li>\r\n<\/ol>\r\n<p>An outer join query can produce completely different results depending on how you write it, and where predicates are placed in that query. Let&#8217;s look at one example, based on the following two tables, <b>Customers<\/b> and <b>Orders<\/b>:<\/p>\r\n<p><strong>Customers<\/strong> table:<\/p>\r\n<pre>customer_nbr customer_name\r\n------------ --------------\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Jim Brown\r\n2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Jeff Gordon\r\n3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Peter Green\r\n4\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Julie Peters\r\n<\/pre>\r\n<p><strong>Orders<\/strong> table:<\/p>\r\n<pre>order_nbr\u00a0\u00a0 order_date customer_nbr order_amt\r\n----------- ---------- ------------ ----------\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2008-10-01 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 15.50\r\n2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2008-12-15 2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 25.00\r\n3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2009-01-02 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 18.00\r\n4\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2009-02-20 3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10.25\r\n5\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2009-03-05 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 30.00\r\n<\/pre>\r\n<p>Our task is to retrieve a list of all customers, and the total amount they have spent on orders, since the beginning of year 2009. Instinctively, one may write the following query:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009\r\nFROM Customers AS C\r\nLEFT OUTER JOIN Orders AS O\r\n\u00a0 ON C.customer_nbr = O.customer_nbr\r\nWHERE O.order_date &gt;= '20090101'\r\nGROUP BY C.customer_name;\r\n<\/pre>\r\n<p>But the results do not look good:<\/p>\r\n<pre>customer_name\u00a0 total_2009\r\n-------------- ------------\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0 48.00\r\nPeter Green\u00a0\u00a0\u00a0 10.25\r\n<\/pre>\r\n<p>Customers Jeff and Julie are missing from the result set. Where is the problem? In order to understand what went wrong, let&#8217;s play back this query one step at a time following the logical processing order. The first step is a cross join between the two input tables:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT C.customer_name, O.order_amt\r\nFROM Customers AS C\r\nCROSS JOIN Orders AS O;\r\n<\/pre>\r\n<p>This results in every possible combination of rows from both input tables:<\/p>\r\n<pre>customer_name\u00a0\u00a0\u00a0 order_amt\u00a0 order_date\r\n---------------- ---------- ----------\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 15.50\u00a0\u00a0\u00a0\u00a0\u00a0 2008-10-01\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 25.00\u00a0\u00a0\u00a0\u00a0\u00a0 2008-12-15\r\nJim Brown\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a018.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-01-02\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10.25\u00a0\u00a0\u00a0\u00a0\u00a0 2009-02-20\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 30.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-03-05\r\nJeff Gordon\u00a0\u00a0\u00a0\u00a0\u00a0 15.50\u00a0\u00a0\u00a0\u00a0\u00a0 2008-10-01\r\nJeff Gordon\u00a0\u00a0\u00a0\u00a0\u00a0 25.00\u00a0\u00a0\u00a0\u00a0\u00a0 2008-12-15\r\nJeff Gordon\u00a0\u00a0\u00a0\u00a0\u00a0 18.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-01-02\r\nJeff Gordon\u00a0\u00a0\u00a0\u00a0\u00a0 10.25\u00a0\u00a0\u00a0\u00a0\u00a0 2009-02-20\r\nJeff Gordon\u00a0\u00a0\u00a0\u00a0\u00a0 30.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-03-05\r\nPeter Green\u00a0\u00a0\u00a0\u00a0\u00a0 15.50\u00a0\u00a0\u00a0\u00a0\u00a0 2008-10-01\r\nPeter Green\u00a0\u00a0\u00a0\u00a0\u00a0 25.00\u00a0\u00a0\u00a0\u00a0\u00a0 2008-12-15\r\nPeter Green\u00a0\u00a0\u00a0\u00a0\u00a0 18.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-01-02\r\nPeter Green\u00a0\u00a0\u00a0\u00a0\u00a0 10.25\u00a0\u00a0\u00a0\u00a0\u00a0 2009-02-20\r\nPeter Green\u00a0\u00a0\u00a0\u00a0\u00a0 30.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-03-05\r\nJulie Peters\u00a0\u00a0\u00a0 \u00a015.50\u00a0\u00a0\u00a0\u00a0\u00a0 2008-10-01\r\nJulie Peters\u00a0\u00a0\u00a0\u00a0 25.00\u00a0\u00a0\u00a0\u00a0\u00a0 2008-12-15\r\nJulie Peters\u00a0\u00a0\u00a0\u00a0 18.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-01-02\r\nJulie Peters\u00a0\u00a0\u00a0\u00a0 10.25\u00a0\u00a0\u00a0\u00a0\u00a0 2009-02-20\r\nJulie Peters\u00a0\u00a0\u00a0\u00a0 30.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-03-05\r\n<\/pre>\r\n<p>The next step is applying the ON predicates of the JOIN clause:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT C.customer_name, O.order_amt, O.order_date\r\nFROM Customers AS C\r\nINNER JOIN Orders AS O\r\n\u00a0\u00a0 ON C.customer_nbr = O.customer_nbr;\r\n<\/pre>\r\n<p>The result of this query includes only customers with orders. Since customer Julie does not have any orders it is excluded from the result set:<\/p>\r\n<pre>customer_name\u00a0 order_amt\u00a0 order_date\r\n-------------- ---------- ----------\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0 15.50\u00a0\u00a0\u00a0\u00a0\u00a0 2008-10-01\r\nJeff Gordon\u00a0\u00a0\u00a0 25.00\u00a0\u00a0\u00a0\u00a0\u00a0 2008-12-15\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0 18.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-01-02\r\nPeter Green\u00a0\u00a0\u00a0 10.25\u00a0\u00a0\u00a0\u00a0\u00a0 2009-02-20\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0 30.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-03-05\r\n<\/pre>\r\n<p>The third step of the logical processing order is adding back the outer rows. These rows were excluded in the prior step because they did not satisfy the join predicates.<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT C.customer_name, O.order_amt, O.order_date\r\nFROM Customers AS C\r\nLEFT OUTER JOIN Orders AS O\r\n\u00a0\u00a0 ON C.customer_nbr = O.customer_nbr;\r\n<\/pre>\r\n<p>Now customer Julie is added back in the result set. Notice the added outer rows from the preserved table (<strong>Customers<\/strong>) have values for the selected attributes (<strong>customer_name<\/strong>) and the non-preserved table (<strong>Orders<\/strong>) rows have NULL for their attributes (<strong>order_amt<\/strong> and <strong>order_date<\/strong>):<\/p>\r\n<pre>customer_name\u00a0 order_amt\u00a0 order_date\r\n-------------- ---------- ----------\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0 15.50\u00a0\u00a0\u00a0\u00a0\u00a0 2008-10-01\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0 18.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-01-02\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0 30.00\u00a0\u00a0\u00a0\u00a0\u00a0 2009-03-05\r\nJeff Gordon\u00a0\u00a0\u00a0 25.00\u00a0\u00a0\u00a0\u00a0\u00a0 2008-12-15\r\nPeter Green\u00a0\u00a0\u00a0 10.25\u00a0\u00a0\u00a0\u00a0\u00a0 2009-02-20\r\nJulie Peters\u00a0\u00a0 NULL\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\r\n<\/pre>\r\n<p>The last step is applying the <strong>WHERE<\/strong> clause predicates:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT C.customer_name, O.order_amt, O.order_date\r\nFROM Customers AS C\r\nLEFT OUTER JOIN Orders AS O\r\n\u00a0\u00a0 ON C.customer_nbr = O.customer_nbr\r\nWHERE O.order_date &gt;= '20090101';\r\n<\/pre>\r\n<p>Now the picture is clear! The culprit is the <strong>WHERE<\/strong> clause predicate. Customer Jeff is filtered out from the result set because he does not have orders past January 1, 2009, and customer Julie is filtered out because she has no orders at all (since the outer row added for Julie has NULL for the <strong>order_date<\/strong> column). In effect, in this case, the predicate in the <b>WHERE<\/b> clause turns the outer join into an inner join.<\/p>\r\n<p>To correct our initial query, it is sufficient to move the WHERE predicate into the join condition.<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009\r\nFROM Customers AS C\r\nLEFT OUTER JOIN Orders AS O\r\n\u00a0 ON C.customer_nbr = O.customer_nbr\r\n\u00a0AND O.order_date &gt;= '20090101'\r\nGROUP BY C.customer_name;\r\n<\/pre>\r\n<p>Now, the query returns correct results because Jeff and Julie are filtered out in the join predicates, but then added back when the outer rows are added.<\/p>\r\n<pre>customer_name\u00a0 total_2009\r\n-------------- ------------\r\nJeff Gordon\u00a0\u00a0\u00a0 0.00\r\nJim Brown\u00a0\u00a0\u00a0\u00a0\u00a0 48.00\r\nJulie Peters\u00a0\u00a0 0.00\r\nPeter Green\u00a0\u00a0\u00a0 10.25\r\n<\/pre>\r\n<p>In a more complex example, with multiple joins, the incorrect filtering may happen on a subsequent table operator (like join to another table) instead in the <b>WHERE<\/b> clause. For example, say we have an <b>OrderDetails<\/b> table containing product <b>SKU<\/b> and quantity, and the request is to retrieve a list of all customers, with order amount and quantity, for selected product SKUs. The following query may seem correct:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT C.customer_name, O.order_amt, D.qty\r\nFROM Customers AS C\r\nLEFT OUTER JOIN Orders AS O\r\n\u00a0 ON C.customer_nbr = O.customer_nbr\r\nINNER JOIN OrderDetails AS D\r\n\u00a0 ON D.order_nbr = O.order_nbr\r\n\u00a0AND D.sku = 101;\r\n<\/pre>\r\n<p>However, here the <b>INNER<\/b> join with the <b>OrderDetails<\/b> table plays the exact same role as the predicate in the <b>WHERE<\/b> clause in our previous example, in effect turning the <b>LEFT OUTER<\/b> join to <b>INNER<\/b> join. The correct query to satisfy this request needs to use a <b>LEFT OUTER<\/b> join to join to the <b>OrderDetails<\/b> table:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT C.customer_name, O.order_amt, D.qty\r\nFROM Customers AS C\r\nLEFT OUTER JOIN Orders AS O\r\n\u00a0 ON C.customer_nbr = O.customer_nbr\r\nLEFT JOIN OrderDetails AS D\r\n\u00a0 ON D.order_nbr = O.order_nbr\r\n\u00a0AND D.sku = 101;\r\n<\/pre>\r\n<h1>Subqueries that return more than one value<\/h1>\r\n<p>A very frequent request is to retrieve a value based on some correlation with the main query table. For example, consider the following two tables, storing details of products and the plants that manufacture these products:<\/p>\r\n<p><b>Products<\/b> table:<\/p>\r\n<pre>sku\u00a0\u00a0 product_description\r\n----- ------------------\r\n1\u00a0\u00a0\u00a0\u00a0 Bike\r\n2\u00a0\u00a0\u00a0\u00a0 Ball\r\n3\u00a0\u00a0\u00a0\u00a0 Phone\r\n<\/pre>\r\n<p><b>ProductPlants<\/b> table:<\/p>\r\n<pre>sku\u00a0\u00a0 plant_nbr\r\n----- -----------\r\n1\u00a0\u00a0\u00a0\u00a0 1\r\n2\u00a0\u00a0\u00a0\u00a0 1\r\n3\u00a0\u00a0\u00a0\u00a0 2\r\n<\/pre>\r\n<p>The request is to extract the manufacturing plant for each product. One way to satisfy the request is to write the following query using correlated subquery to retrieve the plant:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT sku, product_description,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 (SELECT plant_nbr\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM ProductPlants AS B\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE B.sku = A.sku) AS plant_nbr\r\nFROM Products AS A;\r\n<\/pre>\r\n<p>Note that the point here is to illustrate a technique; there could be a more efficient way to accomplish the same task. However, all works fine and we get the correct result set:<\/p>\r\n<pre>sku\u00a0 product_description plant_nbr\r\n---- ------------------- -----------\r\n1\u00a0\u00a0\u00a0 Bike\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1\r\n2\u00a0\u00a0\u00a0 Ball\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1\r\n3\u00a0\u00a0\u00a0 Phone\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a02 \r\n<\/pre>\r\n<p>The query will continue to work happily until the day arrives that the company decides to start manufacturing Balls at plant 3, to cope with increasing demand. The <b>ProductPlants<\/b> table now looks like this:<\/p>\r\n<pre>sku\u00a0\u00a0 plant_nbr\r\n----- -----------\r\n1\u00a0\u00a0\u00a0\u00a0 1\r\n2\u00a0\u00a0\u00a0\u00a0 1\r\n2\u00a0\u00a0\u00a0\u00a0 3\r\n3\u00a0\u00a0\u00a0\u00a0 2\r\n<\/pre>\r\n<p>All of a sudden, our query starts generating the following error:<\/p>\r\n<pre>Msg 512, Level 16, State 1, Line 1\r\nSubquery returned more than 1 value. This is not permitted when the subquery follows =, !=, &lt;, &lt;= , &gt;, &gt;= or when the subquery is used as an expression.\r\n<\/pre>\r\n<p>The error is descriptive enough. Instead of the expected scalar value, our subquery returns a result set, which breaks the query. Based on our business requirements, the fix is simple. To list all plants manufacturing plant for a particular product, we simply use a JOIN:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT A.sku, A.product_description, B.plant_nbr\r\nFROM Products AS A\r\nJOIN ProductPlants AS B\r\n\u00a0 ON A.sku = B.sku;\r\n<\/pre>\r\n<p>Now the query completes without errors and returns the correct results:<\/p>\r\n<pre>sku\u00a0 product_description\u00a0 plant_nbr\r\n---- -------------------- -----------\r\n1\u00a0\u00a0\u00a0 Bike\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1\r\n2\u00a0\u00a0\u00a0 Ball\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1\r\n2\u00a0\u00a0\u00a0 Ball\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 3\r\n3\u00a0\u00a0\u00a0 Phone\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2\r\n<\/pre>\r\n<p>Note that the same error can occur in a predicate where a column or expression is tested against a subquery, for example &#8220;<b>&#8230; column = (SELECT value FROM Table)<\/b>&#8220;. In that case, the solution is to use the <b>IN<\/b> predicate in place of &#8220;=&#8221;.<\/p>\r\n<h1>Use of SELECT *<\/h1>\r\n<p>On first encounter with SQL we always praise the genius who invented the syntax <b>SELECT *<\/b>! It&#8217;s so handy and easy to use! Instead of explicitly listing all column names in our query, we just use the magic wildchar &#8216;*&#8217; and retrieve all columns. For example, a common misuse of <b>SELECT *<\/b> is to extract a set of all plastic products and to insert them into another table with the same structure:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">INSERT INTO PlasticProducts\r\nSELECT *\r\nFROM Products\r\nWHERE material_type = 'plastic';\r\n<\/pre>\r\n<p>Job done! However, one day business requirements change and two new columns are added to the <b>Products<\/b> table:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">ALTER TABLE Products\r\nADD effective_start_date DATETIME,\r\n\u00a0\u00a0\u00a0 effective_end_date DATETIME;\r\n<\/pre>\r\n<p>All of sudden the magic query results in error:<\/p>\r\n<pre>Msg 213, Level 16, State 1, Line 1\r\nInsert Error: Column name or number of supplied values does not match table definition.\r\n<\/pre>\r\n<p>The fix is to explicitly list the column names in the query:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">INSERT INTO PlasticProducts (sku, product_description, material_type)\r\nSELECT sku, product_description, material_type\r\nFROM Products\r\nWHERE material_type = 'plastic';\r\n<\/pre>\r\n<p>The situation can get even worse if a view is created using <b>SELECT *<\/b>, and later the base tables are modified to add or drop columns.<\/p>\r\n<div class=\"NOTE\">\r\n<p class=\"NOTE\"><b>Note: <\/b>If a view is create using the <b>SCHEMABINDING<\/b> option, then the base tables cannot be modified in a way that will affect the view definition.<\/p>\r\n<\/div>\r\n<p>To conclude, do not use <b>SELECT *<\/b> in production code! One exception here is when using the <b>EXISTS<\/b> predicate. The select list in the subquery for the <b>EXISTS<\/b> predicate is ignored since only the existence of rows is important.<\/p>\r\n<h1>Scalar user-defined functions<\/h1>\r\n<p>Reuse of code is one of the fundamental principles we learn when programming in any language, and the SQL language is no exception. It provides many means by which to logically group code and reuse it.<\/p>\r\n<p>One such means in SQL Server is the scalar user-defined function. It seems so convenient to hide away all those complex calculations in a function, and then simply invoke it in our queries. However, the hidden &#8220;sting in the tail&#8221; is that it can bring a heavy toll in terms of performance. When used in a query, scalar functions are evaluated for each row and, with large tables, this can result in very slow running queries. This is especially true when the scalar function needs to access another table to retrieve data.<\/p>\r\n<p>Here is one example. Given tables with products and sales for products, the request is to retrieve total sales per product. Since the total sales value can be reused in another place, you decide to use a scalar function to calculate the total sales for a product:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE FUNCTION dbo.GetTotalSales(@sku INT)\r\nRETURNS DECIMAL(15, 2)\r\nAS\r\nBEGIN\r\n\u00a0 RETURN(SELECT SUM(sale_amount)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM Sales \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE sku = @sku);\r\nEND\r\n<\/pre>\r\n<p>Then the query to retrieve the total sales for each product will look like this;<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT sku, product_description, dbo.GetTotalSales(sku) AS total_sales\r\nFROM Products;\r\n<\/pre>\r\n<p>Isn&#8217;t this a very neat and good looking query? But just wait until you run it over a large data set. The total sales calculation will be repeated for each and every row, and the overhead will be proportional to the number of rows. The correct way to handle this is, if possible, is to rewrite the function as a table-valued function, or simply perform the calculation in the main query. In our example, performing the calculation in the query will look like this:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT P.sku, P.product_description, SUM(S.sale_amount) As total_sales\r\nFROM Products AS P\r\nJOIN Sales AS S\r\n\u00a0 ON P.sku = S.sku\r\nGROUP BY P.sku, P.product_description;\r\n<\/pre>\r\n<p>And here is a table-valued function that can be used to calculate total sales:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE FUNCTION dbo.GetTotalSales(@sku INT)\r\nRETURNS TABLE\r\nAS\r\nRETURN(SELECT SUM(sale_amount) AS total_sales\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM Sales \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE sku = @sku);\r\n<\/pre>\r\n<p>Now the table-valued function can be invoked in the query using the <strong>APPLY<\/strong> operator:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT sku, product_description, total_sales\r\nFROM Products AS P\r\nCROSS APPLY dbo.GetTotalSales(P.sku) AS S;\r\n<\/pre>\r\n<h1>Overuse of cursors<\/h1>\r\n<p>Let&#8217;s face it &#8211; we love loops! Whether we start programming with VB, C, C++, Java, or C#, one of the first constructs we encounter is some form of a loop. They can helpfully solve pretty much any challenge you might face.<\/p>\r\n<p>And so, it is only natural on the day we start programming with SQL to seek out our favorite loop construct. And here it is &#8211; the mighty cursor (and its little WHILE brother)! Then we hurry to put the well known tool to use in solving our problems.<\/p>\r\n<p>Let&#8217;s look at one example. Given a table with product prices, we have to perform a monthly update of prices for products; the price updates are stored in another table with new prices.<\/p>\r\n<p><b>ProductPrices table:<\/b><\/p>\r\n<pre>sku\u00a0 price\u00a0 effective_start_date effective_end_date\r\n---- ------ -------------------- ------------------\r\n1\u00a0\u00a0\u00a0 10.50\u00a0 2009-01-01\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\r\n2\u00a0\u00a0\u00a0 11.50\u00a0 2009-01-01\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\r\n3\u00a0\u00a0\u00a0 19.00\u00a0 2009-01-01\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\r\n4\u00a0\u00a0\u00a0 11.25\u00a0 2009-01-01\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\r\n<\/pre>\r\n<p><b>NewPrices table:<\/b><\/p>\r\n<pre>sku\u00a0 price\r\n---- ------\r\n2\u00a0\u00a0\u00a0 11.25\r\n4\u00a0\u00a0\u00a0 12.00\r\n<\/pre>\r\n<p>A cursor solution may look like this:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">DECLARE @sku INT;\r\nDECLARE @price DECIMAL(15, 2);\r\n\u00a0\r\nDECLARE PriceUpdates\r\nCURSOR LOCAL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FORWARD_ONLY\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 STATIC\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 READ_ONLY\r\nFOR SELECT sku, price\r\n\u00a0\u00a0\u00a0 FROM NewPrices;\r\n\u00a0\r\nOPEN PriceUpdates;\r\n\u00a0\r\nFETCH NEXT FROM PriceUpdates\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 INTO @sku, @price;\r\n\u00a0\r\nWHILE @@FETCH_STATUS = 0\r\nBEGIN\r\n\u00a0\r\n\u00a0 UPDATE ProductPrices\r\n\u00a0 SET price = @price, \r\n\u00a0\u00a0\u00a0\u00a0\u00a0 effective_start_date = CURRENT_TIMESTAMP\r\n\u00a0 WHERE sku = @sku; \r\n\u00a0\r\n\u00a0 FETCH NEXT FROM PriceUpdates\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 INTO @sku, @price;\r\n\u00a0\r\nEND\r\n\u00a0\r\nCLOSE PriceUpdates;\r\nDEALLOCATE PriceUpdates;\r\n<\/pre>\r\n<p>Mission accomplished! Now we can take a well-deserved break while the query is running. Soon, the realization dawns that procedural row by row processing is not working well in SQL. Besides being very slow, our solution is long, hard to read and maintain. This is the moment we understand the power of SQL is its set-based nature. The same task can be accomplished using a very efficient set-based query that is easier to understand and maintain:<\/p>\r\n<pre class=\"theme:ssms2012 lang:tsql\">UPDATE ProductPrices\r\nSET price = (SELECT N.price\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM NewPrices AS N\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE N.sku = ProductPrices.sku),\r\n\u00a0\u00a0\u00a0 effective_start_date = CURRENT_TIMESTAMP\r\nWHERE EXISTS(SELECT *\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM NewPrices AS N\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE N.sku = ProductPrices.sku);\r\n<\/pre>\r\n<p>There are different ways to write a set based query to solve this problem: using the <strong>MERGE<\/strong> statement, update with Common Table Expression, or the SQL Server specific update with join. But the point is to utilize the natural power of the SQL language and use set based techniques to solve problems and to avoid procedural solutions.<\/p>\r\n<div class=\"NOTE\">\r\n<p class=\"NOTE\"><b>Note:<\/b> While you should avoid cursors as much as possible, there are certain problems, such as running total aggregations, that today are still best solved using cursors. We can be optimistic that future enhancements will provide better tools to solve those problems in a set based way.<\/p>\r\n<\/div>\r\n<h1>Summary<\/h1>\r\n<p>Maybe now is a good time for the next code review of your SQL code. Hopefully some of these examples will help you be a better developer\/DBA and produce more reliable and efficient code. It is not always easy to spot these patterns, especially in more complex queries. But as long as you understand the fundamentals, avoiding these mistakes is easy.<\/p>\r\n<p>Happy SQL coding!<\/p>\r\n<div class=\"scheme--lightest-grey spaced-bottom\">\n  <img decoding=\"async\" class=\"media__image padded--tight\" style=\"float: right; vertical-align: middle;\" src=\"https:\/\/assets.red-gate.com\/external\/SimpleTalk\/database-devops-cog-160.png\"><\/p>\n<div class=\"padded\">\n<p class=\"padded-top--tight\"> <strong>DevOps, Continuous Delivery & Database Lifecycle Management<\/strong><br \/> <a href=\"\/collections\/database-lifecycle-management-patterns-practices-library\/?utm_source=simpletalk&amp;utm_medium=publink&amp;utm_campaign=dlm&amp;utm_content=pandparticle&amp;utm_term=bottombanner\">Go to the Simple Talk library<\/a> to find more articles, or visit <a href=\"https:\/\/www.red-gate.com\/solutions?utm_source=simpletalk&amp;utm_medium=publink&amp;utm_campaign=dlm&amp;utm_content=pandparticle&amp;utm_term=bottombanner&amp;__hstc=81186632.9b3ff88e8dc727f9f1f608ffe1681edd.1467735864337.1482415357732.1482419905332.123&amp;__hssc=81186632.36.1482419905332&amp;__hsfp=3698596899\">www.red-gate.com\/solutions<\/a> for more information on the benefits of extending DevOps practices to SQL Server databases.<\/p>\n<\/div>\n<\/div>\n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. Why does NOT IN return no rows when the subquery contains NULL in SQL Server?<\/h3>\n            <div class=\"faq-answer\">\n                <p>NOT IN evaluates each candidate value against every value in the subquery; if any subquery value is NULL, the comparison evaluates to UNKNOWN (because NULL means &#8216;unknown&#8217; in three-valued logic), and any UNKNOWN in a NOT IN clause causes the entire predicate to fail for that candidate row. The result is that the query returns no rows even when intuitively it should return many. The fix is to use NOT EXISTS with a correlated subquery, which uses two-valued logic and handles NULLs correctly. This is one of the most common silent-correctness bugs in SQL Server code.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. Why are scalar user-defined functions slow in SQL Server?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Before SQL Server 2019, scalar UDFs were executed row-by-row by an interpreted engine that could not be inlined into the query plan, treated as a black box by the optimiser, and forced serial execution of the entire query containing them. The result was severe performance degradation &#8211; often 10x to 100x slower than equivalent inline expressions. SQL Server 2019 introduced scalar UDF inlining (compatibility level 150+), which significantly improves the situation by transforming the UDF into an equivalent expression at compile time. Even with inlining, scalar UDFs are best avoided when an inline computation, computed column, or table-valued function is feasible.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. What is the difference between WHERE and ON in SQL Server outer joins?<\/h3>\n            <div class=\"faq-answer\">\n                <p>In an OUTER JOIN, predicates in the ON clause filter the join itself &#8211; they determine which rows from the inner table are matched to rows from the outer table. Predicates in the WHERE clause filter the result of the join &#8211; they apply after the join has already produced rows including NULLs from the unmatched outer side. A LEFT OUTER JOIN with a filter on the inner table in the WHERE clause is effectively converted to an INNER JOIN because the WHERE filter rejects the NULL-padded outer rows. The fix is to put inner-table filters in the ON clause for OUTER JOINs and reserve the WHERE clause for outer-table or post-join filtering.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Ten common SQL programming mistakes that trip up developers and DBAs: NULLs and the NOT IN predicate, functions on indexed columns, incorrect subquery columns, data type mismatches, predicate evaluation order, outer join predicate placement, multi-value subqueries, SELECT *, scalar UDF performance, and cursor overuse.&hellip;<\/p>\n","protected":false},"author":156113,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4150,5134,4252,5012],"coauthors":[27095],"class_list":["post-650","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-sql","tag-sql-prompt","tag-t-sql-programming","tag-tsql-coding-examples-sample-cursor-null"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/650","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/156113"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=650"}],"version-history":[{"count":8,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/650\/revisions"}],"predecessor-version":[{"id":72865,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/650\/revisions\/72865"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=650"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=650"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=650"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=650"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}