{"id":1973,"date":"2015-03-26T00:00:00","date_gmt":"2015-03-25T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/painless-refactoring-of-sql-server-database-objects\/"},"modified":"2021-09-29T16:21:28","modified_gmt":"2021-09-29T16:21:28","slug":"painless-refactoring-of-sql-server-database-objects","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/painless-refactoring-of-sql-server-database-objects\/","title":{"rendered":"Painless Refactoring of SQL Server Database Objects"},"content":{"rendered":"<div class=\"article-content\">\n<p> \tIt is quite common for perfectly-functioning SQL code to be knocked off its feet by a change to the underlying database schema, or to other objects that are used in the code. If we are &#8220;lucky,&#8221; the code will suddenly start producing error messages; if not, it may just silently start producing different results. In either case, once the problem is discovered, the process of adjusting all of the application code can be long and painstaking. <\/p>\n<p> \tFortunately, there are a few simple defensive techniques that take little time to implement, but yet may significantly reduce the possibility of such errors. The relatively small extra effort up front will save a lot of maintenance toil later. <\/p>\n<p> \tThis article will examine several examples of how changes to database objects can cause unexpected behavior in the code that accesses them, and discuss how to develop code that will not break, or behave unpredictably, as a result of such changes. Specifically, we will cover: <\/p>\n<ul>\n<li><strong>changes to the Primary or unique keys<\/strong>, and how to test and validate assumptions regarding the &#8220;uniqueness&#8221; of column data<\/li>\n<li><strong>changes to stored procedure signatures<\/strong>, and the importance of using explicitly named parameters<\/li>\n<li><strong>changes<\/strong><strong> to<\/strong><strong> column<\/strong><strong>s<\/strong>, such as adding columns as well as modifying an existing column&#8217;s nullability, size or data type.<\/li>\n<\/ul>\n<p> \tOne of the main lessons to be learned is that if your implementation relies on a particular property of the underlying schema, such as the uniqueness of a given column, then you must document that assumption, preferably in your unit tests, and make sure it always holds true. <\/p>\n<h1>Refactoring a Primary or Unique Key<\/h1>\n<p> \tChanges to the keys in your tables should, hopefully, be rare, but they can cause trouble to the unwary when they happen. In a broader context, the defensive programmer should always fully document and test any assumptions about the underlying uniqueness of the column data, as I described in the <em>Ambiguous Updates<\/em> sections of my previous <a href=\"https:\/\/www.simple-talk.com\/sql\/t-sql-programming\/basic-defensive-database-programming-techniques\/?utm_source=simpletalk&amp;utm_medium=weblink&amp;utm_content=refactoringdatabaseobjects\">Basic Defensive Database Programming Techniques<\/a> article. The following examples demonstrate what can happen to perfectly correct code when changes are made to the underlying unique or primary keys, thus invalidating assumptions in the code regarding the uniqueness of the column data. We&#8217;ll then discuss how a query against the system views, or use of <code>@@ROWCOUNT<\/code>, can detect if such assumptions are still true. <\/p>\n<p> \tIn Listing 1, we create a table, <code>Customers<\/code>, using a <code>UNIQUE<\/code> constraint to guarantee the uniqueness of phone numbers, and then populate it with some test data. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">CREATE TABLE dbo.Customers\n  (\n    CustomerId INT NOT NULL ,\n    FirstName VARCHAR(50) NOT NULL ,\n    LastName VARCHAR(50) NOT NULL ,\n    Status VARCHAR(50) NOT NULL ,\n    PhoneNumber VARCHAR(50) NOT NULL ,\n    CONSTRAINT PK_Customers PRIMARY KEY ( CustomerId ) ,\n    CONSTRAINT UNQ_Customers UNIQUE ( PhoneNumber )\n  ) ; \nGO\nINSERT  INTO dbo.Customers\n        ( CustomerId ,\n          FirstName ,\n          LastName ,\n          Status ,\n          PhoneNumber\n        )\n        SELECT  1 ,\n                'Darrel' ,\n                'Ling' ,\n                'Regular' ,\n                '(123)456-7890'\n        UNION ALL\n        SELECT  2 ,\n                'Peter' ,\n                'Hansen' ,\n                'Regular' ,\n                '(234)123-4567' ;\n\n<\/pre>\n<p class=\"caption\"> \tListing 1: Creating the <code>Customers<\/code> table, with a <code>UNIQUE<\/code> constraint on the <code>PhoneNumber<\/code> column. <\/p>\n<p> \tWe need to implement a simple stored procedure, shown in Listing 2, which will allow users to find a customer based on their phone number, and set their customer status (regular, preferred, or VIP). If no customer exists for a given phone number, we don&#8217;t need to raise an exception; we simply do nothing. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">CREATE PROCEDURE dbo.SetCustomerStatus\n    @PhoneNumber VARCHAR(50) ,\n    @Status VARCHAR(50)\nAS \n    BEGIN; \n        UPDATE  dbo.Customers\n        SET     Status = @Status\n        WHERE   PhoneNumber = @PhoneNumber ;\n    END ;\n<\/pre>\n<p class=\"caption\"> \tListing 2: The <code>SetCustomerStatus<\/code> stored procedure, which finds a customer by phone number and sets their status. <\/p>\n<p> \tThis implementation assumes that at most one customer has any given phone number. Clearly, right now, this assumption is true as it is guaranteed by the <code>UNQ_Customers<\/code> constraint. <\/p>\n<p> \tSuppose, however, that at some later time we need to store data about customers from different countries. At this point, the phone number alone no longer uniquely identifies a customer, but the combination of country code and phone number does. In order to accommodate this requirement, our <code>Customers<\/code> table is altered to add the new column, <code>CountryCode<\/code>, and our <code>UNQ_Customers<\/code> constraint is modified so that it enforces uniqueness based on a combination of the <code>CountryCode<\/code> and <code>PhoneNumber<\/code> columns. These alterations are shown in Listing 3. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">ALTER TABLE dbo.Customers\n   ADD CountryCode CHAR(2)  NOT NULL\n     CONSTRAINT DF_Customers_CountryCode\n        DEFAULT('US') ;\nGO\n\nALTER TABLE dbo.Customers DROP CONSTRAINT UNQ_Customers;\nGO\n\nALTER TABLE dbo.Customers \n   ADD CONSTRAINT UNQ_Customers \n      UNIQUE(PhoneNumber, CountryCode) ;\n<\/pre>\n<p class=\"caption\"> \tListing 3: Adding a <code>CountryCode<\/code> column to the table and to the unique constraint. <\/p>\n<p> \tNote that, in reality, we should have added a lookup table, <code>dbo.CountryCodes<\/code>, referred to by a <code>FOREIGN<\/code><code>KEY<\/code> constraint. However, I&#8217;ve avoided a lookup table in this case, in favor of keeping the example simple. <\/p>\n<p> \tAt this point, our constraint is no longer enforcing the uniqueness of values in the <code>PhoneNumber<\/code> column, so we can insert a customer with an identical phone number to an existing customer, but with a different country code, as shown in Listing 4. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">UPDATE  dbo.Customers\nSET     Status = 'Regular' ;\n\nINSERT  INTO dbo.Customers\n        ( CustomerId ,\n          FirstName ,\n          LastName ,\n          Status ,\n          PhoneNumber ,\n          CountryCode\n        )\n        SELECT  3 ,\n                'Wayne' ,\n                'Miller' ,\n                'Regular' ,\n                '(123)456-7890' ,\n                'UK' ;\n<\/pre>\n<p class=\"caption\"> \tListing 4: Wayne Miller has the same phone number as Darrell Ling, but with a different country code. <\/p>\n<p> \tOur Stored procedure, however, is still working on the assumption that a customer can be <em>uniquely<\/em> identified by their phone number alone. Since this assumption is no longer valid, the stored procedure, in its current form, could erroneously update more than one row of data, as demonstrated in Listing 5. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- at this moment all customers have Regular status\nEXEC dbo.SetCustomerStatus \n    @PhoneNumber = '(123)456-7890',\n    @Status = 'Preferred' ;\n\n-- the procedure has modified statuses of two customers\nSELECT  CustomerId ,\n        Status\nFROM    dbo.Customers ;\n\nCustomerId  Status\n----------- -------------\n1           Preferred\n2           Regular\n3           Preferred\n<\/pre>\n<p class=\"caption\"> \tListing 5: The unchanged stored procedure modifies two rows instead of one. <\/p>\n<p> \tPerhaps the most prevalent and damaging mistake made during the development of SQL code is a failure to define or recognize the assumptions on which the implementation relies. The result is code that is brittle, and liable to behave unpredictably when these assumptions are invalidated by changes to the underlying database objects. <\/p>\n<p> \tWhenever we change our unique and\/or primary keys, we need to review all the procedures that depend on the modified tables. However, the manual process of reviewing the potentially affected code is, like all manual processes, slow and prone to error. It may be more efficient to automate the process of identifying the modules that rely on particular assumptions about the underlying schema. Unit tests allow us to accomplish exactly that; we can easily, for example, write a unit test that succeeds if there is a <code>UNIQUE<\/code> constraint on the <code>PhoneNumber<\/code> column alone, and fails when this is no longer the case. <\/p>\n<h3>Using unit tests to document and test assumptions<\/h3>\n<p> \tLet&#8217;s translate the assumption that the <code>PhoneNumber<\/code> column uniquely identifies a customer into a query against the system views. The query is rather complex, so we&#8217;ll develop it in several steps. First of all, we need to know if there are any constraints on the <code>PhoneNumber<\/code> column, as shown in Listing 6. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT  COUNT(*)\nFROM    INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE AS u\nWHERE   u.TABLE_NAME = 'Customers'\n        AND u.TABLE_SCHEMA = 'dbo'\n        AND u.COLUMN_NAME = 'PhoneNumber' ;\n<\/pre>\n<p class=\"caption\"> \tListing 6: Step 1, a query to check for constraints on <code>PhoneNumber<\/code><code>.<\/code> <\/p>\n<p> \tThis query returns 1, confirming that there is a constraint on that column. Next, we need to verify that the constraint is either a primary key or a unique constraint: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT  COUNT(*)\nFROM    INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE AS u\n        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS c\n            ON c.TABLE_NAME = u.TABLE_NAME\n            AND c.TABLE_SCHEMA = u.TABLE_SCHEMA\n            AND c.CONSTRAINT_NAME = u.CONSTRAINT_NAME\nWHERE   u.TABLE_NAME = 'Customers'\n        AND u.TABLE_SCHEMA = 'dbo'\n        AND u.COLUMN_NAME = 'PhoneNumber'\n        AND c.CONSTRAINT_TYPE \n            IN ( 'PRIMARY KEY', 'UNIQUE' ) ;\n<\/pre>\n<p class=\"caption\"> \tListing 7: Step 2 determines if the constraint on column <code>PhoneNumber<\/code> is a primary key or unique. <\/p>\n<p> \tFinally, we need to make sure that no other columns are included in that <code>UNIQUE<\/code> or <code>PRIMARY<\/code><code>KEY<\/code> constraint, as follows: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT  COUNT(*)\nFROM    INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE AS u\n        JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS c \n            ON c.TABLE_NAME = u.TABLE_NAME\n            AND c.TABLE_SCHEMA = u.TABLE_SCHEMA\n            AND c.CONSTRAINT_NAME = u.CONSTRAINT_NAME\nWHERE   u.TABLE_NAME = 'Customers'\n        AND u.TABLE_SCHEMA = 'dbo'\n        AND u.COLUMN_NAME = 'PhoneNumber'\n        AND c.CONSTRAINT_TYPE\n            IN ( 'PRIMARY KEY', 'UNIQUE' ) \n -- this constraint involves only one column\n        AND ( SELECT    COUNT(*)\n         FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE\n                AS u1\n              WHERE     u1.TABLE_NAME = u.TABLE_NAME\n                    AND u1.TABLE_SCHEMA = u.TABLE_SCHEMA\n                    AND u1.CONSTRAINT_NAME = \n                                  u.CONSTRAINT_NAME\n            ) = 1 ;\n<\/pre>\n<p class=\"caption\"> \tListing 8: Step 3, the final query determines whether there is a unique or primary key constraint that is built on only the <code>PhoneNumber<\/code><code><\/code>column. <\/p>\n<p> \tWhen we run this query against the original database schema, with a <code>UNIQUE<\/code> constraint on the <code>PhoneNumber<\/code> column, it returns a value of 1 indicating that there is indeed a constraint built only on the <code>PhoneNumber<\/code> column. However, when we run it after the column <code>CountryCode<\/code> has been added to the definition of the unique constraint, the second subquery returns the value 2, which means that the <code>UNIQUE<\/code> constraint <code>UNQ_Customers<\/code> is built on two columns, and so the outer query returns a value of 0. <\/p>\n<p> \tIn short, this query provides us with a means to verify the validity of the assumption that the <code>PhoneNumber<\/code> column uniquely identifies a customer. By incorporating this query into our <strong>unit test harness<\/strong>, we can accomplish two goals: <\/p>\n<ul>\n<li><strong>our assumption is documented<\/strong> &#8211; the code in Listing 8 clearly documents the fact that the <code>dbo.SetCustomerStatus<\/code> stored procedure needs a unique or primary constraint on a single column, <code>PhoneNumber<\/code><\/li>\n<li><strong>our assumption is tested<\/strong> &#8211; if the required constraint is dropped, or includes more than one column, we shall get a clear warning, because the unit test will fail.<\/li>\n<\/ul>\n<p> \tOf course, we should wrap this query in a stored procedure and reuse it, because there will surely be other cases when we rely on the uniqueness of a column used in our search condition. <\/p>\n<p> \tWe can use a similar technique to verify whether or not a combination of columns, considered together, are guaranteed to be unique. Implementing this query is left as an exercise to the reader. <\/p>\n<h3>Using @@ROWCOUNT to verify assumptions<\/h3>\n<p> \tAlternatively, instead of documenting our assumption as a unit test, we can have our stored procedure detect how many rows it modified, and roll back if it updated more than one row, as shown in Listing 9. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">ALTER PROCEDURE dbo.SetCustomerStatus\n    @PhoneNumber VARCHAR(50) ,\n    @Status VARCHAR(50)\nAS \n    BEGIN ; \n        BEGIN TRANSACTION ;\n    \n        UPDATE  dbo.Customers\n        SET     Status = @Status\n        WHERE   PhoneNumber = @PhoneNumber ;\n        \n        IF @@ROWCOUNT &gt; 1 \n            BEGIN ;\n                ROLLBACK ;\n                RAISERROR('More than one row updated',\n                            16, 1) ;\n            END ;\n        ELSE \n            BEGIN ;\n                COMMIT ;\n            END ;  \n    END ;\n<\/pre>\n<p class=\"caption\"> \tListing 9: A stored procedure that will not modify more than one row. <\/p>\n<p> \tTo see it in action, run Listing 10; the stored procedure raises an error and does not modify the data. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">UPDATE  dbo.Customers\nSET     Status = 'Regular' ;\n\nEXEC dbo.SetCustomerStatus \n    @PhoneNumber = '(123)456-7890',\n    @Status = 'Preferred' ;\n\nMsg 50000, Level 16, State 1, Procedure SetCustomerStatus, Line 15\nMore than one row updated\n\n-- verify if the procedure has modified any data\nSELECT  CustomerId ,\n        Status\nFROM    dbo.Customers ;\n<\/pre>\n<p class=\"caption\"> \tListing 10: Testing the altered stored procedure. <\/p>\n<p> \tIn general, this approach could be useful although, in this particular case, it is less preferable than a unit test. The reason is very simple: a unit test will alert us about a problem <em>before deployment<\/em>, allowing us fix the problem early, and to deploy without this particular bug. The altered stored procedure might not indicate a problem until the code has been deployed to production, which means troubleshooting a production system and redeploying a fix; a situation we usually want to avoid. <\/p>\n<h3>Using SET instead of SELECT when assigning variables<\/h3>\n<p> \tIt is important to understand the different behavior of <code>SET<\/code> and <code>SELECT<\/code> when assigning values to variables. For example, some developers do not realize that <code>SELECT<\/code> can leave a variable unchanged if the selected result set is empty, can lead to the dreaded infinite loop. If we&#8217;re using that variable to terminate the loop, we can end up in the dreaded &#8216;infinite loop&#8217;. This sort of knowledge will help you write application code that is resistant to changes to the underlying schema objects. <\/p>\n<p> \tLet&#8217;s consider an example whereby a search condition contains an implied assumption regarding the uniqueness of the underlying data column. The search condition, and subsequent variable assignment, shown in Listing 11 assumes, again, that the <code>PhoneNumber<\/code> column can uniquely identify a customer. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">DECLARE @CustomerId INT ;\n\nSELECT  @CustomerId = CustomerId\nFROM    dbo.Customers\nWHERE   PhoneNumber = '(123)456-7890' ;\n\nSELECT  @CustomerId AS CustomerId ;\n\n-- Do something with CustomerId\n<\/pre>\n<p class=\"caption\"> \tListing 11: Unpredictable variable assignment, using <code>SELECT<\/code><code>.<\/code> <\/p>\n<p> \tIn our original database schema, before we added <code>CountryCode<\/code> column to the <code>Customers<\/code> table, the result of this assignment was predictable. However, in our new schema, the <code>UNQ_Customers<\/code> constraint only guarantees the uniqueness of the values in the <code>PhoneNumber<\/code> and <code>CountryCode<\/code> columns, considered together. As a result, we have two customers with this phone number and so the variable assignment is unpredictable; we do not, and cannot, know which of the two <code>CustomerId<\/code> values, 1 or 3, will populate the variable. <\/p>\n<p> \tIn most cases, such ambiguity is not acceptable. The simplest fix is to use <code>SET<\/code> instead of <code>SELECT<\/code> to populate the variable, as shown in Listing 12. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">DECLARE @CustomerId INT ;\n\n-- this assignment will succeed,\n-- because in this case there is no ambiguity\nSET @CustomerId = ( SELECT CustomerId\n                    FROM   dbo.Customers\n                    WHERE  PhoneNumber = '(234)123-4567'\n                  ) ;\n\nSELECT  @CustomerId AS CustomerId ;\n\nCustomerId\n-----------\n2\n\n-- this assignment will fail,\n-- because there is ambiguity,\n-- two customers have the same phone number\nSET @CustomerId = ( SELECT CustomerId\n                    FROM   dbo.Customers\n                    WHERE  PhoneNumber = '(123)456-7890'\n                  ) ;\n\nMsg 512, Level 16, State 1, Line 16\nSubquery returned more than 1 value. This is not permitted when the subquery follows =, !=, &lt;, &lt;= , &gt;, &gt;= or when the subquery is used as an expression.\n\n-- the above error must be intercepted and handled. See:\n-- http:\/\/bit.ly\/1EG7wkL\n-- the variable is left unchanged\nSELECT  @CustomerId AS CustomerId ;\n\nCustomerId\n-----------\n2\n<\/pre>\n<p class=\"caption\"> \tListing 12: Whereas <code>SELECT<\/code> ignores the ambiguity, <code>SET<\/code> detects it and raises an error. <\/p>\n<h1>Refactoring the Signature of a Stored Procedure<\/h1>\n<p> \tConsider the stored procedure shown in Listing 13, <code>SelectCustomersByName<\/code><code>,<\/code> which takes two optional search conditions, and selects data from the <code>Customers<\/code> table, as defined in Listing 1. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">CREATE PROCEDURE dbo.SelectCustomersByName\n  @LastName VARCHAR(50) = NULL ,\n  @PhoneNumber VARCHAR(50) = NULL\nAS \n  BEGIN ;\n    SELECT  CustomerId ,\n            FirstName ,\n            LastName ,\n            PhoneNumber ,\n            Status\n    FROM    dbo.Customers\n    WHERE   LastName = COALESCE(@LastName, LastName)\n            AND PhoneNumber = COALESCE(@PhoneNumber,\n                                         PhoneNumber) ;\n  END ;\n<\/pre>\n<p class=\"caption\"> \tListing 13: The <code>SelectCustomersByName<\/code> stored procedure. <\/p>\n<p> \tWhen we invoke this stored procedure, we can explicitly name its parameters, and make the code more readable, but we are not forced to do so, as shown in Listing 14. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">EXEC dbo.SelectCustomersByName\n    'Hansen',         -- @LastName\n    '(234)123-4567' ; -- @PhoneNumber\n\nEXEC dbo.SelectCustomersByName\n    @LastName = 'Hansen',\n    @PhoneNumber = '(234)123-4567' ;\n<\/pre>\n<p class=\"caption\"> \tListing 14: Two ways to invoke the <code>SelectCustomersByName<\/code> stored procedure. <\/p>\n<p> \tAt the moment, either way of invoking the stored procedure produces the same result. Suppose, however, that the signature of this stored procedure is subsequently modified to accept an optional <code>@<\/code><code>FirstName<\/code> parameter, as described in Listing 15. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">ALTER PROCEDURE dbo.SelectCustomersByName\n  @FirstName VARCHAR(50) = NULL ,\n  @LastName VARCHAR(50) = NULL ,\n  @PhoneNumber VARCHAR(50) = NULL\nAS \n  BEGIN ;\n    SELECT  CustomerId ,\n      FirstName ,\n      LastName ,\n      PhoneNumber ,\n      Status\n    FROM    dbo.Customers\n    WHERE   FirstName = COALESCE (@FirstName, FirstName)\n            AND LastName = COALESCE (@LastName,LastName)\n            AND PhoneNumber = COALESCE (@PhoneNumber, \n                                          PhoneNumber) ;\n  END ;\nGO\n<\/pre>\n<p class=\"caption\"> \tListing 15: The modified <code>SelectCustomersByName<\/code> stored procedure includes an additional <code>First<\/code><code>N<\/code><code>ame<\/code> parameter. <\/p>\n<p> \tAs a result of this modification, the two ways of invoking the stored procedure are no longer equivalent. Of course, we will not receive any error message; we will just silently start getting different results, as shown in Listing 16. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- in the new context this call is interpreted \n-- differently. It will return no rows\nEXEC dbo.SelectCustomersByName \n    'Hansen',         -- @FirstName\n    '(234)123-4567' ; -- @LastName\n\n-- this stored procedure call is equivalent\n-- to the previous one\nEXEC dbo.SelectCustomersByName\n    @FirstName = 'Hansen',\n    @LastName = '(234)123-4567' ;\n\n-- this call returns the required row\nEXEC dbo.SelectCustomersByName\n    @LastName = 'Hansen',\n    @PhoneNumber = '(234)123-4567' ;\n<\/pre>\n<p class=\"caption\"> \tListing 16: The same stored procedure call is interpreted differently after the signature of that stored procedure has changed. <\/p>\n<p> \tThe lesson here is clear: stored procedure calls with explicitly named parameters are more robust; they continue to work correctly even when the signature of the stored procedure changes, or they give explicit errors instead of silently returning incorrect results. <\/p>\n<h1>Refactoring Columns<\/h1>\n<p> \tOne of the most common causes of brittle code is a failure to program defensively against subsequent changes to the columns of the underlying data tables. These changes can take the form of adding columns, or changing the definition of existing columns, for example, their data type or size. <\/p>\n<p> \tOf course, some changes are so serious that our code cannot survive them. For example, if a column that is required in a query is removed, then that is a breaking change that we can do nothing to protect against. However, in many other cases, we can develop code that is resilient to changes to the underlying columns. In this section, we&#8217;ll examine a few examples, explaining how to make our code more robust in each case. <\/p>\n<h3>Qualifying column names<\/h3>\n<p> \tIt takes a few extra keystrokes to qualify column names when writing our queries, but these keystrokes pay healthy dividends in terms of the resilience of the resulting code. Consider the example tables created in Listing 17, <code>Shipments<\/code> and <code>ShipmentItems<\/code>, populated with sample data. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">CREATE TABLE dbo.Shipments\n    (\n      Barcode VARCHAR(30) NOT NULL PRIMARY KEY,\n      SomeOtherData VARCHAR(100) NULL\n    ) ;\nGO\n\nINSERT  INTO dbo.Shipments\n        ( Barcode ,\n          SomeOtherData\n        )\n        SELECT  '123456' ,\n                '123456 data'\n        UNION ALL\n        SELECT  '123654' ,\n                '123654 data' ;\nGO\n\nCREATE TABLE dbo.ShipmentItems\n    (\n      ShipmentBarcode VARCHAR(30) NOT NULL,\n      Description VARCHAR(100) NULL\n    ) ;\nGO\n\nINSERT  INTO dbo.ShipmentItems\n        ( ShipmentBarcode ,\n          Description\n        )\n        SELECT  '123456' ,\n                'Some cool widget'\n        UNION ALL\n        SELECT  '123456' ,\n                'Some cool stuff for some gadget' ;\nGO\n<\/pre>\n<p class=\"caption\"> \tListing 17: The <code>Shipments<\/code> and <code>ShipmentItems<\/code> tables. <\/p>\n<p> \tAgainst this schema, we develop the query shown in Listing 18, which, for every shipment, selects its <code>Barcode<\/code> and calculates the number of shipment items with a <code>ShipmentBarcode<\/code> matching the <code>Barcode<\/code> for that shipment. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT  Barcode ,\n        ( SELECT    COUNT(*)\n          FROM      dbo.ShipmentItems\n          WHERE     ShipmentBarcode = Barcode\n        ) AS NumItems\nFROM    dbo.Shipments ;\n\nBarcode                        NumItems\n---------------------------------------\n123456                         2\n123654                         0\n<\/pre>\n<p class=\"caption\"> \tListing 18: A correlated subquery that works correctly even though column names are not qualified. <\/p>\n<p> \tThe inner query is an example of a correlated subquery; it uses in its <code>WHERE<\/code> clause the <code>Barcode<\/code> column from the <code>Shipments<\/code> table in the outer query. Notice that the query works even though we failed to qualify the column names. <\/p>\n<p> \tYet the situation can change. Rather than just having a barcode to identify shipments, we start using barcodes to identify individual items in the shipment, so we need to add a <code>Barcode<\/code> column to the <code>ShipmentItems<\/code> table, as shown in Listing 19. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">ALTER TABLE dbo.ShipmentItems\nADD Barcode VARCHAR(30) NULL ;\nGO\nSELECT  Barcode ,\n        ( SELECT    COUNT(*)\n          FROM      dbo.ShipmentItems\n          WHERE     ShipmentBarcode = Barcode\n        ) AS NumItems\nFROM    dbo.Shipments ;\n\nBarcode                        NumItems\n---------------------------------------\n123456                         0\n123654                         0\n<\/pre>\n<p class=\"caption\"> \tListing 19: The query works differently when a <code>Barcode<\/code> column is added to <code>ShipmentItems<\/code> table. <\/p>\n<p> \tWe do not get any error messages; our query continues to work but silently changes its behavior. With the addition of the <code>Barcode<\/code> column to the <code>ShipmentItems<\/code> table, our query is interpreted quite differently. Now, for every shipment, it selects its barcode followed by the number of <code>ShipmentItems<\/code> whose <code>Barcode<\/code> value matches their <code>ShipmentBarcode<\/code> value. In other words, the correlated subquery becomes uncorrelated; the <code>WHERE<\/code> clause of the inner query no longer uses a value from the outer query. <\/p>\n<p> \tIt takes just a few moments to properly qualify all the column names in our query, and the improved query will continue to work correctly even after the addition of the <code>Barcode<\/code> column to our <code>ShipmentItems<\/code> table, as shown in Listing 20. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT  s.Barcode ,\n        ( SELECT    COUNT(*)\n          FROM      dbo.ShipmentItems AS i\n          WHERE     i.ShipmentBarcode = s.Barcode\n        ) AS NumItems\nFROM    dbo.Shipments AS s ;\n\nBarcode                        NumItems\n---------------------------------------\n123456                         2\n123654                         0\n<\/pre>\n<p class=\"caption\"> \tListing 20: Qualified column names lead to more robust code. <\/p>\n<p> \tQualifying column names improves the robustness of our queries. The same technique also ensures that you get an error, instead of incorrect results, when a column is removed or when a column name is misspelled. For example, consider the case of an uncorrelated subquery that becomes correlated because a column from a table in the subquery is removed (or misspelled in the query), but happens to match a column in the outer query. Many developers forget that the parser will look in the outer query if it fails to find a match in the inner query. <\/p>\n<h3>Handling changes in nullability: NOT IN versus NOT EXISTS<\/h3>\n<p> \tQueries with <code>NOT<\/code><code>IN<\/code> have a well known vulnerability. They do not work as an inexperienced database programmer might expect, if the subquery contained in the <code>NOT<\/code><code>IN<\/code> clause returns at least one <code>NULL<\/code>. This is easy to demonstrate. In Listing 21, we recreate our <code>ShipmentItems<\/code> table with a <code>Barcode<\/code> column that does not accept <code>NULL<\/code>s, and then insert some fresh data. We then execute a query that uses the <code>NOT<\/code><code>IN<\/code> clause. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">DROP TABLE dbo.ShipmentItems ;\nGO\n\nCREATE TABLE dbo.ShipmentItems\n    (\n      ShipmentBarcode VARCHAR(30) NOT NULL ,\n      Description VARCHAR(100) NULL ,\n      Barcode VARCHAR(30) NOT NULL\n    ) ;\nGO\n\n\nINSERT  INTO dbo.ShipmentItems\n        ( ShipmentBarcode ,\n          Barcode ,\n          Description\n        )\n        SELECT  '123456' ,\n                '1010203' ,\n                'Some cool widget'\n        UNION ALL\n        SELECT  '123654' ,\n                '1010203' ,\n                'Some cool widget'\n        UNION ALL\n        SELECT  '123654' ,\n                '1010204' ,\n                'Some cool stuff for some gadget' ;\nGO\n\n-- retrieve all the items from shipment 123654\n-- that are not shipped in shipment 123456\nSELECT  Barcode\nFROM    dbo.ShipmentItems\nWHERE   ShipmentBarcode = '123654'\n        AND Barcode NOT IN ( SELECT Barcode\n                             FROM   dbo.ShipmentItems\n                             WHERE  ShipmentBarcode = \n                                            '123456' ) ;\n\nBarcode\n------------------------------\n1010204\n<\/pre>\n<p class=\"caption\"> \tListing 21: Creating the new <code>ShipmentItems<\/code> table, populating it with some test data, and proving that the query using the <code>NOT IN<\/code> clause succeeds. <\/p>\n<p> \tThe query works as expected, and will continue to do so as long as the <code>Barcode<\/code> column disallows <code>NULL<\/code>s. However, let&#8217;s see what happens when we change the nullability of that column, as shown in Listing 22. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">ALTER TABLE dbo.ShipmentItems\nALTER COLUMN Barcode VARCHAR(30) NULL ;\nINSERT  INTO dbo.ShipmentItems\n        ( ShipmentBarcode ,\n          Barcode ,\n          Description\n        )\n        SELECT  '123456' ,\n                NULL ,\n                'Users manual for some gadget' ;\nGO\n\nSELECT  Barcode\nFROM    dbo.ShipmentItems\nWHERE   ShipmentBarcode = '123654'\n        AND Barcode NOT IN ( SELECT Barcode\n                             FROM   dbo.ShipmentItems\n                             WHERE  ShipmentBarcode =\n                                          '123456' ) ;\n\nBarcode\n------------------------------\n(0 row(s) affected)\n<\/pre>\n<p class=\"caption\"> \tListing 22: Now that the <code>Barcode<\/code> column accepts <code>NULL<\/code>, our <code>NOT<\/code><code>IN<\/code> query no longer works as expected. <\/p>\n<p> \tThis can often seem like a very subtle bug; sometimes the query works as expected, but sometimes it does not. In fact, the behavior is completely consistent. Every time the subquery inside the <code>NOT<\/code><code>IN<\/code> clause returns at least one <code>NULL<\/code>, then the query returns nothing. Listing 23 shows a much simpler script that demonstrates this behavior very clearly. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT  CASE WHEN 1 NOT IN ( 2, 3 ) THEN 'True'\n             ELSE 'Unknown or False'\n        END ,\n        CASE WHEN 1 NOT IN ( 2, 3, NULL ) THEN 'True'\n             ELSE 'Unknown or False'\n        END ;\n\n----------------------\nTrue     Unknown or False\n<\/pre>\n<p class=\"caption\"> \tListing 23: <code>NOT<\/code><code>IN<\/code> queries will work differently when there are <code>NULL<\/code>s in the subquery. <\/p>\n<p> \tThis behavior may seem counterintuitive, but it actually makes perfect sense. Let me explain why, in just two simple steps. Listing 24 shows two queries. The first one uses an <code>IN<\/code> clause; the second is logically equivalent to the first, but the <code>IN<\/code> clause has been expressed using <code>OR<\/code> predicates. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- A query using the IN clause:\nSELECT  CASE WHEN 1 IN ( 1, 2, NULL ) THEN 'True'\n             ELSE 'Unknown or False'\n        END ;\n-- its logical eqiuvalent using OR\nSELECT  CASE WHEN ( 1 = 1 )\n                  OR ( 1 = 2 )\n                  OR ( 1 = NULL ) THEN 'True'\n             ELSE 'Unknown or False'\n        END ;\n<\/pre>\n<p class=\"caption\"> \tListing 24: A query with an <code>IN<\/code> clause, and a logically equivalent query using <code>OR<\/code>. <\/p>\n<p> \tIn the second step, we must consider the <code>NOT<\/code><code>IN<\/code> version of our query, convert it to use <code>OR<\/code> predicates, and then apply DeMorgan&#8217;s law, which states that for the logical expressions P and Q: <\/p>\n<div class=\"tips\">\n<p> \t\t<code>NOT(<\/code><code>P OR Q) = (NOT P) AND (NOT Q)<\/code> \t<\/p>\n<\/p><\/div>\n<p> \tThe result is shown in Listing 25. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- A query using the NOT IN clause:\nSELECT  CASE WHEN 1 NOT IN ( 1, 2, NULL ) THEN 'True'\n             ELSE 'Unknown or False'\n        END ;\n\n-- its logical eqiuvalent using OR\nSELECT  CASE WHEN NOT ( ( 1 = 1 )\n                        OR ( 1 = 2 )\n                        OR ( 1 = NULL )\n                      ) THEN 'True'\n             ELSE 'Unknown or False'\n        END ;\n-- applying DeMorgan's law, replacing every OR with AND,\n-- and every = with &lt;&gt;\nSELECT  CASE WHEN ( ( 1 &lt;&gt; 1 )\n                    AND ( 1 &lt;&gt; 2 )\n                    AND ( 1 &lt;&gt; NULL )\n                  ) THEN 'True'\n             ELSE 'Unknown or False'\n        END ;\n<\/pre>\n<p class=\"caption\"> \tListing 25: Three equivalent queries, the first using <code>NOT IN<\/code>, the second using two <code>OR<\/code> predicates and the third one with two <code>AND<\/code> predicates. <\/p>\n<p> \tTake note of the <code>(1&lt;&gt;NULL)<\/code> condition in the final query; by definition, the result of this condition is always unknown and, when joined with other conditions using <code>AND<\/code>, the result of the whole expression will always be false. This is why no row can ever qualify a <code>NOT<\/code><code>IN<\/code> condition if the subquery inside that <code>NOT<\/code><code>IN<\/code> returns at least one <code>NULL<\/code>. <\/p>\n<p> \tWhenever we write a query using the <code>NOT<\/code>`<code>IN<\/code> clause, we implicitly assume that the subquery can never return a <code>NULL<\/code>. Usually, we do not need to rely on such assumptions, because it is very easy to develop correct queries without making them. For example, Listing 26 shows how to remove this assumption from our original <code>SELECT<\/code> query. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT  Barcode\nFROM    dbo.ShipmentItems\nWHERE   ShipmentBarcode = '123654'\n  AND Barcode NOT IN ( SELECT Barcode\n                       FROM   dbo.ShipmentItems\n                       WHERE  ShipmentBarcode = '123456'\n                         AND Barcode IS NOT NULL ) ;\n<\/pre>\n<p class=\"caption\"> \tListing 26: A query with a subquery that never returns any <code>NULL<\/code>s. <\/p>\n<p> \tBy adding just one short and simple line to our query, we have improved its robustness. Alternatively, we can develop a query that does not assume anything about the nullability of any columns, as shown in Listing 27. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- retrieve all the items from shipment 123654\n-- that are not shipped in shipment 123456\nSELECT  i.Barcode\nFROM    dbo.ShipmentItems AS i\nWHERE   i.ShipmentBarcode = '123654'\n  AND NOT EXISTS ( SELECT *\n                   FROM   dbo.ShipmentItems AS i1\n                   WHERE  i1.ShipmentBarcode = '123456'\n                     AND i1.Barcode = i.Barcode ) ;\n<\/pre>\n<p class=\"caption\"> \tListing 27: An equivalent query with <code>NOT EXISTS<\/code>. <\/p>\n<p> \tThis query will work in the same way, regardless of whether or not the <code>Barcode<\/code> column is nullable. <\/p>\n<h3>Handling changes to data types and sizes<\/h3>\n<p> \tWe frequently develop stored procedures with parameters, the types and lengths of which must match the types and lengths of some underlying columns. By way of an example, Listing 28 creates a <code>Codes<\/code> table, and populates it with test data. It then to creates a <code>SelectCode<\/code> stored procedure with one parameter, <code>@Code<\/code>,<strong><\/strong>the type and length of which must match the type and length of the <code>Code<\/code> column in <code>Codes<\/code> table. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">DROP TABLE dbo.Codes\nGO\n\nCREATE TABLE dbo.Codes\n    (\n      Code VARCHAR(5) NOT NULL ,\n      Description VARCHAR(40) NOT NULL ,\n      CONSTRAINT PK_Codes PRIMARY KEY ( Code )\n    ) ;\nGO\n\t\nINSERT  INTO dbo.Codes\n        ( Code ,\n          Description\n        )\nVALUES  ( '12345' ,\n          'Description for 12345'\n        ) ;\nINSERT  INTO dbo.Codes\n        ( Code ,\n          Description\n        )\nVALUES  ( '34567' ,\n          'Description for 34567'\n        ) ;\nGO\n\nCREATE PROCEDURE dbo.SelectCode\n-- clearly the type and length of this parameter\n-- must match  the type and length of Code column\n-- in dbo.Codes table\n    @Code VARCHAR(5)\nAS \n    SELECT  Code ,\n            Description\n    FROM    dbo.Codes\n    WHERE   Code = @Code ;\nGO\n<\/pre>\n<p class=\"caption\"> \tListing 28: The <code>Codes<\/code> table and <code>SelectCode<\/code> stored procedure. <\/p>\n<p> \tWhen we execute the stored procedure, supplying an appropriate value for the <code>@Code<\/code> parameter, the result is as expected, and is shown in Listing 29. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">EXEC dbo.SelectCode @Code = '12345' ;\n\nCode       Description\n---------- ----------------------------------------\n12345      Description for 12345\n<\/pre>\n<p class=\"caption\"> \tListing 29: The <code>SelectCode<\/code> stored procedure works as expected. <\/p>\n<p> \tSuppose, however, that we have to change the length of the <code>C<\/code><code>ode<\/code> column to accept longer values, as shown in Listing 30. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">ALTER TABLE dbo.Codes DROP CONSTRAINT PK_Codes ;\nGO\n\nALTER TABLE dbo.Codes\n  ALTER COLUMN Code VARCHAR(10) NOT NULL ; \nGO\n\nALTER TABLE dbo.Codes \nADD CONSTRAINT PK_Codes \nPRIMARY KEY(Code) ;\nGO\n\nINSERT  INTO dbo.Codes\n        ( Code ,\n          Description\n        )\nVALUES  ( '1234567890' ,\n          'Description for 1234567890'\n        ) ;\n<\/pre>\n<p class=\"caption\"> \tListing 30: Increasing the length of the <code>Code<\/code> column and adding a row with maximum <code>Code<\/code> length. <\/p>\n<p> \tHowever, the unmodified stored procedure still expects a <code>VARCHAR(<\/code><code>5)<\/code> parameter, and it silently truncates any longer value, as shown in Listing 31. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">EXEC dbo.SelectCode @Code = '1234567890' ;\n\nCode       Description\n---------- ---------------------\n12345      Description for 12345\n<\/pre>\n<p class=\"caption\"> \tListing 31: The unchanged stored procedure retrieves the wrong row. <\/p>\n<p> \tSuch bugs are quite subtle, and may take considerable time to troubleshoot. How can we prevent such errors, except for manually referring to a data dictionary every time we change column types? The best way, in my opinion, is to document the requirement that the type and lengths of our stored procedure parameters must match the type and lengths of our columns and then incorporate this requirement into a boundary case unit test, to protect us from such errors. <\/p>\n<h1>Summary<\/h1>\n<p> \tWe have seen how changes in database objects may break our code, and discussed several defensive techniques that improve the robustness of our code, so that it better survives changes in other database objects. Specifically, I hope you&#8217;ve learned the following lessons in defensive programming: <\/p>\n<ul>\n<li>how to test the continuing validity of your assumptions about the characteristics of the underlying primary and unique keys<\/li>\n<li>why using <code>SET<\/code> is generally safer than using <code>SELECT<\/code> when assigning values to variables<\/li>\n<li>the importance of explicitly named parameters in stored procedure calls<\/li>\n<li>the need to qualify column names in queries<\/li>\n<li>why <code>NOT<\/code><code>EXISTS<\/code> leads to more resilient queries than <code>NOT<\/code><code>IN<\/code><\/li>\n<li>the importance of testing and validating even the &#8220;obvious&#8221; assumptions, such as the need for the types and length of parameters to match those of the underlying columns.<\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Refactoring a database object can often cause unexpected behavior in the code that accesses that object. In this article, adapted from his excellent book, Defensive Database Programming with SQL Server, Alex Kuznetsov discusses several techniques that will harden your code, so that it will not break, or behave unpredictably, as a result such changes.&hellip;<\/p>\n","protected":false},"author":6776,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4168,5554,6024,4500,4150,4151,4252],"coauthors":[],"class_list":["post-1973","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-database","tag-database-objects","tag-database-techniques","tag-refactoring","tag-sql","tag-sql-server","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1973","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/6776"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1973"}],"version-history":[{"count":2,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1973\/revisions"}],"predecessor-version":[{"id":39511,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1973\/revisions\/39511"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1973"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1973"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1973"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1973"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}