Reusing T-SQL Code

Database programmers are often caught in the paradox of wanting to implement business rules and complex functionality in one place only, but being concerned by the performance hit of having generic functions or procedures that have a wide range of parameters and outputs. Alex Kuznetsov, in an article taken from his book 'Defensive Database Programming with SQL Server', shows how DRY principles can be put in practice with constraints, stored procedures, triggers, UDFs and indexes.

Often, we have code that works perfectly well for a particular purpose, and then find that we need to implement some very similar functionality in another database. It is all too tempting to just copy the code, adapt it to meet the new requirements, and then deploy this slightly modified version of the same code. However, every time we copy and paste code in this manner, we are exposed to the following risk: the requirements change, and we fail to change the code in both places.

Whenever evidence of repetition is found in the code base, a defensive programmer should seek to refactor, so that the code to tackle a given problem or enforce a given rule is implemented in one place only. In other words, common logic should be refactored into a single reusable code unit, in the form of a constraint, stored procedure, trigger, user-defined function (UDF), or index. Whichever approach we use in each particular case, this proper form of code reuse reduces the possibility of bugs and is a vitally important part of defensive programming.

Unfortunately, many developers find it difficult to choose the correct implementation for the given requirement; in this article I will offer some useful pointers as to the approach that will lead to the most defensive, and therefore robust, code. We must, as always, benchmark our solutions because the one that most conveniently promotes reuse is not necessarily the one that will perform the best.

Specifically, this article will cover:

  • why copy-and-paste will get you into trouble
  • how proper code reuse will help
  • using views to encapsulate simple queries
  • using UDFs to encapsulate parameterized queries; and why UDFs may sometimes be preferable to stored procedures for this requirement
  • potential performance issues with UDFs
  • using constraints, triggers and filtered indexes to implement business logic in one place.

The Dangers of Copy-and-Paste

The biggest problem with copy-and-paste as a means of solving a set of similar problems is that, of course, it leads to code duplication. In turn, this means that we need to maintain multiples copies of essentially the same code, but with each copy subtly modified to suit a particular need. The real danger arises when requirements change, and we need to make sure that this is reflected, not just in the original code, but in all the subsequent copies. We can easily demonstrate this risk with an example. Listing 1 creates the Sales table and loads it with some test data.

Listing 1: Creating the Sales table and populating it with test data.

Listing 2 shows the stored procedure, SelectTotalSalesPerStateForMonth, which returns the total sales per state for a given month.

Listing 2: The SelectTotalSalesPerStateForMonth stored procedure.

At the time we developed this code, our understanding of a report “for a given month” is one that covers the period of time from the first calendar day of the month until the day we run the report. For this purpose, our stored procedure serves the customers’ needs well, and we soon receive a request for a similar report, returning the average sales per state, for a given month. Note that our new report is required to use the same definition of “for a given month.”

It is very tempting to just copy the existing SelectTotalSalesPerStateForMonth procedure, and replace sum with avg to meet the new requirements, as shown in Listing 3.

Listing 3: A simple adaptation of our “total sales” stored procedure allows us to produce an “average sales” equivalent.

In this way, we have completed the task in just a few seconds and, in the short term at least, it will do the job.

Suppose, however, that at some later time the users request to change the definition of “for a given month” to “thirty consecutive calendar days, ending on the day we run the report.” Unfortunately, the definition of “for a given month” is implemented twice, both in SelectTotalSalesPerStateForMonth and in SelectAverageSalesPerStateForMonth. Even if one and the same person developed them both, it is possible to forget it by the time we need to implement the change. Even if it is clearly documented that both procedures should use one and the same definition, it is still possible that the developer implementing the change has failed to modify both stored procedures in a consistent way.

Suppose, for example, that only the SelectAverageSalesPerStateForMonth stored procedure was modified to meet this new requirement. Listing 4 shows how it was changed.

Listing 4: The modified SelectAverageSalesPerStateForMonth stored procedure, accomodating the new definition of “for a given month.”

When we make such changes, it is very easy to forget that we have implemented the definition of “for a given month” in two places. If we update the definition in one place and not the other, we will get inconsistent results, as demonstrated by Listing 5.

Listing 5: The stored procedures produce different results.

Clearly the average sales size for the state of California (4,630,000) is many times greater than the total sales for the same state (10,000), which makes no sense at all. In this example, I have deliberately used test data that makes the discrepancy obvious. In general, however, such discrepancies may be more subtle and difficult to spot, so they can lurk around for a long time.

As this example clearly demonstrates, when we cut and paste code, we expose our code to the possibility of bugs if our requirements change, and we fail to change each of the multiple implementations of one and the same logic in exactly the same way. In short, copy-and-paste coding is a direct violation of the DRY (Don’t Repeat Yourself) principle, which is so fundamental in software engineering.

The DRY principle…

…was originally stated by Andy Hunt see http://en.wikipedia.org/wiki/Andy_Hunt_(author) and Dave Thomas see http://en.wikipedia.org/wiki/Dave_Thomas_(author) in their book The Pragmatic Programmer. For details, go to http://en.wikipedia.org/wiki/The_Pragmatic_Programmer. I encourage you to read this book; it is very relevant to every programmer.

The code to implement a given logic should be implemented once, and once only, and reused by all applications that need it. However, of course, due care must be taken when reusing SQL code. Careless reuse of code can lead to maintenance and performance issues, especially when this reuse takes the form of scalar UDFs. We cannot reuse code without first verifying that it runs fast enough.

How Reusing Code Improves its Robustness

Rather than repeat the same logic in multiple places, we need to refactor the common functionality out of our two stored procedures. We can implement the definition of “sales for a given month” in an inline UDF, as shown in Listing 6.

Listing 6: Implementing the definition of “sales for a given month” in an inline UDF.

This new inline UDF can then be used in both stored procedures.

Listing 7: Utilizing the new inline function in our two stored procedures.

After this refactoring, our two stored procedures are guaranteed to have the same definition of “for a given month.” We can rerun Listing 5 and try them out. If, at some later date, we change the definition of the reporting period again, we will have to modify only one module, SalesForMonth.

Can we reuse the definition of the reporting period in other queries against other tables? We can at least try to go one step further and have one module define our reporting period and do nothing else. As usual, we should verify that the performance is still acceptable. The code in Listing 8 shows how to implement the definition of reporting period as an inline UDF.

Listing 8: An inline UDF that implements the definition of a reporting period.

We can utilize this inline UDF when we implement the “sales for a given month” functionality.

Listing 9: Altering SalesPerStateForMonth to utilize the new MonthReportingPeriodStart function.

You can rerun Listing 5 one more time to verify that both our stored procedures still work correctly.

Alternatively, we can use a scalar UDF to implement the definition of reporting period, as shown in Listing 10.

Listing 10: Scalar UDF which implements the definition of reporting period.

We also have to change our SalesForMonth function, so that it utilizes our new scalar UDF, as shown in Listing 11.

Listing 11: Altering SalesForMonth to utilize the new scalar UDF MonthReportingPeriodStart.

Note that the new implementation of SalesForMonth is simpler than the previous one (Listing 9). Instead of using the CROSS APPLY clause to utilize the inline UDF, we can just invoke the scalar UDF directly in the WHERE clause.

In fact, however, the CROSS APPLY version will perform better in many cases. As always when we reuse code, we need to benchmark the performance of each of the possible approaches before making a choice. In some cases, chaining functions can lead to bad performance so, depending on the results of our benchmarking, we might even have to abandon the SalesForMonth and MonthReportingPeriodStart UDFs and return to the simpler function from Listing 6.

The basic fact remains, however, that implementing the same logic in multiple places increases the possibility of bugs when our requirements change. Instead, we should aim for sensible code reuse wherever possible, and UDFs are just one of the means to achieve this. Over the coming sections, we’ll discuss other ways in which we can reuse T-SQL code, as dictated by the given circumstances. Overall, reusing code is a very important component of defensive programming, and I cannot emphasize strongly enough how much it can improve the robustness of our code.

Wrapping SELECTs in Views

In some cases, it makes sense to wrap a frequently-used query in a view, as shown in Listing 12.

Listing 12: Wrapping a query inside a view.

You can SELECT from views in exactly the same way as you can SELECT from tables, so views are very convenient and useful. However, views do not offer the ability to provide parameters to the SELECT statements that we are reusing. When this requirement arises, we reuse SELECT statements by wrapping them either in stored procedures or in user-defined functions.

As usual, we need to consider performance whenever we choose to use views. Typically views do not cause any performance degradation at all. However, we need to use them in moderation: having to deal with too many layers of nested views may overwhelm the optimizer and cause it to choose a suboptimal plan.

Reusing Parameterized Queries: Stored Procedures versus Inline UDFs

If we want to reuse parameterized queries, it is usually preferable to wrap them in user-defined functions. It is typically less convenient to reuse parameterized queries that are wrapped in stored procedures, as the following examples will demonstrate.

Let’s say we have a stored procedure that returns all sales for the month, across all states, as shown in Listing 13.

Listing 13: A stored procedure that returns all sales for the month.

Hopefully, you spotted the missed opportunity for code reuse in this listing. We should have reused our MonthReportingPeriodStart in the WHERE clause; I leave this as an exercise for the reader.

We now need to develop a stored procedure that retrieves the total sales per state for a given month, and we want to reuse the SelectSalesForMonth stored procedure, Although it’s possible to do this, we will need to create a table variable or a temporary table with a structure that matches the structure of the result set returned by stored procedure, as shown in Listing 14.

Listing 14: The SelectSalesPerStateForMonth stored procedure, which reuses the SelectSalesForMonth stored procedure and returns total sales per state for the month.

We can run a smoke test to verify that our two stored procedures work.

Listing 15: Testing the new stored procedures.

So far so good; we have reused the code wrapped in SelectSalesForMonth procedure and it works. However, now suppose we want to select the state with the highest total sales for a given month. It looks as if we can simply reuse the SelectSalesPerStateForMonth procedure, again with a slight modification to create a table variable or a temporary table, as shown in Listing 16.

Listing 16: Reusing SelectSalesPerStateForMonth procedure to get the state with most sales.

Unfortunately, although the procedure creates, it does not work.

Listing 17: An INSERT...EXEC statement cannot be nested. Note that the exact error message may vary depending on the version of your SQL Server.

Unfortunately, the INSERT...EXEC approach that we used in SelectSalesPerStateForMonth procedure cannot be nested. This is a very serious limitation.

The two inline UDFs shown in Listing 18 implement the same requirements. Note that the TotalSalesPerStateForMonth function implements the same functionality as our previous SelectTotalSalesPerStateForMonth stored procedure. As per our rules of code reuse, we would only ever implement one or the other, not both, in our solutions.

Listing 18: Implementing the same functionality via inline UDFs.

In contrast to what we saw in Listing 17, our attempt to reuse result sets returned from nested inline UDFs works just fine.

Listing 19: Testing the inline UDFs.

It is often easier to reuse code when it is wrapped in inline UDFs than when it is wrapped in stored procedures. I should emphasize that I refer only to inline UDFs, not to all three varieties of UDF. Whenever we are deciding whether to use stored procedures or UDFs, we also need to consider the following:

  • INSERT EXEC requires you to create a table variable or temporary table before doing the call; stored procedures can have multiple and/or varying result sets, depending on code path, causing all kinds of problems with INSERT EXEC
  • certain functionality, such as data modifications and TRY...CATCH blocks, is not allowed in UDFs
  • the inline UDF, like a view, is expanded in the execution plan, giving the optimizer the choice to take shortcuts, or even remove joined tables if their columns are not used.

Let’s discuss performance considerations and see why it might not be a good idea to use scalar UDFs.

Scalar UDFs and Performance

Hopefully, the examples so far have demonstrated that laying out code in simple reusable modules can simplify maintenance, and reduce the chance of bugs when requirements change.

Although the emphasis of this book is on writing correct and robust code, we must, in this article, discuss performance. The reason is simple: careless code reuse can seriously hurt performance. For example, in some cases scalar UDFs may perform very poorly, and I will provide an example that demonstrates this, for SQL Server 2005 and 2008. Of course, in future versions of SQL Server the relative performance of the different flavors of UDFs may change, so it’s essential that you always benchmark the performance impact of code refactoring, and rerun these benchmarks when you upgrade to a new SQL Server version.

For this example, we’ll need to create a test table with a reasonable number of rows, so let’s first set up a 128K-row helper table, Numbers, as shown in Listing 20, which we can use to populate the test table. These helper tables are a must-have in database development. If you already have your own version that suits the same purpose, then feel free to use that instead.

Listing 20: Creating and populating the Numbers helper table.

Next, in Listing 21, we create the sample Packages table and populate it using our Numbers helper table.

Listing 21: Create the Packages table and populate it with test data.

The cost of shipping for a package is $1 if it weighs less than 5 pounds and $2 if it weighs 5 pounds or more. Listing 22 shows how to implement this simple algorithm, both as a scalar and as an inline UDF.

Listing 22: Calculating the shipping cost using a scalar UDF, GetShippingCost, and an inline UDF, GetShippingCost_Inline.

Now, we are ready to examine the comparative performance of each function, using the simple benchmark shown in Listing 23.

Listing 23: A simple benchmark to compare the performance of the scalar and inline UDFs vs. the performance of the copy-and-paste approach.

Although both functions implement exactly the same algorithm, the performance is dramatically different. When we run this benchmark on SQL Server 2012 or 2014, the query that uses our scalar UDF runs dramatically slower. Also, in this particular case, the query which uses the inline UDF performs very well, although not as fast as the query that does not use any UDFs at all, as shown in Listing 24. Of course, when you run these benchmarks on your system, you may get different results.

Listing 24: The performance of the query using our scalar UDF is dramatically slower than the performance of other equivalent queries.

I am not saying that using inline UDFs never incurs any performance penalties; blanket statements do not belong in database programming, and we always need to consider the performance of each particular case separately. However, in many cases, inline UDFs perform very well.

Multi-statement Table-valued UDFs

Besides scalar and inline UDFs, there are multi-statement table-valued UDFs. I will not discuss or benchmark them here, because I feel I’ve already proved the point that we need to consider performance when we refactor code. However, it’s worth noting that, in general, while inline UDFs tend to be “performance neutral,” scalar and multi-statement ones tend to hurt performance if not used carefully, and should be rigorously tested and benchmarked. Be especially wary of using a multi-statement table-valued UDF in an APPLY, since that may force the optimizer to re-execute the UDF for each row in the table the UDF is applied against.

If you are interested in learning about different flavors of UDF, I encourage you to read Books Online and Itzik Ben Gan’s T-SQL Querying book.

Reusing Business Logic: Stored Procedure, Trigger, Constraint or Index?

There are several ways in which we can choose to implement our business logic. For example, we could use:

  • stored procedures
  • constraints
  • triggers
  • unique filtered indexes.

Over the coming sections we’ll discuss the sort of situations where each approach may, or may not, be appropriate.

Use constraints where possible

In many cases, constraints are the easiest and simplest to use. To demonstrate this point, consider the Teams table shown in Listing 25, with a primary key constraint on the TeamID column.

Listing 25: Creating the Teams table.

Since we wish to forbid access to the base tables, teams will be inserted into the table, one at a time, by calling a stored procedure. Our business rule is simple: team names must be unique. So, we need to decide where to implement this business rule. One choice is to enforce it in the stored procedure, as shown in Listing 27.

Listing 26: The InsertTeam stored procedure inserts a team, if the team name does not already exist in the table.

So, we have a stored procedure that enforces our rule, at least in the absence of high concurrency. However, what happens when we need another stored procedure that modifies a single row in the Teams table, or one that merges a batch of new rows into that table? We’ll need to re-implement this same logic for every stored procedure that modifies this table. This is a form of copy-and-paste and is both time consuming and error prone.

Besides, unless you can guarantee that no applications can run modifications directly against the Teams table, it’s likely that your business rule will be bypassed at some point, and inconsistent data will be introduced.

It is much easier and safer to just create the business rule once, in one place, as a UNIQUE constraint, as shown in Listing 27.

Listing 27: The UNQ_Teams_Name constraint enforces the uniqueness of team names.

We can now let the database engine make sure that this business rule is always enforced, regardless of the module or command that modifies the table.

Turn to triggers when constraints are not practical

As we have seen, constraints are extremely useful in many simple cases. However, our business rules are often more complex, and it is sometimes not possible or not practical to use constraints. To demonstrate this point, let’s add one more table, TeamMembers, which references the Teams table through the TeamID column, as shown in Listing 28.

Listing 31: Creating the TeamMembers table.

Suppose that we need to implement the following business rule: no team can have more than two members. Implementing this business rule in a trigger is quite straightforward, as shown in Listing 29, and you only have to do it once. It is possible, but much more complex, to implement this rule via constraints.

Listing 29: The TeamMembers_TeamSizeLimitTrigger trigger ensures that the teams do not exceed the maximum size.

With our business rule implemented in only one place, we can comprehensively test just one object. In order to test this trigger, we need some test data in our parent table, as shown in Listing 30.

Listing 30: Adding some test data to the Teams table.

The script shown next, in Listing 31, verifies that we can successfully add new team members, as long as the teams’ sizes do not exceed the limit imposed by our trigger.

Listing 31: Testing the _TeamSizeLimitTrigger trigger with valid INSERTs.

The script shown next, in Listing 32, verifies that we can successfully transfer team members between teams, as long as the teams’ sizes do not exceed the limit.

Listing 32: Testing the _TeamSizeLimitTrigger trigger with valid UPDATEs.

So, we’ve proved that our trigger allows modifications that do not violate our business rules. Now we need to make sure that it does not allow modifications that do violate our business rules; there are quite a few cases, and we need to verify them all. First of all, Listing 33 verifies that we cannot add new team members if the resulting teams’ sizes are too big. All the statements in the script must, and do, fail.

Listing 33: Testing the _TeamSizeLimitTrigger trigger with invalid INSERTs.

Also, we need to make sure that we cannot transfer team members if the resulting teams’ sizes are too big, as shown in Listing 34. Again, all the following statements fail as expected.

Listing 34: Testing the _TeamSizeLimitTrigger trigger with invalid UPDATEs.

The amount of testing needed to ensure that a trigger works as expected can be quite substantial. However, this is the easiest alternative; if we were to re-implement this business rule in several stored procedures, then the same amount of testing required for the single trigger would be required for each of these procedures, in order to ensure that every one of them implements our business rule correctly.

Unique filtered indexes

Last, but not least, in some cases filtered indexes also allow us to implement business rules. For example, suppose that we need to make sure that each team has at most one team lead. If you are using SQL Server 2008 and upwards, then a filtered index can easily implement this business rule, as shown in Listing 35. I encourage you to try out this index and see for yourself that it works.

Listing 35: The TeamLeads filtered index ensures that each team has at most one team lead.

Summary

I hope that this article has proved to you that a copy-and-paste approach to code reuse will lead to multiple, inconsistent versions of the same logic being scattered throughout your code base, and a maintenance nightmare.

It has also demonstrated how common logic can be refactored into a single reusable code unit, in the form of a constraint, stored procedure, trigger, UDF or index. This careful reuse of code will reduce the possibility of bugs and greatly improve the robustness of our code.

Unfortunately, performance considerations may prevent us from reusing our code to the fullest. Yet, with careful benchmarking, we can usually find a nice compromise and develop code that is easy to maintain but still performs well enough.

Specifically, I hope the article has taught you the following lessons in defensive programming:

  • views are useful for simple reuse of non-parameterized queries
  • for reuse of parameterized queries, inline UDFs are often preferable to stored procedures
  • be wary of performance issues with scalar and multi-statement table-valued UDFs
  • if possible, enforce reusable business logic in a simple constraint, or possibly a filtered index in SQL 2008 and upwards
  • for more complex logic, triggers often offer the most convenient means of promoting reuse, but they require extensive testing.