Reusing T-SQL Code

Often, we have code that works perfectly well for a particular purpose, and then find that we need to implement some very similar functionality in another database. It is all too tempting to just copy the code, adapt it to meet the new requirements, and then deploy this slightly modified version of the same code. However, every time we copy and paste code in this manner, we are exposed to the following risk: the requirements change, and we fail to change the code in both places.

Whenever evidence of repetition is found in the code base, a defensive programmer should seek to refactor, so that the code to tackle a given problem or enforce a given rule is implemented in one place only. In other words, common logic should be refactored into a single reusable code unit, in the form of a constraint, stored procedure, trigger, user-defined function (UDF), or index. Whichever approach we use in each particular case, this proper form of code reuse reduces the possibility of bugs and is a vitally important part of defensive programming.

Unfortunately, many developers find it difficult to choose the correct implementation for the given requirement; in this article I will offer some useful pointers as to the approach that will lead to the most defensive, and therefore robust, code. We must, as always, benchmark our solutions because the one that most conveniently promotes reuse is not necessarily the one that will perform the best.

Specifically, this article will cover:

why copy-and-paste will get you into trouble
how proper code reuse will help
using views to encapsulate simple queries
using UDFs to encapsulate parameterized queries; and why UDFs may sometimes be preferable to stored procedures for this requirement
potential performance issues with UDFs
using constraints, triggers and filtered indexes to implement business logic in one place.

The Dangers of Copy-and-Paste

The biggest problem with copy-and-paste as a means of solving a set of similar problems is that, of course, it leads to code duplication. In turn, this means that we need to maintain multiples copies of essentially the same code, but with each copy subtly modified to suit a particular need. The real danger arises when requirements change, and we need to make sure that this is reflected, not just in the original code, but in all the subsequent copies. We can easily demonstrate this risk with an example. Listing 1 creates the Sales table and loads it with some test data.

CREATE TABLE dbo.Sales

(

SalesID INT NOT NULL

IDENTITY

PRIMARY KEY ,

StateCode CHAR(2) NOT NULL ,

SaleDateTime DATETIME NOT NULL ,

Amount DECIMAL(10, 2) NOT NULL

) ;

SET NOCOUNT ON ;

DECLARE @d DATETIME ,

@i INT ;

SET @d = '20091002' ;

SET @i = 0 ;

WHILE @i < 40

BEGIN ;

INSERT  INTO dbo.Sales

( StateCode ,

SaleDateTime ,

Amount

)

SELECT 'CA' ,

@d ,

case WHEN @d <'20091001' THEN 5000000

ELSE 5000

END

UNION ALL

SELECT 'OR' ,

@d ,

case WHEN @d <'20091001' THEN 1000000

ELSE 1000

END ;

SELECT  @d = DATEADD(day, -1, @d) ,

@i = @i + 1 ;

END ;

Listing 1: Creating the Sales table and populating it with test data.

Listing 2 shows the stored procedure, SelectTotalSalesPerStateForMonth, which returns the total sales per state for a given month.

CREATE PROCEDURE dbo.SelectTotalSalesPerStateForMonth

@AsOfDate DATETIME

SELECT SUM(Amount) AS SalesPerState ,

StateCode

FROM dbo.Sales

-- month begins on the first calendar day of the month

WHERE SaleDateTime >= DATEADD(month,

DATEDIFF(month, '19900101',

@AsOfDate),

'19900101')

AND SaleDateTime <= @AsOfDate

GROUP BY StateCode ;

Listing 2: The SelectTotalSalesPerStateForMonth stored procedure.

At the time we developed this code, our understanding of a report “for a given month” is one that covers the period of time from the first calendar day of the month until the day we run the report. For this purpose, our stored procedure serves the customers’ needs well, and we soon receive a request for a similar report, returning the average sales per state, for a given month. Note that our new report is required to use the same definition of “for a given month.”

It is very tempting to just copy the existing SelectTotalSalesPerStateForMonth procedure, and replace sum with avg to meet the new requirements, as shown in Listing 3.

CREATE PROCEDURE dbo.SelectAverageSalesPerStateForMonth

@AsOfDate DATETIME

SELECT AVG(Amount) AS SalesPerState ,

StateCode

FROM dbo.Sales

-- month begins on the first calendar day of the month

WHERE SaleDateTime >= DATEADD(month,

DATEDIFF(month, '19900101',

@AsOfDate),

'19900101')

AND SaleDateTime <= @AsOfDate

GROUP BY StateCode ;

Listing 3: A simple adaptation of our “total sales” stored procedure allows us to produce an “average sales” equivalent.

In this way, we have completed the task in just a few seconds and, in the short term at least, it will do the job.

Suppose, however, that at some later time the users request to change the definition of “for a given month” to “thirty consecutive calendar days, ending on the day we run the report.” Unfortunately, the definition of “for a given month” is implemented twice, both in SelectTotalSalesPerStateForMonth and in SelectAverageSalesPerStateForMonth. Even if one and the same person developed them both, it is possible to forget it by the time we need to implement the change. Even if it is clearly documented that both procedures should use one and the same definition, it is still possible that the developer implementing the change has failed to modify both stored procedures in a consistent way.

Suppose, for example, that only the SelectAverageSalesPerStateForMonth stored procedure was modified to meet this new requirement. Listing 4 shows how it was changed.

ALTER PROCEDURE dbo.SelectAverageSalesPerStateForMonth

@AsOfDate DATETIME

SELECT AVG(Amount) AS SalesPerState ,

StateCode

FROM dbo.Sales

-- month means 30 calendar days

WHERE SaleDateTime >= DATEADD(day, -29, @AsOfDate)

AND SaleDateTime <= @AsOfDate

GROUP BY StateCode ;

Listing 4: The modified SelectAverageSalesPerStateForMonth stored procedure, accomodating the new definition of “for a given month.”

When we make such changes, it is very easy to forget that we have implemented the definition of “for a given month” in two places. If we update the definition in one place and not the other, we will get inconsistent results, as demonstrated by Listing 5.

PRINT 'Total Sales Per State For Month:' ;

EXEC dbo.SelectTotalSalesPerStateForMonth

@AsOfDate = '20091005' ;

PRINT 'Average Sales Per State For Month:' ;

EXEC dbo.SelectAverageSalesPerStateForMonth

@AsOfDate = '20091005' ;

Total Sales Per State For Month:

SalesPerState StateCode

--------------------------------------- ---------

10000.00 CA

2000.00 OR

(2 row(s) affected)

Average Sales Per State For Month:

SalesPerState StateCode

--------------------------------------- ---------

4630000.000000 CA

926000.000000 OR

(2 row(s) affected)

Listing 5: The stored procedures produce different results.

Clearly the average sales size for the state of California (4,630,000) is many times greater than the total sales for the same state (10,000), which makes no sense at all. In this example, I have deliberately used test data that makes the discrepancy obvious. In general, however, such discrepancies may be more subtle and difficult to spot, so they can lurk around for a long time.

As this example clearly demonstrates, when we cut and paste code, we expose our code to the possibility of bugs if our requirements change, and we fail to change each of the multiple implementations of one and the same logic in exactly the same way. In short, copy-and-paste coding is a direct violation of the DRY (Don’t Repeat Yourself) principle, which is so fundamental in software engineering.

The DRY principle…

…was originally stated by Andy Hunt see http://en.wikipedia.org/wiki/Andy_Hunt_(author) and Dave Thomas see http://en.wikipedia.org/wiki/Dave_Thomas_(author) in their book The Pragmatic Programmer. For details, go to http://en.wikipedia.org/wiki/The_Pragmatic_Programmer. I encourage you to read this book; it is very relevant to every programmer.

The code to implement a given logic should be implemented once, and once only, and reused by all applications that need it. However, of course, due care must be taken when reusing SQL code. Careless reuse of code can lead to maintenance and performance issues, especially when this reuse takes the form of scalar UDFs. We cannot reuse code without first verifying that it runs fast enough.

How Reusing Code Improves its Robustness

Rather than repeat the same logic in multiple places, we need to refactor the common functionality out of our two stored procedures. We can implement the definition of “sales for a given month” in an inline UDF, as shown in Listing 6.

CREATE FUNCTION dbo.SalesForMonth (@AsOfDate DATETIME)

RETURNS TABLE

RETURN

( SELECT SalesID ,

StateCode ,

SaleDateTime ,

Amount

FROM dbo.Sales

WHERE SaleDateTime >= DATEADD(day, -29, @AsOfDate)

AND SaleDateTime <= @AsOfDate

) ;

Listing 6: Implementing the definition of “sales for a given month” in an inline UDF.

This new inline UDF can then be used in both stored procedures.

ALTER PROCEDURE dbo.SelectTotalSalesPerStateForMonth

@AsOfDate DATETIME

BEGIN

SELECT SUM(Amount) AS SalesPerState ,

StateCode

FROM dbo.SalesForMonth(@AsOfDate)

GROUP BY StateCode ;

END ;

ALTER PROCEDURE dbo.SelectAverageSalesPerStateForMonth

@AsOfDate DATETIME

BEGIN

SELECT AVG(Amount) AS SalesPerState ,

StateCode

FROM dbo.SalesForMonth(@AsOfDate)

GROUP BY StateCode ;

END ;

Listing 7: Utilizing the new inline function in our two stored procedures.

After this refactoring, our two stored procedures are guaranteed to have the same definition of “for a given month.” We can rerun Listing 5 and try them out. If, at some later date, we change the definition of the reporting period again, we will have to modify only one module, SalesForMonth.

Can we reuse the definition of the reporting period in other queries against other tables? We can at least try to go one step further and have one module define our reporting period and do nothing else. As usual, we should verify that the performance is still acceptable. The code in Listing 8 shows how to implement the definition of reporting period as an inline UDF.

CREATE FUNCTION dbo.MonthReportingPeriodStart

(@AsOfDate DATETIME )

RETURNS TABLE

RETURN

( SELECT DATEADD(day, -29, @AsOfDate) AS PeriodStart

) ;

Listing 8: An inline UDF that implements the definition of a reporting period.

We can utilize this inline UDF when we implement the “sales for a given month” functionality.

ALTER FUNCTION dbo.SalesForMonth ( @AsOfDate DATETIME )

RETURNS TABLE

RETURN

( SELECT SalesID ,

StateCode ,

SaleDateTime ,

Amount

FROM dbo.Sales AS s

CROSS APPLY

dbo.MonthReportingPeriodStart(@AsOfDate) AS ps

WHERE SaleDateTime >= ps.PeriodStart

AND SaleDateTime <= @AsOfDate

) ;

Listing 9: Altering SalesPerStateForMonth to utilize the new MonthReportingPeriodStart function.

You can rerun Listing 5 one more time to verify that both our stored procedures still work correctly.

Alternatively, we can use a scalar UDF to implement the definition of reporting period, as shown in Listing 10.

-- being defensive, we must drop the old implementation

-- so that reporting periods are implemented

-- only in one place

DROP FUNCTION dbo.MonthReportingPeriodStart ;

CREATE FUNCTION dbo.MonthReportingPeriodStart

( @AsOfDate DATETIME )

RETURNS DATETIME

BEGIN ;

DECLARE @ret DATETIME ;

SET @ret = DATEADD(day, -29, @AsOfDate) ;

RETURN @ret ;

END ;

Listing 10: Scalar UDF which implements the definition of reporting period.

We also have to change our SalesForMonth function, so that it utilizes our new scalar UDF, as shown in Listing 11.

ALTER FUNCTION dbo.SalesForMonth ( @AsOfDate DATETIME )

RETURNS TABLE

RETURN

( SELECT SalesID ,

StateCode ,

SaleDateTime ,

Amount

FROM dbo.Sales AS s

WHERE SaleDateTime >=

dbo.MonthReportingPeriodStart(@AsOfDate)

AND SaleDateTime <= @AsOfDate

) ;

Listing 11: Altering SalesForMonth to utilize the new scalar UDF MonthReportingPeriodStart.

Note that the new implementation of SalesForMonth is simpler than the previous one (Listing 9). Instead of using the CROSS APPLY clause to utilize the inline UDF, we can just invoke the scalar UDF directly in the WHERE clause.

In fact, however, the CROSS APPLY version will perform better in many cases. As always when we reuse code, we need to benchmark the performance of each of the possible approaches before making a choice. In some cases, chaining functions can lead to bad performance so, depending on the results of our benchmarking, we might even have to abandon the SalesForMonth and MonthReportingPeriodStart UDFs and return to the simpler function from Listing 6.

The basic fact remains, however, that implementing the same logic in multiple places increases the possibility of bugs when our requirements change. Instead, we should aim for sensible code reuse wherever possible, and UDFs are just one of the means to achieve this. Over the coming sections, we’ll discuss other ways in which we can reuse T-SQL code, as dictated by the given circumstances. Overall, reusing code is a very important component of defensive programming, and I cannot emphasize strongly enough how much it can improve the robustness of our code.

Wrapping SELECTs in Views

In some cases, it makes sense to wrap a frequently-used query in a view, as shown in Listing 12.

CREATE VIEW dbo.TotalSalesByState

SELECT SUM(Amount) AS TotalSales, StateCode

FROM dbo.Sales

GROUP BY StateCode ;

Listing 12: Wrapping a query inside a view.

You can SELECT from views in exactly the same way as you can SELECT from tables, so views are very convenient and useful. However, views do not offer the ability to provide parameters to the SELECT statements that we are reusing. When this requirement arises, we reuse SELECT statements by wrapping them either in stored procedures or in user-defined functions.

As usual, we need to consider performance whenever we choose to use views. Typically views do not cause any performance degradation at all. However, we need to use them in moderation: having to deal with too many layers of nested views may overwhelm the optimizer and cause it to choose a suboptimal plan.

Reusing Parameterized Queries: Stored Procedures versus Inline UDFs

If we want to reuse parameterized queries, it is usually preferable to wrap them in user-defined functions. It is typically less convenient to reuse parameterized queries that are wrapped in stored procedures, as the following examples will demonstrate.

Let’s say we have a stored procedure that returns all sales for the month, across all states, as shown in Listing 13.

CREATE PROCEDURE dbo.SelectSalesForMonth @AsOfDate DATETIME

BEGIN ;

SELECT Amount ,

StateCode

FROM dbo.Sales

WHERE SaleDateTime >= DATEADD(day, -29, @AsOfDate)

AND SaleDateTime <= @AsOfDate ;

END ;

Listing 13: A stored procedure that returns all sales for the month.

Hopefully, you spotted the missed opportunity for code reuse in this listing. We should have reused our MonthReportingPeriodStart in the WHERE clause; I leave this as an exercise for the reader.

We now need to develop a stored procedure that retrieves the total sales per state for a given month, and we want to reuse the SelectSalesForMonth stored procedure, Although it’s possible to do this, we will need to create a table variable or a temporary table with a structure that matches the structure of the result set returned by stored procedure, as shown in Listing 14.

CREATE PROCEDURE dbo.SelectSalesPerStateForMonth

@AsOfDate DATETIME

BEGIN ;

DECLARE @SalesForMonth TABLE

(

StateCode CHAR(2) ,

Amount DECIMAL(10, 2)

) ;

INSERT INTO @SalesForMonth

( Amount ,

StateCode

)

EXEC dbo.SelectSalesForMonth @AsOfDate ;

SELECT SUM(Amount) AS TotalSalesForMonth ,

StateCode

FROM @SalesForMonth

GROUP BY StateCode

ORDER BY StateCode ;

END ;

Listing 14: The SelectSalesPerStateForMonth stored procedure, which reuses the SelectSalesForMonth stored procedure and returns total sales per state for the month.

We can run a smoke test to verify that our two stored procedures work.

1 2	EXEC dbo.SelectSalesForMonth @AsOfDate = '20091002' ; EXEC dbo.SelectSalesPerStateForMonth @AsOfDate = '20091002' ;

Listing 15: Testing the new stored procedures.

So far so good; we have reused the code wrapped in SelectSalesForMonth procedure and it works. However, now suppose we want to select the state with the highest total sales for a given month. It looks as if we can simply reuse the SelectSalesPerStateForMonth procedure, again with a slight modification to create a table variable or a temporary table, as shown in Listing 16.

CREATE PROCEDURE dbo.SelectStateWithBestSalesForMonth

@AsOfDate DATETIME

BEGIN ;

DECLARE @SalesForMonth TABLE

(

TotalSales DECIMAL(10, 2) ,

StateCode CHAR(2)

) ;

INSERT INTO @SalesForMonth

( TotalSales ,

StateCode

)

EXEC dbo.SelectSalesPerStateForMonth @AsOfDate ;

SELECT TOP (1)

TotalSales ,

StateCode

FROM @SalesForMonth

ORDER BY TotalSales DESC ;

END ;

Listing 16: Reusing SelectSalesPerStateForMonth procedure to get the state with most sales.

Unfortunately, although the procedure creates, it does not work.

EXEC dbo.SelectStateWithBestSalesForMonth

@AsOfDate = '20091002' ;

Msg 8164, Level 16, State 1, Procedure SelectSalesPerStateForMonth, Line 10

An INSERT EXEC statement cannot be nested.

Listing 17: An INSERT...EXEC statement cannot be nested. Note that the exact error message may vary depending on the version of your SQL Server.

Unfortunately, the INSERT...EXEC approach that we used in SelectSalesPerStateForMonth procedure cannot be nested. This is a very serious limitation.

The two inline UDFs shown in Listing 18 implement the same requirements. Note that the TotalSalesPerStateForMonth function implements the same functionality as our previous SelectTotalSalesPerStateForMonth stored procedure. As per our rules of code reuse, we would only ever implement one or the other, not both, in our solutions.

CREATE FUNCTION dbo.TotalSalesPerStateForMonth

( @AsOfDate DATETIME )

RETURNS TABLE

RETURN

( SELECT StateCode ,

SUM(Amount) AS TotalSales

FROM dbo.SalesPerStateForMonth(@AsOfDate)

GROUP BY StateCode

) ;

CREATE FUNCTION dbo.StateWithBestSalesForMonth

( @AsOfDate DATETIME )

RETURNS TABLE

RETURN

( SELECT TOP (1)

StateCode ,

TotalSales

FROM dbo.TotalSalesPerStateForMonth(@AsOfDate)

ORDER BY TotalSales DESC

) ;

Listing 18: Implementing the same functionality via inline UDFs.

In contrast to what we saw in Listing 17, our attempt to reuse result sets returned from nested inline UDFs works just fine.

SELECT * FROM dbo.TotalSalesPerStateForMonth ( '20091002' ) ;

SELECT * FROM dbo.StateWithBestSalesForMonth ( '20091002' ) ;

StateCode TotalSales

--------- ---------------------------------------

CA 140010000.00

OR 28002000.00

(2 row(s) affected)

StateCode TotalSales

--------- ---------------------------------------

CA 140010000.00

(1 row(s) affected)

Listing 19: Testing the inline UDFs.

It is often easier to reuse code when it is wrapped in inline UDFs than when it is wrapped in stored procedures. I should emphasize that I refer only to inline UDFs, not to all three varieties of UDF. Whenever we are deciding whether to use stored procedures or UDFs, we also need to consider the following:

INSERT EXEC requires you to create a table variable or temporary table before doing the call; stored procedures can have multiple and/or varying result sets, depending on code path, causing all kinds of problems with INSERT EXEC
certain functionality, such as data modifications and TRY...CATCH blocks, is not allowed in UDFs
the inline UDF, like a view, is expanded in the execution plan, giving the optimizer the choice to take shortcuts, or even remove joined tables if their columns are not used.

Let’s discuss performance considerations and see why it might not be a good idea to use scalar UDFs.

Scalar UDFs and Performance

Hopefully, the examples so far have demonstrated that laying out code in simple reusable modules can simplify maintenance, and reduce the chance of bugs when requirements change.

Although the emphasis of this book is on writing correct and robust code, we must, in this article, discuss performance. The reason is simple: careless code reuse can seriously hurt performance. For example, in some cases scalar UDFs may perform very poorly, and I will provide an example that demonstrates this, for SQL Server 2005 and 2008. Of course, in future versions of SQL Server the relative performance of the different flavors of UDFs may change, so it’s essential that you always benchmark the performance impact of code refactoring, and rerun these benchmarks when you upgrade to a new SQL Server version.

For this example, we’ll need to create a test table with a reasonable number of rows, so let’s first set up a 128K-row helper table, Numbers, as shown in Listing 20, which we can use to populate the test table. These helper tables are a must-have in database development. If you already have your own version that suits the same purpose, then feel free to use that instead.

CREATE TABLE dbo.Numbers

(

n INT NOT NULL ,

CONSTRAINT PK_Numbers PRIMARY KEY ( n )

) ;

DECLARE @i INT ;

SET @i = 1 ;

INSERT INTO dbo.Numbers

( n )

VALUES ( 1 ) ;

WHILE @i < 100000

BEGIN ;

INSERT INTO dbo.Numbers

( n )

SELECT @i + n

FROM dbo.Numbers ;

SET @i = @i * 2 ;

END ;

Listing 20: Creating and populating the Numbers helper table.

Next, in Listing 21, we create the sample Packages table and populate it using our Numbers helper table.

CREATE TABLE dbo.Packages

(

PackageID INT NOT NULL ,

WeightInPounds DECIMAL(5, 2) NOT NULL ,

CONSTRAINT PK_Packages PRIMARY KEY ( PackageID )

) ;

INSERT INTO dbo.Packages

( PackageID ,

WeightInPounds

)

SELECT n ,

1.0 + ( n % 900 ) / 10

FROM dbo.Numbers ;

Listing 21: Create the Packages table and populate it with test data.

The cost of shipping for a package is $1 if it weighs less than 5 pounds and $2 if it weighs 5 pounds or more. Listing 22 shows how to implement this simple algorithm, both as a scalar and as an inline UDF.

CREATE FUNCTION dbo.GetShippingCost

(

@WeightInPounds DECIMAL(5, 2)

)

RETURNS DECIMAL(5, 2)

BEGIN

DECLARE @ret DECIMAL(5, 2) ;

SET @ret = CASE WHEN @WeightInPounds < 5 THEN 1.00

ELSE 2.00

END ;

RETURN @ret ;

END ;

CREATE FUNCTION dbo.GetShippingCost_Inline

(

@WeightInPounds DECIMAL(5, 2)

)

RETURNS TABLE

RETURN

( SELECT CAST(CASE WHEN @WeightInPounds < 5 THEN 1.00

ELSE 2.00

END AS DECIMAL(5, 2)) AS ShippingCost

) ;

Listing 22: Calculating the shipping cost using a scalar UDF, GetShippingCost, and an inline UDF, GetShippingCost_Inline.

Now, we are ready to examine the comparative performance of each function, using the simple benchmark shown in Listing 23.

SET STATISTICS TIME ON ;

SET NOCOUNT ON ;

PRINT 'Using a scalar UDF' ;

SELECT SUM(dbo.GetShippingCost(WeightInPounds))

AS TotalShippingCost

FROM dbo.Packages ;

PRINT 'Using an inline UDF' ;

SELECT SUM(s.ShippingCost) AS TotalShippingCost

FROM dbo.Packages AS p CROSS APPLY

dbo.GetShippingCost_Inline(p.WeightInPounds) AS s ;

PRINT 'Not using any funtions at all' ;

SELECT SUM(CASE WHEN p.WeightInPounds < 5 THEN 1.00

ELSE 2.00

END) AS TotalShippingCost

FROM dbo.Packages AS p ;

SET STATISTICS TIME OFF ;

Listing 23: A simple benchmark to compare the performance of the scalar and inline UDFs vs. the performance of the copy-and-paste approach.

Although both functions implement exactly the same algorithm, the performance is dramatically different. When we run this benchmark on SQL Server 2012 or 2014, the query that uses our scalar UDF runs dramatically slower. Also, in this particular case, the query which uses the inline UDF performs very well, although not as fast as the query that does not use any UDFs at all, as shown in Listing 24. Of course, when you run these benchmarks on your system, you may get different results.

Using a scalar UDF

...<snip>...

SQL Server Execution Times:

CPU time = 1531 ms, elapsed time = 1552 ms.

Using an inline UDF

...<snip>...

SQL Server Execution Times:

CPU time = 109 ms, elapsed time = 82 ms.

Not using any functions at all

...<snip>...

SQL Server Execution Times:

CPU time = 32 ms, elapsed time = 52 ms.

Listing 24: The performance of the query using our scalar UDF is dramatically slower than the performance of other equivalent queries.

I am not saying that using inline UDFs never incurs any performance penalties; blanket statements do not belong in database programming, and we always need to consider the performance of each particular case separately. However, in many cases, inline UDFs perform very well.

Multi-statement Table-valued UDFs

Besides scalar and inline UDFs, there are multi-statement table-valued UDFs. I will not discuss or benchmark them here, because I feel I’ve already proved the point that we need to consider performance when we refactor code. However, it’s worth noting that, in general, while inline UDFs tend to be “performance neutral,” scalar and multi-statement ones tend to hurt performance if not used carefully, and should be rigorously tested and benchmarked. Be especially wary of using a multi-statement table-valued UDF in an APPLY, since that may force the optimizer to re-execute the UDF for each row in the table the UDF is applied against.

If you are interested in learning about different flavors of UDF, I encourage you to read Books Online and Itzik Ben Gan’s T-SQL Querying book.

Reusing Business Logic: Stored Procedure, Trigger, Constraint or Index?

There are several ways in which we can choose to implement our business logic. For example, we could use:

stored procedures
constraints
triggers
unique filtered indexes.

Over the coming sections we’ll discuss the sort of situations where each approach may, or may not, be appropriate.

Use constraints where possible

In many cases, constraints are the easiest and simplest to use. To demonstrate this point, consider the Teams table shown in Listing 25, with a primary key constraint on the TeamID column.

CREATE TABLE dbo.Teams

(

TeamID INT NOT NULL ,

Name VARCHAR(50) NOT NULL ,

CONSTRAINT PK_Teams PRIMARY KEY ( TeamID )

) ;

Listing 25: Creating the Teams table.

Since we wish to forbid access to the base tables, teams will be inserted into the table, one at a time, by calling a stored procedure. Our business rule is simple: team names must be unique. So, we need to decide where to implement this business rule. One choice is to enforce it in the stored procedure, as shown in Listing 27.

CREATE PROCEDURE dbo.InsertTeam

@TeamID INT ,

@Name VARCHAR(50)

BEGIN ;

-- This is not a fully-functional stored

-- procedure. Error handling is skipped to keep

-- the example short.

-- Also potential race conditions

-- are not considered in this simple module

INSERT INTO dbo.Teams

( TeamID ,

Name

)

SELECT @TeamID ,

@Name

WHERE NOT EXISTS ( SELECT *

FROM dbo.Teams

WHERE Name = @Name ) ;

-- we also need to raise an error if we

-- already have a team with such a name

END ;

Listing 26: The InsertTeam stored procedure inserts a team, if the team name does not already exist in the table.

So, we have a stored procedure that enforces our rule, at least in the absence of high concurrency. However, what happens when we need another stored procedure that modifies a single row in the Teams table, or one that merges a batch of new rows into that table? We’ll need to re-implement this same logic for every stored procedure that modifies this table. This is a form of copy-and-paste and is both time consuming and error prone.

Besides, unless you can guarantee that no applications can run modifications directly against the Teams table, it’s likely that your business rule will be bypassed at some point, and inconsistent data will be introduced.

It is much easier and safer to just create the business rule once, in one place, as a UNIQUE constraint, as shown in Listing 27.

1 2	ALTER TABLE dbo.Teams ADD CONSTRAINT UNQ_Teams_Name UNIQUE(Name) ;

Listing 27: The UNQ_Teams_Name constraint enforces the uniqueness of team names.

We can now let the database engine make sure that this business rule is always enforced, regardless of the module or command that modifies the table.

Turn to triggers when constraints are not practical

As we have seen, constraints are extremely useful in many simple cases. However, our business rules are often more complex, and it is sometimes not possible or not practical to use constraints. To demonstrate this point, let’s add one more table, TeamMembers, which references the Teams table through the TeamID column, as shown in Listing 28.

CREATE TABLE dbo.TeamMembers

(

TeamMemberID INT NOT NULL ,

TeamID INT NOT NULL ,

Name VARCHAR(50) NOT NULL ,

IsTeamLead CHAR(1) NOT NULL ,

CONSTRAINT PK_TeamMembers PRIMARY KEY ( TeamMemberID ) ,

CONSTRAINT FK_TeamMembers_Teams

FOREIGN KEY ( TeamID ) REFERENCES dbo.Teams ( TeamID ) ,

CONSTRAINT CHK_TeamMembers_IsTeamLead

CHECK ( IsTeamLead IN ( 'Y', 'N' ) )

) ;

Listing 31: Creating the TeamMembers table.

Suppose that we need to implement the following business rule: no team can have more than two members. Implementing this business rule in a trigger is quite straightforward, as shown in Listing 29, and you only have to do it once. It is possible, but much more complex, to implement this rule via constraints.

CREATE TRIGGER dbo.TeamMembers_TeamSizeLimitTrigger

ON dbo.TeamMembers

FOR INSERT, UPDATE

IF EXISTS ( SELECT *

FROM ( SELECT TeamID ,

TeamMemberID

FROM inserted

UNION

SELECT TeamID ,

TeamMemberID

FROM dbo.TeamMembers

WHERE TeamID IN ( SELECT TeamID

FROM inserted )

) AS t

GROUP BY TeamID

HAVING COUNT(*) > 2 )

BEGIN ;

RAISERROR('Team size exceeded limit',16, 10) ;

ROLLBACK TRAN ;

END ;

Listing 29: The TeamMembers_TeamSizeLimitTrigger trigger ensures that the teams do not exceed the maximum size.

With our business rule implemented in only one place, we can comprehensively test just one object. In order to test this trigger, we need some test data in our parent table, as shown in Listing 30.

INSERT INTO dbo.Teams

( TeamID ,

Name

)

SELECT 1 ,

'Red Bulls'

UNION ALL

SELECT 2 ,

'Blue Tigers'

UNION ALL

SELECT 3 ,

'Pink Panthers' ;

Listing 30: Adding some test data to the Teams table.

The script shown next, in Listing 31, verifies that we can successfully add new team members, as long as the teams’ sizes do not exceed the limit imposed by our trigger.

-- adding team members to new teams

INSERT INTO dbo.TeamMembers

( TeamMemberID ,

TeamID ,

Name ,

IsTeamLead

)

SELECT 1 ,

1 ,

'Jill Hansen' ,

'N'

UNION ALL

SELECT 2 ,

1 ,

'Sydney Hobart' ,

'N'

UNION ALL

SELECT 3 ,

2 ,

'Hobart Sydney' ,

'N' ;

-- add more team members to existing teams

BEGIN TRANSACTION ;

INSERT INTO dbo.TeamMembers

( TeamMemberID ,

TeamID ,

Name ,

IsTeamLead

)

SELECT 4 ,

2 ,

'Lou Larry' ,

'N' ;

ROLLBACK TRANSACTION ;

Listing 31: Testing the _TeamSizeLimitTrigger trigger with valid INSERTs.

The script shown next, in Listing 32, verifies that we can successfully transfer team members between teams, as long as the teams’ sizes do not exceed the limit.

BEGIN TRANSACTION ;

UPDATE dbo.TeamMembers

SET TeamID = TeamID + 1 ;

ROLLBACK ;

BEGIN TRANSACTION ;

UPDATE dbo.TeamMembers

SET TeamID = 3 - TeamID ;

ROLLBACK ;

Listing 32: Testing the _TeamSizeLimitTrigger trigger with valid UPDATEs.

So, we’ve proved that our trigger allows modifications that do not violate our business rules. Now we need to make sure that it does not allow modifications that do violate our business rules; there are quite a few cases, and we need to verify them all. First of all, Listing 33 verifies that we cannot add new team members if the resulting teams’ sizes are too big. All the statements in the script must, and do, fail.

-- attempt to add too many team members

-- to a team which already has members

INSERT INTO dbo.TeamMembers

( TeamMemberID ,

TeamID ,

Name ,

IsTeamLead

)

SELECT 4 ,

2 ,

'Calvin Lee' ,

'N'

UNION ALL

SELECT 5 ,

2 ,

'Jim Lee' ,

'N' ;

-- attempt to add too many team members to an empty team

INSERT INTO dbo.TeamMembers

( TeamMemberID ,

TeamID ,

Name ,

IsTeamLead

)

SELECT 4 ,

3 ,

'Calvin Lee' ,

'N'

UNION ALL

SELECT 5 ,

3 ,

'Jim Lee' ,

'N'

UNION ALL

SELECT 6 ,

3 ,

'Jake Lee' ,

'N' ;

Listing 33: Testing the _TeamSizeLimitTrigger trigger with invalid INSERTs.

Also, we need to make sure that we cannot transfer team members if the resulting teams’ sizes are too big, as shown in Listing 34. Again, all the following statements fail as expected.

-- attempt to transfer members from other teams

-- to a team which is full to capacity

UPDATE dbo.TeamMembers

SET TeamID = 1

WHERE TeamMemberID = 3 ;

-- attempt to transfer too many team members

-- to a team that is not full yet

UPDATE dbo.TeamMembers

SET TeamID = 2

WHERE TeamMemberID IN ( 1, 2 ) ;

-- attempt to transfer too many team members

-- to an empty team

UPDATE dbo.TeamMembers

SET TeamID = 3 ;

Listing 34: Testing the _TeamSizeLimitTrigger trigger with invalid UPDATEs.

The amount of testing needed to ensure that a trigger works as expected can be quite substantial. However, this is the easiest alternative; if we were to re-implement this business rule in several stored procedures, then the same amount of testing required for the single trigger would be required for each of these procedures, in order to ensure that every one of them implements our business rule correctly.

Unique filtered indexes

Last, but not least, in some cases filtered indexes also allow us to implement business rules. For example, suppose that we need to make sure that each team has at most one team lead. If you are using SQL Server 2008 and upwards, then a filtered index can easily implement this business rule, as shown in Listing 35. I encourage you to try out this index and see for yourself that it works.

CREATE UNIQUE NONCLUSTERED INDEX TeamLeads

ON dbo.TeamMembers(TeamID)

WHERE IsTeamLead='Y' ;

Listing 35: The TeamLeads filtered index ensures that each team has at most one team lead.

Summary

I hope that this article has proved to you that a copy-and-paste approach to code reuse will lead to multiple, inconsistent versions of the same logic being scattered throughout your code base, and a maintenance nightmare.

It has also demonstrated how common logic can be refactored into a single reusable code unit, in the form of a constraint, stored procedure, trigger, UDF or index. This careful reuse of code will reduce the possibility of bugs and greatly improve the robustness of our code.

Unfortunately, performance considerations may prevent us from reusing our code to the fullest. Yet, with careful benchmarking, we can usually find a nice compromise and develop code that is easy to maintain but still performs well enough.

Specifically, I hope the article has taught you the following lessons in defensive programming:

views are useful for simple reuse of non-parameterized queries
for reuse of parameterized queries, inline UDFs are often preferable to stored procedures
be wary of performance issues with scalar and multi-statement table-valued UDFs
if possible, enforce reusable business logic in a simple constraint, or possibly a filtered index in SQL 2008 and upwards
for more complex logic, triggers often offer the most convenient means of promoting reuse, but they require extensive testing.

Register for Simple Talk

Reusing T-SQL Code

The Dangers of Copy-and-Paste

How Reusing Code Improves its Robustness

Wrapping SELECTs in Views

Reusing Parameterized Queries: Stored Procedures versus Inline UDFs

Scalar UDFs and Performance

Multi-statement Table-valued UDFs

Reusing Business Logic: Stored Procedure, Trigger, Constraint or Index?

Use constraints where possible

Turn to triggers when constraints are not practical

Unique filtered indexes

Summary

Article tags

About the author

Alex Kuznetsov

Alex Kuznetsov's contributions

Articles

Books

Top topics

Alex Kuznetsov's latest contributions:

Painless Refactoring of SQL Server Database Objects

Lessons Learned from Six Years of Agile Database Development

Developing Low-Maintenance Databases

The Dangers of Copy-and-Paste

How Reusing Code Improves its Robustness

Wrapping SELECTs in Views

Reusing Parameterized Queries: Stored Procedures versus Inline UDFs

Scalar UDFs and Performance

Multi-statement Table-valued UDFs

Reusing Business Logic: Stored Procedure, Trigger, Constraint or Index?

Use constraints where possible

Turn to triggers when constraints are not practical

Unique filtered indexes

Summary

Article tags

Recommended

About the author

Alex Kuznetsov

Alex Kuznetsov's contributions

Articles

Books

Top topics

Alex Kuznetsov's latest contributions:

Painless Refactoring of SQL Server Database Objects

Lessons Learned from Six Years of Agile Database Development

Developing Low-Maintenance Databases