Defensive Error Handling

TRY...CATCH error handling in SQL Server has certain limitations and inconsistencies that will trap the unwary developer accustomed to the more feature-rich error handling of client-side languages such as C# and Java. In this article, abstracted from his excellent new book, Defensive Database Programming with SQL Server, Alex Kuznetsov offers a simple, robust approach to checking and handling errors in SQL Server, with client-side error handling used to enforce what is done on the server.

The ability to handle errors is essential in any programming language and, naturally, we have to implement safe error handling in our T-SQL if we want to build solid SQL Server code. SQL Server 2005 and later versions supersede the old-style @@ERROR error handling with TRY…CATCH blocks, which will be more familiar to Java and C# programmers.

While use of TRY…CATCH certainly is the best way to handle errors in T-SQL, it is not without difficulties. Error handling in T-SQL can be very complex, and its behavior can sometimes seem erratic and inconsistent. Furthermore, error handling in Transact SQL lacks many features that developers who use languages such as Java and C# take for granted. For example, in SQL Server 2005 and 2008, we cannot even re-throw an error without changing its error code. This complicates handling errors, because we have to write separate conditions for detecting exceptions caught for the first time, and for detecting re-thrown exceptions.

This article will demonstrate simple SQL Server error handling, using XACT_ABORT and transactions, describe the most common problems with TRY…CATCH error handling, and advocate that the defensive programmer, where possible, implements only simple error checking and handling in SQL Server, with client-side error handling used to enforce what is done on the server.

Prepare for Unanticipated Failure

Any statement can, and at some point inevitably will, fail. This may seem to be a statement of the obvious, but too many programmers seem to assume that once their code “works” then the data modifications and queries that it contains will always succeed.

In fact, data modifications can and do fail unexpectedly. For example, the data may not validate against a constraint or a trigger, or the command may become a deadlock victim. Even if the table does not have any constraints or triggers at the time the code is developed, they may be added later. It is wise to assume that our modifications will not always succeed.

Many queries, too, can fail. Just as a modification can become a deadlock victim, so can a SELECT (unless that SELECT is running under either of the two snapshot isolation levels). If a SELECT statement utilizes a user-defined function, then errors may occur in that function that will cause the query to fail. Other common causes of failure are queries that attempt to use a temporary table that does not exist, or contain subqueries that return more than one value.

Listing 1-1 demonstrates a very simple case of a SELECT statement that may succeed or fail depending on locale settings.

Listing 1-1: Language settings can cause certain date queries to fail
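
A minimal sketch of such a query; the language names and the date literal are illustrative and may not match the book's exact listing:

    SET LANGUAGE 'us_english';
    -- succeeds: under US English, '10/23/2010' is read as October 23
    SELECT CAST('10/23/2010' AS DATETIME);

    SET LANGUAGE 'Deutsch';
    -- fails: under German settings the same literal is read as
    -- day 10 of month 23, which is not a valid date
    SELECT CAST('10/23/2010' AS DATETIME);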

The main point is clear: when we develop T-SQL code, we cannot assume that our queries or data modifications will always succeed and we need to be prepared for such failures and handle them gracefully. When an unexpected error occurs during data modification, it is essential that execution of the statement is terminated, the database is returned to the state it was in before the statement started, and a message is sent to the calling client, giving some details of the error that occurred and the action taken as a result. Likewise, if a SELECT fails that is part of a longer transaction that has already modified data then these modifications must be undone as well.

Using Transactions for Data Modifications

In many cases, during data modifications, we cannot take our database from one consistent state to another in a single statement. If a data modification requires more than one statement to effect the required change, then explicit transactions should be used to ensure that these statements succeed or fail as a unit, and that our error handling can handle them as a unit.

For example, suppose that we need to log in one table all the modifications made to another table. Listing 1-2 shows the code to create the table to be modified (Codes) and the table in which the modifications will be logged (CodeDescriptionsChangeLog).

Listing 1-2: The Codes and CodeDescriptionsChangeLog tables
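
A sketch of the two tables; the column sizes and constraint names are assumptions, but note the composite primary key on the log table, which will matter later:

    CREATE TABLE dbo.Codes
    (
        Code VARCHAR(10) NOT NULL,
        Description VARCHAR(40) NULL,
        CONSTRAINT PK_Codes PRIMARY KEY ( Code )
    );

    CREATE TABLE dbo.CodeDescriptionsChangeLog
    (
        Code VARCHAR(10) NOT NULL,
        ChangeDate DATETIME NOT NULL,
        OldDescription VARCHAR(40) NULL,
        NewDescription VARCHAR(40) NULL,
        CONSTRAINT PK_CodeDescriptionsChangeLog
            PRIMARY KEY ( Code, ChangeDate )
    );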

Note that the log table does not have a FOREIGN KEY constraint referring to the Codes table, because the log records need to be kept even if we delete the corresponding rows in Codes.

The procedure shown in Listing 1-3 modifies the Codes table, and logs the change in the CodeDescriptionsChangeLog table.

Listing 1-3: The ChangeCodeDescription stored procedure
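
A sketch of the procedure, under the same assumptions about the tables. Note that it simply issues the two modifications one after the other, with no transaction and no error checking:

    CREATE PROCEDURE dbo.ChangeCodeDescription
        @Code VARCHAR(10),
        @Description VARCHAR(40)
    AS
    BEGIN
        -- log the change...
        INSERT INTO dbo.CodeDescriptionsChangeLog
                ( Code, ChangeDate, OldDescription, NewDescription )
            SELECT Code, CURRENT_TIMESTAMP, Description, @Description
            FROM dbo.Codes
            WHERE Code = @Code;

        -- ...and then make it
        UPDATE dbo.Codes
        SET Description = @Description
        WHERE Code = @Code;
    END;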

Listing 1-4 runs a simple smoke test on the new procedure.

Listing 1-4: A smoke test on the ChangeCodeDescription stored procedure
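
Something along these lines; the code 'IL' and the description values are illustrative:

    DELETE FROM dbo.CodeDescriptionsChangeLog;
    DELETE FROM dbo.Codes;
    INSERT INTO dbo.Codes ( Code, Description )
        VALUES ( 'IL', 'IL' );

    EXEC dbo.ChangeCodeDescription
        @Code = 'IL',
        @Description = 'Illinois';

    SELECT Code, OldDescription, NewDescription
    FROM dbo.CodeDescriptionsChangeLog;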

It looks like the procedure works, right? Note, however, that this stored procedure does not attempt to determine whether or not either of the two modifications failed, and it does not handle possible errors. Of course, one might argue that this stored procedure could be a component of a perfectly valid system, if it is invoked by an application that does all the error handling. However, that does not make it, as is, a valid component. There’s a huge risk that a developer who builds another application may find this procedure and decide to call it, unaware of the required error handling in the calling procedure.

It may seem that nothing could possibly go wrong during these two trivial modifications, but we still cannot assume that both modifications will always succeed. In fact, even in this trivial example, we can devise a case where one modification can fail: if two modifications occur simultaneously, we may get a primary key violation on the CodeDescriptionsChangeLog table.

Rather than reproduce that case here, we can prove the same point simply by creating a CHECK constraint that prohibits inserts and updates against the CodeDescriptionsChangeLog table, and demonstrating what happens when one of our modifications fails and we do nothing to detect and handle it.

Listing 1-5: An INSERT into CodeDescriptionsChangeLog fails, but the UPDATE of Codes succeeds, and we end up with an UPDATE that has not been logged
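
A sketch of the demonstration, using a deliberately unsatisfiable CHECK constraint; the constraint name and the WITH NOCHECK option (needed because the smoke test left a row in the log) are my assumptions:

    -- no row can satisfy 1 < 0, so every INSERT into and UPDATE of
    -- the log table must now fail (for demonstration purposes only)
    ALTER TABLE dbo.CodeDescriptionsChangeLog
        WITH NOCHECK
        ADD CONSTRAINT CHK_CodeDescriptionsChangeLog_Immutable
        CHECK ( 1 < 0 );
    GO

    EXEC dbo.ChangeCodeDescription
        @Code = 'IL',
        @Description = 'Ohio';

    -- the INSERT into the log failed, yet the UPDATE went through
    SELECT Code, Description FROM dbo.Codes;
    SELECT Code, OldDescription, NewDescription
    FROM dbo.CodeDescriptionsChangeLog;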

In order to avoid this situation, we need to begin a transaction, attempt to do both modifications, determine whether or not both completed successfully, and commit the transaction only if both modifications succeeded. If either modification failed, we need to roll back the transaction as part of our error handling. T-SQL allows several ways to accomplish that. Let’s begin with the simplest approach: using transactions along with the XACT_ABORT setting.

Using Transactions and XACT_ABORT to Handle Errors

In many cases, we do not need sophisticated error handling. Quite frequently all we need to do, in case of an error, is roll back all the changes and throw an exception, so that the client knows that there is a problem and will handle it. In such situations, a perfectly reasonable approach is to make use of the XACT_ABORT setting.

By default, in SQL Server this setting is OFF, which means that in some circumstances SQL Server can continue processing when a T-SQL statement causes a run-time error. In other words, for less severe errors, it may be possible to roll back only the statement that caused the error, and to continue processing other statements in the transaction.

If XACT_ABORT is turned on, SQL Server stops processing as soon as a T-SQL run-time error occurs, and the entire transaction is rolled back. When handling unexpected, unanticipated errors, there is often little choice but to cease execution and roll back to a point where the system is in a ‘known state’. Otherwise, you risk seeing partially completed transactions persisted to your database, and so compromising data integrity. In dealing with such cases, it makes sense to have XACT_ABORT turned ON.

Data Modifications via OLE DB
Note that in some cases XACT_ABORT is already set to ON by default. For example, OLE DB will do that for you. However, it is usually preferable to explicitly set it, because we do not know in which context our code will be used later.

Listing 1-6 illustrates a basic error handling approach, whereby our modifications take place within an explicit transaction, having set XACT_ABORT to ON. The PRINT commands in the procedure are for demonstration purposes only; we would not need them in production code.

Listing 1-6: Using the XACT_ABORT setting and an explicit transaction
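
A sketch of the altered procedure:

    ALTER PROCEDURE dbo.ChangeCodeDescription
        @Code VARCHAR(10),
        @Description VARCHAR(40)
    AS
    BEGIN
        SET XACT_ABORT ON;
        BEGIN TRANSACTION;

        INSERT INTO dbo.CodeDescriptionsChangeLog
                ( Code, ChangeDate, OldDescription, NewDescription )
            SELECT Code, CURRENT_TIMESTAMP, Description, @Description
            FROM dbo.Codes
            WHERE Code = @Code;

        PRINT 'First modification succeeded';

        UPDATE dbo.Codes
        SET Description = @Description
        WHERE Code = @Code;

        PRINT 'Second modification succeeded';

        COMMIT;
    END;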

Note that although we want to roll back all the changes if an error occurs, we do not need to explicitly determine if there are any errors, and we do not need to explicitly invoke ROLLBACK in our code; when XACT_ABORT is set to ON, it all happens automatically. Listing 1-7 tests our altered stored procedure.

Listing 1-7: Testing the altered stored procedure
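
A sketch, assuming the prohibitive CHECK constraint from Listing 1-5 is still in place so that the INSERT into the log must fail:

    EXEC dbo.ChangeCodeDescription
        @Code = 'IL',
        @Description = 'Ohio';

    -- the failed INSERT aborted the whole transaction,
    -- so the description is unchanged
    SELECT Code, Description FROM dbo.Codes;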

As we have seen, the stored procedure worked perfectly well. Of course, this is just the first in a series of tests we should perform on our stored procedure. Complete testing would include:

  • Making sure that if both the modification of the Codes table and the INSERT into the CodeDescriptionsChangeLog table succeed, then the transaction commits and both changes persist.
  • Verifying that if an UPDATE of the Codes table fails then the transaction rolls back. To reproduce a failure, we can use a similar technique: a CHECK constraint that makes sure all UPDATEs against the Codes table fail.
  • Invoking the stored procedure without an outstanding transaction, when @@TRANCOUNT is 0. In that case we shall have to explicitly drop the CHECK constraint which we create in our test.

I encourage you to tweak Listing 1-7 and try out these other tests. In many cases, this simple approach of setting XACT_ABORT to ON and using an explicit transaction for modifications gets the job done without much effort. We should use this simple and robust approach unless we really need more sophisticated functionality from our error handling.

If we really want to do some more complex error handling on the server, using T-SQL, then we should use TRY…CATCH blocks, which are available in SQL Server 2005 and upwards.

Using TRY…CATCH blocks to Handle Errors

To handle errors in T-SQL modules, in SQL Server 2005 and upwards, we can use TRY…CATCH blocks. If any command inside the TRY block raises an error, the execution of the TRY block terminates immediately, which is similar to the behavior under the XACT_ABORT setting. But, unlike with XACT_ABORT, where the whole batch terminates, only the execution of the code inside the TRY block terminates, and the CATCH block begins to execute.

In cases where you are aware that a certain specific error could occur, then your error handling strategy can be different. You may attempt to add code to your CATCH block that corrects the error, or at least allows processing to continue. In these cases, it makes more sense to have XACT_ABORT set to OFF, so that you can handle the errors, and inform the calling client of what happened, without rolling back the entire batch.

As will become clear as we progress, my current philosophy is that all but the simplest error handling should be implemented, ideally, in a client-side language where the error handling is more robust and feature-rich than it is in SQL Server TRY…CATCH.

Therefore my goal here is not to cover TRY…CATCH in full detail, but to set out, with examples, some of the reasons why error handling in T-SQL can be complex and a little bewildering. I really want to encourage you either to fully understand all the ins and outs of T-SQL error handling, or not to use it at all, except in the simplest cases.

Erland Sommarskog’s website, http://www.sommarskog.se/, is an excellent source of information on error handling. The book “Expert SQL Server 2005 Development” by Adam Machanic, Hugo Kornelis, and Lara Rubbelke is another great resource.

Finally, note that I do not cover “old-style” error handling, using @@ERROR, at all in this chapter. Use of @@ERROR has some well-known problems, such as inability to handle errors raised by triggers, and the fact that sometimes SQL Server simply fails to set its value correctly. In general, my advice would be to upgrade from @@ERROR to TRY…CATCH, or even better to client-side error handling for all but the simplest cases, as soon as possible.

A TRY…CATCH Example: Retrying After Deadlocks

Sometimes, it may make sense to use TRY…CATCH blocks to retry the execution of a statement, after a deadlock. One must exercise caution when doing so, as retrying an UPDATE statement in this manner may lead to lost updates, as we discuss in detail in Chapter 10 of my book, Surviving Concurrent Modifications. The defensive programmer must take all possible measures to ensure that the possibility of deadlocks is minimized but, in some cases, it may be deemed acceptable, in the short term at least, to automatically retry after a deadlock.

In order to provide an example that you can run on your server, we’ll alter our ChangeCodeDescription stored procedure, as shown in Listing 1-8, so that it is highly likely to be chosen as a deadlock victim if it is involved in a deadlock with a competing session. Our goal here is not to demonstrate how to develop stored procedures that are unlikely to be involved in deadlocks, but to see how to use a TRY…CATCH block to retry after a deadlock.

If processing switches to our CATCH block, we will attempt to re-execute our transaction once more, in response to a deadlock; otherwise we will simply re-throw the error so that the calling client is notified and can respond.

Listing 1-8: Altering the ChangeCodeDescription stored procedure so that it retries after a deadlock
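
A sketch of the approach; the book's listing differs in detail. SET DEADLOCK_PRIORITY LOW volunteers this procedure as the deadlock victim, reading the old description into a variable before modifying either table sets up the lock ordering that the demonstration needs, and the retry body is simply repeated inline:

    ALTER PROCEDURE dbo.ChangeCodeDescription
        @Code VARCHAR(10),
        @Description VARCHAR(40)
    AS
    BEGIN
        -- volunteer to be chosen as the deadlock victim
        SET DEADLOCK_PRIORITY LOW;

        DECLARE @OldDescription VARCHAR(40);

        BEGIN TRY
            BEGIN TRANSACTION;

            SELECT @OldDescription = Description
            FROM dbo.Codes
            WHERE Code = @Code;

            UPDATE dbo.Codes
            SET Description = @Description
            WHERE Code = @Code;

            INSERT INTO dbo.CodeDescriptionsChangeLog
                ( Code, ChangeDate, OldDescription, NewDescription )
            VALUES ( @Code, CURRENT_TIMESTAMP,
                     @OldDescription, @Description );

            COMMIT;
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0 ROLLBACK;

            IF ERROR_NUMBER() = 1205 -- chosen as a deadlock victim
            BEGIN
                PRINT 'Deadlock victim; retrying once';

                BEGIN TRANSACTION;

                SELECT @OldDescription = Description
                FROM dbo.Codes
                WHERE Code = @Code;

                UPDATE dbo.Codes
                SET Description = @Description
                WHERE Code = @Code;

                INSERT INTO dbo.CodeDescriptionsChangeLog
                    ( Code, ChangeDate, OldDescription, NewDescription )
                VALUES ( @Code, CURRENT_TIMESTAMP,
                         @OldDescription, @Description );

                COMMIT;
            END
            ELSE
            BEGIN
                -- re-throw anything other than a deadlock
                DECLARE @ErrorMessage NVARCHAR(4000);
                SET @ErrorMessage = ERROR_MESSAGE();
                RAISERROR(@ErrorMessage, 16, 1);
            END
        END CATCH
    END;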

Before we run our test, let’s reset the test data in our Codes and CodeDescriptionsChangeLog tables.

Listing 1-9: Resetting the test data
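
A sketch, using the same illustrative values as before:

    DELETE FROM dbo.CodeDescriptionsChangeLog;

    UPDATE dbo.Codes
    SET Description = 'IL'
    WHERE Code = 'IL';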

We’re now ready to run the test. From one tab in SSMS, we’ll start a SERIALIZABLE transaction against the CodeDescriptionsChangeLog table, as shown in Listing 1-10.

Listing 1-10: Tab #1, Start a transaction against the CodeDescriptionsChangeLog table
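
Something like the following; leave the commented commands unexecuted for now:

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;

    SELECT Code, OldDescription, NewDescription
    FROM dbo.CodeDescriptionsChangeLog;

    -- to be highlighted and executed later, as directed:
    --UPDATE dbo.Codes
    --SET Description = 'Illinois'
    --WHERE Code = 'IL';
    --COMMIT;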

From a second tab, invoke our stored procedure, as shown in Listing 1-11. The session will ‘hang’ in lock waiting mode, due to our SERIALIZABLE transaction accessing the CodeDescriptionsChangeLog table.

Listing 1-11: Tab #2, Invoke the ChangeCodeDescription stored procedure
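
Assuming the same illustrative values:

    EXEC dbo.ChangeCodeDescription
        @Code = 'IL',
        @Description = 'Ohio';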

Now return to Tab #1, and execute the commented UPDATE against the Codes table, from Listing 1-10, including the COMMIT. As soon as this code tries to execute, a deadlock is detected. SQL Server chooses our stored procedure execution from Tab #2 as the deadlock victim, since we deliberately contrived for this to be the case. The transaction in our TRY block is rolled back, but then our CATCH block is executed and we try to execute our stored procedure again. This time, since Tab #1 has now committed, the modification succeeds. The output from Tab #2 is shown in Listing 1-12.

Listing 1-12. Tab #2, output from execution of the stored procedure

Note also, however, that the UPDATE we executed from Tab #1 is ‘lost’; its changes were overwritten when the retry succeeded.

From these examples, we have learned the following:

  • If several modifications must succeed or fail together, use transactions, and roll the modification back, as a unit, if any one of them fails
  • Always anticipate that any modification may fail. Use XACT_ABORT to ensure that transactions roll back after a failure. Alternatively, we can wrap our transactions in TRY blocks, and roll them back in CATCH blocks.

Unfortunately, there are a few problems with using TRY…CATCH error handling that we need to discuss. In the next section, we’ll look at some ways in which TRY…CATCH error handling is limited and its behavior surprising. We’ll then see what we can achieve when using C# for error handling, instead of T-SQL.

TRY…CATCH Gotchas

T-SQL is not really an efficient language for error handling, and is certainly less robust than error handling in client-side languages such as C++, Java, and C#. As such, although in most cases TRY…CATCH blocks work as expected and catch errors as they should, there are also quite a few “special cases” that we need to know about, where the behavior is not as we might expect.

Furthermore, TRY…CATCH error handling does not really facilitate code reuse. If we want to use this approach in another stored procedure, we cannot fully reuse our T-SQL error handling code; we have to cut and paste much of it into that other stored procedure. This is a recipe for bugs and inconsistencies.

Over the following sections, we’ll discuss some of the special cases of which we need to be aware, when using TRY…CATCH.

Re-throwing Errors

In many cases, we do not wish to handle certain errors in our CATCH block, and instead want to re-throw them, so that they are handled elsewhere. In our previous example, where we wished to retry execution after a deadlock, all other errors were handled by capturing the error message, using the ERROR_MESSAGE function, and re-throwing the error using RAISERROR. However, the error message on its own is generally insufficient; we should also retrieve the information from the ERROR_LINE, ERROR_NUMBER, ERROR_PROCEDURE, ERROR_SEVERITY, and ERROR_STATE functions, declare variables to store this information, and then use RAISERROR to re-throw it. This is very verbose and, as we shall see later, we can achieve exactly the same outcome in C# by issuing one single command: throw.

However, the real problem with the TRY…CATCH approach is this: RAISERROR cannot preserve ERROR_NUMBER, so when we re-throw an error we often change its error code. For example, consider the ConversionErrorDemo stored procedure in Listing 1-13. It attempts to cast a string as an integer in the TRY block, and then in the CATCH block invokes two of the seven error handling functions and re-throws the error.

Listing 1-13: An error with error number 245, which gets a different ERROR_NUMBER, 50000, when re-thrown.
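
A sketch of such a procedure; here the CATCH block captures only ERROR_NUMBER and ERROR_MESSAGE before re-throwing:

    CREATE PROCEDURE dbo.ConversionErrorDemo
    AS
    BEGIN TRY
        SELECT CAST('abc' AS INT);
    END TRY
    BEGIN CATCH
        DECLARE @ErrorNumber INT,
                @ErrorMessage NVARCHAR(4000);
        SELECT @ErrorNumber = ERROR_NUMBER(),
               @ErrorMessage = ERROR_MESSAGE();

        PRINT 'Original error number: '
              + CAST(@ErrorNumber AS VARCHAR(10));

        -- RAISERROR cannot re-raise system error number 245;
        -- the error goes out with number 50000 instead
        RAISERROR(@ErrorMessage, 16, 1);
    END CATCH;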

The fact that re-thrown errors get a different error number means that, when we actually come to handling conversion errors, both re-thrown and original, we cannot catch them using the error number alone, as shown in Listing 1-14.

Listing 1-14: The re-thrown error is no longer assigned number 245.
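
A sketch of the attempt:

    BEGIN TRY
        EXEC dbo.ConversionErrorDemo;
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 245
            PRINT 'Conversion error caught';
        ELSE
            -- the re-thrown error arrives as 50000, so we end up here
            PRINT 'Unexpected error: '
                  + CAST(ERROR_NUMBER() AS VARCHAR(10));
    END CATCH;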

To catch both the original and re-thrown error, we need to parse the error message, as shown in Listing 1-15.

Listing 1-15: Parsing the error message to catch a re-thrown error.
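
A sketch; the pattern matched against the message text is an assumption:

    BEGIN TRY
        EXEC dbo.ConversionErrorDemo;
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 245
           OR ERROR_MESSAGE() LIKE '%Conversion failed%'
            PRINT 'Conversion error caught';
        ELSE
            PRINT 'Some other error occurred';
    END CATCH;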

Although this time we did catch our re-thrown error, our method is not robust: we may mistakenly catch other errors and handle them as if they were conversion errors, as shown in Listing 1-16.

Listing 1-16: Incorrectly handling a ticket-saving error as if it were a conversion error.

As we have seen, the inability of T-SQL to re-throw errors may prevent us from robustly handling re-thrown errors. If we need to re-throw errors, we should do it on the client.

TRY…CATCH Blocks Cannot Catch all Errors

Interestingly enough, sometimes TRY…CATCH blocks just do not catch errors. Sometimes, this represents “expected behavior”; in other words, the behavior is documented and the reason why the error is not caught, for example when a connection fails, is intuitive. However, in some other cases the behavior, while still documented, can be quite surprising.

In either case, however, it means that we cannot assume that all errors originating in the database can or will be handled in a TRY…CATCH. Whenever we issue a SQL statement from the client, we need to be aware that it can generate an exception, and we need to be ready to handle it on the client, in case the TRY…CATCH blocks that we use in our T-SQL code don’t catch it.

Killed Connections and Timeouts

In some cases, it is the expected behavior that errors cannot be caught by TRY…CATCH blocks. For example, if your connection is killed, it is documented and well known that your CATCH block will not catch and handle it.

We also need to be aware of ‘attentions’, also known as ‘timeouts’, which likewise cannot be caught by TRY…CATCH blocks; this, too, is expected behavior. To demonstrate this, start the script in Listing 1-17, but cancel its execution immediately by pressing the “Cancel Executing Query” button.

Listing 1-17: TRY…CATCH behavior when a timeout occurs
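
A sketch of such a script; the long WAITFOR simply gives you time to cancel:

    BEGIN TRY
        BEGIN TRANSACTION;

        UPDATE dbo.Codes
        SET Description = 'Timeout test'
        WHERE Code = 'IL';

        -- cancel the query while it waits here
        WAITFOR DELAY '00:10:00';

        COMMIT;
    END TRY
    BEGIN CATCH
        -- never reached when the query is cancelled
        PRINT 'Rolling back';
        IF @@TRANCOUNT > 0 ROLLBACK;
    END CATCH;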

The execution stops immediately, without executing the CATCH block. Listing 1-18 demonstrates that the connection is still in the middle of an outstanding transaction.

Listing 1-18: The connection is in the middle of an outstanding transaction
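
For example, on the same connection:

    SELECT @@TRANCOUNT AS TranCount; -- returns 1, not 0
    ROLLBACK; -- clean up the orphaned transaction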

If the client initiates a timeout, the behavior is exactly the same: the execution stops immediately, the outstanding transaction is neither committed nor rolled back, and an unhandled exception is sent to the client. This is simply how timeouts work, and the only way to avoid this behavior is to turn it off altogether. For instance, we can turn off timeouts in ADO.NET by setting the CommandTimeout property to 0. Of course, we can turn XACT_ABORT on, in which case at least the transaction will be rolled back. The CATCH block, however, will still be bypassed.

Problems with TRY…CATCH Scope

In some cases, the behavior of TRY…CATCH is documented, but will be surprising to developers used to error handling in languages such as C#.

Listing 1-19 demonstrates a simple case of a query, wrapped in a TRY…CATCH, which tries to use a temporary table that does not exist. However, the CATCH block is not executed, and we get an unhandled exception.

Listing 1-19: Sometimes a CATCH block is bypassed when an error occurs
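
A sketch; the temporary table name is arbitrary:

    BEGIN TRY
        -- #NoSuchTempTable does not exist, so deferred name
        -- resolution fails when this statement is compiled at run time
        SELECT * FROM #NoSuchTempTable;
    END TRY
    BEGIN CATCH
        -- bypassed: the error aborts the rest of this scope
        PRINT 'Error caught';
    END CATCH;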

Even more surprising for object-oriented developers is that this is not a bug; it is just the way SQL Server works in this case. According to MSDN for SQL Server 2008,

“Errors that occur during statement-level recompilation…are not handled by a CATCH block when they occur at the same level of execution as the TRY…CATCH construct”.

The issue here is that compilation errors that occur at run time (as a result of deferred name resolution) abort the rest of the scope, which is equal to the batch in directly submitted SQL, but only to the rest of the procedure in a stored procedure or function. So a TRY…CATCH at the same scope will not intercept these errors, but a TRY…CATCH at a different scope (whether nested or not) will catch them.

My point here is simple: SQL Server does not always handle errors in the way object-oriented languages do. If we choose to use the error handling provided by SQL Server, we really need to learn it in detail or we will be in for some unpleasant surprises.

Doomed Transactions

There is another serious problem with T-SQL TRY…CATCH blocks: in some cases an error that occurs inside a TRY block is considered so severe that the whole transaction is doomed or, in other words, it cannot be committed. Theoretically, the concept of doomed transactions makes perfect sense. Unfortunately, some really trivial errors, such as conversion errors, render transactions doomed if we use the TRY…CATCH provided by T-SQL. For example, consider the transactions shown in Listing 1-20. The first attempts to perform a 1/0 calculation and the second to convert a string to an integer. We do not want to roll back the whole transaction if an error occurs, so we set XACT_ABORT to OFF.

Listing 1-20: A transaction is doomed after a trivial error such as a conversion error.
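
A sketch of the two transactions:

    SET XACT_ABORT OFF;

    BEGIN TRANSACTION;
    BEGIN TRY
        SELECT 1 / 0; -- divide by zero
    END TRY
    BEGIN CATCH
        PRINT 'Error caught';
    END CATCH;
    COMMIT; -- succeeds: the transaction is still committable

    BEGIN TRANSACTION;
    BEGIN TRY
        SELECT CAST('abc' AS INT); -- conversion error
    END TRY
    BEGIN CATCH
        PRINT 'Error caught';
    END CATCH;
    COMMIT; -- fails: the transaction is doomed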

As the output demonstrates, we can commit a transaction after a divide by zero, but a conversion error renders the transaction doomed and therefore un-committable. The latter case demonstrates that even a seemingly trivial conversion error is considered severe enough to override the XACT_ABORT setting, and the whole transaction is automatically rolled back.

To determine whether or not our transaction is committable, within TRY…CATCH, we can use the XACT_STATE() function, as demonstrated in Listing 1-21.

Listing 1-21: Using xact_state to determine if our transaction is committable or doomed
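
A sketch of the pattern:

    BEGIN TRANSACTION;
    BEGIN TRY
        SELECT CAST('abc' AS INT);
        COMMIT;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() = -1
        BEGIN
            PRINT 'Transaction is doomed; rolling back';
            ROLLBACK;
        END
        ELSE IF XACT_STATE() = 1
        BEGIN
            PRINT 'Transaction is committable';
            COMMIT;
        END
    END CATCH;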

Clearly there are situations where the concept of a doomed transaction makes sense. For example, if the server runs out of disk space while running a transaction, then there is no way the transaction could complete. Unfortunately, the current implementation of SQL Server sometimes dooms transactions for very trivial reasons. In all-too-many cases this peculiar behavior of SQL Server makes it impossible to develop feature-rich error handling in T-SQL, because if a transaction is doomed, we have no choice other than to roll it back.

We will not cover any examples here, but this can also cause problems when attempting to use SAVEPOINTs. Consider the following, very common, requirement:

“If our stored procedure is invoked in the middle of an outstanding transaction, and if any command in our stored procedure fails, undo only the changes made by the stored procedure. Do not make any decisions regarding the changes done outside of our stored procedure”.

Unfortunately, there is no robust way to implement such requirements in T-SQL using a SAVEPOINT. While it will work in most cases, it will not work as intended when a transaction is doomed.

Client-side Error Handling

In order to overcome the described limitations and difficulties with error handling using SQL Server’s TRY…CATCH, my advice is simple: when we need to implement feature-rich error handling, to respond intelligently to an anticipated error, we should do it in a language that offers more robust error handling, such as C#.

By doing so, we avoid complications caused by doomed transactions (for example, trivial conversion errors in a C# TRY block will never doom a transaction), or by error numbers being changed when they are re-thrown, and so on. Furthermore, once error handling is implemented in a C# class it can be re-used by all modules that need it, so we promote code reuse to its fullest extent.

Nowadays many of us developers use more than one language in our daily activities, and the reason is very simple and very pragmatic: in many cases it is much easier to learn a new language to accomplish a specific task, to which the language is well-suited, than it is to try to ‘bend’ a single language to all purposes.

By way of an example, Listing 1-22 re-implements in C# our “retry after deadlock” logic, from Listing 1-8. We need only implement this logic once, and we can use this class to execute any command against SQL Server.

Listing 1-22. Implementing the “retry after deadlock” logic in a C# class.

Let’s try this class out. First of all, we need to remove the retry logic from our ChangeCodeDescription stored procedure, but keep it just as prone to deadlocks as before. Listing 1-23 shows how to accomplish that.

Listing 1-23: Removing the retry logic from the ChangeCodeDescription stored procedure.
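
A sketch: the procedure keeps SET DEADLOCK_PRIORITY LOW, so it remains the preferred deadlock victim, but the TRY…CATCH retry logic is replaced by XACT_ABORT and an explicit transaction:

    ALTER PROCEDURE dbo.ChangeCodeDescription
        @Code VARCHAR(10),
        @Description VARCHAR(40)
    AS
    BEGIN
        SET DEADLOCK_PRIORITY LOW;
        SET XACT_ABORT ON;

        DECLARE @OldDescription VARCHAR(40);

        BEGIN TRANSACTION;

        SELECT @OldDescription = Description
        FROM dbo.Codes
        WHERE Code = @Code;

        UPDATE dbo.Codes
        SET Description = @Description
        WHERE Code = @Code;

        INSERT INTO dbo.CodeDescriptionsChangeLog
            ( Code, ChangeDate, OldDescription, NewDescription )
        VALUES ( @Code, CURRENT_TIMESTAMP,
                 @OldDescription, @Description );

        COMMIT;
    END;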

Obviously we’d first need to test this procedure and verify that it can successfully complete; a step that I will leave as a simple exercise.

In order to test what happens when we have a deadlock, we need to first reset our test data by rerunning the script in Listing 1-9. Next, start a SERIALIZABLE transaction against the CodeDescriptionsChangeLog table, by running the script in Listing 1-10.

Rather than invoke our ChangeCodeDescription stored procedure from a second SSMS session, as before, we need to execute the C# code shown in Listing 1-24, which invokes the same stored procedure through our RetryAfterDeadlock method.

Listing 1-24: Using the RetryAfterDeadlock method to invoke our stored procedure.

This method will not complete, as the table is locked by our SSMS transaction. Return to SSMS and highlight and execute the commented code, both the UPDATE command and the COMMIT. The transaction invoked from C# will be chosen as a deadlock victim and it will retry, and there is enough debugging output in our C# code to demonstrate what is happening.

Finally, let us verify that, after the retry, the modification completed, as shown in Listing 1-25.

Listing 1-25: Checking that the data is in the expected state.
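
For example:

    SELECT Code, Description
    FROM dbo.Codes;

    SELECT Code, OldDescription, NewDescription
    FROM dbo.CodeDescriptionsChangeLog;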

In short, C# allows us to implement our “retry after deadlock” logic just once and reuse it as many times as we need. As defensive programmers, we really want to reuse our code, not to cut and paste the same code all over our systems, and so we have a strong motivation to use a good modern tool such as C# for our error handling.

My message here is quite moderate. I am not suggesting that we abandon T-SQL error handling; far from it. In the simplest cases, when all we need is to roll back and raise an error, we should use XACT_ABORT and transactions. Notice that in Listing 1-23, we use XACT_ABORT and a transaction to roll back after a deadlock, but we implement all of the more complex error handling logic in C#.

Of course, there are situations when we do need to implement error handling in T-SQL. Whenever we are considering such an option, we need to realize that error handling in T-SQL is very complex and not really intuitive to a developer with experience in other languages. Also, it has a lot of gotchas, and it lacks some features which client side programmers consider as their birthright, such as the ability to re-throw an error exactly as it was caught.

Conclusion

It is essential that the defensive database programmer includes robust error handling in all production T-SQL code. However, as much as the introduction of TRY…CATCH has improved error handling in T-SQL, it still lacks the versatility, elegance and ease of use that is typical of client-side languages such as Java and C#. Ultimately, you will find that it is not possible to handle certain errors in Transact SQL at all, and that you need to complement your T-SQL error handling with error handling on the client.

I hope this article has taught you the following specific lessons in defensive error handling:

  • If you already use a modern language such as C# in your system, then it makes sense to utilize it to do complex handling of errors related to the database
  • If handling errors on SQL Server, keep it simple where possible; set XACT_ABORT to ON and use transactions in order to roll back and raise an error
  • If you wish to use TRY…CATCH, learn it thoroughly, and watch out in particular for the following problems:
    • One and the same code may run differently depending on the XACT_ABORT setting
    • We cannot re-throw errors exactly as we catch them
    • CATCH blocks do not catch all errors
    • Some errors do not respect XACT_ABORT settings
    • Some transactions may be rendered un-committable, a.k.a. doomed

This article is an extract from Alex’s book ‘Defensive Database Programming’ available from Amazon now.