Trigger Validations and Isolation Levels

Writing data integrity code in TRIGGER objects is generally pretty simple. Typically you write a query to see “is there a row in inserted/deleted that is not set up as desired for the current transaction?”, and if no bad data is found, you keep moving. And because MOST of the time you would only use a TRIGGER object where transactions are quick, coupled with the reality that the issues I will describe would have to occur within milliseconds… concurrency issues seem to rarely occur. But anything that isn’t impossible is possible. And because of the rarity of the issue, if your code isn’t running a life or death system, when the anomalies happen they are brushed off as “just something weird that occurred”.

This blog will cover one such weird occurrence that can happen when your trigger validation code queries another table under the default isolation level (READ COMMITTED). You expect that your data is protected until your transaction ends, but it may not be.

As an example scenario, consider a Payment being processed for a Sale. Say the payment was set to ‘Processed’, but the customer asked for a refund. The user sets the Payment status to ‘Refunded’ and expects that there are no Sale rows recorded for that Payment. (The sale could be a shipment, etc. Admittedly this scenario is very thin, and is really not a life or death situation, but it should do as an example to make the concurrency concerns clear.) Yet somehow, when the process has completed, it turns out that a Sale has been recorded.

To demonstrate the issue, I will first create two barebones tables, one for the Payment, and one for the Sale: 
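
A barebones sketch follows; the FOREIGN KEY constraint name, the column names, and the tempdb database come from the output later in the post, but the data types and the other constraint names are assumptions.

USE tempdb; -- the error output below shows the demo running in tempdb
GO
CREATE SCHEMA Example;
GO
CREATE TABLE Example.Payment
(
    PaymentId  int NOT NULL CONSTRAINT PKPayment PRIMARY KEY, -- constraint name assumed
    StatusCode varchar(10) NOT NULL  -- holds values such as 'Processed' and 'Refunded'
);

CREATE TABLE Example.Sale
(
    SaleId    int NOT NULL CONSTRAINT PKSale PRIMARY KEY, -- constraint name assumed
    PaymentId int NOT NULL
        CONSTRAINT FKSale$References$PaymentId
            REFERENCES Example.Payment (PaymentId)
);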

Next, I create a TRIGGER object to check to see if the Payment is Processed before inserting the Sale row:
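
The version below is a sketch: the two error messages match the output shown later in the post, but the exact shape of the validation query and of the single-row versus multi-row handling is an assumption.

CREATE TRIGGER Example.Sale$InsertCheckStatusTrigger
ON Example.Sale
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- is there a row in inserted whose Payment is not currently 'Processed'?
    IF EXISTS (SELECT *
               FROM   inserted
               WHERE  NOT EXISTS (SELECT *
                                  FROM   Example.Payment
                                  WHERE  Payment.PaymentId = inserted.PaymentId
                                    AND  Payment.StatusCode = 'Processed'))
    BEGIN
        ROLLBACK TRANSACTION;

        IF (SELECT COUNT(*) FROM inserted) = 1
        BEGIN
            -- single row: include the offending PaymentId in the message
            DECLARE @msg nvarchar(200) =
                  (SELECT CONCAT('The status code of PaymentId ', PaymentId,
                                 ' is not ''Processed''.')
                   FROM   inserted);
            THROW 50000, @msg, 16;
        END
        ELSE
            -- multiple rows: a generic message
            THROW 50000, 'A PaymentId in the inserted Sale rows is not Processed', 16;
    END;
END;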

Now, a few basic tests:
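
The values here are sample data of my own choosing: two Payment rows, a Sale that should work, and a Sale that references a Payment that does not exist.

-- two Payments, both ready to have Sales recorded against them
INSERT INTO Example.Payment (PaymentId, StatusCode)
VALUES (1, 'Processed'),
       (2, 'Processed');

-- a Sale for a Payment that exists and is Processed: succeeds
INSERT INTO Example.Sale (SaleId, PaymentId)
VALUES (1, 1);

-- a Sale for a Payment that does not exist at all: fails
INSERT INTO Example.Sale (SaleId, PaymentId)
VALUES (2, -999);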

The last statement fails, and the error message comes from the FOREIGN KEY constraint:

Msg 547, Level 16, State 0, Line 107
The INSERT statement conflicted with the FOREIGN KEY constraint "FKSale$References$PaymentId". The conflict occurred in database "tempdb", table "Example.Payment", column 'PaymentId'.

Now we set the Payment to Refunded and try again.
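
Using PaymentId 1 from the sample data:

UPDATE Example.Payment
SET    StatusCode = 'Refunded'
WHERE  PaymentId = 1;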

Execute the following statement.
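
In this sketch it is simply another Sale row for the now-Refunded Payment (the SaleId value is arbitrary):

INSERT INTO Example.Sale (SaleId, PaymentId)
VALUES (2, 1);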

And as expected, it returns the following error:

Msg 50000, Level 16, State 16, Procedure Sale$InsertCheckStatusTrigger, Line 40 [Batch Start Line 125]
The status code of PaymentId 1 is not 'Processed'.

Perfect. Now insert 2 rows:
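
One row that would be fine on its own and one that is not, again using my sample values:

INSERT INTO Example.Sale (SaleId, PaymentId)
VALUES (2, 2),  -- PaymentId 2 is still Processed
       (3, 1);  -- PaymentId 1 has been Refunded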

This shows us the multi-row error message. (Note: Always test multiple rows. I have been working on TRIGGER examples the past few days and I got something wrong in the object on my first try because I didn’t adequately allow for rows that have good values in them mixed with bad values.)

Msg 50000, Level 16, State 16, Procedure Sale$InsertCheckStatusTrigger, Line 40 [Batch Start Line 134]
A PaymentId in the inserted Sale rows is not Processed

However, what if you are in the process of inserting a Sale row when the status is changed? For example, since PaymentId 2 is still Processed, let’s start a transaction and insert a new Sale row, without committing the transaction:
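
On the first connection (SaleId 3 is assumed here so that it lines up with the output shown below):

BEGIN TRANSACTION;

INSERT INTO Example.Sale (SaleId, PaymentId)
VALUES (3, 2);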

Then on another connection, without starting a transaction, change the status:
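
Something along these lines, targeting the Payment the open transaction just used:

UPDATE Example.Payment
SET    StatusCode = 'Refunded'
WHERE  PaymentId = 2;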

This succeeds immediately. Now check the data:
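
A query along these lines produces the output that follows (the exact query is assumed from the column headings); run it on the connection that holds the open transaction:

SELECT Sale.SaleId, Payment.StatusCode, @@TRANCOUNT AS TranCount
FROM   Example.Sale
       JOIN Example.Payment
           ON Payment.PaymentId = Sale.PaymentId
WHERE  Sale.SaleId = 3;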

This returns ‘Refunded’, even while we are in the active transaction.

SaleId      StatusCode TranCount
----------- ---------- -----------
3           Refunded   1

Neither connection has been blocked from doing what it wants, but if I commit this transaction, the data is not in the status that we desire, and we have presumably just shipped product that will have real costs to ship and get back. Obviously this is not a BIG issue for something like this example, but it is not what you expect when you build the code this way. So I will rollback:
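
On the connection holding the open transaction:

ROLLBACK TRANSACTION;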

Rather, what is needed is to run the validation code at the REPEATABLE READ isolation level. You can do this with an isolation level hint or a SET TRANSACTION ISOLATION LEVEL statement. I will change the trigger to use a REPEATABLEREAD hint on the query to Example.Payment:
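
Applied to the reconstructed trigger from earlier, the only difference is the WITH (REPEATABLEREAD) hint on the reference to Example.Payment:

ALTER TRIGGER Example.Sale$InsertCheckStatusTrigger
ON Example.Sale
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- same validation as before, but now the rows read from Example.Payment
    -- stay locked until the end of the transaction
    IF EXISTS (SELECT *
               FROM   inserted
               WHERE  NOT EXISTS (SELECT *
                                  FROM   Example.Payment WITH (REPEATABLEREAD)
                                  WHERE  Payment.PaymentId = inserted.PaymentId
                                    AND  Payment.StatusCode = 'Processed'))
    BEGIN
        ROLLBACK TRANSACTION;

        IF (SELECT COUNT(*) FROM inserted) = 1
        BEGIN
            DECLARE @msg nvarchar(200) =
                  (SELECT CONCAT('The status code of PaymentId ', PaymentId,
                                 ' is not ''Processed''.')
                   FROM   inserted);
            THROW 50000, @msg, 16;
        END
        ELSE
            THROW 50000, 'A PaymentId in the inserted Sale rows is not Processed', 16;
    END;
END;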

Reset the status of Example.Payment to Processed.
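
PaymentId 2 is the row the previous demo changed:

UPDATE Example.Payment
SET    StatusCode = 'Processed'
WHERE  PaymentId = 2;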

Now start a transaction, and insert a new sale:
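
The same statements as in the first attempt; SaleId 3 is available again because that transaction was rolled back:

BEGIN TRANSACTION;

INSERT INTO Example.Sale (SaleId, PaymentId)
VALUES (3, 2);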

Worked as expected. Then on another connection, without starting a transaction, change the status:
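
The same UPDATE as before:

UPDATE Example.Payment
SET    StatusCode = 'Refunded'
WHERE  PaymentId = 2;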

This statement is now blocked, because the REPEATABLEREAD hint prevents rows that have been read in the transaction from being changed; the shared locks are held until we end our transaction. Look at the data:
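
Using the same reconstructed query as before, on the connection that holds the open transaction:

SELECT Sale.SaleId, Payment.StatusCode, @@TRANCOUNT AS TranCount
FROM   Example.Sale
       JOIN Example.Payment
           ON Payment.PaymentId = Sale.PaymentId
WHERE  Sale.SaleId = 3;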

This returns ‘Processed’, and we won’t let the other connection set the ‘Refunded’ status until we are finished with this row.

SaleId      StatusCode TranCount
----------- ---------- -----------
3           Processed  1

COMMIT the transaction and the UPDATE statement in the other connection will clear.
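
On the first connection:

COMMIT TRANSACTION;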

Of course, after the status update on the Example.Payment row goes through, your data is still in a situation you didn’t wish it to be in. Naturally this is because the update of the Payment should have its own checks to make sure that if the order has shipped, you can’t change the status (which, in this particular scenario, would have caused the Payment update to be blocked if it was coded to look at all of the data that uses the PaymentId). That refund process would probably also do something to try to cancel the Sale rows.

The basic point here is that any time you use a trigger (and really any code) to validate some piece of data, consider whether you need the REPEATABLE READ isolation level (to stop rows you have read from being changed or deleted) or SERIALIZABLE (which adds to REPEATABLE READ the guarantee that no new rows can be introduced that affect your validation either).

Concurrency is really the hardest part of building T-SQL code. There are just so many tiny details like this that complicate matters. That one hint took 10 seconds to code, but without it, we might randomly send product to customers that they didn’t pay for… Which never happens, right? I mean, who hasn’t gotten some form of messed up order? I know I have gotten double my order on occasion. Most of those problems likely come down to concurrency issues, or poor transaction management (or both).