Matters of Timing in Concurrency

There are a few topics in SQL Server that I love academically but that are truly a pain to get right. One is security, and it pales in comparison to how much trouble it is to program for truly safe concurrent access to a resource. The biggest issue: time. What makes it more difficult is that it is not as easy to test as whether a column can hold a negative value, or whether user X can access column C.

Most of the time, when you are thinking about concurrency issues, they are couched in a discussion of performance. As in: “There is blocking happening on the server, so everything is running slowly.” This is a very important problem to fix, as performance is something that your customers will notice and complain about, loudly. Of course, there is that one magic bullet commonly applied by developers, that being the NOLOCK hint. No blocking, no concurrency issues, voila!

However… magic always comes at a price, and in this case the price is data integrity (which is NOT a price you want to pay). As a data architect, I appreciate locking, blocking, and even deadlocks for what they are: protection. When two applications attempt to access the same resources on a server, there has to be some level of coordination between them. In later blogs, I will look deeper into this coordination and the mechanics of how connections are synchronized. In this blog, I want to cover the concerns of time. The problem we have discussed so far is that of synchronous access. That is the easy part, and it is left up to the relational engine, with a few hints and tweaks along the way. It is when the access is asynchronous that things get interesting.

Asynchronous access… that’s caching data, right? You may then be thinking, “my application does no caching, so I have no worries.” Two things. First, 99.9999% of all applications cache data for some period of time. Second, caching data is not 100% of the problem, as even two UPDATE statements can experience asynchronous concurrency problems. I think of the problem in three different areas:

  • Long term caching: The application fetches data at some point, and the user works from this data, saving changes perhaps hours later. Data may have changed during that time period.
  • Short term caching: The application fetches a row from the database, the user makes changes, saves it perhaps seconds later. Data may have changed during that time period.
  • Synchronous contention resolution: The application may never store the data, but simply updates rows directly. Blocking occurs, so time passes, perhaps just microseconds, before the changes are made. Data may have changed during that time period.

Data may have changed during that time period: that is the problem in all cases. The task for the system architect is to make sure that each of these scenarios does not result in data loss. For the two caching scenarios, it should be at least somewhat obvious what can occur. Consider the following table of actions, with time passing from the top of the table to the bottom.

Connection1                          Connection2
Fetch Row 1                          Fetch Row 1
Update Row 1, Column 1 to ‘A’
                                     Update Row 1, Column 1 to ‘B’
Select Row 1                         Select Row 1

Both connections fetch the same row, each updates it based on its view of the row, and then SELECTs the row back out. When Connection 1 fetches the row at the end of its batch, it sees that Column 1’s value is ‘B’, which does not match what it expects, since it just updated that darn row. The change that it made to Column 1 is gone.
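If you want to see this for yourself, here is a minimal sketch of that timeline in T-SQL. The table dbo.Demo and its columns are names I made up purely for illustration; run the Connection 1 steps in one query window and the Connection 2 steps in another, in the order of the numbered comments.

CREATE TABLE dbo.Demo
(
    RowId   int     NOT NULL CONSTRAINT PKDemo PRIMARY KEY,
    Column1 char(1) NOT NULL
);
INSERT INTO dbo.Demo (RowId, Column1) VALUES (1, 'Z');

-- Step 1, Connection 1: fetch the row (the application now holds Column1 = 'Z' in memory)
SELECT RowId, Column1 FROM dbo.Demo WHERE RowId = 1;

-- Step 2, Connection 2: fetch the same row (it also holds Column1 = 'Z')
SELECT RowId, Column1 FROM dbo.Demo WHERE RowId = 1;

-- Step 3, Connection 1: update the row based on its cached view of the data
UPDATE dbo.Demo SET Column1 = 'A' WHERE RowId = 1;

-- Step 4, Connection 2: update the row based on its now-stale cached view of the data
UPDATE dbo.Demo SET Column1 = 'B' WHERE RowId = 1;

-- Step 5, Connection 1: the row now reads 'B', and Connection 1's change is gone
SELECT RowId, Column1 FROM dbo.Demo WHERE RowId = 1;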

Transactions, at any isolation level, may not make any difference. The only thing a transaction would perhaps afford Connection 1 is that, if the SELECT were part of the transaction, the value returned would likely still be ‘A’; but Connection 2 is still going to change the row’s value unless you write some additional code.

Back in the old days, you might remember “edit mode” for a row, where no other user could even read the row. This was terrible for performance and really painful to code. It was the least concurrent method of operation.

The problem can even exist between something as simple as two UPDATE statements:

Connection1                          Connection2
Update Row 1, Column 1 to ‘A’        Update Row 1, Column 1 to ‘B’
Select Row 1                         Select Row 1

Two users start UPDATE statements to update the same row simultaneously. In SQL Server, there are two possibilities for what will occur. Using the on-disk engine, one connection is blocked, and when it comes unblocked, it clobbers the other connection’s change. Using the in-memory engine, an error is thrown for one connection if both updates are attempted simultaneously (after which that connection could retry its update and it would succeed, with the same net effect).
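Here is a sketch of that on-disk behavior, reusing the made-up dbo.Demo table from the earlier example. Holding Connection 1’s transaction open simply makes the blocking easy to observe before the overwrite happens.

-- Connection 1: update the row inside a transaction, but do not commit yet
BEGIN TRANSACTION;
UPDATE dbo.Demo SET Column1 = 'A' WHERE RowId = 1;

-- Connection 2: this UPDATE is blocked, waiting on Connection 1's exclusive lock
UPDATE dbo.Demo SET Column1 = 'B' WHERE RowId = 1;

-- Connection 1: commit; Connection 2 comes unblocked and overwrites the value
COMMIT TRANSACTION;

-- Either connection: Column1 is now 'B', and Connection 1's change was silently lost
SELECT RowId, Column1 FROM dbo.Demo WHERE RowId = 1;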

These asynchronous issues are discussed as the Lost Update problem, and it is a large part of what makes building an application where data cannot be lost so complex. I will cover it in more detail later, but until then, if I have scared you into considering that all of your applications may have situations where you could easily overwrite one user’s data with another’s, the solution is easy to understand and is called optimistic locking. The technique uses the same mechanism that the in-memory engine uses to ensure that two users don’t overwrite each other’s changes.

You check to make sure that the data you are modifying is the same as the data you fetched. This is usually done using some bit of data, perhaps a time value or a version number (the rowversion datatype works nicely). If the value in that column has changed since you read the row, you throw an error and say “something has changed.”
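As a minimal sketch of that technique, again with made-up object names, a rowversion column lets the UPDATE itself detect whether the row still looks the way it did when it was fetched:

CREATE TABLE dbo.DemoOptimistic
(
    RowId        int        NOT NULL CONSTRAINT PKDemoOptimistic PRIMARY KEY,
    Column1      char(1)    NOT NULL,
    VersionStamp rowversion NOT NULL
);
INSERT INTO dbo.DemoOptimistic (RowId, Column1) VALUES (1, 'Z');

-- Step 1: fetch the row, keeping the version value that was read along with the data
DECLARE @FetchedVersion binary(8);
SELECT @FetchedVersion = VersionStamp FROM dbo.DemoOptimistic WHERE RowId = 1;

-- Step 2: update the row only if no one else has changed it since it was fetched
UPDATE dbo.DemoOptimistic
SET    Column1 = 'A'
WHERE  RowId = 1
  AND  VersionStamp = @FetchedVersion;

-- Step 3: if another connection changed the row first, no rows were modified, so raise an error
IF @@ROWCOUNT = 0
    THROW 50000, 'The row has been changed since it was fetched.', 1;

In a real application, steps 1 and 2 would of course be separate round trips from the client, with the fetched version value carried along with the rest of the cached data.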