Defrosting frozen screens: troubleshooting SQL Server blocking problems

By Tony Davis

A call comes in to the DBA from Help Desk. There is an urgent problem with a reporting application. Unhappy users say their screens are all "frozen", and not because of an overactive A/C vent. Some have reported seeing a timeout issue related to SQL. Is it a blocking problem?

As a DBA, you may need to use various system views and other tools to piece together a picture of which processes are active but blocked, which sessions are causing the blocking, what SQL is running, which locks are involved, and so on. Ideally, you'll receive an alert, providing all the required diagnostic information, so you can resolve the issue before Help Desk even pick up the phone.

What causes SQL Server blocking?

A database is a shared environment where many user transactions will be competing for simultaneous access to the same shared resources, such as tables and indexes.

Generally, SQL Server mediates access to these shared resources by acquiring various types of lock. Blocking occurs when one or more sessions request a lock on a resource, such as a row, page, or table, but SQL Server cannot grant that lock because another session already holds a non-compatible lock on that resource.

Assuming use of the default isolation level (READ COMMITTED), let's say that a transaction modifies a few rows in a table. SQL Server acquires Exclusive (X) locks on those rows. This lock mode is incompatible with other lock modes. A second transaction that wishes to read those same rows (which requires acquisition of a Shared (S) mode lock) will be blocked until the transaction holding the X lock commits or rolls back.

Locking, and the blocking it causes, is usually fleeting and is a perfectly normal and desirable operational characteristic of a database. It ensures, for example, that a transaction doesn't read data that is in flux (i.e. it prevents "dirty reads"), and that two transactions can't change the same row of data, which could lead to data corruption.

However, when blocking occurs for extended time periods, it can impact the performance of many user processes.

Diagnosing current blocking

If we're made aware of a potential blocking problem while it is still occurring, then SQL Server provides a wealth of information to help us investigate and resolve the problem.

Start with the sys.dm_os_waiting_tasks dynamic management view (DMV). If a request is active, but waiting to acquire a resource in order to proceed, it will appear in this view. The view tells us the type of wait, the resource on which the request is waiting for access, and more. If blocking is the source of our problem, we'd expect to see waits to acquire locks on rows, pages, tables, and so on.

There are plenty of different ways to query this DMV, joining to other DMVs for details of the blocked and blocking sessions and the queries being executed. I used the query provided in SQL Server Performance Tuning using Wait Statistics: A Beginner's Guide (free PDF download).

Figure 1 shows some sample output (I've split the result across two screens for legibility).

Checking blocked sessions with the sys.dm_os_waiting_tasks DMV

Figure 1 – checking blocked sessions with the sys.dm_os_waiting_tasks DMV

The wait type for locks has the form LCK_M_<lock type>. So, for example, LCK_M_SCH_M is a wait to acquire an SCH_M (Schema Modification) lock. Figure 1 shows a blocking chain involving three sessions (IDs 67, 69, and 71). At the head of the chain is session 67, which is blocking session 69, which in turn is blocking session 71.

Session 67 is the only session that is not waiting to acquire a lock. It is running a transaction against the SalesOrderHeader table. Typically, this might be a transaction that, for whatever reason, is taking a long time to modify data, or perhaps has experienced an unexpected error that caused the batch to abort but left the transaction open (uncommitted).

The locks held by session 67 are blocking session 69 from acquiring the lock that it needs in order to modify the clustered index on SalesOrderHeader (in this case, to perform an online index rebuild). At the bottom of the chain, session 71 is blocked, waiting for session 69 to complete its work, so that it can acquire a shared read lock on the SalesOrderHeader table, in order to read the required rows.

Severe blocking versus deadlocks

DBAs and developers sometimes misdiagnose cases of severe locking, with multiple sessions blocked at various points of a long blocking chain, as deadlocks, primarily because it can seem like the entire database is at gridlock.

The big difference is that a deadlock causes a specific error, and one of the transactions will be rolled back. With blocking, no error is raised. The session at the head of a blocking chain will not be waiting for a lock. It may be waiting for a latch, or a memory allocation, or IO, but the blocking chain will clear as soon as the resource becomes available and the head blocker can complete its work.

There is a wealth of other information that we can mine from various sources to diagnose current blocking issues. For example, if we need further details on the types of locks held by each session in the chain, on which resources, as well as locks that sessions are waiting to acquire, we can use the sys.dm_tran_locks DMV, supplying the session IDs acquired previously. DMVs such as sys.dm_tran_active_transactions DMV will provide a list of all transactions that are active at the time the query is executed. sys.dm_exec_sessions will reveal the owner of any blocking sessions, and so on.

However, if all these DMVs only show data for current queries, how do we retrospectively diagnose a case of "frozen screens"? How can we monitor SQL Server in order to gauge whether severe blocking is causing performance problems on our servers?

Monitoring for SQL Server blocking

If we suspect that blocking may be an ongoing issue for a particular database, but it's not happening at the moment, there are a number of tools we can use to investigate. For example, the sys.dm_os_wait_stats DMV will tell us if we have large accumulated wait times related to locking waits.

Accumulated wait times for each wait, as shown by the sys.dm_os_wait_stats DMV

Figure 2 – accumulated wait times for each wait, as shown by the sys.dm_os_wait_stats DMV

We can also use the sys.dm_db_index_operational_stats DMV to look for indexes that are locking 'hotspots', as demonstrated for example by Jason Strate.

Indexes associated with high lock waits, as shown by the sys.dm_db_index_operational_stats DMV

Figure 3 – indexes associated with high lock waits, as shown by the sys.dm_db_index_operational_stats DMV

These aggregated wait statistics and index usage statistics can help identify a problem, but do not identify the cause on their own.

A tool such as Performance Monitor (PerfMon) provides some counters for monitoring for cases of excessive locking and blocking. For example, the SQLServer:General Statistics object will show the number of blocked processes detected. The SQLServer:Locks object can provide Avg Wait Time (ms), Lock Waits/sec, and more. However, again, these really only provide an indication of a possible problem.

We need to set up a monitoring solution that not only alerts us to severe blocking as a potential problem, but provides enough information to diagnose and resolve it.

Extended Events: blocked process report

If the DMV data and PerfMon counters show worrying evidence of excessive locking and blocking, it's worth setting up SQL Trace, or preferably an Extended Events event session, to log occurrences of blocking that exceed a specific time threshold, and capture the blocked process report for these events.

By default, the "blocked process threshold" is zero, meaning that SQL Server won't generate the blocked process reports. We need to configure by the "blocked process threshold" to a specific value (in seconds) using the sp_configure option. Jeremiah Peschka shows how to configure the threshold then define an extended event to capture the report.

Listing 1 shows the truncated output for one of the reports, showing the online index rebuild (session 69) blocked by an uncommitted transaction on SalesOrderHeader (session 67).

	<blocked-process-report monitorLoop="694950">
	 <blocked-process>
	  <process id="process48d496108" taskpriority="-10" logused="168" waitresource="OBJECT: 19:1266103551:0 " waittime="116153" ownerId="136795154" transactionname="ALTER INDEX" lasttranstarted="2016-05-10T18:01:13.367" XDES="0x4753fb030" lockMode="Sch-M"
	…output truncated…">
	   <executionStack>
	    <frame line="1" stmtend="194" sqlhandle="0x01001300cb55c82e40623db60400000000000000000000000000000000000000000000000000000000000000" />
	   </executionStack>
	   <inputbuf>
	ALTER INDEX PK_SalesOrderHeader_SalesOrderID ON Sales.SalesOrderHeader
	REBUILD WITH (ONLINE = ON);
	   </inputbuf>
	  </process>
	 </blocked-process>
	 <blocking-process>
	  <process status="sleeping" spid="67" sbid="0" ecid="0" priority="0" trancount="1"
	…output truncated…">
	   <executionStack />
	   <inputbuf>
	BEGIN TRANSACTION
	SELECT  FirstName ,
	        LastName ,
	        SUM(soh.TotalDue) AS TotalDue ,
	        MAX(OrderDate) AS LastOrder
	FROM    Sales.SalesOrderHeader AS soh WITH (REPEATABLEREAD)
	        INNER JOIN Sales.Customer AS c ON soh.CustomerID = c.CustomerID
	        INNER JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID
	WHERE   soh.OrderDate &gt;= '2011/01/01'
	GROUP BY c.CustomerID ,
	        FirstName ,
	        LastName  ;

	   </inputbuf/>
	  </process/>
	 </blocking-process/>
	</blocked-process-report/>
							

Listing 1 – output from a blocked process report, showing three blocked sessions

While the blocked process report provides the information we need to troubleshoot cases of blocking, it's hardly in an easy-to-digest format, and you'll often find an overwhelming number of reports generated, because a single blocking chain can generate multiple reports if it exceeds the value for the threshold multiple times.

Troubleshooting SQL Server blocking using Redgate Monitor

A tool such as Redgate Monitor provides monitoring and alerting for blocked processes. It raises the blocking process alert against any SQL process that has been blocking one or more other processes for longer than a specified duration. By default, it raises a Low severity alert when blocking exceeds 1 minute, but as with any alert in Redgate Monitor, we can adjust the threshold and set multiple levels of alerts for different thresholds. We're also likely to see Long-running query alerts relating to any blocked-process queries.

Long running query and Blocking process alerts in Redgate Monitor

Figure 4 – Long running query and Blocking process alerts in Redgate Monitor

If we click into the Blocking process alert, we can see that it relates to the head blocker in our chain, session 67. It has two blocked descendants. The session isn't actually executing any SQL. It is simply a transaction that has failed to commit, and is still holding locks on the target table. Details of a Blocking process alert in Redgate Monitor

Figure 5 – details of a Blocking process alert in Redgate Monitor

If we switch from the Details tab to the Processes tab, we see the full blocking chain, with session 67 at the top, which is blocking session 69, which is blocking session 71. We also see the text of the queries being executed by the two blocked sessions (69 and 71).

A full blocking chain, from the Processes tab of Redgate Monitor's Blocking process alert

Figure 6 – a full blocking chain, from the Processes tab of Redgate Monitor's Blocking process alert

In the Performance Data section of the page, on the Top queries tab, we see any queries running around the time of the alert, revealing the query that session 67 was running, which failed to commit, with its plan handle, and any related waits.

A query and its plan handle that's causing blocking, from the Top queries tab in Redgate Monitor

Figure 7 – a query and its plan handle that's causing blocking, from the Top queries tab in Redgate Monitor

Redgate Monitor also provides, as contextual information, all sorts of snapshots, aggregations, and summaries of resource usage on the server around the time of the alert. In this case in particular, we can see the spike in lock waits around the time of the alert (the green line at around 6.09 PM).

A spike in lock waits around the time of Redgate Monitor's Blocking process alert

Figure 8 – a spike in lock waits around the time of Redgate Monitor's Blocking process alert

Resolving blocking

Very often, resolving SQL Server blocking issues requires tuning inefficient queries and data modifications that run longer than necessary, and therefore cause blocking. Sometimes, queries run within transactions, when there is no real need for them to do so. Transactions need to be kept as short as possible, without compromising transactional integrity.

Sometimes indexes will help. In our example, session 67's query had to scan the entire SalesOrderHeader table, due to a lack of appropriate index on the search column (Orderdate). This means it likely read more data than was necessary, took longer to run, and so held locks longer than it needed to.

There are other possible issues to look out for. Figure 8 shows that session 67's query ran under the REPEATABLE READ isolation level. Aside from finding out why the transaction failed to commit, we could investigate whether the query truly requires this restrictive isolation level, where locks are held until the transaction commits. By rewriting to use the default READ COMMITTED isolation level, S locks would be released after statement completion, meaning that the index build would not be blocked, once the query had completed.

Alternatively, we might consider using one of the snapshot-based isolation levels, such as READ_COMMITTED_SNAPSHOT, which prevents any read phenomena. In this level, transactions don't take S locks when reading data, so they don't block other processes. Kalen Delaney's SQL Server Concurrency eBook (free download) provides a lot more details on the advantages, and possible drawbacks, of snapshot-based isolation levels.

A final option in this specific example might be to use the WAIT_AT_LOW_PRIORITY option (new in SQL Server 2014) for the online index rebuild. This would allow some waiting sessions to skip past it in the queue, if the index rebuild itself was blocked. In this example, using this option, session 71 would not be blocked since it could skip past and acquire the S lock it needed on the SalesOrderHeader table, as this lock mode is compatible with the S lock held by session 67.

Summary

Locking and blocking is a normal and required activity in SQL Server. However, if we execute long, complex transactions that read lots of data, and take a long time to execute, then locks will be held for extended periods, and blocking can become a problem. Likewise, if our databases lacks proper design characteristics, such as keys, constraints and indexes, or if our transactions use restrictive transaction isolation levels (such as REPEATABLE READ or SERIALIZABLE).

If blocking becomes severe, database performance grinds to a halt, and as a DBA we need to act very quickly, before an icy atmosphere develops among users.

Send free trial to your PC

Try Redgate Monitor free for 14 days

Monitoring the activity and performance of SQL Server and resolving problems is time-consuming

That's where Redgate Monitor steps in with clear insights into SQL Server performance in real time, bringing problems to your attention before anyone else notices.

With its embedded expertise from SQL Server experts and MVPs, it gives you the data and advice you need to find and fix issues before users are even aware.

So, whether you're looking for baselining, wait stats, expensive queries, or instant alerts, find out how Redgate Monitor gives you everything you need – and more – with a free 14 day trial.

Learn more
Redgate Monitor