Troubleshooting with Dynamic Management Views

If you work with SQL Server 2000, then you know how painful it is to triage a server that has "gone astray". Eric Brown thinks that the new Dynamic Management Views in SQL 2005 are a big step forward.

If you work with SQL Server 2000, then you know how painful it is to triage a server that has “gone astray”. In many cases, the error codes are undecipherable and the root cause of the problem is not easily found, let alone remedied. In fact, more times than not, a SQL 2000 DBA will simply do the unthinkable, and reboot the server.

During the development cycle for SQL Server 2005, the team at Microsoft received significant feedback – basically everyone said, “I need more visibility into what is happening inside the database.” Ta-da! The Dynamic Management Views (DMVs) were born.

Many DMVs are scoped to the database level, which in itself is a big step up from SQL 2000, where all system views were at the server level. Furthermore, the data stored in DMVs is much more user-friendly than was the case for the system views, which were pretty much unusable for many people.

The core DMVs

SQL Server 2005 provides more than 80 new DMVs. Special views are provided for checking on .NET assemblies, SQL Service Broker, security, and much, much more. All DMVs include current data, and some, in addition, use hidden tables which store historical data. For example, the transaction related DMV, sys.dm_tran_top_version_generators contains historical data.

DMVs are organized into five general categories according to the area on which they report:

  • sys.dm_exec_* – provide information about execution of .NET CLR Modules and connections. All contained here are a number of views available to help you drill into issues related to execution of queries.
  • sys.dm_os_* – report on memory, locks, and execution scheduling.
  • sys.dm_trans_* – provide insight into transactions and isolation.
  • sys.dm_io_*. – monitoring disk I/O
  • Sys.dm_db_* – provide database-level data.

If an application is sticking, I would first look at the DMVs relating to process execution and query state. If you are just getting to know the DMV’s, start with the star queries, provided here for convenience, and then refine them from there:

SELECT * FROM sys.dm_os_hosts
SELECT * FROM sys.dm_exec_connections 
SELECT * FROM sys.dm_exec_requests
SELECT * FROM sys.dm_exec_sessions
SELECT * FROM sys.dm_exec_query_stats

From there, you can drill down to DMvs that provide information from deeper inside the SQL Server Engine. I find most useful the DMV’s relating to the major causes of server failure, namely memory outages, process blocks, and execution stickage.

NOTE:
Execution stickage: the process that seems to be doing something, and eating all the available memory while doing it. Remember, I am an American living in the age of George Bush-isms ;).

Finding long running processes using sys.dm_os_wait_stats

The most obvious starting point for information on long running processes is the sys.dm_os_wait_stats DMV. This DMV shows how long a process has been waiting to execute. The view contains several columns-

Column name

Data type

Description

wait_type

nvarchar(60)

Name of the wait type

waiting_tasks_count

bigint

Number of waits on this wait type. This counter is incremented at the start of each wait.

wait_time_ms

bigint

Total wait time for this wait type in milliseconds. This time is inclusive of signal_wait_time

The available list of wait_types are:

  • Resource waits -occur when a worker requests access to a resource that is not available. Examples of resource waits are locks, latches, network and disk I/O waits. Lock and latch waits are waits on synchronization objects
  • Queue waits -occur when a worker is idle, waiting for work to be assigned..
  • External waits – occur when a SQL Server worker is waiting for an external event, such as an extended stored procedure call or a linked server query, to finish.

In working with this view, I would start with the basic star query and refine it as you go. I like executing it like this:

SELECT wait_type, (wait_time_ms * .001) wait_time_seconds 
  
FROM sys.dm_os_wait_stats 
   GROUP BY wait_type, wait_time_ms
   ORDER BY wait_time_ms DESC

You can learn about what processes are visible with this view, but visiting this page on MSDN there is too much information here for the column.

Finding sick worker processes using sys.dm_os_workers

A more explicit trouble shooting DMV is sys.dm_os_workers. This DMV explicitly reflects the worker processes that are “in trouble”. This DMV returns a lot of useful data.

For example, try running this query, which will have a flag for any process that is sick or experiencing a fatal exception. A dead, stuck process can be the cause of memory sticking and CPU red lining. Workers are the mechanisms which handle the request to execute some task. What you are looking for are workers that are sick, flagged fatal exception:

SELECT is_sick,
       is_in_cc_exception, 
       is_fatal_exception,
       state,
       return_code
   FROM sys.dm_os_workers

Then note, that if:

  • is_sick returns a value of 1 the process is in trouble.
  • is_in_cc_exception returns 1, then SQL is handling a non-SQL exception – for example, a .NET CLR exception.
  • is_fatal_exception returns 1, the exception… well it’s obvious.

The State and Return codes cover:

State:

  • INIT = Worker is currently being initialized.
  • RUNNING = Worker is currently running either non-pre-emptively or pre-emptively.
  • RUNNABLE = Worker is ready to run on the scheduler.
  • SUSPENDED = Worker is currently suspended, waiting for an event to send it a signal.

Return codes:

  • 0 =SUCCESS
  • 3 = DEADLOCK
  • 4 = PREMATURE_WAKEUP
  • 258 = TIMEOUT

In my next column, we’ll look at drilling down deeper into the DMV’s and see if we can’t come up with some cool and fun techniques for tracking down performance problems.