5 Monitoring Queries for SQL Server

Every DBA squirrels away favourite queries for monitoring SQL Server. Nowadays many of these are too complex to keep in your head. Dennes describes how he uses T-SQL queries to solve problems, whether that means fixing missing indexes, preventing unrestrained auto-growth, avoiding index fragmentation, checking whether jobs have failed, or detecting memory stress conditions.

Every DBA needs a toolset full of queries to monitor SQL Server. Some monitoring queries are complex and take time to build, so the DBA needs to keep them ready to hand, perhaps on a USB thumb-drive, for use at a moment’s notice.

In this article, I’m going to introduce the five important queries that I use to monitor SQL Server.

Missing index

In a perfect world, you, the DBA, would have enough time to analyze all the queries that an application sends to SQL Server and ensure that the correct indexes are in place to support them. In a perfect world, no developer could send new queries to the production environment before the DBA had time to analyze them.

We are not in a perfect world. Applications often go live without enough time for the DBA to analyze all the queries. SQL Server has a feature to work around these situations: during query compilation, the query optimizer can identify whether the query would benefit from an index that doesn’t exist yet, and it registers this information in some system objects. We can query DMVs to find the missing indexes for our queries and optimize our database environment.

The DMVs are:

  • sys.dm_db_missing_index_details: This DMV has details about the keys that we can use to create new indexes.
  • sys.dm_db_missing_index_group_stats: This DMV has details about how the missing index would be used were it to exist. It tells you how many seeks and scans would be done over the missing index.
  • sys.dm_db_missing_index_groups: This DMV has the keys that relate the results provided by the sys.dm_db_missing_index_details and sys.dm_db_missing_index_group_stats DMVs.

Before getting a result from these DMVs, we need to understand how the missing index keys are stored in sys.dm_db_missing_index_details. There are three fields containing key information:

  • Equality_Columns: This field contains information about all the keys that will be used by predicates with the equal sign (“=”). These keys should be included first in a composite index.
  • Inequality_Columns: This field contains information about all the keys that will be used by predicates without the equal sign; it could be any comparison, such as greater than or less than. These keys should be included last in composite indexes.
  • Included_Columns: This field contains columns that shouldn’t be part of the index keys, but should be used as included columns, with the INCLUDE clause of the CREATE INDEX statement.
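
As an illustration, if Equality_Columns listed CustomerID, Inequality_Columns listed OrderDate, and Included_Columns listed TotalDue and ShipDate, the suggested index would look something like this (all table, column and index names here are hypothetical):

-- Equality columns first, inequality columns last, the rest in the INCLUDE clause
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
    ON dbo.Orders (CustomerID, OrderDate)
    INCLUDE (TotalDue, ShipDate);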

These DMVs capture the whole range of queries, from those with high use and impact through to queries that were executed only once and will never be executed again. We can use the information in sys.dm_db_missing_index_group_stats to identify the importance of each missing index.

The sys.dm_db_missing_index_group_stats DMV has these particularly interesting fields:

  • user_seeks: This column tells us how many seek operations would be done over the missing index.
  • user_scans: This column tells us how many scan operations would be done over the missing index.
  • avg_total_user_cost: This column has the average cost of all the queries that would have used the missing index. The “cost” is a unitless value calculated by the query optimizer, but it is useful for comparing queries.
  • avg_user_impact: This column has the average percentage by which the query cost could drop if we created the missing index.

Now we have enough information to build a query that shows us the missing indexes in order of the impact their creation would have.
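
One way to write such a query is the sketch below. The improvement_measure expression is just one common way of weighting cost, impact and usage together; it is not an official metric, so adjust it to your needs:

-- Missing indexes ordered by an estimated improvement measure
SELECT
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.user_scans,
    migs.avg_total_user_cost,
    migs.avg_user_impact,
    migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
        * (migs.user_seeks + migs.user_scans) AS improvement_measure
FROM sys.dm_db_missing_index_details AS mid
INNER JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY improvement_measure DESC;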

It’s important to carefully analyze all the information that is presented before you create the index. Sometimes, for example, it’s better to change the query instead of creating a new index.

The most common problem with this information is the included_columns field. It is very common to find queries that request certain fields unnecessarily: these fields then have to be supported by indexes. Most of the time the better solution is to change the query to return fewer fields, rather than to create an index with too many included columns.

Auto-growth

Auto-growth is a good feature that ensures the DBA never finds out that the database has stopped because the data or log files ran out of space. Without a doubt, it’s good practice to keep auto-growth enabled, but you, as a DBA, should retain control over when and why auto-growth happens.

Besides the regular, planned database growth that allows the DBA to calculate days in advance when auto-growth will happen, there are certain events that can precipitate auto-growth by making large temporary or sudden demands for disk space. Batch processes, for example, can use a large amount of space in data and log files and thereby trigger auto-growth.

If this happens in the data files, the sudden auto-growth event will hurt the performance of the batch that triggered it while the growth is happening. Sometimes the person who created the batch doesn’t have full knowledge of the database server, and may not even know that the batch could run considerably faster if the data file size were sufficient, something that needs to be planned in advance.

On the other hand, if the auto-growth happens in the log file, the growth will create new VLFs inside the log. If the log file has too many VLFs, log backups and database recovery will get slower.
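
A quick way to count the VLFs of the current database is the sketch below; sys.dm_db_log_info is available from SQL Server 2016 SP2 onwards, while older versions can use DBCC LOGINFO:

-- Number of VLFs in the transaction log of the current database
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID());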

To avoid this, the DBA should always check whether unplanned auto-growth is happening, and if so then identify the reason and adjust the plans for auto-growth.

How do you check on the pattern of file-growth? SQL Server has a default trace that captures information about the file growth. If the default trace is running – and it is by default – you can recover this information.

First we need to retrieve the path where the trace is storing its files, which we can do with a small query over sys.traces. Next, we use the system function sys.fn_trace_gettable to read the file and return the information as a table.

The query to find out the auto-growth that happened in our server is the following:
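
The version below is a sketch: it reads only the current trace file listed in sys.traces and filters on event classes 92 (Data File Auto Grow) and 93 (Log File Auto Grow):

DECLARE @path nvarchar(260);

-- Path of the default trace file
SELECT @path = path
FROM sys.traces
WHERE is_default = 1;

-- Auto-growth events recorded in the default trace
SELECT
    te.name AS event_name,
    tr.DatabaseName,
    tr.FileName,
    tr.StartTime,
    tr.Duration / 1000 AS duration_ms,         -- Duration is stored in microseconds
    tr.IntegerData * 8 / 1024 AS growth_mb     -- IntegerData is the growth in 8 KB pages
FROM sys.fn_trace_gettable(@path, DEFAULT) AS tr
INNER JOIN sys.trace_events AS te
    ON te.trace_event_id = tr.EventClass
WHERE tr.EventClass IN (92, 93)
ORDER BY tr.StartTime DESC;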

Index Fragmentation

It is generally a good idea to choose ever-increasing fields as clustered keys. If, for any reason, a database doesn’t follow this pattern, index fragmentation will happen and you then need to check the fragmentation level of the indexes.

When a new record is inserted in the middle of existing records, a new page is allocated for the table and some records are moved from an existing page to the new page. This operation is called a page split, and one of its consequences is that the table pages will no longer be sequentially organized inside the data file. In other words, the data pages of the table will be fragmented.

The DBA works around the problem by configuring the fill factor on the indexes that are subject to fragmentation. The fill factor determines how much of each page will be filled, leaving space for new records. The DBA then has to manage the interval between executions of index maintenance so that maintenance happens before the pages run out of free space for new records.

It’s the DBA’s task to balance the interval between index maintenance executions and the fill factor value. It’s important to check the fragmentation of the indexes to be sure that you have achieved the correct balance between these values and that fragmentation isn’t happening.

We can check the fragmentation of the indexes using the DMF sys.dm_db_index_physical_stats. This DMF can return information about a specific object or about all objects in a database. The first four parameters are the database id, table id, index id and partition id; the table, index and partition parameters are optional and can be passed as NULL to cover all of them.

The fifth parameter specifies how detailed the analysis should be. The options are ‘LIMITED’, ‘SAMPLED’ and ‘DETAILED’. Indexes are hierarchical, and fragmentation can happen at any level of the hierarchy. The most common and important fragmentation is at the leaf level, and we can check leaf-level fragmentation using the ‘LIMITED’ option, which runs faster and with less impact on the server.

We need to join with sys.indexes to get the name of the index, and with sys.tables so we can use the is_ms_shipped field to exclude system objects from the result. We also filter the result on the index_type_desc field, looking only for clustered and nonclustered indexes, because the query would otherwise include heaps and XML indexes, and we add a predicate to exclude very small indexes, because these will always show some level of fragmentation.

The query to check the fragmentation in the leaf level of the indexes will be this:
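
The query below is a sketch; the 1,000-page threshold used to exclude small indexes is just an example value:

-- Leaf-level fragmentation of clustered and nonclustered indexes in the current database
SELECT
    SCHEMA_NAME(t.schema_id) AS schema_name,
    t.name AS table_name,
    i.name AS index_name,
    ps.index_type_desc,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
INNER JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
INNER JOIN sys.tables AS t
    ON t.object_id = ps.object_id
WHERE t.is_ms_shipped = 0
  AND ps.index_type_desc IN ('CLUSTERED INDEX', 'NONCLUSTERED INDEX')
  AND ps.page_count > 1000        -- exclude very small indexes
ORDER BY ps.avg_fragmentation_in_percent DESC;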

We need a different query to check the fragmentation in other levels of the index. We need to use ‘Detailed’ mode instead of ‘Limited’ and we can exclude the information about the leaf level, since we already got this information using the ‘Limited’ mode.

The query to check the ‘Detailed’ fragmentation information will be this:
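
A sketch of this second query; index_level 0 is the leaf level, so we keep only the higher levels:

-- Fragmentation of the non-leaf levels of clustered and nonclustered indexes
SELECT
    SCHEMA_NAME(t.schema_id) AS schema_name,
    t.name AS table_name,
    i.name AS index_name,
    ps.index_level,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ps
INNER JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
INNER JOIN sys.tables AS t
    ON t.object_id = ps.object_id
WHERE t.is_ms_shipped = 0
  AND ps.index_type_desc IN ('CLUSTERED INDEX', 'NONCLUSTERED INDEX')
  AND ps.index_level > 0          -- the leaf level was already covered by the 'LIMITED' query
ORDER BY ps.avg_fragmentation_in_percent DESC;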

The most important field in both queries is avg_fragmentation_in_percent. You can solve the fragmentation of the leaf level by rebuilding or reorganizing the index and setting a new fill factor, but the fill factor applies only to the leaf level: to extend the fill factor to the non-leaf levels you need to use the PAD_INDEX option.
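
For example, a rebuild that sets a new fill factor and extends it to the non-leaf levels could look like this (the index and table names are the hypothetical ones used earlier):

-- Rebuild the index with 10% free space per page, including the non-leaf levels
ALTER INDEX IX_Orders_CustomerID_OrderDate ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 90, PAD_INDEX = ON);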

Job Status

We often have several jobs defined on our SQL Servers and we need an easy way to check whether any job has failed. Of course, we can check them one by one, but a query that checks whether any of the jobs has failed can be very useful.

We will use three system tables from the msdb database to retrieve the information about the jobs.

  • sysjobs: This table has the information about the jobs
  • sysjobhistory: This table has the information about all the executions of the jobs, but for each execution there is a row for each step of the job
  • sysjobactivity: This table has the information about the last execution of each job and the last step executed for each job. Remember that the last step executed may not be the last existing step in many cases, for example, if the job failed.

A good result to retrieve is the last execution of each job, with the status of every step of that execution. This information is in the sysjobhistory table, but there is a problem: there isn’t a field that identifies each execution. We can order the records by job_id, run_date (desc) and run_time (desc), but we still need to get only the records for the last execution, and the number of records will differ for each job, according to the last step executed.

The first step is a simple query to retrieve the last executed step for each job from the sysjobactivity table. The query will be like this:
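
A sketch of this first query; it uses msdb.dbo.syssessions to restrict the result to the most recent Agent session:

-- Last executed step of each job in the most recent SQL Server Agent session
SELECT
    j.job_id,
    j.name AS job_name,
    ja.start_execution_date,
    ja.last_executed_step_id
FROM msdb.dbo.sysjobs AS j
INNER JOIN msdb.dbo.sysjobactivity AS ja
    ON ja.job_id = j.job_id
WHERE ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions);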

For each job we retrieved with the query above, we need to retrieve the records about the steps of the job from the sysjobhistory table. We can do this by using CROSS APPLY in the query and using the field last_executed_step_id as a parameter for the TOP expression. The query to retrieve the details of the last execution of each job will be like this:
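
A sketch of the combined query. It orders the history by instance_id, an ever-increasing identity, rather than run_date/run_time, and whether you need last_executed_step_id or last_executed_step_id + 1 rows depends on whether the outcome row (step_id 0) has already been written, so verify this expression in your environment:

-- Steps of the last execution of each job
SELECT
    j.name AS job_name,
    jh.step_id,
    jh.step_name,
    jh.run_date,
    jh.run_time,
    jh.run_status
FROM msdb.dbo.sysjobs AS j
INNER JOIN msdb.dbo.sysjobactivity AS ja
    ON ja.job_id = j.job_id
   AND ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
CROSS APPLY (
    SELECT TOP (ISNULL(ja.last_executed_step_id, 0))   -- TOP (0) removes jobs that never ran
           h.step_id, h.step_name, h.run_date, h.run_time, h.run_status
    FROM msdb.dbo.sysjobhistory AS h
    WHERE h.job_id = j.job_id
    ORDER BY h.instance_id DESC                         -- most recent rows first
) AS jh;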

Finally, we need to translate the fields run_date, run_time and run_status to a more readable format. Let’s use some functions to format these fields:

run_date:
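
run_date is an int in YYYYMMDD format; one possible conversion to a date is:

CONVERT(date, CONVERT(char(8), run_date), 112)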

run_time:
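
run_time is an int in HHMMSS format with the leading zeros dropped; we can pad it to six digits and insert the separators:

STUFF(STUFF(RIGHT('000000' + CAST(run_time AS varchar(6)), 6), 5, 0, ':'), 3, 0, ':')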

run_status:
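
run_status is an int; the documented values can be translated with a CASE expression:

CASE run_status
    WHEN 0 THEN 'Failed'
    WHEN 1 THEN 'Succeeded'
    WHEN 2 THEN 'Retry'
    WHEN 3 THEN 'Canceled'
    WHEN 4 THEN 'In Progress'
END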

To make this query easier to use, we can filter the result for only the failed executions and create a view from the query. The view will be like this:
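
A sketch of such a view; the view name and the exact formatting of the columns are illustrative:

CREATE VIEW dbo.FailedJobSteps
AS
SELECT
    j.name AS job_name,
    jh.step_id,
    jh.step_name,
    CONVERT(date, CONVERT(char(8), jh.run_date), 112) AS run_date,
    STUFF(STUFF(RIGHT('000000' + CAST(jh.run_time AS varchar(6)), 6), 5, 0, ':'), 3, 0, ':') AS run_time,
    CASE jh.run_status
        WHEN 0 THEN 'Failed'
        WHEN 1 THEN 'Succeeded'
        WHEN 2 THEN 'Retry'
        WHEN 3 THEN 'Canceled'
        WHEN 4 THEN 'In Progress'
    END AS run_status
FROM msdb.dbo.sysjobs AS j
INNER JOIN msdb.dbo.sysjobactivity AS ja
    ON ja.job_id = j.job_id
   AND ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
CROSS APPLY (
    SELECT TOP (ISNULL(ja.last_executed_step_id, 0))
           h.step_id, h.step_name, h.run_date, h.run_time, h.run_status
    FROM msdb.dbo.sysjobhistory AS h
    WHERE h.job_id = j.job_id
    ORDER BY h.instance_id DESC
) AS jh
WHERE jh.run_status = 0;    -- 0 = Failed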

Memory Use

Does our SQL Server have enough memory, or is our server under memory pressure?

We need to check the buffer cache to identify whether we have a good cache-hit rate and whether there is memory pressure.

We can do this by checking performance counters. Three good performance counters to check are:

  • Page Life Expectancy: The expected lifetime, in seconds, of the pages in the buffer cache. The recommended value is over 300 seconds.
  • Free List Stalls/sec: The number of requests that have to wait for a free page. If this value is high, your server is under memory pressure.
  • Page Reads/sec: If this counter has a high value, it confirms the memory pressure already highlighted by the other two counters.

The query to retrieve these values will be like this:
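
A sketch using sys.dm_os_performance_counters. Note that the “/sec” counters are cumulative since the instance started, so you need to sample them twice and take the difference to get an actual per-second rate:

-- Buffer Manager counters related to memory pressure
SELECT
    RTRIM(counter_name) AS counter_name,
    cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name IN ('Page life expectancy',
                       'Free list stalls/sec',
                       'Page reads/sec');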

If you identify in the result that your server is under pressure, you have two ways to solve the problem:

  • Increase the server memory: You can add more memory or adjust the configuration ‘max server memory’ if possible.
  • Adjust the queries: You can optimize the queries of your applications, reducing the number of pages to read in each query and reducing the memory pressure.

Next Steps

These queries are just the beginning. You will need to plan how to execute these queries and to use other queries that I didn’t include here.

You will need to decide whether you will use native features to collect the information, such as the Data Collector and SQL Agent jobs, or a monitoring tool such as Redgate SQL Monitor. I wrote an article about the Data Collector here.

Here are some related resources on monitoring that I’ve written: