26 August 2020

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Phil Factor

26 August 2020

When SQL Server Performance Goes Bad: Rogue Indexes

Phil Factor explains how SQL Monitor helps focus performance tuning efforts on the tables and queries where 'rogue indexes' might be a significant problem, and then how to identify both 'missing' indexes that might be beneficial to the overall workload, and those that are unused or duplicated, and so are doing more harm than good.

Guest post

The right set of indexes

When SQL Server performance goes bad, all the main metrics for resource usage, such as CPU, IO, memory, wait times, and so on, will tell you that the server is under considerable stress. There are many possible causes for this, of course, one of which is missing, or ‘inappropriate’, indexes.

Having the right set of indexes in your SQL Server databases will increase the efficiency with which SQL Server can gather the data, and therefore the performance of your most important queries and processes. For each table in the database this generally means choosing the right clustered index, and then a supporting set of narrow, well-designed nonclustered indexes, selective for the filters, or ‘predicates’, used by the most important queries.

If SQL Server often has to read every row in a 2 million row table, just to find the 20 rows it needs, then we may need to consider modifying an existing index on that table, or adding a new one, in order to improve query performance. Conversely, if we try to speed up every ‘read’ just by adding another nonclustered index, all we’ll do is cause a performance problem for our ‘writes’, because all these indexes must be maintained, as data is modified.

In terms of how often SQL Server uses nonclustered indexes, there are three varieties: those that it uses often, those it uses seldomly or not at all, and those it wishes it had but are conspicuous by their absence. This article is about how to use SQL Monitor, plus various dynamic management views, and system catalog views, to ensure you have mainly only the first variety of index.

I won’t, in this article, discuss cases where a valid index exists, but is ‘ignored’ by the query optimizer, due to inaccurate cardinality estimates, caused by reasons such as stale or inaccurate statistics for the data stored within the index, and neither can I delve into the detail of index design, selectivity and cardinality.

Missing indexes

When an appropriate index doesn’t exist for a query, then the optimizer will have to devise other means to get the data, often resulting in an execution plan that scans another index, such as the clustered index. This means that it reads all or most of rows stored in that index. Similarly, the lack of an appropriate index might mean the optimizer has to use to a Sort operator to get the data into the right order to perform a join, or other operation.

There is nothing inherently bad about scans but, as noted above, if a frequently executed query results in SQL Server repeatedly reading far more rows than it needs, then such queries will cause significant CPU, IO and memory use, much of which is unnecessary. They can also block and impede the performance of any concurrent queries. Similarly, sort operations become expensive the greater the umber of rows that needs to be processed. In these cases, you’ll want to investigate whether you can supply a selective index, enabling the optimizer to choose a more effective plan.

Finding expensive queries with missing indexes using SQL Monitor

When performance goes bad, on the main activity graph for the SQL Server you’ll likely see that the CPU (the blue line) is under sustained load, with dramatic spikes in disk IO times (green), periods of high wait times (orange), and probably increasing or sustained high memory use (purple).

There are several possible causes for, and missing indexes is just one of them. If the query optimizer, when trying to come up with an efficient way of executing a SQL Statement, has to reject a good strategy because the index it requires isn’t there, it puts a warning into the execution plan.

The nice thing about using a tool like SQL Monitor is that it provides among its sea of data, a list of the expensive queries running over the period of performance slowdown:

By plotting and analyzing various of the SQL Server metrics that SQL Monitor collects, as well analyzing as the general wait types seen over the period, you can find out which resources are currently limiting performance. You can then filter the expensive queries list according to the limiting resource and focus on the queries that are causing problems for the system.

If you click on one of those, you will find both the source code of the query and its execution plan:

Click on View Query Plan, and you’ll see a diagram for the cached execution plan for the query, in this case with a Warning that the cost of this query could be substantially reduced, if the optimizer had a “missing index”:

Finally, click on that and you’ll see the suggested index:

Note that there may be more than one missing index, and this will only list the first one. Check the MissingIndexes property of the SELECT operator to see if there any others.

However, if you become misled by the index warnings that you get from query plans, and take every piece of the machine’s machine-like advice, you could easily end up with a rich undergrowth of infrequently-used and overlapping indexes that will bring the performance of your write operations (inserts, updates and so on) to its knees.

Each missing index warning is made purely from the perspective of that single query. They take no account of the wider context of the database application and the patterns of usage; the index may be non-selective and so completely useless of any other queries on that table. They also only consider the estimated cost saving for that query and make no consideration of the impact on the performance of write operations to this table, due to a high volume of data modifications when all its indexes have to be maintained.

In short, you need more information.

Drilling into missing index details

Index strategies aren’t always obvious. Before you go ahead and create any new index, you need to be certain of its net benefit for the workload as a whole, and also sure that you could not satisfy the “missing index” request with a simple modification to an existing index. If you fail to do this, you’ll inevitably, over time, end up with many indexes that are almost the same, but with subtly different index key column definitions and column orders, or included column definitions and orders.

We can see the current warnings of missing indexes, and the indexes that are suggested, with this query:

/*WARNING: This is only an estimate, and the Query Processor is making this recommendation

based solely upon analysis of specific queries. It has not considered the resulting index

size, or its workload-wide Impact, including its impact on INSERT, UPDATE, DELETE performance.

These factors should be taken into account before creating these indexes.*/

SELECT DB_NAME(mid.database_id) + '.' + OBJECT_SCHEMA_NAME(mid.object_id, mid.database_id) + '.'

+ OBJECT_NAME(mid.object_id, mid.database_id) AS [TheTable],

migs.user_seeks AS [Index Uses (est)],/* Number of seeks caused by user queries that the

recommended index in the group could have been used for.*/

migs.avg_user_impact [benefit % (est,Percent)],/*Average percentage benefit that user

queries could experience if this missing index group was implemented. The value means

that the query cost would on average drop by this percentage if this missing index group

was implemented.*/

CONVERT(NUMERIC(5,2),migs.avg_total_user_cost) [Avg Query Cost (est)],/*Average cost of the user queries that

could be reduced by the index in the group.*/

migs.unique_compiles,/*Number of compilations and recompilations that would benefit from

this missing index group. Compilations and recompilations of many different queries can

contribute to this column value.*/

CONVERT(CHAR(20),migs.last_user_seek,113) AS [last user seek],

'CREATE INDEX [IX_' + OBJECT_NAME(mid.object_id, mid.database_id) + '_'

+ REPLACE(REPLACE(REPLACE(ISNULL(mid.equality_columns, ''), ', ', '_'), '[', ''), ']', '')

+ CASE WHEN mid.equality_columns IS NOT NULL

AND mid.inequality_columns IS NOT NULL THEN '_'

ELSE '' END

+ REPLACE(REPLACE(REPLACE(ISNULL(mid.inequality_columns, ''), ', ', '_'), '[', ''), ']', '') + ']'

+ ' ON ' + mid.statement + ' (' + ISNULL(mid.equality_columns, '')

+ CASE

WHEN mid.equality_columns IS NOT NULL

AND mid.inequality_columns IS NOT NULL THEN

','

ELSE

END + ISNULL(mid.inequality_columns, '') + ')'

+ ISNULL(' INCLUDE (' + mid.included_columns

+ ') WITH (MAXDOP =?, FILLFACTOR=?, ONLINE=?, SORT_IN_TEMPDB=?);',

) AS [TSQL to create index]

FROM sys.dm_db_missing_index_group_stats AS migs

INNER JOIN sys.dm_db_missing_index_groups AS mig

ON migs.group_handle = mig.index_group_handle

INNER JOIN sys.dm_db_missing_index_details AS mid

ON mig.index_handle = mid.index_handle

ORDER BY [Index Uses (est)] DESC

Again though, care is needed, and you’ll see that indexes are often suggested that would hardly ever be used, and which just wouldn’t make much of a difference.

Every index should be tested, to confirm that it is useful, before deploying it to production. As for checking execution plans for problems like implicit conversions, it’s best to verify any new indexing strategy during development. You can run the database through a range of integration tests, check the output of the above query along with index usage statistics (see “Finding Unused Indexes” later in the article), adapt the indexes as required and test the overall performance impact by rerunning the workload. The goal is to have as few indexes as possible that will satisfy as many as possible of the most important queries as possible.

Missing indexes on Foreign Keys

While on the topic of missing indexes, we should give a special mention to foreign keys. A common cause of poor performance of queries is that foreign keys, unlike primary keys, aren’t automatically indexed in SQL Server. Although it is sometimes better to omit this index for a rarely-used foreign key, it is a good general practice to create an index on all foreign key columns that are often used in joins, to enhance the performance of the joins between the primary and foreign keys, and also reduce the cost of enforcing the relationship between the related tables. It is easy to experiment with adding an index and checking the performance of queries that join on that foreign key. This query will flush out all the foreign keys that aren’t supported by an index.

SELECT OBJECT_Schema_NAME(keys.parent_object_id)+'.'+OBJECT_NAME(keys.parent_object_id) AS TheTable,

keys.name AS Foreign_Key,

STRING_AGG(COL_NAME(keys.parent_object_id,TheColumns.constraint_column_id),',') AS Column_List

FROM sys.foreign_keys AS keys

INNER JOIN sys.foreign_key_columns AS TheColumns

ON keys.object_id = constraint_object_id

LEFT OUTER JOIN sys.index_columns AS ic

ON ic.object_id = TheColumns.parent_object_id

AND ic.column_id = TheColumns.parent_column_id

AND TheColumns.constraint_column_id = ic.key_ordinal

WHERE ic.object_id IS NULL

GROUP BY keys.parent_object_id,keys.name

Finding Unused Indexes

It is a waste of resources to keep an index around if it doesn’t get used or gets used very infrequently. Of course, if an index gets used only once every quarter for a financial reconciliation process, you need it to be available, but it’s probably best to create it at the start of the run of the process and delete it at the end.

Unused indexes accumulate in the database over time, often precisely because they were created for “one specific query” and are useless in general, or because changes in data distribution have cause the optimizer to change its strategy. These indexes still incur the overhead on any write operation: entries are written to it as data is inserted into the clustered index, table or view and index entries must be modified or deleted as that underlying data is altered or deleted. This results in index fragmentation and you often end up using even more resources to perform index rebuilds or reorganizations ion indexes that are rarely used!

To decide if an index is likely to be useful, it is worth predicting the level of use in a live system, understanding the relative importance of quick reads to quick writes in the typical usage of the table, and looking at factors such the width and likely length of the table, which will give you a good idea of how much space a new index will take up.

Here is a query to list the twenty most useless indexes in your database since the service was last started. Again, you would use this in development after running a complete set of integration tests on the database. It will also help production maintenance work because it will tell you which indexes are lightly or never used in the production environment, which could be very different from the case in development.

-- Indexes infrequently read since the SQL Server (MSSQLSERVER) service was started

-- Returns counts of different types of index operations and the time each type of operation was last performed.

-- You may want to consider dropping indexes that incur maintenance overhead, but are not used for queries, or are

-- only infrequently used for queries.

SELECT TOP 20

object_schema_name(i.Object_ID) + '.'+ object_name(i.Object_ID) as Thetable,

i.name AS 'Index',

COALESCE(s.user_lookups,0) + COALESCE(s.user_scans,0) + COALESCE(s.user_seeks,0) AS activity,

coalesce(s.user_updates,0) AS updates /*Number of updates by user queries. This includes Insert, Delete, and

Updates representing number of operations done not the actual rows affected. For example, if you delete 1000

rows in one statement, this count increments by 1*/

FROM sys.indexes i

left outer join sys.dm_db_index_usage_stats s

ON s.object_id = i.object_id

AND s.index_id = i.index_id

AND s.database_id = DB_ID()

WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1

AND i.index_id > 0 --Exclude heaps.

AND i.is_primary_key = 0 --and Exclude primary keys.

AND i.is_unique = 0 --and Exclude unique constraints.

--AND coalesce(s.user_updates,0) > 0 --Index is being updated.

ORDER BY activity asc

You can get a more detailed view of the usage of indexes from sys.dm_db_index_operational_stats. It is also valuable for tracking the length of time that users must wait to read or write to a table, index, or partition, and for identifying the tables or indexes that are encountering significant I/O activity or hot spots.

SELECT object_Schema_Name(IndexOpStats.object_id)+'.'

+OBJECT_NAME(IndexOpStats.object_id) AS TableName,

Indexes.name AS IndexName,

LOWER(Indexes.type_desc) AS IndexType,

SUM(PartitionStats.used_page_count) * 8 AS IndexSizeInKB,

SUM(IndexOpStats.leaf_insert_count) AS InsertCount,--

SUM(IndexOpStats.leaf_update_count) AS UpdateCount,--

SUM(IndexOpStats.leaf_delete_count) AS DeleteCount--

FROM sys.dm_db_index_operational_stats(NULL, NULL, NULL, NULL) IndexOpStats

---parameters are (Database_id, Object_id, index_id, Partition_id)

INNER JOIN sys.indexes AS Indexes

ON Indexes.object_id = IndexOpStats.object_id

AND Indexes.index_id = IndexOpStats.index_id

INNER JOIN sys.dm_db_partition_stats PartitionStats

ON PartitionStats.object_id = Indexes.object_id

WHERE OBJECTPROPERTY(Indexes.[object_id], 'IsUserTable') = 1

GROUP BY IndexOpStats.object_id,

Indexes.name,

Indexes.type_desc

Duplicate indexes

If you have more than one statistic for the same column or set of columns, it means that they are used in more than one index. You’ll find these just by comparing the list of columns that each statistic is associated with. This technique eliminates the XML columns conveniently.

SELECT object_schema_name(Object_ID)+'.'+object_name(Object_ID) as tableName,

count(*) as Similar, ColumnList as TheColumn,

max(name)+', '+min(name) as duplicates

FROM

(SELECT Object_ID, name,

stuff (--get a list of columns

(SELECT ', ' + col_name(sc.Object_Id, sc.Column_Id)

FROM sys.stats_columns sc

WHERE sc.Object_ID=s.Object_ID

AND sc.stats_ID=s.stats_ID

ORDER BY stats_column_ID ASC

FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),1,2,'') AS ColumnList

FROM sys.stats s)f

WHERE OBJECTPROPERTYEX(f.object_id, N'IsUserTable') <> 0

GROUP BY Object_ID,ColumnList

HAVING count(*) >1

An alternative approach is just to see if two indexes share a column list, as shown in the following query. Of course, they could have them in a different order which would be more legitimate. To include those with a different order change the:

1	WITHIN GROUP (ORDER BY key_ordinal)

…into…

1	WITHIN GROUP (ORDER BY index_columns.column_id)

In this case, I’m using STRING_AGG() because it is more straightforward to see what is going on, but you can easily substitute the XML trick as in the previous query.

SELECT OBJECT_SCHEMA_NAME(object_id) + '.' + OBJECT_NAME(object_id) AS tableName,

-- the schema and tablename.

COUNT(*) AS Similar, -- how many identical indexes are there

ColumnList AS TheColumns, --list the identical columns

STRING_AGG(name, ',') AS duplicates -- the list of duplicate indexes

FROM

(

SELECT s.object_id,

s.name,

( -- now get the column list

SELECT --create the list of columns in the index

STRING_AGG(COL_NAME(sc.object_id, sc.column_id), ',')

WITHIN GROUP(ORDER BY key_ordinal) AS TheColumns

-- order them by the column name to assume that they

-- are the same regardless of order.

FROM sys.index_columns sc

WHERE sc.object_id = s.object_id

AND sc.index_id = s.index_id

) AS ColumnList

FROM sys.indexes s

LEFT OUTER JOIN sys.xml_indexes xi

ON xi.index_id = s.index_id

AND xi.object_id = s.object_id

WHERE xi.index_id IS NULL --eliminate XML indexes from list

) f

WHERE OBJECTPROPERTYEX(f.object_id, N'IsUserTable') <> 0

--we only want the user tables.

GROUP BY f.object_id, ColumnList

HAVING COUNT(*) > 1; -- if a table has more than one index with the same column list

Conclusions

Warnings of missing indexes mean different things to different teams. I’ve given in this article a perspective that leans slightly towards the administrator rather than developer. I’ve assumed that your role is to eliminate performance problems due to missing indexes, by checking index warnings and making a judgement as to how to react, based on your wide perspective of the application. For the developer, a persistent eruption of index problems usually signals a design issue: suggesting a faulty normalization, usually under-normalization, and poorly designed primary key and/or clustered index.

It is frustrating for a DBA who can see the same point but is even less able to do anything about the primary cause, but is instead forced to make the best of a poorly-optimised database design by refining the index strategy.

Tools in this post

Redgate Monitor

Real-time multi-platform performance monitoring, with alerts and diagnostics

Find out more

Redgate Monitor

Real-time multi-platform performance monitoring, with alerts and diagnostics

Find out more

Redgate Test Data Manager

Redgate Flyway

Redgate Monitor

Overview

Protect

Automate

Monitor

Product articles Performance monitoring and optimization
When SQL Server Performance Goes Bad:…

Guest post

When SQL Server Performance Goes Bad: Rogue Indexes

Guest post

The right set of indexes

Missing indexes

Finding expensive queries with missing indexes using SQL Monitor

Drilling into missing index details

Missing indexes on Foreign Keys

Finding Unused Indexes

Duplicate indexes

Conclusions

Tools in this post

Redgate Monitor

Redgate Monitor

Product Articles

University

Events

Forums

Community

Simple Talk

Redgate Test Data Manager

Redgate Flyway

Redgate Monitor

Overview

Protect

Automate

Monitor

Product articles Performance monitoring and optimization When SQL Server Performance Goes Bad:…

Guest post

Guest post

The right set of indexes

Missing indexes

Finding expensive queries with missing indexes using SQL Monitor

Drilling into missing index details

Missing indexes on Foreign Keys

Finding Unused Indexes

Duplicate indexes

Conclusions

Tools in this post

Redgate Monitor

Redgate Monitor

You may also like

Redgate Hub

Product Articles

University

Events

Forums

Community

Simple Talk

Product articles Performance monitoring and optimization
When SQL Server Performance Goes Bad:…