Index Usage: finding indexes not being used

I wrote before about how to check index usage on the database, in that article I wrote about exploring query plans.

However, another reason to check index usage would be to drop indexes not being used. Every index create some level of impact on changes (INSERT/UPDATE/DELETE) and if an index is not being useful, it will be only creating problems.

Of course it’s a good idea drop indexes not being used, but c’mmon, let’s tell the truth. The reason you are looking for drop indexes is because your environment is on premises, you are running short on disk space and you just discovered you have some indexes which are many times the size of the table and is trying to discover if they are really useful.

It doesn’t matter if you are trying to remove indexes for good reasons or just to work around a bad environment, let’s see in more details how to find which indexes doesn’t have enough usage to justify their existance.

First, some basic definitions, without going into many details:

Index Seek: That’s the best and desirable use of the index. It means the index tree is being used to go directly to the records we need.

Index Scan: Not so good as an index seek, so it could be better. However, sometimes even an index scan is good, a non clustered index scan means the pages of that index are smaller an better for a scan than the pages of the clustered index. There are many variations that makes an index scan good, but most times you don’t need to reach this level of analysis, you may reach your objective only analysing index seeks.

Update: When the fields are updated (update/insert/delete) all indexes which contain those fields need to be updated as well. Indexes are a balance: We increase performance on reading and suffer a bit more when writting. The problem is when the writting happens more than the reading.

Let’s check the index usage:

SELECT Db_name(database_id)   db,
       Object_name(object_id) [table],
       si.NAME,
       index_id, user_seeks, user_scans, user_lookups,
       user_updates, system_seeks, system_scans,
       system_lookups, last_user_seek,
       last_user_scan, last_user_lookup, last_user_update
FROM   sys.dm_db_index_usage_stats istat,
       sys.sysindexes si
WHERE  istat.object_id = si.id
       AND istat.index_id = si.indid
ORDER  BY user_seeks 

All indexes with a low usage may be hurting the system performance. However, what’s a low usage? How can we define the index usage as low?

We can use a simple metric as a start point: If the amount of seeks is lower than the amount of updates, we need to have a closer look on the index. We can check this adding the following line to the WHERE clause:

AND
user_updates > (user_seeks + user_lookups + system_seeks + system_lookups)

Discoverying the use of the indexes

Once we have the list of less useful indexes on our database, we can decide what to do with them. However, it’s not so simple as it appears: These may be the less used indexes on the database, but when the most used ones have numbers like 100.000 seeks, it will not be difficult to find the less used ones with numbers such as 10.000 . The numbers are relative and we need to be sure there isn’t any critical task of our system in the middle of these 10.000 seeks.

We need to discover which are the queries using these indexes. In order to do that, we can query the SQL Server Plan Cache, analysing the XML of each query plan and find where the indexes are being used.

The DMV sys.DM_exec_query_stats contains statistic information about the queries in the plan cache and also allow us to retrieve their query plan using the field plan_handle and the DMF sys.dm_exec_query_plan . We can make the result even better returning the SQL text as well with the field sql_handle and the DMF sys.dm_exec_sql_text.

Besides this, we also need to filter the results to get only the query plans which are using the index. We will need to use XQuery for that and the method EXIST is enough for our purpose.

The query will be like this:

SELECT qp.query_plan,
       qt.text
FROM   sys.dm_exec_query_stats
       CROSS apply sys.Dm_exec_sql_text(sql_handle) qt
       CROSS apply sys.Dm_exec_query_plan(plan_handle) qp
WHERE
qp.query_plan.exist(‘declare namespace  qplan=”http://schemas.microsoft.com/sqlserver/2004/07/showplan”; //qplan:Object[@Index=”[IX_IDTransaction_AccountTy_TransTy_RefNum_Usr]”]’) = 1 

 

As you may notice, this query is made for one single index, so you can analyse one by one. However, you may choose to get the information for all of them at once. We can achieve this result using the CROSS APPLY statement one more time. The CROSS APPLY executes one query for each row of another query, that’s exactly what we need. We need to execute the query above for each index listed as one of the less used indexes.

Using a CTE (Common Table Expression) we can make this query easier to read, breaking it in at least two parts. The result is the query below:

WITH idxnames AS
(
       SELECT si.NAME
       FROM   sys.dm_db_index_usage_stats istat,
              sys.sysindexes si
       WHERE  istat.object_id=si.id
       AND    istat.index_id=si.indid
       AND    user_updates > (user_seeks + user_lookups + system_seeks + system_lookups) )
SELECT      *
FROM        idxnames
CROSS apply
            (
                        SELECT      qp.query_plan,
                                    qt.text
                        FROM        sys.dm_exec_query_stats
                        CROSS apply sys.Dm_exec_sql_text(sql_handle) qt
                        CROSS apply sys.Dm_exec_query_plan(plan_handle) qp
                        WHERE       qp.query_plan.exist(‘declare namespace qplan=”http://schemas.microsoft.com/sqlserver/2004/07/showplan”; //qplan:Object[@Index=”[‘ + idxnames.NAME + ‘]”]’)=1 )  idxplan

 

This is the natural construction of the query, but the result is disappointing: We get an error message, “The argument 1 of the XML data type method “exist” must be a string literal” . It’s exactly what the message says, the EXIST method doesn’t accept string concatenation.

Fixing the XQUERY expression

 

The solution for this problem is the use of XQUERY extension functions, which allow the access to external items, such as collumns and variables, from the XQUERY expression. Using these extensions we can make the string concatenation in a custom field and make a reference to it. The fixed query will be like this one:

WITH idxnames
     AS (SELECT si.NAME,
         ‘declare namespace  qplan=”http://schemas.microsoft.com/sqlserver/2004/07/showplan”; //qplan:Object[@Index=”‘ 
                + si.NAME + ‘]”]’ AS filter
         FROM   sys.dm_db_index_usage_stats istat,
                sys.sysindexes si
         WHERE  istat.object_id = si.id
                AND istat.index_id = si.indid
                AND user_updates > ( user_seeks + user_lookups + system_seeks
                                     + system_lookups ))
SELECT *
FROM   idxnames
       CROSS apply (SELECT qp.query_plan,
                           qt.text
                    FROM   sys.dm_exec_query_stats
                           CROSS apply sys.Dm_exec_sql_text(sql_handle) qt
                           CROSS apply sys.Dm_exec_query_plan(plan_handle) qp
                    WHERE
       qp.query_plan.exist(‘sql:column(“idxNames.filter”)’) = 1)
               idxPlan 

 

Filtering Index Updates from the query

You may be suprised with the amount of results for each index, innitialy you may have a lot. However, analysing the query plan XML you will discover the index is most being used for updates, meaning the index is being updated when a record is updated. 

The plans where the index is being used for update are not needed for us on this query, they are exactly what create problems for performance. We need only to analyse the use of the index in SELECT’s, not UPDATE’s. In order to filter the query to return only the index usages on SELECT’s we will need to understand the query plan XML structructure. You can see below a piece of the XML of the query plan:

We can use the LogicalOp attribute on RelOp element to exclude the elements we are not interested. We need to make some adjustments to our XQUERY filter.

The LogicalOp’s we need to retrieve are: Clustered Index Scan, Clustered Index Seek, Index Scan and Index Seek . We need to change the filter to retrieve only these queries.

WITH idxnames
     AS (SELECT si.NAME,
           ‘declare namespace  qplan=”http://schemas.microsoft.com/sqlserver/2004/07/showplan”;                         //qplan:RelOp[@LogicalOp=”Index Scan”
                           or @LogicalOp=”Clustered Index Scan”
                           or @LogicalOp=”Index Seek”
                           or @LogicalOp=”Clustered Index Seek”]/Object[@Index=”‘ 

                + si.NAME + ‘]”]’ AS filter
         FROM   sys.dm_db_index_usage_stats istat,
                sys.sysindexes si
         WHERE  istat.object_id = si.id
                AND istat.index_id = si.indid
                AND user_updates > ( user_seeks + user_lookups + system_seeks
                                     + system_lookups ))
SELECT *
FROM   idxnames
       CROSS apply (SELECT qp.query_plan,
                           qt.text
                    FROM   sys.dm_exec_query_stats
                           CROSS apply sys.Dm_exec_sql_text(sql_handle) qt
                           CROSS apply sys.Dm_exec_query_plan(plan_handle) qp
                    WHERE
       qp.query_plan.exist(‘sql:column(“idxNames.filter”)’) = 1)
               idxPlan 

This query above will bring only the query plans where the indexes are being used for seek or scan, allowing us to check in which queries this happens and if the index has any importance at all. Using this information we can decide if we will drop the index or not