Identifying and Solving Index Scan Problems

When you're developing database applications, it pays to check for index scans in the SQL Server query plan cache. Once you've identified the queries, what next? Dennes Torres gives some preliminary guidelines on how to find out why these index scans are being chosen for these queries and how to make the queries run faster and more efficiently.

Updated 7 April 2015

Index Scans generally aren’t the sort of thing you’d want to see in a query plan. An index scan means that the entire leaf level of the index was read to find the information for the query: when the index is a clustered index, this is the same as scanning the entire table. With only a few exceptions, this isn’t good; we need to try to turn scans into seeks, which means retrieving the data by just using the index tree.

In a previous article, ‘Exploring Query Plans in SQL’, I explained how to find query plans in the SQL Server plan cache that are doing index scans. So, now that we’ve found the plans, what should we do?

I will not attempt to explain the often complex task of optimizing query performance since there are many books about this subject. Instead, I’ll just give you a few preliminary guidelines on reducing the number of scans that you found in the plan cache after following the ‘Exploring Query Plans’ article.

Test Environment

We will use the AdventureWorks2012 database, but with a few new tables created by Adam Machanic in his blog post ‘Thinking Big (Adventure)’. We will simulate a few queries and retrieve them from the query plan cache, exactly as would happen in your development or integration database environment. Please be aware that it is wrong to do this sort of performance work on a production database because of the resources needed to check through the plan cache.

To retrieve the queries from the plan cache, we will use some of the techniques that I explained in the ‘Exploring Query Plans’ article, but I’ll also use the table-valued function, from ‘Checking on Plan Cache Warnings for a SQL Server Database’, that returns all index scans in the plan cache that happened in a specific database. With this combination, it will be easier to select only the scans from queries in the AdventureWorks2012 database. Here is the function:
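What follows is a minimal sketch of such a function, assuming the DMV techniques from those two articles; the function name and output columns are illustrative, not the original listing:

CREATE FUNCTION dbo.IndexScansForDatabase (@DatabaseName sysname)
RETURNS TABLE
AS
RETURN
    WITH XMLNAMESPACES
        (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT qt.text              AS query_text,
           qp.query_plan,
           qs.total_worker_time AS Total_Worker_Time
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
    -- keep only plans for the target database...
    WHERE qp.dbid = DB_ID(@DatabaseName)
      -- ...that contain at least one scan operator
      AND qp.query_plan.exist('//RelOp[@PhysicalOp="Index Scan" or @PhysicalOp="Clustered Index Scan"]') = 1;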

All the concepts used in this function were explained in the previous articles.

After creating the function, we will run some queries and then use the function to retrieve the plans from the cache. Below are the queries:
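A sketch of four such queries (the literal values are illustrative):

SELECT ProductID, TransactionDate, Quantity, ActualCost
FROM dbo.bigTransactionHistory
WHERE YEAR(TransactionDate) = 2010;      -- function on the field side
GO

SELECT *
FROM dbo.bigProduct
WHERE Name LIKE '%Mountain%';            -- '%' at the start of the string
GO

SELECT FirstName, MiddleName, LastName, PersonType
FROM Person.Person
WHERE FirstName LIKE '%ke%'
  AND LastName LIKE '%ab%';              -- '%' at the start of the strings
GO

SELECT *
FROM dbo.bigProduct
WHERE ListPrice BETWEEN 40 AND 50;       -- no index over ListPrice
GO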

Notice that we have a ‘GO’ instruction after each query; otherwise we would have only one plan in the cache for the entire batch, instead of one plan for each query.

Let’s take a look at the cache using the function we just created:
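A call along these lines brings the heaviest queries to the top (the function name is from the sketch above):

SELECT query_text, query_plan, Total_Worker_Time
FROM dbo.IndexScansForDatabase('AdventureWorks2012')
ORDER BY Total_Worker_Time DESC;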

[Screenshot: the function’s results, with our queries in the first four rows when ordered by Total_Worker_Time]

By ordering by the Total_Worker_Time field, our queries are returned as the first four rows in the results. We have the text of the queries, and if we click the link in the query_plan field we can see the query plan of each one.

Now we need to analyze each of the queries returned. Each query was selected because it has one or more index scans in it. SQL Server generally uses one index for each table referenced in the query, so a single query can use multiple indexes; this means it can have index seeks and index scans in the same plan.

You will need to work through the guidelines below for each index scan in the query, trying to turn the scan into an index seek.

Let’s use the questions below to analyze the predicate of the index scans in each query.

Is the predicate a ‘sarg’?

‘Sarg’ is an abbreviation of ‘Search Argument’, a term used by Kalen Delaney for an argument that can use an index seek for the search. Not all predicate expressions are ‘sarg’s, and there are a few rules that sargs need to follow. Grant Fritchey explained two of the rules of a ‘sarg’ in his great article ‘The seven sins against TSQL Performance’.

First we need to find the predicate for the scan in the query; after that we need to identify whether or not it’s a ‘sarg’.

The predicate can appear in the WHERE clause, as a filter over a field of the table that’s suffering the scan, or in a join, in which case the JOIN expression is the predicate.

Making queries Sargable

Let’s see the rules of a sarg:

First Rule: You can’t use a function on the left side (the field side) of the predicate.

This is very well demonstrated in Grant’s article.

The following query disobeys this rule:
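A sketch of it, with an illustrative year value:

SELECT ProductID, TransactionDate, Quantity, ActualCost
FROM dbo.bigTransactionHistory
WHERE YEAR(TransactionDate) = 2010;      -- the function prevents an index seek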

We can make this query better by making some changes to the predicate. Just writing the same predicate in a new way solves this problem:
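For example, expressing the same date filter as a range over the bare field:

SELECT ProductID, TransactionDate, Quantity, ActualCost
FROM dbo.bigTransactionHistory
WHERE TransactionDate >= '20100101'
  AND TransactionDate < '20110101';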

As you can see in the new query plan below, this change alone hasn’t completely solved the problem, but we are on the right path.

[Screenshot: query plan showing an Index Scan over the non-clustered covering index]

An interesting point is the fact that the scan is an index scan, not a clustered index scan. This happens because the non-clustered index is a covering index for this query.

A covering index is a non-clustered index that has all the fields the query needs in its leaf-level pages. When using a regular non-clustered index that doesn’t cover the query, the database engine needs to do an additional operation called a ‘key lookup’. This means that, after the data is found in the leaf level of the index, SQL Server needs to retrieve from the clustered index any additional fields that are not present in the non-clustered index.

We call an index ‘covering’ for a query when it has, at the leaf level, all the fields needed by the query. Fields become part of the leaf level when they are part of the index key, when they are part of the clustered key, or when they are added with the INCLUDE keyword when the index is created. A covering index doesn’t require key lookups, and that’s better for performance.

What’s important to notice in the above query plan is the Index Scan element without a Key Lookup element, which denotes a covering index.

In fact the index IX_ProductID_TransactionDate is a composite index with ProductID and TransactionDate as the keys and Quantity and ActualCost included. Those are all the fields in the query, so the index becomes a covering index for this query; there is no need for key lookups.
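Based on that description, the index definition would look something like this (a sketch; the original script from ‘Thinking Big (Adventure)’ may differ in the details):

CREATE NONCLUSTERED INDEX IX_ProductID_TransactionDate
    ON dbo.bigTransactionHistory (ProductID, TransactionDate)
    INCLUDE (Quantity, ActualCost);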

We will see some examples later in this article, because the use of a covering index is one of the solutions for the scan problem. Joe Webb also talked about covering indexes in his great article ‘Using Covering Indexes to Improve Query Performance’.

Second Rule: The operands need to be of the same data type

Grant also covered this subject in his article, but what could actually cause this kind of mistake?

The answer: ORMs. I’ve experienced cases where ORMs such as NHibernate were creating WHERE clauses that compared VARCHAR fields with NVARCHAR values, resulting in exactly this problem. ORMs are therefore a potential source of performance problems. I’m not saying you shouldn’t use ORMs; I’m saying be careful: check the queries that are being generated and configure the ORM correctly to avoid this mistake.
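For example, against a hypothetical table where CustomerCode is VARCHAR(10):

-- The N'...' literal is NVARCHAR, so the VARCHAR column is implicitly
-- converted and the predicate is no longer a sarg:
SELECT * FROM dbo.Customer WHERE CustomerCode = N'AW00012345';

-- Matching the types keeps the predicate sargable:
SELECT * FROM dbo.Customer WHERE CustomerCode = 'AW00012345';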

Third Rule: You can’t search for information in the middle of the field

This expression isn’t a ‘sarg’:
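For instance, with an illustrative Name field:

Name LIKE '%Mountain%'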

whereas this other expression is a sarg:
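For instance, with the same field:

Name LIKE 'Mountain%'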

So, LIKE expressions that require a search through the string to locate the substring will never be a ‘sarg’. We need to add more predicates to the query, or change the LIKE expression, so that the index can be used. Two of the queries have this problem. Let’s see the first one:
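A sketch of it, with an illustrative search string:

SELECT *
FROM dbo.bigProduct
WHERE Name LIKE '%Mountain%';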

The query plan for this query is:

[Screenshot: query plan showing a Clustered Index Scan]

In this plan we have a simple clustered index scan. In the predicate you can identify the ‘%’ sign at the beginning of the string, which is what causes the problem.

The second query is the following:
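A sketch of it, again with illustrative search strings:

SELECT FirstName, MiddleName, LastName, PersonType
FROM Person.Person
WHERE FirstName LIKE '%ke%'
  AND LastName LIKE '%ab%';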

This is the query plan for this query:

[Screenshot: query plan showing a non-clustered Index Scan with a Key Lookup]

This time we have a non-clustered index scan, but we also have a Key Lookup element, so this is not a covering index for this query.

The scan is over the non-clustered index instead of the clustered index because the predicate uses fields that are part of the non-clustered key, FirstName and LastName, while the full key is LastName, FirstName and MiddleName. The non-clustered index has shorter rows than the clustered index, so it’s cheaper to scan the non-clustered index.

We can demonstrate this difference by checking the number of pages each index has at its leaf level. First let’s use the following query to find the index_id of each index:
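Something like this will do:

SELECT name, index_id
FROM sys.indexes
WHERE object_id = OBJECT_ID('Person.Person');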

[Screenshot: index names and index_id values for Person.Person]

As we can see, the clustered index has index_id 1 (one) and the IX_Person_LastName_FirstName_MiddleName non-clustered index, which is the index our query is using, has index_id 2 (two).

Now that we know the IDs, we will use the DMF sys.dm_db_index_physical_stats to check the number of pages each index, the clustered and the non-clustered, has at its leaf level. The queries will be these:
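A sketch of the two calls; the ‘DETAILED’ mode is needed to get per-level page counts, and index_level 0 is the leaf level:

SELECT index_id, index_level, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID('AdventureWorks2012'), OBJECT_ID('Person.Person'),
         1, NULL, 'DETAILED')            -- index_id 1: the clustered index
WHERE index_level = 0;

SELECT index_id, index_level, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID('AdventureWorks2012'), OBJECT_ID('Person.Person'),
         2, NULL, 'DETAILED')            -- index_id 2: the non-clustered index
WHERE index_level = 0;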

[Screenshot: leaf-level page counts for the clustered and non-clustered indexes]

As you can see in the above image, while the non-clustered index has only 106 pages at the leaf level, the clustered index has 3810. So, as our predicate only uses fields that are in the non-clustered index, it’s much better to do a non-clustered index scan than a clustered index scan.

Now that we have analyzed the queries, how do we solve the problem?

Both queries are a common kind of query used in applications to search for information, using ‘%’ on both sides of the search string. There is no way to make these queries better without changing the functionality: we would need to remove the first ‘%’ sign from each predicate, so we have to decide whether the performance boost is worth the functionality change. Most times it is: the additional ‘%’ sign at the start of the string is rarely worth the performance difference, or else you should consider the use of a full-text index instead.

By removing the first ‘%’ sign, the first query will become this:
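Along these lines:

SELECT *
FROM dbo.bigProduct
WHERE Name LIKE 'Mountain%';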

[Screenshot: query plan for the first query after removing the leading ‘%’]

The problem with this query, on the bigproduct table, hasn’t been solved yet. Let’s continue the analysis to find out why.

Let’s see the second query:
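With the leading ‘%’ signs removed, a sketch of it:

SELECT FirstName, MiddleName, LastName, PersonType
FROM Person.Person
WHERE FirstName LIKE 'ke%'
  AND LastName LIKE 'ab%';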

[Screenshot: query plan showing an Index Seek with a Key Lookup]

The scan problem is solved for this query; now we have an index seek.

The key lookup remains in the plan to retrieve the additional fields. We can check the Key Lookup element to identify which field is being retrieved:

[Screenshot: Key Lookup properties showing PersonType in the output list]

As you can see in the above image, the Key Lookup is present in the query plan to retrieve the PersonType field from the clustered index.

If the key lookup were too expensive for the query, SQL Server would have kept the clustered index scan. Therefore it’s not too bad in this case, but if we could avoid it, that would be better. The solution to avoid the lookup is to include the additional fields needed by the query in the index, so the key lookup becomes unnecessary. The question you need to ask is: ‘Is this query so important, and is it consuming so many resources, that you need to change the index to make it a little (not much) better?’ Most times not, but there are exceptions you will need to identify.
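If we did decide it was worth it, recreating the index with the extra field included would eliminate the lookup (a sketch):

CREATE NONCLUSTERED INDEX IX_Person_LastName_FirstName_MiddleName
    ON Person.Person (LastName, FirstName, MiddleName)
    INCLUDE (PersonType)
    WITH (DROP_EXISTING = ON);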

Checking the ‘sarg’ in your queries can solve many problems, but sometimes this isn’t enough. Let’s continue to the next step of our analysis.

Is the index a composite index including the sarg field?

There’s an important rule about a composite index: You need to pay attention to the first column.

This is because the histogram within the statistics is only created on the first column of the index, and the histogram is one of the most important drivers for the choices that the optimizer makes. So, for example, if we have an index with key columns ID and Name, a search on the Name column is unlikely to use the index at all, or may just scan it, because there is no information about the Name column in the histogram. Also, because the data is stored with ID first, when something has to be looked up within the index it needs that ID column; otherwise it has to go to the page and do a scan. To properly use the index you have to search either on ID, or on ID and Name.

One of the queries in the plan cache has this problem:
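It’s the bigTransactionHistory query we rewrote earlier:

SELECT ProductID, TransactionDate, Quantity, ActualCost
FROM dbo.bigTransactionHistory
WHERE TransactionDate >= '20100101'
  AND TransactionDate < '20110101';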

We solved the first problem, the non-sargable predicate, by changing it, but the scan continues because the index IX_ProductID_TransactionDate is a composite index over ProductID and TransactionDate, and our predicate only filters on TransactionDate, the second key column.

These are your options in this case:

  • You can add the ProductID field to the query
  • You can create another (non-clustered) index over the TransactionDate field. In this case you should question whether the existing composite index is the correct option for your table.

    One new index will increase database maintenance work and decrease performance for INSERTs and UPDATEs, so you need not only to decide whether this query is important enough to warrant this solution, but also to check whether the query isn’t really a development or analysis mistake, since just one more predicate added to the query would solve the problem. If you decide it is important enough, a new non-clustered index on TransactionDate will solve the problem and the query will do index seeks.

  • You can change the order of the fields in the index

    It’s to be expected that the indexes already on the table were planned based on your business. Should you really change the order of the fields?

    The index IX_ProductID_TransactionDate, used in this query, has ProductID and TransactionDate as keys, but also includes Quantity and ActualCost. These four fields are all the fields in this table, so this is not only a composite index but also a covering index for most of the queries on this table. Someone planned this.

[Screenshot: definition of the IX_ProductID_TransactionDate index]

  • Was the original analysis that created this index wrong, or have the business needs changed?

    Generally, the first column in a composite index should probably be the most selective column, the column that is most unique. This again goes back to statistics and the histogram. Since the histogram is the main driver for estimating how many rows will be returned by a given value, it’s usually helpful for the first column to be the one that gives the most accurate row estimates for the compound index. Most of the time, that’s the most selective column. For example, an address can be indexed on Country, State and City, but that order is probably not ideal: most of the time you would want City, then State, then Country, as that order reflects the degree of selectivity. Just remember, this is a guideline, not even a rule. There are good reasons to go with a less selective column first, especially if that’s the column most frequently used to filter the data.

    In our demonstration, ProductID and TransactionDate have similar selectivity (a product is sold on several dates, and several products are sold on a single date). In this case, the question that needs to be answered before taking this action is whether, after changing the index, other queries that search by ProductID and use this index will stop doing index seeks. That’s like trying to cover all bases with a short blanket; in this case you should create a new index, or accept the ‘short blanket’ and decide which queries are more important.

To demonstrate a solution, let’s choose the first option and add a new predicate with the other field in the query. This new query would do an index seek:
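A sketch, with an illustrative ProductID value:

SELECT ProductID, TransactionDate, Quantity, ActualCost
FROM dbo.bigTransactionHistory
WHERE ProductID = 1001
  AND TransactionDate >= '20100101'
  AND TransactionDate < '20110101';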

[Screenshot: query plan showing an Index Seek after adding the ProductID predicate]

It’s difficult to predict which option will be the best choice for you. If you’ve arrived at a solution by this point, the analysis of your query finishes here. However, if that’s not your case and you don’t have a composite index over the ‘sarg’, let’s continue the analysis.

Do you have a non-clustered index over the sarg?

If you don’t have a non-clustered index over the sarg, that’s your problem. You are querying over a field that doesn’t have an index.

If this is a heavy and frequently-used query, you should probably create a non-clustered index for it, but you will need to consider the number of indexes the table already has.

The point is that each index in the table makes INSERT and UPDATE operations slower. Let’s see a small demonstration of this.

In a new query window in SSMS, run the following script:
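A sketch of such a script, assuming a simple insert into Production.Product (the values are illustrative; the point is only that the transaction stays open):

BEGIN TRANSACTION;

INSERT INTO Production.Product
    (Name, ProductNumber, SafetyStockLevel, ReorderPoint,
     StandardCost, ListPrice, DaysToManufacture, SellStartDate)
VALUES
    ('Test Product', 'TEST-0001', 100, 75,
     10.00, 25.00, 1, GETDATE());

-- no COMMIT or ROLLBACK yet: the locks are still held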

Notice that we didn’t commit the transaction. In a different window, run the following query:
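A sketch of such a query, built over sys.dm_tran_locks:

SELECT l.resource_type,
       l.request_mode,
       OBJECT_NAME(p.object_id) AS table_name,
       i.name                   AS index_name
FROM sys.dm_tran_locks AS l
JOIN sys.partitions AS p
    ON p.hobt_id = l.resource_associated_entity_id
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id = p.index_id
WHERE l.resource_database_id = DB_ID('AdventureWorks2012')
  AND l.resource_type IN ('KEY', 'PAGE');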

This query shows every lock in the AdventureWorks2012 database, including the name of the index that owns the locked object. You can see in the index_name column in the image below that a single insert locked keys and pages in all the indexes of the table, because all of them need to receive the new row. Almost the same happens with updates, except that only the indexes that contain the changed fields are affected.

[Screenshot: locks held by the open transaction, with entries for every index of the table]

After this demonstration, it’s clear that each index makes INSERT and UPDATE operations slower. To finish the demonstration, run a ‘ROLLBACK TRANSACTION’ in the first window.

To decide if you should or shouldn’t create a new index you will need to analyze the following points:

  • If you don’t have too many indexes on the table, there’s no doubt: you need to create the non-clustered index.
  • If you have many indexes on the table, are all of them being used? You can check whether an index is being used by querying sys.dm_db_index_usage_stats; see the sketch after this list.

    The sketch after this list uses the ‘bigproduct’ table as an example. From the result you can check how many seeks, scans or lookups an index has had, and identify whether or not the index is being used.

    If you have indexes on the table that are used less and are less important than the one you are missing, then you should consider dropping those indexes and creating the new one.

  • If you can’t drop any index, you need to consider this: Is the query you are analyzing so important that you should create a new index even knowing that you will reduce update and insert performance?
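Here is the sketch referred to above, using sys.dm_db_index_usage_stats (the table name is just an example):

SELECT i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
    ON s.object_id = i.object_id
   AND s.index_id = i.index_id
   AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.bigProduct');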

In our examples we have two queries with this problem:
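They are the two bigProduct queries we saw earlier:

SELECT *
FROM dbo.bigProduct
WHERE ListPrice BETWEEN 40 AND 50;

SELECT *
FROM dbo.bigProduct
WHERE Name LIKE 'Mountain%';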

Each query plan does a clustered index scan because there isn’t any index over the ListPrice field or the Name field. After analyzing all the details above, we will create two new indexes, one for ListPrice, another for Name:
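A sketch of the two definitions (the name indprice is referenced later in this article; indname is an assumption):

CREATE NONCLUSTERED INDEX indprice ON dbo.bigProduct (ListPrice);

CREATE NONCLUSTERED INDEX indname ON dbo.bigProduct (Name);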

After creating the indexes, the query plan for the first query will be like this:

[Screenshot: query plan for the first query, an Index Seek with a Key Lookup]

The problem is solved for the first query, because it’s now doing an index seek. Notice the Key Lookup in the plan: we can make this query even better by removing the key lookup. Let’s return to this in a few minutes.

The query plan for the second query will be this:

[Screenshot: query plan for the second query, still a Clustered Index Scan]

The second query is still using the clustered index scan. Why is SQL Server refusing to use our index? Let’s continue the analysis to find out.

How many rows does your query return?

We saw the use of the Key Lookup when a non-clustered index is used. The problem is that the Key Lookup is only a good option when there are just a few rows. If your query returns too many rows, a high percentage of the total rows in the table, SQL Server will not use the index.

This happens because the cost of the key lookup for each row can be bigger than the cost of an entire scan.

The previous query is an example of this. The large number of key lookups that the query would need to do in order to return the rows makes the non-clustered index seek a more expensive operation than the clustered index scan.

We don’t need to do a lot of calculations to identify this. The index exists and is not being used, so SQL Server already did the calculations and identified that the use of the index for this query would not have a good result.

If we’re still in doubt, we can enable the ‘statistics io’ setting and force the use of the index to compare the two approaches.

First, let’s check the ‘statistics io’ output for the query:
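A sketch:

SET STATISTICS IO ON;

SELECT *
FROM dbo.bigProduct
WHERE ListPrice BETWEEN 40 AND 50;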

The result shows the number of logical reads needed by the clustered index scan.

Now let’s force the use of our index:
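Using an index hint to force the non-clustered index (a sketch):

SELECT *
FROM dbo.bigProduct WITH (INDEX(indprice))
WHERE ListPrice BETWEEN 40 AND 50;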

The result: far more logical reads, because of the key lookup needed for each row.

It becomes clear that SQL Server is using the correct index for this situation.

The first possible solution for this problem is to reduce the number of fields returned by the query, and now we can clearly see the big problem that ‘*’ in the SELECT can bring us: retrieving more fields than we need will probably prevent us from taking advantage of a covering index, or even from creating one.

We should never use ‘*’ in our queries. The first step to correct the problem is to get rid of the ‘*’ and choose which fields we really need to return.

Most times it will not be difficult to identify the fields really needed by the query. Our new query will be like this:
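A sketch, assuming these are the fields the application really needs:

SELECT Name, ProductNumber, ListPrice
FROM dbo.bigProduct
WHERE ListPrice BETWEEN 40 AND 50;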

Of course this doesn’t solve the problem yet, because we don’t have a covering index for this query. We will need to drop the index (indprice) that we just created and create it again:
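A sketch, including the fields assumed above:

DROP INDEX indprice ON dbo.bigProduct;

CREATE NONCLUSTERED INDEX indprice
    ON dbo.bigProduct (ListPrice)
    INCLUDE (Name, ProductNumber);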

Now we can execute our query again and the problem will be solved: the query plan will show an index seek.

Let’s return to the other query, on the bigproduct table:
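Here it is again:

SELECT *
FROM dbo.bigProduct
WHERE Name LIKE 'Mountain%';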

The problem was solved, but we still have the key lookup, remember?

[Screenshot: query plan showing the remaining Key Lookup]

Once again, our problem is the big villain ‘*’. The key lookup is retrieving the fields that aren’t part of the index. The query will become better if we select only the fields we really need:
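Again assuming an illustrative field list:

SELECT Name, ProductNumber, ListPrice
FROM dbo.bigProduct
WHERE Name LIKE 'Mountain%';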

The problem isn’t solved yet, but now we just need to check the key lookup in the query plan to identify the output list and solve the problem:

[Screenshot: Key Lookup output list in the query plan]

Let’s change the index to eliminate the key lookup:
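A sketch, recreating the index with the extra fields included:

CREATE NONCLUSTERED INDEX indname
    ON dbo.bigProduct (Name)
    INCLUDE (ProductNumber, ListPrice)
    WITH (DROP_EXISTING = ON);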

This is the resulting query plan:

[Screenshot: query plan with an Index Seek and no Key Lookup]

Until now we have reduced the number of fields in the queries and changed the indexes, all with the objective of turning an existing index into a covering index for the query, or creating a covering index when needed. The other solution for this problem is to change the query to return fewer rows. The question you would need to ask is “Do I really need all these rows?”; a new analysis could lead to new predicates in the query.

The Result

After the entire analysis, these are our new four queries:
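A sketch of the four, as rewritten above:

SELECT ProductID, TransactionDate, Quantity, ActualCost
FROM dbo.bigTransactionHistory
WHERE ProductID = 1001
  AND TransactionDate >= '20100101'
  AND TransactionDate < '20110101';
GO

SELECT Name, ProductNumber, ListPrice
FROM dbo.bigProduct
WHERE Name LIKE 'Mountain%';
GO

SELECT FirstName, MiddleName, LastName, PersonType
FROM Person.Person
WHERE FirstName LIKE 'ke%'
  AND LastName LIKE 'ab%';
GO

SELECT Name, ProductNumber, ListPrice
FROM dbo.bigProduct
WHERE ListPrice BETWEEN 40 AND 50;
GO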

We changed all four queries and created two new indexes to solve the scan problems.

Conclusion

In this article I presented a few steps you can take to analyze and solve the index scan problems that you find in the SQL Server plan cache. Of course this isn’t a complete guide, but it is intended as an introduction: you will solve a lot of problems by following this analysis during development.
