Fine Tuning your Database Design in SQL 2005

Sanchan Saxena gets to grips with the new index-tuning tools and features in SQL 2005.

Part 1: An overview of index-tuning tools

As part of my current job at Microsoft, I work with SQL customers worldwide, ranging from Fortune 500 financial institutions to online retailers to small partner outfits. I am engaged primarily to conduct performance tuning of their SQL Server systems.

When a system has a performance problem, and time is short (as it inevitably is), it’s tempting just to “kill it with iron” – add more memory, more CPU horsepower, bigger and better disk subsystems, and so on. The problem is that if your design is fundamentally flawed then there is a strong chance that your application will not scale: doubling your horsepower will not double the performance. You might end up throwing a lot of hardware (and a lot of money) at the problem in order to reach your performance target.

Instead, your first port of call should be the design of the application, and probably the “lowest-hanging fruit” here – the option that produces maximum gains with least impact on existing systems and processes – is to examine your indexing strategy.

When you run a query, SQL Server query optimizer’s job is to find the best possible execution plan (based on internal costing algorithms) to provide the best response time. If the indexes required for implementing this optimal execution plan are not present, then the query optimizer is forced to choose a sub-optimal execution plan, for which indexes do exist. This could cause SQL Server to do more I/O and result in slower query performance. The need to have the right indexes in place cannot be overemphasized.

However, the task of identifying the right indexes is not necessarily straightforward. It requires a sound knowledge of the sort of queries that will be run against the data, the distribution of that data, and the volume of data, as well as an understanding of what type of index will suit your needs best. While this understanding is still essential, SQL Server 2005 does offer a helping hand in the form of some new tools – principally the Database Engine Tuning Advisor – that can help you determine, tune and monitor your indexes.

In this article, I will demonstrate how to use these tools to get answers to the following questions:

  • Which indexes do I need for my queries?
  • How do I monitor index usage and effectiveness?
  • How do I identify redundant indexes that could negatively impact the performance of my DML queries (inserts, updates and deletes)?
  • As the workload changes, how do I identify any missing indexes that could enhance performance for my new queries?

Finding the right indexes for your workload

Determining exactly the right indexes for your system can be quite a taxing process. For example, you have to consider:

  • Which columns should be indexed (based on your knowledge of how the data is queried)
  • Whether to choose a single-column index or a multi-column index
  • Whether you need a clustered index or a non-clustered index
  • Whether or not (in SQL 2005) you could benefit from an index with included columns to avoid bookmark lookups
  • How to utilize indexed views (which the optimizer might access instead of the underlying tables to retrieve a subset of your data)

All of these options increase the complexity of identifying the right indexes for your system. You might be tempted to add indexes everywhere “just in case”, but remember that every new index introduced to improve the performance of the SELECT workload means more work when running DML statements (inserts, updates or deletes), as SQL Server now has to maintain that index to reflect the modified data in the table. There is a balance to be struck here.

Even once you’ve determined the perfect set of indexes, your job is not finished. Your workload will change over time (new queries will be added, older ones removed from the application) and this might warrant revisiting existing indexes, analyzing their usage and making adjustments (modifying or dropping existing indexes, or creating new ones). Maintenance of indexes is critical to ensuring optimal performance in the long run.

SQL 2005 provides the following tools and features to help you find the right indexes for your queries, and then monitor and tune them:

  • Database Engine Tuning Advisor (DTA)
  • STATISTICS XML output
  • Dynamic Management Views (DMVs)

I will describe each one of them in the subsequent sections.

Using Database Engine Tuning Advisor (DTA)

The DTA is a new tool in SQL 2005 which replaces the Index Tuning Wizard in earlier versions of SQL Server. DTA can analyze both OLTP and OLAP workloads. You can either tune a single query or the entire workload to which your server is subjected. Based on the options that you select, you can use the DTA to make recommendations for several Physical Design Structures (PDS), which include:

  • Clustered indexes
  • Non-clustered indexes
  • Indexes with included columns (to avoid bookmark lookups)
  • Indexed views
  • Partitions

Let’s walk through the steps involved in using DTA effectively.

Generating the DTA workload

The first step is to collect a workload for DTA to analyze. You can do this in one of two ways:

  • Using Management Studio – if you need to optimize the performance of a single query, you can use Management Studio to provide an input to DTA. Simply type the query in Management Studio, highlight it and then right click on it to choose Analyze in Database Engine Tuning Advisor. This will launch DTA with your query as the input.
  • Using Profiler – if you want to determine the optimum index set for the entire workload of your server, you should collect a profiler trace using the built-in TUNING template.

To unleash the true effectiveness of DTA, you should always use a representative profiler trace. Make sure that you subject your server to all the queries that will typically be run against the data, while you are collecting the trace. This could lead to a huge trace file, but that is normal. If you simply collect a profiler trace over a 5-10 minute period, you can be pretty sure it won’t be truly representative of all the queries run against your database.

The TUNING template captures only minimal events, so there should not be any significant performance impact on your server.

As part of the code download for this article, I’ve provided a profiler trace (Workload2Analyze.trc) that was captured with the TUNING template while running numerous queries (Queries4Workload.sql) against the AdventureWorks database. I would strongly recommend that you use this trace, or one from your own server, to get a hands-on perspective of DTA.

Consuming the workload using DTA

Having collected a truly representative profiler trace, you can use it as an input to DTA, which will then generate recommendations. You can perform one of the following two types of analysis.

SCENARIO I: Keep my existing PDS and tell me what else I am missing

This type of analysis is common and is useful if you have previously established the set of indexes that you deem to be most useful for your given workload, and are seeking further recommendations. To conduct this analysis:

  1. Launch a new session in DTA.
  2. Choose the profiler trace as the input to this session.
  3. In the Select databases and tables to tune section, select your target database (AdventureWorks, if you are using my script).
  4. In the Database for workload analysis dropdown, you can either use the same database or Tempdb (or any other database). I generally use Tempdb for this purpose.
  5. At the Tuning Options tab, select the following options:
    1. Physical Design Structures to use in database -> Indexes and Indexed views
    2. Physical Design Structures to keep in database -> Keep all existing PDS
    3. Uncheck the Limit tuning time checkbox.
  6. Hit START ANALYSIS and DTA will start consuming your workload.

Once DTA is finished consuming the workload, it will list all its recommendations, under the Recommendations tab. We will talk more about reading and implementing DTA’s recommendations in the section Reading output from DTA.

SCENARIO II: Ignore my existing PDS and tell me what query optimizer needs

In scenario I, DTA makes recommendations for any missing indexes. However, this doesn’t necessarily mean your existing indexes are optimal for the query optimizer. You may also consider conducting an analysis whereby DTA ignores all existing physical design structures and recommends what it deems the best possible set of PDS for the given workload. In this way, you can validate your assumptions about what indexes are required.

To conduct this analysis, follow steps 1 to 6 as above, except that at step 5b, choose “Do not keep any existing PDS”.

Contrary to how this might sound, DTA will not actually drop or delete any existing PDSs. This is the biggest advantage of using DTA, as it means you can use the tool to perform what-if analysis without actually introducing any changes to the underlying schema.

A note on the DTA Tuning Log

As it consumes the workload, DTA stores details of any errors it encounters in the DTA Tuning log (found in the bottom section of the Progress tab).

Typically, you might see a message of the form “XX% of the consumed workload has syntax errors”. The tuning log will provide further details of these errors, many of which can be safely ignored. For example:

  • Statement doesn’t reference any table – statements such as SET or DECLARE in your workload will cause these errors, and they can generally be ignored
  • Statement references only small tables – DTA will not tune a query if it references a small table (10 data pages or less)
  • Incorrect syntax or object related errors – if these errors refer to statements containing keywords such as BEGIN, TRY or INSERTED, then you can ignore them.

To learn more about how to analyze the tuning log, refer to About the Tuning Log and Determining Whether Events Can Be Tuned in BOL.

Reading output from DTA

After consuming the workload, DTA presents, under the Recommendations tab, a set of recommendations for tuning your PDS. I tend to focus on the following sections:

  • Recommendation – this is the action that you need to take. Possible values include Create or Drop.
  • Target of Recommendation – this is the proposed name of the PDS to be created. DTA-generated names typically start with _dta. However, I recommend that you change this name to follow the naming convention used in your database.
  • Definition – this is the list of columns that this new PDS will include. If you click on the hyperlink, it will open up a new window with the T-SQL script to implement this recommendation.
  • Estimated Improvements – this is the estimated percentage improvement that you can expect in your workload performance, if you implement all the recommendations made by DTA.
  • Space used by recommendation (MB) – under the Tuning Summary section of the Reports tab, you can find out the extra space in MB that you would need, if you decide to implement these recommendations.

Inbuilt analysis reports

There are 15 inbuilt reports under the Reports tab. However, the following three reports are the most important.

NOTE
For information on these and the other 12 reports, visit the article Choosing a Database Engine Tuning Advisor Report on MSDN.

Index Usage Report (current)

Start with this report to see how your existing indexes are being used by the queries running against your server. Every index referenced by the workload is listed here, along with a Percent Usage value that indicates the percentage of statements in your workload that referenced it.

If an index is not listed here, it means that it has not been used by any query in your workload.

TIP: identifying indexes that are not used
If you are certain that all the queries that run against your server have been captured by your profiler trace, then you can use this report to identify indexes that are not required and possibly delete them.

Index Usage Report (recommended)

Next, look at this report to identify how index usage will change if the recommended indexes are implemented. If you compare the two reports, you will see that the usage of some of the current indexes has fallen, while some new indexes appear with a higher usage percentage, indicating a different execution plan for your workload and improved performance.

Statement Cost Report

This report lists individual statements in your workload and the estimated performance improvement for each one of them. Using this report, you can identify your poorly performing queries and see the sort of improvement you can expect if you implement the recommendations made by DTA.

You will find that some statements don’t have any improvements (Percent improvement = 0). This is because either the statement was not tuned for some reason or it already has all the indexes that it needs to perform optimally.

Implementing DTA’s recommendations

By now, we have collected a workload using Profiler, consumed it using DTA and got a set of recommendations to improve performance. You then have the choice to either:

  • Save recommendations – you can save the recommendations in an SQL script by navigating ACTIONS | SAVE RECOMMENDATIONS. You can then manually run the script in Management Studio to create all the recommended PDS.
  • Apply recommendations using DTA – if you are happy with the set of recommendations then simply navigate ACTIONS | APPLY RECOMMENDATIONS. You can also schedule the recommendations to be applied at a later time (during off-peak hours, for example).

More often than not, I use the Save Recommendations option. It means that I can easily compare the resulting PDS set for several scenarios. The recommendation reports are also useful for future reference: as the workload on the server changes, you need to re-evaluate the index usage and make corrections (drop redundant indexes or create new ones). You can compare the index usage on a periodic basis (perhaps quarterly) and see how it is changing.

Performing what-if analysis using DTA

This is a very cool feature of DTA. Suppose you don’t want to apply all the recommendations that DTA provided. Since the Estimated Improvement value can only be achieved if you apply all of the recommendations together, you cannot be sure what impact applying only a sub-set of them will have.

Deselect the recommendations that you don’t want to apply. Now, go to ACTIONS | EVALUATE RECOMMENDATIONS.

This will launch another session with the same options as the earlier one. However, now when you click on START ANALYSIS, DTA will provide data on estimated performance improvements, based on just this sub-set of the recommendations.

Again, the key thing to remember is that DTA performs this “what-if” analysis without actually implementing anything in the database.

TIP: checking for redundant objects left behind by DTA
In theory, DTA should not leave any extra “hypothetical” PDS (indexes and so on) in your database once the analysis is complete. However, you can easily check for this with a quick query against the system catalog.
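A minimal version, assuming DTA’s usual _dta naming prefix and the is_hypothetical flag that sys.indexes exposes in SQL 2005, looks something like this:

-- Hypothetical indexes, or objects following DTA's naming convention,
-- left behind after an analysis session
SELECT OBJECT_NAME(object_id) AS table_name,
       name AS index_name
FROM sys.indexes
WHERE is_hypothetical = 1
   OR name LIKE '[_]dta%';

If this returns any rows once tuning is complete, those objects are safe to drop; hypothetical indexes exist only for DTA’s analysis and are never used to answer queries.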

Using STATISTICS XML

If you just wish to determine the optimum index set for a particular query, then the STATISTICS XML option provides a quick alternative to DTA.

When a query is executed with STATISTICS XML enabled, SQL Server not only returns the query result set, but also generates detailed information about how the statement was executed, in the form of an XML document (Showplan.xml). For example, try executing a statement along the following lines against the AdventureWorks database:
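The exact query is not important; this sketch (the filter values are arbitrary) simply touches the Sales.SalesOrderHeader columns that the showplan output below refers to:

SET STATISTICS XML ON;
GO

SELECT PurchaseOrderNumber, CustomerID
FROM Sales.SalesOrderHeader
WHERE TerritoryID = 4
  AND ShipMethodID < 3
  AND TotalDue > 1000;
GO

SET STATISTICS XML OFF;
GO

Alongside the normal result set, a second result set is returned containing the XML showplan. Open it and pay particular attention to the following elements: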

  • <StmtSimple> – the value of the StatementText will be the query that you just ran. You can confirm that this matches the query that you ran in Management Studio.
  • <MissingIndexes> – details of any missing indexes that the optimizer believes would improve the performance of this query are logged here

For the above query, the relevant section in your XML file might look something like this:
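Trimmed for readability (the internal column IDs are omitted and the exact Impact figure will vary on your system):

<MissingIndexes>
  <MissingIndexGroup Impact="...">
    <MissingIndex Database="[AdventureWorks]" Schema="[Sales]" Table="[SalesOrderHeader]">
      <ColumnGroup Usage="EQUALITY">
        <Column Name="[TerritoryID]" />
      </ColumnGroup>
      <ColumnGroup Usage="INEQUALITY">
        <Column Name="[ShipMethodID]" />
        <Column Name="[TotalDue]" />
      </ColumnGroup>
      <ColumnGroup Usage="INCLUDE">
        <Column Name="[PurchaseOrderNumber]" />
        <Column Name="[CustomerID]" />
      </ColumnGroup>
    </MissingIndex>
  </MissingIndexGroup>
</MissingIndexes>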

It indicates that we have one missing index on the Sales.SalesOrderHeader table. Look for the value of Impact in the <MissingIndexGroup> element, which estimates the percentage improvement in query performance you can expect if you implement the set of indexes highlighted in this XML.

Within the <MissingIndex> element:

  • <ColumnGroup Usage="EQUALITY"> lists index columns that are used for equality predicates in the query (TerritoryID, in our case). Of all the conditions specified in the query, the query optimizer first looks for an index on the equality column in order to evaluate the rows that meet the criteria
  • <ColumnGroup Usage="INEQUALITY"> lists index columns that are used to evaluate inequality predicates in the query (ShipMethodID and TotalDue, in this case)
  • <ColumnGroup Usage="INCLUDE"> lists columns that are included to cover the query (PurchaseOrderNumber and CustomerID). You can specify included columns for non-clustered indexes to avoid a bookmark lookup in your execution plan

NOTE
To find out more about included columns on non-clustered indexes please refer to the Index with Included Columns article on MSDN.

To create this missing index, you should use the following template:

CREATE NONCLUSTERED INDEX <<IndexName>>
ON <<Schema.ObjectName>> (equality_columns, followed by inequality_columns)
INCLUDE (included_columns);

So to create the missing index in this case, the command is as follows:
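(The index name here is only a suggestion; use whatever fits your own naming convention.)

CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_Territory_ShipMethod_TotalDue
ON Sales.SalesOrderHeader (TerritoryID, ShipMethodID, TotalDue)
INCLUDE (PurchaseOrderNumber, CustomerID);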

Using index-related Dynamic Management Views (DMVs)

Dynamic Management Views are a new feature of SQL 2005. They maintain server-wide state and configuration information for your SQL Server installation. There are more than 50 DMVs in SQL 2005 and the information stored in them can be used to monitor the health of a server instance, diagnose problems, and tune performance.

There are several sets of DMVs, each of which stores particular server information. One such set stores information about indexes and their usage. If the query optimizer finds that the indexes required for implementing a particular execution plan are not present, it logs information about these missing indexes in the index-related DMVs. This information persists in the DMVs until the next restart of SQL Server, or until the metadata associated with the objects is dropped.

You can query these DMVs to find the missing indexes or store the information for periodic analysis. Let’s take a look at how you can use the information in the DMVs and how you can interpret the output to create the requisite indexes.

Before we begin, go ahead and restart your SQL Server on your test machine so that the information is cleared in each DMV. Next, in a new query window, execute the same query as in the previous section, but with the STATISTICS XML option turned off:
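(This is the same sketch query as before, minus the STATISTICS XML statements.)

SELECT PurchaseOrderNumber, CustomerID
FROM Sales.SalesOrderHeader
WHERE TerritoryID = 4
  AND ShipMethodID < 3
  AND TotalDue > 1000;

Assuming you have not actually created the index suggested earlier, the query optimizer will again detect that a useful index is missing, but this time it records the details in the index-related DMVs as it compiles the plan.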

 

To retrieve the missing index details, simply query the relevant DMV, as follows:
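(A minimal query against sys.dm_db_missing_index_details; the column alias is my own.)

SELECT statement AS table_name,
       equality_columns,
       inequality_columns,
       included_columns
FROM sys.dm_db_missing_index_details;

The equality_columns, inequality_columns and included_columns values map directly onto the EQUALITY, INEQUALITY and INCLUDE column groups from the XML showplan, so the same CREATE INDEX template applies. (Joining to sys.dm_db_missing_index_group_stats via sys.dm_db_missing_index_groups also gives you estimated impact figures, but the details view alone is enough here.)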

 

Every time you restart SQL Server, these DMVs are flushed, so you might want to copy their contents into tables in your own database for periodic analysis.
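A simple way to do this (the table name below is just a suggestion) is to snapshot the DMV into a history table from a scheduled job:

-- One-off: create an empty history table with the DMV's columns plus a timestamp
SELECT GETDATE() AS capture_time, *
INTO dbo.MissingIndexHistory
FROM sys.dm_db_missing_index_details
WHERE 1 = 0;

-- Run periodically (from a SQL Agent job, for example)
INSERT INTO dbo.MissingIndexHistory
SELECT GETDATE(), *
FROM sys.dm_db_missing_index_details;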

Using the right tool for the job

The STATISTICS XML option and the index-related DMVs are both great tools for assessing the impact of a particular query on your indexing requirements. The biggest limitation in each case is that they cannot be used to fine-tune the entire workload on your server. If you are running mission-critical applications that involve a high number of queries, this approach will be very time- and labor-intensive.

Furthermore, when using the DMVs, you can only access raw information about columns on which indexes might be missing. This approach also has several limitations and is less accurate for queries involving inequality predicates. For more details on the limitations, refer to the Limitations for Using the Missing Indexes Feature article on BOL.

Database Engine Tuning Advisor is a more sophisticated and advanced tool, and provides more options than any of the other methods described in this article. Because DTA works from a representative workload, its recommendations are more accurate. DTA can also be used against SQL 2000, while the other features cannot. Plus, you can perform what-if analysis using DTA without introducing any changes to your database.

As a best practice recommendation, I would strongly advocate that you use the DMVs or the STATISTICS XML option only as a means to quickly assess the impact of a query on indexing requirements. You should always validate your findings, and the estimated improvements, using the Database Engine Tuning Advisor with a representative workload.

Conclusion

In this first instalment, I essentially provided an overview of how to use the different tools and features in SQL 2005 to determine the optimal set of indexes for the workload on your server.

My personal preference is to use Database Engine Tuning Advisor. To get the most effective recommendations, you should capture a representative workload of your server using the TUNING template and use it as an input to the DTA.

In the second part of this article, I will delve deeper into how to measure and monitor the usage and effectiveness of your existing indexes using DTA and DMVs. I’ll also cover some more advanced “what-if” analysis using DTA, and some best practices for its general usage.