Defragmenting Indexes in SQL Server 2005 and 2008

Rob Sheldon tackles the subject of Index Defragmentation in SQL Server 2005 and 2008, using the sys.dm_db_index_physical_stats system function. He shows how to analyze indexes and, if necessary, how to go about  reorganizing or rebuilding indexes. He makes the point that, by analyzing indexes effectively, you can save a lot of unnecessary  rebuilding of indexes.

When data is inserted into, deleted from, or updated in a SQL Server table, the indexes defined on that table are automatically updated to reflect those changes. As the indexes are modified, the information stored in them becomes fragmented, resulting in the information being scattered across the data files. When this occurs, the logical ordering of the data no longer matches the physical ordering, which can lead to a deterioration of query performance.

To fix this problem, indexes must be periodically reorganized or rebuilt (defragmented) so the physical order of the leaf-level pages matches the logical order of the leaf nodes. This means that you should analyze your indexes periodically to determine whether they’ve become fragmented and the extent of that fragmentation. From there, you can either reorganize or rebuild the affected indexes, depending on the results of your analysis. In this article, I explain how to analyze SQL Server indexes and how reorganize and rebuild them.

Note: Some sources distinguish between internal and external fragmentation. Internal fragmentation indicates that there is too much free space on the index page. External fragmentation indicates that the logical ordering and physical ordering do not match. Microsoft documentation doesn’t make this distinction and simply provides guidelines for when to defragment indexes and which type of defragmentation to perform. For this article, I follow the Microsoft guidelines.

Analyzing Fragmentation

To analyze SQL Server 2005 or 2008 indexes, you use the system function sys.dm_db_index_physical_stats to determine which indexes are fragmented and the extent of that fragmentation. You can use the function to analyze all the indexes in an instance of SQL Server 2005 or 2008, all indexes in a database, all indexes defined on a table, or a specific index. You can also analyze an index based on the partition number of the indexed object.

The sys.dm_db_index_physical_stats function takes the following parameters (in the order specified):

  • Database ID: A smallint value that represents the ID number of a database. If null is specified, the function retrieves index-related data from all databases on a SQL Server instance. If you specify null, you must also specify null for the object ID, index ID, and partition number.
  • Object ID: An int value that represents the ID number of a table or view. If null is specified, the function retrieves index-related data for all tables and views in a specific database or SQL Server instance. If you specify null, you must also specify null for the index ID and partition number.
  • Index ID: An int value that represents the ID number of an index. If null is specified, the function retrieves index-related data for all indexes defined on the specified table or view. If you specify null, you must also specify null for the partition number. Also, if the object ID refers to a heap, use 0 as the index ID.
  • Partition number: An int value that represents the partition number of an index or heap. If null is specified, the function retrieves index-related information for all partitions related to a specific object.
  • Mode: The scan level used to obtain index-related information. Valid inputs include NULL, DEFAULT, or one of the following three modes:
    • LIMITED: Scans the smallest number of pages, which means this is the fastest mode. The LIMITED mode is equivalent to NULL and DEFAULT.
    • SAMPLED: Scans 1% of all pages. If an index contains fewer than 10,000 pages, then DETAILED mode is used.
    • DETAILED: Scans all index pages, which means this is the slowest mode, but most accurate.

You must specify all five parameters, even if their values are null.

The sys.dm_db_index_physical_stats function returns a number of values that provide details about the indexes you specify. The topic “sys.dm_db_index_physical_stats” in SQL Server Books Online provides details about each of these values. However, several values are worth noting when analyzing an index:

  • avg_fragmentation_in_percent: Percentage of the logical index that is fragmented.
  • fragment_count: Number of fragments in the leaf level.
  • avg_fragment_size_in_pages: Average number of pages in a leaf-level fragment.
  • page_count: Number of index or data pages.

An index always has at least one fragment (fragment_count). The maximum number of fragments that an index can have is equal to the number of pages (page_count). For example, an index that is made up of five pages can at the most have five fragments. The larger the fragment, the less disk I/O that is required. So a five-page index with one fragment requires less disk I/O than the index with five fragments. Ideally, the avg_fragmentation_in_percent value should be as close to zero as possible, and the avg_fragment_size_in_pages should be as high as possible.

Based on your index analysis, you can determine what action to take. Microsoft recommends that you reorganize your index if the avg_fragmentation_in_percent value is less than or equal to 30% and rebuild the index if the value is greater than 30%. (Reorganizing and rebuilding indexes are described in the following sections.)

Note: The topic “sys.dm_db_index_physical_stats” in SQL Server Books Online includes a script that automatically analyzes and then, based on that analysis, reorganizes or rebuilds your indexes according to the Microsoft guidelines. You can modify the script as necessary to meet your administrative needs.

Keep in mind that these recommendations are guidelines only. A fragmented index (especially a low percentage) is not always enough of a reason to reorganize or rebuild your index. If your queries do not regularly involve table scans as a result of singleton lookups, defragmenting the index might have no effect on performance. In addition, for smaller indexes with relatively few pages and small amounts of data, you might see little to no improvement when you defragment the index. FILLFACTOR settings can also affect the types of improvements you see.

That said, you should still analyze your indexes regularly, and the sys.dm_db_index_physical_stats function is the best tool to use. So let’s take a look at an example of how to use the function to retrieve index-related statistics. In the following SELECT statement, I retrieve index data from the AdventureWorks database:

As you can see, I use the sys.dm_db_index_physical_stats function to retrieve the four values described above, plus the object_id and index_id values. Notice that I pull back only those indexes with an avg_fragmentation_in_percent value greater than zero and I order the results by object ID and index ID. That way, the indexes for each table and view are grouped together. Notice also that I use the DB_ID() function to retrieve the ID for the AdventureWorks database. Table 1 shows a partial list of the results that are returned from my local instance of SQL Server 2005. (I also ran the SELECT statement against an instance of SQL Server 2008 and received similar results.)

ObjectID

IndexID

PercentFragment

TotalFrags

PagesPerFrag

NumPages

18099105

2

50

2

1

2

30623152

1

20

2

2.5

5

30623152

2

66.6666666666667

3

1

3

30623152

3

50

2

1

2

30623152

4

50

2

1

2

53575229

4

7.40740740740741

3

9

27

66099276

1

50

2

1

2

78623323

1

0.380710659898477

29

27.1724137931034

788

78623323

2

1.91082802547771

8

19.625

157

78623323

3

1.41509433962264

13

16.3076923076923

212

158623608

1

0.483870967741936

31

20

620

158623608

2

7.31707317073171

14

8.78571428571429

123

158623608

3

1.78571428571429

10

16.8

168

181575685

1

15

4

5

20

181575685

2

33.3333333333333

5

1.8

9

181575685

3

30

4

2.5

10

270624007

1

50

2

1

2

279672044

1

98.5714285714286

70

1

70

306100131

2

17.6470588235294

4

4.25

17

309576141

1

0.536672629695885

14

39.9285714285714

559

Table 1: Partial List of Index Statistics for the AdventureWorks Database

Each row in Table 1 represents a specific index whose avg_fragmentation_in_percent value is greater than zero. As you can see, the statistics can vary greatly, but every index is fragmented to a certain degree.

As I mentioned, Table 1 shows only a partial list of indexes, but the number of indexes can grow quite large. As a result, you might want to narrow down the result set. In the following example, I specify the Sales.StoreContact table in the AdventureWorks database so that information related only to that table’s indexes is returned:

Notice that I use the OBJECT_ID() function to return the object ID of the StoreContact table. Also notice that I join the sys.dm_db_index_physical_stats function to the sys.indexes table to retrieve the name of the indexes. Now the results are far easier to read, as you can see in Table 2.

IndexName

PercentFragment

TotalFrags

PagesPerFrag

NumPages

AK_StoreContact_rowguid

66.6666666666667

3

1

3

IX_StoreContact_ContactID

50

2

1

2

IX_StoreContact_ContactTypeID

50

2

1

2

PK_StoreContact_CustomerID_ContactID

20

2

2.5

5

Table 2: List of Index Statistics for the Sales.StoreContact Table in the AdventureWorks Database

As Table 2 indicates, the StoreContact table contains four indexes whose avg_fragmentation_in_percent value is greater than zero. Based on Microsoft’s guidelines, I might choose to reorganize the PK_StoreContact_CustomerID_ContactID index and rebuild the others.

However, before we proceed to that process, let’s look at one more example. In the following SELECT statement, I return statistics on a specific index in the StoreContact table:

In this case, I declare variables to pass in the index name and to retrieve the index ID. I then use those variables in the SELECT statement to retrieve the necessary data. Table 3 shows the information returned by this statement:

IndexName

PercentFragment

TotalFrags

PagesPerFrag

NumPages

PK_StoreContact_CustomerID_ContactID

20

2

2.5

5

Table 3: Statistics on the PK_StoreContact_CustomerID_ContactID Index

As you would expect, the data matches the data shown in Table 2 for this index. From this information, you can see that 20% of the logical index is fragmented (avg_fragmentation_in_percent = 20), the index contains two fragments (fragment_count = 2) that are spread across five pages (page_count = 5), for an average of 2.5 pages per fragment (avg_fragment_size_in_pages = 2.5). Based on the avg_fragmentation_in_percent value of 20%, the index should be reorganized, rather the rebuilt.

Reorganizing Indexes

When you reorganize an index, SQL Server physically reorders the leaf-level pages to match the logical order of the leaf nodes. The process uses the existing pages only and does not allocate new ones, but it does compact the index pages. In addition, reorganization uses minimal resources and is automatically performed online, without blocking queries or updates. You should reorganize indexes only if they’re lightly fragmented, otherwise, you should rebuild them.

To reorganize an index, run an ALTER INDEX statement and include the keyword REORGANIZE, as shown in the following example:

Notice that I specify the index name and the table. After I run this statement, I then use the sys.dm_db_index_physical_stats function to retrieve index-related statistics. The results are shown in Table 4.

IndexName

PercentFragment

TotalFrags

PagesPerFrag

NumPages

PK_StoreContact_CustomerID_ContactID

20

2

2.5

5

Table 4: Statistics on the PK_StoreContact_CustomerID_ContactID Index after Reorganization

If you compare these results to those shown Table 3, you’ll see that nothing has changed. When you try to reorganize or rebuild an index with few pages, you’ll often see no benefit. However, when I reorganized the PK_Contact_ContactID index on the Person.Contact table, I reduced the number of fragments on 559 pages from 14 to 9 and lowered the percentage of fragmentation by about one-third.

Rebuilding Indexes

Rebuilding an index is generally more effective than reorganizing it because it drops the original index and builds a new one. As a result, the index starts clean with minimal fragmentation and the pages are compacted, and new pages are allocated as needed. In addition, you can choose to rebuild an index offline (the default) or online.

Rebuilding an index can be as simple as reorganizing one. Once again, use the ALTER INDEX statement, but specify the REBUILD keyword, rather than REORGANIZE, as shown in the following example:

In this case, I’m rebuilding the IX_StoreContact_ContactID index. When I pull statistics on the rebuilt index, I get the results shown Table 5.

IndexName

PercentFragment

TotalFrags

PagesPerFrag

NumPages

IX_StoreContact_ContactID

50

2

1

2

Table 5: Statistics on the IX_StoreContact_ContactID Index after Rebuilding

If you compare the results shown in Table 5 to Table 2, you’ll see that once again, nothing has changed. As with reorganizing an index, you’ll often find that rebuilding an index with few pages will result in little to no benefit. When I rebuilt the PK_Individual_CustomerID index on the Sales.Individual table, I reduced the number of pages from 3082 to 3081 and the number of fragments from 71 to 18. I also reduced the percentage of fragmentation by about 80%. Note, however, that the reason I didn’t use this index originally is because the percentage of fragmentation was already very low. As I mentioned above, the benefit you’ll see from rebuilding (or reorganizing) an index depends on factors such as page count, index size, and Fill Factor.

When you rebuild an index, you can include a WITH clause after the REBUILD keyword. In the WITH clause, you can specify one or more options that define how the index will be rebuilt. For example, you can define a Fill Factor, specify whether to recompute the distribution statistics, or specify whether row locks are allowed. In the following ALTER INDEX statement, I include a WITH clause that defines the FILLFACTOR and ONLINE options:

In this case, the FILLFACTOR option is set to 70, which determines the percentage of how much the leaf level of each index page will be filled during index creation or alteration. As a result, the leaf level will be filled to only 70%. The ONLINE option is set to ON, which means that the StoreContact table and IX_StoreContact_ContactTypeID index will be available for queries and data modification during the index operation. For a complete list of all the options available to the WITH clause, see the topic “ALTER INDEX (Transact-SQL)" in SQL Server Books Online.

After I ran the statement above and rebuilt the IX_StoreContact_ContactTypeID index, I retrieved the statistics on the index, as shown in Table 6.

IndexName

PercentFragment

TotalFrags

PagesPerFrag

NumPages

IX_StoreContact_ContactTypeID

33.3333333333333

2

1.5

3

Table 6: Statistics on the IX_StoreContact_ContactTypeID Index after Rebuilding

If you compare these results to those in Table 2, you’ll see that the percentage of fragmentation dropped from 50% to 33.3% and the number of pages increased from 2 to 3 as a result of changing the FILLFACTOR. (The default Fill Factor is 100%.) There are still two fragments, but now the average is 1.5 pages per fragment, rather than 1.

Another method you can use to rebuild an index is the CREATE INDEX statement. To use this statement, you must create an index by the same name as the original and include the DROP_EXISTING option in the WITH clause, as shown in the following example:

In this case, I’m rebuilding the AK_StoreContact_rowguid index, which is a unique, nonclustered index. Notice that the DROP_EXISTING option is set to ON, which means that the original index will be dropped and a new one created. Notice also that I’ve included the FILLFACTOR and ONLINE options in the WITH clause. For a complete list of all the options available to the WITH clause, see the topic "CREATE INDEX (Transact-SQL)" in SQL Server Books Online.

Now take a look at Table 7, which shows the index statistics that I retrieved after I rebuilt the index.

IndexName

PercentFragment

TotalFrags

PagesPerFrag

NumPages

AK_StoreContact_rowguid

25

2

2

4

Table 7: Statistics on the AK_StoreContact_rowguid Index after Re-Creating Index

Once again, if you compare these results to Table 2, you’ll see that I reduced the percentage of fragmentation from 66.7% to 25%, reduced the number of fragments from 3 to 2, and increased the pages per fragment from 1 to 2 (in part because I reduced the Fill Factor to 70%).

Using the CREATE INDEX statement rather than ALTER INDEX statement to rebuild an index has several advantages. The CREATE INDEX statement lets you add or remove key columns, change column order, change the sort order, repartition a partitioned index, and move the index to a different filegroup. However, the advantage to using the ALTER INDEX statement is that you can rebuild more than one index in a single transaction.

Note: You can also drop an index and then use the CREATE INDEX statement to re-create that index. However, Microsoft recommends against this approach because the index is offline during the drop and re-creation operations. Also, if the transaction fails, the index is not re-created.

Regardless of the approach you use to rebuild an index-or whether you rebuild or reorganize an index-the important point to remember is that you must first properly analyze your indexes, and that’s where the sys.dm_db_index_physical_stats function comes in. The function provides valuable details about all your indexes, and then, from this information, you can determine how to proceed. The actual reorganizing and rebuilding are very straightforward. You can even automate these processes based on the statistics returned by the function. Be sure to refer to SQL Server Books Online as necessary for details about all these operations.