SQL Server Heaps, and Their Fragmentation

In SQL Server, heaps are rightly treated with suspicion. Although there are rare cases where they perform well, they are likely to be the cause of poor performance. If a table is subject to a large number of changes, then it can become fragmented due to the way that space is allocated and forward pointers are used. How does one detect this problem? Is it significant? How does one deal with it, if necessary?

What is a Heap?

In relational databases, a heap is a type of table that has no clustered index. Ideally, all tables should have a clustered index, though there are some rare cases where heaps perform rather better. Tables that are treated as logs, with inserts predominating, are an example. Where a table holds just a few rows of static data, a heap can be an advantage, especially if a covering non-clustered index is used.

The word ‘heap’ means an untidy collection of objects piled on top of each other. In a relational database, a heap table is an unordered, scattered collection of pages. Why unordered, untidy and scattered? Whenever rows are inserted into a heap table, they are added one after another until the page becomes full. Once a page is full, SQL Server uses the PFS (Page Free Space) pages to find pages with free space and places new rows wherever enough free space is found. When no existing page has enough free space, a new page is allocated to accommodate the new rows.
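As an illustration, you can see which pages a heap occupies, and how full they are, with the undocumented sys.dm_db_database_page_allocations function (SQL Server 2012 onwards). The table name here is just a placeholder; this is a sketch rather than part of the original article's code:

-- List the pages allocated to a table, with the free space on each (undocumented DMF)
SELECT allocated_page_file_id   AS FileId,
       allocated_page_page_id   AS PageId,
       page_type_desc,
       page_free_space_percent
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('dbo.MyHeapTable'), NULL, NULL, 'DETAILED')
WHERE is_allocated = 1;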

Changing the Data in a Heap

When a row in a heap is updated so that it no longer fits in its original location on the page, the row has to be moved to another page that has the required space, or to a new page, and a forwarding pointer is left at the location from which the row moved; the moved row is known as a forwarded record. In consequence, rows are not in any specific order, and one row can be on an entirely different page to the previous one, and the next. There is no physical or logical ordering between the pages and rows in a heap, so SQL Server has to use the IAM (Index Allocation Map) pages to move through the heap. Uwe Ricken, a SQL expert, wrote an excellent article on the internals of heaps and the way forwarded records work, which is good further reading.

The Fragmentation of Heaps

Inserts, updates and deletes can all fragment a heap table and its indexes. This fragmentation can be either internal or external: if a row can no longer fit into its existing page after an update or insert, it will be placed either on a new page or on an existing page that has free space. Deleting records also contributes to fragmentation because it leaves empty space on the page.

As the amount of fragmentation increases, all operations on the heap will require more input/output (I/O), causing query performance problems and consuming more disk space. When SQL Server encounters a forward pointer, it must first read the page holding the pointer and then the data at the destination, which means more I/O. Furthermore, if rows from the heap table need to be returned in a sorted order, an ORDER BY clause is required, and this will hurt performance if the column is not part of an index. Non-clustered indexes on a heap have to contain both the non-clustered key value and a row identifier (RID), which is a combination of the file ID, the page number and the row number within the page. Since all non-clustered indexes store the RID in their leaf pages to locate the data beyond the non-clustered key, modifications to the heap can require the non-clustered index pages to be adjusted. RID lookups on a big table with lots of forwarded pointers will cause performance problems in the form of extra I/O.

All this suggests that, in certain circumstances, heaps will need to be defragmented to remove the build-up of forward pointers. How do you detect the level of fragmentation? Do you get a benefit from removing fragmentation? In this article, we’ll explore these questions.

Locating your Heaps

To find all your heaps in a database, try this:
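One way to do this is to query sys.indexes, where a heap is recorded with an index type of 0 (a sketch, not necessarily the article's original listing):

-- All heap tables in the current database (index type 0 = heap)
SELECT SCHEMA_NAME(t.schema_id) AS SchemaName,
       t.name                   AS TableName
FROM sys.tables  AS t
JOIN sys.indexes AS i
  ON i.object_id = t.object_id
WHERE i.type = 0
ORDER BY SchemaName, TableName;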

To list all your tables in a database, specifying whether they are heaps or tables, you can use this:
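A query along these lines will do it, labelling each user table according to whether it has a clustered index (again, a sketch rather than the original listing):

-- Every user table, labelled as a heap or a clustered table
SELECT SCHEMA_NAME(t.schema_id) AS SchemaName,
       t.name                   AS TableName,
       CASE WHEN i.type = 0 THEN 'Heap' ELSE 'Clustered table' END AS StorageType
FROM sys.tables  AS t
JOIN sys.indexes AS i
  ON i.object_id = t.object_id
 AND i.type IN (0, 1)           -- 0 = heap, 1 = clustered index
ORDER BY SchemaName, TableName;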

To find out whether a particular table (‘Purchasing.ShipMethod’ in this example) is a heap or a clustered table, run this command:
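Something like the following works, using the AdventureWorks table named in the text:

-- Is Purchasing.ShipMethod a heap or a clustered table?
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName,
       OBJECT_NAME(i.object_id)        AS TableName,
       i.type_desc                     -- returns HEAP or CLUSTERED
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('Purchasing.ShipMethod')
  AND i.type IN (0, 1);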

Maintaining your Heaps

To avoid the performance and space issues caused by fragmented heap tables, heaps should be regularly checked to determine whether the level of fragmentation has exceeded a threshold. This fragmentation threshold is based on the quantity of forwarded pointers. You will need to determine a value that is appropriate for the workload and usage of the database, but it is most usual to defragment a table when fragmentation reaches around 30 or 40 percent and the forwarded record count is greater than a value based on workload analysis. Always do this maintenance during slack hours with a low workload, as it can hamper other tasks if the heap table is big. Rebuilding a heap involves removing unused space, making pages contiguous, rebuilding all the indexes and updating the distribution statistics on the heap. Note that rebuilding a heap also rebuilds all of its non-clustered indexes and updates all of its statistics, and that is why it is better to convert it to a clustered table rather than rebuild the heap. Another SQL expert, Paul Randal, stated the same in this article.
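A check along these lines can drive the decision; the 30 percent and 1000-row thresholds below are only illustrative and should come from your own workload analysis:

-- Heaps whose fragmentation or forwarded-record count suggests a rebuild.
-- DETAILED mode is needed to get forwarded_record_count, but it is expensive on large databases.
SELECT OBJECT_SCHEMA_NAME(ps.object_id) AS SchemaName,
       OBJECT_NAME(ps.object_id)        AS TableName,
       ps.avg_fragmentation_in_percent,
       ps.forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ps
WHERE ps.index_type_desc = 'HEAP'
  AND (ps.avg_fragmentation_in_percent > 30
       OR ps.forwarded_record_count > 1000);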

If a heap is getting the pattern of use that results in heavy fragmentation, then it probably shouldn’t be a heap in the first place. Still, if a database has heap tables and there are good reasons for keeping them as heaps, they should be maintained to avoid performance problems and to reclaim the extra space they occupy. After rebuilding a heap, just ensure that your index maintenance job, or any script used for index and statistics maintenance, skips rebuilding the non-clustered indexes and updating the statistics of any heap table that has just been rebuilt, thus avoiding double work and overhead.

Defragmentation in Practice

There are two main ways to defragment a Heap Table:

  1. Create a Clustered Index and then drop it.
  2. Use the ALTER TABLE command to rebuild the heap, as shown below. This REBUILD option is available from SQL Server 2008 onwards, and it can be done with the ONLINE option in Enterprise Edition.
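For example (the table name here is a placeholder, and the ONLINE option requires Enterprise Edition):

-- Rebuild a heap in place (SQL Server 2008 onwards)
ALTER TABLE dbo.MyHeapTable REBUILD;

-- Enterprise Edition only: rebuild while keeping the table available
ALTER TABLE dbo.MyHeapTable REBUILD WITH (ONLINE = ON);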

Which is the best way of doing this? We will test both methods in a simple example and see their side-effects, by creating a HeapRebuildTest table with three columns, along with one non-clustered index and one statistic.
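Something along these lines will serve; the exact column definitions and names are an assumption rather than the article's original script:

-- Test heap: no clustered index, one non-clustered index, one manual statistic
CREATE TABLE dbo.HeapRebuildTest
(
    Id      INT IDENTITY(1,1) NOT NULL,
    Name    VARCHAR(50)       NOT NULL,
    Details VARCHAR(4000)     NULL
);

CREATE NONCLUSTERED INDEX IX_HeapRebuildTest_Id
    ON dbo.HeapRebuildTest (Id);

CREATE STATISTICS st_HeapRebuildTest_Name
    ON dbo.HeapRebuildTest (Name);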

Now, we’ll populate the table with some data:
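A simple batch such as this will do; the row count is arbitrary, chosen only so that the ID range updated later exists:

-- Insert 300,000 narrow rows; Details is left NULL so updates can widen the rows later
INSERT INTO dbo.HeapRebuildTest (Name, Details)
SELECT TOP (300000)
       'Row ' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),
       NULL
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;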

We now update some rows, just to cause forward pointers to be created.
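Widening a variable-length column is enough to force rows off their original pages; the ID range matches the one the article uses again later:

-- Make the rows much wider so they no longer fit on their original pages
UPDATE dbo.HeapRebuildTest
SET Details = REPLICATE('X', 2000)
WHERE Id BETWEEN 100000 AND 200000;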

We can now see the fragmentation using the sys.dm_db_index_physical_stats dynamic management function, selecting just a few columns and filtering to show the fragmentation at the leaf level:
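A query along these lines returns the fragmentation and forwarded-record figures for the heap and its non-clustered index (DETAILED mode is used because forwarded_record_count is not returned in LIMITED mode):

-- Fragmentation and forwarded records for the test table, leaf level only
SELECT OBJECT_NAME(ps.object_id) AS TableName,
       i.name                    AS IndexName,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.forwarded_record_count,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.HeapRebuildTest'),
                                    NULL, NULL, 'DETAILED') AS ps
LEFT JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.index_level = 0;       -- leaf level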

This is the output from our query.

[Screenshot: fragmentation results showing high avg_fragmentation_in_percent and a large forwarded_record_count for the heap]

Next, let’s check the state of the statistics on the table:
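One quick way to see the rowmodctr value discussed later, along with the last statistics update date, is to query the legacy sys.sysindexes compatibility view together with STATS_DATE() (a sketch, not necessarily the article's original query):

-- rowmodctr and last update date for every index and statistic on the test table
SELECT si.name                       AS StatisticOrIndexName,
       si.rowmodctr,
       STATS_DATE(si.id, si.indid)   AS LastStatsUpdate
FROM sys.sysindexes AS si
WHERE si.id = OBJECT_ID('dbo.HeapRebuildTest')
  AND si.indid > 0;                  -- skip the heap row itself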

The query returned this result:

[Screenshot: statistics results showing high rowmodctr values and out-of-date statistics]

Both of the above query results show that the heap and the non-clustered index are heavily fragmented and that the statistics are not updated. This can cause performance problems as well as occupying more space. Let’s check how much time and how many reads are required, by running a query with the SET STATISTICS IO and SET STATISTICS TIME options set to ON to gauge performance.
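The exact query used in the article is not critical; anything that reads the widened rows will do, for example:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Read the rows that were widened, following the forwarded records
SELECT COUNT(*), MAX(Details)
FROM dbo.HeapRebuildTest
WHERE Id BETWEEN 100000 AND 200000;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;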

The result shows that SQL Server performed 26157 logical reads across 5 scans, taking 63 milliseconds of CPU time and 129 milliseconds of elapsed time.

Let’s create and drop the clustered index so as to defragment the table and check the benefit:
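The clustered index key used here is an assumption; any suitable column will serve to rewrite the heap:

-- Method 1: build a clustered index to rewrite the table, then drop it again
CREATE CLUSTERED INDEX CIX_HeapRebuildTest_Id
    ON dbo.HeapRebuildTest (Id);

DROP INDEX CIX_HeapRebuildTest_Id
    ON dbo.HeapRebuildTest;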

Then, let’s again execute the queries to check the fragmentation and statistics.

[Screenshot: fragmentation results after the rebuild, with fragmentation removed and forwarded_record_count of 0]

Great! Now the fragmentation is eliminated and the forwarded_record_count is 0! Let’s see the statistics:

[Screenshot: statistics results showing rowmodctr = 0 for the statistic and the index]

The statistics have also been updated, with rowmodctr = 0 for both the statistic and the index. Good! What is rowmodctr? It shows the number of changes, in terms of inserts, updates and deletes, that have happened since the last time the statistics were updated. This, along with the STATS_DATE() function, is a good indicator of stale statistics. Let’s re-execute the same query and check the performance benefit.

[Screenshot: SET STATISTICS IO and TIME output after defragmentation]

This time the query performed 5835 logical reads, and took less CPU time and less elapsed time. Good!

Let’s test the second way:

After populating the table again and running the update command for IDs between 100000 and 200000, this is the resulting fragmentation:

[Screenshot: fragmentation results after repopulating and updating the table]

This time, the heap shows less fragmentation because of the rebuild that we did earlier by creating and dropping the clustered index. Since space had been released back to SQL Server and the pages had been defragmented, SQL Server could manage space more efficiently when placing rows as the records were inserted, which resulted in comparatively less fragmentation.

[Screenshot: statistics results showing stale statistics and fragmented indexes]

The indexes and statistics are again stale and fragmented. Let’s run the command below:
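This is the second method from the list above, applied to our test table:

-- Method 2: rebuild the heap in place
ALTER TABLE dbo.HeapRebuildTest REBUILD;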

Fragmentation:

[Screenshot: fragmentation results after ALTER TABLE ... REBUILD]

Statistics:

[Screenshot: statistics results after ALTER TABLE ... REBUILD]

The above results show that the index was defragmented and the statistics were updated by this method too. The only difference is that one automatic statistic was created, and its statistics update time was modified but not its rowmodctr. It can’t be concluded that either method is the better way of rebuilding the index and updating the statistics, because both methods do much the same thing under the hood. However, the additional work of creating and dropping a clustered index sorts the data by the index key and, once the index is dropped, makes some adjustments in the PFS (Page Free Space) pages to reflect the free space on each page, which can prove to give a performance advantage.

Conclusions

A heap can perform better for certain exceptional uses, but if a heap gets to the point where it requires defragmentation then it probably shouldn’t be a heap, but a clustered table. It seems that, by rebuilding a heap table, you also defragment all of its indexes and update its statistics as a side effect. This consumes a lot of resources and takes significant time, which hurts the overall performance of the database server during maintenance. Even rebuilding a heap with the ONLINE option in Enterprise Edition uses more resources and puts an extra load on tempdb. Never rebuild a heap as a regular routine: instead, check to see whether the indexes are fragmented, whether the fragmentation threshold has been exceeded, or whether the forwarded record count has exceeded a threshold based on workload analysis. If you do rebuild a heap, then you shouldn’t also schedule updating statistics and rebuilding non-clustered indexes for it, because this is done as a side-effect of the defragmentation. Furthermore, analyze all your heap tables and their pattern of usage to see if they would be better converted into clustered tables.