SQL Server Heaps, and Their Fragmentation

In SQL Server, heaps are rightly treated with suspicion. Although there are rare cases where they perform well, they are likely to be the cause of poor performance. If a table is subject to a large number of changes, then it can become fragmented due to the way that space is allocated and forward pointers are used. How does one detect this problem? Is it significant? How does one deal with it, if necessary?

What is a Heap?

In relational databases, a heap is a type of table that has no clustered index. Ideally, all tables should have a clustered index, though there are some rare cases where heaps perform rather better. Tables that are treated as logs, with inserts predominating, are an example. Where a table holds just a few rows of static data, a heap can be an advantage, especially if a covering non-clustered index is used.

The word ‘heap’ means an untidy collection of objects piled on top of each other. In a relational database, a heap table is an unordered, scattered collection of pages. Why unordered, untidy and scattered? Whenever rows are inserted into a heap table, they are added one after another until the page becomes full. Once a page is full, SQL Server uses the PFS (Page Free Space) pages to find pages with free space and places new rows wherever enough free space is found. When no existing page has enough free space, a new page is allocated to accommodate the new rows.
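As an illustration, you can see which pages a heap occupies, and how full they are, with the undocumented sys.dm_db_database_page_allocations function (SQL Server 2012 onwards). The table name here is just a placeholder; this is a sketch rather than part of the original article's code:

-- List the pages allocated to a table, with the free space on each (undocumented DMF)
SELECT allocated_page_file_id   AS FileId,
       allocated_page_page_id   AS PageId,
       page_type_desc,
       page_free_space_percent
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('dbo.MyHeapTable'), NULL, NULL, 'DETAILED')
WHERE is_allocated = 1;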

Changing the Data in a Heap

When a row in a heap is updated so that it no longer fits in its original location on the page, the row has to be moved to another page that has the required space, or to a new page, and a forwarding pointer is left at the location from which the row moved; the moved row is known as a forwarded record. In consequence, rows are not in any specific order, and one row can be on an entirely different page to the previous one, and the next. There is no physical or logical ordering between the pages and rows in a heap, so SQL Server has to use the IAM (Index Allocation Map) pages to move through the heap. Uwe Ricken, a SQL expert, wrote an excellent article on the internals of heaps and the way forwarded records work, which is good further reading.

The Fragmentation of Heaps

Inserts, updates and deletes can all fragment a heap table and its indexes. This fragmentation can be either internal or external: if a row can no longer fit into its existing page after an update or insert, it will be placed either on a new page or on an existing page that has free space. Deleting records also contributes to fragmentation because it leaves empty space on the page.

As the amount of fragmentation increases, all operations on the heap will require more input/output (I/O), causing query performance problems and consuming more disk space. When SQL Server encounters a forward pointer, it must first read the page holding the pointer and then the data at the destination, which means more I/O. Furthermore, if rows from the heap table need to be returned in a sorted order, an ORDER BY clause is required, and this will hurt performance if the column is not part of an index. Non-clustered indexes on a heap have to contain both the non-clustered key value and a row identifier (RID), which is a combination of the file ID, the page number and the row number within the page. Since all non-clustered indexes store the RID in their leaf pages to locate the data beyond the non-clustered key, modifications to the heap can require the non-clustered index pages to be adjusted. RID lookups on a big table with lots of forwarded pointers will cause performance problems in the form of extra I/O.

All this suggests that, in certain circumstances, heaps will need to be defragmented to remove the build-up of forward pointers. How do you detect the level of fragmentation? Do you get a benefit from removing fragmentation? In this article, we’ll explore these questions.

Locating your Heaps

To find all your heaps in a database, try this:
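One way to do this is to query sys.indexes, where a heap is recorded with an index type of 0 (a sketch, not necessarily the article's original listing):

-- All heap tables in the current database (index type 0 = heap)
SELECT SCHEMA_NAME(t.schema_id) AS SchemaName,
       t.name                   AS TableName
FROM sys.tables  AS t
JOIN sys.indexes AS i
  ON i.object_id = t.object_id
WHERE i.type = 0
ORDER BY SchemaName, TableName;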

To list all your tables in a database, specifying whether they are heaps or tables, you can use this:
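A query along these lines will do it, labelling each user table according to whether it has a clustered index (again, a sketch rather than the original listing):

-- Every user table, labelled as a heap or a clustered table
SELECT SCHEMA_NAME(t.schema_id) AS SchemaName,
       t.name                   AS TableName,
       CASE WHEN i.type = 0 THEN 'Heap' ELSE 'Clustered table' END AS StorageType
FROM sys.tables  AS t
JOIN sys.indexes AS i
  ON i.object_id = t.object_id
 AND i.type IN (0, 1)           -- 0 = heap, 1 = clustered index
ORDER BY SchemaName, TableName;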

To find out whether a particular table (‘Purchasing.ShipMethod’ in this example) is a heap or a clustered table, run this command:
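Something like the following works, using the AdventureWorks table named in the text:

-- Is Purchasing.ShipMethod a heap or a clustered table?
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName,
       OBJECT_NAME(i.object_id)        AS TableName,
       i.type_desc                     -- returns HEAP or CLUSTERED
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('Purchasing.ShipMethod')
  AND i.type IN (0, 1);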

Maintaining your Heaps

To avoid the performance and space issues caused by fragmented heap tables, heaps should be regularly checked to determine whether the level of fragmentation has exceeded a threshold. This fragmentation threshold is based on the quantity of forwarded pointers. You will need to determine a value that is appropriate for the workload and usage of the database, but it is most usual to defragment a table when fragmentation reaches around 30 or 40 percent and the forwarded record count is greater than a value based on workload analysis. Always do this maintenance during slack hours with a low workload, as it can hamper other tasks if the heap table is big. Rebuilding a heap involves removing unused space, making pages contiguous, rebuilding all the indexes and updating the distribution statistics on the heap. Note that rebuilding a heap also rebuilds all of its non-clustered indexes and updates all of its statistics, and that is why it is better to convert it to a clustered table rather than rebuild the heap. Another SQL expert, Paul Randal, stated the same in this article.
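A check along these lines can drive the decision; the 30 percent and 1000-row thresholds below are only illustrative and should come from your own workload analysis:

-- Heaps whose fragmentation or forwarded-record count suggests a rebuild.
-- DETAILED mode is needed to get forwarded_record_count, but it is expensive on large databases.
SELECT OBJECT_SCHEMA_NAME(ps.object_id) AS SchemaName,
       OBJECT_NAME(ps.object_id)        AS TableName,
       ps.avg_fragmentation_in_percent,
       ps.forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ps
WHERE ps.index_type_desc = 'HEAP'
  AND (ps.avg_fragmentation_in_percent > 30
       OR ps.forwarded_record_count > 1000);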

If a heap is getting the pattern of use that results in heavy fragmentation, then it probably shouldn’t be a heap in the first place. Still, if a database has heap tables and there are good reasons for keeping them as heaps, they should be maintained to avoid performance problems and to reclaim the extra space they occupy. After rebuilding a heap, just ensure that your index maintenance job, or any script used for index and statistics maintenance, skips rebuilding the non-clustered indexes and updating the statistics of any heap table that has just been rebuilt, thus avoiding double work and overhead.

Defragmentation in Practice

There are two main ways to defragment a Heap Table:

  1. Create a Clustered Index and then drop it.
  2. Use the ALTER TABLE command to rebuild the heap, as shown below. This REBUILD option is available from SQL Server 2008 onwards, and it can be done with the ONLINE option in Enterprise Edition.
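For example (the table name here is a placeholder, and the ONLINE option requires Enterprise Edition):

-- Rebuild a heap in place (SQL Server 2008 onwards)
ALTER TABLE dbo.MyHeapTable REBUILD;

-- Enterprise Edition only: rebuild while keeping the table available
ALTER TABLE dbo.MyHeapTable REBUILD WITH (ONLINE = ON);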

Which is the best way of doing this? We will test both methods in a simple example and see their side-effects, by creating a HeapRebuildTest table with three columns, along with one non-clustered index and one statistic.
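Something along these lines will serve; the exact column definitions and names are an assumption rather than the article's original script:

-- Test heap: no clustered index, one non-clustered index, one manual statistic
CREATE TABLE dbo.HeapRebuildTest
(
    Id      INT IDENTITY(1,1) NOT NULL,
    Name    VARCHAR(50)       NOT NULL,
    Details VARCHAR(4000)     NULL
);

CREATE NONCLUSTERED INDEX IX_HeapRebuildTest_Id
    ON dbo.HeapRebuildTest (Id);

CREATE STATISTICS st_HeapRebuildTest_Name
    ON dbo.HeapRebuildTest (Name);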

Now, we’ll populate the table with some data:
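A simple batch such as this will do; the row count is arbitrary, chosen only so that the ID range updated later exists:

-- Insert 300,000 narrow rows; Details is left NULL so updates can widen the rows later
INSERT INTO dbo.HeapRebuildTest (Name, Details)
SELECT TOP (300000)
       'Row ' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),
       NULL
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;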

We now update some rows, just to cause forward pointers to be created.
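Widening a variable-length column is enough to force rows off their original pages; the ID range matches the one the article uses again later:

-- Make the rows much wider so they no longer fit on their original pages
UPDATE dbo.HeapRebuildTest
SET Details = REPLICATE('X', 2000)
WHERE Id BETWEEN 100000 AND 200000;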

We can now see the fragmentation using the sys.dm_db_index_physical_stats dynamic management function, selecting just a few columns and filtering to show the fragmentation at the leaf level:
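A query along these lines returns the fragmentation and forwarded-record figures for the heap and its non-clustered index (DETAILED mode is used because forwarded_record_count is not returned in LIMITED mode):

-- Fragmentation and forwarded records for the test table, leaf level only
SELECT OBJECT_NAME(ps.object_id) AS TableName,
       i.name                    AS IndexName,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.forwarded_record_count,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.HeapRebuildTest'),
                                    NULL, NULL, 'DETAILED') AS ps
LEFT JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.index_level = 0;       -- leaf level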

This is the output from our query.

[Screenshot: fragmentation results showing high avg_fragmentation_in_percent and a large forwarded_record_count for the heap]

Next, let’s check the state of the statistics on the table:
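One quick way to see the rowmodctr value discussed later, along with the last statistics update date, is to query the legacy sys.sysindexes compatibility view together with STATS_DATE() (a sketch, not necessarily the article's original query):

-- rowmodctr and last update date for every index and statistic on the test table
SELECT si.name                       AS StatisticOrIndexName,
       si.rowmodctr,
       STATS_DATE(si.id, si.indid)   AS LastStatsUpdate
FROM sys.sysindexes AS si
WHERE si.id = OBJECT_ID('dbo.HeapRebuildTest')
  AND si.indid > 0;                  -- skip the heap row itself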

The query returned this result:

[Screenshot: statistics results showing high rowmodctr values and out-of-date statistics]

Both of the above query results show that the heap and the non-clustered index are heavily fragmented and that the statistics are not updated. This can cause performance problems as well as occupying more space. Let’s check how much time and how many reads are required, by running a query with the SET STATISTICS IO and SET STATISTICS TIME options set to ON to gauge performance.
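The exact query used in the article is not critical; anything that reads the widened rows will do, for example:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Read the rows that were widened, following the forwarded records
SELECT COUNT(*), MAX(Details)
FROM dbo.HeapRebuildTest
WHERE Id BETWEEN 100000 AND 200000;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;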

The result shows that SQL Server performed 26157 logical reads across 5 scans, taking 63 milliseconds of CPU time and 129 milliseconds of elapsed time.

Let’s create and drop the clustered index so as to defragment the table and check the benefit:
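The clustered index key used here is an assumption; any suitable column will serve to rewrite the heap:

-- Method 1: build a clustered index to rewrite the table, then drop it again
CREATE CLUSTERED INDEX CIX_HeapRebuildTest_Id
    ON dbo.HeapRebuildTest (Id);

DROP INDEX CIX_HeapRebuildTest_Id
    ON dbo.HeapRebuildTest;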

Then, let’s again execute the queries to check the fragmentation and statistics.

[Screenshot: fragmentation results after the rebuild, with fragmentation removed and forwarded_record_count of 0]

Great! Now the fragmentation is eliminated and the forwarded_record_count is 0! Let’s see the statistics:

[Screenshot: statistics results showing rowmodctr = 0 for the statistic and the index]

The statistics have also been updated, with rowmodctr = 0 for both the statistic and the index. Good! What is rowmodctr? It shows the number of changes, in terms of inserts, updates and deletes, that have happened since the last time the statistics were updated. This, along with the STATS_DATE() function, is a good indicator of stale statistics. Let’s re-execute the same query and check the performance benefit.

[Screenshot: SET STATISTICS IO and TIME output after defragmentation]

This time the query performed 5835 logical reads, and took less CPU time and less elapsed time. Good!

Let’s test the second way:

After populating the table again and running the update command for IDs between 100000 and 200000, this is the resulting fragmentation:

[Screenshot: fragmentation results after repopulating and updating the table]

This time, the heap shows less fragmentation because of the rebuild that we did earlier by creating and dropping the clustered index. Since space had been released back to SQL Server and the pages had been defragmented, SQL Server could manage space more efficiently when placing rows as the records were inserted, which resulted in comparatively less fragmentation.

[Screenshot: statistics results showing stale statistics and fragmented indexes]

The indexes and statistics are again stale and fragmented. Let’s run the command below:
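This is the second method from the list above, applied to our test table:

-- Method 2: rebuild the heap in place
ALTER TABLE dbo.HeapRebuildTest REBUILD;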

Fragmentation:

[Screenshot: fragmentation results after ALTER TABLE ... REBUILD]

Statistics:

[Screenshot: statistics results after ALTER TABLE ... REBUILD]

The above results show that the index was defragmented and the statistics were updated by this method too. The only difference is that one automatic statistic was created, and its statistics update time was modified but not its rowmodctr. It can’t be concluded that either method is the better way of rebuilding the index and updating the statistics, because both methods do much the same thing under the hood. However, the additional work of creating and dropping a clustered index sorts the data by the index key and, once the index is dropped, makes some adjustments in the PFS (Page Free Space) pages to reflect the free space on each page, which can prove to give a performance advantage.

Conclusions

A heap can perform better for certain exceptional uses, but if a heap gets to the point where it requires defragmentation then it probably shouldn’t be a heap, but a clustered table. It seems that, by rebuilding a heap table, you also defragment all of its indexes and update its statistics as a side effect. This consumes a lot of resources and takes significant time, which hurts the overall performance of the database server during maintenance. Even rebuilding a heap with the ONLINE option in Enterprise Edition uses more resources and puts an extra load on tempdb. Never rebuild a heap as a regular routine: instead, check to see whether the indexes are fragmented, whether the fragmentation threshold has been exceeded, or whether the forwarded record count has exceeded a threshold based on workload analysis. If you do rebuild a heap, then you shouldn’t also schedule updating statistics and rebuilding non-clustered indexes for it, because this is done as a side-effect of the defragmentation. Furthermore, analyze all your heap tables and their pattern of usage to see if they would be better converted into clustered tables.