{"id":87231,"date":"2020-05-21T18:35:15","date_gmt":"2020-05-21T18:35:15","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=87231"},"modified":"2022-05-06T17:21:33","modified_gmt":"2022-05-06T17:21:33","slug":"heaps-in-sql-server-part-1-the-basics","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/heaps-in-sql-server-part-1-the-basics\/","title":{"rendered":"Heaps in SQL Server: Part 1 The Basics"},"content":{"rendered":"<p><strong>The series so far:<\/strong><\/p>\n<ol>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/sql-development\/heaps-in-sql-server-part-1-the-basics\/\">Heaps in SQL Server: Part 1 The Basics<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/sql-development\/heaps-in-sql-server-part-2-optimizing-reads\/\">Heaps in SQL Server: Part 2 Optimizing Reads<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/sql-development\/heaps-in-sql-server-part-3-nonclustered-indexes\/\">Heaps in SQL Server: Part 3 Nonclustered Indexes<\/a><\/li>\n<li>\u00a0<a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/sql-development\/heaps-in-sql-server-part-4-pfs-contention\/\">Heaps in SQL Server: Part 4 PFS contention<\/a><\/li>\n<\/ol>\n\n<p>This article is the beginning of a series of articles about Heaps in Microsoft SQL Server. Heaps are rejected by many database developers using Microsoft SQL Server. The concerns about Heaps are even fuelled by Microsoft itself by <strong>generally<\/strong> recommending the use of clustered indexes for every table. Globally renowned SQL Server experts also generally advise that tables in Microsoft SQL Server be provided with a clustered index.<\/p>\n<p>Again, and again, I try to convince developers that a heap can even have advantages. I have discussed many pros and cons with these people and would now like to break a \u201cPRO HEAP\u201d lance. This article deals with the basics. Important system objects that play a major role in Heaps are only superficially presented in this article and described in detail in a follow up article.<\/p>\n<h2>The Basics of Heaps<\/h2>\n<p>Heaps are avoided in Microsoft SQL Server as the devil avoids holy water. One reason for the rejection is that many blog articles by renowned SQL Server Experts indicate that a table should use a clustered index. This article is the beginning of a series of articles that focuses on the broadest possible scope of application pro or con Heaps. The focus is on the heap to decide for yourself whether and when a heap has advantages or disadvantages compared to a clustered index. A sensible decision requires that you understand the way of working and the internal structures.<\/p>\n<h2>What are Heaps<\/h2>\n<p>Heaps are tables without a clustered index. Without an index, no sorting of the data is guaranteed. Data is stored in the table where there is space without a predetermined order. If the table is empty, data records are entered in the table in the order of the <code>INSERT<\/code> commands.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk \">CREATE TABLE dbo.Customers\r\n(\r\n\tId\tINT NOT NULL,\r\n\tName\tVARCHAR (200) NOT NULL,\r\n\tStreet\tVARCHAR (200) NOT NULL,\r\n\tCode\tCHAR (3) NOT NULL,\r\n\tZIP\tVARCHAR (5) NOT NULL,\r\n\tCity\tVARCHAR (200) NOT NULL,\r\n\tState\tVARCHAR (200) NOT NULL\r\n)\r\nGO\r\nINSERT INTO dbo.Customers(Id, Name, Street, Code, Zip, City, State)\r\nVALUES(1,'John Smith','Times Square','123','10001',\r\n'New York','New York');<\/pre>\n<p>The script creates a new table for storing customer data and inserts one row. Since neither an index nor a primary key is used with the <code>CLUSTERED<\/code> option, data will be stored \u201cunsorted\u201d in this table.<\/p>\n<p>If a table does not have a clustered index, the heap can be seen in the system view <code>[sys]. [Indexes] <\/code>which will always have the <code>[Index_Id] = 0<\/code>.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">-- A Heap has always the index_id = 0\r\nSELECT\tobject_id,\r\n\tname,\r\n\tindex_id,\r\n\ttype_desc\r\nFROM\tsys.indexes\r\nWHERE\tobject_id = OBJECT_ID (N'dbo.Customers', N'U');\r\nGO<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"456\" height=\"66\" class=\"wp-image-87233\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/05\/word-image-20.png\" \/><\/p>\n<p><strong>Figure 1: [index_id] = 0 or [index_id] = 1 is reserved for Heaps OR Clustered Indexes<\/strong><\/p>\n<h2>The Structure of Heaps<\/h2>\n<p>Since Heaps are primitive data stores, no complex structures are required to manage Heaps. Heaps have one row in <a href=\"https:\/\/docs.microsoft.com\/de-de\/sql\/relational-databases\/system-catalog-views\/sys-partitions-transact-sql?view=sql-server-ver15\">sys.partitions<\/a>, with [index_id] = 0 for <strong>each<\/strong> partition used by the Heap. When a Heap has multiple partitions, each partition has a Heap structure that contains the data for that specific partition.<\/p>\n<p>Depending on the data types in the Heap, each Heap structure will have one or more allocation units to store and manage the data for a specific partition. At a minimum, each Heap will have one IN_ROW_DATA allocation unit per partition.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"857\" height=\"415\" class=\"wp-image-87234\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/05\/word-image-21.png\" \/><\/p>\n<p><strong>Figure 2: Storage concept of Heaps \/ Clustered Indexes \/ Nonclustered Indexes<\/strong><\/p>\n<h2>Index Allocation Map<\/h2>\n<p>Each table and index use IAM structures (Index Allocation Map) to manage the data pages. An IAM page contains information about blocks (extents) that are used by a table or index per allocation unit.<\/p>\n<p>Data pages of a Heap do not store references to next or previous data pages (links). This is not necessary for Heaps because a Heap does not require data to be sorted according to defined criteria.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT\tSIAU.type_desc,\r\n       SIAU.total_pages,\r\n       SIAU.used_pages,\r\n       SIAU.data_pages,\r\n       SIAU.first_iam_page,\r\n\tsys.fn_PhysLocFormatter(SIAU.first_iam_page) AS iam_page\r\nFROM\tsys.system_internals_allocation_units AS SIAU\r\n\tINNER JOIN sys.partitions AS P\r\n\tON\r\n\t\tSIAU.container_id = \r\n\t\tCASE WHEN SIAU.type IN (1, 3)\r\n\t\t  THEN P.hobt_id\r\n\t\t  ELSE P.partition_id\r\n\t\tEND\r\nWHERE\tP.object_id = OBJECT_ID (N'dbo.Customers', N'U');\r\nGO<\/pre>\n<p>The column <code>[first_iam_page]<\/code> in the <code>[sys].[system_internals_allocation_units]<\/code> system view points to the first IAM page in the chain of IAM pages that manage the allocated data pages of a Heap in a specific partition. Don\u2019t worry about the mystic hex code; it can easily be deciphered with the function <a href=\"https:\/\/www.sqlskills.com\/blogs\/paul\/sql-server-2008-new-undocumented-physical-row-locator-function\/\">sys.fn_PhysLocFormatter<\/a>!<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"843\" height=\"66\" class=\"wp-image-87235\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/05\/word-image-22.png\" \/><\/p>\n<p><strong>Figure 3: Information about page allocations and first IAM page<\/strong><\/p>\n<p>The above query returns information about the storage type, number of pages and the location of the first IAM-page which manages the data pages of the Heap. Microsoft SQL Server only needs the first IAM page because it holds a reference to the next IAM and so on.<\/p>\n<p>To have a look into the secrets of an IAM Page, you can use <code>DBCC PAGE<\/code> but be careful about using undocumented functions in a production system.<\/p>\n<pre class=\"lang:tsql decode:true\">-- route the output of DBCC PAGE to the client\r\nDBCC TRACEON (3604);\r\n-- Show the content of a data page\r\nDBCC PAGE (0, 1, 188, 3);<\/pre>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"843\" height=\"432\" class=\"wp-image-87236\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/05\/word-image-23.png\" \/><\/p>\n<p><strong>Figure 4: A deeper look into the content of an IAM-page<\/strong><\/p>\n<p>The above depiction shows the stored information in the IAM page of our Heap. It says that pages 32,680 \u2013 32,967 are allocated by the table [dbo].[Customers]. So now Microsoft SQL Server knows what data pages hold data when running a SELECT statement against the Heap.<\/p>\n<h2>Page Free Space (PFS)<\/h2>\n<p>The fill level of a data page can only be specified for data pages of a Heap. Unlike a clustered index, the data rows are not sorted and do not have to be entered in a sorted form. It is up to Microsoft SQL Server to decide on which data page a data record is saved.<\/p>\n<p>However, to assess where a record can be saved, Microsoft SQL Server needs to know where there is enough space on allocated data pages to complete the transaction. This information is retrieved via the recorded fill level of the data page. The information is held in the PFS page.<\/p>\n<p>The problem is that this degree of filling is not saved \u201cexactly\u201d. Rather, Microsoft SQL Server only uses \u201crough\u201d percentages to indicate the degree of filling.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"989\" height=\"356\" class=\"wp-image-87237\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/05\/word-image-24.png\" \/><\/p>\n<p>The next higher fill level is updated as soon as the state is exceeded. For example, a data page is ALWAYS 50% full as soon as the first data record is entered.<\/p>\n<p>The current fill level of data pages of a heap can be determined with the &#8211; undocumented &#8211; system function <code>[sys]. [dm_db_database_page_allocations]<\/code>.<\/p>\n<p>The next example shows how the fill level changes when the state (bytes) is exceeded. To do this, a Heap is created that stores 2,004 bytes per data record.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">DROP TABLE IF EXISTS dbo.demo_table;\r\nGO\r\nCREATE TABLE dbo.demo_table\r\n(\r\n\tC1\tINT\t\tNOT NULL\tIDENTITY (1, 1),\r\n\tC2\tCHAR(2000)\tNOT NULL\tDEFAULT ('Test')\r\n);\r\nGO\r\nINSERT INTO dbo.demo_table DEFAULT VALUES;\r\nGO<\/pre>\n<p>After the table has been created, insert one record into the table. Although the data page is filled with only 25%, Microsoft SQL Server records the filling status of the page with 50%.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">-- What pages have been allocated?\r\nSELECT\tallocated_page_page_id,\r\n\tprevious_page_page_id,\r\n\tnext_page_page_id,\r\n\tpage_type_desc,\r\n\tpage_free_space_percent\r\nFROM\tsys.dm_db_database_page_allocations\r\n(\r\n\tDB_ID(),\r\n\tOBJECT_ID(N'dbo.demo_table', N'U'),\r\n\t0,\r\n\tNULL,\r\n\tN'DETAILED'\r\n);\r\nGO<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1100\" height=\"95\" class=\"wp-image-87238\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/05\/word-image-25.png\" \/><\/p>\n<p><strong>Figure 5: The page_free_space_percent<\/strong><\/p>\n<p>When you insert a second row the fill level won\u2019t change since 2 * 2011 bytes = 4,022 bytes do not reach 50%. The fill level only needs to be updated for the third data record!<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">-- Insert a second row into the demo table\r\nINSERT INTO dbo.demo_table DEFAULT VALUES;\r\nGO<\/pre>\n<div class=\"note\">\n<p><em>NOTE: Each data record has a record header that describes the structure of the data record. The structure is stored in 7 bytes. The physical length of a sample data record is therefore not 2,004 bytes but 2,011 bytes. To learn more about the anatomy of a record, read this <a href=\"https:\/\/www.sqlskills.com\/blogs\/paul\/inside-the-storage-engine-anatomy-of-a-record\/\">article<\/a> by Paul Randal.<\/em><\/p>\n<\/div>\n<h2>Conclusion<\/h2>\n<p>This article described the internals of Heaps, but you may still be wondering if there are any advantages to using them. Here is a summary of the advantages and disadvantages which I\u2019ll write about in future articles:<\/p>\n<h3>Advantages of Heaps<\/h3>\n<p>Using a heap can be more efficient than a table with a clustered index. In general, there are some use cases for Heaps like loading staging tables or storing protocol data into a Heap, since there is no need to pay attention to sorting when saving data. Data records are saved on the next possible data page on which there is sufficient space. Furthermore, the INSERT process does not require moving down the B-Tree of an index structure to the data page to save the record!<\/p>\n<h3>Disadvantages of Heaps<\/h3>\n<p>Heaps can have several disadvantages:<\/p>\n<p>A Heap cannot scale if the database design is unsuitable because of PFS contention (will be handled in the next articles in detail!)<\/p>\n<p>You cannot efficiently search for data in a Heap.<\/p>\n<p>The time to search for data in a Heap increases linearly with the volume of data.<\/p>\n<p>A Heap is unsuitable for frequent data updates because of the risk of forwarded records (will be handled in the next articles in detail)<\/p>\n<p>A Heap is horrible for every database administrator when it comes to maintenance because a Heap requires an update of nonclustered indexes when the Heap is rebuilt.<\/p>\n<p>Some of the \u201cdisadvantages\u201d mentioned above can be eliminated or bypassed if you know how a heap \u201cticks\u201d internally. I hope I can convince one or the other that a clustered index is <strong>not always<\/strong> the better choice. How to optimize the handling of Heaps will be described in future articles, so stay tuned!<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Most advice you see online about heaps is to avoid them. In this article, Uwe Ricken describes the basics of heaps so that you can determine when heaps are the best choice. &hellip;<\/p>\n","protected":false},"author":205038,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143529,143531],"tags":[101935,5842,101934],"coauthors":[19794],"class_list":["post-87231","post","type-post","status-publish","format-standard","hentry","category-performance-sql-server","category-t-sql-programming-sql-server","tag-heaps","tag-sql-monitor","tag-why-use-heaps"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/87231","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/205038"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=87231"}],"version-history":[{"count":9,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/87231\/revisions"}],"predecessor-version":[{"id":87247,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/87231\/revisions\/87247"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=87231"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=87231"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=87231"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=87231"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}