{"id":1063,"date":"2011-01-06T00:00:00","date_gmt":"2011-01-06T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/effective-clustered-indexes\/"},"modified":"2021-08-16T15:02:10","modified_gmt":"2021-08-16T15:02:10","slug":"effective-clustered-indexes","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/learn\/effective-clustered-indexes\/","title":{"rendered":"Effective Clustered Indexes"},"content":{"rendered":"<div id=\"pretty\">\n<p class=\"start\">Clustered indexes are the cornerstone of good database design. A poorly-chosen clustered index doesn&#8217;t just lead to high execution times; it has a &#8216;waterfall effect&#8217; on the entire system, causing wasted disk space, poor IO, heavy fragmentation, and more.<\/p>\n<p>This article will present all the attributes that I believe make up an efficient clustered index key, which are:<\/p>\n<ul>\n<li><b>Narrow<\/b> &#8211; as narrow as possible, in terms of the number of bytes it stores<\/li>\n<li><b>Unique<\/b> &#8211; to avoid the need for SQL Server to add a &#8220;uniqueifier&#8221; to duplicate key values<\/li>\n<li><b>Static<\/b> &#8211; ideally, never updated<\/li>\n<li><b>Ever-increasing<\/b> &#8211; to avoid fragmentation and improve write performance<\/li>\n<\/ul>\n<p>By explaining how SQL Server stores clustered indexes and how they work, I will demonstrate why these attributes are so essential in the design of a good, high-performance clustered index.<\/p>\n<h1>How clustered indexes work<\/h1>\n<p>In order to understand the design principles that underpin a good clustered index, we need to discuss how SQL Server stores clustered indexes. All table data is stored in 8 KB data pages. When a table contains a clustered index, the clustered index tells SQL Server how to order the table&#8217;s data pages. It does this by organizing those data pages into a B-tree structure, as illustrated in Figure 1.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1213-Clustered1.jpg\" alt=\"1213-Clustered1.jpg\" \/><\/p>\n<p class=\"caption\">Figure 1: The b-tree structure of a clustered index<\/p>\n<p>It can be helpful, when trying to remember which levels hold which information, to compare the B-tree to an actual tree. You can visualize the root node as the trunk of a tree, the intermediate levels as the branches of a tree, and the leaf level as the actual leaves on a tree.<\/p>\n<p>The leaf level of the B-tree is always level 0, and the root level is always the highest level. Figure 1 shows only one intermediate level but the number of intermediate levels actually depends on the size of the table. A large index will often have more than one intermediate level, and a small index might not have an intermediate level at all.<\/p>\n<p>Index pages in the root and intermediate levels contain the clustering key and a page pointer down into the next level of the B-tree. This pattern will repeat until the leaf node is reached. You&#8217;ll often hear the terms &#8220;leaf node&#8221; and &#8220;data page&#8221; used interchangeably, as the leaf node of a clustered index contains the data pages belonging to the table. In other words, the leaf level of a clustered index is where the actual data is stored, in an ordered fashion based on the clustering key.<\/p>\n<p>Let&#8217;s look at the B-tree again. Figure 2 represents the clustered index structure for a fictional table with 1 million records and a clustering key on <span class=\"STCodeinTextChar\">EmployeeID<\/span>.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1213-Clustered2.jpg\" alt=\"1213-Clustered2.jpg\" \/><\/p>\n<p class=\"caption\">Figure 2: A b-tree index for a 1-million row table<\/p>\n<p>The pages in Level 1 and Level 2, highlighted in green, are index pages. In Level 1, each page contains information for 500,000 records. As discussed, each of these pages stores not half a million rows, but rather half a million clustered index values, plus a pointer down into the associated page on the next level. For example, to retrieve the details for Employee 500, SQL Server would read three pages: the root page in Level 2, the intermediate page in Level 1, and the appropriate leaf level page in Level 0. The root page tells SQL Server which intermediate level page to read, and the intermediate page tells it which specific leaf level page to read.<\/p>\n<div class=\"note\">\n<p class=\"note\"><b>Index seeks and Index scans<\/b><br \/>\n When specific data is returned from data page, in this fashion, it is referred to as an <span class=\"STBold\">index seek<\/span>. The alternative is an <span class=\"STBold\">index scan<\/span>, whereby SQL Server scans all of the leaf level pages in order to locate the required data. As you can imagine, index seeks are almost always much more efficient than index scans. For more information on this topic, please refer to the <i>Further Reading<\/i> section at the end of this article.<\/p>\n<\/div>\n<p>In this manner, SQL Server uses a clustered index structure to retrieve the data requested by a query. For example, consider the following query against the <span class=\"STCodeinTextChar\">Sales.SalesOrderHeader<\/span> table in <span class=\"STCodeinTextChar\">AdventureWorks<\/span>, to return details of a specific order.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT\u00a0 CustomerID ,  \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 OrderDate ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SalesOrderNumber\r\nFROM\u00a0\u00a0\u00a0 Sales.SalesOrderHeader\r\nWHERE\u00a0\u00a0 SalesOrderID = 44242 ;\r\n<\/pre>\n<p>This table has a clustered index on the <span class=\"STCodeinTextChar\">SalesOrderID<\/span> column and SQL Server is able to use it to navigate down through the clustered index B-tree to get the information that is requested. If we were to visualize this operation, it would look something like this:<\/p>\n<table class=\"MsoTableGrid\">\n<tbody>\n<tr>\n<td>\n<p class=\"MsoNormal\"><span class=\"STBold\">Root Node<\/span><\/p>\n<\/td>\n<td>\n<p class=\"style15\"><b>SalesOrderID<\/b><\/p>\n<\/td>\n<td>\n<p class=\"style15\"><b>PageID<\/b><\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\n<p class=\"style15\">NULL<\/p>\n<\/td>\n<td>\n<p class=\"style15\">750<\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\n<p class=\"style15\">59392<\/p>\n<\/td>\n<td>\n<p class=\"style15\">751<\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"MsoNormal\"><span class=\"STBold\">Intermediate level<\/span><\/p>\n<p class=\"MsoNormal\">(Page 750)<\/p>\n<\/td>\n<td>\n<p class=\"style15\"><b>SalesOrderID<\/b><\/p>\n<\/td>\n<td>\n<p class=\"style15\"><b>PageID<\/b><\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\n<p class=\"style15\">44150<\/p>\n<\/td>\n<td>\n<p class=\"style15\">814<\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\n<p class=\"style15\">44197<\/p>\n<\/td>\n<td>\n<p class=\"style15\">815<\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\n<p class=\"style15\">44244<\/p>\n<\/td>\n<td>\n<p class=\"style15\">816<\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\n<p class=\"style15\">44290<\/p>\n<\/td>\n<td>\n<p class=\"style15\">817<\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\n<p class=\"style15\">44333<\/p>\n<\/td>\n<td>\n<p class=\"style15\">818<\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"MsoNormal\"><span class=\"STBold\">Leaf level<\/span><\/p>\n<p class=\"MsoNormal\">(Page 815)<\/p>\n<\/td>\n<td>\n<p class=\"style15\"><b>SalesOrderID<\/b><\/p>\n<\/td>\n<td>\n<p class=\"style15\"><b>OrderDate<\/b><\/p>\n<\/td>\n<td>\n<p class=\"style15\"><b>SalesOrderNumber<\/b><\/p>\n<\/td>\n<td>\n<p class=\"style15\"><b>AccountNumber<\/b><\/p>\n<\/td>\n<td>\n<p class=\"style15\"><b>CustomerID<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\n<p class=\"style15\">44240<\/p>\n<\/td>\n<td>\n<p class=\"style15\">9\/23\/2005<\/p>\n<\/td>\n<td>\n<p class=\"style15\">SO44240<\/p>\n<\/td>\n<td>\n<p class=\"style15\">10-4030-013580<\/p>\n<\/td>\n<td>\n<p class=\"style15\">13580<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\n<p class=\"style15\">44241<\/p>\n<\/td>\n<td>\n<p class=\"style15\">9\/23\/2005<\/p>\n<\/td>\n<td>\n<p class=\"style15\">SO44241<\/p>\n<\/td>\n<td>\n<p class=\"style15\">10-4030-028155<\/p>\n<\/td>\n<td>\n<p class=\"style15\">28155<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\n<p class=\"style15\">44242<\/p>\n<\/td>\n<td>\n<p class=\"style15\">9\/23\/2005<\/p>\n<\/td>\n<td>\n<p class=\"style15\">SO44242<\/p>\n<\/td>\n<td>\n<p class=\"style15\">10-4030-028163<\/p>\n<\/td>\n<td>\n<p class=\"style15\">28163<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In the root node, the first entry points to <span class=\"STCodeinTextChar\">PageID<\/span> 750, for any values with a <span class=\"STCodeinTextChar\">SalesOrderID<\/span> between <span class=\"STCodeinTextChar\">NULL<\/span> and 59391. The data we&#8217;re looking for, with a <span class=\"STCodeinTextChar\">SalesOrderID<\/span> of 44242, falls within that range, so we navigate down to page 750, in the intermediate level. Page 750 contains more granular data than the root node and indicates that the <span class=\"STCodeinTextChar\">PageID<\/span> 815 contains <span class=\"STCodeinTextChar\">SalesOrderID<\/span> values between 44197 and 44243. We navigate down to that page in the leaf level and, finally, upon loading <span class=\"STCodeinTextChar\">PageID<\/span> 815, we find all of our data for <span class=\"STCodeinTextChar\">SalesOrderID<\/span> 44242.<\/p>\n<h1>Characteristics of an effective clustered index<\/h1>\n<p>Based on this understanding of how a clustered index works, let&#8217;s now examine why and how this dictates the components of an <i>effective<\/i> clustered index key: narrow, unique, static, and ever-increasing.<\/p>\n<h2>Narrow<\/h2>\n<p>The width of an index refers to the number of bytes in the index key. The first important characteristic of the clustered index key is that it is as narrow as is practical. To illustrate why this is important, consider the following <span class=\"STCodeinTextChar\">narrow_example<\/span> table:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE TABLE dbo.narrow_example\r\n\u00a0\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 web_id \u00a0\u00a0\u00a0\u00a0\u00a0INT IDENTITY(1,1), -- unique  \u00a0\u00a0\u00a0\u00a0\u00a0 web_key \u00a0\u00a0\u00a0\u00a0UNIQUEIDENTIFIER , -- unique\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 log_date \u00a0\u00a0\u00a0DATETIME , -- not unique\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 customer_id INT -- not unique\r\n\u00a0\u00a0\u00a0 ) ;<\/pre>\n<p>The table has been populated with 10 million rows and table contains two columns that are candidates for use as the clustering key:<\/p>\n<ul>\n<li><span class=\"STCodeinTextChar\">web_id<\/span> &#8211; a fixed-length <span class=\"STCodeinTextBold\">int<\/span> data type, consuming 4 bytes of space<\/li>\n<li><span class=\"STCodeinTextChar\">web_key<\/span> &#8211; a fixed-length <span class=\"STCodeinTextBold\">uniqueidentifier<\/span> data type, consuming 16 bytes.<\/li>\n<\/ul>\n<div class=\"note\">\n<p class=\"note\"><b>TIP:<\/b> <br \/>\n Use the <span class=\"STCodeinTextChar\">DATALENGTH<\/span> function to find how many bytes are being used to store the data in a column.<\/p>\n<\/div>\n<p>So, which column will make a better clustered index key? Let&#8217;s take a look at the B-tree structure of each, shown in Figure 3.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1213-Clustered3.jpg\" alt=\"1213-Clustered3.jpg\" \/><\/p>\n<p class=\"caption\">Figure 3: The b-tree levels for clustered indexes based on <span class=\"STCodeinTextChar\">int<\/span> and <span class=\"STCodeinTextChar\">uniqueidenitifier<\/span> key<\/p>\n<p>The most obvious difference is that the <span class=\"STCodeinTextChar\">uniqueidentifier<\/span> key has an additional non-leaf level, giving 4 levels to its tree, as opposed to only 3 levels for the <span class=\"STCodeinTextChar\">int<\/span> key. The simple reason for this is that the <span class=\"STCodeinTextChar\">uniqueidentifier<\/span> consumes 300% more space than the <span class=\"STCodeinTextChar\">int<\/span> data type, and so when we create a clustered key on <span class=\"STCodeinTextChar\">uniqueidentifier<\/span>, fewer rows can be packed into each index page, and the clustered key requires an additional non-leaf level to store the keys.<\/p>\n<p>Conversely, using a narrow <span class=\"STCodeinTextChar\">int<\/span> column for the key allows SQL Server to store more data per page, meaning that it has to traverse fewer levels to retrieve a data page, which minimizes the IO required to read the data. The potential benefit of this is large, especially for range scan queries, where more than one row is required to fulfill the query criteria. In general, the more data you can fit onto a page, the better your table can perform. This is why appropriate choice of data types is such an essential component of good database design.<\/p>\n<p>However, our choice of clustering key can affect the performance of not only the clustered index, but also any non-clustered indexes that rely on the clustered index. As shown in Figure 4, a non-clustered index contains the clustered index key in every level of its b-tree structure, as a pointer back into the clustered index. This happens regardless of whether or not the clustering key was explicitly included in the nonclustered index structure, either as part of the index key or as an included column. In other words, whereas in the clustered index the leaf level contains the actual data rows, in a nonclustered index, the leaf level contains the clustered key, which SQL Server uses to find the rest of the data.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1213-Clustered4.jpg\" alt=\"1213-Clustered4.jpg\" \/><\/p>\n<p class=\"caption\">Figure 4: Non-clustered indexes also store the clustering key in order to look up data in the clustered index<\/p>\n<p>So, let&#8217;s see how our choice of clustering key impacts the potential performance of our non-clustered indexes. We&#8217;ll keep the example pretty simple and create a non-clustered index on <span class=\"STCodeinTextChar\">customer_id<\/span>, which is an <span class=\"STCodeinTextChar\">int<\/span> data type.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE NONCLUSTERED INDEX IX_example_customerID\r\nON dbo.narrow_example (customer_id) ;<\/pre>\n<p>Figure 5 shows the resulting B-tree structures of our nonclustered index, depending on whether we used the <span class=\"STCodeinTextChar\">uniqueidentifier<\/span> or the <span class=\"STCodeinTextChar\">int<\/span> column for our clustered index key.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1213-Clustered5.jpg\" alt=\"1213-Clustered5.jpg\" \/><\/p>\n<p class=\"caption\">Figure 5<\/p>\n<p>While we have the same number of levels in each version of the index, notice that the non-clustered index based on the <span class=\"STCodeinTextChar\">int<\/span> clustering key stores 86% more data in each leaf-level data page than its <span class=\"STCodeinTextChar\">uniqueidentifier<\/span> counterpart. Once again, the more rows you can fit on a page, the better the overall system performance: range-scan queries on the narrow <span class=\"STCodeinTextChar\">int<\/span> version will consume less IO and execute faster than equivalent queries on the wider, <span class=\"STCodeinTextChar\">uniqueidentifier<\/span> version.<\/p>\n<p>In this example, I&#8217;ve kept the table and index structures simple in order to better illustrate the basic points. In a production environment, you&#8217;ll often encounter tables that are much, much wider. It&#8217;s possible that such tables will require a composite clustered index, where the clustering key is comprised of more than one column. That&#8217;s okay; the point isn&#8217;t to advise you to base all of your clustered keys on integer <span class=\"STCodeinTextChar\">IDENTITY<\/span> columns, but to demonstrate that a wide index key can have on a significant, detrimental impact on a database&#8217;s performance, compared to a narrow index key. Remember, narrowness refers more to the number of bytes consumed than the number of columns. For example, a composite clustered key on three <span class=\"STCodeinTextChar\">int<\/span> columns would still be narrower than a <span class=\"STCodeinTextChar\">uniqueidentifier<\/span> key (4 + 4 + 4 = 12 bytes for the former vs. 16 bytes for the latter).<\/p>\n<h2>Unique<\/h2>\n<p>Index uniqueness is another highly desirable attribute of a clustering key, and goes hand-in-hand with index narrowness. SQL Server does not require a clustered index to be unique, but yet it must have some means of uniquely identifying every row. That&#8217;s why, for non-unique clustered indexes, SQL Server adds to every duplicate instance of a clustering key value a 4-byte integer value called a <span class=\"STCodeinTextChar\">uniqueifier<\/span>. This <span class=\"STCodeinTextChar\">uniqueifier<\/span> is added everywhere the clustering key is stored. That means the <span class=\"STCodeinTextChar\">uniqueifier<\/span> is stored in every level of the B-tree, in both clustered and non-clustered indexes. As you can imagine, if there are many rows using the same clustering key value, this can become quite expensive.<\/p>\n<p>What&#8217;s more, the <span class=\"STCodeinTextChar\">uniqueifier<\/span> is stored as a variable-length column. This is important because if a table does not already contain any other variable-length columns, each duplicate value is actually consuming 8-bytes of overhead: 4 bytes for the <span class=\"STCodeinTextChar\">uniqueifier<\/span> value and 4 bytes to manage variable-length columns on the row. The following example demonstrates this. We create a table with a non-unique clustered index, insert into it a single row, and then retrieve minimum and maximum record sizes (which currently refer to the same, single record) from the <span class=\"STCodeinTextChar\">sys.dm_db_index_physical_stats<\/span> DMV:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE TABLE dbo.overhead ( myID INT NOT NULL ) ;\r\n\r\nCREATE CLUSTERED INDEX CIX_overhead -- not unique!\r\nON dbo.overhead(myID) ;\r\n\r\nINSERT\u00a0 INTO dbo.overhead\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( myID )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0 1 ;\r\n SELECT\u00a0 min_record_size_in_bytes ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 max_record_size_in_bytes  FROM\u00a0\u00a0\u00a0 sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.overhead'),\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL, NULL, N'SAMPLED') ;\r\n\r\nmin_record_size_in_bytes\u00a0\u00a0 max_record_size_in_bytes\r\n------------------------ ------------------------\r\n11\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 11\r\n\r\n(1 row(s) affected)\r\n<\/pre>\n<p class=\"MsoNormal\">Although we only have a single column in the table, there is a minimum of 7 bytes of overhead per row, in SQL Server. While this overhead may increase with the addition of <span class=\"STCodeinTextChar\">NULL<\/span> or variable-length columns, it will never be less than 7 bytes per row. The other 4 bytes are used to store the int column, myID.<\/p>\n<p class=\"MsoNormal\">Now let&#8217;s insert a duplicate value into the table:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">INSERT\u00a0 INTO dbo.overhead\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( myID )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0 1 ;\r\n\r\nSELECT\u00a0 min_record_size_in_bytes ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 max_record_size_in_bytes\r\nFROM\u00a0\u00a0\u00a0 sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.overhead'),\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL, NULL, N'SAMPLED') ;\r\n\r\nmin_record_size_in_bytes\u00a0\u00a0 max_record_size_in_bytes\r\n------------------------ ------------------------\r\n11\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 19\r\n\r\n(1 row(s) affected)\r\n<\/pre>\n<p>The duplicate value requires the addition of a <span class=\"STCodeinTextChar\">uniqueifier<\/span>, which consumes an extra 4 bytes. However, since a variable-length column, such as a <span class=\"STCodeinTextChar\">varchar()<\/span> column, does not already exist on the table, an additional 4 bytes are added by SQL Server to manage the variable-length properties of the <span class=\"STCodeinTextChar\">uniqueifier<\/span>. This brings the total <span class=\"STCodeinTextChar\">uniqueifier<\/span> overhead to 8 bytes per row.<\/p>\n<div class=\"note\">\n<p class=\"note\"><b>TIP:<\/b><br \/>\n The <span class=\"STCodeinTextChar\">sys.dm_db_index_physical_stats<\/span> DMV runs in three modes: <span class=\"STCodeinTextChar\">LIMITED<\/span>, <span class=\"STCodeinTextChar\">SAMPLED<\/span>, or <span class=\"STCodeinTextChar\">DETAILED<\/span>. The <span class=\"STCodeinTextChar\">min_record_size_in_bytes <\/span>and <span class=\"STCodeinTextChar\">max_record_size_in_bytes<\/span> columns are only available in <span class=\"STCodeinTextChar\">SAMPLED<\/span> or <span class=\"STCodeinTextChar\">DETAILED<\/span> mode. Be careful when running this DMV in production or on large tables, as the <span class=\"STCodeinTextChar\">SAMPLED<\/span> mode scans 1% of pages and <span class=\"STCodeinTextChar\">DETAILED<\/span> modes scans all pages. Refer to Books Online for more information.<\/p>\n<\/div>\n<p>So, returning to our original <span class=\"STCodeinTextChar\">narrow_example<\/span> table, let&#8217;s see what would happen if the clustering key was changed to <span class=\"STCodeinTextChar\">customer_id<\/span>, which is a non-unique <span class=\"STCodeinTextChar\">int<\/span>. Although the <span class=\"STCodeinTextChar\">uniqueifier<\/span> is not readily visible and cannot be queried, internally the leaf-level page might look something like this:<\/p>\n<table class=\"MsoTableGrid\">\n<tbody>\n<tr>\n<td>\n<p class=\"style17\">web_id<\/p>\n<\/td>\n<td>\n<p class=\"style17\">web_key<\/p>\n<\/td>\n<td>\n<p class=\"style17\">log_date<\/p>\n<\/td>\n<td>\n<p class=\"style17\">customer_id<\/p>\n<\/td>\n<td>\n<p class=\"style17\">uniqueifier<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"style15\">1<\/p>\n<\/td>\n<td>\n<p class=\"style15\">6870447C-A0EC-4B23-AE5F-9A92A00CE166<\/p>\n<\/td>\n<td>\n<p class=\"style15\">12\/15\/2010<\/p>\n<\/td>\n<td>\n<p class=\"style15\">1<\/p>\n<\/td>\n<td>\n<p class=\"style15\">NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"style15\">2<\/p>\n<\/td>\n<td>\n<p class=\"style15\">5AB480CF-40CD-43FD-8C3D-5C625875E143<\/p>\n<\/td>\n<td>\n<p class=\"style15\">12\/15\/2010<\/p>\n<\/td>\n<td>\n<p class=\"style15\">1<\/p>\n<\/td>\n<td>\n<p class=\"style15\">1<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"style15\">3<\/p>\n<\/td>\n<td>\n<p class=\"style15\">95C312B9-83AF-4725-B53C-77615342D177<\/p>\n<\/td>\n<td>\n<p class=\"style15\">12\/15\/2010<\/p>\n<\/td>\n<td>\n<p class=\"style15\">1<\/p>\n<\/td>\n<td>\n<p class=\"style15\">2<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"style15\">4<\/p>\n<\/td>\n<td>\n<p class=\"style15\">88AA4497-9A20-4AB7-9704-1FDFAE200564<\/p>\n<\/td>\n<td>\n<p class=\"style15\">12\/15\/2010<\/p>\n<\/td>\n<td>\n<p class=\"style15\">2<\/p>\n<\/td>\n<td>\n<p class=\"style15\">NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"style15\">5<\/p>\n<\/td>\n<td>\n<p class=\"style15\">E3EA3014-FC23-48B6-9205-EE6D06D37C5B<\/p>\n<\/td>\n<td>\n<p class=\"style15\">12\/15\/2010<\/p>\n<\/td>\n<td>\n<p class=\"style15\">2<\/p>\n<\/td>\n<td>\n<p class=\"style15\">1<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"style15\">6<\/p>\n<\/td>\n<td>\n<p class=\"style15\">9F6A8933-F6EC-416F-AACA-1C3FF172151C<\/p>\n<\/td>\n<td>\n<p class=\"style15\">12\/15\/2010<\/p>\n<\/td>\n<td>\n<p class=\"style15\">3<\/p>\n<\/td>\n<td>\n<p class=\"style15\">NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"style15\">7<\/p>\n<\/td>\n<td>\n<p class=\"style15\">B16406A8-649B-4E7A-A234-C7B7D8FCE2D3<\/p>\n<\/td>\n<td>\n<p class=\"style15\">12\/15\/2010<\/p>\n<\/td>\n<td>\n<p class=\"style15\">4<\/p>\n<\/td>\n<td>\n<p class=\"style15\">NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"style15\">8<\/p>\n<\/td>\n<td>\n<p class=\"style15\">443B627B-21CE-4466-AD15-1879C8749225<\/p>\n<\/td>\n<td>\n<p class=\"style15\">12\/15\/2010<\/p>\n<\/td>\n<td>\n<p class=\"style15\">4<\/p>\n<\/td>\n<td>\n<p class=\"style15\">1<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"style15\">9<\/p>\n<\/td>\n<td>\n<p class=\"style15\">2F3757DE-3799-4246-BA88-944C5DA3683E<\/p>\n<\/td>\n<td>\n<p class=\"style15\">12\/15\/2010<\/p>\n<\/td>\n<td>\n<p class=\"style15\">4<\/p>\n<\/td>\n<td>\n<p class=\"style15\">2<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"style15\">10<\/p>\n<\/td>\n<td>\n<p class=\"style15\">25D9F2AA-6610-48CD-9AC4-4F1E29FDED1C<\/p>\n<\/td>\n<td>\n<p class=\"style15\">12\/15\/2010<\/p>\n<\/td>\n<td>\n<p class=\"style15\">4<\/p>\n<\/td>\n<td>\n<p class=\"style15\">3<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p class=\"MsoNormal\">The <span class=\"STCodeinTextChar\">uniqueifier<\/span> is <span class=\"STCodeinTextChar\">NULL<\/span> for the first instance of each <span class=\"STCodeinTextChar\">customer_id<\/span>, and is then populated, in ascending order, for each subsequent row with the same <span class=\"STCodeinTextChar\">customer_id<\/span> value. The overhead for rows with a <span class=\"STCodeinTextChar\">NULL<\/span> <span class=\"STCodeinTextChar\">uniqueifier<\/span> value is, unsurprisingly, zero bytes. This is why min_record_size_in_bytes remained unchanged in the overhead table; the first insert had a <span class=\"STCodeinTextChar\">uniqueifier<\/span> value of <span class=\"STCodeinTextChar\">NULL<\/span>. This is also why it is impossible to estimate how much additional storage overhead will result from the addition of a <span class=\"STCodeinTextChar\">uniqueifier<\/span>, without first having a thorough understanding of the data being stored. For example, a non-unique clustered index on a <span class=\"STCodeinTextChar\">datetime<\/span> column may have very little overhead if data is inserted, say, once per minute. However, if that same table is receiving thousands of inserts per minute, then it is likely that many rows will share the same <span class=\"STCodeinTextChar\">datetime<\/span> value, and so the <span class=\"STCodeinTextChar\">uniqueifier<\/span> will have a much higher overhead.<\/p>\n<p class=\"MsoNormal\">If your requirements seem to dictate the use of a non-unique clustered key, my advice would be to look to see if there are a couple of relatively narrow columns that, together, can form a unique key. You&#8217;ll still see the increase in the row size for your clustering key in the index pages of both your clustered and nonclustered indexes, but you&#8217;ll at least save the cost of the <span class=\"STCodeinTextChar\">uniqueifier<\/span> in the data pages of the leaf level of your clustered index. Also, instead of storing an arbitrary <span class=\"STCodeinTextChar\">uniqueifier<\/span> value to the index key, which is meaningless in the context of your data, you would be adding meaningful and potentially useful information to all of your nonclustered indexes.<\/p>\n<p>A good clustered index is also built upon static, or unchanging, columns. That is, you want to choose a clustering key that will never be updated. SQL Server must ensure that data exists in a logical order based upon the clustering key. Therefore, when the clustering key value is updated, the data may need to be moved elsewhere in the clustered index so that the clustering order is maintained. Consider a table with a clustered index on <span class=\"STCodeinTextChar\">LastName<\/span>, and two non-clustered indexes, where the last name of an employee must be updated.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1213-Clustered6.jpg\" alt=\"1213-Clustered6.jpg\" \/><\/p>\n<p class=\"caption\">Figure 6: The effect of updating a clustered key column<\/p>\n<p>Not only is the clustered index updated and the actual data row moved &#8211; most likely to a new data page &#8211; but each non-clustered index is also updated. In this particular example, at least three pages will be updated. I say &#8220;at least&#8221; because there are many more variables involved, such as whether or not the data needs to be moved to a new page. Also, as discussed earlier, the upper levels of the B-tree contain the clustering key as pointers down into the leaf level. If one of those index pages happens to contain the clustering key value that is being updated, that page will also need to be updated. For now, though, let&#8217;s assume only three pages are affected by the <span class=\"STCodeinTextChar\">UPDATE<\/span> statement, and compare this to behavior we see for the same <span class=\"STCodeinTextChar\">UPDATE<\/span>, but with a clustering key on <span class=\"STCodeinTextChar\">ID<\/span> instead of <span class=\"STCodeinTextChar\">LastName<\/span>.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1213-Clustered7.jpg\" alt=\"1213-Clustered7.jpg\" \/><\/p>\n<p class=\"caption\">Figure 7: An <span class=\"STCodeinTextChar\">UPDATE<\/span> that does not affect the clustered index key<\/p>\n<p>In Figure 7, only the data page in the clustered index is changed because the clustering key is not affected by the <span class=\"STCodeinTextChar\">UPDATE<\/span> statement. Since only one page is updated instead of three, clustering on <span class=\"STCodeinTextChar\">ID<\/span> requires less IO than clustering on <span class=\"STCodeinTextChar\">LastName<\/span>. Also, updating fewer pages means the <span class=\"STCodeinTextChar\">UPDATE<\/span> can complete in less time.<\/p>\n<p>Of course, this is another simplification of the process. There are other considerations that can affect how many pages are updated, such as whether an update to a variable-length column causes the row to exceed the amount of available space. In such a case, the data would still need to be moved, although only the data page of the clustered index is affected; nonclustered indexes would remain untouched.<\/p>\n<p>Nevertheless, updating the clustering key is clearly more expensive than updating a non-key column. Furthermore, the cost of updating a clustering key increases as the number of non-clustered indexes increases. Therefore, it is a best practice is to avoid clustering on columns that are frequently updated, especially in systems where <span class=\"STCodeinTextChar\">UPDATE<\/span> performance is critical.<\/p>\n<h1>Ever-Increasing<\/h1>\n<p>The last important attribute of a clustered index key is that it is ever-increasing. In addition to narrow, unique, and static, an integer <span class=\"STCodeinTextChar\">identity<\/span> column is an excellent example of an ever-increasing column. The <span class=\"STCodeinTextChar\">identity<\/span> property continuously increments by the value defined at creation, which is typically one. This allows SQL Server, as new rows are inserted, to keep writing to the same page until the page is full, then repeating with a newly allocated page.<\/p>\n<p>There are two primary benefits to an ever-increasing column:<\/p>\n<ol>\n<li><b>Speed of insert<\/b> &#8211; SQL Server can much more efficiently write data if it knows the row will always be added to the most recently allocated, or last, page<\/li>\n<li><b>Reduction in clustered index fragmentation<\/b> &#8211; this fragmentation results from data modifications and can take the form of gaps in data pages, so wasting space, and a logical ordering of the data that no longer matches the physical ordering.<\/li>\n<\/ol>\n<p>However, before we can discuss the effect of the choice of clustering key on insert performance and index fragmentation, we need to briefly review the types of fragmentation that can occur.<\/p>\n<h2>Internal and external index fragmentation<\/h2>\n<p>There are two types of index fragmentation, which can occur in both clustered and non-clustered indexes: extent (<span class=\"STItalic\">a.k.a.<\/span> external) and page (<i>a.k.a.<\/i> internal) fragmentation. First, however, Figure 8 illustrates an un-fragmented index.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1213-Clustered8.jpg\" alt=\"1213-Clustered8.jpg\" \/><\/p>\n<p class=\"caption\">Figure 8: Data pages in an un-fragmented clustered index<\/p>\n<p>In this simplified example, a page is full if it contains 3 rows, and in Figure 8 you can see that every page is full and the physical ordering of the pages is sequential. In extent fragmentation, also known as <i>external fragmentation<\/i>, the pages get out of physical order, as a result of data modifications. The pages highlighted in orange in Figure 9 are the pages that are externally fragmented. This type of fragmentation can result in random IO, which does not perform as well as sequential IO.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1213-Clustered9.jpg\" alt=\"1213-Clustered9.jpg\" \/><\/p>\n<p class=\"caption\">Figure 9: External fragmentation in a clustered index<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1213-Clustered10.jpg\" alt=\"1213-Clustered10.jpg\" \/><\/p>\n<p class=\"caption\">Figure 10: Internal fragmentation in a clustered index<\/p>\n<p>Figure 10 illustrates page fragmentation, also known as <i>internal fragmentation<\/i>, and refers to the fact that the there are gaps in the data pages, which reduces the amount of data that can be stored on each page, and so increase the overall amount of space needed to store the data. Again, the pages in orange indicate an internally fragmented page.<\/p>\n<p>For example, comparing Figures 8 and 10, we can see that the un-fragmented index holds 15 data rows in 5 pages. By contrast, the index with internal fragmentation only holds 9 data rows in the same number of pages. This is not necessarily a big issue for singleton queries, where just a single record is needed to fulfill the request. However, when pages are not full and additional pages are required to store the data, range-scan queries will feel the effects, as more IO will be required to retrieve those additional pages.<\/p>\n<p>Most indexes suffering from fragmentation will often have both extent <i>and <\/i>page fragmentation.<\/p>\n<h2>How non-sequential keys can increase fragmentation<\/h2>\n<p>Clustering on ever-increasing columns such as <span class=\"STCodeinTextChar\">identity<\/span> integers will result in an un-fragmented index, as illustrated in Figure 8. This results in sequential IO and maximizes the amount of data stored per page, resulting in the most efficient use of system resources. It also results in very fast write performance.<\/p>\n<p>Use of a non-sequential key column can, however, result in a much higher overhead during insertion. First, SQL Server has to find the correct page to write to and pull it into memory. If the page is full, SQL Server will need to perform a <b>page split<\/b> to create more space. During a page split, a new page is allocated, and half the records are moved from the old page to the newly-allocated page. Each page has a pointer to the previous and next page in the index, so those pages will also need to be updated. Figure 11 illustrates the results of a page split.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1213-Clustered11.jpg\" alt=\"1213-Clustered11.jpg\" \/><\/p>\n<p class=\"caption\">Figure 11: Internal and external fragmentation as a result of a page split<\/p>\n<p>Initially, we have two un-fragmented pages, each holding 3 rows of data. However, a request to insert &#8220;coconut&#8221; into the table results in a page split, because Page 504, where the data naturally belongs, is full. SQL Server allocates a new page, Page 836, to store the new row. In the process, it also moves half the data from Page 504 to the new page in order to make room for new data in the future. Lastly, it updates the previous and next pointers in both pages 504 and 505. We&#8217;re left with Page 836 out of physical ordering, and both pages 504 and 836 contain free space. As you can see, not only would writes to this latter scenario be slower, but both internal and external fragmentation of the table would be much higher.<\/p>\n<p>I once saw a table with 4 billion rows clustered on a non-sequential uniqueidentifier, also known as a GUID. The table had a fragmentation level of 99.999%. Defragging the table and changing the clustering key to an <span class=\"STCodeinTextChar\">identity<\/span> integer resulted in a space savings of over 200 GB. Extreme, yes, but it illustrates just how much impact an ever-increasing clustering key can have on table.<\/p>\n<p>I am not suggesting that you only create clustered indexes on <span class=\"STCodeinTextChar\">identity<\/span> integer columns. Fragmentation, although generally undesirable, primarily impacts range-scan queries; singleton queries would not notice much impact. Even range-scan queries can benefit from routine defragmentation efforts. However, the ever-increasing attribute of a clustered key is something to consider, and is especially important in OLTP systems where <span class=\"STCodeinTextChar\">INSERT<\/span> speed is important.<\/p>\n<h1>Summary<\/h1>\n<p>In this article, I&#8217;ve discussed the most desirable attributes of a clustered index: narrow, unique, static, and ever-increasing. I&#8217;ve explained what each attribute is and why each is important. I&#8217;ve also presented the basics of B-tree structure for clustered and non-clustered indexes. The topic of &#8220;indexing strategy&#8221; is vast topic and we&#8217;ve only scratched the surface. Beyond what I presented in this article, there are also many application-specific considerations when choosing a clustering key, such as how data will be accessed and the ability to use the clustered index in range-scan queries. As such, I&#8217;d like to stress that the attributes discussed in this article are not concrete rules but rather time-proven guidelines. The best thing to do if you&#8217;re not sure if you&#8217;ve chosen the best clustering key is to test and compare the performance of different strategies.<\/p>\n<h2>Further Reading<\/h2>\n<ul class=\"reference-list\">\n<li><a href=\"http:\/\/www.simple-talk.com\/sql\/database-administration\/brads-sure-guide-to-indexes\/\">Brad&#8217;s Sure Guide to Indexes<\/a> Brad McGehee&#8217;s &#8220;ground level&#8221; overview of indexes and how they work<\/li>\n<li><a href=\"http:\/\/www.simple-talk.com\/sql\/database-administration\/defragmenting-indexes-in-sql-server-2005-and-2008\/\">Defragmenting Indexes in SQL Server 2005 and 2008<\/a> Rob Sheldon on investigating, and fixing, index fragmentation using <span class=\"STCodeinTextChar\">sys.dm_db_index_physical_stats.<\/span><\/li>\n<li><a href=\"http:\/\/www.sqlteam.com\/article\/sql-server-indexes-the-basics\">SQL Server Indexes: The Basics, by Kathi Kellenberger<\/a><\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>As a guideline, clustered Indexes should be Narrow, Unique, Static and Ever Increasing (NUSE).  Michelle Ufford Explains why.<\/p>\n<p>&hellip;<\/p>\n","protected":false},"author":221900,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143525],"tags":[5334,4149,5333,5966,4150],"coauthors":[11325],"class_list":["post-1063","post","type-post","status-publish","format-standard","hentry","category-learn","tag-clustered-indexes","tag-learn-sql-server","tag-michelle-ufford","tag-monitor","tag-sql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1063","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221900"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1063"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1063\/revisions"}],"predecessor-version":[{"id":76666,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1063\/revisions\/76666"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1063"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1063"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1063"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1063"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}