{"id":104708,"date":"2024-12-03T21:58:25","date_gmt":"2024-12-03T21:58:25","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=104708"},"modified":"2025-01-03T20:04:04","modified_gmt":"2025-01-03T20:04:04","slug":"mysql-index-deep-dive-clustered-b-tree-indexes","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/mysql-index-deep-dive-clustered-b-tree-indexes\/","title":{"rendered":"MySQL Index Overviews: Clustered B-Tree Indexes"},"content":{"rendered":"\n<p>If you\u2019re a frequent reader of the Simple-Talk blog, you\u2019re no stranger to indexes as <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/the-nuances-of-mysql-indexes\/\">we\u2019ve covered indexes in database management systems previously<\/a> \u2013 in that blog, we\u2019ve told you that indexes come in a variety of flavors including ordinary Balanced Tree (B-Tree) indexes and R-Tree, or spatial, indexes. B-Tree indexes have multiple types: they can be covering, composite, descending, <code>FULLTEXT<\/code>, <code>UNIQUE<\/code>, hash-based, or have something to do with the <code>PRIMARY KEY<\/code>. B-Tree indexes can also have a clustered form: and that form is what this blog is all about.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-are-clustered-indexes\">What are Clustered Indexes?<\/h2>\n\n\n\n<p>In the MySQL world, a clustered index is an index that stores the row data of a table in an ordered format. Some database programmers refer to clustered indexes as primary keys, and this is a very typical use of a clustered index in MySQL especially.<\/p>\n\n\n\n<p>A table can only have one clustered index and that\u2019s because data inside of a table can only be stored in one order.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-introduction-to-clustered-indexes\">Introduction to Clustered Indexes<\/h2>\n\n\n\n<p>Clustered indexes can be defined as data structures that order rows in data pages in the same way that they are ordered in the index. <em>Clustered indexes are special in that a table cannot have more than one clustered index defined on one column.<\/em> Many use cases of clustered indexes involve primary keys and that\u2019s not without a reason either \u2013 these indexes often have an <code>AUTO_INCREMENT<\/code> (or \u201cA_I\u201d for short) parameters that make <code>NULL<\/code> values inside of that column become automatically incrementing numeric IDs as values are inserted.<\/p>\n\n\n\n<p>With that being said, an <code>AUTO_INCREMENT<\/code> feature on top of a clustered index isn\u2019t a necessity, but many tables use a numeric ID field, the ability for the clustered index to increment values of rows automatically is an awesome feat. As noted, since clustered indexes store data in the order of the data, inserting sorted data is typically favorable, too.<\/p>\n\n\n\n<p>All primary keys are clustered indexes. If the primary key isn\u2019t defined and there are no unique indexes (if there\u2019s no PK, the first unique index is considered to be the primary key), MySQL would create (generate) a clustered index on an invisible column and name the index <code>GEN_CLUST_INDEX<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-understanding-clustered-indexes\">Understanding Clustered Indexes<\/h2>\n\n\n\n<p>To understand clustered indexes, think of the keys for these indexes as unique identifiers for each row in a table. Think of hardware IDs\u2013 <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows-hardware\/drivers\/install\/hardware-ids\">hardware IDs, or HWIDs<\/a>, act as fingerprints of your hardware so the OS can identify it as real. Hardware IDs are a set of letters and numbers unique to each set of hardware. Clustered indexes act in a similar way to hardware IDs, just inside the tables within your database.<\/p>\n\n\n\n<p>Clustered indexes have a couple of distinct characteristics unique to themselves: For example, they store data in a specific order that cannot be changed.<\/p>\n\n\n\n<p>Each table has a clustered index \u2013 in most cases, the clustered indexes serve as the index for the primary key when you create the table, but if not, MySQL will consider the first unique index on the table will be created as the clustered index.<\/p>\n\n\n\n<p>As every table has a clustered index, every index that is not part of a clustered index is referred to as a non-clustered index or a secondary index. Clustered indexes can be defined upon table creation or once a table is already created.<\/p>\n\n\n\n<p>You can\u2019t \u201cspecify\u201d a clustered index per se; your database will do that for you by either using the primary key, a unique constraint, or adds one for you. To create a table bearing a clustered index, run a query like so*:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE `demo_data` (\n`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,\n`username` VARCHAR(40) NOT NULL DEFAULT '',\n`email` VARCHAR(40) NOT NULL DEFAULT '',\n`details` VARCHAR(70) NOT NULL DEFAULT ''\n--,...\n);<\/pre>\n\n\n\n<p>*Note that in MySQL, MariaDB, and Percona Server, naming primary keys isn\u2019t supported. Titling a primary key will be the primary cause of the error #1280:<\/p>\n\n\n\n<p><code>Warning: #1280 Name 'x' ignored for PRIMARY key<\/code>.<\/p>\n\n\n\n<p>With that in mind, know that when altering a table, clustered indexes (primary keys) in MySQL can be defined like so (don\u2019t forget the <code>AUTO_INCREMENT<\/code> and <code>PRIMARY KEY<\/code> definitions) \u2013 some of you may also want to drop the <code>id <\/code>column if it\u2019s already created:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">ALTER TABLE `demo_data` DROP COLUMN `id`;\n\nALTER TABLE `demo_data`\n  ADD COLUMN `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY;<\/pre>\n\n\n\n<p>Once a primary key is in place, it will be displayed with a \u201c<code>PRI<\/code>\u201d value in the <code>Key<\/code> column when <code>DESCRIBE<\/code> queries are being run:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"761\" height=\"305\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/d-misc-redgate-blogs-july-2024-mysql-index-deep-d.png\" alt=\"D:\\Misc\\Redgate\\Blogs\\July 2024\\MySQL Index Deep Dive - Clustered Indexes\\Images\\DESCRIBE data_table.png\" class=\"wp-image-104710\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Image 1 \u2013 A Clustered Index \u2013 <code>PRIMARY KEY<\/code> \u2013 on the id column<\/strong><\/p>\n\n\n\n<p>If you were to drop the id column in this table, like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">ALTER TABLE `demo_data` \n        DROP COLUMN `id`;<\/pre>\n\n\n\n<p>You would not see the internal clustering identifier show up in the output of the <code>DESCRIBE<\/code> statement output. Note too that if you create a key without the <code>AUTO_INCREMENT<\/code> setting, like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">ALTER TABLE `demo_data`\n ADD COLUMN `id` INT NOT NULL UNIQUE INDEX;<\/pre>\n\n\n\n<p>Or<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">ALTER TABLE `demo_data`\n ADD COLUMN `id` INT NOT NULL;\n\nCREATE UNIQUE INDEX `id` ON `demo_data`(`id`);<\/pre>\n\n\n\n<p>You will see that the id column shows up as <code>PRI<\/code>. However, if you change the unique index to a not-unique index, it will show up as <code>MUL<\/code>, because the object will be clustered on the key values of the index, but since it is not defined as <code>UNIQUE<\/code>, it will have to add a uniqueifier to the structure.<\/p>\n\n\n\n<p>Note that you define any column that includes the <code>AUTO_INCREMENT<\/code> option \u2013 such an option would enable values in the primary key column to increment automatically. For example, if you drop the id column, then try to add it back in this manner:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">ALTER TABLE `demo_data`\n  ADD COLUMN `id` INT NOT NULL AUTO_INCREMENT;<\/pre>\n\n\n\n<p>MySQL will come back with an error regarding the table index definition, meaning that the <code>AUTO_INCREMENT<\/code> value must be appended with <code>PRIMARY KEY,<\/code> too:<\/p>\n\n\n\n<p><code>#1075 - Incorrect table definition; there can be only one auto column and it must be defined as a key<\/code><\/p>\n\n\n\n<p>Regardless, I\u2019ve illustrated how a clustered index is structured inside your database below:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1197\" height=\"615\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/12\/image.png\" alt=\"\" class=\"wp-image-104719\"\/><\/figure>\n\n\n\n<p><strong>Figure 1 \u2013 Clustered Index vs. Non-Clustered Index<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2589\" height=\"563\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/12\/image-1.png\" alt=\"\" class=\"wp-image-104720\"\/><\/figure>\n\n\n\n<p><strong>Figure 2 \u2013 Clustered Index<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-clustered-index-examples\">Clustered Index Examples<\/h2>\n\n\n\n<p>Clustered indexes are still B-tree indexes with the distinction from non-clustered indexes being that the leaf nodes of the index are the entire row (or at least the entire row that fits on a page), and as such, are used according to the rules of B-tree indexes. I\u2019ll prove it by running a couple of queries.<\/p>\n\n\n\n<p>Our <code>data_table<\/code> has been loaded with 16,000 rows filled with mock data (you can add this data using the technique from the end of the article in the appendix):<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"543\" height=\"188\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/word-image-104708-4.png\" alt=\"\" class=\"wp-image-104711\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Image 2 \u2013 <code>data_table<\/code> in MariaDB<\/strong><\/p>\n\n\n\n<p>Now, we will see which queries make use of our clustered index \u2013 our clustered index resides in the <code>id<\/code> column.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1159\" height=\"763\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/d-misc-redgate-blogs-july-2024-mysql-index-deep-d-1.png\" alt=\"D:\\Misc\\Redgate\\Blogs\\July 2024\\MySQL Index Deep Dive - Clustered Indexes\\Images\\EXPLAIN SELECT PRIMARY KEY.png\" class=\"wp-image-104712\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Image 3 \u2013 <code>SELECT<\/code> Queries and the <code>id<\/code> column<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1158\" height=\"560\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/d-misc-redgate-blogs-july-2024-mysql-index-deep-d-2.png\" alt=\"D:\\Misc\\Redgate\\Blogs\\July 2024\\MySQL Index Deep Dive - Clustered Indexes\\Images\\EXPLAIN LIKE vs. without LIKE.png\" class=\"wp-image-104713\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Image 4 \u2013 Wildcards and the id column<\/strong><\/p>\n\n\n\n<p>As you can see, primary keys can be used just like B-tree indexes can, but since they have a rather narrow use case, the instances where they are considered for use are also pretty narrow:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>They define the order that you may need the data in.<\/li>\n\n\n\n<li>They denote the key value that you will be fetching data with most of the time. That is the PK. (This is particularly useful when doing joins).<\/li>\n<\/ol>\n<\/div>\n\n\n<p>If our table doesn\u2019t have a defined primary key and is using a non-unique index instead (keep in mind that a non-unique index is also considered to be a B-tree index, things change \u2013 the unique index acts just like a b-tree index, just without enforcing uniqueness (and the table may be clustered on that index if it is the only index on the table).<\/p>\n\n\n\n<p>Before adding a unique index, we would need to remove duplicates from a table and we can do so by running a query like the one below (here column is the name of the column which we expect to have duplicate values):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">DELETE FROM `table_name` \nWHERE details IN ( SELECT `column` FROM `table_name` \n                   GROUP BY `column` \n                   HAVING COUNT(column) &gt; 1 );<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1343\" height=\"157\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/d-misc-redgate-blogs-july-2024-mysql-index-deep-d-3.png\" alt=\"D:\\Misc\\Redgate\\Blogs\\July 2024\\MySQL Index Deep Dive - Clustered Indexes\\Images\\Delete Duplicate Rows Query.png\" class=\"wp-image-104714\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Image 5 \u2013 Deducting Duplicate Values from a Table<\/strong><\/p>\n\n\n\n<p>This query would complete the following:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>Finds duplicate rows in the details column (changing the number after the > sign to X would remove rows that have X amount of occurrences)<\/li>\n\n\n\n<li>Deletes them.<\/li>\n<\/ol>\n<\/div>\n\n\n<p>This query is likely to take a while, but after it\u2019s done, we can define a unique index just like we\u2019d define an ordinary B-tree index, just with the <code>UNIQUE<\/code> clause in between:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">ALTER TABLE `table_name` ADD UNIQUE INDEX \n   uniq_idx(`col_name`);<\/pre>\n\n\n\n<p>After that, we can insert data from our older table automatically deducting duplicates because we already have a unique index in place \u2013 we use the IGNORE clause to avoid errors:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">INSERT IGNORE INTO data_table2 \nSELECT name,username,email,ip_address,details \nFROM data_table;<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1220\" height=\"86\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/d-misc-redgate-blogs-july-2024-mysql-index-deep-d-4.png\" alt=\"D:\\Misc\\Redgate\\Blogs\\July 2024\\MySQL Index Deep Dive - Clustered Indexes\\Images\\INSERT IGNORE INTO.png\" class=\"wp-image-104715\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Image 6 \u2013 INSERT IGNORE Query<\/strong><\/p>\n\n\n\n<p>Duplicates are skipped and our unique index is now our clustered index. That also means that if you have a unique index on a column and you re-insert the data into it, you will lose all but one of the rows.<\/p>\n\n\n\n<p>After we have a unique index and some data in place (we\u2019ve defined the clustered index in the details column), we can run some queries and see how our database interprets the unique index as well:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1265\" height=\"574\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/d-misc-redgate-blogs-july-2024-mysql-index-deep-d-5.png\" alt=\"D:\\Misc\\Redgate\\Blogs\\July 2024\\MySQL Index Deep Dive - Clustered Indexes\\Images\\EXPLAIN SELECT Unique Index.png\" class=\"wp-image-104716\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>Image 7 \u2013 Our Database Interpreting the Unique Index<\/strong><\/p>\n\n\n\n<p>As you can see, things don\u2019t change much from an ordinary B-tree index being in place \u2013 the same things apply here too:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>The first query is using the <code>details_idx<\/code> set on the details column.<\/li>\n\n\n\n<li>The second query is using the <code>details_idx<\/code> because it\u2019s the column after the <code>WHERE<\/code> clause and the <code>LIKE<\/code> clause doesn\u2019t make it useless (there\u2019s a wildcard at the end as opposed to the beginning.)<\/li>\n\n\n\n<li>The third query considers the details_idx index but uses none because the OR negates the necessity for it (if we\u2019d have an index on the username column, it would be used instead.)<\/li>\n<\/ol>\n<\/div>\n\n\n<p>Clustered indexes are indeed an interesting beast, huh?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-summary\">Summary<\/h2>\n\n\n\n<p>Clustered indexes are a special type of index inside your tables \u2013 such indexes store data related to rows and are often used to display unique automatically incrementing ID values whenever new rows are inserted. These kinds of indexes order data in a table, and thus, one cannot have multiple clustered indexes on the same table.<\/p>\n\n\n\n<p>If there are no automatically incrementing ID values, MySQL will use the first unique index as the primary key. If no unique index exists on a table, MySQL will generate a clustered index and name it <code>GEN_CLUST_INDEX<\/code> \u2013 thus, a table has a clustered index no matter it\u2019s defined or not.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-clustered-indexes-frequently-asked-questions\">Clustered Indexes \u2013 Frequently Asked Questions<\/h2>\n\n\n\n<p><em>Q: What is a clustered index?<br>A: A clustered index is an index in which the order of rows directly corresponds to the order of rows in the index.<\/em><\/p>\n\n\n\n<p><em>Q: Do all tables have clustered indexes?<br>A: Yes. In MySQL, a clustered index is a column with a PRIMARY KEY, and if such a column does not exist, a clustered index is the first column with a unique index.<\/em><\/p>\n\n\n\n<p><em>Q: Where can I learn more about clustered indexes and other database secrets?<br>A: Follow <a href=\"https:\/\/www.red-gate.com\/simple-talk\/articles\/\">the Red-gate Simple Talk blog<\/a>, read books, attend industry workshops and conferences, and consider subscribing to <a href=\"https:\/\/databasedive.com\">the YouTube channel Database Dive.<\/a><\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-appendix-table-structure-amp-data\">Appendix \u2013 Table Structure &amp; Data<\/h2>\n\n\n\n<p>The table structure and data can be found <a href=\"https:\/\/pst.innomi.net\/paste\/hjsybydm6bjgvwzkv9fvn7hd\">here<\/a> \u2013 the data is generated by a <a href=\"https:\/\/www.mockaroo.com\/\">mock random data generator Mockaroo<\/a>, so if necessary, feel free to use the tool to add more data to the table as well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you\u2019re a frequent reader of the Simple-Talk blog, you\u2019re no stranger to indexes as we\u2019ve covered indexes in database management systems previously \u2013 in that blog, we\u2019ve told you that indexes come in a variety of flavors including ordinary Balanced Tree (B-Tree) indexes and R-Tree, or spatial, indexes. B-Tree indexes have multiple types: they&#8230;&hellip;<\/p>\n","protected":false},"author":339547,"featured_media":104709,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,145792],"tags":[5854],"coauthors":[146040],"class_list":["post-104708","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-featured","category-mysql","tag-mysql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104708","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/339547"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=104708"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104708\/revisions"}],"predecessor-version":[{"id":105045,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104708\/revisions\/105045"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/104709"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=104708"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=104708"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=104708"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=104708"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}