MySQL Index Overviews: Clustered B-Tree Indexes

Comments 0

Share to social media

If you’re a frequent reader of the Simple-Talk blog, you’re no stranger to indexes as we’ve covered indexes in database management systems previously – in that blog, we’ve told you that indexes come in a variety of flavors including ordinary Balanced Tree (B-Tree) indexes and R-Tree, or spatial, indexes. B-Tree indexes have multiple types: they can be covering, composite, descending, FULLTEXT, UNIQUE, hash-based, or have something to do with the PRIMARY KEY. B-Tree indexes can also have a clustered form: and that form is what this blog is all about.

What are Clustered Indexes?

In the MySQL world, a clustered index is an index that stores the row data of a table in an ordered format. Some database programmers refer to clustered indexes as primary keys, and this is a very typical use of a clustered index in MySQL especially.

A table can only have one clustered index and that’s because data inside of a table can only be stored in one order.

Introduction to Clustered Indexes

Clustered indexes can be defined as data structures that order rows in data pages in the same way that they are ordered in the index. Clustered indexes are special in that a table cannot have more than one clustered index defined on one column. Many use cases of clustered indexes involve primary keys and that’s not without a reason either – these indexes often have an AUTO_INCREMENT (or “A_I” for short) parameters that make NULL values inside of that column become automatically incrementing numeric IDs as values are inserted.

With that being said, an AUTO_INCREMENT feature on top of a clustered index isn’t a necessity, but many tables use a numeric ID field, the ability for the clustered index to increment values of rows automatically is an awesome feat. As noted, since clustered indexes store data in the order of the data, inserting sorted data is typically favorable, too.

All primary keys are clustered indexes. If the primary key isn’t defined and there are no unique indexes (if there’s no PK, the first unique index is considered to be the primary key), MySQL would create (generate) a clustered index on an invisible column and name the index GEN_CLUST_INDEX.

Understanding Clustered Indexes

To understand clustered indexes, think of the keys for these indexes as unique identifiers for each row in a table. Think of hardware IDs– hardware IDs, or HWIDs, act as fingerprints of your hardware so the OS can identify it as real. Hardware IDs are a set of letters and numbers unique to each set of hardware. Clustered indexes act in a similar way to hardware IDs, just inside the tables within your database.

Clustered indexes have a couple of distinct characteristics unique to themselves: For example, they store data in a specific order that cannot be changed.

Each table has a clustered index – in most cases, the clustered indexes serve as the index for the primary key when you create the table, but if not, MySQL will consider the first unique index on the table will be created as the clustered index.

As every table has a clustered index, every index that is not part of a clustered index is referred to as a non-clustered index or a secondary index. Clustered indexes can be defined upon table creation or once a table is already created.

You can’t “specify” a clustered index per se; your database will do that for you by either using the primary key, a unique constraint, or adds one for you. To create a table bearing a clustered index, run a query like so*:

*Note that in MySQL, MariaDB, and Percona Server, naming primary keys isn’t supported. Titling a primary key will be the primary cause of the error #1280:

Warning: #1280 Name 'x' ignored for PRIMARY key.

With that in mind, know that when altering a table, clustered indexes (primary keys) in MySQL can be defined like so (don’t forget the AUTO_INCREMENT and PRIMARY KEY definitions) – some of you may also want to drop the id column if it’s already created:

Once a primary key is in place, it will be displayed with a “PRI” value in the Key column when DESCRIBE queries are being run:

D:\Misc\Redgate\Blogs\July 2024\MySQL Index Deep Dive - Clustered Indexes\Images\DESCRIBE data_table.png

Image 1 – A Clustered Index – PRIMARY KEY – on the id column

If you were to drop the id column in this table, like this:

You would not see the internal clustering identifier show up in the output of the DESCRIBE statement output. Note too that if you create a key without the AUTO_INCREMENT setting, like this:

Or

You will see that the id column shows up as PRI. However, if you change the unique index to a not-unique index, it will show up as MUL, because the object will be clustered on the key values of the index, but since it is not defined as UNIQUE, it will have to add a uniqueifier to the structure.

Note that you define any column that includes the AUTO_INCREMENT option – such an option would enable values in the primary key column to increment automatically. For example, if you drop the id column, then try to add it back in this manner:

MySQL will come back with an error regarding the table index definition, meaning that the AUTO_INCREMENT value must be appended with PRIMARY KEY, too:

#1075 - Incorrect table definition; there can be only one auto column and it must be defined as a key

Regardless, I’ve illustrated how a clustered index is structured inside your database below:

Figure 1 – Clustered Index vs. Non-Clustered Index

Figure 2 – Clustered Index

Clustered Index Examples

Clustered indexes are still B-tree indexes with the distinction from non-clustered indexes being that the leaf nodes of the index are the entire row (or at least the entire row that fits on a page), and as such, are used according to the rules of B-tree indexes. I’ll prove it by running a couple of queries.

Our data_table has been loaded with 16,000 rows filled with mock data (you can add this data using the technique from the end of the article in the appendix):

Image 2 – data_table in MariaDB

Now, we will see which queries make use of our clustered index – our clustered index resides in the id column.

D:\Misc\Redgate\Blogs\July 2024\MySQL Index Deep Dive - Clustered Indexes\Images\EXPLAIN SELECT PRIMARY KEY.png

Image 3 – SELECT Queries and the id column

D:\Misc\Redgate\Blogs\July 2024\MySQL Index Deep Dive - Clustered Indexes\Images\EXPLAIN LIKE vs. without LIKE.png

Image 4 – Wildcards and the id column

As you can see, primary keys can be used just like B-tree indexes can, but since they have a rather narrow use case, the instances where they are considered for use are also pretty narrow:

  1. They define the order that you may need the data in.
  2. They denote the key value that you will be fetching data with most of the time. That is the PK. (This is particularly useful when doing joins).

If our table doesn’t have a defined primary key and is using a non-unique index instead (keep in mind that a non-unique index is also considered to be a B-tree index, things change – the unique index acts just like a b-tree index, just without enforcing uniqueness (and the table may be clustered on that index if it is the only index on the table).

Before adding a unique index, we would need to remove duplicates from a table and we can do so by running a query like the one below (here column is the name of the column which we expect to have duplicate values):

D:\Misc\Redgate\Blogs\July 2024\MySQL Index Deep Dive - Clustered Indexes\Images\Delete Duplicate Rows Query.png

Image 5 – Deducting Duplicate Values from a Table

This query would complete the following:

  1. Finds duplicate rows in the details column (changing the number after the > sign to X would remove rows that have X amount of occurrences)
  2. Deletes them.

This query is likely to take a while, but after it’s done, we can define a unique index just like we’d define an ordinary B-tree index, just with the UNIQUE clause in between:

After that, we can insert data from our older table automatically deducting duplicates because we already have a unique index in place – we use the IGNORE clause to avoid errors:

D:\Misc\Redgate\Blogs\July 2024\MySQL Index Deep Dive - Clustered Indexes\Images\INSERT IGNORE INTO.png

Image 6 – INSERT IGNORE Query

Duplicates are skipped and our unique index is now our clustered index. That also means that if you have a unique index on a column and you re-insert the data into it, you will lose all but one of the rows.

After we have a unique index and some data in place (we’ve defined the clustered index in the details column), we can run some queries and see how our database interprets the unique index as well:

D:\Misc\Redgate\Blogs\July 2024\MySQL Index Deep Dive - Clustered Indexes\Images\EXPLAIN SELECT Unique Index.png

Image 7 – Our Database Interpreting the Unique Index

As you can see, things don’t change much from an ordinary B-tree index being in place – the same things apply here too:

  1. The first query is using the details_idx set on the details column.
  2. The second query is using the details_idx because it’s the column after the WHERE clause and the LIKE clause doesn’t make it useless (there’s a wildcard at the end as opposed to the beginning.)
  3. The third query considers the details_idx index but uses none because the OR negates the necessity for it (if we’d have an index on the username column, it would be used instead.)

Clustered indexes are indeed an interesting beast, huh?

Summary

Clustered indexes are a special type of index inside your tables – such indexes store data related to rows and are often used to display unique automatically incrementing ID values whenever new rows are inserted. These kinds of indexes order data in a table, and thus, one cannot have multiple clustered indexes on the same table.

If there are no automatically incrementing ID values, MySQL will use the first unique index as the primary key. If no unique index exists on a table, MySQL will generate a clustered index and name it GEN_CLUST_INDEX – thus, a table has a clustered index no matter it’s defined or not.

Clustered Indexes – Frequently Asked Questions

Q: What is a clustered index?
A: A clustered index is an index in which the order of rows directly corresponds to the order of rows in the index.

Q: Do all tables have clustered indexes?
A: Yes. In MySQL, a clustered index is a column with a PRIMARY KEY, and if such a column does not exist, a clustered index is the first column with a unique index.

Q: Where can I learn more about clustered indexes and other database secrets?
A: Follow the Red-gate Simple Talk blog, read books, attend industry workshops and conferences, and consider subscribing to the YouTube channel Database Dive.

Appendix – Table Structure & Data

The table structure and data can be found here – the data is generated by a mock random data generator Mockaroo, so if necessary, feel free to use the tool to add more data to the table as well.

Article tags

Load comments

About the author

Lukas Vileikis

See Profile

Lukas Vileikis is an ethical hacker and a frequent conference speaker. Since 2014, Lukas has found and responsibly disclosed security flaws in some of the most visited websites in Lithuania. He runs one of the biggest & fastest data breach search engines in the world - BreachDirectory.com, frequently speaks at conferences and blogs in multiple places including his blog over at lukasvileikis.com.