Indexes in MySQL are one of the primary ways to enhance query performance and they are especially useful when the primary use case of our project refers to reading data stored in a database. We‘ve already told you about the nuanced world of indexes in MySQL – and there we‘ve mentioned that MySQL has multiple types of SQL indexes available for you to choose from.
The primary index type available within MySQL is the B-Tree index which we‘ve already covered in one of our previous articles. If you know your way around MySQL though, you will certainly be aware of other nuances of indexes, too – and one of those nuances has to do with the fact that B-Tree indexes can also consist of multiple columns (commonly referred to as composite indexes). In this example, we’re using MariaDB as our DBMS of choice. Percona Server for MySQL and MySQL Server will act identically.
Find the queries recreating the table structure and composite indexes in the appendix, and let’s get started.
What are Composite Indexes?
In the database world, composite indexes refer to indexes on multiple columns. Composite indexes are sometimes referred to as multicolumn indexes because that‘s precisely what they are – such indexes are keys on multiple columns, while indexes on the same column are referred to as simple indexes.
Multicolumn indexes have various specifics unique to themselves – depending on the database management system you find yourself using, you will find that composite indexes can be created on up to 16 or 32 columns, that these kinds of indexes only use B-Tree (MySQL) or GIN, BRIN, and GiST index types (PostgreSQL), and that a multicolumn index can be similar to a sorted array.
Composite Indexes in MySQL
Let‘s not get too far away from our primary DBMS of interest – MySQL. As noted before, the specifics of the components within your DBMS depend directly on the database management system you find yourself using – that‘s true for every component including composite indexes.
Composite indexes in MySQL:
- May consist of up to 16 columns.
- Will be used if your query uses all columns in the composite index or the first X columns in the composite index allowing the query optimizer to use any leftmost prefix of the index to look up rows.
- Have a crucially important feature – the order of columns within the index.
Here‘s a basic composite index definition:
1 |
CREATE INDEX composite_idx ON demo_table(c1,c2,c3); |
When composite indexes are in use, the order of the columns within the index is a crucial point. One cannot skip columns or read them out of order for the index to be used.
Can you guess what we just did? We‘ve created a composite index called composite_idx
on a table named demo_table
and created the multicolumn index on three columns – c1
, c2
, and c3
. Not exactly rocket science, is it?
Such an index definition enables you to have indexed search capabilities on queries that act in three ways:
- Queries that make use of the
c1
column - Queries that use both
c1
andc2
columns - Queries that utilize all three –
c1
,c2
, andc3
– columns.
Interesting, right? Such an interesting behavior applicable to defined multicolumn indexes also means that certain limits are in place – aside from being able to define up to 16 columns and not more, users employing composite indexes should also be aware of the fact that MySQL can only access or use your composite indexes in order, and MySQL will also stop using the composite index after the first range condition. We’ve provided an example of how everything looks like below.
Let‘s try them out! We have a simple table depicting items purchased by customers that internally looks like so:
1 |
DESCRIBE demo_table; |
Image 1 – DESCRIBE Query on a MariaDB Table without a Composite Index
Now, we define a composite index on the columns item_purchased
and quantity_purchased
, then inspect (DESCRIBE
) the table again:
1 |
CREATE INDEX composite_idx ON demo_table(item_purchased,quantity_purchased); |
The output should be similar to:
Image 2 – DESCRIBE Query on a Table with a Composite Index
Great! We now see that we have a key – an index – referred to as “MUL” – a multiple-column, or composite, index. Do note that various database management systems display such indexes differently and MySQL (and its counterparts) will only show “MUL” to the side of the first column that makes up a composite index, but inspecting the table structure in phpMyAdmin (or other SQL clients, for that matter) will show you that your column has multiple indexes more clearly once you inspect the table structure. See the example below:
Image 3 – Indexes in phpMyAdmin
Now we will dig into the internals of composite indexes, then run a couple of SQL queries to better understand the internals of multicolumn indexes in our MySQL infrastructure.
Index Order in Composite Indexes and Internals
Here we come back to the main capability of composite indexes: they are B-Tree indexes, but with a special feature in that they contain multiple columns within one index. As such, the order of columns is crucially important: SQL queries will run the most efficiently when you have a composite index with a column order ranging from the most to least distinct counting from the left. In other words, try to order your columns in such an order that the column that returns the highest amount of distinct rows when queried with the SELECT DISTINCT
query comes first.
Suppose we have a demo_data table like so with a bunch of data inside of it (running COUNT(*)
queries in InnoDB can take much longer than in this example if you run bigger data sets – InnoDB doesn’t store an internal row count inside of itself unlike MyISAM):
1 2 3 |
DESCRIBE demo_data; SELECT COUNT(*) AS total_rows FROM demo_data; |
The output should be similar to:
Image 4 – The Structure of demo_data Table
Let’s now add a composite index spanning all three columns with this query (the USING BTREE
clause is optional and is the default):
1 2 3 |
ALTER TABLE `your_table` ADD INDEX `index_name`(columns,separated,by,comma) [USING BTREE]; |
So for our table:
1 2 3 |
ALTER TABLE demo_data ADD INDEX composite_idx (email, username, ip) USING BTREE; |
Executing this you will see the following:
Image 5 – Adding a B-Tree Composite Index
Let’s now confirm our index has been added on all three columns:
Image 6 – A B-Tree Composite Index on a Table
Great. Now, take note of the index order – it’s email, then username, then IP. Not the other way around! We can prove the index order is at play by running a query like SHOW INDEXES
and observing the Seq_in_index
column:
1 |
SHOW INDEXES FROM demo_data; |
The output will be:
Image 7 – SHOW INDEXES in MariaDB
This means that our index will be used if we use queries like these (note that we use all columns inside of the composite index):
1 2 3 4 5 |
SELECT * FROM demo_data WHERE email = ‘aero42@demo.com’ AND username = ‘Demo 3’ AND ip = ‘127.0.0.2’; |
Output:
Image 8 – Using Composite Indexes Searching for Multiple Columns
Our composite index will also be used if we search solely for the first column from the left-hand side (in this case, email):
1 2 3 |
EXPLAIN SELECT * FROM demo_data WHERE email = ‘aero42@demo.com’; |
You will see something like the following:
Image 9 – Using Composite Indexes Searching for One Column
You may also see a situation where possible_keys may be depicted as NULL, but the key (chosen index) wouldn’t be NULL:
1 2 3 4 |
EXPLAIN SELECT * FROM demo_data WHERE ip = ‘127.0.0.2’ AND username = ‘Demo 3’; |
This should return:
Image 10 – NULL or NOT NULL?
Such situations can happen if MySQL (in this case, MariaDB) thinks that none of the indexes are suitable for a lookup, but all the columns selected by our query are columns of an index, which is exactly what’s happening here.
If we had another, non-indexed column, our composite index would still be used if our query would use that column and a column in the composite index:
1 2 3 4 |
EXPLAIN SELECT * FROM demo_data WHERE new_column = ‘Value 1’ AND email = ‘good_is_a_man@demo.net’; |
Executing this should return something like:
Image 11 – Using a Composite Index with a Non-Indexed Column
However, if our query wouldn’t use the leftmost prefix in the composite index (this column is email), no index would be used:
1 2 3 4 |
EXPLAIN SELECT * FROM demo_data WHERE new_column = ‘Value 1’ AND username = ‘Demo 9’; |
The output:
Image 12 – No Usage of Composite Index
Analyzing Composite Indexes in MySQL
Now that we know the basics of how composite indexes work internally, we can start analyzing them.
The first SQL query we run is a simple SELECT with one column in question:
1 2 3 |
EXPLAIN SELECT * FROM demo_table WHERE item_puchased = ‘Xiaomi Redmi 12 128GB’; |
Executing this, you should see something similar to:
Image 14 – EXPLAINing a Simple SELECT Query
The output of the EXPLAIN query shows that MariaDB elects to use our composite index. Our index has a key length of 482 bytes and that allows us to determine how many parts of our composite index are used by MySQL. This time, it’s a safe bet that MariaDB uses the item_purchased part of the composite_idx and we can confirm it by swapping the column order with a query like so:
1 2 3 4 |
ALTER TABLE `demo_table2` DROP INDEX `composite_idx`, ADD INDEX `composite_idx` (`quantity_purchased`, `item_purchased`) USING BTREE; |
Then trying the query again:
1 2 3 |
EXPLAIN SELECT * FROM demo_table2 WHERE item_purchased = ‘%Xiaomi Redmi 12 128GB’; |
Now you will see:
Image 15 – The Same SQL Query with a Composite Index in Switched Order
See – no possible keys here! Why? We didn’t use the leftmost prefix of the column that was indexed, we’ve used the rightmost prefix instead. In
This also explains the key length of 482 bytes – MariaDB stops using the multicolumn index at the first range condition and key_len specifies the number of bytes that are used from the index.
Let’s try again – this time we run a different query:
1 2 3 |
EXPLAIN SELECT * FROM demo_table WHERE quantity_purchased >= 5; |
This should be the output:
Image 16 – A Query That Doesn’t Use Indexes
This time we see that MySQL didn’t even consider any indexes to be used. To explain this, we need to take a step back and remember the specifics of multicolumn indexes – such indexes will only be used for finding rows if we use a leftmost prefix of the index – in our case, that’s the item_purchased column. Let’s include the column, then try again:
1 2 3 4 |
EXPLAIN SELECT * FROM demo_table WHERE item_purchased LIKE ‘iPhone%’ AND quantity_purchased >= 5; |
Now you will see the index has been used:
Image 17 – A Query That Uses a Composite Index
Woohoo! MariaDB uses our index again. MariaDB can traverse an index backward too and let’s not forget that the usual rules for indexes apply here too – attentive readers will notice that we have a wildcard only at the end of our search statement because otherwise, no index will be used no matter the order of the columns:
1 2 3 4 |
EXPLAIN SELECT * FROM demo_table WHERE item_purchased LIKE ‘%iPhone%’ AND quantity_purchased >= 5; |
Now you see:
Image 18 – a Query with a Wildcard
A query searching for exact matches, however, would use the composite index:
1 2 3 4 |
EXPLAIN SELECT * FROM demo_table WHERE item_purchased = ‘iPhone X’ AND quantity_purchased = 5; |
You can see that in the output:
Image 19 – a Query Searching for Exact Matches
Do note that where we see that MySQL uses the composite index, the length of the key (the index) remains being 482 bytes no matter what we do – that means that MySQL stops using the multicolumn index at the first range condition and doesn’t use the other part of the index (we defined a composite index on two columns – item_purchased and quantity_purchased.)
Composite Indexes, Sorting, and Search Operators
Everything gets even more interesting when sorting operations are involved – suppose we have a table like so:
1 |
DESCRIBE people; |
You can see the table information:
Image 20 – Another Example of a Table Using Composite Indexes
On this table, we’ve defined a composite index like so:
1 |
CREATE INDEX composite_idx ON people(first_name, last_name); |
We will now check the plans of a couple of SELECT queries with the ORDER BY clause:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
EXPLAIN SELECT * FROM people WHERE first_name = ‘John’ ORDER BY last_name; EXPLAIN SELECT * FROM people WHERE first_name = ‘John’ ORDER BY birthday; EXPLAIN SELECT * FROM people WHERE birthday = ‘ 1999-01-01’ ORDER BY birthday; |
This will return the following outputs:
Image 21 – Exploring Sorting in Composite Indexes
Now we‘ve run three queries – attentive readers will notice that the Extra column in all of the result sets is different:
- The first result ordered by the
last_name
was using the index condition and using the WHERE clause. The first example is using the composite index (composite_idx) because we search for two columns (first_name and last_name) that are both part of a composite index. - The second result ordered by the birthday was using the index condition, using a WHERE clause, and using a filesort operation. Sorting occurs when no applicable index satisfies the ordering of rows.
- The third result searching for a birthday and ordering by birthday was using a WHERE clause and using a filesort operation.
Here‘s what that means:
- Using the index condition means that MySQL filters rows by using the index and then uses the table to retrieve the matched rows.
- Using a WHERE clause can have different meanings. Some argue that this piece of information is frequently inconsistent at best, but in this specific case, it most likely means that the expression within a WHERE clause performed and index seek operation.
- Using a filesort operation means that rows in the table were read and sorted during an extra sorting phase during query execution.
Aside from the explanation of the query internals, one would benefit from using all of the parts of the composite index. Doing so is remarkably simple and in our case, would look like so:
1 2 3 4 |
SELECT * FROM people WHERE first_name = 'Josh' ORDER BY last_name; |
Pay special attention to the second part of the query straight after the WHERE
clause – we search for a first_name
and ORDER BY last_name
because our index looks exactly like that:
1 2 |
CREATE INDEX composite_idx ON people(first_name, last_name); |
To use a composite index with an ORDER BY
clause, ensure that you search for the leftmost column, then order by the rest with an equality operator.
Do keep in mind that the direction your ORDER BY
is sorted in also has a say here:
- MySQL will avoid a filesort operation if all parts of your composite index are sorted in an ascending (ASC) or descending (DESC) order and it has all of the rows from the equality match. This may seem like a narrow use case, but it’s worth keeping in mind. Remember that queries are tasks composed of smaller tasks – eliminating any unnecessary smaller task will make the performance of your query faster.
- You may not be able to avoid a filesort operation if one part of your composite index is sorted in a different manner than the other (e.g. if one part of your query is sorted in an ASCending order and the other in a DESCending order.) If you want your database to make use of your covering index, make sure that your query criteria specifies that – e.g. if you have a covering index on two columns, search for one column and the other or search for one column and order by the other, etc.
- If you search for something in a way that‘s backwards to the way you defined your composite indexes, you may be privy to a backwards index scan where MySQL decides to start searching for something starting from the end of the root node of the B-Tree index.
Key Takeaways and Summary
Composite, or multicolumn, indexes are indexes that are defined on multiple columns. To effectively use such indexes, make sure to use the “leftmost prefix“ principle: put the column you‘re accessing first in the first place, the second – in the second place, and so on.
An understanding of indexes will take you very far in the database world and beyond. The understanding of multicolumn indexes in your specific database management system of choice – in our case, MySQL and its counterparts like MariaDB Server and Percona Server – will significantly improve the performance of SQL queries that use filtering or sorting operations on columns.
Before embarking on any kind of index within your database management system, make sure to choose the proper index type after educating yourself on the Simple Talk site, educate yourself on their upsides and downsides, and most importantly – use indexes wisely. Use the index, Luke! Use indexes wisely and until next time.
Appendix – Table Structure and Demo Data
In case you want to create the tables and indexes in this blog, the table structure and all demo data is available here.
To generate a bunch of random data, take any data set with >100K rows, or anonymize if you have live data (this SQL query will update all usernames to one of 10 random values):
1 2 3 4 5 |
UPDATE `demo_data` SET `username` = ELT(FLOOR(RAND()*10) +1, 'Demo', 'Demo 2', 'Demo 3', 'Demo 4', 'Demo 5', 'Another Demo', 'Demo 7', 'Demo 8', 'Demo 9', 'Demo 10'); |
If you’re lazy, you can also use RAND() like so to create quick sample data (this query will set all usernames to random numeric values):
1 2 |
UPDATE `demo_data` SET `username` = RAND(); |
Anonymize email addresses with these queries (replace email domains with different domains):
1 2 3 |
UPDATE `demo_data` SET `email` = REPLACE(`email`, 'gmail.com', 'demo.com'); UPDATE `demo_data` SET `email` = REPLACE(`email`, 'hotmail.com', 'demo.org'); --etc |
And anonymize IP addresses like so (these queries will anonymize the first 200,000 IP addresses to 127.0.0.1, and the rest to 127.0.0.2):
1 2 |
UPDATE `demo_data` SET ip = ‘127.0.0.1’ LIMIT [0,]200000; UPDATE `demo_data` SET ip = ‘127.0.0.2’ WHERE ip != ‘127.0.0.1’; |
Load comments