The nuances of MySQL indexes

Indexes are critical for database performance. In this article, Lukas Vieikis explains the many types of indexes available with MySQL.

Developers and database administrators know it – there are many nuances that need to be taken care of to ensure that database performance doesn’t dive into the ground and that database performance doesn’t cause problems either now or in the future.

One of the core aspects surrounding database performance for ages are indexes – for ages, they’ve been the core cause of increased query performance, but they’re also shrouded in mystery. No matter if looking at B-tree indexes, composite indexes, spatial indexes, or any other type of indexes available in our database management system of choice, they all operate differently and that they all have unique upsides and downsides. Then the advantages and disadvantages of index types are unique to the database management system that is in use. This post, however, is focused on MySQL and its flavors (Percona Server and MariaDB); all tips applicable to MySQL apply to Percona Server and MariaDB as well.

Types of indexes available in MySQL

To figure out all of the nuances relevant to indexes (which are also referred to as keys) in MySQL, you must understand some basics about how they work and for what purposes are they meant to accomplish in the first place. MySQL offers the following types of indexes:

  1. B-Tree (short for Balanced Tree) indexes are usually referred to as the “ordinary” indexes. This definition is partly true because there aren’t that many impressive things surrounding them: such indexes are frequently created by database administrators looking to improve search query performance, but they don’t do many things outside of that (I’ll will get into what each type of index does in a second.)
  2. Hash indexes are used only by the MEMORY storage engine (for those who are not very familiar, MySQL lets users choose from a couple of storage engines, InnoDB being the main one) and they are known for allowing users to perform exact lookups (any search query that uses the operators = or <=> would use such kind of an index). Such indexes are generally very fast due to their design but have limited use cases due to the limits imposed on them by MySQL – user defined hash indexes are only supported by the MEMORY storage engine.
  3. Spatial indexes are used for geographical data indexing.
  4. Prefix indexes usually cover a prefix (a part of) a column.
  5. Composite indexes are usually also called multicolumn indexes and, as the name suggests, such indexes usually function on multiple columns at once.
  6. Covering indexes are sometimes confused with composite indexes – while composite indexes cover multiple columns, covering indexes cover only the columns required for the query to execute. A covering index is a special kind of index – such an index is in use when all fields required for a query to execute are included in the index. When a covering index is in use, MySQL can read the index instead of reading the disk.
  7. Clustered indexes usually store tables in a B-tree index structure. All indexes that aren’t clustered indexes are known as secondary indexes.

When looking at everything from a high level, the wide variety of indexes can get a little confusing, so to get a better understanding of what they are and what they do, we must dive a little deeper.

Indexes provided by MySQL in detail

Indexes are usually used to improve SELECT query performance at the expense of slowing down UPDATEs, DELETEs, and INSERT procedures. Slow data insertion is the price to pay for increased search query performance. When data is inserted, deleted, or updated, the indexes have to be updated at the same time. If a huge portion of the data available within the database infrastructure indexed, it might become a pretty big issue in the long run. However, benefits may quickly outpace disadvantages when looking at the entire workload. Here’s a simple table explain which type of index to use when:

Type of Index

When to Use?

Why?

B-tree (Balanced tree) Indexes

When needed to increase search (SELECT) query performance with queries that involve the operators =, <, >, <=, >=, LIKE queries, or to retrieve records falling within a given range.

Balanced tree indexes are suitable for use in search operations – they will help increase the performance of search queries in almost all cases from simple queries like SELECT * FROM demo WHERE column = ‘value’ to specific wildcard-based queries like SELECT * FROM demo WHERE column LIKE ‘value%’;, though, in this case, make sure that the wildcard is at the end of the statement because if not, the index might be useless.

Hash indexes

Whenever needing rapid search query performance when using the MEMORY storage engine available in MySQL.

Hash indexes come with an Achilles’s heel – the MEMORY storage engine – but at the same time offer blazing fast performance. Such an index type can be used only for equality comparisons on the entire key which don’t often have many use cases. Still, you never know when you might come across a situation that might require them, so it’s best to keep the features offered by them in mind.

Spatial indexes

These kinds of indexes have a specific use case – they’re a fit for geographic (in other words, geospatial, hence the name), data. A spatial index is a R-Tree index unless the storage engine supports non-spatial indexing of spatial columns. In that case, the storage engine creates a B-tree index. These indexes will only be useful when accessing GIS-related data inside of MySQL via GIS-related functions that include MBRContains, MBRCovers, and MBREquals (see the documentation for further information.)

When needing to search through geographical data (look at the name), these indexes will be the way to go. however, do keep in mind that these kinds of indexes have a weakness – they can be created only on InnoDB and MyISAM storage engines – columns using these kinds of indexes must also be defined as NOT NULL as well.

Prefix indexes

Such indexes are frequently used when the column indexed is too big and disk space is too precious – prefix indexes help to index only a part of the column while at the same time being careful about the index consuming disk space. That’s a golden medium in scenarios where disk space is scarce and performance is an issue, too.

This kind of indexes are useful when wanting to improve query performance, but are low on disk space – in that case, B-tree or other kinds of indexes would probably consume a lot of space. When using prefix indexes, chances are that the performance of SELECT queries will improve at least a little bit and make sure that our indexes consume as little disk space as possible.

Composite (multicolumn) indexes

Such indexes are used when the need of indexing multiple columns appears.When one index is aimed at multiple columns at the same time, it’s a multicolumn, or composite index.

Such indexes are useful with queries involving AND operators.

These kinds of indexes can be used to satisfy queries like SELECT * FROM demo WHERE c1 = ‘something’ AND c2 = ‘something’, etc. The fact that the query involves a second, third, or fourth column should be a good enough reason to consider using a composite index. Bear in mind that MySQL reads columns left to right and that your composite index can consist from a maximum of sixteen (16) columns. You can index less columns, but not more.

Covering indexes

Such indexes cover all of the columns required for a query to successfully execute – once such indexes are in use, queries will retrieve results from the index itself and not from the disk. This will save disk I/O because MySQL will provide results derived from the index which is a smaller object.

Use covering indexes to save disk I/O. In other words, covering indexes will be very useful for queries like SELECT c1, c2, c3, c4, c5 FROM demo WHERE c3 = ‘demo’; – In this case, the index covers all the fields and there is no need to use the actual table

Clustered indexes

Such indexes are frequently PRIMARY KEYs inside of a specific table or UNIQUE INDEXes with all of their columns defined as NOT NULL.

Such indexes are very useful if we want one column to increment automatically once data is added to another column, but keep in mind that there cannot be more than one clustered index – a clustered index is a table stored in an index B-tree structure. If our table contains a PRIMARY KEY, the clustered index is the primary key (our primary index will then be called PRIMARY) – if we don’t have a PRIMARY KEY and have a UNIQUE INDEX, the clustered index is the unique index.

The table provided above should help you decide when and what kind of index type you should use. Keep in mind, however, that knowing the features of indexes provided above will only act as a small part of your decision. When placing the entire puzzle together, your decision will inevitably be also influenced by other factors such as whether your database infrastructure is optimized for performance or not, what storage engine you decide to use, how much data you have, how many rows are unique (if any), etc.

What Factors to Consider when Indexing?

As stated, once you understand the types of indexes and all of the features provided by them, you must also consider other factors. The list of things to consider should include answers to the following questions:

Question

Why is The Answer Important?

What kind of storage engine will be used?

MySQL provides multiple storage engines including, but not limited to storage engines that can guarantee ACID compliance, engines that show the exact row count in a specific database, engines that act as “black holes” in the sense that they accept data, but never store it, some engines store all of the data in memory, etc.

Two of the most popular storage engines include InnoDB and MyISAM, and the choice of storage engine will be imperative when working with any kind of index type available in MySQL. For an example, InnoDB has parameters called innodb_flush_log_at_trx_commit, innodb_buffer_pool_size, and others which can be used to either control ACID compliance or the size of the InnoDB buffer pool. Both exchange ACID for speed and increasing the buffer pool size will make the modification of indexes and data associated with them easier at the expense of other things.

MyISAM has its own key buffer size which is equivalent to the buffer pool size, but the main downside of this storage engine is that it’s now considered obsolete. For that reason, most database administrators elect to use either XtraDB (an advanced version of InnoDB developed by Percona) or InnoDB itself. If your column contains a primary key, MySQL will automatically create a clustered index named PRIMARY, but you can also create a UNIQUE INDEX yourself.

Is the my.cnf file optimized for performance?

InnoDB has a couple of parameters that are crucial for its performance and that need to be optimized to get the best out of MySQL – this information is out of scope for this article, but it will be covered in a later article in this series.

What data types and character sets will be used?

This question may sound silly at first glance, but both data types and character sets in use may become an Achilles’s heel when dealing with indexes in the future if they are set up incorrectly.

  • Character sets will become especially important when dealing with data in multiple languages. Russian speakers would benefit from the character set of latin1; general use cases should use the character set utf8mb4 instead of using utf8 as well (utf8mb4 can store 4-byte characters while utf8 only allows for 3.)
  • In regards to data types, keep in mind that one of the most frequently used data type in the world of MySQL is VARCHAR, allowing variable length characters (both number and text-based) values. In this case, mind the fact that the bigger the data type is, the more space on the disk indexes will consume too, so make sure to choose the size appropriately and choose wisely.

Also, keep in mind that MySQL comes with a couple of limitations on this front as well – one cannot put a UNIQUE INDEX on a text column, for example. Users of InnoDB should also mind the fact that the index key prefix length limit is 3072 bytes if the DYNAMIC or COMPRESSED row format is in use, while MySQL will only use 767 bytes for the REDUNDANT or COMPACT row formats.

What columns are indexed and are all of them necessary?

This question might seem very silly, but you might be surprised to see how many database administrators and software engineers index for the sake of indexing. MySQL comes with a couple of additional queries that can help you figure out whether an index is actually being used (think EXPLAIN and the like), so make sure to dive into it and learn the ins and outs of the EXPLAIN query. Once you index your columns, make sure the indexes are used by MySQL. Otherwise you will waste disk space.

Is there enough data for it to be indexed?

Remember indexes in books you’ve read? Secondary indexes in MySQL work in a similar fashion. They are essentially used to find specific column values quickly, but if you have ten rows and need to find one, you won’t go far to find it. To make sure that the indexes you’re going to use will be useful to MySQL, please make sure you have at least a couple thousand rows in a table. Applications being read-heavy help too. The more rows we have, the more effective our indexes will be.

What type of index should be used?

Lastly, consider one of the most important questions – what’s your use case? Refer to the list of indexes provided by MySQL to figure out your answer to this question, if necessary, read up on the documentation, and choose the option that is the most useful for your specific use case.

The list of questions above is not exhaustive, but it should act as a good starting point to direct your choices. After answering these questions, you should have a pretty good understanding of where your MySQL infrastructure is heading and how best to approach your data with indexes.

Once you figure out how best to index your data, please make sure to consider what issues you might be facing in the future to avoid any mishaps. These issues might not necessarily be directly related to indexes. However, if you neglect to make sure your servers are scalable when choosing a hosting provider, if you neglect to do basic research to figure out how much operating memory is necessary, if you don’t think about database normalization, or if you don’t ever do database-related research in Stack Overflow, it’s safe to say you and your database will be in line for trouble. To make sure your indexes will be as effective as possible, consider everything from the servers you are going to employ to reach your goal (make sure they have scalable resources if necessary) to the normalization of your database.

Summary

Indexes in MySQL have their nuances and most of them are related to specific kinds of indexes. No type of index is useless – all of the indexes have their use cases. However, to adequately apply them to these use cases, you must be familiar with at least some of the nuances of indexes. I hope that this article has helped you do that, and that you stay around the blog for more. See you in the next one!