This article is part of Robert Sheldon's continuing series on MongoDB.
Throughout this series, I’ve covered a number of topics about working with MongoDB collections and their data. One topic I have not covered is how to work with indexes. Similar to a relational database system, MongoDB lets you create indexes on your collections to help improve query read performance. You can create an index on one or more fields, even if those fields contain embedded documents, geospatial data, or date and time values.
In this article, I demonstrate how to create several types of indexes. You’ll also learn how to view existing indexes, drop indexes, and hide them from the query planner. The examples are based on MongoDB Shell commands, rather than the MongoDB Compass interface. By learning the Shell commands, you’ll be able to better understand the principles behind index creation.
You can access MongoDB Shell through MongoDB Compass or through your system’s command-line interface. Either will serve for this article, where you’ll learn how to create four types of indexes. Although MongoDB also supports other index types, I think you’ll likely use these four basic types initially. From this foundation, you should have no problem creating the other types.
MongoDB indexes can be extremely useful in supporting your data-driven applications, as long as those indexes are implemented with your workloads in mind. Although they can help speed up read operations, they can also slow down write operations because each data modification requires both the document and index data to be updated.
Understanding your workloads’ read-to-write ratios will be a key factor in planning an effective indexing strategy. You can also use tools such as the Performance Advisor in MongoDB Atlas to help you determine how best to implement your indexes. Once you’ve decided to add indexes to a collection, this article will help you get started.
Note: For the examples in this article, I used the same MongoDB Atlas environment I used for the previous articles in this series. Refer to the first article for details about setting up this environment. The examples are based on the hr database and employees collection. You can use a different test database and collection if you like. Just modify the code accordingly.
Creating indexes in a MongoDB collection
You can create an index in MongoDB Shell by using the createIndex method, which lets you define different types of indexes based on one or more fields in your collection’s documents. The following syntax shows the basic elements that go into a createIndex command:
db.collection.createIndex( { key }, { options } );
The command’s syntax is made up of the following elements:
- db. System variable for referencing the current database and accessing the properties and methods available to the database object. For this article, you need to ensure that hr is the current database.
- collection. Placeholder for the target collection. For this article, we will be using the employees collection. When you specify a collection, you can access the properties and methods available to the collection object.
- createIndex. A method available to the collection object for creating an index in the specified collection.
- key. Placeholder for one or more field/value pairs. For each pair, you must specify the field on which the index will be based, along with a predefined value that determines the type of index to create on that field. For example, a value of 1 indicates a basic ascending index, a value of -1 indicates a descending index, and a value of 2dsphere indicates a type of geospatial index.
- options. Placeholder for one or more optional settings that refine the index definition.
I’ll demonstrate how all this works shortly, but first, you’ll need to prepare your test environment. Start by ensuring that the hr database and employees collection are in place and that the collection contains no documents. Then, in MongoDB Shell, run the following insertMany command:
db.employees.insertMany([
  {
    "_id": 101,
    "name": "Drew",
    "emp_id": "drews4873",
    "position": {
      "title": "Senior Developer",
      "dept": "R&D",
      "location": { "type": "Point", "coordinates": [ -122.3493036, 47.6205131 ] },
      "skills": [ "Java", "SQL", "Python", "PHP" ],
      "yrs_exp": 18
    }
  },
  {
    "_id": 102,
    "name": "Parker",
    "emp_id": "parkerc5927",
    "position": {
      "title": "Data Scientist",
      "dept": "R&D",
      "location": { "type": "Point", "coordinates": [ -104.984848, 39.738449 ] },
      "skills": [ "Python", "R", "Go", "SAS" ],
      "yrs_exp": 14
    }
  },
  {
    "_id": 103,
    "name": "Harper",
    "emp_id": "harperd0564",
    "position": {
      "title": "Marketing Manager",
      "dept": "Marketing",
      "location": { "type": "Point", "coordinates": [ -73.9653627, 40.7827725 ] },
      "yrs_exp": 22
    }
  },
  {
    "_id": 104,
    "name": "Darcy",
    "emp_id": "darcyg7432",
    "position": {
      "title": "Senior Developer",
      "dept": "R&D",
      "location": { "type": "Point", "coordinates": [ -87.7517243, 41.7855141 ] },
      "skills": [ "Java", "Csharp", "Python", "R" ],
      "yrs_exp": 6
    }
  },
  {
    "_id": 105,
    "name": "Carey",
    "emp_id": "careyr4038",
    "position": {
      "title": "SEO Specialist",
      "dept": "Marketing",
      "location": { "type": "Point", "coordinates": [ -84.3732295, 33.7910822 ] },
      "yrs_exp": 7
    }
  },
  {
    "_id": 106,
    "name": "Avery",
    "emp_id": "averyl2074",
    "position": {
      "title": "Network Admin",
      "dept": "IT",
      "location": { "type": "Point", "coordinates": [ -106.6699345, 35.0961125 ] },
      "yrs_exp": 11
    }
  },
  {
    "_id": 107,
    "name": "Gabe",
    "emp_id": "gabet5387",
    "position": {
      "title": "Developer",
      "dept": "R&D",
      "location": { "type": "Point", "coordinates": [ -122.4058344, 37.802379 ] },
      "skills": [ "Haskell", "Fortran", "Smalltalk", "COBOL" ],
      "yrs_exp": 8
    }
  }
]);
The command adds seven documents to the employees collection. When you run the command in MongoDB Shell, you should receive a message that shows the _id values of these documents.
As you’ll recall from previous articles in this series, every document in a collection must include the _id field. MongoDB automatically creates a unique index on the _id field as soon as you create the collection, even before you add any documents to the collection. This ensures that the values inserted into the _id field are always unique.
You can view the indexes that are defined on a collection by using the getIndexes method available to the collection object. For example, the following command uses the method to retrieve the indexes defined on the employees collection:
db.employees.getIndexes();
You do not need to pass in any arguments when calling the getIndexes method. By default, it will list all indexes created in the specified collection. In this instance, the method should return only the index defined on the _id field, as shown in the following results:
[ { v: 2, key: { _id: 1 }, name: '_id_' } ]
The results include three field/value pairs that describe the index. The v field, which has a value of 2, refers to the index format version. The next pair is the key field and its value of { _id: 1 }. This is the field/value pair (key) on which the index is based. This is followed by the name field, which has a value of _id_. This is the name that MongoDB automatically assigns to the index.
Create a single-field index in a MongoDB collection
MongoDB supports multiple types of indexes. One of the most common is the single-field index, such as the one that MongoDB automatically creates on the _id field. You can use the createIndex method to define an index on any field in a document. For example, the following createIndex command defines an index on the emp_id field:
db.employees.createIndex( { "emp_id": 1 } );
In this case, the method takes only one argument, the field/value key pair. The key specifies the target field (emp_id), followed by the sort order (1). A value of 1 indicates that the index should be sorted in ascending order. A value of -1 indicates that the index should be sorted in descending order. For a single-field index, the choice has little practical effect because MongoDB can traverse the index in either direction. Sort order matters more in compound indexes, where the combination of directions determines which sorts the index can support.
When you run the createIndex command, MongoDB creates the index and returns the name that is automatically assigned to this index, which is emp_id_1. The generated index name is made up of the key’s field/value pair, with the field and value separated by an underscore.
If the index is a compound index (made up of multiple fields), the index name includes all the key pairs, separated by underscores. Not surprisingly, a complex compound index can result in a very lengthy index name. However, MongoDB lets you name an index when you create it, as you’ll see shortly.
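The naming rule described above can be modeled in a few lines of plain JavaScript. This is only an illustrative sketch: defaultIndexName is a hypothetical helper, not part of MongoDB Shell, and it simply joins each field and value with underscores in the order they appear in the key.

```javascript
// Illustrative model of the default index naming convention; not a MongoDB API.
function defaultIndexName(keySpec) {
  // Join each field with its index value, then join the pairs, all with "_".
  return Object.entries(keySpec)
    .map(([field, value]) => `${field}_${value}`)
    .join("_");
}

console.log(defaultIndexName({ emp_id: 1 }));
// emp_id_1
console.log(defaultIndexName({ "position.dept": 1, "position.title": 1 }));
// position.dept_1_position.title_1
```

The second call shows how quickly a generated name grows once a compound index includes dotted field paths, which is one reason to supply your own name.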
The index itself is made up of the values in the target field, which in this case is emp_id. The values are sorted in ascending order. Each entry is associated with a pointer that indicates where the data resides.
After you create an index, you can verify that it has been properly defined by again running a getIndexes command:
db.employees.getIndexes();
The command returns the following results, which now show two index listings:
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { emp_id: 1 }, name: 'emp_id_1' }
]
The first listing is the same one you saw before, the one MongoDB automatically generated on the _id field. The second listing is for the index you just created on the emp_id field.
Dropping an index in a MongoDB collection
In some cases, you might want to update an index definition to better refine it. However, you can’t modify an index except in a few instances, such as hiding or unhiding the index. Instead, you must first drop the index and then re-create it.
You can drop any index except the one that MongoDB automatically creates on the _id field. To drop an index in MongoDB Shell, use the dropIndex method, as in the following example:
db.employees.dropIndex("emp_id_1");
The method takes one of the following arguments: the index name or its field/value key. In this example, I’ve used the index name. This is the index you just created. When you run the command, MongoDB removes the index and returns the following results:
{ nIndexesWas: 2, ok: 1 }
The nIndexesWas field shows the number of indexes that existed prior to running the dropIndex command. The ok field and its 1 value indicate that the command was successful. If it had been unsuccessful, the value would be 0. Note that, when you run a command such as dropIndex against an Atlas collection, your results will also include cluster-related information, although they will still contain the core information shown here.
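As an alternative to the name-based call, the same index can be identified by its field/value key. The following is a minimal sketch, wrapped in a hypothetical helper (dropEmpIdIndexByKey) that takes the shell's db object so the snippet stays self-contained. It assumes the emp_id index still exists; if you already dropped it by name, the call will return an error instead.

```javascript
// Sketch for MongoDB Shell. dropEmpIdIndexByKey is a hypothetical helper name.
function dropEmpIdIndexByKey(db) {
  // Identify the index by its key specification instead of its name.
  return db.employees.dropIndex({ emp_id: 1 });
}
```

In MongoDB Shell, you would call it as dropEmpIdIndexByKey(db), or simply run the dropIndex line directly.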
After you remove the index, you can then re-create it with an updated definition. For example, the following createIndex command again creates an index on the emp_id field, but this time the index definition includes two options:
db.employees.createIndex(
  { "emp_id": 1 },
  { name: "unique_id", unique: true } );
Each option is a field/value pair that sets the value of an index property. The first option is name, which assigns the value unique_id to the name property. As a result, the index will be named unique_id, rather than use a name generated automatically by MongoDB. The second option is unique, and its value is set to true, so the index will be defined as a unique index.
When you run the command, MongoDB should return the index name that you specified in the definition. You can then run a getIndexes command to verify your changes. The command should return the following results:
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { emp_id: 1 }, name: 'unique_id', unique: true }
]
As you can see, the listing for the new index now includes the name that you assigned to the index and indicates that this is a unique index.
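One way to see the unique constraint in action is to try inserting a document whose emp_id value already exists. The sketch below is an assumption-laden illustration: checkUniqueEmpId and the probe document (with _id 999) are hypothetical, and the function takes the shell's db object as its argument. It relies on 11000 being MongoDB's duplicate key error code, and it removes the probe document again if the insert unexpectedly succeeds so that later document counts are unaffected.

```javascript
// Sketch for MongoDB Shell. checkUniqueEmpId and the probe document are hypothetical.
function checkUniqueEmpId(db) {
  // This emp_id value is already used by the document with _id 101.
  const probe = { _id: 999, name: "Probe", emp_id: "drews4873" };
  try {
    db.employees.insertOne(probe);
    // No unique index blocked the insert, so clean up the probe document.
    db.employees.deleteOne({ _id: 999 });
    return "inserted (no unique index enforced)";
  } catch (err) {
    // 11000 is the duplicate key error code.
    if (err.code === 11000) return "rejected as duplicate";
    throw err; // Anything else is unexpected, so surface it.
  }
}
```

With the unique_id index in place, calling checkUniqueEmpId(db) should report that the insert was rejected.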
Create a compound index in a MongoDB collection
As already noted, a compound index is one that is made up of multiple fields. You can include up to 32 fields, although you’re more likely to use only two or three. For each field that you include in the index definition, you must specify a field/value key. The data within the index is grouped by the first specified field, then by the second, and so on.
Compound indexes can be useful when your applications frequently query a specific set of fields. For example, an HR system might often query employee usernames and last names. To improve query performance, you can create a compound index on the two fields. Compound indexes also make it possible to support covered queries, in which the query pulls all the required data from the index, without needing to access the underlying documents.
As with single-field indexes, you can use the createIndex method to define a compound index. The main difference is that, with a compound index, you specify two or more field/value keys, as in the following example:
db.employees.createIndex(
  { "position.dept": 1, "position.title": 1 },
  { name: "title_dept" } );
The command creates a compound index on the position.dept field and the position.title field, with both fields sorted in ascending order. Notice that a comma separates the field/value keys and that the index is named title_dept.
In this case, the index has been created on fields in an embedded document. It’s also possible to create an index on the embedded document itself, such as the position field. However, for a query to be able to take advantage of such an index, it must specify the entire embedded document, and it must specify the embedded fields in the exact order they’re defined. For example, if you create an index on the position field, a query that searches only the position.dept field cannot take advantage of the index.
After you create the title_dept index, you should again run a getIndexes command to verify the new index. The command should now return the following information:
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { emp_id: 1 }, name: 'unique_id', unique: true },
  {
    v: 2,
    key: { 'position.dept': 1, 'position.title': 1 },
    name: 'title_dept'
  }
]
The third listing provides details about the compound index. As expected, the key value reflects both of the field/value pairs used to define the index.
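To check whether a query can be answered from the compound index alone, you can inspect the execution statistics returned by the cursor's explain method. The following is a rough sketch rather than a definitive test: explainDeptQuery is a hypothetical helper that takes the shell's db object, and the filter value and projection are chosen only for illustration. The projection excludes _id and returns only the two indexed fields.

```javascript
// Sketch for MongoDB Shell. explainDeptQuery is a hypothetical helper name.
function explainDeptQuery(db) {
  const stats = db.employees
    .find(
      { "position.dept": "R&D" },
      // Return only fields that are part of the title_dept index.
      { _id: 0, "position.dept": 1, "position.title": 1 }
    )
    .explain("executionStats");
  return {
    keysExamined: stats.executionStats.totalKeysExamined,
    docsExamined: stats.executionStats.totalDocsExamined,
  };
}
```

If the query planner chooses the title_dept index and the query is covered, docsExamined should be 0, because no documents had to be fetched. On a collection this small, treat the numbers as a demonstration rather than a performance measurement.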
Hiding an index in a MongoDB collection
Earlier in the article, I demonstrated how you can drop an index. You might do this so you can redefine the index or simply because you want to remove it from the collection. In some cases, however, it might be useful to see what impact the index’s removal will have on performance, before you actually drop the index.
You can achieve this by hiding the index. When you hide an index, you’re essentially deactivating it and hiding it from the query planner. If hiding the index negatively impacts performance, you can simply unhide the index without having to drop and re-create it. The index is still fully maintained when it’s hidden, so it can be enabled with little effort. On the other hand, if hiding the index doesn’t impact performance, you might choose to go ahead and drop it.
In MongoDB Shell, you can use the hideIndex method to hide an index. For example, the following command uses the method to hide the title_dept index that you just created:
db.employees.hideIndex( "title_dept" );
The method’s only argument is the name of the target index. When you run the command, MongoDB sets the index as hidden and returns the following message:
{ hidden_old: false, hidden_new: true, ok: 1 }
The message essentially states that the index was not hidden previously, but now it is. You can verify this change by running a getIndexes command, which returns the following results:
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { emp_id: 1 }, name: 'unique_id', unique: true },
  {
    v: 2,
    key: { 'position.dept': 1, 'position.title': 1 },
    name: 'title_dept',
    hidden: true
  }
]
The listing for the compound index now shows the hidden field as true. Note that the getIndexes method returns the hidden property only if its value is true.
If you decide you want to reactivate the index, you can unhide it by running an unhideIndex command, as in the following example:
db.employees.unhideIndex( "title_dept" );
As with the hideIndex method, the unhideIndex method takes only the target index as its argument. When you run the command, you should receive the following message:
{ hidden_old: true, hidden_new: false, ok: 1 }
Now the index will be fully operational, which means that the query planner can see the index and use it to optimize performance for queries that rely on the fields specified in the index definition.
You can also configure an index as hidden when you first create it. For this, you need to include the hidden option in the index definition and set its value to true.
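A minimal sketch of creating an index that starts out hidden might look like the following. The helper name (createHiddenIndex), the index name (hidden_yrs_exp), and the choice of the position.yrs_exp field are all assumptions made for illustration; the function takes the shell's db object as its argument.

```javascript
// Sketch for MongoDB Shell. The helper and index names are hypothetical.
function createHiddenIndex(db) {
  return db.employees.createIndex(
    { "position.yrs_exp": 1 },
    // hidden: true keeps the new index out of the query planner's view.
    { name: "hidden_yrs_exp", hidden: true }
  );
}
```

If you try this, consider removing the index afterward with db.employees.dropIndex("hidden_yrs_exp") so that the getIndexes results in the rest of the article match what you see.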
Create a multikey index in a MongoDB collection
MongoDB also supports a type of index called multikey, which is simply an index defined on an Array field. The index contains all the values within the array, whether scalar values or embedded documents. If an array contains duplicate values, the index will include only one entry for that value.
You do not need to do anything special to define a multikey index. MongoDB automatically creates the index as multikey when you define it on an Array field. For example, the following createIndex command defines a multikey index on the position.skills field:
db.employees.createIndex(
  { "position.skills": 1 },
  { name: "multi_skills", sparse: true } );
Most of the command’s elements you’ve seen before. The index is based on the position.skills field, sorted in ascending order, and is named multi_skills. Because the field is an array, MongoDB automatically creates the index as multikey.
The index definition also includes the sparse option, with its value set to true. This option tells MongoDB to exclude entries for documents that do not contain the position.skills field. I’ll demonstrate how you can confirm this shortly. But first, run the above createIndex command, and then, after you’ve successfully created the index, run a getIndexes command. The command should return the following results:
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { emp_id: 1 }, name: 'unique_id', unique: true },
  {
    v: 2,
    key: { 'position.dept': 1, 'position.title': 1 },
    name: 'title_dept'
  },
  {
    v: 2,
    key: { 'position.skills': 1 },
    name: 'multi_skills',
    sparse: true
  }
]
The fourth listing shows the new multikey index, with the name we specified and the sparse property set to true. Unfortunately, the getIndexes results provide no indication that this is a multikey index. All you have to go on is the fact that the position.skills field is an array.
However, you can verify that the sparse property is giving you the results you want. First, run the following find command against the employees collection:
db.employees.find().count();
The command uses the find method and its count method to determine the number of documents in this collection. In this case, the count should be 7, assuming you didn’t add or remove any other documents from the collection.
Next, run the following find command, which also includes the hint method:
db.employees.find().hint( "multi_skills" ).count();
The command instructs MongoDB to use the multi_skills index when determining the document count. As a result, the command will return a count of 4 rather than 7 because only four documents include the position.skills field.
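To make the difference between the two counts more concrete, the following plain JavaScript sketch models what a sparse multikey index on position.skills would contain for the sample data. It is a conceptual model only and does not reflect MongoDB's internal index structures: the docs array is an abbreviated copy of the sample documents, and sparseMultikeyEntries is a hypothetical function.

```javascript
// Conceptual model only; not how MongoDB stores indexes internally.
// Abbreviated sample documents: only _id and position.skills matter here.
const docs = [
  { _id: 101, position: { skills: ["Java", "SQL", "Python", "PHP"] } },
  { _id: 102, position: { skills: ["Python", "R", "Go", "SAS"] } },
  { _id: 103, position: {} },
  { _id: 104, position: { skills: ["Java", "Csharp", "Python", "R"] } },
  { _id: 105, position: {} },
  { _id: 106, position: {} },
  { _id: 107, position: { skills: ["Haskell", "Fortran", "Smalltalk", "COBOL"] } },
];

// Build [value, _id] entries: one per distinct array value in each document.
// Documents without the field contribute nothing, as with a sparse index.
function sparseMultikeyEntries(documents) {
  const entries = [];
  for (const doc of documents) {
    const skills = doc.position && doc.position.skills;
    if (skills === undefined) continue;
    for (const value of new Set(skills)) entries.push([value, doc._id]);
  }
  // Order by value, then by _id, to mimic an ascending index.
  return entries.sort((a, b) =>
    a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]
  );
}

const entries = sparseMultikeyEntries(docs);
const indexedDocs = new Set(entries.map(([, id]) => id));
console.log(entries.length);   // 16 (four skills in each of four documents)
console.log(indexedDocs.size); // 4 (documents reachable through the index)
```

The three documents without a skills array never appear in the modeled index, which mirrors why the hinted count returns 4 instead of 7.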
Create a geospatial index in a MongoDB collection
MongoDB lets you store geospatial data as GeoJSON objects or legacy coordinate pairs in your documents. The sample documents you inserted into the employees collection include the position.location field, which contains geospatial data stored as GeoJSON objects. For example, the first document includes the following geolocation data:
"location": {
  "type": "Point",
  "coordinates": [ -122.3493036, 47.6205131 ] },
The position.location field is an embedded document that includes the type and coordinates fields. The type field specifies the GeoJSON object type, which in this case, is Point. The Point type indicates that the coordinates apply to a specific point location. The coordinates field is an array that contains the point’s two coordinates, with the longitude listed first.
Data that is stored as GeoJSON objects represents geographic points or geometrical shapes on an Earth-like sphere, which is why GeoJSON is commonly used to store location data. However, you can also store geospatial data as legacy coordinate pairs. This makes it possible to store location data on a Euclidean plane, which is flat and two-dimensional. You can learn more about geospatial data in the MongoDB topic Geospatial Queries.
MongoDB lets you create indexes on either type of geospatial data. When defining a geospatial index, you must use 2dsphere as the key value for a GeoJSON index and use 2d as the key value for a legacy index. For example, the following createIndex command creates a GeoJSON index on the position.location field:
db.employees.createIndex(
  { "position.location": "2dsphere" },
  { name: "geo_location" } );
The command creates an index on the entire position.location field, even though the coordinates themselves are stored in the position.location.coordinates field. The command also names the index geo_location.
After you run this command, you can again use a getIndexes command to verify the new index. The command should now return the following results:
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { emp_id: 1 }, name: 'unique_id', unique: true },
  {
    v: 2,
    key: { 'position.dept': 1, 'position.title': 1 },
    name: 'title_dept'
  },
  {
    v: 2,
    key: { 'position.skills': 1 },
    name: 'multi_skills',
    sparse: true
  },
  {
    v: 2,
    key: { 'position.location': '2dsphere' },
    name: 'geo_location',
    '2dsphereIndexVersion': 3
  }
]
The listing for the geo_location index looks similar to the others, except that the key value is now 2dsphere. In addition, the listing includes the 2dsphereIndexVersion property, which refers to the format version used for this type of index.
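While the geo_location index exists, it can support proximity queries such as those that use the $near operator, which requires a geospatial index. The following is a minimal sketch, not part of the original examples: findEmployeesNear is a hypothetical helper that takes the shell's db object, a longitude and latitude, and a maximum distance in meters.

```javascript
// Sketch for MongoDB Shell. findEmployeesNear is a hypothetical helper name.
function findEmployeesNear(db, longitude, latitude, maxMeters) {
  return db.employees
    .find({
      "position.location": {
        $near: {
          // GeoJSON points list the longitude first, then the latitude.
          $geometry: { type: "Point", coordinates: [longitude, latitude] },
          $maxDistance: maxMeters, // measured in meters for GeoJSON points
        },
      },
    })
    .toArray();
}
```

For example, calling findEmployeesNear(db, -122.3493036, 47.6205131, 1000) uses the coordinates stored in the first sample document, so the results should include that document.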
As you can also see in these results, the getIndexes command now returns five index listings, including the default one that MongoDB automatically created on the _id field. These are all the indexes that we’re going to cover in this article, so you’re now going to delete the four indexes that you created. For this, you can run the following dropIndexes command:
db.employees.dropIndexes();
The dropIndexes method drops all indexes except the one on the _id field. This is different from the dropIndex method you saw earlier, which removes only the specified index. When you run the dropIndexes command, it should return the following results:
{
  nIndexesWas: 5,
  msg: 'non-_id indexes dropped for collection',
  ok: 1
}
The results indicate that five indexes existed before the command ran and that all but the _id index were dropped. You can confirm this by once again running a getIndexes command, which should return the following results:
[ { v: 2, key: { _id: 1 }, name: '_id_' } ]
As you can see, only the auto-generated index now exists in the employees collection.
Getting started with MongoDB indexes
In this article, I demonstrated how you can create several different types of indexes. However, a single article is not enough to explain everything there is to know about indexes and index types. For example, MongoDB also supports wildcard and hashed indexes and provides additional options for refining your indexes, such as partial and time to live (TTL). I highly recommend that you refer to MongoDB documentation for further information, starting with the topic Indexes.
MongoDB indexes can provide you with a powerful tool for improving query performance. Without an index, a query must scan every document in a collection to get the information it needs. The strategic use of indexes can help improve query execution when accessing the same fields repeatedly, which is especially important for high-volume data operations.
That said, indexes can have a detrimental impact on write performance, so the type of workloads you support will be integral in determining your indexing strategy. Again, refer to the MongoDB documentation for guidelines about creating indexes and measuring their use. Indexes should be an important consideration in your data-delivery strategy, but you should implement them with great care.