This article is part of Robert Sheldon's continuing series on MongoDB. To see all of the items in the series, click here.
Throughout this series, I’ve introduced you to different features in MongoDB and provided examples to help demonstrate how the database system works. The examples have all been based on conventional collections, the type that MongoDB creates by default. However, MongoDB also supports other types of collections, including the time series collection, which can benefit many of today’s event-driven workloads.
The documents in a time series collection represent a sequence of data points, with each document recording an event at a specific point in time. For example, a machine sensor might generate temperature readings that are transformed into individual documents and stored in a time series collection.
A time series collection is optimized to handle these types of documents and the workloads they support, offering improved query performance and reduced storage consumption. MongoDB automatically stores the collection’s data in groups of related documents and indexes them based on their date values and unique group identifiers.
In this article, I introduce you to the time series collection and demonstrate different ways you can create one in MongoDB Shell. If you want to try out the examples, you can use the version of Shell embedded in MongoDB Compass or the one you access through your system’s command-line interface. You can also create time series collections in the Compass GUI, although this article focuses on the Shell commands.
Note: For the examples in this article, I used the same MongoDB Atlas environment I used for the previous articles in this series. Refer to the first article for details about setting up this environment. For this article, the examples are based on the iot database, which you can create in advance or when you try out the exercises.
Adding a time series collection to a MongoDB database
To create a time series collection in MongoDB Shell, you can use the createCollection method, just like you can for a conventional collection. The primary difference is that, for a time series collection, you must include the timeseries option in your collection definition, as shown in the following syntax:
    db.createCollection(
      "collection_name",
      {
        timeseries: {
          timeField: "field_name",
          metaField: "field_name",
          granularity_options
        },
        expireAfterSeconds: num_seconds
      }
    );
The command’s syntax consists of the following elements:
- db. System variable for referencing the current database and accessing the properties and methods available to the database object. For this article, you should ensure that iot is the current database.
- createCollection. Database method for creating a collection in the current database.
- collection_name. Placeholder for the name of the new collection. For this article, we’ll be creating the pressure collection.
- timeseries. An option available to the createCollection method for creating a time series collection. The option defines an embedded document that includes parameters specific to a time series collection.
- timeField. A required timeseries parameter that specifies a date field in the collection’s documents. The field’s values must be valid BSON dates. BSON is a binary encoding of JSON.
- metaField. An optional timeseries parameter that specifies a metadata field in the collection’s documents. The field should contain data that can uniquely identify a related group of documents. For example, the field might identify a weather sensor and its location. Only documents that are generated by the same sensor at the same location can be included in the same bucket. The field value should rarely, if ever, change. Although this setting is optional, its inclusion can improve query performance because it can be used as part of a compound index along with the field assigned to the timeField parameter.
- granularity_options. Placeholder for one or more parameters that specify the collection’s granularity, which determines how the collection’s documents are bucketed into related groups of data. I discuss the granularity options in more detail later in the article.
- expireAfterSeconds. An optional parameter that lets you specify whether the documents in a time series collection should be automatically deleted after a certain amount of time. If the setting is included, it should be defined with an integer value that indicates when the documents will be deleted. The integer, as indicated by the num_seconds placeholder, determines the number of seconds that should pass before a document expires.
The documents in a time series collection typically contain a date field that is assigned to the timeField parameter, a metadata field that is assigned to the metaField parameter, and some type of measure specific to the date field and metadata field. For instance, the documents in a time series collection that tracks global temperatures might include the following three fields:
- A date field that records when the temperature was measured.
- A metadata field that identifies the weather sensor and its location.
- A measure field that records the temperature.
Each document in a time series collection represents an event at a specific point in time, such as a weather station’s temperature reading. Other examples include website views, stock trades, inventory changes, and sensor data from internet of things (IoT) devices. The key is to define your time series collections to meet the specific needs of your workloads. You’ll get a better sense of how each of these elements works as we progress through the article.
Creating a time series collection
Now that we’ve reviewed the syntax, let’s look at an example of how to create a time series collection. We’ll start with a basic collection that uses the default granularity and does not expire the documents.
You’ll be creating the collection in the iot database, so you’ll need to change the context to that database. To do so, launch MongoDB Shell and, at the command prompt, enter the following command:
    use iot;
The command switches the shell’s database context to iot. You can use this command even if you have not yet created the database. If the database does not exist, MongoDB will automatically create it when you add the collection.
Once you’ve established the database context, you can use the createCollection method to add the pressure collection, as shown in the following command:
    db.createCollection(
      "pressure",
      {
        timeseries: {
          timeField: "timestamp",
          metaField: "source"
        }
      }
    );
The timeseries element in the collection definition includes the following two parameters:
- The timeField parameter specifies the timestamp field, which contains the document’s timestamp.
- The metaField parameter specifies the source field, which is an embedded document that contains a system identifier and sensor identifier.
That’s all there is to creating a basic time series collection. The trick is to know in advance which fields you plan to assign to the timeField and metaField parameters. The fields will be specific to the documents you’ll be inserting into the collection.
When you run the createCollection command, MongoDB automatically creates a compound index on the fields specified in the timeField and metaField parameters. In this case, MongoDB creates the index on the source and timestamp fields, as indicated by the index name, source_1_timestamp_1.
After you create the collection, you can then run the following insertMany command to add sample data to the collection:
    db.pressure.insertMany([
      {
        "timestamp": ISODate("2024-12-01T12:05:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 122
      },
      {
        "timestamp": ISODate("2024-12-01T12:35:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 137
      },
      {
        "timestamp": ISODate("2024-12-01T16:05:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 129
      },
      {
        "timestamp": ISODate("2024-12-01T16:35:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 133
      },
      {
        "timestamp": ISODate("2024-12-01T20:05:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 87
      },
      {
        "timestamp": ISODate("2024-12-01T20:35:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 113
      },
      {
        "timestamp": ISODate("2024-12-02T12:05:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 121
      },
      {
        "timestamp": ISODate("2024-12-02T12:35:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 129
      },
      {
        "timestamp": ISODate("2024-12-02T16:05:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 127
      },
      {
        "timestamp": ISODate("2024-12-02T16:35:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 131
      },
      {
        "timestamp": ISODate("2024-12-02T20:05:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 139
      },
      {
        "timestamp": ISODate("2024-12-02T20:35:00.000Z"),
        "source": { "systemId": 4937, "sensorId": 37217 },
        "reading": 128
      }
    ]);
When you run this command, you should receive an acknowledgment message indicating that 12 documents were added to the collection. The message should also list each document’s _id value. As with a conventional collection, MongoDB automatically adds an _id field to each document to ensure its uniqueness, unless you’ve specifically included the field when inserting the documents.
Each document in the pressure collection provides a record of a pressure reading from a sensor on a piece of machinery, equipment, or other type of system, such as you might find in a manufacturing plant. The document includes a timestamp field, source field, and reading field:
- The value for the timestamp field uses the ISODate function to convert the date/time value from a string to a Date object.
- The value for the source field is an embedded document. The document specifies a system ID that uniquely identifies the piece of equipment or other system that is being monitored. The document also includes a unique ID for the sensor itself. In this way, the sensors on a single system can be tracked individually.
- Each document also includes the reading field, which provides the pressure reading at the time the pressure was measured.
In this case, the sample documents are all from the same sensor on the same system. In a real-world setting, there would be many more records, with documents for different systems and sensors. The documents would be stored in multiple buckets, based on the values in their timestamp and source fields.
Querying a time series collection
You can query a time series collection just like you can a conventional collection. For example, the following command uses the find method to return only a subset of documents from the collection:
    db.pressure.find( { "reading": { $lt: 120 } } );
The command uses the $lt operator to specify that a document must have a reading value less than 120. As a result, the command returns only two documents, which are shown in the following results:
    {
      timestamp: 2024-12-01T20:05:00.000Z,
      source: {
        sensorId: 37217,
        systemId: 4937
      },
      reading: 87,
      _id: ObjectId('674dfe47bd3a4413e2aebb01')
    }
    {
      timestamp: 2024-12-01T20:35:00.000Z,
      source: {
        sensorId: 37217,
        systemId: 4937
      },
      reading: 113,
      _id: ObjectId('674dfe47bd3a4413e2aebb02')
    }
You can also create more complex queries, including aggregations. For instance, the following command uses the aggregate method to find the average reading value for each date in the collection’s documents:
    db.pressure.aggregate([
      { $match: { "reading": { $gt: 120 } } },
      {
        $project: {
          date: { $dateToParts: { "date": "$timestamp" } },
          "reading": 1
        }
      },
      {
        $group: {
          _id: {
            "date": {
              "year": "$date.year",
              "month": "$date.month",
              "day": "$date.day"
            }
          },
          "avgReading": { $avg: "$reading" }
        }
      },
      {
        $project: {
          _id: 0,
          "date": {
            $concat: [
              { $toString: "$_id.date.year" }, "/",
              { $toString: "$_id.date.month" }, "/",
              { $toString: "$_id.date.day" }
            ]
          },
          "avgReading": { $round: [ "$avgReading", 2 ] }
        }
      },
      { $sort: { "date": 1 } }
    ]);
The command’s aggregation includes multiple stages. The first stage ($match) limits the documents to those with a reading value greater than 120. The stages that follow then extract and parse the date from the timestamp value and group the documents by each date. Next, the aggregation finds the average pressure reading for each date. In this case, the documents include only two dates, so the results include only the following two averages:
    {
      date: '2024/12/1',
      avgReading: 130.25
    }
    {
      date: '2024/12/2',
      avgReading: 129.17
    }
If this were a larger data set, there would likely be more dates, but those included here should be enough to demonstrate how the aggregation works with a time series collection. It also shows how you can interact with the collection just like a conventional collection.
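As a cross-check on the arithmetic, the same filter, group, and average logic can be sketched in plain JavaScript, outside MongoDB. This is only an illustration: the sampleReadings array and the averageByDate helper are hypothetical names, the array copies the timestamp and reading values from the sample documents, and the function approximates the pipeline’s stages rather than calling any MongoDB API.

```javascript
// Timestamp and reading values copied from the insertMany example.
const sampleReadings = [
  { timestamp: new Date("2024-12-01T12:05:00.000Z"), reading: 122 },
  { timestamp: new Date("2024-12-01T12:35:00.000Z"), reading: 137 },
  { timestamp: new Date("2024-12-01T16:05:00.000Z"), reading: 129 },
  { timestamp: new Date("2024-12-01T16:35:00.000Z"), reading: 133 },
  { timestamp: new Date("2024-12-01T20:05:00.000Z"), reading: 87 },
  { timestamp: new Date("2024-12-01T20:35:00.000Z"), reading: 113 },
  { timestamp: new Date("2024-12-02T12:05:00.000Z"), reading: 121 },
  { timestamp: new Date("2024-12-02T12:35:00.000Z"), reading: 129 },
  { timestamp: new Date("2024-12-02T16:05:00.000Z"), reading: 127 },
  { timestamp: new Date("2024-12-02T16:35:00.000Z"), reading: 131 },
  { timestamp: new Date("2024-12-02T20:05:00.000Z"), reading: 139 },
  { timestamp: new Date("2024-12-02T20:35:00.000Z"), reading: 128 }
];

// Hypothetical helper that imitates the pipeline: filter, group by UTC date,
// average, round to two decimal places, and sort by the date label.
function averageByDate(docs, minReading) {
  const groups = new Map();
  for (const doc of docs) {
    if (doc.reading <= minReading) continue; // same condition as $gt
    const t = doc.timestamp;
    // Year/month/day label in UTC, without zero padding.
    const key = `${t.getUTCFullYear()}/${t.getUTCMonth() + 1}/${t.getUTCDate()}`;
    const group = groups.get(key) ?? { sum: 0, count: 0 };
    group.sum += doc.reading;
    group.count += 1;
    groups.set(key, group);
  }
  return [...groups.entries()]
    .map(([date, g]) => ({
      date,
      avgReading: Math.round((g.sum / g.count) * 100) / 100
    }))
    .sort((a, b) => a.date.localeCompare(b.date));
}

console.log(averageByDate(sampleReadings, 120));
// [ { date: '2024/12/1', avgReading: 130.25 },
//   { date: '2024/12/2', avgReading: 129.17 } ]
```

Four readings on December 1 exceed 120 (122, 137, 129, and 133), which average to 130.25. All six readings on December 2 qualify, and their average of about 129.1667 rounds to 129.17.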
Configuring the granularity on a time series collection
As I pointed out earlier, MongoDB groups the documents in a time series collection into multiple buckets. Documents within a bucket share the same metaField value. In addition, their timeField values fall within a defined range, such as occurring within the same hour or same day. The exact interval of time is determined by the collection’s granularity.
By default, the granularity is set to seconds, which means the documents within a particular bucket have a timestamp within the same hour. However, you can set the granularity at larger time spans.
MongoDB provides two methods for configuring the granularity. The first is to use the granularity parameter, and the second is to use the set of custom parameters. Let’s start with the granularity parameter, which takes one of the following three values:
- seconds (default). A bucket can contain up to one hour’s worth of events.
- minutes. A bucket can contain up to 24 hours’ worth of events.
- hours. A bucket can contain up to 30 days’ worth of events.
For example, if the granularity parameter is set to minutes, the same bucket can include documents with the timestamps 2024-12-10T12:09:43Z, 2024-12-10T14:09:43Z, and 2024-12-10T16:09:43Z, but not 2024-12-11T12:09:43Z, which is one day later.
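The span rule in this example can be sketched in plain JavaScript. This is a deliberately simplified model, not MongoDB’s actual bucketing logic: the fitsInOneBucket helper and the MAX_SPAN_SECONDS table are hypothetical names, and the check looks only at whether a set of timestamps falls within the maximum span for a granularity. It ignores other factors that affect real bucket boundaries, such as the metaField value and how bucket start times are rounded.

```javascript
// Maximum bucket spans, in seconds, for the three granularity values.
const MAX_SPAN_SECONDS = {
  seconds: 60 * 60,          // 1 hour
  minutes: 24 * 60 * 60,     // 24 hours
  hours: 30 * 24 * 60 * 60   // 30 days
};

// Hypothetical helper: returns true if all of the timestamps fall within
// one maximum span for the given granularity.
function fitsInOneBucket(isoTimestamps, granularity) {
  const times = isoTimestamps.map((s) => Date.parse(s) / 1000);
  const spanSeconds = Math.max(...times) - Math.min(...times);
  // Assumption: a timestamp exactly one full span after the earliest
  // timestamp does not fit in the same bucket.
  return spanSeconds < MAX_SPAN_SECONDS[granularity];
}

const sameDay = [
  "2024-12-10T12:09:43Z",
  "2024-12-10T14:09:43Z",
  "2024-12-10T16:09:43Z"
];

console.log(fitsInOneBucket(sameDay, "minutes"));                              // true
console.log(fitsInOneBucket([...sameDay, "2024-12-11T12:09:43Z"], "minutes")); // false
```

Under the default seconds granularity, the same three timestamps would not fit together in this model, because they are spread across four hours.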
You can set the granularity parameter either when creating the collection or when modifying the collection’s definition. For example, if you were creating the pressure collection for the first time, you could use the following createCollection command, which sets the granularity parameter to minutes:
    db.createCollection(
      "pressure",
      {
        timeseries: {
          timeField: "timestamp",
          metaField: "source",
          granularity: "minutes"
        }
      }
    );
Because the pressure collection already exists, you’ll likely want to modify the collection definition rather than re-create it. To update the definition, use the runCommand database method to call the collMod database command, as shown in the following example:
    db.runCommand({
      collMod: "pressure",
      timeseries: { granularity: "minutes" }
    });
The command modifies the timeseries element by setting the granularity parameter to minutes. Nothing else will change in the collection’s definition.
In this case, modifying the collection definition is no problem. However, MongoDB places limits on how you can modify a time series collection. For example, when you modify the granularity, you cannot go from a longer timespan to a shorter one. In other words, you can move from seconds to minutes, but not from minutes to seconds.
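The one-way rule can be expressed as a small check in plain JavaScript. This is only a sketch of the rule as described here: the canChangeGranularity helper is a hypothetical name, not part of MongoDB, and treating an unchanged value as acceptable is an assumption of the sketch.

```javascript
// Granularity values ordered from the shortest to the longest bucket span.
const GRANULARITY_ORDER = ["seconds", "minutes", "hours"];

// Hypothetical helper: returns true if the requested granularity is at
// least as coarse as the current one.
function canChangeGranularity(current, requested) {
  const from = GRANULARITY_ORDER.indexOf(current);
  const to = GRANULARITY_ORDER.indexOf(requested);
  if (from === -1 || to === -1) {
    throw new Error("Unknown granularity value");
  }
  // Assumption: keeping the same value counts as an allowed change.
  return to >= from;
}

console.log(canChangeGranularity("seconds", "minutes")); // true
console.log(canChangeGranularity("minutes", "seconds")); // false
```

A check like this might be useful in a deployment script, so that a disallowed change is caught before the collMod command is sent to the server.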
MongoDB also places limitations on the metaField parameter. You cannot change the parameter’s value to a different field. However, if the field is an object (embedded document), you can add subfields to the documents.
In some cases, you might want to have more control over your collection’s granularity than what is provided through the granularity parameter. You can achieve this by using the following two custom parameters, rather than the granularity parameter:
- bucketMaxSpanSeconds. Specifies the maximum time between timestamps for documents in the same bucket.
- bucketRoundingSeconds. Sets the minimum time for a new bucket by rounding down the document’s timestamp.
The two parameters were introduced in MongoDB 6.3. Each one takes a single argument, the number of seconds. In addition, the two arguments must be the same value. For example, if you were creating the pressure collection from scratch, you might use the following createCollection command:
    db.createCollection(
      "pressure",
      {
        timeseries: {
          timeField: "timestamp",
          metaField: "source",
          bucketRoundingSeconds: 86400,
          bucketMaxSpanSeconds: 86400
        }
      }
    );
The command sets the value of each custom parameter to 86400 seconds (24 hours). However, given that the collection already exists, you’ll likely want to use the runCommand method instead, as in the following example:
    db.runCommand({
      collMod: "pressure",
      timeseries: {
        bucketRoundingSeconds: 86400,
        bucketMaxSpanSeconds: 86400
      }
    });
If you use the custom parameters when updating your collection’s granularity, you must adhere to the same restrictions as with the granularity parameter. That is, you cannot move from a longer timespan to a shorter one, although you can move from a shorter one to a longer one.
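To make the rounding behavior of bucketRoundingSeconds more concrete, the following plain JavaScript sketch rounds a timestamp down to a multiple of the rounding interval. The roundDownToBucketStart helper is a hypothetical name, and measuring the multiples from the Unix epoch is an assumption of this sketch rather than a statement about MongoDB’s internal implementation.

```javascript
// Hypothetical helper: rounds a Date down to the nearest multiple of
// roundingSeconds, counted from the Unix epoch (an assumption here).
function roundDownToBucketStart(date, roundingSeconds) {
  const epochSeconds = Math.floor(date.getTime() / 1000);
  const roundedSeconds = epochSeconds - (epochSeconds % roundingSeconds);
  return new Date(roundedSeconds * 1000);
}

const firstReading = new Date("2024-12-01T12:05:00.000Z");
const lastReading = new Date("2024-12-01T20:35:00.000Z");

console.log(roundDownToBucketStart(firstReading, 86400).toISOString());
// 2024-12-01T00:00:00.000Z
console.log(roundDownToBucketStart(lastReading, 86400).toISOString());
// 2024-12-01T00:00:00.000Z
```

With a value of 86400, both sample timestamps from December 1 round down to midnight UTC on that day, which illustrates why a 24-hour setting lines up buckets with calendar days.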
Enable automatic removal on a collection
By default, MongoDB retains the documents in a time series collection until they’ve been manually deleted, similar to a conventional collection. However, you can override this behavior by including the expireAfterSeconds parameter in your collection definition. The parameter lets you specify the number of seconds that the documents should be retained before MongoDB automatically deletes them.
You can specify the expireAfterSeconds parameter either when you create the collection or when updating the collection definition. Just to be complete, I’ll first show you how to incorporate the option in your initial collection definition:
    db.createCollection(
      "pressure",
      {
        timeseries: {
          timeField: "timestamp",
          metaField: "source",
          bucketRoundingSeconds: 86400,
          bucketMaxSpanSeconds: 86400
        },
        expireAfterSeconds: 43200
      }
    );
The command specifies that the collection’s documents should be removed after 43,200 seconds (12 hours). When more than that amount of time has passed since a document’s timestamp value, MongoDB automatically deletes the document from the database. The deletion might not occur immediately, but the document will eventually be removed. If all the documents in a bucket are deleted, MongoDB removes the bucket as well.
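The expiration arithmetic can be sketched in plain JavaScript. The expiresAt and isExpired helpers are hypothetical names used only for illustration. They model when a single document becomes eligible for removal, not when MongoDB’s background process actually deletes it.

```javascript
// Hypothetical helper: the moment at which a document becomes eligible
// for automatic removal.
function expiresAt(timestamp, expireAfterSeconds) {
  return new Date(timestamp.getTime() + expireAfterSeconds * 1000);
}

// Hypothetical helper: true once the expiration moment has been reached.
function isExpired(timestamp, expireAfterSeconds, now = new Date()) {
  return now.getTime() >= expiresAt(timestamp, expireAfterSeconds).getTime();
}

const readingTime = new Date("2024-12-01T12:05:00.000Z");

console.log(expiresAt(readingTime, 43200).toISOString());
// 2024-12-02T00:05:00.000Z
console.log(isExpired(readingTime, 43200, new Date("2024-12-01T23:00:00.000Z"))); // false
console.log(isExpired(readingTime, 43200, new Date("2024-12-02T01:00:00.000Z"))); // true
```

Note that the sample documents in this article are dated December 2024, so with a 12-hour setting they would all count as expired in this model if evaluated against the current date.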
Once again, because the pressure collection already exists, you’ll likely want to use the runCommand method and collMod command to update the collection definition. For example, the following command enables automatic removal in the existing collection:
    db.runCommand({
      collMod: "pressure",
      expireAfterSeconds: 43200
    });
There might be times when you want to disable automatic deletion on a time series collection. You can again use the runCommand method to update the collection definition. Only this time, you should set the expireAfterSeconds parameter to off, as in the following example:
    db.runCommand({
      collMod: "pressure",
      expireAfterSeconds: "off"
    });
After you run the command, the documents will no longer be automatically deleted from the pressure collection, although you can reenable automatic removal at any time.
Getting started with time series collections in MongoDB
A time series collection can be a handy tool for supporting workloads that record events at specific points in time, whether they’re generated by stock trades, weather stations, industrial IoT sensors, smart devices in people’s homes, or other places and systems. As handy as they are, however, time series collections are not suited to documents that are subject to frequent updates, such as those used in transactional processing.
If you’re supporting workloads that can benefit from a time series collection, you should invest the time in learning how they work and how best to optimize them, particularly when it comes to setting the collection’s granularity. You can find more details about time series collections in the MongoDB topic Time Series, which provides an in-depth look at how the collections work and how to get the most out of them.