This article is part of Robert Sheldon's continuing series on MongoDB. To see all of the items in the series, click here.
MongoDB is a document database. As such, the data is stored as individual documents. A document is a data structure made up of one or more field/value pairs. Nearly everything you do in MongoDB is either directly or indirectly related to the documents that you store in a database or move in and out of a database. The better you understand how documents work, the more effectively you can write queries and manage the data.
In my previous article, which was the first in this series, I introduced you to MongoDB and described how to get started with MongoDB Atlas and MongoDB Compass. Atlas provides a cloud-based database service comparable to on-premises MongoDB, and Compass serves as a client interface for connecting to MongoDB and working with document data.
As part of this discussion, I also covered some of the basics of MongoDB documents, demonstrating how to create them and add them to your database. In this article, I expand on this discussion to give you a better sense of how documents are constructed and the different types of data they can contain. To help with this process, I provide several examples that demonstrate some of the ways you can define documents to meet your specific business needs.
Note: For the examples in this article, I used the same MongoDB Atlas and MongoDB Compass environments I set up for the first article. If you want to try out these examples and are uncertain how to connect to Atlas, refer to the first article for more information.
MongoDB document basics
In MongoDB, a document is made up of one or more field/value pairs that are enclosed in a set of curly brackets and separated by commas. A field might contain a single value or contain multiple values, as in the case of arrays or embedded documents. Each value within a document is defined with a specific data type that determines how MongoDB handles the data.
A MongoDB document is always part of a collection, which in turn is always part of a database. A database can contain multiple collections, and each collection can contain multiple documents.
This structure is similar to a relational database, in which data is organized into tables and rows. A table is comparable to a MongoDB collection, and a row is comparable to a MongoDB document. However, documents in a MongoDB collection do not have to conform to the strict schema restrictions that are imposed on rows in a table, which gives you far greater flexibility when storing data.
MongoDB stores documents as Binary JSON (BSON), which is a binary representation of JSON documents that extends the number of supported data types. Overall, the document structure itself is fairly straightforward, consisting of one or more field/value pairs, as shown in the following syntax:
    {
      field1: value1,
      field2: value2,
      field3: value3,
      ...
      fieldN: valueN
    }
Although this structure itself offers a great deal of flexibility, MongoDB still imposes several limitations on each document. For example, the maximum size of a document is 16 MB, and there can be no more than 100 nested levels (arrays and embedded documents).
In addition, a document cannot include duplicate field names. If you add a document with duplicate field names, MongoDB will sometimes drop one of the duplicate fields without returning an error or giving you any indication that something is wrong.
Sometimes you might not realize you ran up against a limitation until you try to insert a document into a collection and receive an error. Unfortunately, the error message you receive might not provide you with any useful information, leaving you to do a lot of trial-and-error to pinpoint the problem. In some cases, however, you can find an answer in the MongoDB documentation, starting with the topic https://www.mongodb.com/docs/manual/reference/limits/.
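Because a duplicate field can be dropped without any warning, it can help to check a document's text before you insert it. The following Python sketch is not part of MongoDB or Compass. It only uses the standard json module to report field names that appear more than once within the same document or embedded document. The function name find_duplicate_fields and the sample text are assumptions made for illustration.

```python
import json


def find_duplicate_fields(json_text):
    """Return field names repeated within any single (sub)document."""
    duplicates = []

    def check_pairs(pairs):
        # json calls this hook once per JSON object, passing its
        # (field, value) pairs in the order they were written.
        seen = set()
        for field, _ in pairs:
            if field in seen:
                duplicates.append(field)
            seen.add(field)
        return dict(pairs)

    json.loads(json_text, object_pairs_hook=check_pairs)
    return duplicates


# Hypothetical document text in which "grade" is defined twice.
doc_text = '{ "name": "Drew", "grade": 6, "grade": 7 }'
print(find_duplicate_fields(doc_text))
```

The same field name used at different levels (for example, at the top level and inside an embedded document) is not reported, because each level is checked on its own.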
With that in mind, consider the following example, which defines a basic document that contains only three fields:
    {
      "name": "Drew",
      "grade": 6,
      "reviews": 9.4
    }
This document includes the following field/value pairs, with each field containing only a single value:
- The name field has a value of Drew, which is defined with the String data type. This is usually the most common data type used in MongoDB. It is based on the UTF-8 Unicode encoding standard.
- The grade field has a value of 6, which is defined with the Int32 data type. This data type is used for 32-bit integer values.
- The reviews field has a value of 9.4, which is defined with the Double data type. This data type is used for floating point numbers, which means the values can include a fractional part.
That’s all there is to defining a basic document. When you insert a document like this into a collection, MongoDB automatically determines the data type for each field based on its value and whether the value is enclosed in quotation marks. As a general rule, MongoDB interprets quoted values as string values and non-quoted values as numeric values, although it’s possible to assign a specific data type when inserting the document, as you’ll see in the next section.
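The quoted-versus-unquoted rule can be illustrated with a small Python sketch. This is only an approximation of the behavior described above, not the logic that MongoDB or Compass actually runs. The guess_type function and the grade_text field are hypothetical names added for the example.

```python
import json

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def guess_type(value):
    """Guess the data type name a plain JSON value would likely receive."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, str):
        return "String"
    if isinstance(value, int):
        if INT32_MIN <= value <= INT32_MAX:
            return "Int32"
        return "integer outside the Int32 range"
    if isinstance(value, float):
        return "Double"
    return type(value).__name__


# "grade_text" is a hypothetical field showing that quoting 6 makes it a string.
doc = json.loads('{ "name": "Drew", "grade": 6, "reviews": 9.4, "grade_text": "6" }')
inferred = {field: guess_type(value) for field, value in doc.items()}
print(inferred)
```

The point of the extra field is that 6 and "6" are different values as far as typing goes, even though they look alike on screen.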
Adding the _id field to a document
Each document in a collection must include an _id field that serves as the document’s primary key. The field’s value is immutable and must be unique within the collection. If you try to add a document to a collection that contains the same _id value as an existing document, MongoDB will return an error, without inserting the document into the collection.
When you add a document to a collection, you can choose to specifically include the _id field or you can let MongoDB add the field for you. If MongoDB adds the field, it will contain a 12-byte value, displayed as a 24-character hexadecimal string, and the value will be defined with the ObjectId data type. For example, the value might look something like 653fcc59b4121d2fe701df04.
If you choose to add the _id field yourself, you can include it in your document definition just like any other field. This approach provides you with more control over your documents. For example, you might be importing documents and want to preserve their original primary keys. The following example shows a document definition that includes the _id field:
    {
      "_id": 1001,
      "name": "Drew",
      "grade": 6,
      "reviews": 9.4
    }
In this case, the _id value is added as an integer, but you can specify nearly any type of value (arrays, regular expressions, and undefined values are not allowed), as long as the value is unique within the collection. You can even specify a 24-character hexadecimal value like the kind that MongoDB automatically generates, as in the following document definition:
    {
      "_id": "653d35cdcea93f2aea8abbce",
      "name": "Drew",
      "grade": 6,
      "reviews": 9.4
    }
In this example, the _id value is defined with the String data type, rather than the ObjectId data type. However, the ObjectId data type offers a couple of advantages over String, in part because an auto-generated ObjectId value incorporates a timestamp that reflects when the value was generated. For example, you can use the ObjectId.getTimestamp() method to extract the creation time of the ObjectId value, and you can sort your documents by their ObjectId values, which is roughly equivalent to sorting by creation time.
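To see where that timestamp comes from, consider the following Python sketch. It relies on the fact that the first 4 bytes of an ObjectId (the first 8 hexadecimal characters) store the creation time as seconds since the Unix epoch. The sketch works on the hexadecimal string alone and does not call any MongoDB driver, and the function name objectid_creation_time is my own, not a MongoDB API.

```python
from datetime import datetime, timezone


def objectid_creation_time(hex_id):
    """Decode the creation time embedded in a 24-character ObjectId string."""
    if len(hex_id) != 24:
        raise ValueError("expected 24 hexadecimal characters")
    # The leading 8 hex characters are a big-endian count of seconds
    # since 1970-01-01 UTC.
    seconds = int(hex_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The sample ObjectId value shown earlier in this section.
created = objectid_creation_time("653fcc59b4121d2fe701df04")
print(created.isoformat())
```

Because the timestamp occupies the leading bytes, comparing two ObjectId values compares their creation times first, which is why sorting on _id roughly follows insertion order.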
To save a 12-byte hexadecimal value as an ObjectId value, you must instruct MongoDB to assign that data type. Some MongoDB examples indicate that you should use the ObjectId constructor to specify the ObjectId data type, as in the following example:
    {
      "_id": ObjectId("653d35cdcea93f2aea8abbce"),
      "name": "Drew",
      "grade": 6,
      "reviews": 9.4
    }
However, if you try to use the ObjectId constructor in Compass when adding a document, the Insert Document dialog box will display the error shown in the following figure.
The issue here is that MongoDB clients vary in their requirements when communicating with a MongoDB database. The Compass interface relies on MongoDB Extended JSON, which refers to the extensions MongoDB adds to the JSON format. Because Compass runs in strict mode, as it relates to Extended JSON, certain statement elements behave differently than they might with other clients. For example, Compass and the MongoDB Shell take different approaches to specifying a field’s data type when inserting a document.
A good example of this is the ObjectId constructor. Because Compass runs in strict mode, you cannot use the ObjectId constructor. Instead, you must use the $oid operator, as shown in the following example:
    {
      "_id": { "$oid": "653d35cdcea93f2aea8abbce" },
      "name": "Avery",
      "grade": 6,
      "reviews": 9.4
    }
When using the $oid operator, you must define the _id value as an embedded document that includes one field/value pair, with the $oid operator used for the field name and the hexadecimal string for its value. MongoDB will then define the _id value with the ObjectId data type.
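If you generate documents with a script before pasting them into Compass, you can build the $oid wrapper programmatically. The following Python sketch shows one way to do that with only the standard library. The helper name oid_field is hypothetical, and the validation it performs (exactly 24 hexadecimal characters) is a convenience check I added, not something Compass requires you to do yourself.

```python
import json
import string


def oid_field(hex_id):
    """Wrap a 24-character hex string in the Extended JSON $oid form."""
    if len(hex_id) != 24 or any(ch not in string.hexdigits for ch in hex_id):
        raise ValueError("an ObjectId is written as exactly 24 hexadecimal characters")
    return {"$oid": hex_id.lower()}


doc = {
    "_id": oid_field("653d35cdcea93f2aea8abbce"),
    "name": "Avery",
    "grade": 6,
    "reviews": 9.4,
}

# Print the document as text that can be pasted into an insert dialog.
print(json.dumps(doc, indent=2))
```

Validating the string up front means a malformed identifier fails in your script rather than in the insert dialog.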
Adding a date field to a document
When using Compass or another client that requires strict adherence to Extended JSON, you’ll likely run into other situations similar to the one with the ObjectId constructor, in which case you’ll need to determine the best way to specify a data type.
For example, your documents might include one or more fields that contain date values. You can, of course, add a date as a string value, as in the following document definition:
    {
      "_id": { "$oid": "653d35cdcea93f2aea8abbce" },
      "name": "Avery",
      "grade": 6,
      "reviews": 9.4,
      "hire_date": "May 13, 2016"
    }
However, a string value cannot take advantage of the methods and properties available to the Date data type, such as being able to easily retrieve parts of a date, like year or month. In addition, when you save a date such as May 13, 2016 as a Date object, MongoDB automatically converts it to the UTC datetime format. For example, when I added May 13, 2016 as a Date object on my system, MongoDB saved the value as 2016-05-13T07:00:00.000+00:00.
As with the ObjectId constructor, MongoDB provides the new Date constructor for specifying that a value should be defined with the Date data type, as shown in the following example:
    {
      "_id": { "$oid": "653d35cdcea93f2aea8abbce" },
      "name": "Avery",
      "grade": 6,
      "reviews": 9.4,
      "hire_date": new Date("May 13, 2016")
    }
Although you’ll see plenty of examples that use this constructor, it won’t work in Compass because you’ll run up against the same issue we ran into with the ObjectId constructor. As a result, you’ll need to recast the field definition, only this time, specifying the $date operator:
    {
      "_id": { "$oid": "653d35cdcea93f2aea8abbce" },
      "name": "Avery",
      "grade": 6,
      "reviews": 9.4,
      "hire_date": { "$date": "May 13, 2016" }
    }
Now MongoDB will accept this document with no problem, although you might run into this issue with other data types, depending on your document definitions. Fortunately, Extended JSON also supports other operators for defining data types when you want to control your data type assignments. For more information, check out the MongoDB topic MongoDB Extended JSON (v2).
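The UTC conversion mentioned earlier in this section can be reproduced outside MongoDB. The following Python sketch turns a date such as May 13, 2016 into a $date value in ISO 8601 form. It rests on two assumptions: that the month name is in English, and that the 07:00 shift in the stored value came from a local time zone seven hours behind UTC, which is my guess rather than something the stored value confirms. The helper name date_field and its utc_offset_hours parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def date_field(text, utc_offset_hours=0):
    """Convert a 'Month day, year' string to an Extended JSON $date value."""
    # Treat the date as midnight in the given local zone (an assumption).
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_midnight = datetime.strptime(text, "%B %d, %Y").replace(tzinfo=local_zone)
    # Shift to UTC and render as ISO 8601 with millisecond precision.
    as_utc = local_midnight.astimezone(timezone.utc)
    iso_text = as_utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return {"$date": iso_text}


# Assumed offset of UTC-7, chosen to match the 07:00 value shown earlier.
hire_date = date_field("May 13, 2016", utc_offset_hours=-7)
print(hire_date)
```

Writing the date in ISO 8601 form with an explicit UTC marker removes the dependence on the client's local time zone, so the same text produces the same stored value on any system.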
Embedding fields in other fields
MongoDB also supports a variety of other data types, in addition to what we’ve already looked at here. Two of the most valuable data types are Array and Object. Both of these let you embed multiple values in a field, making it possible to create additional data layers within your document structure.
The best way to understand the Array and Object data types is to see them in action. The following document builds on the previous example by adding a field whose value is defined with an Object data type and a field whose value is defined with an Array data type:
    {
      "_id": { "$oid": "653d35cdcea93f2aea8abbce" },
      "name": "Drew",
      "position": {
        "department": "R&D",
        "title": "Senior Developer",
        "grade": 6,
        "reviews": 9.4
      },
      "hire_date": { "$date": "May 13, 2016" },
      "current_emp": true,
      "skills": [ "Java", "SQL", "Python", "PHP" ]
    }
The position field is the first of the new fields. Its value is an embedded document, which means it will be defined with the Object data type when you add the document to a collection. As such, the value’s elements (field/value pairs) are enclosed in curly braces and separated by commas, just like a top-level document. In this case, the embedded document includes four field/value pairs:
- The department value will be defined with the String data type.
- The title value will be defined with the String data type.
- The grade value will be defined with the Int32 data type.
- The reviews value will be defined with the Double data type.
Together, these four fields are combined into a single document to form the position value. Notice that you can mix data types as necessary, just like a top-level document that you would insert into a collection.
The document in the example above also includes the skills field, which is defined with an Array data type. An Array value is simply a set of values enclosed in square brackets and separated with commas. In this case, the values are Java, SQL, Python, and PHP, all of which are String values. However, an array can also include values that are defined with different data types.
You might have noticed that I added the current_emp field to the document as well. This field’s value is defined with the Boolean data type, which can take a value of either true or false (not enclosed in quotes). I included the field here only to point out another available data type.
In fact, MongoDB supports many data types. As we progress through this series, I hope to introduce some of those types so you have a more complete picture of the various types of data that MongoDB supports. In the meantime, you can find a complete list of data types in the MongoDB topic BSON Types.
Embedding documents in an array
As you saw in the previous section, the Object and Array data types provide you with a great deal of flexibility when creating your documents. But they don’t stop there. MongoDB also lets you embed one type of value within the other type. For example, you can create an Array field whose values are made up of multiple embedded documents, as shown in the following example:
    {
      "_id": { "$oid": "653d35cdcea93f2aea8abbce" },
      "name": "Drew",
      "position": {
        "department": "R&D",
        "title": "Senior Developer",
        "grade": 6,
        "reviews": 9.4
      },
      "hire_date": { "$date": "May 13, 2016" },
      "current_emp": true,
      "education": [
        {
          "school": "MIT",
          "degree": "bachelor's",
          "major": "software engineering",
          "gpa": 3.78
        },
        {
          "school": "UC Berkeley",
          "degree": "master's",
          "major": "computer science",
          "gpa": 3.89
        }
      ],
      "skills": [ "Java", "SQL", "Python", "PHP" ]
    }
The document builds on the previous example by adding the education field, which is defined with the Array data type. The array contains two Object values, each one an embedded document that describes some aspect of Drew’s education.
Each document in the education field contains the same three String fields and one Double field. However, you could have defined the documents with different fields or with the same fields and different types, including additional arrays or embedded documents, making it possible to create documents with multiple nested layers. As noted earlier, however, you are limited to 100 nested levels, although this still gives you an incredible amount of flexibility over how you define your documents.
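If you want a rough sense of how deeply a document is nested, you can measure it before inserting the document. The following Python sketch counts levels by treating the top-level document as level 1 and adding one level for each array or embedded document beneath it. That counting convention is an assumption of the sketch, so treat the result as an estimate against the 100-level limit rather than the exact figure MongoDB computes. The function name nesting_depth and the abbreviated sample document are mine.

```python
def nesting_depth(value):
    """Count nested levels, where each document or array adds one level."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(child) for child in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(child) for child in value), default=0)
    # Scalar values (strings, numbers, Booleans) add no further levels.
    return 0


# Abbreviated version of the employee document from this section.
employee = {
    "name": "Drew",
    "position": {"department": "R&D", "grade": 6},
    "education": [
        {"school": "MIT", "gpa": 3.78},
        {"school": "UC Berkeley", "gpa": 3.89},
    ],
    "skills": ["Java", "SQL", "Python", "PHP"],
}

print(nesting_depth(employee))
```

Under this convention, the deepest path runs from the document to the education array to one of its embedded documents.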
With the addition of the education field, the document now contains many of the elements you’ll likely run up against when working with MongoDB data. Although this document by no means represents everything you can do in MongoDB, it gives you a good sense of a document’s flexibility, as well as its potential complexity. Even so, the document is still only a set of field/value pairs.
To help you better understand the document above, the following table provides a breakdown of all the fields and values that it now contains.
Field | Value | Data type | Type alias | Type number |
_id | 653d35cdcea93f2aea8abbce | ObjectId | objectId | 7 |
name | Drew | String | string | 2 |
position | {embedded document with 4 elements} | Object | object | 3 |
position.department | R&D | String | string | 2 |
position.title | Senior Developer | String | string | 2 |
position.grade | 6 | Int32 | int | 16 |
position.reviews | 9.4 | Double | double | 1 |
hire_date | 2016-05-13T07:00:00.000+00:00 | Date | date | 9 |
current_emp | true | Boolean | bool | 8 |
education | [array with 2 embedded documents] | Array | array | 4 |
education[0] | {embedded document with 4 elements} | Object | object | 3 |
education[0].school | MIT | String | string | 2 |
education[0].degree | bachelor’s | String | string | 2 |
education[0].major | software engineering | String | string | 2 |
education[0].gpa | 3.78 | Double | double | 1 |
education[1] | {embedded document with 4 elements} | Object | object | 3 |
education[1].school | UC Berkeley | String | string | 2 |
education[1].degree | master’s | String | string | 2 |
education[1].major | computer science | String | string | 2 |
education[1].gpa | 3.89 | Double | double | 1 |
skills | [array with 4 elements] | Array | array | 4 |
skills[0] | Java | String | string | 2 |
skills[1] | SQL | String | string | 2 |
skills[2] | Python | String | string | 2 |
skills[3] | PHP | String | string | 2 |
The table also shows the official name, alias and numeric identifier that MongoDB assigns to each of the document’s data types. You might see the aliases or numeric identifiers used in code examples or in other places, so I thought it would be useful to include them here.
Notice that the array fields shown in the table are also identified by their index numbers. MongoDB automatically assigns these numbers to the array elements, using 0-based indexing to identify each element. For example, the skills array includes four elements, which are numbered 0 through 3. In this way, each element can be easily referenced when querying the document (a topic I’ll be discussing later in the series).
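The field names in the table, including the 0-based array indexes, can be generated by walking the document. The following Python sketch does this for an abbreviated version of the employee document, using the same bracket notation as the table. The function name list_paths is hypothetical, and the notation is only a labeling convention for this breakdown. MongoDB queries themselves reference array elements with dot notation, such as skills.0.

```python
def list_paths(value, prefix=""):
    """Return the path of every field and array element, in document order."""
    paths = []
    if isinstance(value, dict):
        for field, child in value.items():
            path = f"{prefix}.{field}" if prefix else field
            paths.append(path)
            paths.extend(list_paths(child, path))
    elif isinstance(value, list):
        # enumerate() numbers the elements from 0, matching MongoDB's indexing.
        for index, child in enumerate(value):
            path = f"{prefix}[{index}]"
            paths.append(path)
            paths.extend(list_paths(child, path))
    return paths


# Abbreviated version of the employee document from the previous section.
employee = {
    "name": "Drew",
    "education": [{"school": "MIT"}, {"school": "UC Berkeley"}],
    "skills": ["Java", "SQL", "Python", "PHP"],
}

for path in list_paths(employee):
    print(path)
```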
Adding the document to a collection
In the previous article in this series, I explained how to add a document to a collection in Compass. First, you select the collection in the left panel and then, in the main window, click the Add Data drop-down arrow and click Insert document. In the Insert Document dialog box, delete the existing text, type or paste the document code, and then click Insert.
For this article, I created a database named hr and a collection named employees. I then added the document from the previous section (“Embedding documents in an array”) to that collection. The following figure shows the document in List View, with the document fully expanded. To expand a document, hover over the document until the Expand all down-arrow appears near the top left corner (to the left of the _id field) and then click the arrow.
Notice that Compass shows the data type for the Array fields and Object fields. For the Array data type, Compass also displays the number of elements within the array. For example, Compass shows that the education array contains two elements, which are the two embedded documents.
List View does not show the data types for the other fields, although it shows the ObjectId constructor preceding the _id value, indicating that the value is defined with the ObjectId data type.
Note: List View will show all the data types if you double-click one of the document’s elements as though you were going to edit that value, something I’ll be discussing later in the series.
You can also see the data types for all the fields by viewing the document in Table View. When you first switch to Table View, Compass displays the top level fields and their data types, as shown in the following figure.
You can view the embedded fields and their data types by drilling down into the specific Array or Object value. For example, to view details about the embedded document in the position field, hover over the position value until the edit button appears, and then click that button. Compass will display the document’s four fields, their values, and their data types, as shown in the following figure.
After you finish viewing information about the embedded fields, you can move back to the document’s top level by clicking the employees portion of the breadcrumb near the top left corner of the grid.
If you want to drill into the education field, you can take the same approach as with the position field. Hover over the field’s value and then click the edit button. This will move you down one level, which is shown in the following figure.
Of course, this still doesn’t show you the actual fields because you need to drill down into the individual document. For example, if you hover over the education[0] value in the grid (displayed as {} 4 fields) and click the edit button, you’ll be able to view the embedded fields, as shown in the following figure.
Once again, you’re able to view each field name, value, and data type. You can take the same approach with the skills array, which is shown in the next figure. As before, this view provides you with more details about the array’s values.
When working with MongoDB documents in Compass, you should have a good sense of how to find your way around the documents so you understand how they’re structured and what data types have been assigned to the field values. This can make it easier for you when you’re building your queries and you need to know how to find exactly what you’re looking for.
Getting started with MongoDB documents
Just about everything you do in MongoDB revolves around the stored documents. At its highest level, the document structure is quite basic. Each document is made up of one or more field/value pairs, separated by commas, with everything enclosed in curly brackets. However, a document can quickly become a complex structure when you start adding arrays and embedded documents, especially if you embed even more arrays and documents in the embedded fields. As this series progresses, we’ll be spending a lot of time on how to query these documents, but know that everything you do starts with having a strong foundation in the document structure.