Introducing the MongoDB Document

This article is part of Robert Sheldon's continuing series on MongoDB.

MongoDB is a document database. As such, the data is stored as individual documents. A document is a data structure made up of one or more field/value pairs. Nearly everything you do in MongoDB is either directly or indirectly related to the documents that you store in a database or move in and out of a database. The better you understand how documents work, the more effectively you can write queries and manage the data.

In my previous article, which was the first in this series, I introduced you to MongoDB and described how to get started with MongoDB Atlas and MongoDB Compass. Atlas provides a cloud-based database service comparable to on-premises MongoDB, and Compass serves as a client interface for connecting to MongoDB and working with document data.

As part of this discussion, I also covered some of the basics of MongoDB documents, demonstrating how to create them and add them to your database. In this article, I expand on this discussion to give you a better sense of how documents are constructed and the different types of data they can contain. To help with this process, I provide several examples that demonstrate some of the ways you can define documents to meet your specific business needs.

Note: For the examples in this article, I used the same MongoDB Atlas and MongoDB Compass environments I set up for the first article. If you want to try out these examples and are uncertain how to connect to Atlas, refer to the first article for more information.

MongoDB document basics

In MongoDB, a document is made up of one or more field/value pairs that are enclosed in a set of curly brackets and separated by commas. A field might contain a single value or contain multiple values, as in the case of arrays or embedded documents. Each value within a document is defined with a specific data type that determines how MongoDB handles the data.

A MongoDB document is always part of a collection, which in turn is always part of a database. A database can contain multiple collections, and each collection can contain multiple documents.

This structure is similar to a relational database, in which data is organized into tables and rows. A table is comparable to a MongoDB collection, and a row is comparable to a MongoDB document. However, documents in a MongoDB collection do not have to conform to the strict schema restrictions that are imposed on rows in a table, offering far greater flexibility when storing data.

MongoDB stores documents as Binary JSON (BSON), which is a binary representation of JSON documents that extends the number of supported data types. Overall, the document structure itself is fairly straightforward, consisting of one or more field/value pairs, as shown in the following syntax:
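The general shape can be sketched as follows. This is an illustrative JavaScript object literal rather than official MongoDB syntax notation, and the field names and values are placeholders I made up for the sketch.

```javascript
// General shape of a document: field/value pairs, separated by commas,
// enclosed in curly brackets. All names and values here are placeholders.
const documentShape = {
  field1: "value1", // a string value
  field2: 2,        // a numeric value
  field3: true      // a Boolean value
};

// Serializing the object shows the same structure in JSON form.
console.log(JSON.stringify(documentShape));
```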

Although this structure itself offers a great deal of flexibility, MongoDB still imposes multiple limitations on each document. For example, the maximum size of a document is 16 MB, and there can be no more than 100 nested levels (arrays and embedded documents).
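If you want a rough client-side check before inserting a deeply nested document, something like the following sketch might help. The nestingDepth helper and the MAX_NESTING constant are hypothetical names of my own, the helper handles only plain JSON-style values, and MongoDB itself may count levels differently, so treat the result as an approximation.

```javascript
// Hypothetical helper: counts how deeply arrays and embedded documents nest.
// A scalar counts as depth 0; each array or object adds one level.
function nestingDepth(value) {
  if (value === null || typeof value !== "object") {
    return 0;
  }
  const children = Array.isArray(value) ? value : Object.values(value);
  let deepest = 0;
  for (const child of children) {
    deepest = Math.max(deepest, nestingDepth(child));
  }
  return 1 + deepest;
}

const MAX_NESTING = 100; // the limit cited in the text

// document -> education array -> embedded document = 3 levels in this sketch
const sample = { name: "Drew", education: [{ school: "MIT" }] };
const depth = nestingDepth(sample);
console.log(depth, depth <= MAX_NESTING);
```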

In addition, a document cannot include duplicate field names. If you add a document with duplicate field names, MongoDB will sometimes drop one of the duplicate fields without returning an error or giving you any indication that something is wrong.
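The silent loss of a duplicate field is easy to reproduce on the client side. The sketch below uses plain JavaScript, not Compass or the MongoDB server, so it only illustrates the kind of behavior to watch for; the field values are made up.

```javascript
// JSON text with a duplicated "name" field.
const text = '{ "name": "Drew", "grade": 6, "name": "Alex" }';

// JSON.parse raises no error; the later value silently replaces the earlier one.
const parsed = JSON.parse(text);

console.log(parsed.name);                // the surviving value
console.log(Object.keys(parsed).length); // only two fields remain
```

Because nothing fails, the only sign of the problem is a missing value, which is a good reason to check field names before you insert a document.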

Sometimes you might not realize you ran up against a limitation until you try to insert a document into a collection and receive an error. Unfortunately, the error message you receive might not provide you with any useful information, leaving you to do a lot of trial-and-error to pinpoint the problem. In some cases, however, you can find an answer in the MongoDB documentation, starting with the topic https://www.mongodb.com/docs/manual/reference/limits/.

With that in mind, consider the following example, which defines a basic document that contains only three fields:
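Based on the field descriptions that follow, the document would look something like this sketch, written here as a JavaScript object literal. The variable name employee is my own label and is not part of the document.

```javascript
// Basic document with three single-value fields (values taken from the text).
const employee = {
  name: "Drew",  // String
  grade: 6,      // Int32
  reviews: 9.4   // Double
};

console.log(JSON.stringify(employee, null, 2));
```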

This document includes the following field/value pairs, with each field containing only a single value:

  • The name field has a value of Drew, which is defined with the String data type. This is usually the most common data type used in MongoDB. It is based on the UTF-8 Unicode encoding standard.
  • The grade field has a value of 6, which is defined with the Int32 data type. This data type is used for 32-bit integer values.
  • The reviews field has a value of 9.4, which is defined with the Double data type. This data type is used for floating point numbers, which means they can contain decimal points for greater precision.

That’s all there is to defining a basic document. When you insert a document like this into a collection, MongoDB automatically determines the data type for each field based on its value and whether the value is enclosed in quotation marks. As a general rule, MongoDB interprets quoted values as string values and non-quoted values as numeric values, although it’s possible to specify a particular data type when inserting the document, as you’ll see in the next section.
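The rule of thumb above can be mimicked in a few lines of JavaScript. The guessType helper below is hypothetical and is not MongoDB's actual type-detection logic; it ignores the 32-bit range limit and every type not mentioned in this article. The extra code field is made up to show the effect of quotation marks.

```javascript
// Hypothetical helper that mimics the quoted/unquoted rule of thumb.
// It does not check whether an integer actually fits in 32 bits.
function guessType(value) {
  if (typeof value === "string") return "String";
  if (typeof value === "boolean") return "Boolean";
  if (typeof value === "number") {
    return Number.isInteger(value) ? "Int32" : "Double";
  }
  if (Array.isArray(value)) return "Array";
  if (value !== null && typeof value === "object") return "Object";
  return "unknown";
}

// The quoted "6" is treated as a string; the unquoted 6 as an integer.
const doc = JSON.parse('{ "name": "Drew", "grade": 6, "reviews": 9.4, "code": "6" }');

for (const [field, value] of Object.entries(doc)) {
  console.log(field, guessType(value));
}
```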

Adding the _id field to a document

Each document in a collection must include an _id field that serves as the document’s primary key. The field’s value is immutable and must be unique within the collection. If you try to add a document to a collection that contains the same _id value as an existing document, MongoDB will return an error, without inserting the document into the collection.

When you add a document to a collection, you can choose to specifically include the _id field or you can let MongoDB add the field for you. If MongoDB adds the field, it will contain a 12-byte value, displayed as a 24-character hexadecimal string, and the value will be defined with the ObjectId data type. For example, the value might look something like 653fcc59b4121d2fe701df04.

If you choose to add the _id field yourself, you can include it in your document definition just like any other field. This approach provides you with more control over your documents. For example, you might be importing documents and want to preserve their original primary keys. The following example shows a document definition that includes the _id field:
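A document of this kind might look like the following sketch. The _id value of 101 is a hypothetical integer I chose for illustration, and the Map-based collection with its insertOne function is only an in-memory stand-in that mimics the uniqueness rule; a real collection enforces that rule on the server.

```javascript
// Document with an explicit _id field (101 is a hypothetical key).
const employeeWithId = {
  _id: 101,
  name: "Drew",
  grade: 6,
  reviews: 9.4
};

// In-memory stand-in for a collection, keyed by _id.
const collection = new Map();

function insertOne(doc) {
  if (collection.has(doc._id)) {
    throw new Error(`Duplicate _id: ${doc._id}`);
  }
  collection.set(doc._id, doc);
}

insertOne(employeeWithId);

// A second document with the same _id is rejected and not stored.
let duplicateRejected = false;
try {
  insertOne({ _id: 101, name: "Alex" });
} catch (err) {
  duplicateRejected = true;
}

console.log(duplicateRejected, collection.size);
```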

In this case, the _id value is added as an integer, but you can specify a value of almost any type (arrays, regular expressions, and undefined values are not allowed), as long as the value is unique within the collection. You can even specify a 12-byte hexadecimal value like the type that MongoDB automatically generates, as in the following document definition.

In this example, the _id value is defined with the String data type, rather than the ObjectId data type. However, the ObjectId data type offers a couple of advantages over String, in part because an auto-generated ObjectId value incorporates a timestamp that reflects when the value was generated. For example, you can use the ObjectId.getTimestamp() method to extract the creation time of the ObjectId value, and you can sort your documents by their ObjectId values, which is roughly equivalent to sorting by creation times.
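To see where that timestamp comes from, you can decode it by hand. The first 4 bytes of an ObjectId (the first 8 hexadecimal characters) store the creation time as seconds since the Unix epoch. The objectIdTimestamp function below is a hypothetical helper written in plain JavaScript, not a MongoDB API, and it works on the hexadecimal string rather than on a real ObjectId value.

```javascript
// Hypothetical helper: reads the creation time embedded in an ObjectId's
// 24-character hexadecimal representation.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("Expected a 24-character hexadecimal string");
  }
  // First 8 hex characters = 4 bytes = seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("653d35cdcea93f2aea8abbce");
console.log(created.toISOString()); // 2023-10-28T16:24:45.000Z
```

Because the timestamp occupies the leading bytes, sorting by these values orders documents roughly by creation time.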

To save a 12-byte hexadecimal value as an ObjectId value, you must instruct MongoDB to assign that data type. Some MongoDB examples indicate that you should use the ObjectId constructor to specify the ObjectId data type, as in the following example:

However, if you try to use the ObjectId constructor in Compass when adding a document, the Insert Document dialog box will display the error shown in the following figure.

The issue here is that MongoDB clients vary in their requirements when communicating with a MongoDB database. The Compass interface relies on MongoDB Extended JSON, which refers to the extensions MongoDB adds to the JSON format. Because Compass runs in strict mode, as it relates to Extended JSON, certain statement elements behave differently than they might with other clients. For example, Compass and the MongoDB Shell take different approaches to specifying a field’s data type when inserting a document.

A good example of this is the ObjectId constructor. Because Compass runs in strict mode, you cannot use the ObjectId constructor. Instead, you must use the $oid operator, as shown in the following example:

When using the $oid operator, you must define the _id value as an embedded document that includes one field/value pair, with the $oid operator used for the field name and the hexadecimal string for its value. MongoDB will then define the _id value with the ObjectId data type.
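A document written this way might look like the following sketch. The JSON text is the part you would paste into a client that expects Extended JSON; the surrounding JavaScript only parses it to confirm the structure. The hexadecimal value and the other fields are reused from elsewhere in this article for illustration.

```javascript
// Extended JSON text: the _id value is an embedded document whose only
// field is named $oid and whose value is the hexadecimal string.
const extendedJson = `{
  "_id": { "$oid": "653d35cdcea93f2aea8abbce" },
  "name": "Drew",
  "grade": 6,
  "reviews": 9.4
}`;

const parsedDoc = JSON.parse(extendedJson);

// The wrapper holds exactly one field/value pair.
console.log(Object.keys(parsedDoc._id));
console.log(parsedDoc._id["$oid"].length); // 24 characters
```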

Adding a date field to a document

When using Compass or another client that requires strict adherence to Extended JSON, you’ll likely run into other situations similar to the ObjectId constructor, in which case, you’ll need to determine the best way to specify a data type.

For example, your documents might include one or more fields that contain date values. You can, of course, add a date as a string value, as in the following document definition:

However, a string value cannot take advantage of the methods and properties available to the Date data type, such as being able to easily retrieve parts of a date, like year or month. In addition, when you save a date such as May 13, 2016 as a Date object, MongoDB automatically converts it to the UTC datetime format. For example, when I added May 13, 2016 as a Date object on my system, MongoDB saved the value as 2016-05-13T07:00:00.000+00:00.
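The shift to 07:00 is what you would expect when midnight in a UTC-07:00 time zone is converted to UTC. The following JavaScript sketch reproduces that conversion; the -07:00 offset is my assumption, chosen to match the value reported above, and the Date object here is JavaScript's own rather than a value stored in MongoDB.

```javascript
// Midnight on May 13, 2016, in a UTC-07:00 time zone (assumed offset).
const localMidnight = new Date("2016-05-13T00:00:00-07:00");

// The same instant expressed in UTC.
console.log(localMidnight.toISOString()); // 2016-05-13T07:00:00.000Z

// Unlike a string, a date value exposes its individual parts.
const year = localMidnight.getUTCFullYear();
const month = localMidnight.getUTCMonth() + 1; // getUTCMonth() is zero-based
console.log(year, month);
```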

As with the ObjectId constructor, MongoDB provides the new Date constructor for specifying that a value should be defined with the Date data type, as shown in the following example:

Although you’ll see plenty of examples that use this constructor, it won’t work in Compass because you’ll run up against the same issue we ran into with the ObjectId constructor. As a result, you’ll need to recast the field definition, only this time, specifying the $date operator:
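A field defined this way might look like the following sketch, which uses the relaxed form of Extended JSON with an ISO-8601 string. The JSON text is what you would paste into the client; the JavaScript around it only parses the text and converts the wrapped string to a date for checking. The field values are carried over from this article's other examples.

```javascript
// Extended JSON text: hire_date is an embedded document with a single
// $date field that holds the date as an ISO-8601 string.
const withDate = JSON.parse(`{
  "name": "Drew",
  "hire_date": { "$date": "2016-05-13T07:00:00.000Z" }
}`);

// Converting the wrapped string shows the instant it represents.
const hireDate = new Date(withDate.hire_date["$date"]);
console.log(hireDate.getUTCFullYear(), hireDate.getUTCHours());
```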

Now MongoDB will accept this document with no problem, although you might run into this issue with other data types, depending on your document definitions. Fortunately, Extended JSON also supports other operators for defining data types when you want to control your data type assignments. For more information, check out the MongoDB topic MongoDB Extended JSON (v2).

Embedding fields in other fields

MongoDB also supports a variety of other data types, in addition to what we’ve already looked at here. Two of the most valuable data types are Array and Object. Both of these let you embed multiple values in a field, making it possible to create additional data layers within your document structure.

The best way to understand the Array and Object data types is to see them in action. The following document builds on the previous example by adding a field whose value is defined with an Object data type and a field whose value is defined with an Array type:

The position field is the first of the new fields. Its value is an embedded document, which means it will be defined with the Object data type when you add the document to a collection. As such, the value’s elements (field/value pairs) are enclosed in curly braces and separated by commas, just like a top-level document. In this case, the embedded document includes four field/value pairs:

  • The department value will be defined with the String data type.
  • The title value will be defined with the String data type.
  • The grade value will be defined with the Int32 data type.
  • The reviews value will be defined with the Double data type.

Together, these four fields are combined into a single document to form the position value. Notice that you can mix data types as necessary, just like a top-level document that you would insert into a collection.

The document in the example above also includes the skills field, which is defined with an Array data type. An Array value is simply a set of values enclosed in square brackets and separated with commas. In this case, the values are Java, SQL, Python, and PHP, all of which are String values. However, an array can also include values that are defined with different data types.

You might have noticed that I added the current_emp field to the document as well. This field’s value is defined with the Boolean data type, which can take a value of either true or false (not enclosed in quotes). I included the field here only to point out another available data type.
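Putting these pieces together, the document described in this section would look roughly like the following sketch. It is written as a JavaScript object literal using the field values listed in the summary table later in the article; the variable name employee is my own label.

```javascript
// Document with an embedded document (position), an array (skills),
// and a Boolean field (current_emp).
const employee = {
  _id: { $oid: "653d35cdcea93f2aea8abbce" },
  name: "Drew",
  position: {
    department: "R&D",         // String
    title: "Senior Developer", // String
    grade: 6,                  // Int32
    reviews: 9.4               // Double
  },
  hire_date: { $date: "2016-05-13T07:00:00.000Z" },
  current_emp: true,           // Boolean, not enclosed in quotes
  skills: ["Java", "SQL", "Python", "PHP"]
};

console.log(Object.keys(employee.position).length); // 4 embedded fields
console.log(employee.skills.length);                // 4 array elements
```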

In fact, MongoDB supports many data types. As we progress through this series, I hope to introduce some of those types so you have a more complete picture of the various types of data that MongoDB supports. In the meantime, you can find a complete list of data types in the MongoDB topic BSON Types.

Embedding documents in an array

As you saw in the previous section, the Object and Array data types provide you with a great deal of flexibility when creating your documents. But they don’t stop there. MongoDB also lets you embed one type of value within the other type. For example, you can create an Array field whose values are made up of multiple embedded documents, as shown in the following example:

The document builds on the previous example by adding the education field, which is defined with the Array data type. The array contains two Object values, each one an embedded document that describes some aspect of Drew’s education.

Each document in the education field contains the same three String fields and one Double field. However, you could have defined the documents with different fields or with the same fields and different types, including additional arrays or embedded documents, making it possible to create documents with multiple nested layers. As noted earlier, however, you are limited to 100 nested levels, although this still gives you an incredible amount of flexibility over how you define your documents.
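The education field described here might be written as in the following sketch, again as a JavaScript literal with values taken from the summary table that follows. Only the fields relevant to this section are included, and the variable names are my own.

```javascript
// An array whose elements are embedded documents.
const education = [
  { school: "MIT", degree: "bachelor's", major: "software engineering", gpa: 3.78 },
  { school: "UC Berkeley", degree: "master's", major: "computer science", gpa: 3.89 }
];

const employee = { name: "Drew", education };

// Each element is a document with three String fields and one Double field.
for (const entry of employee.education) {
  console.log(entry.school, entry.degree, entry.major, entry.gpa);
}
```

Nothing requires the two embedded documents to share the same fields; they match here only because they describe the same kind of information.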

With the addition of the education field, the document now contains many of the elements you’ll likely run up against when working with MongoDB data. Although this document by no means represents everything you can do in MongoDB, it gives you a good sense of a document’s flexibility—as well as its potential complexity. Even so, the document is still only a set of field/value pairs.

To help better understand the document above, the following table provides a breakdown of all the fields and values that it now contains.

| Field | Value | Data type | Type alias | Type number |
| --- | --- | --- | --- | --- |
| _id | 653d35cdcea93f2aea8abbce | ObjectId | objectId | 7 |
| name | Drew | String | string | 2 |
| position | {embedded document with 4 elements} | Object | object | 3 |
| position.department | R&D | String | string | 2 |
| position.title | Senior Developer | String | string | 2 |
| position.grade | 6 | Int32 | int | 16 |
| position.reviews | 9.4 | Double | double | 1 |
| hire_date | 2016-05-13T07:00:00.000+00:00 | Date | date | 9 |
| current_emp | true | Boolean | bool | 8 |
| education | [array with 2 embedded documents] | Array | array | 4 |
| education[0] | {embedded document with 4 elements} | Object | object | 3 |
| education[0].school | MIT | String | string | 2 |
| education[0].degree | bachelor’s | String | string | 2 |
| education[0].major | software engineering | String | string | 2 |
| education[0].gpa | 3.78 | Double | double | 1 |
| education[1] | {embedded document with 4 elements} | Object | object | 3 |
| education[1].school | UC Berkeley | String | string | 2 |
| education[1].degree | master’s | String | string | 2 |
| education[1].major | computer science | String | string | 2 |
| education[1].gpa | 3.89 | Double | double | 1 |
| skills | [array with 4 elements] | Array | array | 4 |
| skills[0] | Java | String | string | 2 |
| skills[1] | SQL | String | string | 2 |
| skills[2] | Python | String | string | 2 |
| skills[3] | PHP | String | string | 2 |

The table also shows the official name, alias and numeric identifier that MongoDB assigns to each of the document’s data types. You might see the aliases or numeric identifiers used in code examples or in other places, so I thought it would be useful to include them here.

Notice that the array fields shown in the table are also identified by their index numbers. MongoDB automatically assigns these numbers to the array elements, using 0-based indexing to identify each element. For example, the skills array includes four elements, which are numbered 0 through 3. In this way, each element can be easily referenced when querying the document (a topic I’ll be discussing later in the series).
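To get a feel for index-based references, consider the following sketch. MongoDB's dot notation writes an array position as a path segment, such as skills.0, whereas the table uses bracket labels like skills[0]. The getPath function is a hypothetical client-side helper that resolves such a path against a plain JavaScript object; it is not a MongoDB API, and the sample object contains only a few of the document's fields.

```javascript
// Hypothetical helper: resolves a dot-separated path such as
// "education.1.school" against a plain JavaScript object.
function getPath(doc, path) {
  return path.split(".").reduce(
    (current, key) => (current == null ? undefined : current[key]),
    doc
  );
}

const employee = {
  skills: ["Java", "SQL", "Python", "PHP"],
  education: [
    { school: "MIT", gpa: 3.78 },
    { school: "UC Berkeley", gpa: 3.89 }
  ]
};

console.log(getPath(employee, "skills.0"));           // Java
console.log(getPath(employee, "skills.3"));           // PHP
console.log(getPath(employee, "education.1.school")); // UC Berkeley
console.log(getPath(employee, "skills.4"));           // undefined (no fifth element)
```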

Adding the document to a collection

In the previous article in this series, I explained how to add a document to a collection in Compass. First, you select the collection in the left panel and then, in the main window, click the Add Data drop-down arrow and click Insert document. In the Insert Document dialog box, delete the existing text, type or paste the document code, and then click Insert.

For this article, I created a database named hr and a collection named employees. I then added the document in the previous example (in the previous section, “Embedding documents in an array”) to that collection. The following figure shows the document in List View, with the document fully expanded. To expand a document, hover over the document until the Expand all down-arrow appears near the top left corner (to the left of the _id field) and then click the arrow.

Notice that Compass shows the data type for the Array fields and Object fields. For the Array data type, Compass also displays the number of elements within the array. For example, Compass shows that the education array contains two elements, which are the two embedded documents.

List View does not show the data types for the other fields, although it shows the ObjectId constructor preceding the _id value, indicating that the value is defined with the ObjectId data type.

Note: List View will show all the data types if you double-click one of the document’s elements as though you were going to edit that value, something I’ll be discussing later in the series.

You can also see the data types for all the fields by viewing the document in Table View. When you first switch to Table View, Compass displays the top-level fields and their data types, as shown in the following figure.

You can view the embedded fields and their data types by drilling down into the specific Array or Object value. For example, to view details about the embedded document in the position field, hover over the position value until the edit button appears, and then click that button. Compass will display the document’s four fields, their values, and their data types, as shown in the following figure.

After you finish viewing information about the embedded fields, you can move back to the document’s top level by clicking the employees portion of the breadcrumb near the top left corner of the grid.

If you want to drill into the education field, you can take the same approach as with the position field. Hover over the field’s value and then click the edit button. This will move you down one level, which is shown in the following figure.

Of course, this still doesn’t show you the actual fields because you need to drill down into the individual document. For example, if you hover over the education[0] value in the grid—{} 4 fields—and click the edit button, you’ll be able to view the embedded fields, as shown in the following figure.

Once again, you’re able to view each field name, value, and data type. You can take the same approach with the skills array, which is shown in the next figure. As before, this view provides you with more details about the array’s values.

When working with MongoDB documents in Compass, you should have a good sense of how to find your way around the documents so you understand how they’re structured and what data types have been assigned to the field values. This can make it easier for you when you’re building your queries and you need to know how to find exactly what you’re looking for.

Getting started with MongoDB documents

Just about everything you do in MongoDB revolves around the stored documents. At its highest level, the document structure is quite basic. Each document is made up of one or more field/value pairs, separated by commas, with everything enclosed in curly brackets. However, a document can quickly become a complex structure when you start adding arrays and embedded documents, especially if you embed even more arrays and documents in the embedded fields. As this series progresses, we’ll be spending a lot of time on how to query these documents, but know that everything you do starts with having a strong foundation in the document structure.