Microsoft Azure DocumentDB

DocumentDB is a late-entrant in the Document-oriented database field. However, it benefits from being designed from the start as a cloud service with a SQL-like language. It is intended for mobile and web applications. Its JSON document-notation is compatible with the integrated JavaScript language that drives its multi-document transaction processing via stored procedures, triggers and UDFs.

In April 2015, Microsoft made Azure DocumentDB generally available to the public, nearly eight months after launching the preview, and over six years after the release of MongoDB, the world’s most popular NoSQL database management system. Like MongoDB, DocumentDB stores data as uniquely identifiable documents, but unlike MongoDB, DocumentDB was built from the ground up to run as a cloud service, providing a platform for creating NoSQL databases, populating them with JSON documents, and querying those documents with a SQL-like language optimized to work with JSON data.

DocumentDB organizes documents into collections, with each database capable of hosting one or more collection. Because DocumentDB is a cloud service, it offers quick and easy implementations, while delivering the flexibility and scalability necessary to meet the demands of todays web and mobile applications.

DocumentDB integrates JSON and JavaScript right into the database engine. JSON, short for JavaScript Object Notation, is a widely implemented lightweight format for exchanging data between different source types, similar to how XML can be used to exchange data. JSON is based on a subset of the JavaScript programming language and is easy for computers to parse and generate, as well as being human readable.

To support data access in DocumentDB, Microsoft has come up with a SQL-type language for querying the JSON documents. The language is made up only of the SELECT statement, which looks very much like a T-SQL SELECT statement, except that the DocumentDB version supports only four clauses. In this article, I provide a general overview of how to get started with DocumentDB and then introduce you to DocumentDB’s SQL language. The article assumes you have some familiarity with T-SQL and JSON, both useful skills to have if you plan to work with DocumentDB.

Getting started with DocumentDB

I won’t dive too deeply into how to get started with DocumentDB because Microsoft provides a good set of instructions on its DocumentDB documentation page, but I’ll still give you some basic information about the initial setup, should you want to try out the example queries in this article. To do so, you must take the following steps (if you haven’t already taken them):

  1. Create a DocumentDB account.
  2. Create a database.
  3. Create a collection.
  4. Add the JSON documents.

Not surprisingly, you must have an Azure account before you can create the DocumentDB account. If you are new to Azure, you can take advantage of the 30-day free trial. Once that’s in place, go to the Azure portal, click New in the left pane, which opens the New pane to the right. In the New pane, click Data + Storage. When the Data + Storage pane appears, click Azure DocumentDB, as shown in the following figure.


When you click Azure DocumentDB, the DocumentDB account pane appears, where you must provide a name for the account and configure several other settings. The following figure shows the pane as I configured it for the examples in this article.


The way in which you configure the various settings depends on your subscription type, location, and whether you want to create a resource group or use an existing one. (A resource group is a container for managing a collection of Azure resources.)

Note: Microsoft documentation often refers to panes such as DocumentDB account as blades, an unfortunate naming convention if you consider how blade servers can be used in cloud computing. That said, I think we can safely assume that when you open a blade in the Azure portal, you’re not automatically adding server resources.

For this article, I named the account awacct and the new resource group awgrp. There’s a reason for this. When creating the query examples for the article, I used the CTP 3 release of SQL Server 2016 and the AdventureWorks2016CTP3 database to generate the necessary JSON documents. For that reason, I prefixed many of the names I used in these example with aw to tie them back to their Adventure Works roots. You can follow whatever convention you like.

After Azure creates the DocumentDB account, which can take several minutes, Azure will pin it to your dashboard, as shown in the following figure.


From here, you can create a DocumentDB database that is associated with the account. To do so, click the account on the dashboard to launch the DocumentDB account pane. At the top of the pane, click Add Database. This launches yet another pane, where you provide a database name. For the examples in the article, I created a database named awdb.

After you’ve created the database, the database will be listed in the DocumentDB account pane. Click the database to open the Database pane, as shown in the following figure.


JSON documents in a DocumentDB database are stored in one or more collections. Consequently, you must create at least one collection before you can add any documents.

To create a collection, click Add Collection at the top of the Database pane. In the Add Collection pane, provide a collection name. You can also choose a pricing tier and indexing policy other than the default options. Be sure to refer to the Microsoft documentation to understand the implications of each of these options.

For this article, I created a collection named hrcoll to reflect the fact that the data originates in the HumanResources schema in the AdventureWorks2016CTP3 database. Again, you can name the collection whatever you like. Once you create the collection, it will be listed on the Database pane. If you then click the collection, the Collection pane will open, providing the options necessary to add or query JSON documents, as shown in the following figure.


Once we have our collection, we can add the documents needed for the article’s example queries. Before we continue, however, note that Azure provides several methods for adding, viewing, and querying document data. For this article, we’ll be launching most of our operations from the Collection pane.

Adding documents to a DocumentDB database

As noted earlier, I generated the JSON documents in SQL Server. Although I’ll step you through the process of how I got the data, I’ve also included the output here so you can just cut and paste directly, if you don’t have SQL Server 2016 installed or you simply don’t want to mess with it. That said, we’ll start with the following T-SQL query:

I’ve set up the SELECT statement to retrieve the results in the JSON format. If you’re not familiar with how this works, refer to my last article, ” JSON support in SQL Server 2016.” Basically, I’m trying to end up with results that look like the following:

What we have here is a JSON array, which is enclosed by the outer square brackets. An array is simply an ordered collection of one or more values. In this case, the array contains only one value, an object, which is enclosed in curly braces. An object is a collection of one or more properties, with each property made up of a name/value pair.

The object inside the array contains four properties:

  • The first property is named id and has a value of ken0.
  • The second property is named Name and its value is an embedded object. This object has three properties: First, Middle, and Last, each with its own value.
  • The third property is named PII and also has an embedded object as a value. In this case, the object includes the DOB and NatID properties, each with its own value.
  • The fourth property is named JobTitle and has a value of Chief Executive Officer.

Although this is a simple example, it is enough to help demonstrate how querying works in DocumentDB. Be aware, however, that SQL Server does not output the JSON in the tidy format shown above, but instead returns the data as a single row:

Fortunately, we can use the query’s output as is in DocumentDB, except that we must remove the outer brackets before creating the document. We don’t need the outer array because our outer object is complete enough to form a JSON document. Once we get rid of the brackets, we’re good to go.

The next step, then, is to return to the Collection pane in the DocumentDB interface and click Create Document. This launches the Document pane, where you’ll find the beginnings of a document already defined, similar to the following:

Notice that this pared-down document includes the id property. All DocumentDB documents require this property. If you don’t provide it, DocumentDB automatically adds the property, assigning an auto-generated GUID as its value. Often, you’ll want to define your own ID, as we’ve done in our sample document.

To add the JSON document, delete the the default document stub, paste the output from our query into the Document pane (again, without the outer brackets), and then click Save. If all goes as planned, the document will be saved to the collection and the Document pane will now look like the one shown in following figure.


That’s all there is to adding the first document. You’ll now need to add four more documents, based on the same T-SQL query and procedure we followed here. Each time you run the query, update the BusinessEntityID value in the WHERE clause to specify the next employee, using the IDs 2, 3, 4, and 5.

If you don’t want to run your own T-SQL queries, you can instead use the following four JSON documents, created for your convenience, with the outer brackets removed:

After you have added all five documents to the collection, you can use Document Explorer to review them. In the Collection pane, scroll down if necessary and click Document Explorer. This launches the Document Explorer pane, which should show all five documents, listed by their id values. If you then click one of the documents, it will open in a separate pane, as shown in the following figure.


In this case, the document with an id value of Ken0 is displayed, but you can view any document. In addition, for each document, you can view system-generated properties by clicking the Properties button at the top of the Document pane, as shown in the following figure.


In the Properties pane, you will find five auto-generated properties:

  • _RID Unique, hierarchical identifier.
  • _ TS – Timestamp of when the document was last updated
  • _ SELF – Unique addressable URI
  • _ ETAG – Resource etag required for optimistic concurrency control
  • _ ATTACHMENTS Appears to refer to a preview feature currently limited to “experimentation and development”

These properties are added to each document you create in DocumentDB, along with the id property, if you don’t create that one yourself. Now let’s look at how to query the document data.

Querying data in a DocumentDB database

On the surface, the syntax for the DocumentDB SELECT statement looks similar to that of the SQL Server SELECT statement, at least for those elements supported in DocumentDB. The following syntax shows the clauses that make up the DocumentDB SELECT statement:

Even at this level, you can see that the DocumentDB SELECT is missing several of the clauses you find in the T-SQL SELECT, such as WITH, GROUP BY, and HAVING. You’ll also find other differences as you work with the DocumentDB SELECT, mostly because DocumentDB queries specifically target JSON data. Even so, you still get that familiar SQL structure with DocumentDB, which makes transitioning to DocumentDB a lot easier if you’re already writing T-SQL SELECT statements.

The FROM clause

You can query a DocumentDB database from within an application, or you can use Query Explorer in the Azure portal to access the document data directly. For this article, we’ll take the latter approach. That way, you can become familiar with DocumentDB regardless of the environment in which you work or the type of applications you’re creating.

To access Query Explorer, click the Query Documents button in the Collection pane. Query Explorer opens in its own pane, with the database and collection selected and the following SELECT statement created:

This statement might be confusing at first because there is nothing to indicate what the c refers to in the FROM clause. If you were to surmise that it stands for collection , you wouldn’t be far off. The c is actually an alias for the current collection, whatever that might be. We can, in fact, specify any sort of gobbledygook as the alias and we’ll get the same results:

Note that the asterisk (*) in the SELECT clause serves as a type of wildcard that indicates that all content should be returned from the source specified in the FROM clause. We’ll be covering the SELECT clause in a bit.

The job of the FROM clause is to specify our source of data. That source is either the currently selected collection or a subset of that collection (more on that shortly). We are always working within the context of a collection. Even so, if we want to query documents in that collection we must still include the FROM clause, no matter what our FROM expression states.

To folks who have been working with SQL Server or other database products, this flexible nature of the FROM clause might seem odd. What complicates matters is that Microsoft documentation is a bit confusing on this issue, giving us information such as the following:

  • “If the expression is a collection_name, then collection_name will be used as an alias.”
  • “If the expression is <collection_expression>, then property_name, then property_name will be used as an alias. If the expression is a collection_name, then collection_name will be used as an alias.”

This comes from the MSDN article ” DocumentDB SQL Query Language,” which is a single document that describes all aspects of the SELECT statement. Fortunately, we can at least glean from this information that the FROM clause can include a collection expression , which itself can be in one of five forms, as described in the following table:

Expression Description
ROOT Retrieve document from the default, currently connected collection.
collection_name Retrieve document from the specified collection, which must match the currently connected collection.
input_alias Here’s what the MSDN document says about the input alias: “Specifies that document should be retrieved from the other source defined by the provided alias.” I’m assuming this means gobbledygook.
collection_expression.property_name Retrieve document from the specified property within the currently connected collection.
collection_expression[property_name | array_index] Retrieve document from the specified property or array index within the currently connected collection.

Despite the obscure description of the input alias, it appears that we’ve already seen it in action, so let’s look at a SELECT statement that specifies the name of the hrcoll collection, the current collection used for these examples:

When we click Run Query, the SELECT statement retrieves all documents in the current collection and displays them in a new pane to the right of Query Explorer, as shown in the following figure.


If you were to scroll down in the Results pane, you would see all five documents that you added to the collection. Notice that the auto-generated properties show up with each document. In addition, you can view details about the query results at the bottom of the Query Explorer pane.

When specifying a collection in the FROM clause, you can also include an alias either with or without the AS keyword. For example, the following FROM clause defines the hr alias and includes the AS keyword:

However, you can just as easily skip AS:

Although including an alias in such a simple SELECT statement is overkill, it demonstrates the ability to assign aliases, which can be useful when adding other elements to a SELECT statement, which we’ll cover later in the article.

You can also specify a property name in the FROM clause, along with the collection name:

Now our results include only the contents of the Name property in each JSON document, as shown in the following figure:


When specifying property names in the FROM clause, you must keep the JSON hierarchy in mind. For example, the following SELECT statement returns no results because it skips the Name level of the hierarchy:

If we want to return only the first names from each document, we need to fully qualify the property reference:

When specifying properties in this way, we can also assign aliases to them, as shown in the following example:

By adding an alias, we can reference the Name property in other parts of the SELECT statement without needing to provide the fully qualified name in those places, just like you can do in T-SQL.

The SELECT clause

Now that you’re familiar with the FROM clause, let’s take a closer look at the SELECT clause, where we can refine our queries even further. The next example defines a SELECT list that returns only the id and Name properties in each document, rather than returning all content:

First off, notice that we have to qualify the name of each element in the SELECT list, even though there is only one data source. If we do not qualify the name, we will receive an error. In this case, because the FROM clause includes an alias for the collection, hr, we can use that to qualify the property names. The statement returns the following results:


As we did in the FROM clause, we can work our way down the JSON hierarchy in the SELECT list. For example, we can specify the First and Last properties:

This time, our results include only the first and last names from each document:


We can even specify property names in both the SELECT clause and FROM clause at the same time:

Notice that the properties specified in the SELECT clause use the alias specified in the FROM clause. As a result, we do not need to include the entire hierarchy for each property, only what is necessary based on the alias. The SELECT statement returns the same results as the preceding statement.

Because DocumentDB is concerned with JSON documents, the SELECT statement also provides features that let us better control our query results. For example, the following SELECT clause creates the FullName property, based on the First and Last properties:

The second expression in the SELECT list encloses the First and Last properties in brackets and assigns the FullName alias to those columns. This gives us the new property with an array as its value:


Each FullName array contains the values from the First and Last properties, but the property names have been dropped. In this way, we’re able to change the format of the JSON output to meet our specific needs. Although this is just a simple example, it points to the flexibility we have when working with the JSON documents.

Another SQL feature at our disposal is the VALUE operator, which lets us retrieve property values without the property name and without having to restructure the JSON. To use the operator, we need only add it before a property reference in the SELECT list:

As the following figure shows, the statement returns a single array, with each NatID value as one of the array values:


Another type of operator that the SELECT statement supports is the ternary operator, which is represented by a question mark. The operator lets us create a property whose value is based on another property’s value, providing a type of if/else logic. For example, the following SELECT statement uses a ternary operator to create the JobRole property, whose value is determined by the JobTitle property:

The logic reads as follows: If the JobTitle value equals Chief Executive Officer, then the JobRole value should be set to CEO. For all other JobTitle values, the JobRole value should be set to Underling, as shown in the following results.


The DocumentDB SELECT statement also supports the coalesce operator, which is represented by double question makes. You can use the operator to retrieve an alternate property value if the originally specified property does not exist, sort of like the COALESCE function in T-SQL.

To illustrate how this works, we need to add another document to our collection, this one based on the following T-SQL SELECT statement:

The SELECT statement retrieves data for the employee with the BusinessEntityID value of 6, but it specifies a different property value from the previous documents. Instead of the PII.NatID property, this statement creates the PII.S S N property, giving us the following results:

Here’s what the document should look lie after it has been formatted to appear more readable (and with the outer brackets removed):

If you don’t want to mess with SQL Server 2016, you can use this formatted JSON to create the document in the current collection.

After the document has been added, we can include it in our query results. Suppose we want to retrieve the national ID from all the documents, including the new one. We might start with a SELECT statement that looks like the following:

The statement will query all the documents, but will return only those that include the NatID property in the specified position, giving us the following results.


For the newest document, we receive only an empty object (the last set of curly braces). However, we can fix this by using the coalesce operator:

The SELECT statement specifies that if the NatID property is not found, the SSN property should be used. In addition, whichever value is used, that value should be part of the NationalID property. Now our results include data from all documents.


As you can see, we now have the national ID for all the employees in our collection.

The WHERE and ORDER clauses

The WHERE clause lets us filter our results even further. For example, the following SELECT statement returns the id and Name properties for any employees with a JobTitle value of Design Engineer:

The WHERE clause includes a scalar expression that verifies the value of the JobTitle property. If it matches the specified string, the employee’s id and Name properties are returned:


In this case, only two of the JSON documents are a match. However, rather than using the equal operator in the WHERE clause expression, we can use the not equal ( !=) operator:

Now our SELECT statement returns the four other documents, those with a JobTitle value other than Design Engineer.

The DocumentDB SELECT statement supports plenty of other operators as well, much like you’ll find in T-SQL. For example, the following WHERE clause includes the AND logical operator as well as the lesser than ( <) comparison operator:

In order to be returned, a document must contain a JobTitle value that equals Design Engineer and a DOB value that is less than 1955-01-01, giving us the following results:


DocumentDB also supports the IN operator, just like T-SQL. The IN operator lets you specify a list of possible values that can be compared to a property value. For example, the WHERE clause in the following statement specifies that the JobTitle value must equal Senior Tool Designer or Design Engineer.

Now our SELECT statement returns the following results:


We can also include an ORDER BY clause in our SELECT statement to specify the order in which the data should be returned. For example, we can modify the previous example to sort the results by the Last property:

In the ORDER BY clause, we can specify one or more scalar expressions, which means, more often than not, one or more properties in the JSON documents. In this case, the SELECT statement returns the following results:


By default, the data is sorted in ascending order, by the Last property value. However, we can specify the sort order by tagging on ASC or DESC to our expression: ASC for ascending order and DESC for descending order.

The DocumentDB wish list

In addition to what we’ve covered here, the DocumentDB SELECT statement supports a number of other language elements. For example, we can use the IN keyword in our FROM clause to iterate through JSON arrays or use the JOIN keyword to create a self-join. The SELECT statement also supports a variety of built-in SQL functions, including mathematical, string, array, spatial, and type checking functions.

Even so, the SELECT statement has a long way to go to catch up with its T-SQL counterpart. For proof, you need only visit Azure’s DocumentDB forum to view a wish list of features. There you’ll find the top requests for language elements such as aggregate functions, wildcard searching with LIKE, cross-document joins, and the DISTINCT keyword. Many forum visitors would also like to see support for the INSERT, UPDATE, and DELETE statements.

Even so, the SQL we do get in DocumentDB makes it a lot easier for those already familiar with SQL Server querying to make the leap to DocumentDB, without having to learn a new language or unusual constructs. And the ubiquity of JSON across the known universe makes it more likely we’ll be able to make use of DocumentDB to support our web and mobile applications.

For those already committed to the Azure services stack, DocumentDB seems a natural progression. It’s easy to use and quick to implement. And a JSON-based NoSQL service is a natural fit for the cloud. At the same time, DocumentDB is competing against the likes of Cloudant, Amazon, Google, and Oracle, not to mention MongoDB, so DocumentDB won’t be given a free ride. Still, Microsoft has made a serious commitment to the cloud, even if it was slow to jump onboard, and DocumentDB could prove a force to be reckoned with.