SQL For Cosmos DB – Tips and Tricks

Building on the introduction to the SQL used by Cosmos DB in the first article in the series, here you will learn how to handle some of the more classic challenges that you may face when querying JSON documents. Adam Aspin shows you some of the workarounds that are useful in practice when finding and shaping output data ready for further analysis.

The series so far:

  1. Introduction to SQL for Cosmos DB
  2. SQL For Cosmos DB – Tips and Tricks

This article will help you extend your basic Cosmos DB SQL querying abilities. Here, you will learn a few of the tricks that you might need to overcome the limitations of the flavour of Cosmos DB SQL.

String Functions for Wildcard Searches

The Cosmos DB engine is not bad at searching through vast swathes of JSON data. If you need to use a wildcard search for a text inside a specific attribute, the CONTAINS() function is a good starting point. You use it rather like a T-SQL LIKE in a WHERE clause.

NOTE: Load the SimpleCars2 data into the Cosmos DB emulator. For help with this, see the first article in this series.

Run the examples in this article against SimpleCars2.

This query finds the following documents in the source collection:

You need to be aware that the search text is case sensitive. What do you do to make a wildcard search case-insensitive? The answer is simply to force the attribute and the text that you are searching to be in the same case, much as you would do for case-sensitive collations in SQL Server:

Alternatively, if you need to set up a Wildcard search that applies to the start or the end of the contents of an attribute, similar to LIKE ‘%sometext’ in T-SQL, you can use this approach:

This more focused wildcard search gives the following result:

This query will still be case sensitive, but you can use the UPPER() or LOWER() functions to ensure that a case-insensitive search is applied.

Null Handling in Cosmos DB SQL

One reassuring shared feature between T-SQL and Cosmos DB SQL is that nulls, expressed as the unquoted lowercase word null in JSON, will ‘infect’ an entire calculation and return a null value if any single attribute is null. This is something that you will have to learn to handle in JSON documents.

However, one of the joys of a semi-structured format like JSON is that no attribute is compulsory. The potential consequence is that attributes may be missing in certain documents, and a missing attribute will also return a null in a calculation. In essence, you have to guard against two different types of nulls, missing elements, and actual null values.

To make this clearer, suppose that you have written some preventative code like this:

Assuming the query works (and there is no reason that it should not) the output should be:

Here you are combining the Cosmos DB SQL IS_NULL() function, which detects a missing attribute, with ternary logic to imitate the T-SQL ISNULL() function. What is more, you have enclosed the NULL detection ternary logic inside parentheses like this:

This ensures that the output works as you expect. Indeed, forgetting to enclose the IS_NULL() logic inside parentheses will cause problems.

If you run this query against the SimpleCars2 collection, you will see that this technique works flawlessly for Invoice GBPGB002 where there is a LineItemDiscount attribute that is set to null in the source JSON document. Unfortunately, the total costs are missing for three invoices, numbers GBPGB003, GBPGB004, and GBPGB005. If you look at these documents, you can see that they are all missing the LineItemDiscount attribute.

Undeterred, you try and remedy the situation using the IS_DEFINED() function, like this (and please note that the ternary logic is reversed compared to the IS_NULL() function):

This time, the output looks like this:

In this case, you will see costs for invoices GBPGB003, GBPGB004, and GBPGB005 where there is no attribute at all for line item discount but not for GBPGB002, the document with the actual null value.

This example shows that there are, in fact, two types of nulls in JSON, missing attributes and attributes that contain a null. One way to ensure that both of these potential traps are handled is code like this:

Finally, you obtain the result that you were looking for:

Here you are using simple logic to detect, firstly, if the attribute is defined and secondly to ensure that, if it is present, it does not contain a null. This kind of approach comes close to basic null handling in T-SQL. Once you reach a certain level of proficiency in Cosmos DB, you can write your own user-defined functions in JavaScript to handle these kinds of issues.

Given the clear limitations of Cosmos DB SQL, you will have to become inventive when writing SQL queries. You will have to write complex SQL using combinations of the available functions to achieve your goals.

Take a look at a handful of elementary workarounds to give you some ideas.

SUBSTRING() for LEFT()

Cosmos DB SQL currently does not have an equivalent to the SQL LEFT() function. Instead, you need to use a little ingenuity and apply the SUBSTRING() function like this:

You should obtain the following result:

SUBSTRING() for RIGHT()

Just as there is no LEFT() function, there is no RIGHT() function. However, combining the SUBSTRING() and REVERSE() functions like this can solve certain challenges:

This time the result is:

Remember, that if you are looking for more than one character at the right of the string using this technique, then you will need to reverse the string that you are searching for. That is, instead of looking for ‘red’ you would have to enter ‘der.’

If entering a search text in reverse order causes you any discomfort, then you can always add a second REVERSE() like this:

Once again, this is case-sensitive unless you extend the SQL to enforce case-insensitivity, like this:

Impersonate T-SQL YEAR(), MONTH(), and DAY()

There is no YEAR() function in Cosmos DB SQL. However, one simple alternative is to apply the SUBSTRING() function to the date string and isolate the relevant date element. To specify JSON documents for a specific year, you could write:

The output from this query displays the sale date for six of the seven documents in the collection.

Extending this principle, you can isolate the month like this:

Only two objects are returned in this case. The day can be isolated in this way:

This filter only returns a single object.

Although they are probably self-evident, for the sake of completeness it is worth noting that you can use:

The operators >=, <=, <> or !=

IN for a set of years, etc.

Once you start creating more complex WHERE clauses that mix and match these elements, you will find that you can specify date and time ranges extremely easily.

Rounding up to nearest 1000

As a simple example of how to extend the Cosmos DB SQL functions, take a look at the following short code snippet that extends the TRUNC() function to round up to the nearest thousand:

Here, the output looks like this:

Conclusion

There are many other workarounds that you can apply to overcome the limitations of the Cosmos DB implementation of SQL, and this short list could be extended to cover a range of options. However, this would be like a trip back to SQL Server circa 1994, and even this would not cover all the challenges that you are likely to face. Suffice it to say that you are likely to require some ingenuity and write extremely convoluted SQL to do something that would be simple in T-SQL or to consider other solutions.

Cosmos DB has a short answer to this, and that it to write your own user-defined functions in JavaScript. This is a rich and immensely powerful language, and it can be fairly easy to extend Cosmos DB SQL with a plethora of functions that answer your specific requirements. However, the purpose of this article is to look at the Cosmos DB flavor of SQL, and so learning how to add extensions in a language that you may not be familiar with will have to wait for another day.

The next article in the series will cover how to query more complex JSON documents.