{"id":84227,"date":"2019-05-13T23:34:53","date_gmt":"2019-05-13T23:34:53","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=84227"},"modified":"2026-04-15T18:49:29","modified_gmt":"2026-04-15T18:49:29","slug":"sql-for-cosmos-db-handling-complex-json-structures","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/nosql\/sql-for-cosmos-db-handling-complex-json-structures\/","title":{"rendered":"Cosmos DB SQL: Querying Complex JSON &#8211; Arrays, Schema on Read, and Subqueries"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"h-executive-summary\">Executive Summary<\/h2>\n\n\n\n<p><strong>Cosmos DB SQL&#8217;s document model stores data as JSON &#8211; which means queries must handle nested objects, arrays, variable schemas, and missing attributes that don&#8217;t exist in relational SQL. This article, Part 3 of the Cosmos DB SQL series, covers the advanced querying patterns for complex JSON structures: accessing nested subnodes and arrays of objects, handling schema-on-read inconsistencies (where the same attribute has different names across documents), checking for missing elements, verifying data types using IS_STRING and IS_NUMBER, reshaping output JSON, and using subqueries. The patterns use the Azure Cosmos DB SQL (formerly DocumentDB SQL) query dialect.<\/strong><\/p>\n\n\n<p><strong>The series so far:<\/strong><\/p>\n<ol>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/nosql-databases\/introduction-to-sql-for-cosmos-db\/\">Introduction to SQL for Cosmos DB<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/nosql-databases\/sql-for-cosmos-db-tips-and-tricks\">SQL For Cosmos DB \u2013 Tips and Tricks<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/nosql-databases\/sql-for-cosmos-db-handling-complex-json-structures\/\">SQL For Cosmos DB \u2013 Handling Complex JSON Structures<\/a><\/li>\n<\/ol>\n\n\n\n\n<p>JSON allows for nested nodes, arrays and arrays of objects, and Cosmos DB SQL can handle all of these when reshaping the output data. Consequently, you will use a second collection when learning how to query more complex JSON documents<\/p>\n\n\n\n<p>This means creating a new collection called <a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/05\/complexcars.zip\">complexcars<\/a> in the Cosmos DB emulator\u2014or even in Cosmos DB if you prefer. Once this collection has been created you need to load into it the documents named complex1.json, complex2.json and complex3.json. Be sure to review how to add the files from this <a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/nosql-databases\/introduction-to-sql-for-cosmos-db\/\">article<\/a> if you need help.<\/p>\n\n\n\n<p>Each document in this collection looks something like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">{\n    \"InvoiceNumber\": \"GBPGB011\",\n    \"TotalSalePrice\": 89000,\n    \"SaleDate\": \"2015-04-30T00:00:00\",\n    \"Customer\": {\n        \"Name\": \"Wonderland Wheels\",\n        \"CreditRisk\": false,\n        \"Reseller\": true\n    },\n    \"Address\": {\n        \"Town\": \"London\",\n        \"PostCode\": \"E7 4BR\",\n        \"CountryName\": \"United Kingdom\",\n        \"CountryISO\": \"GBR       \"\n    } ,\n    \"CustomerComments\": [\n        \"Excellent\",\n        \"Wonderful\",\n        \"Superb\"\n    ],\n    \"Salesdetails\": [\n        {\n            \"LineItem\": 1,\n            \"Make\": \"Porsche\",\n            \"Model\": \"944\",\n            \"SellingPrice\": 8500,\n            \"LineItemDiscount\": 50,\n            \"PurchaseCost\": 6800,\n            \"RepairsCost\": 250,\n            \"PartsCost\": 225,\n            \"TransportInCost\": 150\n        },\n        {\n            \"LineItem\": 2,\n            \"Make\": \"Bentley\",\n            \"Model\": \"Flying Spur\",\n            \"SellingPrice\": 80500,\n            \"LineItemDiscount\": 500,\n            \"PurchaseCost\": 64400,\n            \"RepairsCost\": 500,\n            \"PartsCost\": 750,\n            \"TransportInCost\": 750\n        }\n    ],\n    \"id\": \"f58a70dc-f107-d3ba-acda-02f39893eb44\",\n    \"_rid\": \"molfALBK0z8BAAAAAAAAAA==\",\n    \"_self\": \"dbs\/molfAA==\/colls\/molfALBK0z8=\/docs\/molfALBK0z8BAAAAAAAAAA==\/\",\n    \"_etag\": \"\\\"00000000-0000-0000-c2fa-09169cb001d4\\\"\",\n    \"_attachments\": \"attachments\/\",\n    \"_ts\": 1549993225\n}<\/pre>\n\n\n\n<p>In this document, <code>Customer<\/code> and <code>Address<\/code> are subnodes of distinct objects inside the document. <code>Salesdetails<\/code> is an array of objects and <code>CustomerComments<\/code> is an array of multiple items.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-choose-attributes-from-subnodes\">Choose Attributes from Subnodes<\/h2>\n\n\n\n<p>Returning the contents of a node in a JSON collection is mercifully simple. All you have to do is to specify the node, like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT c.Address FROM c<\/pre>\n\n\n\n<p>The output is the selected node-exactly as it appears in the source document:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"Address\": {\n            \"Town\": \"London\",\n            \"PostCode\": \"E7 4BR\",\n            \"CountryName\": \"United Kingdom\",\n            \"CountryISO\": \"GBR       \"\n        }\n    },\n    {\n        \"Address\": {\n            \"Town\": \"Liverpool\",\n            \"PostCode\": \"LL1 001\",\n            \"CountryName\": \"United Kingdom\",\n            \"CountryISO\": \"GBR       \"\n        }\n    },\n    {\n        \"Address\": {\n            \"Town\": \"London\",\n            \"PostCode\": \"NW1 1AA\",\n            \"CountryName\": \"United Kingdom\",\n            \"CountryISO\": \"GBR       \"\n        }\n    }\n]<\/pre>\n\n\n\n<p>In practice, this approach can be a useful way of returning multiple elements.<\/p>\n\n\n\n<p>Of course, you can make the output even more fine-grained and return selected attributes from a specific node:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT   \t c.Customer.Name\n        \t,c.Customer.CreditRisk\nFROM \tc<\/pre>\n\n\n\n<p>The result in this case is as simple as it is predictable:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"Name\": \"Wonderland Wheels\",\n        \"CreditRisk\": false\n    },\n    {\n        \"Name\": \"Honest John\",\n        \"CreditRisk\": false\n    },\n    {\n        \"Name\": \"Cut and Shut\",\n        \"CreditRisk\": true\n    }\n]<\/pre>\n\n\n\n<p>And it goes without saying that you can mix and match JSON attributes from varying levels in the hierarchy of nodes by running queries like this one:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT   c.TotalSalePrice AS InvoiceAmount\n        ,c.Customer.Name\n        ,c.Customer.CreditRisk\n        ,c.Address.Town\nFROM    c<\/pre>\n\n\n\n<p>In this case the result is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"InvoiceAmount\": 89000,\n        \"Name\": \"Wonderland Wheels\",\n        \"CreditRisk\": false,\n        \"Town\": \"London\"\n    },\n    {\n        \"InvoiceAmount\": 95000,\n        \"Name\": \"Honest John\",\n        \"CreditRisk\": false,\n        \"Town\": \"Liverpool\"\n    },\n    {\n        \"InvoiceAmount\": 170000,\n        \"Name\": \"Cut and Shut\",\n        \"CreditRisk\": true,\n        \"Town\": \"London\"\n    }\n]\n<\/pre>\n\n\n\n<p>While on this subject it is worth noting that:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>You can mix from different levels.<\/li>\n\n\n\n<li>You can drill down to any node merely by specifying the exact path down through the hierarchy of nodes.<\/li>\n\n\n\n<li>You can return the contents of an entire node as well as selected attributes from other nodes using queries like this:<\/li>\n<\/ul>\n<\/div>\n\n\n<pre class=\"wp-block-preformatted\">SELECT   c.TotalSalePrice AS InvoiceAmount\n        ,c.Customer.Name\n        ,c.Customer.CreditRisk\n        ,c.Address\nFROM    c<\/pre>\n\n\n\n<p>This query gives the following output:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"InvoiceAmount\": 89000,\n        \"Name\": \"Wonderland Wheels\",\n        \"CreditRisk\": false,\n        \"Address\": {\n            \"Town\": \"London\",\n            \"PostCode\": \"E7 4BR\",\n            \"CountryName\": \"United Kingdom\",\n            \"CountryISO\": \"GBR       \"\n        }\n    },\n    {\n        \"InvoiceAmount\": 95000,\n        \"Name\": \"Honest John\",\n        \"CreditRisk\": false,\n        \"Address\": {\n            \"Town\": \"Liverpool\",\n            \"PostCode\": \"LL1 001\",\n            \"CountryName\": \"United Kingdom\",\n            \"CountryISO\": \"GBR       \"\n        }\n    },\n    {\n        \"InvoiceAmount\": 170000,\n        \"Name\": \"Cut and Shut\",\n        \"CreditRisk\": true,\n        \"Address\": {\n            \"Town\": \"London\",\n            \"PostCode\": \"NW1 1AA\",\n            \"CountryName\": \"United Kingdom\",\n            \"CountryISO\": \"GBR       \"\n        }\n    }\n]<\/pre>\n\n\n\n<p>You may remember from the previous article that you can use the <code>ROOT<\/code> keyword to indicate the collection. Well this is also possible when querying subnodes, like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT * \nFROM ROOT.Customer<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-choose-elements-from-an-array-of-objects\">Choose Elements from an Array of Objects<\/h2>\n\n\n\n<p>If a JSON document contains arrays of objects (as is the case for <code>salesdetails<\/code> in the sample documents) then you might need to extend the SQL slightly depending on how you want to display the data. Essentially you have a couple of possibilities:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Return the entire arrays of objects, that is, everything inside the array<\/li>\n\n\n\n<li>Specify the item in the object that you wish to output<\/li>\n\n\n\n<li>Flatten the output and return selected items from each object<\/li>\n<\/ul>\n<\/div>\n\n\n<p>Let\u2019s look at each of these approaches in turn using the Salesdetails object in the sample document.<\/p>\n\n\n\n<p>Returning the complete contents of the object is an extension of the technique that you saw previously when returning the complete contents of a node:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT *\nFROM  c.Salesdetails<\/pre>\n\n\n\n<p>Executing this query produces the following result:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    [\n        {\n            \"LineItem\": 1,\n            \"Make\": \"Porsche\",\n            \"Model\": \"944\",\n            \"SellingPrice\": 8500,\n            \"LineItemDiscount\": \"50\",\n            \"PurchaseCost\": 6800,\n            \"RepairsCost\": 250,\n            \"PartsCost\": 225,\n            \"TransportInCost\": 150\n        },\n        {\n            \"LineItem\": 2,\n            \"Make\": \"Bentley\",\n            \"Model\": \"Flying Spur\",\n            \"SellingPrice\": 80500,\n            \"LineItemDiscount\": 500,\n            \"PurchaseCost\": 64400,\n            \"RepairsCost\": 500,\n            \"PartsCost\": 750,\n            \"TransportInCost\": 750\n        }\n    ]\n]<\/pre>\n\n\n\n<p>The output is truncated in this example, as all six line items from the source documents are returned by the query. However, once you see a couple of them the principle. Hopefully, is clear.<\/p>\n\n\n\n<p>However, the structure can be simplified and returned as a less complex and deep array if you use the IN keyword, like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT *\nFROM l IN c.Salesdetails<\/pre>\n\n\n\n<p>Here the result is subtly different:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"LineItem\": 1,\n        \"Make\": \"Porsche\",\n        \"Model\": \"944\",\n        \"SellingPrice\": 8500,\n        \"LineItemDiscount\": \"50\",\n        \"PurchaseCost\": 6800,\n        \"RepairsCost\": 250,\n        \"PartsCost\": 225,\n        \"TransportInCost\": 150\n    },\n    {\n        \"LineItem\": 2,\n        \"Make\": \"Bentley\",\n        \"Model\": \"Flying Spur\",\n        \"SellingPrice\": 80500,\n        \"LineItemDiscount\": 500,\n        \"PurchaseCost\": 64400,\n        \"RepairsCost\": 500,\n        \"PartsCost\": 750,\n        \"TransportInCost\": 750\n    }\n]<\/pre>\n\n\n\n<p>Once again, only a subset of the output data is displayed here.<\/p>\n\n\n\n<p>If you don\u2019t want the entire contents of the object you can tweak the <code>SELECT<\/code> statement to isolate only the required attributes from the array.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT l.Make\nFROM l IN c.Salesdetails<\/pre>\n\n\n\n<p>Here, as you can see, you are returning only a subset of the items in each array:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"Make\": \"Porsche\"\n    },\n    {\n        \"Make\": \"Bentley\"\n    },\n    {\n        \"Make\": \"Aston Martin\"\n    },\n    {\n        \"Make\": \"Rolls Royce\"\n    },\n    {\n        \"Make\": \"Porsche\"\n    },\n    {\n        \"Make\": \"Jaguar\"\n    }\n]<\/pre>\n\n\n\n<p>What is interesting to note here is that you are using one alias (c) to refer to the collection and another alias (l) to refer to the object itself.<\/p>\n\n\n\n<p>Moreover, you can count the number of objects in an object with an extension of the code you saw above. Here I am using the <code>VALUE<\/code> keyword to return the value without a JSON attribute name.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT VALUE COUNT(l)\nFROM l IN c.Salesdetails<\/pre>\n\n\n\n<p>The result is simply:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    6\n]<\/pre>\n\n\n\n<p>Naturally, you can mix attributes from inside the object with attributes from elsewhere in the document:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT  c.InvoiceNumber\n       ,c.Customer.Name\n       ,c.Salesdetails\nFROM   c<\/pre>\n\n\n\n<p>In this case the result is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"InvoiceNumber\": \"GBPGB011\",\n        \"Name\": \"Wonderland Wheels\",\n        \"Salesdetails\": [\n            {\n                \"LineItem\": 1,\n                \"Make\": \"Porsche\",\n                \"Model\": \"944\",\n                \"SellingPrice\": 8500,\n                \"LineItemDiscount\": \"50\",\n                \"PurchaseCost\": 6800,\n                \"RepairsCost\": 250,\n                \"PartsCost\": 225,\n                \"TransportInCost\": 150\n            },\n            {\n                \"LineItem\": 2,\n                \"Make\": \"Bentley\",\n                \"Model\": \"Flying Spur\",\n                \"SellingPrice\": 80500,\n                \"LineItemDiscount\": 500,\n                \"PurchaseCost\": 64400,\n                \"RepairsCost\": 500,\n                \"PartsCost\": 750,\n                \"TransportInCost\": 750\n            }\n        ]\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB001\",\n        \"Name\": \"Honest John\",\n        \"Salesdetails\": [\n            {\n                \"LineItem\": 1,\n                \"Make\": \"Aston Martin\",\n                \"Model\": \"DB10\",\n                \"SellingPrice\": 185000,\n                \"LineItemDiscount\": \"5000\",\n                \"PurchaseCost\": 125000,\n                \"RepairsCost\": 2500,\n                \"PartsCost\": 2025,\n                \"TransportInCost\": 150\n            },\n            {\n                \"LineItem\": 2,\n                \"Make\": \"Rolls Royce\",\n                \"Model\": \"Silver Ghost\",\n                \"SellingPrice\": 82500,\n                \"LineItemDiscount\": 500,\n                \"PurchaseCost\": 54500,\n                \"RepairsCost\": 500,\n                \"PartsCost\": 750,\n                \"TransportInCost\": 750\n            }\n        ]\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB002\",\n        \"Name\": \"Cut and Shut\",\n        \"Salesdetails\": [\n            {\n                \"LineItem\": 1,\n                \"Make\": \"Porsche\",\n                \"Model\": \"924\",\n                \"SellingPrice\": 95000,\n                \"LineItemDiscount\": \"5000\",\n                \"PurchaseCost\": 48000,\n                \"RepairsCost\": 250,\n                \"PartsCost\": 225,\n                \"TransportInCost\": 150\n            },\n            {\n                \"LineItem\": 2,\n                \"Make\": \"Jaguar\",\n                \"Model\": \"XK\",\n                \"SellingPrice\": 65000,\n                \"LineItemDiscount\": 500,\n                \"PurchaseCost\": 60000,\n                \"RepairsCost\": 500,\n                \"PartsCost\": 750,\n                \"TransportInCost\": 750\n            }\n        ]\n    }\n]<\/pre>\n\n\n\n<p>Specifying the path to the arrays of objects is enough to return the entire contents of the array of objects.<\/p>\n\n\n\n<p>Specifying the individual item inside an array of objects means tweaking the SQL and indicating the (zero-based) item that you want to see in the output, like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT  c.InvoiceNumber\n       ,c.Customer.Name\n       ,c.Salesdetails[0].Make\n       ,c.Salesdetails[0].Model\nFROM   c<\/pre>\n\n\n\n<p>As you can see below, on this occasion you are only returning one object from the array:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"InvoiceNumber\": \"GBPGB011\",\n        \"Name\": \"Wonderland Wheels\",\n        \"Make\": \"Porsche\",\n        \"Model\": \"944\"\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB001\",\n        \"Name\": \"Honest John\",\n        \"Make\": \"Aston Martin\",\n        \"Model\": \"DB10\"\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB002\",\n        \"Name\": \"Cut and Shut\",\n        \"Make\": \"Porsche\",\n        \"Model\": \"924\"\n    }\n]<\/pre>\n\n\n\n<p>Conversely, flattening the output to return all the items in an array involves using the JOIN keyword, and joining the document to itself-or more precisely to the array itself. If it helps, you can consider this as nearly equivalent to a table join in T-SQL only the second table is an arrays of objects inside the JSON document itself.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT  c.InvoiceNumber\n       ,c.Customer.Name\n       ,cx.LineItem\n       ,cx.Make\n       ,cx.Model\n       ,cx.SellingPrice\nFROM   c\nJOIN   cx IN c.Salesdetails<\/pre>\n\n\n\n<p>Here the output structure is decidedly different:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"InvoiceNumber\": \"GBPGB011\",\n        \"Name\": \"Wonderland Wheels\",\n        \"LineItem\": 1,\n        \"Make\": \"Porsche\",\n        \"Model\": \"944\",\n        \"SellingPrice\": 8500\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB011\",\n        \"Name\": \"Wonderland Wheels\",\n        \"LineItem\": 2,\n        \"Make\": \"Bentley\",\n        \"Model\": \"Flying Spur\",\n        \"SellingPrice\": 80500\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB001\",\n        \"Name\": \"Honest John\",\n        \"LineItem\": 1,\n        \"Make\": \"Aston Martin\",\n        \"Model\": \"DB10\",\n        \"SellingPrice\": 185000\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB001\",\n        \"Name\": \"Honest John\",\n        \"LineItem\": 2,\n        \"Make\": \"Rolls Royce\",\n        \"Model\": \"Silver Ghost\",\n        \"SellingPrice\": 82500\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB002\",\n        \"Name\": \"Cut and Shut\",\n        \"LineItem\": 1,\n        \"Make\": \"Porsche\",\n        \"Model\": \"924\",\n        \"SellingPrice\": 95000\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB002\",\n        \"Name\": \"Cut and Shut\",\n        \"LineItem\": 2,\n        \"Make\": \"Jaguar\",\n        \"Model\": \"XK\",\n        \"SellingPrice\": 65000\n    }\n]<\/pre>\n\n\n\n<p>What is important here is to alias the arrays of objects as the focus of the <code>JOIN<\/code> keyword and use the <code>IN<\/code> keyword to identify the path to the arrays of objects in the document. If your document contains multiple arrays of objects that you wish to flatten, then you simply add further <code>JOIN<\/code> clauses.<\/p>\n\n\n\n<p>Put another way, <code>FROM<\/code> defines the collection, and <code>JOIN<\/code> refers to the \u201cinner document\u201d (the array of objects) contained in the outer JSON document.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-handling-arrays\">Handling Arrays<\/h2>\n\n\n\n<p>JSON also uses arrays to store items inside a document. The ComplexCars document contains an array named <code>CustomerComments<\/code>. You can return the contents of an array much like you output the contents of an object using the <code>IN<\/code> keyword.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT *\nFROM l IN c.CustomerComments<\/pre>\n\n\n\n<p>This query returns the entire contents of the array-from all the documents:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    \"Excellent\",\n    \"Wonderful\",\n    \"Superb\",\n    \"Brilliant\",\n    \"Magnificent\",\n    \"Amazing\"\n]<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-searching-inside-an-array\">Searching Inside an Array<\/h2>\n\n\n\n<p>Complex JSON documents can contain arrays of elements, and it is always possible that you may need to search inside an array for a specific item. Cosmos DB SQL lets you use the <code>ARRAY<\/code>_<code>CONTAINS()<\/code> function to handle this particular challenge:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT c.InvoiceNumber\n       ,c.Customer.Name \nFROM   c\nWHERE  ARRAY_CONTAINS(c.CustomerComments, \"Superb\")<\/pre>\n\n\n\n<p>This time only elements from a document where the array contains the specified text is returned:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"InvoiceNumber\": \"GBPGB011\",\n        \"Name\": \"Wonderland Wheels\"\n    }\n]<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-handling-schema-on-read\">Handling Schema on Read<\/h2>\n\n\n\n<p>As you may have already discovered (or doubtless soon will) one of the difficulties in a schema-free approach to storing data is that attributes not only do not always appear in JSON documents, but that the same attribute can have different names across the documents in a collection.<\/p>\n\n\n\n<p>It follows that you will need to handle alternative attribute names in JSON documents. The classic way to prevent the lack of a schema causing erroneous output is to use the coalesce (double question mark) operator-like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT s.Address.Town ?? s. Address.City AS TownOrCity\nFROM   s<\/pre>\n\n\n\n<p>The output is as simple as the query:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"TownOrCity\": \"London\"\n    },\n    {\n        \"TownOrCity\": \"Liverpool\"\n    },\n    {\n        \"TownOrCity\": \"London\"\n    }\n]<\/pre>\n\n\n\n<p>The coalesce operator is simple: if the first attribute is nonexistent, then the second one is used. As you can imagine, this operator can save you considerable grief through minimizing erroneous output. However, a few comments may help even further:<\/p>\n\n\n\n<p>If you do not add an alias &#8211; $1, $2 etc. is used<\/p>\n\n\n\n<p>You can extend coalesce operator to handle multiple attribute names by writing code like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT s.Address.Town ?? s.Address.City ?? s.Address.Village \n          AS TownOrCity  \nFROM s<\/pre>\n\n\n\n<p>You should look for empty braces in the output that indicate an unhandled attribute.<\/p>\n\n\n\n<p>The challenge in these cases is, of course, discovering the duplicate attributes. This is explained a little further down in this article.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-dealing-with-schema-on-read-in-where-clause\">Dealing with Schema on Read in WHERE clause<\/h2>\n\n\n\n<p>Multiple attribute names for the same attribute does not only cause issues in the <code>SELECT<\/code> clause. You may need to filter on an attribute that has multiple synonyms in the document structures. Fortunately, this is not difficult to deal with-as the following SQL shows. Run this code against the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2019\/05\/simplecars2.zip\">simplecars2<\/a> collection.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT     s.InvoiceNumber, \n           s.Town ?? s.City ?? s.Village AS TownOrCity\nFROM       s\nWHERE      s.Town = \"Birmingham\" \n           OR s.City = \"Birmingham\" \n           OR s.Village = \"Birmingham\"<\/pre>\n\n\n\n<p>Running this query produces the following JSON:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"InvoiceNumber\": \"GBPGB001\",\n        \"TownOrCity\": \"Birmingham\"\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB003\",\n        \"TownOrCity\": \"Birmingham\"\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB002\",\n        \"TownOrCity\": \"Birmingham\"\n    }\n]<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-avoid-missing-elements-and-ensure-a-complete-structure\">Avoid Missing Elements and Ensure a Complete Structure<\/h2>\n\n\n\n<p>Another variation on a theme of schema fluidity is the occasional need to ensure that the output JSON has a rigid and predictable format. Here again the coalesce operator can be used to guarantee that an attribute will appear in the result set even if it is missing from the source document. Note that in this query, I am using the simplecars2 collection that you saw in the previous article.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT    simplecars.InvoiceNumber\n         ,simplecars.TotalSalePrice\n         ,simplecars.Town ?? \"N\/A\" AS Metropolis\nFROM \t   simplecars<\/pre>\n\n\n\n<p>This query will return seven records, but here are two examples:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"InvoiceNumber\": \"GBPGB001\",\n        \"TotalSalePrice\": 65000,\n        \"Metropolis\": \"Birmingham\"\n    },\n{\n        \"InvoiceNumber\": \"GBPGB003\",\n        \"TotalSalePrice\": 19500,\n        \"Metropolis\": \"N\/A\"\n    },\n\u2026\n]<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-checking-document-structure-and-data-types\">Checking Document Structure and Data Types<\/h2>\n\n\n\n<p>Cosmos DB SQL comes with a handful of functions that can assist in minimizing the risk of error that is implicit in the free-form structure of JSON.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-replace-strings-with-0\">Replace strings with 0<\/h3>\n\n\n\n<p>Unlike T-SQL, Cosmos DB SQL will not attempt to convert numeric text values to numbers. The direct consequence of this is that any attribute where the value is defined as a string cannot be used in a calculation as this will prevent the calculation from working (and the calculated attribute will not appear in the output). If you look at the single JSON document in the complexcars collection you will see that the first value for the <code>LineItemDiscount<\/code> attribute is \u201c50\u201d (in double quotes). This will prevent the attribute from being used in a calculation and will prevent any value from being returned.<\/p>\n\n\n\n<p>One solution is to use the <code>IS_NUMBER()<\/code> function (with a little ternary logic) to replace numeric strings with zeroes, like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT c.InvoiceNumber\n       ,IS_NUMBER(c.Salesdetails[0].LineItemDiscount) \n       ? c. Salesdetails[0].LineItemDiscount : 0 \n          AS LineItemDiscount\nFROM   c<\/pre>\n\n\n\n<p>This query works beautifully, and gives the following result:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"InvoiceNumber\": \"GBPGB011\",\n        \"LineItemDiscount\": 0\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB001\",\n        \"LineItemDiscount\": 0\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB002\",\n        \"LineItemDiscount\": 0\n    }\n]<\/pre>\n\n\n\n<p>Clearly a conversion function would be ideal. Unfortunately, this is not directly available in Cosmos DB SQL. However, there is an alternative which is to write your own type casting function in JavaScript and use this instead. You can, of course, extend this technique to list all values that are stored as strings and therefore could mean that erroneous results are returned. The code for this is extremely simple:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT c.InvoiceNumber\nFROM   c\nWHERE  IS_NUMBER(c.LineItemDiscount) = false<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-detect-inappropriate-data-types\">Detect Inappropriate Data Types<\/h3>\n\n\n\n<p>Equally, there could be occasions when you need to detect a string data type. This is a simple call to the <code>IS_STRING()<\/code> function.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT c.InvoiceNumber\nFROM   c\nWHERE  IS_STRING(c.Salesdetails[0].LineItemDiscount)<\/pre>\n\n\n\n<p>The query output tells you that at least one invoice has a string where you would expect a numeric value:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"InvoiceNumber\": \"GBPGB011\"\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB001\"\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB002\"\n    }\n]<\/pre>\n\n\n\n<p>The remaining type detection functions that you may find useful are:<\/p>\n\n\n\n<p>IS_BOOL<\/p>\n\n\n\n<p>IS_ARRAY<\/p>\n\n\n\n<p>IS_OBJECT<\/p>\n\n\n\n<p>IS_PRIMITIVE (these can be string, boolean, numeric or null)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-shape-output-json\">Shape Output JSON<\/h2>\n\n\n\n<p>Earlier in this article you learned how to \u201cflatten\u201d JSON. Inversely, there could be times when you need to create a different JSON structure for the output. Suppose, for example, you wish to take the flattened documents from the simplecars2 collection and display them in a nested format.<\/p>\n\n\n\n<p>The following short piece of SQL shows how this can be done. Please note that I am deliberately not attempting to output all the attributes from the documents.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT c.InvoiceNumber , {\"DateOfSale\": c.SaleDate\n       ,\"SellingPrice\": c.TotalSalePrice} AS InvoiceDetails\nFROM  c<\/pre>\n\n\n\n<p>Here is the result:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"InvoiceNumber\": \"GBPGB011\",\n        \"InvoiceDetails\": {\n            \"DateOfSale\": \"2015-04-30T00:00:00\",\n            \"SellingPrice\": 89000\n        }\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB001\",\n        \"InvoiceDetails\": {\n            \"DateOfSale\": \"2017-07-10T00:00:00\",\n            \"SellingPrice\": 95000\n        }\n    },\n    {\n        \"InvoiceNumber\": \"GBPGB002\",\n        \"InvoiceDetails\": {\n            \"DateOfSale\": \"2018-09-30T00:00:00\",\n            \"SellingPrice\": 170000\n        }\n    }\n]<\/pre>\n\n\n\n<p>Another way to obtain exactly the same result is (and I know that it looks a little weird) is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT c.InvoiceNumber \n       ,(SELECT c. SaleDate, c.TotalSalePrice FROM c) \n          AS InvoiceDetails\nFROM  c<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-subqueries\">Subqueries<\/h2>\n\n\n\n<p>As you saw in the previous article, Cosmos DB SQL allows you to use subqueries. One useful application of subqueries is to restructure the JSON in the output of a query. In the following case the subquery \u201cflattens\u201d the JSON structure in the complexcars document format-and the outer query then constructs a totally different document format.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT a.Name\n       ,{\"Make\": a.Make, \"Model\": a.Model\n       , \"SellingPrice\": a.SellingPrice} AS InvoiceDetails\nFROM\n(SELECT  c.InvoiceNumber\n       ,c.Customer.Name\n       ,cx.LineItem\n       ,cx.Make\n       ,cx.Model\n       ,cx.SellingPrice\nFROM   c\nJOIN   cx IN c.Salesdetails) a<\/pre>\n\n\n\n<p>The format this time is completely different, as you have two objects in the output. All in all, this is rather like a standard SQL JOIN.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[\n    {\n        \"Name\": \"Wonderland Wheels\",\n        \"InvoiceDetails\": {\n            \"Make\": \"Porsche\",\n            \"Model\": \"944\",\n            \"SellingPrice\": 8500\n        }\n    },\n    {\n        \"Name\": \"Wonderland Wheels\",\n        \"InvoiceDetails\": {\n            \"Make\": \"Bentley\",\n            \"Model\": \"Flying Spur\",\n            \"SellingPrice\": 80500\n        }\n    },\n    {\n        \"Name\": \"Honest John\",\n        \"InvoiceDetails\": {\n            \"Make\": \"Aston Martin\",\n            \"Model\": \"DB10\",\n            \"SellingPrice\": 185000\n        }\n    },\n    {\n        \"Name\": \"Honest John\",\n        \"InvoiceDetails\": {\n            \"Make\": \"Rolls Royce\",\n            \"Model\": \"Silver Ghost\",\n            \"SellingPrice\": 82500\n        }\n    },\n    {\n        \"Name\": \"Cut and Shut\",\n        \"InvoiceDetails\": {\n            \"Make\": \"Porsche\",\n            \"Model\": \"924\",\n            \"SellingPrice\": 95000\n        }\n    },\n    {\n        \"Name\": \"Cut and Shut\",\n        \"InvoiceDetails\": {\n            \"Make\": \"Jaguar\",\n            \"Model\": \"XK\",\n            \"SellingPrice\": 65000\n        }\n    }\n]<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-conclusion\">Conclusion<\/h2>\n\n\n\n<p>This concludes your whirlwind tour of Cosmos DB SQL. The good news is that you have already seen a large number of the currently available functions, as well as gained a reasonable overview of the core approaches that you may need when querying JSON documents in Cosmos DB. In fact, there are now very few of the Cosmos DB SQL API functions that you have not seen in these two articles.<\/p>\n\n\n\n<p>This very simplicity is, however, a two-edged sword. It is hard to deny that an experienced SQL programmer will feel frustrated at the absence of a range of functions that have been a fundamental part of T-SQL for years-if not decades. Moreover, it has to be said that Cosmos DB SQL \u2013 at least in its current incarnation \u2013 is not an analytical tool.<\/p>\n\n\n\n<p>However, I prefer to concentrate on the positives, and to point out that once you have mastered Cosmos DB SQL you can use it to export flattened JSON to an SQL Server using a variety of data ingestion techniques and carry out analytics on a reduced data set in the relational or even dimensional engines that you are currently using. Consequently, I encourage you to think of Cosmos DB and its SQL API as an essential extension to the SQL Server universe, and the SQL it offers -however limited-as essentially an easy way in to the worlds of JSON, document databases, NoSQL and big data.<\/p>\n\n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs: SQL For Cosmos DB \u2013 Handling Complex JSON Structures<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. How do I query nested JSON objects in Azure Cosmos DB SQL?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Use dot notation to access properties of nested objects: SELECT c.address.city FROM c returns the city from a nested address object. For arrays of objects, use JOIN to flatten the array into queryable rows: SELECT c.id, item.productName FROM c JOIN item IN c.orderItems returns one row per item in the orderItems array. For arrays of scalar values, IN and ARRAY_CONTAINS provide membership testing.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. What is schema on read in Cosmos DB and how do I handle inconsistent attribute names?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Schema on read means Cosmos DB imposes no schema on documents &#8211; different documents in the same container can have different attribute names for the same logical value (e.g., &#8216;Color&#8217; in one document and &#8216;colour&#8217; in another). To handle this in queries: use COALESCE(c.Color, c.colour) to return the first non-NULL value; use IS_DEFINED() to check attribute existence; or use WHERE IS_DEFINED(c.Color) to filter to documents that have the attribute. For queries that must handle all variants, COALESCE over all known attribute names is the standard pattern.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. How do I check data types in Cosmos DB SQL queries?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Use the IS_* type-checking functions: IS_STRING(c.attribute) returns true for string values; IS_NUMBER returns true for numeric values; IS_BOOL for booleans; IS_NULL for null; IS_ARRAY for arrays; IS_OBJECT for objects. These are useful when the same attribute might hold different types across documents (a common schema-on-read challenge). Example: WHERE IS_NUMBER(c.price) AND c.price &gt; 100 ensures the comparison only applies to documents where price is actually numeric.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">4. Can I reshape the output JSON in Cosmos DB SQL queries?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Yes. Cosmos DB SQL supports constructing new JSON objects in the SELECT clause: SELECT {&#8220;productId&#8221;: c.id, &#8220;name&#8221;: c.productName} AS product FROM c returns each result as a new JSON object with renamed properties. Use VALUE to return a scalar or array without the outer wrapper. Subqueries allow further reshaping: SELECT c.id, (SELECT p.name, p.price FROM p IN c.products) AS products FROM c returns a document with a nested array of products extracted from the product sub-array.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Query complex JSON in Azure Cosmos DB SQL: access subnodes and arrays, handle schema-on-read inconsistencies, detect missing attributes, check data types with IS_STRING and IS_NUMBER, reshape output JSON, and use subqueries.&hellip;<\/p>\n","protected":false},"author":2181,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":true,"footnotes":""},"categories":[53,143535],"tags":[95509],"coauthors":[12351],"class_list":["post-84227","post","type-post","status-publish","format-standard","hentry","category-featured","category-nosql","tag-standardize"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/84227","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/2181"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=84227"}],"version-history":[{"count":8,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/84227\/revisions"}],"predecessor-version":[{"id":109825,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/84227\/revisions\/109825"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=84227"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=84227"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=84227"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=84227"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}