Importing JSON Data from Web Services and Applications into SQL Server

To support many applications, it makes sense for the database to work with JSON data, because JSON is the built-in way for a JavaScript or TypeScript application to represent object data. It can mean less network traffic, looser coupling, and less need for the application developer to require full access to the base tables of the database. However, it also means that the database must run plenty of checks before importing the data. Phil Factor explains how this can easily be done.

Articles by Phil Factor about JSON and SQL Server:

  1. Consuming JSON Strings in SQL Server (Nov 2012)
  2. SQL Server JSON to Table and Table to JSON (March 2013)
  3. Producing JSON Documents From SQL Server Queries via TSQL (May 2014)
  4. Consuming hierarchical JSON documents in SQL Server using OpenJSON (Sept 2017)
  5. Importing JSON data from Web Services and Applications into SQL Server (October 2017)

There are a number of reasons why database developers or DBAs will regard JSON with suspicion. They are used to the protection of a higher level of constraints and checks on imported data: they want to be sure that the metadata of the incoming data is correct before they import it, and they also want to check the data itself. At the simplest level, they want to be confident that a regular feed from a data source is as robust as possible. Why so fussy? Simply because once bad data gets into a database of any size, it is tiresome and time-consuming to extract it.

As well as taking JSON data from web-based applications, especially single-page ones, many databases rely on volatile data from web services, such as a list of currencies and their current value against the dollar. We need to be sure that, each time it happens, the JSON has the same metadata or data structure as usual, and that the data is valid. Although nowadays application data generally comes as a JSON document, we have a similar problem with other types of document-based data. XML, for example, exists in two variations: schema-based or typed XML, which is good and virtuous, and its naughty sister, schema-less or untyped XML. Typed XML is the only sort of XML you should allow in a SQL Server database. Not only is it safer to use, but it is stored much more efficiently. Untyped XML, JSON and YAML all require checks.
We’d like to do the same with JSON as we do with typed XML, and allow a JSON schema to do the donkey work; but, sadly, SQL Server doesn’t currently support any JSON schema binding and cannot, therefore, store JSON efficiently in a ‘compiled’ form. (Note: you can get 25x compression of JSON data, and far better performance, by using clustered columnstore indexes, but that is a different story.)

The dominant JSON schema standard is actually called ‘JSON Schema’, and is currently at IETF draft 6. It can describe a data format in human-readable JSON and can be used to provide complete structural validation. PostgreSQL has postgres-json-schema for validating JSON; JavaScript has many add-ins for doing this; .NET programmers have Json.NET Schema and several other alternatives.

PowerShell is the obvious place to validate your JSON via Json.NET Schema, but there is, in fact, quite a lot you can do in SQL to check your JSON data. You can easily compare the metadata of two JSON documents to see if anything has changed, and you can apply constraints on JSON data. This is because of the use of path expressions. To get started we’d better describe them, because they are essential to any serious use of JSON in SQL Server.

Path expressions

When you need to read or write values in a JSON document in SQL Server, you use JSON path expressions. These can reference objects, arrays or values.

These path expressions are needed when you call OPENJSON with the WITH clause to return a relational table-source, when you call JSON_VALUE to extract a single value from JSON text, or when you call JSON_MODIFY to update a value or append to it.

The path itself starts with a dollar sign ($) that represents the context item. The path consists of elements separated by ‘dots’ or full-stops.

The property path is a set of path steps consisting of key names that are optionally "quoted" (e.g. $.info.address.county or $.info.address."post code".outward). Each dot denotes that the left-side value is the parent of the right-side value. If a key name starts with a dollar sign or contains special characters such as spaces, then you need to delimit the key with double-quotes (e.g. $.info.Cost."La Posche Hotel"). If a key refers to an array, its elements are indexed with a zero-based index (e.g. $.info.tags[0] or $.info.tags[1]).
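For instance, here is a minimal example (the document and its values are invented for illustration):

DECLARE @TheJSON NVARCHAR(MAX) = N'{"info": {
    "address": {"county": "Sussex", "post code": {"outward": "RH18"}},
    "tags": ["rural", "forest"]}}';

SELECT
  JSON_VALUE(@TheJSON, '$.info.address.county') AS County,                -- Sussex
  JSON_VALUE(@TheJSON, '$.info.address."post code".outward') AS Outward,  -- RH18
  JSON_VALUE(@TheJSON, '$.info.tags[1]') AS SecondTag;                    -- forest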

If the path references a key that occurs more than once, neither JSON_VALUE nor JSON_MODIFY can access the second or subsequent values. In this case, you have to use OPENJSON instead to get at all the values.
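A quick way to see this (the duplicated key here is deliberate, and is legal JSON):

DECLARE @Duplicates NVARCHAR(MAX) = N'{"tag": "first", "tag": "second"}';

SELECT JSON_VALUE(@Duplicates, '$.tag') AS TheOnlyOneYouGet; -- just one value comes back
SELECT [key], [value] FROM OPENJSON(@Duplicates);            -- OPENJSON returns both rows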

JSON paths in SQL Server can start with the keyword ‘lax’ or ‘strict’. Suppressing errors when a path isn’t found in the JSON document is an unusual requirement, but it is the default, and you can specify it explicitly with ‘lax’. You’d usually be more likely to want ‘strict’ mode, but ‘lax’ is better if you want to test whether a path value is there, because the NULL return value tells you that something is wrong.
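The difference is easy to demonstrate:

DECLARE @TheJSON NVARCHAR(MAX) = N'{"a": 1}';

SELECT JSON_VALUE(@TheJSON, 'lax $.b');    -- NULL: lax is the default, missing paths are tolerated
SELECT JSON_VALUE(@TheJSON, 'strict $.b'); -- raises an error: the property cannot be found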

You can easily use path expressions. The first function you’d probably need is a routine to tell you what these expressions actually are for any particular JSON document.

Finding out what paths there are in a JSON string or document

Here is a routine that takes a JSON string and returns a table-source containing the expressions, data types and values for the JSON that you specify.
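The listing below is a sketch of such a routine, reconstructed to match the way dbo.JSONPathsAndValues is used later in this article; the column names ThePath, ValueType and TheValue are my own choice. It walks the document a level at a time using OPENJSON, so it needs SQL Server 2016 or later at database compatibility level 130.

IF OBJECT_ID('dbo.JSONPathsAndValues') IS NOT NULL
  DROP FUNCTION dbo.JSONPathsAndValues;
GO
CREATE FUNCTION dbo.JSONPathsAndValues (@JSON NVARCHAR(MAX))
/* Returns every path in a JSON document, with the JSON type and the
   value found at that path. */
RETURNS @Paths TABLE
  (ThePath NVARCHAR(2000) NOT NULL,
   ValueType VARCHAR(7) NOT NULL,
   TheValue NVARCHAR(MAX) NULL)
AS
BEGIN
  IF ISJSON(@JSON) = 0 RETURN; -- nothing useful to say about bad JSON
  DECLARE @Walk TABLE -- the work table: one row per node found so far
    (ThePath NVARCHAR(2000), TheValue NVARCHAR(MAX),
     TheType TINYINT, TheLevel INT);
  DECLARE @Level INT = 0;
  INSERT INTO @Walk (ThePath, TheValue, TheType, TheLevel)
    VALUES ('$', @JSON,
            CASE WHEN LEFT(LTRIM(@JSON), 1) = '[' THEN 4 ELSE 5 END, 0);
  -- breadth-first: expand every object (type 5) or array (type 4)
  -- found at the current level by one more level
  WHILE EXISTS (SELECT 1 FROM @Walk
                 WHERE TheLevel = @Level AND TheType IN (4, 5))
    BEGIN
      INSERT INTO @Walk (ThePath, TheValue, TheType, TheLevel)
        SELECT Parent.ThePath
               + CASE WHEN Parent.TheType = 4 -- array elements get a [n] step
                      THEN '[' + Child.[key] + ']'
                      WHEN Child.[key] LIKE '%[^a-zA-Z0-9_]%' -- awkward keys get quoted
                      THEN '."' + Child.[key] + '"'
                      ELSE '.' + Child.[key] END,
               Child.[value], Child.[type], @Level + 1
          FROM @Walk AS Parent
            CROSS APPLY OPENJSON(Parent.TheValue) AS Child
          WHERE Parent.TheLevel = @Level AND Parent.TheType IN (4, 5);
      SET @Level += 1;
    END;
  INSERT INTO @Paths (ThePath, ValueType, TheValue)
    SELECT ThePath,
           CASE TheType WHEN 0 THEN 'null'   WHEN 1 THEN 'string'
                        WHEN 2 THEN 'number' WHEN 3 THEN 'boolean'
                        WHEN 4 THEN 'array'  ELSE 'object' END,
           TheValue
      FROM @Walk;
  RETURN;
END;
GO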

With this, you can see what paths lead to the keys, and you can see what is in the values of those keys.
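Using the same invented document as before:

DECLARE @MyJSON NVARCHAR(MAX) = N'{"info": {
    "address": {"county": "Sussex", "post code": {"outward": "RH18"}},
    "tags": ["rural", "forest"]}}';

SELECT ThePath, ValueType, TheValue
  FROM dbo.JSONPathsAndValues(@MyJSON)
  ORDER BY ThePath;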

Which would give you a row for every path in the document: one for each object and array, and one for each leaf value, such as $.info.address.county (string, Sussex), $.info.tags[0] (string, rural) and $.info.address."post code".outward (string, RH18).

Differences between the metadata of two JSON strings

It is very easy to check two JSON strings to make sure that they have the same metadata, and to report any differences.
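Here is one way of doing it, using the paths function above to compare paths and types (a sketch; the two sample documents are invented):

DECLARE @FirstJSON NVARCHAR(MAX) =
  N'{"name": "Phil", "tags": ["SQL", "JSON"]}';
DECLARE @SecondJSON NVARCHAR(MAX) =
  N'{"name": "Phil", "tags": ["SQL", "JSON"], "age": "unknown"}';

SELECT COALESCE(f.ThePath, s.ThePath) AS ThePath,
       CASE WHEN f.ThePath IS NULL THEN 'only in the second document'
            WHEN s.ThePath IS NULL THEN 'only in the first document'
            ELSE 'type changed from ' + f.ValueType + ' to ' + s.ValueType
       END AS TheDifference
  FROM dbo.JSONPathsAndValues(@FirstJSON) AS f
    FULL OUTER JOIN dbo.JSONPathsAndValues(@SecondJSON) AS s
      ON f.ThePath = s.ThePath
  WHERE f.ThePath IS NULL OR s.ThePath IS NULL
     OR f.ValueType <> s.ValueType;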

As part of the build, we’d run some code like this to ensure that the function still does what we expect:
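A couple of minimal assertion-style checks, for instance (the original test harness isn’t shown here):

IF NOT EXISTS (SELECT 1 FROM dbo.JSONPathsAndValues(N'{"a": 1}')
                WHERE ThePath = '$.a' AND ValueType = 'number' AND TheValue = '1')
  RAISERROR ('JSONPathsAndValues failed: simple key', 16, 1);

IF NOT EXISTS (SELECT 1 FROM dbo.JSONPathsAndValues(N'{"a": [true]}')
                WHERE ThePath = '$.a[0]' AND ValueType = 'boolean')
  RAISERROR ('JSONPathsAndValues failed: array element', 16, 1);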

Actually, to be honest, I wouldn’t do my tests this way in order to make sure the function works. I’d run all the results into a table and make sure that the entire table was what I’d expect. After all, we all do automated unit tests as part of our daily build, don’t we?

OK. We can now compare the metadata of two JSON strings, but we can compare the values as well if we ever need to check that two JSON documents represent the same data.

Checking that the JSON is what you expect

Many web services send information or warning messages in JSON format instead of the data you expect. These could cover a variety of conditions, such as service interruptions, subscription terminations, or daily credit-limits being reached. They need to be logged, and you need to bypass the import routine. Probably the quickest way to check your JSON is to do a simple JSON_VALUE call on a key/value pair that needs to be there, to see if you get NULL back. To get that path in the first place, you can use the JSONPathsAndValues function. However, it is possible that there are a number of key/value pairs that need to be there. There are plenty of ways of doing this if you have a list or table of the paths you need. I’ll use a VALUES table-source to illustrate the point.
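Something like this sketch, with an invented currency feed (note that JSON_VALUE only accepts a non-literal path from SQL Server 2017 onwards; on 2016, join to dbo.JSONPathsAndValues instead):

DECLARE @TheJSON NVARCHAR(MAX) =
  N'{"status": "ok", "rates": {"GBP": 0.76, "EUR": 0.85}}';

-- every path in the list must yield a value; a NULL means trouble
SELECT MustBeThere.ThePath AS MissingPath
  FROM (VALUES ('$.status'), ('$.rates.GBP'),
               ('$.rates.EUR'), ('$.rates.JPY')) AS MustBeThere (ThePath)
  WHERE JSON_VALUE(@TheJSON, MustBeThere.ThePath) IS NULL;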

Comparing values as well as metadata

A very small tweak in the code for checking the keys will allow you to compare the values as well if you ever need to do that. Note that it checks the order of lists too.
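A sketch of the tweaked comparison, using the same paths function; the two sample arrays are invented so as to differ in one element and in length:

DECLARE @FirstJSON NVARCHAR(MAX) =
  N'{"tags": ["red", "green", "blue"]}';
DECLARE @SecondJSON NVARCHAR(MAX) =
  N'{"tags": ["red", "amber", "blue", "mauve"]}';

SELECT COALESCE(f.ThePath, s.ThePath) AS ThePath,
       f.TheValue AS FirstValue, s.TheValue AS SecondValue
  FROM dbo.JSONPathsAndValues(@FirstJSON) AS f
    FULL OUTER JOIN dbo.JSONPathsAndValues(@SecondJSON) AS s
      ON f.ThePath = s.ThePath
  WHERE COALESCE(f.ValueType, s.ValueType) NOT IN ('object', 'array') -- leaves only
    AND (f.ThePath IS NULL OR s.ThePath IS NULL
         OR ISNULL(f.TheValue, '') <> ISNULL(s.TheValue, ''));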

As you can see, there are two differences between the JSON strings. There are different values for the same array element (line 1 of result) and there are a different number of array elements (line 2).

Checking for valid SQL datatypes

That is OK as far as it goes, but what about checking that the data will actually go into SQL Server? Although there are good practices for the storage of dates in JSON, for example, there is no JSON standard for them. If we know our constraints, it is dead easy to check. Imagine we have some JSON in which each value should be of a particular SQL datatype. We just define the paths of the values that we want to check and perform whatever check we need on the JSON. We can demonstrate this technique.
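A sketch, with an invented document and a VALUES list saying what SQL type each path should convert to; TRY_CONVERT does the actual testing (and, as above, a non-literal path for JSON_VALUE needs SQL Server 2017):

DECLARE @TheJSON NVARCHAR(MAX) = N'{
  "id": "not-a-number", "startDate": "2017-10-01", "price": "34.50"}';

SELECT Expected.ThePath, Expected.SQLType,
       JSON_VALUE(@TheJSON, Expected.ThePath) AS TheValue,
       CASE Expected.SQLType -- keep this CASE in sync with the list of types
         WHEN 'int' THEN
           IIF(TRY_CONVERT(INT, JSON_VALUE(@TheJSON, Expected.ThePath)) IS NULL,
               'failed', 'ok')
         WHEN 'date' THEN -- style 126: insist on the ISO 8601 date format
           IIF(TRY_CONVERT(DATE, JSON_VALUE(@TheJSON, Expected.ThePath), 126) IS NULL,
               'failed', 'ok')
         WHEN 'money' THEN
           IIF(TRY_CONVERT(MONEY, JSON_VALUE(@TheJSON, Expected.ThePath)) IS NULL,
               'failed', 'ok')
         ELSE 'no check written for this type'
       END AS Outcome
  FROM (VALUES ('$.id', 'int'), ('$.startDate', 'date'),
               ('$.price', 'money')) AS Expected (ThePath, SQLType);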


Which, in this test case, would flag the id value as failing its integer check, while the date and the price pass.

You will have appreciated that, as well as the JSON, you will need a separate table-source to say what type of data each value should be, and I’ve used a custom date format to show that you can refine your constraint. I’ve used a VALUES table-source, but you can easily swap in a JSON one. This table-source needs to be kept in sync with the CASE statement for it to work. In a working system, you’d want to encapsulate all this in a function.

Checking whether the values will pass constraint checks

So how would you tackle the task of checking a simple example, like the one below, to make sure that all the IP addresses are valid? For this example, we’ll just check that each address has exactly three dots. We don’t want to check the name, obviously.
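The original sample isn’t reproduced here, so this is a stand-in with one deliberately malformed address:

DECLARE @TheJSON NVARCHAR(MAX) = N'{
  "servers": [
    {"name": "web01", "ip": "192.168.1.10"},
    {"name": "web02", "ip": "192.168.1.11"},
    {"name": "db01",  "ip": "192.168.300"}
  ]}';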

You wouldn’t want that last IP address in your database. It isn’t valid. You only want to check those IP values. Here, you can very easily run the check.
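-- only the values under an "ip" key are checked; a dotted quad
-- must have exactly three dots (a deliberately crude check, as described)
SELECT ThePath, TheValue AS BadIPAddress
  FROM dbo.JSONPathsAndValues(@TheJSON)
  WHERE ThePath LIKE '%.ip'
    AND LEN(TheValue) - LEN(REPLACE(TheValue, '.', '')) <> 3;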

And you will see the bad IP address, but it only runs the check on the list of IP addresses, which is what you want.

If you had a number of checks to do, you’d save the table-source as a table variable or temporary table and run several queries on it.

Scaling things up

Normally you are dealing with large amounts of JSON data, so you only want to parse it once into a temporary table or table variable before running all your metadata checks. The advantage of doing it this way is that, once you’ve put a good primary key on the path (beware of duplicate JSON keys: they are valid JSON – RFC 4627), the whole process becomes much easier and better-controlled: first checking that the data can be successfully coerced into the appropriate columns of the destination table, then checking that it is within bounds, and finally unpicking the hierarchical JSON data into all the relational tables in the right order.
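A sketch of the staging step, again using the paths function from earlier:

-- parse once into a temporary table, then run every check against it
CREATE TABLE #ParsedJSON
  (ThePath NVARCHAR(450) NOT NULL PRIMARY KEY, -- fails fast on duplicate JSON keys
   ValueType VARCHAR(7) NOT NULL,
   TheValue NVARCHAR(MAX) NULL);

INSERT INTO #ParsedJSON (ThePath, ValueType, TheValue)
  SELECT ThePath, ValueType, TheValue
    FROM dbo.JSONPathsAndValues(@TheJSON); -- assuming @TheJSON holds the incoming document
-- the metadata, datatype and range checks all read from #ParsedJSON,
-- and the document is never re-parsed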

Using a JSON Webservice from SQL Server

Normally, you’d use SSIS or PowerShell for a regular production data feed, but it is certainly possible to do it in SQL. The downside is that your SQL Server needs to have internet access, which is a security risk, and you also have to widen your security surface-area by allowing OLE Automation. That said, this is how you do it.
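A sketch of the approach (the procedure name dbo.GetJSONFromUrl is invented for this article; it uses the documented sp_OA* stored procedures, and needs the ‘Ole Automation Procedures’ option enabled):

-- enabling OLE Automation widens your attack surface: do this knowingly
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'Ole Automation Procedures', 1; RECONFIGURE;
GO
CREATE PROCEDURE dbo.GetJSONFromUrl -- a name invented for this sketch
  @TheURL NVARCHAR(2000), @TheJSON NVARCHAR(MAX) OUTPUT
AS
BEGIN
  DECLARE @Object INT, @hr INT;
  DECLARE @Response TABLE (ResponseText NVARCHAR(MAX));
  EXEC @hr = sp_OACreate 'MSXML2.ServerXMLHTTP', @Object OUTPUT;
  IF @hr <> 0
    BEGIN RAISERROR ('Could not create the HTTP object', 16, 1); RETURN; END;
  EXEC sp_OAMethod @Object, 'open', NULL, 'GET', @TheURL, 'false';
  EXEC sp_OAMethod @Object, 'send';
  -- INSERT...EXEC dodges the 4000-character limit of an OUT parameter
  INSERT INTO @Response (ResponseText)
    EXEC sp_OAGetProperty @Object, 'responseText';
  EXEC sp_OADestroy @Object;
  SELECT @TheJSON = ResponseText FROM @Response;
END;
GO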

Using this is simple. You can try it on any of the JSONTest samples:
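For example, the date sample at jsontest.com (assuming the sketch procedure above):

DECLARE @TheJSON NVARCHAR(MAX);
EXEC dbo.GetJSONFromUrl
  @TheURL = 'http://date.jsontest.com/', @TheJSON = @TheJSON OUTPUT;
SELECT ThePath, ValueType, TheValue
  FROM dbo.JSONPathsAndValues(@TheJSON);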

Which will give you the paths, types and values of whatever the sample service returns.

Now that we have the means to call a real web service, we can try it out. To try out the next example, you need to register with GeoNames.org, but they are good people running a free and useful service. (If you decide to use it in production, make sure you buy support.)

Imagine that you need to find the exact geographical coordinates for any postal code in the world. You can now do this, or use a whole range of other geographical services. For this example, though, we’ll just list the capitals of all the countries in a defined area, together with their populations and longitude/latitude coordinates.

This web service is liable to send you messages instead of your data, so be sure to check for these and log them.

In the following batch, we are merely showing this sort of message as an error.
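Something like this sketch (the citiesJSON endpoint and the shape of its status messages are as documented by GeoNames at the time of writing, but check against the current API; substitute your own registered user name for ‘demo’):

DECLARE @TheJSON NVARCHAR(MAX);
EXEC dbo.GetJSONFromUrl
  @TheURL = 'http://api.geonames.org/citiesJSON?north=57&south=48&east=15&west=-11&maxRows=40&username=demo',
  @TheJSON = @TheJSON OUTPUT;

IF JSON_VALUE(@TheJSON, '$.status.message') IS NOT NULL
  BEGIN -- the service sent a message rather than data: report it and bail out
    DECLARE @Message NVARCHAR(200) = JSON_VALUE(@TheJSON, '$.status.message');
    RAISERROR ('GeoNames sent a message, not data: "%s"', 16, 1, @Message);
  END
ELSE
  SELECT [name] AS Capital, population, lat, lng
    FROM OPENJSON(@TheJSON, '$.geonames')
           WITH ([name] NVARCHAR(100), population INT,
                 lat FLOAT, lng FLOAT, fcode VARCHAR(10))
    WHERE fcode = 'PPLC'; -- PPLC means 'capital of a political entity'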

In this case, I should have used my own account rather than demo mode. Although the ‘demo’ name in the URL will work a few times per hour, you will need to change this for your own registered name if you’re getting stuck in.

If all is well, this will give you a list of the capital cities in the area, with their populations and coordinates.

Of course, there is a lot more you can do. You can, for example, check that the data fits the datatypes in the table, and do whatever other constraint checks you need.

You can, of course, dispense with the gymnastics of that last bit of OPENJSON by using what is in the output of dbo.JSONPathsAndValues:
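A sketch of that alternative, pivoting the path/value rows back into columns (values arrive as strings, so convert them as needed; @TheJSON is the same GeoNames response as above):

SELECT MAX(CASE WHEN ThePath LIKE '%.name' THEN TheValue END) AS Capital,
       MAX(CASE WHEN ThePath LIKE '%.population' THEN TheValue END) AS Population,
       MAX(CASE WHEN ThePath LIKE '%.lat' THEN TheValue END) AS Lat,
       MAX(CASE WHEN ThePath LIKE '%.lng' THEN TheValue END) AS Lng
  FROM dbo.JSONPathsAndValues(@TheJSON)
  WHERE ThePath LIKE '$.geonames[%'
  GROUP BY LEFT(ThePath, CHARINDEX(']', ThePath)) -- one group per array element
  HAVING MAX(CASE WHEN ThePath LIKE '%.fcode' THEN TheValue END) = 'PPLC';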

This gives the same result, but saves some parsing.

Summary

I’ve set out to show how you can run all manner of checks before importing some JSON into SQL Server, such as ensuring that the JSON metadata is what you expect, finding out the full paths of the information you want, or checking that the values are valid for your datatypes and within range. It is easy to work out what is in a JSON document without having to inspect it. By showing you a simple feed from a web service, I hope I’ve shown how you can make it as robust as you require. There is a lot more you can do, of course, but I hope I’ve shown you enough to enable you to feel confident about accepting data in JSON format.