Transferring Data with JSON in SQL Server

Comments 3

Share to social media

JSON has two distinct uses, to transmit data and to store it. They are best considered separately. When you use JSON to store data, you are generally forced down the route of using a special-purpose database, though SQL Server is happy to accommodate JSON as an NVARCHAR(MAX). For transmitting and transferring data, JSON should be part of your toolkit because it is so widely used and supported. SQL server has at last provided enough functionality to allow us to use JSON to transmit data within applications and between applications and receive data from web services and network sources. There is a rich infrastructure of both standards, resources, and components that make this easy.

Why do we stick to ODBC and TDS to send and receive tables? Well, it is for three reasons; the conventions within ODBC quietly send the metadata, or ‘data about the data,’ along with the data itself. Secondly, TDS (Tabular Data Stream) is a very efficient application-layer protocol; finally, it is bullet-proof in its reliability. Now that JSON is acquiring a standard for JSON Schema which is being adopted widely, is it time to reconsider? With network speeds a thousand times faster than when SQL Server was first created, are we still bound to the traditional ways of transferring data? Should we consider other ways?

I have long had a dream of transferring data in such a way that the recipient has enough information to create a table automatically to receive the data. In other words, it transfers the metadata as well as the data. It makes it possible to transfer tables between different systems. Sure, ODBC actually does this, but it is not a file-based transfer.

I will be showing how to save SQL Server tables as JSON files that include the metadata and import them into SQL Server. We’ll test it out by attempting this on AdventureWorks.

If you are interested in checking out JSON to see if it works for you, I’ll be showing you how you can transfer both the data and the metadata or schema, using JSON with JSON Schema, with standard conventions, and avoid the quirks in the SQL Server implementation.

Why Bother with JSON Schema?

JSON Schema allows you to validate the contents of a JSON file and to specify constraints. It also provides a standard way of providing metadata for other purposes beyond JSON-based storage. MongoDB, for example, can use JSON Schema to validate a collection. NewtonSoft provides an excellent version of JSON that has full JSON Schema support and an online JSON Validator. You can even generate JSON Schemas from NET types. For me, the attraction is that I can avoid having to invent a way of transferring the metadata of file-based JSON Data. Data is always better with its metadata.

Why Use JSON at All When We Have Good Old CSV?

JSON wasn’t originally intended for relational data which is constrained by a schema or where the order of data within a row is invariant, as it is in CSV. However, raw JSON is good-natured enough not to object if you want to use it that way. No matter how you choose to use it, you’ll notice something useful if you are accustomed to CSV: JSON supports null values as well as the bit values true and false. At last, you can distinguish between a blank string and a null one.

If one can send the metadata in a way that is useful enough to allow creating a table successfully with it, then it is worth the effort. However, there are other benefits, because nowadays the front-end developers understand, and are happy with, JSON. They are content just to send and receive data in this format. This suits me, because they lose interest in accessing the base tables.

The CSV standard allows the first ‘header’ line to display the name of the column rather than the first line of data. The header line is in the same format as normal record lines. This goes some small way towards portability, but with JSON, we have the opportunity to associate far more of the metadata or a schema with the data. CSV can only be used effectively if both ends of the conversation know what every column means based on the name of the file and the header row, but with JSON, we don’t have to be so restrictive, because we can send a schema with the data.

I’ll explain more about this later, but first, we need to be confident that we are able to use JSON to reliably copy a table from one server to another. We will, for this article, stick to the most common way of rendering tabular data, as an array of objects. It is probably the simplest because it represents the way used by FOR JSON queries.

Transferring Data as an Array of Objects.

We need to consider both the Data and the Schema. For this article, we’ll just stick to tables, rather than the broader topic of results of queries of maybe several tables. We need to provide a way of reading and writing these files.

There are some interesting things we have to take into account in reading data into a table.

  • OPENJson function can’t have certain deprecated datatypes in its ‘explicit schema’ syntax. We have to specify them as NVARCHAR(x)
  • FOR JSON cannot use some of Microsoft’s own CLR datatypes.
  • We have to cope properly with identity fields.
  • If importing data into a group of related tables, we need to temporarily disable all constraints before we start and turn them back on when we finish, and we need to wrap the operation in a transaction.
  • JSON files are, by convention, written in UTF-8 file format.

This makes the process a bit more complicated than you’d imagine and makes the requirement for a schema more important. We’ll look at what is required in small bites rather than the grand overview.

Writing the File Out

Here is what we’d like to achieve first: it is a JSON rendering of the AdventureWorks PhoneNumberType table, stored in a file. It contains both the data and the schema. You can keep data and schema separate if you prefer or have the two together in the one file. The schema has extra fields beyond the reserved JSON Schema fields just to make life convenient for us. It tells us where the table came from and what the columns consisted of. It also tells us if the columns are nullable. As we can add fields, we can even transfer column-level or table-level check constraints if we want to. The schema should have a unique identifier, the $id. Eventually, this should resolve to a reference to a file, but this is not currently a requirement. It is intended to enable the reuse of JSON schemas. It should also have a $schema keyword to identify the JSON object as a JSON schema. However, the schema can be blank, which means that all the JSON data with correct syntax that is validated against it passes validation. Here is an example:

The schema object contains a valid JSON Schema for our result, tested to the json-schema.org standard (4 upwards). It enforces the object-within-array structure of the JSON and the order and general data type of each column. You’ll see that I’ve added some fields that we need for SQL Server to SQL Server transfer.

It is worth trying this out with the NewtonSoft validator seen above, manipulating the data and seeing when it notices! I haven’t added a regex to check the date format because our JSON is generated automatically, but there are a lot of useful checks you can add. This can be done in PowerShell and is a useful routine way of checking JSON data before you import it.

This schema is for the JSON version of the data as an array of objects. Fortunately, it is easy to generate the data because this is the natural way of FOR JSON AUTO.

…which gives the result …

The only complication is the fact that we need to include null values. Otherwise, the path specifications we need to use in OpenJSON to create a SQL result don’t always work, and the error it gives doesn’t tell you why. It is a good idea to provide a list of columns rather than a Select *, because we can then coerce tricky data types or do data conversions explicitly. If everything we have to do is as easy as that, it will be a short article.

We soon find out that it isn’t that easy.

Gives the error

Alas, the JSON implementation has not yet come to grips with the geography datatype. Even to reliably export a table, we must spell out the columns if the table has CLR user types other than hierarchyid. This works.

Although it provides the JSON, you need to carry the information about the SQL Server data type that the spatialLocation represents. As you can appreciate, to make it easy to consume a JSON document in SQL Server, you also need a schema.

Now we need to create the JSON.Schema of that JSON we just produced. There is a draft standard for this that is understood by the best .NET JSON package (NewtonSoft) as well as MongoDB 3.6 onwards. At the moment, it can do basic constraints and datatype checks, but it can also allow us to create a SQL table if we add sufficient information. Our extra information doesn’t interfere with JSON schema validation as long as we don’t use reserved words.

SQL Server doesn’t support any type of JSON schema information other than the one used in OpenJSON WITH. It is optimistically called ‘explicit schema.’ Although this is very useful, it can’t be transferred with the data or referenced from a URL. Sadly, it cannot be passed to the OpenJSON function as a string or as JSON. Ideally, one would want to use JSON Schema, which is now also used increasingly in MongoDB.

The application that is using the database will probably know about JSON Schema because the NewtonSoft JSON library is so popular and can pass it to you. Unfortunately, the datatypes in SQL Server are a rich superset of the basic JSON number, string, and Boolean. You won’t even get a datetime. JSON Schema, however, allows you to extend the basics with other fields, so we’ll do just that.

If you aren’t interested in JSON Schemas, you can still use them, just using { } or true as your schema, which means ‘anything goes’ or ‘live free and die, earlier than you think.’

We need to enforce nullability from the JSON perspective. This is done by defining the number type as an array. A nullable number allows both numbers and nulls, whereas a nullable string allows both nulls and strings. Because the sensible database developer adds descriptions to columns via the ms_description extended property, we can include that. We have added fields that aren’t part of the standard just to help us: SQLtype, column_ordinal, is_nullable, for example. This is allowed by the standard.

We use the handy sys.dm_exec_describe_first_result_set table function that gives us even more information than we want. I’m writing this for SQL Server 2017. SQL Server 2016 is usable, but the STRING_AGG() function is much neater for demonstration purposes than the XML concatenation trick.

I will demonstrate, in this article, how to create a batch to save all the JSON Schemas for all the tables in the current database. But first, you’ll need two procedures. They are temporary so there is no mopping up to be done afterward, but you can easily make them permanent in a ‘utility’ database if they are useful to you later. The schemas are saved into a directory within C:\data\RawData\ on your server. (Obviously, you alter this to suit) The files will be stored in a subdirectory based on the name of the database, followed by Schema. Be sure to create the directory before running the code.

To create a valid JSON.Schema file in that #CreateJSONSchemaFromTable stored procedure, I’ve had to create it as a string because SQL Server’s FOR JSON couldn’t do all the work. That is the joy of JSON: it is so simple that you have the option of creating it as a string if the automatic version doesn’t quite do what you want. It is doing a few quite tricky things such as the array of JSON types that are necessary to check for nullability in JSON. You can verify it, too, with ISJSON() to be doubly certain it’s proper JSON, as I’ve done.

Once these procedures are in place, you can work considerable magic. To start fairly simply, here is how you can create the JSON for a specific table, both schema and data, and save it to disk on the server

You can, of course, save the JSON Schema file of all the tables in your database:

You can run the script to create the schema for all the tables of any database just by running it within the context of that database or add the USE directive.

Just to show that we have the power, here is how you can save all the JSON files, containing both the schema and data for all your tables all in one go. We still have some way to go because the CLR datatypes aren’t yet handled. We’ll deal with that soon.

If you attempted this on AdventureWorks, you’ll appreciate that it will be a difficult article after all. This is because we have still to tackle the task of getting the JSON data when we have difficult datatypes such as CLR. Here is the procedure that will do that:

You’d use it like this to get the data (make sure you create the directory first!):

To create files with the data and schema together, you’d do this

We can read out AdventureWorks into files, one per table, in around forty seconds. Sure, it takes a bit longer than BCP, but the data is easier to edit!

Reading the File Back in

Now on another computer, we can pick the file up and easily shred it back into a table. We might want to complete one of two common tasks. If we are creating a table from the imported data, then we have to get the data from the schema and SELECT INTO the table. If we want to stock an existing table, then we want to INSERT INTO. In the second case, we can get the metadata from the table instead of the JSON file in order to do this.

Reading the JSON Data into an NVARCHAR(MAX) Variable

This should work to read in the data from a file that is local to the server, but it doesn’t seem to import utf-8 data properly. However, for this test harness, it is OK.

Shredding JSON into a Table Source Manually

Now, assuming we have a JSON file that has both the schema and the data, we can do this to get a result when either doing a SELECT INTO or an INSERT INTO.

If you have made it to this point without reading the first part of the article, you might have thought that I’d typed all that SQL code out. No. I could have done it, but it’s a bit boring to do so, and I make mistakes.

You will notice that I had to comment out the organisationNode field. This is because CLR datatypes aren’t yet supported with the Explicit Schema (WITH (…)) syntax.

You need to do this instead

You’ll also find that you can’t use legacy types such a text, image and image either. In this case, you need to specify the equivalent MAX type instead.

Automating the Shredding of the JSON with the Help of JSON Schema

This means. that to import JSON reliably into a lot of tables, you will need to resort to assembling and executing code as a string using sp_executesql. This allows you to generate both the rather tedious column list and that rather daunting list of column specifications called the OpenJSON explicit schema automatically from the JSON Schema, or even an existing table if you wish. This now means that we can take any data from a file and read it straight into a suitable table. 

Of course, you will probably want to make sure that the JSON is valid before importing into a database. I have written a post that describes how to automate this for a large number of files.

We’ll start with a little test harness with a sample JSON schema and show how you can generate these two strings, the column specification, and the explicit schema. We need to do it in a way that will get around the restrictions of the OpenJSON implementation.

In this case, what we get is this ‘explicit schema’ to place in OpenJSON’s WITH clause,

…and a list of parameters for the SQL Expression …

Now we just need to put this all together in a way that is convenient! Note that you’ll need to add scripting logic to ensure that the schema is created first in the destination database.

Inserting Data into a New Table

You can read in a file with the schema and data, importing into a new database, in this case called Demos:

Inserting into an Existing Table

In order to insert into an existing table, we need to use the principles we’ve shown. We can adopt several approaches to this. I prefer to use PowerShell with SMO, iterating through the tables, but the whole process can be done in SQL if need be. Whichever approach you take, the essential procedure is this which takes a file with just the data and inserts it into an existing table:

We didn’t have to use the schema because we can get everything we need from the table. Note that if you opt to have a single JSON document that holds both the data and the schema, you need to alter the OpenJson(@jsonData) to OpenJson(@jsonData, 'strict $.data')in the procedure.

Conclusion

If you want to copy data between SQL Server databases, nothing matches the performance of native mode BCP. If, on the other hand, you are sharing data with a range of heterogeneous data stores, services or applications, then you need to use JSON, I reckon. Nowadays, it is best to pass the schema with the data and the rise of the JSON Schema standard means that now we can do it reliably with checks for data integrity. Sure, we can stick to CSV, and it is great for legacy systems if you can find a reliable way of doing it for SQL Server, but CSV has no standard schema yet, and the application developers understand JSON better.

I get a certain pleasure in being able to write data straight into a database without the preliminaries of puzzling out the schema that would be required from a CSV file. It is so fast that I weep for all those lost hours doing it the hard way.

I also look forward to being able to check and validate data before I allow it anywhere near the database. Somehow JSON schema has opened up new possibilities.

SourceCode

The source to this article and various blogs on the topic of importing, validating and exporting both JSON Schema and data in SQL Server  is on github here

More articles by Phil about JSON Schema and JSON