How to validate JSON Data before you import it into a database.

Comments 0

Share to social media

If you are, as you should be, checking JSON data in a whole lot of files before you import them into your database, you would do well to use JSON Schema, because you can run a number of checks such as regex checks that can’t be done any other way, and it is usually possible to detect bad data

Obviously, the most immediate value that the relational database person gets from doing this chore is to check that all the required (not null) columns are there for every row, and that they have the right sort of data. This can be done in very general terms if you use the ‘type’ field, but far more precisely if you are handy with PCRE regex.

In a nutshell, JSON Schema defines the way that you’ve structured your JSON document, the data types and constraints upon that data. It is extensible in that you can add your own special-purpose fields. It is ideal for tabular data because it can define how you represent columns, and what their names are. It can also enforce the structure so that tables and grids are safe within a JSON document.

If you have a very small amount of data, you can do the validation in an online JSON Validator such as the NewtonSoft JSON Schema Validator at https://www.jsonschemavalidator.net/.

This is currently the best way of building up a JSON Schema, and trying out the features of the validator, because you can build and test the rules you create against a subset of the table data.

Validation runs are a different problem Obviously, doing this in an online app will soon become tiresome, even with the smallest database. Imagine having to do this with a regular data feed or REST webservice! It is, however, possible to automate this.

There are many different ways of automating the validation process, using a range of platforms and frameworks. As I’m mostly using PowerShell for database scripting, I do it that way. I use NewtonSoft’s JSON.NET validator.

Imagine that you need to validate the aging classic database ‘Pubs’.

You have a data directory

You have a directory with the JSON data, one file per table, named after the table.

You also have a sibling directory with the schemas. (I’ve shown elsewhere how to generate the basic schemas from a SQL Server database, ready for your refinements.

I’m not suggesting that this is ideal, merely the way that I’ve set up the demo.

Now we can validate the two.

This is done here as a separate script, but it is likely to be part of an ingestion process. Your schemas will probably stay there, saved in source control for each data feed.

This isn’t the only way of doing this. There are plenty of validators, but the JSON.NET validator is a good place to start if you are already heavily into PowerShell. For more details of JSON Schema, see http://json-schema.org/

Load comments

About the author

Phil Factor

See Profile

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 40 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career. See also :

Phil Factor's contributions