{"id":81752,"date":"2018-11-16T10:34:01","date_gmt":"2018-11-16T10:34:01","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=81752"},"modified":"2019-08-15T13:45:04","modified_gmt":"2019-08-15T13:45:04","slug":"how-to-validate-json-data-before-you-import-it-into-a-database","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/how-to-validate-json-data-before-you-import-it-into-a-database\/","title":{"rendered":"How to validate JSON Data before you import it into a database."},"content":{"rendered":"<p>If you are, as you should be, checking JSON data in a whole lot of files before you import them into your database, you would do well to use JSON Schema, because you can run <a href=\"https:\/\/json-schema.org\/latest\/json-schema-validation.html\">a number of checks<\/a> such as regex checks that can\u2019t be done any other way, and it is usually possible to detect bad data<\/p>\n<p>Obviously, the most immediate value that the relational database person gets from doing this chore is to check that all the required (not null) columns are there for every row, and that they have the right sort of data. This can be done in very general terms if you use the \u2018type\u2019 field, but far more precisely if you are handy with PCRE regex.<\/p>\n<p>In a nutshell, JSON Schema defines the way that you\u2019ve structured your JSON document, the data types and constraints upon that data. It is extensible in that you can add your own special-purpose fields. It is ideal for tabular data because it can define how you represent columns, and what their names are. It can also enforce the structure so that tables and grids are safe within a JSON document.<\/p>\n<p>If you have a very small amount of data, you can do the validation in an online JSON Validator such as the <strong>NewtonSoft JSON Schema Validator<\/strong> at <a href=\"https:\/\/www.jsonschemavalidator.net\/\">https:\/\/www.jsonschemavalidator.net\/<\/a>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1016\" height=\"747\" class=\"wp-image-81753\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-166.png\" \/><\/p>\n<p>This is currently the best way of building up a JSON Schema, and trying out the features of the validator, because you can build and test the rules you create against a subset of the table data.<\/p>\n<p>Validation runs are a different problem Obviously, doing this in an online app will soon become tiresome, even with the smallest database. Imagine having to do this with a regular data feed or REST webservice! It is, however, possible to automate this.<\/p>\n<p>There are many <a href=\"http:\/\/json-schema.org\/implementations.html#validators\">different ways<\/a> of automating the validation process, using a range of platforms and frameworks. As I\u2019m mostly using PowerShell for database scripting, I do it that way. I use NewtonSoft\u2019s JSON.NET validator.<\/p>\n<p>Imagine that you need to validate the aging classic database \u2018Pubs\u2019.<\/p>\n<p>You have a data directory<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"398\" height=\"92\" class=\"wp-image-81754\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-167.png\" \/><\/p>\n<p>You have a directory with the JSON data, one file per table, named after the table.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"794\" height=\"323\" class=\"wp-image-81755\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-168.png\" \/><\/p>\n<p>You also have a sibling directory with the schemas. (I\u2019ve shown elsewhere how to generate the basic schemas from a SQL Server database, ready for your refinements.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"823\" height=\"399\" class=\"wp-image-81756\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-169.png\" \/><\/p>\n<p>I\u2019m not suggesting that this is ideal, merely the way that I\u2019ve set up the demo.<\/p>\n<p>Now we can validate the two.<\/p>\n<p>This is done here as a separate script, but it is likely to be part of an ingestion process. Your schemas will probably stay there, saved in source control for each data feed.<\/p>\n<pre class=\"theme:powershell-ise font:consolas font-size:14 line-height:16 lang:ps decode:true  \">$ErrorActionPreference = \"Stop\"\r\n# enter the base directory \r\n$Path = 'MyPathToTheData'\r\n# ...and the names of the subdirectories\r\n$SchemaDirectory = 'JSONSchema\\'\r\n$DataDirectory = 'JSONData\\'\r\n# all this following section thanks to James Newton-King\r\n$NewtonsoftJsonPath = Resolve-Path -Path \"lib\\Newtonsoft.Json.dll\"\r\n$NewtonsoftJsonSchemaPath = Resolve-Path -Path \"lib\\Newtonsoft.Json.Schema.dll\"\r\n\r\nAdd-Type -Path $NewtonsoftJsonPath\r\nAdd-Type -Path $NewtonsoftJsonSchemaPath\r\n\r\n\r\n# define the validator type\r\n$source = @'\r\n    public class Validator\r\n    {\r\n        public static System.Collections.Generic.IList&lt;string&gt; Validate(Newtonsoft.Json.Linq.JToken token, Newtonsoft.Json.Schema.JSchema schema)\r\n        {\r\n            System.Collections.Generic.IList&lt;string&gt; messages;\r\n            Newtonsoft.Json.Schema.SchemaExtensions.IsValid(token, schema, out messages);\r\n            return messages;\r\n        }\r\n    }\r\n'@\r\nAdd-Type -TypeDefinition $source -ReferencedAssemblies $NewtonsoftJsonPath, $NewtonsoftJsonSchemaPath\r\n# end of James Newton-King's code. Thanks, James.\r\n\r\nGet-ChildItem \"$($Path)\\$($DataDirectory)\" -Filter *.json | select Name | Foreach{\r\n# do every file in the directory\t\r\n\t$JSON = [IO.File]::ReadAllText(\"$($Path)\\$($DataDirectory)$($_.Name)\")\r\n\t$Schema = [IO.File]::ReadAllText(\"$($Path)\\$($SchemaDirectory)$($_.Name)\")\r\n# parse the JSON files documents into a tokenised form\r\n\t$Token = [Newtonsoft.Json.Linq.JToken]::Parse($JSON)\r\n\t$Schema = [Newtonsoft.Json.Schema.JSchema]::Parse($Schema)\r\n# do the validation, using the parsed documents\t\r\n\t$ErrorMessages = [Validator]::Validate($Token, $Schema)\r\n\tif ($ErrorMessages.Count -eq 0)\r\n\t{ write-host \"Schema is valid\" } #just for the test. I don't approve of write-host!\r\n\telse #I've selected just the first five because usually a whole column is wrong!\r\n\t{ $ErrorMessages | Select-Object -First 5 | foreach{ write-warning $_ } }\r\n\t\r\n<\/pre>\n<p>This isn\u2019t the only way of doing this. There are <a href=\"http:\/\/json-schema.org\/implementations.html#validators\">plenty of validators<\/a>, but the JSON.NET validator is a good place to start if you are already heavily into PowerShell. For more details of JSON Schema, see http:\/\/json-schema.org\/<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you are, as you should be, checking JSON data in a whole lot of files before you import them into your database, you would do well to use JSON Schema, because you can run a number of checks such as regex checks that can\u2019t be done any other way, and it is usually possible&#8230;&hellip;<\/p>\n","protected":false},"author":154613,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[5134],"coauthors":[6813],"class_list":["post-81752","post","type-post","status-publish","format-standard","hentry","category-blogs","tag-sql-prompt"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/81752","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/154613"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=81752"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/81752\/revisions"}],"predecessor-version":[{"id":81815,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/81752\/revisions\/81815"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=81752"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=81752"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=81752"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=81752"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}