{"id":81726,"date":"2018-11-14T17:14:10","date_gmt":"2018-11-14T17:14:10","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=81726"},"modified":"2018-11-14T17:45:40","modified_gmt":"2018-11-14T17:45:40","slug":"getting-excel-data-as-json-rainfall-and-sheep-counting","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/getting-excel-data-as-json-rainfall-and-sheep-counting\/","title":{"rendered":"Excel to SQL Server via JSON: PowerShell Conversion and OPENJSON Import Walkthrough"},"content":{"rendered":"<p>You might think that it is easy to get JSON data from a spreadsheet, and there are plenty of utilities around that are based on the idea that it is trivial. If such data was strictly tabular, then it might be.\u00a0 The problem is that any conversion tool makes assumptions about the way that data is structured, whether it is row- or column-oriented, whether it has headers, or row labels, and so on. The whole joy of excel is that it allows you a great deal of leeway in the way that you lay out your data. You might use headers, labels or ranges. There seems to be no end to the human ingenuity in finding creative ways of structuring data. The safest way I\u2019ve ever found to automate the harvesting of data from a spreadsheet is to automate the process in PowerShell.<\/p>\n<p>Sometimes, the requirement is just too trivial for an automated task. Let\u2019s take a couple of simple tasks. Both these are based on what amounts to a form. As a developer, I occasionally have to deal with JSON-based files that amount to configuration information. For my first example, I will take a simple calendar form that allows someone to record a daily measurement such as the hours worked, the profit, the output, or whatever. We\u2019ll do it for rainfall in inches.<\/p>\n<p>A data person will groan slightly. This is a pivot that is friendlier to humans than machines. It has added totals. It has places where there is no data. What we need is a json file, object-within-array, that deals with all this. You won\u2019t get very far just turning the format to JSON. Here is a sample spreadsheet. I&#8217;ll add a version to this article.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"529\" height=\"764\" class=\"wp-image-81727\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-158.png\" \/><\/p>\n<p>I\u2019ve taken an existing format and added very few tweaks. The year in the top title is actually an integer value that is in a separate call to the merged-right-aligned cell containing the rest of the title. This makes it easy for us to detect what year the data applies to.<\/p>\n<p>We then add a cell underneath the form. This is going to grow to the same number of cells in the same shape as the cells with the data in it.<\/p>\n<p>In the cell formula, we add an expression<\/p>\n<pre class=\"lang:vb decode:true\">=IF(LEN(TRIM(E3))&gt;0,CONCAT(\"{\"\"date\"\":\"\"\",TEXT(DATE($J$1,COLUMN()-1,$A3),\"yyyy-mm-dd\"),\"\"\",\"\"rainfall\"\":\",E3,\"},\"),\"\")<\/pre>\n<p>What this means is \u2018if the length of the trimmed string in the rainfall cell is greater than zero, then create a string that consists of the date label followed by a string value in ISO format that consists of the year column we added at the top, the column number -1 to tell us the month number, and the day taken from the column at the beginning of the row. Then we add the \u2018rainfall\u2019 label and the value in the cell. If no data in the call then we don\u2019t record anything. \u2019<\/p>\n<p>For this first cell we get the string \u2018{&#8220;date&#8221;:&#8221;2016-01-01&#8243;,&#8221;rainfall&#8221;:0.1},\u2019 displayed. It looks suspiciously like a fragment of JSON. Those liberally-sprinkled $ signs are absolute references to either columns and rows. The cell reference with both column and rows specified as absolute references is to get the year.<\/p>\n<p>Now we use the \u2018fill down\u2019 and \u2018fill right\u2019 features to copy this expression to a block of cells of the same size and shape as the data(twelve across by thirty-one down), below this new cell. We now get a sort of doppelganger matrix with the fragment of JSON rather than the data.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"719\" height=\"641\" class=\"wp-image-81728\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-159.png\" \/><\/p>\n<p>You can\u2019t see the full JSON equivalent of each cell as the display is truncated. You can see that it is a sparse matrix because if there is no rainfall then there is nothing to record (this is a philosophical argument I\u2019ll grant you)<\/p>\n<p>We have one last duty to perform. We need to concatenate all these together and lop off the final comma before wrapping it up into the array square-brackets.<\/p>\n<p>In the cell below the block we\u2019ve just created, we add this expression as a formula<\/p>\n<pre class=\"lang:vb decode:true\">=CONCAT(\"[\",LEFT(CONCAT(B36:M66),LEN(CONCAT(B36:M66))-1),\"]\")<\/pre>\n<p>This cell will contain the legal JSON array of objects.<\/p>\n<p>We then hide the rows containing the calculation calls, leaving the form and the final call containing the final JSON.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"658\" height=\"509\" class=\"wp-image-81729\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-160.png\" \/><\/p>\n<p>So we do that, and save it to file or whatever we need to do. We paste it into SSMS and check that it has faithfully recorded the data. With a relief, we find that our totals have tallied and we can go on to add the data to our database.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"534\" height=\"949\" class=\"wp-image-81730\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-161.png\" \/><\/p>\n<p>Now that we have a template that allows easy entry and checking of the data, we also have no fear of getting the data into a database.<\/p>\n<p>What if the data is a wee bit more complex?<\/p>\n<p>A while back, I was writing a blog here and needed a sample of JSON that wasn\u2019t distracting and had no compromising information. I knocked together a database in Excel, using tables from the Wikipedia. I used Excel. I quite often gather information like this because of Excel\u2019s easy way with HTML tables.<\/p>\n<p>I have to confess that I got rather interested in the information I gathered, which was the dialect words for counting in twenties, which was once used throughout Britain for simple counting tasks. It is interesting because it is a survival of the old Brythonic language used before Roman imperialism. There was a craze amongst folklorists in the 1850s for collecting instances of it being used. It was lucky for modern linguists that they did as it was, even then, dying out.<\/p>\n<p>Please imagine that it is your valuable business data and that what we have is the output of all our manufacturing plants month by month: something like that.<\/p>\n<p>Here is a section of the spreadsheet<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"844\" height=\"528\" class=\"wp-image-81731\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-162.png\" \/><\/p>\n<p>I added the cells in a similar way to that I\u2019ve just described for the yearly calendar data. As I kept finding more and more strings, going back into books published in the 1860s, I had to keep adding columns for each new region was found. It had to be extensible.<\/p>\n<p>This is an array of \u2018region\u2019 objects, each of which has an array of word objects.To illustrate the fact that this data was rather more complex, I\u2019ve just reduced it to two numbers in the\u00a0sequence\u00a0array rather than the original twenty. Each JSON document in our collection has an embedded array.<\/p>\n<pre class=\"lang:js decode:true \">\u00a0\u00a0[{\r\n\u00a0\u00a0\u00a0\u00a0 \"region\": \"Wilts\",\r\n\u00a0\u00a0\u00a0\u00a0 \"sequence\": [{\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\"number\": 1,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\"word\": \"Ain\"\r\n\u00a0\u00a0\u00a0\u00a0 }, {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\"number\": 2,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\"word\": \"Tain\"\r\n\u00a0\u00a0\u00a0\u00a0 }]\r\n\u00a0\u00a0}, {\r\n\u00a0\u00a0\u00a0\u00a0 \"region\": \"Scots\",\r\n\u00a0\u00a0\u00a0\u00a0 \"sequence\": [{\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\"number\": 1,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\"word\": \"Yan\"\r\n\u00a0\u00a0\u00a0\u00a0 }, {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\"number\": 2,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\"word\": \"Tyan\"\r\n\u00a0\u00a0\u00a0\u00a0 }]\r\n\u00a0\u00a0}]<\/pre>\n<p>\u00a0Excel can cope perfectly easily with this.<\/p>\n<p>Below the form in which we entered our sheep-counting words for each region of the British isles, we create a block of calls of the same size and shape. Into the first cell we add a formula:<\/p>\n<pre class=\"lang:vb decode:true \">=IF(LEN(TRIM(B2))&gt;0,CONCAT(\"{\"\"number\"\":\",$A2,\",\"\"word\"\":\"\"\",B2,\"\"\"},\"),\"\")<\/pre>\n<p>This means \u2018 if the value exists, join together the string \u2018number\u2019 in parentheses, followed by the data in the corresponding first row that tells us what number we are counting, and then provide the value within the column for this particular region. Add a comma. Otherwise leave a blank string\u2019.<\/p>\n<p>What we get is<\/p>\n<pre class=\"lang:js decode:true\">{\"number\":1,\"word\":\"Ain\"},<\/pre>\n<p>We then copy this down and across by filling down and then right. It will be twenty rows and the same number of columns as there are regions.<\/p>\n<p>Now we can assemble the region objects in the row below. The first call will have this expression<\/p>\n<pre class=\"lang:vb decode:true \">=CONCAT(\"{\"\"region\"\":\"\"\",B1,\"\"\",\"\"sequence\"\":[\",LEFT(CONCAT(B22:B41),LEN(CONCAT(B22:B41))-1),\"]},\")<\/pre>\n<p>This means \u2018join together or concatenate a string consisting of the key \u201cregion\u201d: followed by name of the region (B1). Then we have the key \u201csequence\u201d followed by an array open-square-brackets. Then we join together all twenty of the values of the cells above into an array.<\/p>\n<p>This will, in the first column, give us the value \u2026..<\/p>\n<pre class=\"lang:js decode:true\">{\r\n  \"region\": \"Wilts\",\r\n  \"sequence\": [{\r\n    \"number\": 1,\r\n    \"word\": \"Ain\"\r\n  }, {\r\n    \"number\": 2,\r\n    \"word\": \"Tain\"\r\n  }, {\r\n    \"number\": 3,\r\n    \"word\": \"Tethera\"\r\n  }, {\r\n    \"number\": 4,\r\n    \"word\": \"Methera\"\r\n  }, {\r\n    \"number\": 5,\r\n    \"word\": \"Mimp\"\r\n  }, {\r\n    \"number\": 6,\r\n    \"word\": \"Ayta\"\r\n  }, {\r\n    \"number\": 7,\r\n    \"word\": \"Slayta\"\r\n  }, {\r\n    \"number\": 8,\r\n    \"word\": \"Laura\"\r\n  }, {\r\n    \"number\": 9,\r\n    \"word\": \"Dora\"\r\n  }, {\r\n    \"number\": 10,\r\n    \"word\": \"Dik\"\r\n  }, {\r\n    \"number\": 11,\r\n    \"word\": \"Ain-a-dik\"\r\n  }, {\r\n    \"number\": 12,\r\n    \"word\": \"Tain-a-dik\"\r\n  }, {\r\n    \"number\": 13,\r\n    \"word\": \"Tethera-a-dik\"\r\n  }, {\r\n    \"number\": 14,\r\n    \"word\": \"Methera-a-dik\"\r\n  }, {\r\n    \"number\": 15,\r\n    \"word\": \"Mit\"\r\n  }, {\r\n    \"number\": 16,\r\n    \"word\": \"Ain-a-mit\"\r\n  }, {\r\n    \"number\": 17,\r\n    \"word\": \"Tain-a-mit\"\r\n  }, {\r\n    \"number\": 18,\r\n    \"word\": \"Tethera-mit\"\r\n  }, {\r\n    \"number\": 19,\r\n    \"word\": \"Gethera-mit\"\r\n  }, {\r\n    \"number\": 20,\r\n    \"word\": \"Ghet\"\r\n  }]\r\n}<\/pre>\n<p>This needs to be provided to every column with a regional variation in it by filling right to the row across all the columns representing a region.<\/p>\n<p>Now we need to join all the regions together to give us the document, and to give it the array enclosing-brackets and nick out the final comma<\/p>\n<p>This is done by a final expression in the first data column<\/p>\n<pre class=\"lang:vb decode:true \">=CONCAT(\"[\",LEFT(CONCAT(B42:AU42),LEN(CONCAT(B42:AU42))-1),\"]\")<\/pre>\n<p>Here is the spreadsheet with the rows un-hidden<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"437\" height=\"576\" class=\"wp-image-81732\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-163.png\" \/><\/p>\n<p>And the final version<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"785\" height=\"681\" class=\"wp-image-81733\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-164.png\" \/><\/p>\n<p>And we check out the JSON in SSMS<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"556\" height=\"728\" class=\"wp-image-81734\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/word-image-165.png\" \/><\/p>\n<p>We can, (and have done), added many regional variations, merely by selecting a column, copying it, and inserting the copied column. This copies all the relevant formulas. I thin merely changed the data in the copied data-cells.<\/p>\n<h1>Conclusion<\/h1>\n<p>This technique works and saves a lot of time in transferring data. It saves time in gathering data due to excel\u2019s wonderful promiscuity on collecting data. It saves time in importing the data into SQL Server because one can be so flexible in the way we arrange the JSON schema. We can even cope with data that goes across multiple worksheets. You may need to \u2018escape\u2019 the data if it contains banned characters such as the double-quote marks used as string delimiters or the control characters.<\/p>\n<p>I\u2019ve found that this technique works best where excel data is structured into forms in a reasonably disciplined ways, so it is worth taking time to create handy templates. I\u2019ve noticed that a lot of business people relate very quickly to weekly or monthly charts of data and will be grateful for an accommodation towards the way that they naturally think about data.<\/p>\n<p>The collection of the data can be done by ODBC or by automation of Excel rather than cut\u2019 n paste. <a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/database-administration\/getting-data-between-excel-and-sql-server-using-odbc\/\">See Getting Data between Excel and SQL Server using ODBC<\/a> for the details.<\/p>\n<h2>Sample Spreadsheets<\/h2>\n<ul>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/YanTanTethera-1.xlsx\">YanTanTethera<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/11\/SampleRainfall-1.xlsx\">SampleRainfall<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Import Excel spreadsheet data into SQL Server without ETL tooling &#8211; convert Excel to JSON using PowerShell (ImportExcel module), then load the JSON into SQL Server using OPENJSON. Covers column-to-property mapping, handling irregular sheets, multi-sheet workbooks, and dealing with merged cells and headers. Worked examples with weather and agricultural datasets.&hellip;<\/p>\n","protected":false},"author":154613,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[4643,4880],"coauthors":[6813],"class_list":["post-81726","post","type-post","status-publish","format-standard","hentry","category-blogs","tag-excel","tag-json"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/81726","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/154613"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=81726"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/81726\/revisions"}],"predecessor-version":[{"id":81744,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/81726\/revisions\/81744"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=81726"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=81726"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=81726"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=81726"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}