{"id":101721,"date":"2024-03-27T17:00:32","date_gmt":"2024-03-27T17:00:32","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=101721"},"modified":"2024-09-03T20:04:46","modified_gmt":"2024-09-03T20:04:46","slug":"creating-a-single-json-file-for-configuration-using-notebooks","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/creating-a-single-json-file-for-configuration-using-notebooks\/","title":{"rendered":"Creating a Single JSON File for Configuration using Notebooks"},"content":{"rendered":"<p>On the blog <a href=\"https:\/\/www.red-gate.com\/simple-talk\/blogs\/fabric-notebooks-and-deployment-pipelines\/\" target=\"_self\" rel=\"noopener\">Fabric Notebook and Deployment Pipelines<\/a> I explained a technique to keep notebooks configuration values in <strong>JSON<\/strong> files on lakehouses, a good solution from many different points of views.<\/p>\n<p>What if we need to provide maintenance to the <strong>JSON<\/strong> configuration file using notebooks?<\/p>\n<p>The first problem is the fact the typical statement to save a dataframe as <strong>JSON<\/strong> will always create a folder containing the <strong>JSON<\/strong> files. The folder behaves like a table and all the files contain the records.<\/p>\n<p>This statement will create a folder containing <strong>JSON<\/strong> files:<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff;overflow: auto;width: auto;border: solid gray;border-width: .1em .1em .1em .8em;padding: .2em .6em\">\n<pre style=\"margin: 0;line-height: 125%\" class=\"crayon:false\">data_frame<span style=\"color: #333333\">.<\/span>write<span style=\"color: #333333\">.<\/span>mode(<span style=\"background-color: #fff0f0\">'overwrite'<\/span>)<span style=\"color: #333333\">.<\/span>json(<span style=\"background-color: #fff0f0\">'location'<\/span>)\r\n<\/pre>\n<\/div>\n<p>&#8216;location&#8217; will always be a folder, a single file name will never be accepted.<\/p>\n<p>The solution is a workaround: We can use <strong>mssparkutils<\/strong> to execute file system statements, copying the <strong>JSON<\/strong> file as a single file and dropping the folder and its content.<\/p>\n<p>These are the steps we need to execute:<\/p>\n<ol>\n<li>Save the dataframe as a <strong>JSON<\/strong> in a temporary location<\/li>\n<li>The last file in the folder created is the one we want. We need to get it and copy to it&#8217;s final location and name<\/li>\n<li>We drop the entire folder and its content.<\/li>\n<\/ol>\n<p>The final result of these steps is the single <strong>JSON<\/strong> file on the location we want it. We can develop these steps as a function, as the example below:<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff;overflow: auto;width: auto;border: solid gray;border-width: .1em .1em .1em .8em;padding: .2em .6em\">\n<pre style=\"margin: 0;line-height: 125%\" class=\"crayon:false\"><span style=\"color: #008800;font-weight: bold\">def<\/span> <span style=\"color: #0066bb;font-weight: bold\">saveResult<\/span> (data_frame, temp_location, file_path):\r\n    data_frame<span style=\"color: #333333\">.<\/span>write<span style=\"color: #333333\">.<\/span>mode(<span style=\"background-color: #fff0f0\">'overwrite'<\/span>)<span style=\"color: #333333\">.<\/span>json(temp_location)\r\n    <span style=\"color: #007020\">file<\/span> <span style=\"color: #333333\">=<\/span> mssparkutils<span style=\"color: #333333\">.<\/span>fs<span style=\"color: #333333\">.<\/span>ls(temp_location)[<span style=\"color: #333333\">-<\/span><span style=\"color: #0000dd;font-weight: bold\">1<\/span>]<span style=\"color: #333333\">.<\/span>path\r\n    mssparkutils<span style=\"color: #333333\">.<\/span>fs<span style=\"color: #333333\">.<\/span>cp(<span style=\"color: #007020\">file<\/span>, file_path)\r\n    mssparkutils<span style=\"color: #333333\">.<\/span>fs<span style=\"color: #333333\">.<\/span>rm(temp_location, recurse<span style=\"color: #333333\">=<\/span><span style=\"color: #007020\">True<\/span>)\r\n<\/pre>\n<\/div>\n<p>The parameters of the function are:<\/p>\n<ul>\n<li><strong>data_frame:<\/strong> As the name explains<\/li>\n<li><strong>temp_location<\/strong>: Temporary location where we will save the <strong>JSON<\/strong> folder<\/li>\n<li><strong>file_path<\/strong>: Final location of the single <strong>JSON<\/strong> file we want.<\/li>\n<\/ul>\n<p>The statements used on the <strong>mssparkutils<\/strong> are the following:<\/p>\n<ul>\n<li><strong>ls<\/strong>: List the files in the location<\/li>\n<li><strong>cp<\/strong>: Copy a file to a new location<\/li>\n<li><strong>rm<\/strong>: Remove a complete folder, with the option to be recursive, removing the folder content<\/li>\n<\/ul>\n<p>Once the function is created, we just need to use it:<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff;overflow: auto;width: auto;border: solid gray;border-width: .1em .1em .1em .8em;padding: .2em .6em\">\n<pre style=\"margin: 0;line-height: 125%\" class=\"crayon:false\"><span style=\"color: #888888\">#read the json file<\/span>\r\ndf <span style=\"color: #333333\">=<\/span> spark<span style=\"color: #333333\">.<\/span>read<span style=\"color: #333333\">.<\/span>json(<span style=\"background-color: #fff0f0\">'Files\/configuration\/config.json'<\/span>, multiLine<span style=\"color: #333333\">=<\/span><span style=\"color: #007020\">True<\/span>)\r\n\r\n<span style=\"color: #888888\">#change the content as needed<\/span>\r\n<span style=\"color: #888888\">#save the json file using the function<\/span>\r\nsaveResult(dt,<span style=\"background-color: #fff0f0\">'Files\/configuration\/accounts'<\/span>,<span style=\"background-color: #fff0f0\">'Files\/configuration\/config.json'<\/span>)\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>On the blog Fabric Notebook and Deployment Pipelines I explained a technique to keep notebooks configuration values in JSON files on lakehouses, a good solution from many different points of views. What if we need to provide maintenance to the JSON configuration file using notebooks? The first problem is the fact the typical statement to&#8230;&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,159164],"tags":[158998,158997,159035],"coauthors":[6810],"class_list":["post-101721","post","type-post","status-publish","format-standard","hentry","category-blogs","category-microsoft-fabric","tag-lakehouse","tag-microsoft-fabric","tag-notebook"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101721","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=101721"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101721\/revisions"}],"predecessor-version":[{"id":101951,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101721\/revisions\/101951"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=101721"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=101721"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=101721"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=101721"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}