{"id":100843,"date":"2023-12-13T17:00:46","date_gmt":"2023-12-13T17:00:46","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=100843"},"modified":"2024-09-03T20:15:17","modified_gmt":"2024-09-03T20:15:17","slug":"fabric-notebooks-and-deployment-pipelines","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/fabric-notebooks-and-deployment-pipelines\/","title":{"rendered":"Fabric Notebooks and Deployment Pipelines"},"content":{"rendered":"<p>In my article about <strong>Fabric<\/strong> source control extended features, I explained how Microsoft added notebooks to source control.<\/p>\n<p>This means we can include notebooks in a <strong>Software Development Lifecycle<\/strong> (<strong>SDLC<\/strong>) for <strong>Power BI<\/strong> objects.<\/p>\n<p>As part of this lifecycle, the notebooks need to flow from the development environment to the test and production environments. However, what happens to the references a notebook may contain?<\/p>\n<p>A notebook may contain references to lakehouses and other configuration values that differ in each environment. We need to ensure these references change automatically when we promote the notebook to another environment.<\/p>\n<h2>Two types of references<\/h2>\n<p>Notebooks in Fabric can contain two types of references, and each type needs to be handled in a different way:<\/p>\n<p><strong>Configuration values:<\/strong> Any value used in the notebook code.<\/p>\n<p><strong>Default Lakehouse:<\/strong> The default lakehouse is essential for most notebook tasks, but it is not set in the notebook code itself. 
For this reason, the default lakehouse needs to be handled differently from the other references, which makes it a second type of reference.<\/p>\n<p>Let\u2019s analyse how to manage each of these reference types in our notebooks.<\/p>\n<h2>Configuration Values<\/h2>\n<p>The rule is simple: avoid hard-coded configuration values. The configuration should be stored outside the notebook, somewhere the values can differ between environments.<\/p>\n<p>Wherever you choose to store the configuration values, every notebook needs code to retrieve them.<\/p>\n<p>This creates another requirement: the retrieval code should be centralized in one notebook, and any other notebook needs to be able to call it.<\/p>\n<p>We can use the following statement:<\/p>\n<div style=\"background: #ffffff;overflow: auto;width: auto;border: solid gray;border-width: .1em .1em .1em .8em;padding: .2em .6em\">\n<pre style=\"margin: 0;line-height: 125%\" class=\"crayon:false\">%run &lt;notebook name&gt;\r\n<\/pre>\n<\/div>\n<p>There are other ways to execute a different notebook, but <em>%run<\/em> executes it in the same session as the caller. 
As a result, the caller has access to the variables created in the centralized notebook.<\/p>\n<p>The centralized notebook can therefore load the configuration into variables, and those variables will be accessible to the caller.<\/p>\n<p>If we store the configuration in a JSON file in the lakehouse, the code of the centralized notebook can look like this:<\/p>\n<div style=\"background: #ffffff;overflow: auto;width: auto;border: solid gray;border-width: .1em .1em .1em .8em;padding: .2em .6em\">\n<pre style=\"margin: 0;line-height: 125%\" class=\"crayon:false\"># Read the configuration file stored in the Files area of the lakehouse\r\ndf = spark.read.json(\"Files\/configuration\/config.json\", multiLine=True)\r\n# Collect the configuration row once and read each value from it\r\nconfig = df.first()\r\nurlprefix = config[\"functionURL\"]\r\ndatabase = config[\"database\"]\r\nlakehouseName = config[\"lakehouseName\"]\r\n<\/pre>\n<\/div>\n<p>This way, we can move the notebook to different environments without changing any code.<\/p>\n<h2>Default Lakehouse<\/h2>\n<p>The default lakehouse would be much more difficult to change without help from <strong>Power BI<\/strong>.<\/p>\n<p>As explained in the previous article, source control saves the notebook as a set of files. The most important one is a <em>.PY<\/em> file containing the notebook code.<\/p>\n<p>The <em>.PY<\/em> file also contains metadata, which includes the notebook\u2019s default lakehouse.<\/p>\n<p>Luckily, <strong>Power BI<\/strong> deployment pipelines support deployment rules for notebooks that change their default lakehouse.<\/p>\n<h2>Deployment Pipelines<\/h2>\n<p>Deployment pipelines move objects from one workspace to another and keep the workspaces synchronized.<\/p>\n<p>This allows us to use different workspaces as the development, test and production environments.<\/p>\n<p>We can create rules for the transfer of objects from one workspace to another. When the objects are transferred, the rules ensure that some configurations are changed, such as the default lakehouse of the notebooks.<\/p>\n<p>I have a published session about <a href=\"https:\/\/www.youtube.com\/watch?v=Ja1Om9RN_-U\">Deployment Pipelines in English<\/a> or <a href=\"https:\/\/www.youtube.com\/watch?v=GRm8cQuQcZg\">in Portuguese<\/a>.<\/p>\n<h2>Configuring Deployment Pipelines<\/h2>\n<p>Let\u2019s build a small demonstration.<\/p>\n<p>As a starting point, we have 4 workspaces:<\/p>\n<ul>\n<li>2 for the development environment: one with the lakehouse and the other with the notebooks<\/li>\n<li>2 for the test environment: one with the lakehouse and the other empty, which will receive the notebooks<\/li>\n<\/ul>\n<p>In my example, the workspaces are \u201cSales Lakehouse\u201d, \u201cSales Notebooks\u201d, \u201cSales Lakehouse Test\u201d and \u201cSales Notebooks Test\u201d.<\/p>\n<ol>\n<li>In the workspace <em>\u201cSales Notebooks\u201d<\/em>, click the button <em>Create Deployment Pipeline<\/em>.<\/li>\n<li>In the <em>Pipeline Name<\/em> textbox, type <em>\u201cSales Pipeline\u201d<\/em> as the name of the new deployment pipeline<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1218\" height=\"613\" class=\"wp-image-100844\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/a-screenshot-of-a-computer-description-automatica.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<ol start=\"3\">\n<li>Click the button <em>Next<\/em><\/li>\n<li>In the <em>Customize your stages<\/em> window, click the <em>Create<\/em> button<\/li>\n<\/ol>\n<p>This window allows you to customize the name of each stage of the <strong>SDLC<\/strong>. I will keep the default names.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"407\" height=\"389\" class=\"wp-image-100845\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/a-screenshot-of-a-computer-screen-description-aut.png\" alt=\"A screenshot of a computer screen\n\nDescription automatically generated\" \/><\/p>\n<ol start=\"5\">\n<li>In the <em>Assign your workspace to a stage<\/em> window, click the <em>Assign<\/em> button<\/li>\n<\/ol>\n<p>We started creating the pipeline from a workspace, so this window asks which stage that workspace should be assigned to. 
Our workspace is <em>\u201cSales Notebooks\u201d<\/em>, so we will assign it to the <em>Development<\/em> stage, which is selected by default.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"406\" height=\"321\" class=\"wp-image-100846\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/a-screenshot-of-a-computer-description-automatica-1.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<ol start=\"6\">\n<li>Open the drop-down under the test environment and select <em>\u201cSales Notebooks Test\u201d<\/em><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"488\" height=\"372\" class=\"wp-image-100847\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/a-screenshot-of-a-computer-test-description-autom.png\" alt=\"A screenshot of a computer test\n\nDescription automatically generated\" \/><\/p>\n<ol start=\"7\">\n<li>Under the test environment, click the button <em>Assign a Workspace<\/em><\/li>\n<\/ol>\n<p>Once the development and test environments are assigned, the pipeline identifies the differences between them.<\/p>\n<h2>Making the Deployment<\/h2>\n<p>After configuring the pipeline environments, it&#8217;s time to deploy from one environment to the next and run some tests.<\/p>\n<ol>\n<li>Under the <em>Development<\/em> environment, click the button <em>Deploy<\/em><\/li>\n<\/ol>\n<p>We can only create pipeline rules after the first deployment has been made. 
The first deployment will be wrong: the notebooks will point to the wrong lakehouse.<\/p>\n<p>We do it anyway so we can create the rules and ensure the next deployment is correct.<\/p>\n<ol start=\"2\">\n<li>In the <em>Deploy to the next stage<\/em> window, click the button <em>Deploy<\/em><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"548\" height=\"423\" class=\"wp-image-100848\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/a-screenshot-of-a-computer-program-description-au.png\" alt=\"A screenshot of a computer program\n\nDescription automatically generated\" \/><\/p>\n<ol start=\"3\">\n<li>In the <em>Test<\/em> environment, click the <em>Deployment Rules<\/em> icon<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"544\" height=\"153\" class=\"wp-image-100849\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/a-white-rectangular-object-with-black-lines-descr.png\" alt=\"A white rectangular object with black lines\n\nDescription automatically generated\" \/><\/p>\n<ol start=\"4\">\n<li>In the <em>Deployment Rules<\/em> window, click one of the notebooks<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"582\" height=\"270\" class=\"wp-image-100850\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/a-screenshot-of-a-computer-description-automatica-2.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<ol start=\"5\">\n<li>In <em>Set deployment rules<\/em>, click the <em>Default Lakehouse<\/em> rule<\/li>\n<li>Click the button <em>Add rule<\/em><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"580\" height=\"421\" class=\"wp-image-100851\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/a-screenshot-of-a-computer-description-automatica-3.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<ol start=\"7\">\n<li>In the <em>From<\/em> drop-down, select the Sales lakehouse, which will be the only option available.<\/li>\n<li>In the <em>To<\/em> drop-down, select <em>Other<\/em><\/li>\n<\/ol>\n<p>At this point, you need to find the Id of the lakehouse <em>\u201cSalesTest\u201d<\/em> to include in the rule. The only method I found is to open the lakehouse and take the Id from the URL.<\/p>\n<p>The image below shows the URL format when you open a lakehouse and where the Id appears in it:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1124\" height=\"53\" class=\"wp-image-100852\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/word-image-100843-9.png\" \/><\/p>\n<ol start=\"9\">\n<li>Fill the <em>Lakehouse Id<\/em> box with the Id taken from the lakehouse URL<\/li>\n<li>Fill the <em>lakehouse name<\/em> textbox with <em>\u201cSalesTest\u201d<\/em><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"597\" height=\"507\" class=\"wp-image-100853\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/word-image-100843-10.png\" \/><\/p>\n<ol start=\"11\">\n<li>Click the <em>Save<\/em> button<\/li>\n<\/ol>\n<p>Before the rule was defined, the pipeline identified the content of both workspaces as the same. After the rule was defined, it identified a difference.<\/p>\n<p>The deployment pipeline takes the rules into account when calculating whether two workspaces are in sync. 
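<\/p>\n<p>One way to picture this comparison is the simplified model below. It is a hypothetical sketch, not Fabric\u2019s actual implementation. It assumes the pipeline applies the deployment rules to the source content and then compares the result with the content already in the target workspace. Each workspace is reduced to a dictionary mapping notebook names to default lakehouses. The notebook name <em>LoadSales<\/em>, the rule format and the function names are all illustrative.<\/p>\n
```python
# Hypothetical, simplified model of how a deployment pipeline could
# decide whether two stages are in sync once deployment rules exist.
# A workspace is modelled as a dict: notebook name -> default lakehouse.


def apply_rules(source, rules):
    # A rule maps (notebook, source lakehouse) to the target lakehouse;
    # notebooks without a matching rule keep their lakehouse unchanged.
    return {
        notebook: rules.get((notebook, lakehouse), lakehouse)
        for notebook, lakehouse in source.items()
    }


def deploy(source, rules):
    # Deploying copies the source content with the rules applied.
    return apply_rules(source, rules)


def in_sync(source, target, rules):
    # The stages count as in sync when the target already matches
    # what deploying the source would produce.
    return apply_rules(source, rules) == target


dev = {'LoadSales': 'Sales'}
rules = {}

# First deployment: no rules yet, so the Test notebook keeps the Dev lakehouse.
test = deploy(dev, rules)
print(in_sync(dev, test, rules))  # True, even though it points to Sales

# Adding the rule changes the expected result, so the stages now differ.
rules[('LoadSales', 'Sales')] = 'SalesTest'
print(in_sync(dev, test, rules))  # False

# Deploying again applies the rule and brings the stages back in sync.
test = deploy(dev, rules)
print(in_sync(dev, test, rules), test)  # True {'LoadSales': 'SalesTest'}
```
\n<p>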
Because the rule was just created and has not been applied yet, the deployment pipeline considers the workspaces different.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1062\" height=\"356\" class=\"wp-image-100854\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/a-screenshot-of-a-computer-description-automatica-4.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<h2>Test the Pipeline Rules<\/h2>\n<ol>\n<li>Under the development environment, click the <em>Deploy<\/em> button.<\/li>\n<\/ol>\n<p>After the deployment, the workspaces are in sync again. However, we know they are not identical, because the rule was applied. The deployment pipeline takes this into account when calculating whether the workspaces are in sync.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1070\" height=\"380\" class=\"wp-image-100855\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/a-screenshot-of-a-computer-description-automatica-5.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<ol start=\"2\">\n<li>Open a notebook in the workspace <em>\u201cSales Notebooks Test\u201d<\/em><\/li>\n<\/ol>\n<p>You may notice that the default lakehouse is <em>SalesTest<\/em>. This shows that the default lakehouse changed when the notebook moved from the <em>Development<\/em> environment to the <em>Test<\/em> environment.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"272\" height=\"240\" class=\"wp-image-100856\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/12\/a-screenshot-of-a-computer-description-automatica-6.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<h2>Limitation: Notebook Ownership<\/h2>\n<p>There is one limitation to this entire process: the ownership of the objects.<\/p>\n<ul>\n<li>Only the owner of the notebook can create rules for the notebook<\/li>\n<li>The owner of the notebook needs to be the owner of the deployment pipeline<\/li>\n<li>Only the owner of the notebook can run a deployment that overwrites the notebook in the target workspace<\/li>\n<li>At the time of writing, there is no way to change the ownership of a notebook<\/li>\n<\/ul>\n<p>This leads to a big problem: how do we handle teamwork on notebooks?<\/p>\n<h2>Proposed Solution: Notebook Ownership<\/h2>\n<p>A single team member should create all the needed notebooks. This person can create empty notebooks and leave them for the other team members to develop, but only this team member creates notebooks.<\/p>\n<p>The same team member needs to be responsible for the deployment pipelines. They will:<\/p>\n<ul>\n<li>Create the pipelines<\/li>\n<li>Create the rules<\/li>\n<li>Deploy the pipelines<\/li>\n<\/ul>\n<p>Of course, this is a strange workaround, but at the time of writing it\u2019s needed.<\/p>\n<h2>Summary<\/h2>\n<p>We have the tools needed to include notebooks in the <strong>SDLC<\/strong> for <strong>Microsoft Fabric<\/strong>. Many other objects in Fabric still don\u2019t support source control. This is one reason to isolate objects in different workspaces. Use one workspace for the objects not supported in source control. Use a different workspace, with a full SDLC implementation and deployment pipelines, for the objects that are supported.<\/p>\n<p>In the article <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/source-control-with-git-power-bi-and-microsoft-fabric\/\">Source Control with GIT, Power BI and Microsoft Fabric<\/a>, I proposed different SDLC methods that could be used with Power BI.<\/p>\n<p>Given the features available for notebooks, one specific method fits best: keep only the Dev workspace in source control and use deployment pipelines to synchronize Dev with Test and Production. 
This is similar to the method used in Azure Data Factory and Azure Synapse Analytics.<\/p>\n<p>There may be scenarios where we still create one branch for the Test workspace and another for the Production workspace. However, these branches should never be compared or merged in the repository. The deployment pipeline rules change the files during each deployment, so the branches will always be different.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In my article about Fabric source control extended features, I explained how Microsoft added notebooks to source control. This means we can include notebooks in a Software Development Lifecycle (SDLC) for Power BI objects. As part of this lifecycle, the notebooks need to flow from the development environment to the test and production environments. However,&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,159164,159166],"tags":[123648,159053,158997,159035,101611],"coauthors":[6810],"class_list":["post-100843","post","type-post","status-publish","format-standard","hentry","category-blogs","category-microsoft-fabric","category-powerbi","tag-data-platform","tag-deployment-pipelines","tag-microsoft-fabric","tag-notebook","tag-power-bi"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/100843","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=100843"}],"version-history":[{"count":11,"href":"http
s:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/100843\/revisions"}],"predecessor-version":[{"id":101276,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/100843\/revisions\/101276"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=100843"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=100843"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=100843"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=100843"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}