{"id":97753,"date":"2023-08-21T17:00:05","date_gmt":"2023-08-21T17:00:05","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=97753"},"modified":"2024-09-03T20:04:50","modified_gmt":"2024-09-03T20:04:50","slug":"fabric-lakehouse-convert-to-table-feature-and-workspace-level-spark-configuration","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/fabric-lakehouse-convert-to-table-feature-and-workspace-level-spark-configuration\/","title":{"rendered":"Fabric Lakehouse: Convert to Table feature and Workspace Level Spark Configuration"},"content":{"rendered":"<p>I have been working as a no-code data engineer: Focused on <strong>Data Factory<\/strong> ETL and visual tools. In fact, I prefer to use visual resources when possible.<\/p>\n<p>On my first contact with <strong>Fabric Lakehouse<\/strong> I discovered to convert Files into Tables I need to use a notebook. I was waiting a lot of time for a UI feature to achieve the same, considering this is a very simple task.<\/p>\n<h2>Convert to Table feature is Available in Lakehouses<\/h2>\n<p>This feature is finally available in the lakehouse: You can right-click a folder and choose the option <strong>&#8220;Convert to Table&#8221;<\/strong>.<\/p>\n<p>When converting, you can create a new table or add the information to an existing table. This allows you to make an incremental load manually, if needed.<\/p>\n<p>It&#8217;s simple as a right-click over the folder and asking for the conversion.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"550\" height=\"384\" class=\"wp-image-97759\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/08\/a-screenshot-of-a-computer-description-automatica-22.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<h2>Table Optimizations in Lakehouses<\/h2>\n<p>There are optimizations we should do when writing delta tables. 
We usually apply these configurations in the Spark notebooks we create.<\/p>\n<p>For example:<\/p>\n<ul>\n<li>spark.sql.parquet.vorder.enabled<\/li>\n<li>spark.microsoft.delta.optimizeWrite.enabled<\/li>\n<li>spark.microsoft.delta.optimizeWrite.binSize<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"793\" height=\"128\" class=\"wp-image-97760\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/08\/a-screenshot-of-a-computer-description-automatica-23.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>\nYou can discover more about these optimizations in <a href=\"https:\/\/learn.microsoft.com\/en-us\/fabric\/data-engineering\/delta-optimization-and-v-order?tabs=sparksql\">this article from Microsoft<\/a>.<\/p>\n<p>How would we make these configurations if we use the UI feature?<\/p>\n<h2>Workspace Level Spark Configuration<\/h2>\n<p>We can make these configurations at the workspace level. In this way, they become the default and are applied to every write operation.<\/p>\n<ol>\n<li>In the workspace, click the <strong>Workspace settings<\/strong> button.<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"842\" height=\"113\" class=\"wp-image-97761\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/08\/a-close-up-of-a-screen-description-automatically-1.png\" alt=\"A close-up of a screen\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li value=\"2\">On the <strong>Workspace settings<\/strong> window, click <strong>Data Engineering\/Science<\/strong> on the left side.<\/li>\n<li>Click the <strong>Spark compute<\/strong> option.<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"226\" height=\"494\" class=\"wp-image-97762\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/08\/a-screenshot-of-a-phone-description-automatically-1.png\" alt=\"A screenshot of a phone\n\nDescription automatically generated\" 
\/><\/p>\n<ol>\n<li value=\"4\">In the <strong>Configurations<\/strong> area, add the three properties we need for optimization.<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"675\" height=\"478\" class=\"wp-image-97763\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/08\/a-screenshot-of-a-computer-description-automatica-24.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<h2>Differences between converting using the UI or Notebooks<\/h2>\n<p>Let\u2019s analyze some differences between using the UI and using a Spark notebook:<\/p>\n<table style=\"width: 72.2243%\" border=\"2\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td style=\"background-color: #2b0909;color: white;width: 59.4651%\">\n<p><strong>UI Conversion<\/strong><\/p>\n<\/td>\n<td style=\"background-color: #2b0909;color: white;width: 52.2573%\">\n<p><strong>Spark Notebook<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 59.4651%\">\n<p>No write options configuration; it depends on the workspace-level configuration<\/p>\n<\/td>\n<td style=\"width: 52.2573%\">\n<p>Custom write options configuration<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 59.4651%\">\n<p>No partitioning configuration. 
The table can\u2019t be partitioned.<\/p>\n<\/td>\n<td style=\"width: 52.2573%\">\n<p>Custom partitioning is possible for the tables.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 59.4651%\">\n<p>Manual process; no scheduling possible<\/p>\n<\/td>\n<td style=\"width: 52.2573%\">\n<p>Schedulable process<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Summary<\/h2>\n<p>This is an interesting new interactive feature for the lakehouse in Fabric, but when we need to build a scheduled pipeline, we still need to use notebooks or Data Factory.<\/p>\n<p>The workspace-level configuration for Spark settings is also very useful.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have been working as a no-code data engineer, focused on Data Factory ETL and visual tools. In fact, I prefer to use visual resources when possible. On my first contact with Fabric Lakehouse, I discovered that converting Files into Tables required a notebook. I waited a long time for&#8230;&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,159164],"tags":[158998,158997,159020],"coauthors":[6810],"class_list":["post-97753","post","type-post","status-publish","format-standard","hentry","category-blogs","category-microsoft-fabric","tag-lakehouse","tag-microsoft-fabric","tag-spark"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97753","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post
=97753"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97753\/revisions"}],"predecessor-version":[{"id":97764,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97753\/revisions\/97764"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=97753"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=97753"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=97753"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=97753"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}