{"id":96862,"date":"2023-05-24T00:00:44","date_gmt":"2023-05-24T00:00:44","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=96862"},"modified":"2024-07-11T16:22:03","modified_gmt":"2024-07-11T16:22:03","slug":"data-intelligence-on-light-speed-microsoft-fabric","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/cloud\/azure\/data-intelligence-on-light-speed-microsoft-fabric\/","title":{"rendered":"Data Intelligence on light speed: Microsoft Fabric"},"content":{"rendered":"<p><em>This article is based on exciting information just released at Microsoft&#8217;s <a href=\"https:\/\/build.microsoft.com\/\">Build<\/a> conference on May 23, 2023.<\/em><\/p>\n<h2>What we have today<\/h2>\n<p>When <strong>Synapse Analytics<\/strong> was created, technical sessions inspired me with some comparisons and explanations, and I reproduced them in my own technical sessions and writing.<\/p>\n<p>Synapse was created from a request from many Microsoft customers. They requested to be able to use one single tool for the entire data intelligence platform: Ingest Data, store, process, query, apply data science, and generate reports.<\/p>\n<p>Synapse is a real Swiss Army knife: We can make ingestion using <strong>Synapse Data Factory;<\/strong>\u00a0query and process the data using different methods, <strong>Serverless SQL Pool<\/strong> or <strong>Dedicated SQL Pool;<\/strong> and apply data science using spark pool and additional ML frameworks. Finally, Synapse is also linked to <strong>Power BI<\/strong>, allowing us to use some shortcuts to create visualizations.<\/p>\n<p>This unique set of features were always great, much better than the isolated tools we had before. But on the light of Microsoft Fabric, we can notice the missing points on Synapse:<\/p>\n<ul>\n<li>The integration of the different tools was limited. It was the best for the time, but compared with Microsoft Fabric, the integration was limited.<\/li>\n<li>We still need to decide between different infrastructure resources, such as Serverless SQL Pool and Dedicated SQL Pool, instead of using all together over the data.<\/li>\n<li>We still need to make decisions about infrastructure, especially the size of the dedicated SQL Pool. Many times, decisions were based mostly by guess.<\/li>\n<li>It doesn&#8217;t totally isolate storage and processing. When using a Dedicated SQL Pool, processing and storage are tied together.<\/li>\n<\/ul>\n<p>Synapse is considered so advanced that only a few noticed these problems, and not all the problems. Microsoft Fabric, the new product announced during BUILD, exposes to us this and more.<\/p>\n<h2>What\u2019s Microsoft Fabric<\/h2>\n<p>Like Synapse, Microsoft Fabric brings all the services needed for a data intelligence environment aggregated together, heavily integrated and built in a way that requires far less technical effort for the implementation.<\/p>\n<p>The image below illustrates the services included in Microsoft Fabric<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"767\" height=\"460\" class=\"wp-image-96876\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/interface-grafica-do-usuario-aplicativo-descrica-4.png\" alt=\"Interface gr\u00e1fica do usu\u00e1rio, Aplicativo\n\nDescri\u00e7\u00e3o gerada automaticamente\" \/><\/p>\n<p>In the following sections I will introduce these new concepts.<\/p>\n<h2>Data Intelligence as Software as a Service (SaaS)<\/h2>\n<p>Microsoft Fabric arrives, breaking standards and solidifying new ones. In the cloud environment, we are used to classify the services as Infrastructure as a service (IaaS), Platform as a service (PaaS) and SaaS. Synapse is classified as a PaaS, while Microsoft Fabric is officially classified as SaaS. The following chart shows the general areas that each level of management that each level of hosted management provides.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-96877\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/a-screenshot-of-a-cell-phone-description-automati-1.jpeg\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" width=\"629\" height=\"466\" \/><\/p>\n<p>Undoubtedly, the level of managed services provided by Microsoft Fabric is way above Synapse. Many tasks in Synapse would need to be carefully configured, yet in Microsoft Fabric brings a kind-of auto-settings, just working out of the box.<\/p>\n<p>Usually, when we think about a SaaS service, we think about an end user application, such as Office 365 or many other applications where the user just uses it. It\u2019s a concept that usually doesn&#8217;t match software used to ingest, transform, model and generate intelligent results from data.<\/p>\n<p>That\u2019s what Microsoft Fabric is, a software breaking the barriers of what we know about cloud software and services.<\/p>\n<h1>A different environment<\/h1>\n<p>Microsoft Fabric is not located inside the Azure environment but rather inside Power BI Portal. This leads to a very different environment than the one we have in Synapse.<\/p>\n<p>But the new environment is not like anything we know about <strong>Power BI Portal<\/strong> as well. The environment is organized for different experiences: You select a experience according to the kind of task you would like to execute, and the environment will adapt to the usual tasks related to this experience.<\/p>\n<p>The following experiences are available:<\/p>\n<ul>\n<li><strong>Power BI:<\/strong> The typical Power BI environment and tools<\/li>\n<li><strong>Data Factory:<\/strong> Using this persona, you can create and manage dataflows and data pipelines as in data factory.<\/li>\n<li><strong>Data Activator:<\/strong> This is an entirely new feature which enables you to create triggers over your visuals in Power BI<\/li>\n<li><strong>Data Engineering:<\/strong> This experience involves multiple tasks. It\u2019s responsible for creating and managing the lakehouses, but it\u2019s also the one that will allow you to create notebooks and orchestrate them with pipelines.<\/li>\n<li><strong>Data Science:<\/strong> Using this experience, you can apply Azure ML techniques over your data.<\/li>\n<li><strong>Data Warehouse:<\/strong> This experience allows you to model your data as in a SQL database and use SQL over your data. It\u2019s difficult to compare this to anything else. We can create many star models over our data lake and these models will be reused by our Power BI datasets, making it easier to have a central model for all our reports.<\/li>\n<li><strong>Real-time Analytics:<\/strong> This persona is in some ways comparable with Power BI Streaming Dataflows, allowing ingestion of real time data.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1380\" height=\"458\" class=\"wp-image-96878\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/word-image-96862-3-1.png\" \/><\/p>\n<p>Changing from ingestion (Data Factory), processing (Data Engineering), Modelling and SQL (Data Warehouse), and more is just a matter of choosing the correct experience to make the work over the same sets of data.<\/p>\n<p>The change of the personas is like a method to focus the environment on the kind of activities you would like to execute. The object&#8217;s creation itself is still happening inside a Power BI workspace.<\/p>\n<p>Besides that, the main new objects: a <strong>lakehouse<\/strong> and a <strong>data warehouse<\/strong>, have their own way to switch the work between one and another.<\/p>\n<h2>Microsoft Fabric and OneLake<\/h2>\n<p><strong>OneLake<\/strong> is the core service in <strong>Microsoft Fabric<\/strong>. It provides a data lake as a service, allowing us to build our data lake without all the trouble of provisioning it first. It\u2019s the central data storage for all data in <strong>Microsoft Fabric<\/strong> and it\u2019s provisioned for the tenant when the first <strong>Microsoft Fabric<\/strong> artifact is created.<\/p>\n<p>The name <strong>OneLake<\/strong> also matches very well with the shortcut feature in the <strong>OneLake<\/strong>: We can create shortcuts to files located externally and access them directly as if they were in our own lake.<\/p>\n<p>The image below illustrates how the <strong>OneLake<\/strong> relates with the other Microsoft Fabric features.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"733\" height=\"438\" class=\"wp-image-96879\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/interface-grafica-do-usuario-descricao-gerada-aut-1.png\" alt=\"Interface gr\u00e1fica do usu\u00e1rio\n\nDescri\u00e7\u00e3o gerada automaticamente\" \/><\/p>\n<h2>Onelake, Lakehouse and Workspaces<\/h2>\n<p>The lakehouse is one of the core objects we can create inside a <strong>Onelake<\/strong>. We create the lakehouse using the Data Engineer persona and the lakehouse will be contained inside a Workspace, which we know as a Power BI Workspace.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"900\" height=\"275\" class=\"wp-image-96880\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/interface-grafica-do-usuario-aplicativo-descrica-5.png\" alt=\"Interface gr\u00e1fica do usu\u00e1rio, Aplicativo\n\nDescri\u00e7\u00e3o gerada automaticamente\" \/><\/p>\n<p>Once we create a lakehouse, we can use Data Factory to ingest data into the files area or the tables area.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1469\" height=\"830\" class=\"wp-image-96881\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/interface-grafica-do-usuario-aplicativo-descrica-6.png\" alt=\"Interface gr\u00e1fica do usu\u00e1rio, Aplicativo\n\nDescri\u00e7\u00e3o gerada automaticamente\" \/><\/p>\n<p>The <strong>Files<\/strong> area is the unmanaged area of the lake, which accepts any kind of file. That\u2019s where we put the RAW files for further processing. The <strong>Tables<\/strong>, on the other hand, contains data only in Delta format.<\/p>\n<p>The lakehouse optimizes the <strong>Tables<\/strong> area with a special structure capable to make a regular delta table up to 10x faster while still maintaining full Delta format compliance.<\/p>\n<p>However, the lakehouse is not the biggest data structure we have. This position is reserved for the OneLake. It is an invisible, auto-provisioned data storage containing all the data for data warehouse, lakehouses, datasets and more.<\/p>\n<p>In this way, we can build an enterprise architecture using Workspaces to hold departmental lakehouses. The data can be shared across multiple departments using lakehouse shortcuts. This ensures the domain ownership of the data and a relationship between domains at the same time.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"571\" height=\"345\" class=\"wp-image-96882\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/tela-azul-com-letras-brancas-descricao-gerada-aut-1.png\" alt=\"Tela azul com letras brancas\n\nDescri\u00e7\u00e3o gerada automaticamente\" \/><\/p>\n<p>This is only the starting point of an enterprise architecture: The OneLake ensures a unified management and governance of the data. Data Lineage, Data Protection, Certification, Catalog Integration and more are unified features provided by the OneLake to all lakehouses created in an enterprise.<\/p>\n<p>All these features are inherited from the Power BI environment, ensuring an Enterprise Governance environment to the company.<\/p>\n<h2>OneLake and Processing Isolation<\/h2>\n<p>When using Synapse, the Synapse Dedicated Pool stores and process the data. This is a scenario where storage and processing are tied together.<\/p>\n<p>In OneLake, storage and processing are independent. The same data in the OneLake can be processed by many different methods, ensuring the storage and processing independence.<\/p>\n<p>Let\u2019s analyse the different methods we have available to process the data in OneLake.<\/p>\n<h2>Spark Notebooks<\/h2>\n<p>All the workspaces enabled for <strong>Microsoft Fabric<\/strong> have a feature called <strong>Live Pool<\/strong>. The <strong>Live Pool<\/strong> allows the execution of notebooks without the need to previously make the Spark Cluster configuration.<\/p>\n<p>Once the first code block is executed in a notebook, the <strong>Live Spark Pool<\/strong> kicks in in a few seconds and makes the execution.<\/p>\n<p>We can process the <strong>OneLake<\/strong> data using Spark Notebooks with the advantage of the <strong>Live Pools<\/strong><\/p>\n<h2>Data Factory<\/h2>\n<p>Data Factory objects, such as pipelines and dataflows, inside the Power BI environment are a start of a unification of the ETL tools: We have the Pipelines and Dataflows from data factory and the dataflows from Power BI.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1341\" height=\"591\" class=\"wp-image-96883\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/diagrama-descricao-gerada-automaticamente-2.png\" alt=\"Diagrama\n\nDescri\u00e7\u00e3o gerada automaticamente\" \/><\/p>\n<p>These two are now united and working together under Microsoft Fabric. We have an additional advantage: the Dataflows Gen2.<\/p>\n<p>The <strong>Dataflows Gen2 <\/strong>are an advance in relation to the Power BI Dataflows or Wrangling Dataflows we are used to. One of the most interesting features, in my opinion, is the possibility to define the target of a transformation, what we never could do in <strong>Power BI Dataflow <\/strong>(or<strong> Wrangling Dataflows<\/strong>)<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1388\" height=\"712\" class=\"wp-image-96884\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/interface-grafica-do-usuario-aplicativo-descrica-7.png\" alt=\"Interface gr\u00e1fica do usu\u00e1rio, Aplicativo\n\nDescri\u00e7\u00e3o gerada automaticamente\" \/><\/p>\n<h2>SQL Queries<\/h2>\n<p>The <strong>Microsoft Fabric<\/strong> provides two different methods to access the data using SQL, as if the data is in a regular database.<\/p>\n<p>One of the methods is using the Lakehouse object. This object provides us a SQL Endpoint, which allows us to model the tables and query the data using SQL.<\/p>\n<p>The second method uses a Data Warehouse object, which provides a complete SQL processing environment over the data in the OneLake.<\/p>\n<p>The table below highlights all the differences between the Lakehouse SQL Endpoint and the Data Warehouse. Some of these differences are available in the documentation; some are my personal conclusions.<\/p>\n<table>\n<tbody>\n<tr>\n<td style=\"border-style: double; background-color: #a6c1ed;\">\n<p><strong>Microsoft Fabric Offering<\/strong><\/p>\n<\/td>\n<td style=\"border-style: double; background-color: #a6c1ed;\">\n<p><strong>Warehouse<\/strong><\/p>\n<\/td>\n<td style=\"border-style: double; background-color: #a6c1ed;\">\n<p><strong>SQL Endpoint of the Lakehouse<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>Processor Engine<\/strong><\/p>\n<\/td>\n<td colspan=\"2\">\n<p style=\"text-align: center;\">SQL MPP &#8211; Polaris<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>Optimization Engine<\/strong><\/p>\n<\/td>\n<td>\n<p>Vertipaq<\/p>\n<\/td>\n<td>\n<p>Vertipaq for Tables<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>Storage Layer<\/strong><\/p>\n<\/td>\n<td colspan=\"2\">\n<p style=\"text-align: center;\">Open Data Format &#8211; Delta<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>Primary Capabilities<\/strong><\/p>\n<\/td>\n<td>\n<p>ACID compliant<\/p>\n<p>Full data warehouse with transactions support in T-SQL<\/p>\n<\/td>\n<td>\n<p>Read Only, system generated SQL Endpoint for lakehouse for T-SQL Querying and serving.<\/p>\n<p>Supports queries and views on top of lakehouse delta tables only<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>Recommended Use Case<\/strong><\/p>\n<\/td>\n<td>\n<ul>\n<li>Data Warehousing for Enterprise Use<\/li>\n<li>Data Warehousing supporting departmental, business unit or self-service use<\/li>\n<li>Structured Data Analysis in T-SQL with tables, views, procedures and functions and Advanced SQL Support for BI<\/li>\n<\/ul>\n<\/td>\n<td>\n<ul>\n<li>Exploring and querying delta tables from the lakehouse<\/li>\n<li>Staging Data and Archival zone for analysis<\/li>\n<li>Medallion architecture with zones for bronze, silver and gold analysis<\/li>\n<li>Pairing with warehouses for enterprise analytics use cases<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>Development Experience<\/strong><\/p>\n<\/td>\n<td>\n<ul>\n<li>Warehouse editor with full support for T-SQL data-ingestion, modeling, development and querying UI experience for data ingestion, modeling and querying<\/li>\n<li>Read\/Write support for 1<sup>st<\/sup> and 3<sup>rd<\/sup> party tooling<\/li>\n<\/ul>\n<\/td>\n<td>\n<ul>\n<li>Lakehouse SQL Enpoint with limited T-SQL support for views, table valued functions and SQL Queries<\/li>\n<li>UI experiences for modeling and querying<\/li>\n<li>Limited T-SQL support for 1<sup>st<\/sup> and 3<sup>rd<\/sup> party tooling<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>T-SQL Capabilities<\/strong><\/p>\n<\/td>\n<td>\n<p>Full DQL, DML and DDL T-SQL support. Full transaction support<\/p>\n<\/td>\n<td>\n<p>Full DQL, no DML, limited DDL T-SQL Support such as SQL Views and TVFs<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>Data Loading<\/strong><\/p>\n<\/td>\n<td>\n<p>SQL, pipelines, dataflows<\/p>\n<\/td>\n<td>\n<p>Spark, pipelines, dataflows, shortcuts<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>Delta Table Support<\/strong><\/p>\n<\/td>\n<td>\n<p>Reads and Write Delta Table<\/p>\n<\/td>\n<td>\n<p>Reads Delta table<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Power BI<\/h2>\n<p>Microsoft Fabric is deeply linked to the <strong>Power BI<\/strong> environment. We can in many points of our work, either from a <strong>lakehouse<\/strong> or a warehouse, start the creation of a Power BI report.<\/p>\n<p>The best is the access method: <strong>Power BI<\/strong> has a new access method to the <strong>OneLake<\/strong>, called <strong>Direct Lake<\/strong>.<\/p>\n<p><strong>Direct Lake<\/strong> is a new connection method between <strong>Power BI Datasets<\/strong> and the <strong>OneLake<\/strong>.<\/p>\n<p>When we use <strong>DirectQuery<\/strong>, every refresh requires a new load of the source, which makes the connection slower. On the other hand, when we use <strong>Import<\/strong>, the data is stored in memory and the performance is better, but when the data is updated, a refresh on the dataset is needed. Updates are not immediately visible on datasets and reports.<\/p>\n<p>The <strong>Direct Lake<\/strong> connection mix the best of both scenarios: It has the performance of the Import mode for keeping the data in memory, and the real time update of the data achieved by the <strong>DirectQuery<\/strong>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1396\" height=\"682\" class=\"wp-image-96885\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/uma-imagem-contendo-diagrama-descricao-gerada-aut-1.png\" alt=\"Uma imagem contendo Diagrama\n\nDescri\u00e7\u00e3o gerada automaticamente\" \/><\/p>\n<h2>What about Azure Synapse and Data Factory<\/h2>\n<p>Customers using Data Factory and Synapse Dedicated Pool can also expect easy ways to migrate to <strong>Microsoft Fabric. <\/strong>Micorosoft is focused on making the transition as smoothy as possible.<\/p>\n<p>Data Factory users have even the benefit of Gen2 dataflows, which are not supported by Azure data factory. So, you have advantages in developing dataflows and pipelines using Microsoft Fabric and you will an easy migration path from one to another.<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>Microsoft Fabric seems to be the start of a new era. In a time of Open AI\/ChatGPT and co-pilots, we are getting an extremely powerful tool making complex data solutions accessible to all companies and the in the future we can think about a co-pilot for <strong>Microsoft Fabric<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article is based on exciting information just released at Microsoft&#8217;s Build conference on May 23, 2023. What we have today When Synapse Analytics was created, technical sessions inspired me with some comparisons and explanations, and I reproduced them in my own technical sessions and writing. Synapse was created from a request from many Microsoft&#8230;&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":103115,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[137091,53],"tags":[],"coauthors":[6810],"class_list":["post-96862","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure","category-featured"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/96862","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=96862"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/96862\/revisions"}],"predecessor-version":[{"id":105886,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/96862\/revisions\/105886"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/103115"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=96862"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=96862"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=96862"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=96862"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}