{"id":86736,"date":"2020-03-29T22:22:58","date_gmt":"2020-03-29T22:22:58","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=86736"},"modified":"2020-03-29T22:22:58","modified_gmt":"2020-03-29T22:22:58","slug":"microsoft-data-platform-2020","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/microsoft-data-platform-2020\/","title":{"rendered":"Microsoft Data Platform 2020"},"content":{"rendered":"<p>From last November 4 to 8 I have been at\u00a0<strong>PASS Summit<\/strong> in Seattle, attending excellent technical sessions and having the honour to deliver a\u00a0 lightning session, which is a small 10 minutes presentation.<\/p>\n<h3><strong>Why should you read this?<br \/>\n<\/strong><\/h3>\n<p style=\"padding-left: 30px\">Being one of the biggest data events in the world, <strong>PASS Summit<\/strong> usually exposes the present and future technologies that will drive the following year or more, pointing the direction for every data-related professional.<\/p>\n<p style=\"padding-left: 30px\">This summary, full of links and details about these technologies, can guide you during the exploration and learning of these new technologies.<\/p>\n<h3><strong>Azure SQL Database Serverless<\/strong><\/h3>\n<p style=\"padding-left: 30px\">In the beginning, we had only <strong>Azure SQL Databases<\/strong> or custom VM&#8217;s, Platform as a Service (<strong>PaaS<\/strong>) or Infrastructure as a Service (<strong>IaaS<\/strong>) two extreme opposites in relation to price and features.<\/p>\n<p style=\"padding-left: 30px\"><em>Want to know more? Take a look on this link: <\/em><a href=\"https:\/\/azure.microsoft.com\/en-us\/overview\/what-is-paas\/\"><em>https:\/\/azure.microsoft.com\/en-us\/overview\/what-is-paas\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\">The first change happened in pricing: The creation of <strong>Elastic Pools<\/strong>. By using an <strong>Elastic Pool,<\/strong> we could aggregate many databases in a single pool of resources, expecting that when one database is consuming too many resources, the others would not be consuming that much.<\/p>\n<p style=\"padding-left: 30px\"><em>More about <\/em><strong><em>Elastic Pool<\/em><\/strong><em>: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/sql-database\/sql-database-elastic-pool\"><em>https:\/\/docs.microsoft.com\/en-us\/azure\/sql-database\/sql-database-elastic-pool<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\">However, <strong>Elastic Pool<\/strong> is better for pools with 5 or more databases, with fewer databases it becomes too expensive, doesn&#8217;t worth it.<\/p>\n<p style=\"padding-left: 30px\">Microsoft decided we were missing something and create the <strong>Azure SQL Server Managed Instances<\/strong>, a halfway between the existing solutions. <strong>Azure SQL Server Managed Instances<\/strong> is intended to be cheaper than a custom VM and have more features than an <strong>Azure SQL Database<\/strong>, allowing us to access features of an entire instance.<\/p>\n<p style=\"padding-left: 30px\">On my first experiences with <strong>Managed Instances<\/strong>, they were difficult to provision, taking hours to provision. During the <strong>PASS Summit<\/strong>, there was one demonstration provisioning Managed Instances in seconds using <strong>Azure ARC<\/strong>. I will go on details later on this blog.<\/p>\n<p style=\"padding-left: 30px\"><em>Learn more about <\/em><strong><em>Azure SQL Server Managed Instances<\/em><\/strong><em> on this link: <\/em><a href=\"https:\/\/blogs.msdn.microsoft.com\/sqlserverstorageengine\/2018\/03\/07\/what-is-azure-sql-database-managed-instance-2\/\"><em>https:\/\/blogs.msdn.microsoft.com\/sqlserverstorageengine\/2018\/03\/07\/what-is-azure-sql-database-managed-instance-2\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\">Now we have one more option: <strong>Azure SQL Database Serverless<\/strong>. It&#8217;s not a product, but a pricing tier of <strong>Azure SQL Databases<\/strong>, although it offers slightly different features.\u00a0<\/p>\n<p style=\"padding-left: 30px\"><img loading=\"lazy\" decoding=\"async\" width=\"1063\" height=\"174\" class=\"wp-image-86740\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-21.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p style=\"padding-left: 30px\">The best way to understand this new pricing tier is comparing it with the database option &#8220;auto close\u201d: When no connection is active on the database, the database shuts down and stop charging our azure account. When the following connection request arrives, the database will start again, making the connection slower than the rest, but only the connection, all further requests will be regular. It&#8217;s cheaper, of course, since we are only charged by the time the database is in use, this makes this tier cheaper than the usual <strong>Azure SQL Database<\/strong>, but the &#8220;auto-close&#8221; behaviour maybe not good for some environments.<\/p>\n<p style=\"padding-left: 30px\"><strong>Serverless<\/strong> databases have an additional configuration called <strong>Auto-Close Delay<\/strong>. You can define after how long time of inactivity the database should shut down. The minimum amount of time is 1 hour. This configuration can prevent disruptions during work hours. If we find the correct delay configuration the database may shut down only out of working hours.<\/p>\n<p style=\"padding-left: 30px\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-86783\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/autoclosedelay.png\" alt=\"auto-close\" width=\"1433\" height=\"350\" \/><\/p>\n<p style=\"padding-left: 30px\">In the future, a new feature to allow us to automate the start and shutdown may appear, providing us with better control and more savings.<\/p>\n<p style=\"padding-left: 30px\"><em>More about <\/em><strong><em>Azure SQL Database Serverless<\/em><\/strong><em>: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/sql-database\/sql-database-serverless\"><em>https:\/\/docs.microsoft.com\/en-us\/azure\/sql-database\/sql-database-serverless<\/em><\/a><\/p>\n<p><strong>SQL Database Edge<\/strong><\/p>\n<p style=\"padding-left: 30px\"><strong>IOT<\/strong> devices, especially in industries, send constant information to servers in order to the server monitor the device and discover if there is something wrong. More than just monitor after the device failure, the servers can run machine learning models over the data to identify potential future failures according to the device&#8217;s history.<\/p>\n<p style=\"padding-left: 30px\">The weakness of this architecture is the need of the devices to send the data to the server in order to be analysed and only then the server will be able to take an action, maybe changing some device configuration, or sending some alert.<\/p>\n<p style=\"padding-left: 30px\"><strong>SQL Database Edge<\/strong> is intended exactly to solve this weakness. It&#8217;s a <strong>SQL Server Edition<\/strong> which can be installed inside the <strong>IOT<\/strong> device. Instead of sending the data to the server, the IoT device can save the data in <strong>SQL Database Edge<\/strong> and execute the machine learning models using <strong>SQL Server<\/strong>, which has the <strong>SQL Server Machine Learning Service<\/strong>. In this way, the own device can identify possible future failures and take the possible actions to work around it or send an alert.<\/p>\n<p style=\"padding-left: 30px\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"720\" class=\"wp-image-86741\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-1.jpg\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p style=\"padding-left: 30px\">Empowering the device, we distribute the load from the server to hundreds or thousands of devices, allowing for faster and more reliable problem identification and workaround using machine learning models.<\/p>\n<p style=\"padding-left: 30px\">One more interesting detail is the ability to distribute <strong>Azure SQL Database Edge<\/strong> to the devices using <strong>IOT Hub<\/strong>.<\/p>\n<p style=\"padding-left: 30px\"><em>You can read more about <\/em><strong><em>Azure SQL Database Edge<\/em><\/strong><em> here: <\/em><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/sql-database-edge\/\"><em>https:\/\/azure.microsoft.com\/en-us\/services\/sql-database-edge\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><em>Learn more about SQL Server Machine Learning Service on this link: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/advanced-analytics\/what-is-sql-server-machine-learning?view=sql-server-ver15\"><em>https:\/\/docs.microsoft.com\/en-us\/sql\/advanced-analytics\/what-is-sql-server-machine-learning?view=sql-server-ver15<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><em>More about <\/em><strong><em>IOT Hub<\/em><\/strong><em> on this link: <\/em><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/iot-hub\/\"><em>https:\/\/azure.microsoft.com\/en-us\/services\/iot-hub\/<\/em><\/a><\/p>\n<h3><strong>Accelerate Data Recovery<\/strong><\/h3>\n<p style=\"padding-left: 30px\">This is a new <strong>SQL Server 2019<\/strong> feature which is being much highlighted, not only in <strong>PASS Summit<\/strong>.<\/p>\n<p style=\"padding-left: 30px\">Long-running operations are something very common to happen. However, sometimes, after 1 or 2 hours of a running operation, some junior DBA may decide it was too much and try to kill the process and rollback. Here the problem begins.<\/p>\n<p style=\"padding-left: 30px\">In order to stop the running transaction, all the activities need to be rolled back and this can take almost or even the same time the transaction has taken until the kill was requested. There is no solution, the DBA can only wait for the rollback.<\/p>\n<p style=\"padding-left: 30px\">The <strong>Accelerated Data Recovery<\/strong> (ADR) arrived with <strong>Hyperscale<\/strong> cloud database, announced in\u00a0<strong>PASS Summit\u00a02018<\/strong>. This year, <strong>ADR<\/strong> was included in <strong>SQL Server 2019<\/strong>. It allows for a fast recovery, without all the rollback waiting time that usually happens.<\/p>\n<p style=\"padding-left: 30px\"><strong>ADR<\/strong> uses row versioning techniques, but with a different internal structure than the one used by <strong>snapshot isolation level<\/strong>.<\/p>\n<p style=\"padding-left: 30px\">\u00a0<\/p>\n<p style=\"padding-left: 30px\"><em>In order to know more about <\/em><strong><em>ADR<\/em><\/strong><em>, check these links:\u00a0<\/em><\/p>\n<p style=\"padding-left: 30px\"><a href=\"https:\/\/www.linkedin.com\/pulse\/sql-server-2019-how-saved-world-from-long-recovery-bob-ward\/\"><em>https:\/\/www.linkedin.com\/pulse\/sql-server-2019-how-saved-world-from-long-recovery-bob-ward\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/database-administration\/how-does-accelerated-database-recovery-work\/\"><em>https:\/\/www.red-gate.com\/simple-talk\/sql\/database-administration\/how-does-accelerated-database-recovery-work\/<\/em><\/a><\/p>\n<h3><strong>Azure ARC<\/strong><\/h3>\n<p style=\"padding-left: 30px\">Microsoft knows many companies use more than a single cloud provider. If Microsoft relies only on their cloud services, the war among cloud providers would be difficult, Microsoft can win some, lose others. So, why not a complete change in the battlefield?<\/p>\n<p style=\"padding-left: 30px\"><strong>Azure ARC<\/strong> allows the deployment and management of Microsoft cloud services to many different clouds: <strong>Azure<\/strong>, <strong>Google<\/strong>, <strong>AWS<\/strong>, all the deployment and management made from a single tool &#8211; <strong>Azure ARC<\/strong>. This completely changes the battlefield, even if one company has some reason to use a different cloud provider, they can still use <strong>Azure ARC<\/strong> to manage the entire environment and even deploy Microsoft services to other cloud providers.<\/p>\n<p style=\"padding-left: 30px\">Even more: <strong>Azure ARC<\/strong> also allows the deployment of <strong>Azure Services<\/strong> to on-premises services.<\/p>\n<p style=\"padding-left: 30px\">During the demonstration, <strong>Azure ARC<\/strong> was able to provision an Azure <strong>SQL Managed Instance<\/strong> in 30 seconds. It was incredible! Just a few weeks before I have talked about <strong>Azure SQL Managed Instance<\/strong> and the provisioning time was 3 hours or more. So, I asked at the Data Clinic, how was it possible? How could <strong>Azure ARC<\/strong> cause a so huge difference in the provision time?<\/p>\n<p style=\"padding-left: 30px\">The answer was very interesting: Because <strong>Azure ARC<\/strong> was provisioning the managed instance inside a container. In this case, would the container version of a <strong>SQL Managed Instance<\/strong> have any difference in relation to the regular version of a <strong>SQL Managed Instance<\/strong>? Yes, they will be slightly different, but the difference is not clearly documented yet.<\/p>\n<p style=\"padding-left: 30px\"><em>In order to learn more about <\/em><strong><em>Azure Arc<\/em><\/strong><em> you can access this link: <\/em><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/azure-arc\/\"><em>https:\/\/azure.microsoft.com\/en-us\/services\/azure-arc\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><img loading=\"lazy\" decoding=\"async\" width=\"1600\" height=\"900\" class=\"wp-image-86742\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-close-up-of-a-logo-description-automatically-ge-1.jpeg\" alt=\"A close up of a logo\n\nDescription automatically generated\" \/><\/p>\n<h3><strong>Azure Synapse Analytics<\/strong><\/h3>\n<p style=\"padding-left: 30px\">It&#8217;s an oversimplification think that <strong>Azure Synapse Analytics<\/strong> is a new name for <strong>Azure SQL Datawarehouse<\/strong>. <strong>Azure Synapse Analytics<\/strong> contains <strong>Azure SQL Datawarehouse<\/strong>, but also contains many new features. It seems that it&#8217;s going to be a central point for all, or almost all, <strong>Azure Data Platform<\/strong>.<\/p>\n<p style=\"padding-left: 30px\">One very important new feature is the <strong>Azure Synapse Studio<\/strong>. This new tool creates a unified experience among many data platform services: <strong>Azure SQL Datawarehouse<\/strong>, <strong>Spark<\/strong>, <strong>Power BI<\/strong> and more being used from a single front-end tool.<\/p>\n<p style=\"padding-left: 30px\">Explaining in a better way: from the <strong>Azure Synapse Studio<\/strong> we have access to the <strong>Data Warehouse<\/strong> model in <strong>Azure SQL Datawarehouse<\/strong> (this name will die, now it\u2019s <strong>Azure Synapse Analytics<\/strong>), manipulation of the data using <strong>Spark<\/strong> (I&#8217;m not sure if it&#8217;s linked with <strong>Databricks<\/strong>), machine learning models with <strong>Azure Machine Learning<\/strong> and visualization dashboards with <strong>Power BI<\/strong>, this one connected to the <strong>Power BI<\/strong> portal. You can see the changes made in the portal and anything you change from <strong>Synapse Analytics<\/strong> will affect the portal.<\/p>\n<p style=\"padding-left: 30px\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" class=\"wp-image-86743\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-close-up-of-a-logo-description-automatically-ge-1.png\" alt=\"A close up of a logo\n\nDescription automatically generated\" \/><\/p>\n<p style=\"padding-left: 30px\">\nCan you make any kind of <strong>Power BI<\/strong> implementation inside <strong>Azure Synapse Studio, <\/strong>such as <strong>dataflows<\/strong> and so on? Not yet, but it&#8217;s just a beginning, it&#8217;s expected to evolve a lot.<\/p>\n<p style=\"padding-left: 30px\">In summary, <strong>Azure Synapse Analytics<\/strong> is much more than only <strong>Azure SQL Datawarehouse<\/strong>, it\u2019s not a only a name change.<\/p>\n<p style=\"padding-left: 30px\"><em>You can learn more about <\/em><strong><em>Azure Synapse Analytics<\/em><\/strong><em> on these links:<\/em><\/p>\n<p style=\"padding-left: 30px\"><a href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/simply-unmatched-truly-limitless-announcing-azure-synapse-analytics\/\"><em>https:\/\/azure.microsoft.com\/en-us\/blog\/simply-unmatched-truly-limitless-announcing-azure-synapse-analytics\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/synapse-analytics\/\"><em>https:\/\/azure.microsoft.com\/en-us\/services\/synapse-analytics\/<\/em><\/a><\/p>\n<h3><strong>Data Tools for Synapse Analytics<\/strong><\/h3>\n<p style=\"padding-left: 30px\"><strong>SQL Server Data Tools<\/strong> \u2013 <strong>SSDT<\/strong> \u2013 is around for many years, even more than <strong>Entity Framework Migrations<\/strong>, which is broadly used today.<\/p>\n<p style=\"padding-left: 30px\">I have used <strong>SQL Server Data Tools<\/strong> as part of the deployment process in many projects, it\u2019s great to control the version of the database and make schema compares.<\/p>\n<p style=\"padding-left: 30px\">The most recent versions were a bit disappointing, the features for schema compare were reduced, I never discovered why. Due to that, the best recommendation was to go for 3<sup>rd<\/sup> part tools when needed.<\/p>\n<p style=\"padding-left: 30px\">It seems our relationship with <strong>SSDT<\/strong> is getting into a new chapter. <strong>SSDT<\/strong> has a new database project for <strong>Synapse Analytics<\/strong>, allowing us to create our data warehouse structure in <strong>SSDT<\/strong> and control the CICD process.<\/p>\n<p style=\"padding-left: 30px\"><em>You can learn more about SSDT and Synapse Analytics on this link: <\/em><a href=\"https:\/\/cloudblogs.microsoft.com\/sqlserver\/2019\/11\/07\/new-in-azure-synapse-analytics-cicd-for-sql-analytics-using-sql-server-data-tools\/\"><em>https:\/\/cloudblogs.microsoft.com\/sqlserver\/2019\/11\/07\/new-in-azure-synapse-analytics-cicd-for-sql-analytics-using-sql-server-data-tools\/<\/em><\/a><\/p>\n<h3><strong>Azure Data Share<\/strong><\/h3>\n<p style=\"padding-left: 30px\"><strong><img loading=\"lazy\" decoding=\"async\" width=\"368\" height=\"304\" class=\"wp-image-86744\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-22.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><br \/>\nData Share<\/strong> is a new azure service which allows a company to share data with other companies.<\/p>\n<p style=\"padding-left: 30px\">Is it better than creating services for the other company?<\/p>\n<p style=\"padding-left: 30px\">Well, using <strong>Data Share<\/strong> we are manipulating data, sharing parts of our data lake without need to develop something. In the same way, the other company will receive data in order to input in their own system. Is this a benefit? Maybe.<\/p>\n<p style=\"padding-left: 30px\">Another possible reason to use\u00a0<strong>Data Share<\/strong> would be for governance: all shared pieces of data would be controlled in the same place, the data shares. However, it may only work well among companies using <strong>Azure<\/strong>, if the other company is using another cloud service, I believe data share may not work.<\/p>\n<p style=\"padding-left: 30px\"><em>You can read more about <\/em><strong><em>Data share<\/em><\/strong><em> on this link: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-share\/\"><em>https:\/\/docs.microsoft.com\/en-us\/azure\/data-share\/<\/em><\/a><\/p>\n<h3><strong>Big Data Cluster<\/strong><\/h3>\n<p style=\"padding-left: 30px\">Big Data Clusters are the most highlighted feature in <strong>SQL Server 2019<\/strong>. A big summary of what it is would be saying it&#8217;s a Microsoft implementation of a <strong>Hadoop<\/strong> and <strong>Data Lake<\/strong> solution.<\/p>\n<p style=\"padding-left: 30px\"><em>Would you like to know more about <\/em><strong><em>Hadoop<\/em><\/strong><em>? Check this link: <\/em><a href=\"https:\/\/hadoop.apache.org\/\"><em>https:\/\/hadoop.apache.org\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><em>Learn more about <\/em><strong><em>Data Lake<\/em><\/strong><em> on this link: <\/em><a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_lake\"><em>https:\/\/en.wikipedia.org\/wiki\/Data_lake<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\">\nLet&#8217;s simplify the explanation, step by step. <strong>Big Data Cluster<\/strong> is an implementation of a Cluster processing solution. This means this feature allows to create a set of servers, with a master node and slave nodes. The master node can receive jobs to process, break these jobs among the slave nodes and collect the results later, resulting the job being processed by many servers as a set, allowing <strong>SQL Server<\/strong> to process huge amounts of data.<\/p>\n<p style=\"padding-left: 30px\">In the past, this solution was built before using the name of <strong>Parallel Data Warehouse<\/strong>. <strong>PDW<\/strong> is a <strong>SQL Server<\/strong> appliance, but it can be easily understood as a different <strong>SQL Server<\/strong> edition. It&#8217;s a cluster for parallel processing, exactly as <strong>Big Data Cluster<\/strong>. What&#8217;s the difference?<\/p>\n<p style=\"padding-left: 30px\"><em>You can learn more about Parallel Data Warehouse on this link: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/analytics-platform-system\/parallel-data-warehouse-overview?view=aps-pdw-2016-au7\"><em>https:\/\/docs.microsoft.com\/en-us\/sql\/analytics-platform-system\/parallel-data-warehouse-overview?view=aps-pdw-2016-au7<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\">\nThe difference is in relation to the language and storage of the data. <strong>PDW<\/strong> is totally based on SQL architecture and language. On the other hand, we have for a long time an open architecture called <strong>Hadoop<\/strong> and some variations of it, such as <strong>Spark<\/strong>.<\/p>\n<p style=\"padding-left: 30px\">During the growth of <strong>Azure<\/strong> and the cloud environment, both solutions were created in the cloud. <strong>PDW<\/strong> was created as <strong>SQL Data Warehouse<\/strong> while Hadoop was created as <strong>HD Insight<\/strong>, which provides many flavours of <strong>Hadoop<\/strong>, such as <strong>Spark<\/strong> and more. It became a kind of a race to see which technology would conquer the heart of the market, parallel processing clusters based in SQL or based in <strong>Hadoop<\/strong>.<\/p>\n<p style=\"padding-left: 30px\"><em>Learn more about HD Insight on this link: <\/em><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/hdinsight\/\"><em>https:\/\/azure.microsoft.com\/en-us\/services\/hdinsight\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><strong><img loading=\"lazy\" decoding=\"async\" width=\"466\" height=\"328\" class=\"wp-image-86745\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-23.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><br \/>\nBig Data Cluster<\/strong> is a <strong>Hadoop-like<\/strong> solution which can be used in the on-premises environment. However, it&#8217;s way more than a simple <strong>Hadoop-like<\/strong> solution. Big Data clusters use <strong>HDFS Tiering<\/strong> to create <strong>Data Virtualization<\/strong> and allow us the creation of a <strong>Data Lake<\/strong>, an environment where the data doesn&#8217;t need to be moved from its source location in order to be processed.<\/p>\n<p style=\"padding-left: 30px\"><strong>HDFS tiering<\/strong> was also introduced to us during a sponsored breakfast organized by DELL. It\u2019s a feature linked to <strong>Big Data Cluster<\/strong> in <strong>SQL Server 2019<\/strong> which allow us to mount external storages into a single <strong>HDFS<\/strong> storage. This leads us to the <strong>Data Lake<\/strong> concept: instead of moving the data among storages and technologies, leave the data in its place and process the data where it already is.<\/p>\n<p style=\"padding-left: 30px\"><em>You can read more about <\/em><strong><em>HDFS tiering<\/em><\/strong><em> here: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/big-data-cluster\/hdfs-tiering?view=sql-server-ver15\"><em>https:\/\/docs.microsoft.com\/en-us\/sql\/big-data-cluster\/hdfs-tiering?view=sql-server-ver15<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><em>Learn more about <\/em><strong><em>Data Virtualization<\/em><\/strong><em> on this link: <\/em><a href=\"https:\/\/blogs.technet.microsoft.com\/machinelearning\/2017\/06\/21\/data-virtualization-unlocking-data-for-ai-and-machine-learning\/\"><em>https:\/\/blogs.technet.microsoft.com\/machinelearning\/2017\/06\/21\/data-virtualization-unlocking-data-for-ai-and-machine-learning\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\">Another huge difference is the use of <strong>Polybase<\/strong>. In order to reach the data sources, <strong>Big Data Cluster<\/strong> uses <strong>Polybase<\/strong>. However, that&#8217;s not the old <strong>Polybase<\/strong> we discovered in <strong>SQL Server 2016<\/strong>. That&#8217;s a new <strong>Polybase<\/strong> with support to many different data sources, such as Oracle and much more. <strong>Polybase<\/strong> uses its pushdown technology to delegate the processing of SQL predicates to the remote machine responsible for the data.<\/p>\n<p style=\"padding-left: 30px\">I still have a lot to understand and write about the evolution of <strong>Polybase<\/strong>, because the old <strong>Polybase<\/strong> in <strong>SQL Server 2016<\/strong> required a server configuration to set the technology to which the connection would be made, so one <strong>SQL Server<\/strong> instance would be tied with a single technology source for <strong>Polybase<\/strong>. Besides that, the pushdown technology was very difficult to configure with <strong>Hadoop<\/strong>. I believe all these have changed, resulting in the <strong>Big Data Clusters<\/strong>, an on-premise (or <strong>IaaS<\/strong>) solution for the creation of a <strong>Data Lake<\/strong>.<\/p>\n<p style=\"padding-left: 30px\"><em>Learn more about <\/em><strong><em>Polybase<\/em><\/strong><em> on this link: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/relational-databases\/polybase\/polybase-guide?view=sql-server-ver15\">https:\/\/docs.microsoft.com\/en-us\/sql\/relational-databases\/polybase\/polybase-guide?view=sql-server-ver15<\/a><\/p>\n<p style=\"padding-left: 30px\"><em>You can read more about <\/em><strong><em>Big Data Clusters<\/em><\/strong><em> on this link: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/big-data-cluster\/big-data-cluster-overview?view=sql-server-ver15\"><em>https:\/\/docs.microsoft.com\/en-us\/sql\/big-data-cluster\/big-data-cluster-overview?view=sql-server-ver15<\/em><\/a><\/p>\n<h3><strong>Azure Databricks<\/strong><\/h3>\n<p style=\"padding-left: 30px\"><strong><img loading=\"lazy\" decoding=\"async\" width=\"1447\" height=\"969\" class=\"wp-image-86746\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-cell-phone-description-automati-24.png\" alt=\"A screenshot of a cell phone\n\nDescription automatically generated\" \/><\/p>\n<p>\nDatabricks<\/strong> is <strong>Spark<\/strong> with a new name. This was everything I knew about <strong>Databricks<\/strong> before the <strong>PASS Summit<\/strong> and this leaves a lot of questions.<\/p>\n<p style=\"padding-left: 30px\">Since we already had Spark working as one of the flavours of <strong>HDInsight<\/strong>, why we need <strong>Databricks<\/strong>?<\/p>\n<p style=\"padding-left: 30px\">I got two answers for that:<\/p>\n<p style=\"padding-left: 30px\">A) <strong>Databricks<\/strong> is a more complete spark, since it loads a more complete set of modules by default than the <strong>Spark<\/strong> we can provision with <strong>HDInsight<\/strong>.<\/p>\n<p style=\"padding-left: 30px\">B) <strong>Databricks<\/strong> doesn&#8217;t have the provision problem. <strong>Spark<\/strong> in <strong>HDInsight<\/strong> needs to be provisioned, used and deleted because the cost to have the <strong>Spark<\/strong> servers provisioned is huge. <strong>Databricks<\/strong>, on the other hand, doesn&#8217;t have this problem, it charges you for the time you use it.<\/p>\n<p style=\"padding-left: 30px\"><em>You can learn about this provisioning challenge on this link: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/v1\/data-factory-compute-linked-services#azure-hdinsight-on-demand-linked-service\"><em>https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/v1\/data-factory-compute-linked-services#azure-hdinsight-on-demand-linked-service<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\">I loved the second explanation, and this leads us to the next question. However, I would find later this was an incomplete answer. The correct answer, but incomplete.<\/p>\n<p style=\"padding-left: 30px\">Since <strong>Databricks<\/strong> doesn&#8217;t have the provisioning problem anymore, it&#8217;s competing with <strong>Azure Data Lake<\/strong>, isn&#8217;t it? Aren&#8217;t both doing the same thing?<\/p>\n<p style=\"padding-left: 30px\">The unofficial answer I received was: Yes, they are. However, <strong>Azure Data Lake<\/strong> its language, <strong>U- SQL<\/strong>, are dead, you shouldn&#8217;t build anything new with them<\/p>\n<p style=\"padding-left: 30px\">First, it\u2019s important for you to mind that in this context I&#8217;m not talking about <strong>Data Lake Storage<\/strong>, which is alive and kicking. I&#8217;m talking about <strong>Azure Data Lake<\/strong> and its language, <strong>U-SQL<\/strong>, which were a cluster processing solution on-demand, without the provisioning challenges. At some point, <strong>Azure Data Lake<\/strong> started to be called <strong>Azure Data Lake Analytics<\/strong>.<\/p>\n<p style=\"padding-left: 30px\"><em> You can read more about <\/em><strong><em>Azure Data Lake Analytics<\/em><\/strong><em> here: <\/em><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/data-lake-analytics\/\"><em>https:\/\/azure.microsoft.com\/en-us\/services\/data-lake-analytics\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><em>Learn more about <\/em><strong><em>U-SQL<\/em><\/strong><em> language on this link: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/u-sql\/\"><em>https:\/\/docs.microsoft.com\/en-us\/u-sql\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\">\nAnother thing you should mind is that this is an unofficial answer. If you will consider this or not, it&#8217;s your choice, according how usually unofficial answers from people close to Microsoft are proven to be true sometime later.<\/p>\n<p style=\"padding-left: 30px\">These are not the only advantages of <strong>Databricks<\/strong>. The demonstrations during the sessions highlighted how <strong>Databricks<\/strong> keeps a record of machine learning training models in such a way we can identify a history of improvement according we work in our ML models.<\/p>\n<p style=\"padding-left: 30px\">It&#8217;s not clear how much would this be linked to <strong>Azure Machine Learning<\/strong>, but it doesn&#8217;t seem to be a kind of link such as <strong>Synapse Analytics<\/strong> has with <strong>Power BI<\/strong>.<\/p>\n<p style=\"padding-left: 30px\">Databricks also has an improved storage which received its own name: <strong>Delta Lake<\/strong>.<\/p>\n<p style=\"padding-left: 30px\"><em>You can read more about <\/em><strong><em>Databricks<\/em><\/strong><em> on this link: <\/em><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/databricks\/\"><em>https:\/\/azure.microsoft.com\/en-us\/services\/databricks\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><em>Learn more about <\/em><strong><em>Delta Lake<\/em><\/strong><em> on this link: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/databricks\/delta\/delta-intro\"><em>https:\/\/docs.microsoft.com\/en-us\/azure\/databricks\/delta\/delta-intro<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><em>Get more information about <\/em><strong><em>Azure Machine Learning<\/em><\/strong><em>: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/machine-learning\/overview-what-is-azure-ml\"><em>https:\/\/docs.microsoft.com\/en-us\/azure\/machine-learning\/overview-what-is-azure-ml<\/em><\/a><\/p>\n<h3><strong>Power BI <\/strong><\/h3>\n<p style=\"padding-left: 30px\">One of the main highlights for <strong>Power BI<\/strong> was the <strong>Data Protection and Governance<\/strong> new features. This show how <strong>Microsoft<\/strong> is engaged in turning <strong>Power BI<\/strong> into an enterprise product.<\/p>\n<p style=\"padding-left: 30px\">In summary, <strong>Power BI<\/strong> will have a link with <strong>Sensitive Labels<\/strong> already used in <strong>Microsoft Office<\/strong> in order to classify information. Users will be able to classify the information as confidential and other levels and this new feature comes with additional features for monitoring, permissions, governance and so on.<\/p>\n<p style=\"padding-left: 30px\"><em>You can learn more about <\/em><strong><em>Sensitive Labels<\/em><\/strong><em> on this link: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/microsoft-365\/compliance\/sensitivity-labels\"><em>https:\/\/docs.microsoft.com\/en-us\/microsoft-365\/compliance\/sensitivity-labels<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\">Well, I&#8217;m not so sure if this will become something usual, <strong>Sensitive Labels<\/strong> require an additional level of management and planning in the organization, I don&#8217;t see many clients using this feature.<\/p>\n<p style=\"padding-left: 30px\"><em>You can read more about these features here: <\/em><a href=\"https:\/\/powerbi.microsoft.com\/en-us\/blog\/announcing-new-data-protection-capabilities-in-power-bi\/\"><em>https:\/\/powerbi.microsoft.com\/en-us\/blog\/announcing-new-data-protection-capabilities-in-power-bi\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\">In some ways linked to this, but way more important is the <strong>Data Lineage<\/strong> feature, an important feature for data warehouse environments.<\/p>\n<p style=\"padding-left: 30px\">In the past, Microsoft tried to include this feature inside <strong>ETL<\/strong> tools, it was when <strong>SSIS<\/strong> was still called <strong>DTS<\/strong> (ops, this may reveal my age!). It didn&#8217;t get the attention it deserves, in my opinion not because the feature was bad, but because usually data lineage, although very important, is way down in the list of concerns when building a data warehouse and many people working on this don&#8217;t even know exactly what data lineage is and why it&#8217;s important.<\/p>\n<p style=\"padding-left: 30px\">As the name stands for, the <strong>Data Lineage<\/strong> feature keeps track of the source data used to produce a report or dashboard. This feature can save your job when, after building a very expensive data warehouse, two dashboards built by different users show opposite information about the company business. Using the <strong>Data Lineage<\/strong>, you can track the information used by the users and identify why the dashboards are different, understanding each dashboard point of view.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1378\" height=\"895\" class=\"wp-image-86747\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/03\/a-screenshot-of-a-computer-description-automatica-3.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p style=\"padding-left: 30px\">So, will this feature work this time? I hope so.<\/p>\n<p style=\"padding-left: 30px\"><em>More about Power BI Data Lineage: <\/em><a href=\"https:\/\/docs.microsoft.com\/en-us\/power-bi\/service-data-lineage\"><em>https:\/\/docs.microsoft.com\/en-us\/power-bi\/service-data-lineage<\/em><\/a><em><br \/>\n<\/em><\/p>\n<p style=\"padding-left: 30px\">Now, let&#8217;s talk about the most surprising news about Power BI: the fact Microsoft is moving Power BI towards a position to replace Azure Analysis Service. Yes, replace Azure Analysis Services.<\/p>\n<p style=\"padding-left: 30px\">In order to better understand this, let\u2019s go back in time. <strong>SQL Server Analysis Services<\/strong> (<strong>SSAS<\/strong>) is an on-premises server built to create <strong>Semantic Models<\/strong> as <strong>Data Marts<\/strong>.<\/p>\n<p style=\"padding-left: 30px\"><strong>Data Mart<\/strong> is a focused piece of the Data Warehouse which can be distributed through the company. That\u2019s why <strong>Data Marts<\/strong> are usually built as <strong>Semantic Models<\/strong>, a model which can be easily understood by business people, less complex than the entire <strong>Data Warehouse<\/strong><\/p>\n<p style=\"padding-left: 30px\"><em>On this link, you will find more information about this move: <\/em><a href=\"https:\/\/powerbi.microsoft.com\/en-us\/blog\/power-bi-premium-and-azure-analysis-services\/\"><em>https:\/\/powerbi.microsoft.com\/en-us\/blog\/power-bi-premium-and-azure-analysis-services\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\">Before mentioning the most surprising news about <strong>Power BI<\/strong>, let&#8217;s analyse the current relationship between <strong>SSAS<\/strong> and <strong>Power BI<\/strong>.<\/p>\n<p style=\"padding-left: 30px\"><strong>SSAS<\/strong> is a tool which allows us to create a semantic model for our <strong>Data Warehouse<\/strong>. According to the data warehouse architecture, this is a <strong>Data Mart<\/strong>, a focused piece of the entire <strong>DW<\/strong>. The <strong>SSAS<\/strong> data mart is not only focused, but it&#8217;s a semantic model, meaning it&#8217;s built to be self-explaining to businesspeople. One of the best client tools for <strong>SSAS<\/strong> is <strong>Excel<\/strong>, once connected to <strong>SSAS<\/strong>, the data mart is exposed as a cube, allowing the business user to create pivot table reports using the measures they need at the moment, in a tool they are already used to.<\/p>\n<p style=\"padding-left: 30px\"><strong>SSAS<\/strong> supports two types of models: Multi-dimensional and Tabular. They have many differences between them, but these differences are disappearing with the evolution of the tool, although we can\u2019t say yet they have the same features.<\/p>\n<p style=\"padding-left: 30px\">In order to better understand the difference between the two models, you can follow the two articles in which I illustrate how to build the same data mart with each of the models, highlighting the differences.<\/p>\n<p style=\"padding-left: 30px\"><em>Multi-dimensional model: <\/em><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/creating-a-date-dimension-in-an-analysis-services-ssas-cube\/\"><em>https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/creating-a-date-dimension-in-an-analysis-services-ssas-cube\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><em>Tabular model: <\/em><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/creating-a-date-dimension-in-a-tabular-model\/\"><em>https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/creating-a-date-dimension-in-a-tabular-model\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><em>You can learn more about <\/em><strong><em>SSAS<\/em><\/strong><em> semantic model on this link: <\/em><a href=\"https:\/\/blogs.msdn.microsoft.com\/microsoft_press\/2017\/10\/23\/designing-a-multidimensional-bi-semantic-model\/\"><em>https:\/\/blogs.msdn.microsoft.com\/microsoft_press\/2017\/10\/23\/designing-a-multidimensional-bi-semantic-model\/<\/em><\/a><\/p>\n<p style=\"padding-left: 30px\"><strong>Azure Analysis Services<\/strong> only implements the tabular model, not the multi-dimensional model. This creates questions about the future of the multi-dimensional model, although the differences\/limitations of the tabular in relation to the multi-dimensional. For example, I\u2019m not a fan of the tabular limitation to have only one active relationship between two tables. What do you think about? Let\u2019s talk more in the comments.<\/p>\n<p style=\"padding-left: 30px\"><strong>Power BI<\/strong>, on the other hand, was created to be a <strong>Visualization<\/strong> and <strong>Self-Service BI<\/strong> tool. The idea is to allow the business users to access the corporate <strong>Data Warehouse<\/strong>\/<strong>Data Mart<\/strong> and merge it with any additional information the business needs, even information collected from the web. In this way, the user doesn&#8217;t need to make special requests to the corporate IT department, avoiding delays and overwhelming the IT department. As part of the <strong>Self-Service BI<\/strong> concept, <strong>Power BI<\/strong> also offers <strong>Self-Service ETL<\/strong> using <strong>Power Query<\/strong> and, more recently, <strong>Power BI Dataflows<\/strong>, allowing even a business user to create <strong>ETL<\/strong> code to retrieve the needed information.<\/p>\n<p style=\"padding-left: 30px\"><strong>Power BI<\/strong> was born from <strong>Excel<\/strong> plug-ins. Each part of Power BI was a different <strong>Excel<\/strong> plug-in, such as <strong>Power Query<\/strong> and <strong>Power Pivot<\/strong>. <strong>Power Pivot<\/strong>, on the other hand, was created from the same tabular engine than <strong>SSAS<\/strong>, just with some additional size limitations to fit into <strong>Excel<\/strong>.<\/p>\n<p style=\"padding-left: 30px\">Therefore, <strong>Power BI<\/strong>, <strong>SSAS<\/strong> and <strong>Azure Analysis Services<\/strong> share the same core engine for the tabular model and Microsoft is working to eliminate the differences. It makes sense the idea of replacing <strong>Azure Analysis Services<\/strong> with <strong>Power BI<\/strong>, leaving only two questions without a clear answer:<\/p>\n<ol>\n<li>The complete set of tabular model features is only available in Power BI Premium, which is expensive. Power BI Pro has limitations which prevent the development of a complete tabular semantic model. Analysis Services, on the other hand, has SSAS as an on-premises option, allowing the company to start the development on the correct way and scale up for bigger and cloud editions in the future. How to solve this dilemma?<\/li>\n<li>Will the tabular model, an in-memory solution, really replace the multi-dimensional model, which could store pre-calculations of many combinations of the dimensions?<\/li>\n<\/ol>\n<p style=\"padding-left: 30px\"><em>On this link, you will find more information about the replacement move between power bi premium and Azure Analysis Services: <\/em><a href=\"https:\/\/powerbi.microsoft.com\/en-us\/blog\/power-bi-premium-and-azure-analysis-services\/\"><em>https:\/\/powerbi.microsoft.com\/en-us\/blog\/power-bi-premium-and-azure-analysis-services\/<\/em><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><strong>Conclusion<\/strong><\/h3>\n<p>2020 is starting full of new and exciting technologies for <strong>Data Platform<\/strong> in the cloud, it\u2019s time to recycle our knowledge.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>From last November 4 to 8 I have been at\u00a0PASS Summit in Seattle, attending excellent technical sessions and having the honour to deliver a\u00a0 lightning session, which is a small 10 minutes presentation. Why should you read this? Being one of the biggest data events in the world, PASS Summit usually exposes the present and&#8230;&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[],"coauthors":[6810],"class_list":["post-86736","post","type-post","status-publish","format-standard","hentry","category-blogs"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/86736","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=86736"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/86736\/revisions"}],"predecessor-version":[{"id":86785,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/86736\/revisions\/86785"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=86736"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=86736"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=86736"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=86736"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}