{"id":94424,"date":"2022-05-23T17:00:49","date_gmt":"2022-05-23T17:00:49","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=94424"},"modified":"2024-09-03T20:15:19","modified_gmt":"2024-09-03T20:15:19","slug":"power-bi-when-a-power-query-native-query-is-not-enough","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/power-bi-when-a-power-query-native-query-is-not-enough\/","title":{"rendered":"Power BI: When a Power Query Native Query is not enough"},"content":{"rendered":"<p>In <strong>Power BI<\/strong>, when importing data with <strong>Power Query<\/strong>, one basic performance concept is the use of native queries. The transformations will perform better if they can be converted to a native query, especially a single native query for all transformations.<\/p>\n<p>However, this is just a starting point for the optimizations. Sometimes, native queries for the transformations are not enough.<\/p>\n<h2>Test Environment<\/h2>\n<p>This will be the starting point:<\/p>\n<ul>\n<li>An <strong>Azure SQL Database<\/strong> using the sample <em>AdventureWorskLT<\/em><\/li>\n<li>You need to execute the script <em>Make_big_adventure.SQL<\/em> adapted for the <em>AdventureWorksLT<\/em>. You can find it on <a href=\"https:\/\/github.com\/DennesTorres\/BigAdventureAndQSHints\/blob\/main\/make_big_adventureLT.sql\">https:\/\/github.com\/DennesTorres\/BigAdventureAndQSHints\/blob\/main\/make_big_adventureLT.sql<\/a><\/li>\n<li>Recommendation: The <strong>Azure SQL Database<\/strong> is recommended to have 10 DTU\u2019s or more. Less than that and some slowness may be noticed<\/li>\n<li>In Power BI we will import the tables <em>BigProduct<\/em>, <em>BigTransactionHistory<\/em> and <em>SalesLT.ProductModel<\/em><\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"263\" height=\"144\" class=\"wp-image-94426\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-text-application-descr-14.png\" alt=\"Graphical user interface, text, application\n\nDescription automatically generated\" \/><\/p>\n<h2>Transformations and a Date Dimension<\/h2>\n<p>A model needs a date dimension. Every fact happens on a date and the date is an important dimension to analyse the fact. In this example, the <em>TransactionDate<\/em> column is found in the <em>TransactionHistory<\/em> table.<\/p>\n<p>Why is the <em>TransactionDate<\/em> field is not enough, you may ask.<\/p>\n<p>When analysing the facts, it might be analysed by Year, Month, Day, Day of the week, and much more. If relying only on the <em>TransactionDate<\/em> field, you will need to create <strong>DAX<\/strong> measures, and this would impact the performance of your model.<\/p>\n<p>Building a date dimension, you will not have the need to build so many <strong>DAX<\/strong> expressions and the model will have better performance.<\/p>\n<p>We can build a dynamic date dimension, retrieving the minimum and maximum date from the <em>TransactionHistory<\/em> table for that. That&#8217;s where our problems start.<\/p>\n<h2>The Wrong way<\/h2>\n<p>1) On <em>TransactionHistory<\/em> table, select the column <em>TransactionDate<\/em><\/p>\n<p>2) Change the <em>Data Type<\/em> of the column to <em>Date<\/em><\/p>\n<p>If you need to handle time in your model, date and time needs to be two different dimensions and two different fields in the fact table. A time dimension will have only 24 rows if built with hour granularity, 1440 if built with minute granularity and so on. On the other hand, if Date and Time were managed as a single dimension, we would have 1440 rows for each day, or something similar. That&#8217;s why date and time needs to be different dimensions.<\/p>\n<p>In our example, we don&#8217;t really have time information. We will just ignore the time by changing the data type to date.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"541\" height=\"386\" class=\"wp-image-94427\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/table-description-automatically-generated-9.png\" alt=\"Table\n\nDescription automatically generated\" \/><\/p>\n<p>3) Righ-click the <em>TransactionHistory<\/em> table<\/p>\n<p>4) Select the Reference menu item<\/p>\n<p>We can duplicate the <em>TransactionHistory<\/em> query or make a reference to it. Let&#8217;s start with a reference and understand the consequences later.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"362\" height=\"506\" class=\"wp-image-94428\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-word-descr-6.png\" alt=\"Graphical user interface, application, Word\n\nDescription automatically generated\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-word-descr-6.png 362w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-word-descr-6-215x300.png 215w\" sizes=\"auto, (max-width: 362px) 100vw, 362px\" \/><\/p>\n<p>5) On the new query, select the <em>TransactionDate<\/em> column<\/p>\n<p>6) Click the <em>Remove Other Columns<\/em> menu item<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"329\" height=\"280\" class=\"wp-image-94429\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-table-desc-2.png\" alt=\"Graphical user interface, application, table\n\nDescription automatically generated\" \/><\/p>\n<p>7) Click the button on the right side of the <em>TransactionDate <\/em>column header<\/p>\n<p>8) Click the <em>Sort Ascending<\/em> menu item<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"445\" height=\"500\" class=\"wp-image-94430\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-description-25.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<p>9) Click the menu option <em>Keep Top Rows<\/em><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"736\" height=\"313\" class=\"wp-image-94431\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-word-descr-7.png\" alt=\"Graphical user interface, application, Word\n\nDescription automatically generated\" \/><\/p>\n<p>10) On the <em>Keep Top Rows<\/em> window, type <em>&#8220;1&#8221;<\/em> to keep only 1 row<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"875\" height=\"273\" class=\"wp-image-94432\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-description-automaticall-4.png\" alt=\"Graphical user interface\n\nDescription automatically generated\" \/><\/p>\n<p>11) Right click the value of the row and click the menu item <em>Drill down<\/em><\/p>\n<p>Even with a single row and field, the result of the query is still a table. We need to transform it to a single value to use it as a parameter for the function we will build next.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"443\" height=\"253\" class=\"wp-image-94433\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-text-application-descr-15.png\" alt=\"Graphical user interface, text, application\n\nDescription automatically generated\" \/><\/p>\n<p>12) Right-click the <em>&#8220;TransactionHistory (2)&#8221;<\/em> table and disable the option Enable Load<\/p>\n<p>We don&#8217;t need this value to be part of the model. But if we leave the load enabled, a new step will be created in the end of the query to transform it into a table, and it will end up failing.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"349\" height=\"460\" class=\"wp-image-94434\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-description-26.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<p>13) Rename the <em>&#8220;TransactionHistory (2)&#8221;<\/em> table to <em>MinDate<\/em><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"329\" height=\"341\" class=\"wp-image-94435\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-description-27.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<p>14) Repeat the steps 4-12, but now sorting in descending order<\/p>\n<p>15) Rename the new table to <em>MaxDate<\/em><\/p>\n<h2>Power Query: The Problem<\/h2>\n<p>If you right click the Keep Top Rows step of the <em>MinDate<\/em> query, you may notice the <em>View Native Query<\/em> is active. This option is only disabled on the <em>Drill Down<\/em> to the <em>TransactionDate<\/em> field.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"312\" height=\"610\" class=\"wp-image-94436\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-description-28.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<p>All the most expensive steps were transformed into a native query. A superficial view would make us believe the transformations are as optimized as possible, but that&#8217;s not true at all.<\/p>\n<p>The query below is the native query built by <strong>Power BI<\/strong>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"867\" height=\"720\" class=\"wp-image-94437\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-text-application-descr-16.png\" alt=\"Graphical user interface, text, application\n\nDescription automatically generated\" \/><\/p>\n<p>Let&#8217;s analyse the execution plan. We can copy the query from <strong>Power BI<\/strong> to <strong>SSMS<\/strong> and check the estimated execution plan. As you may notice, this execution plan is terrible:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1114\" height=\"187\" class=\"wp-image-94438\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/a-picture-containing-word-description-automatical.png\" alt=\"A picture containing Word\n\nDescription automatically generated\" \/><\/p>\n<ul>\n<li>It&#8217;s using a table scan, there is no index for this execution plan<\/li>\n<li>There is a Sort operation. Sort operations in execution plans are very heavy and should be avoided at all costs.<\/li>\n<\/ul>\n<p>The first idea would be to create an index based on the <em>TransactionDate<\/em>, the column used in the transformations. The Create Index statement would be like this one:<\/p>\n<div><span style=\"font-family: Courier New;font-size: 10pt\"> <span style=\"color: blue\">CREATE<\/span>\u00a0<span style=\"color: blue\">NONCLUSTERED<\/span>\u00a0<span style=\"color: blue\">INDEX<\/span>\u00a0<span style=\"color: maroon\">[inddate]<\/span> <br \/>\n\u00a0\u00a0<span style=\"color: blue\">ON<\/span>\u00a0<span style=\"color: maroon\">[dbo]<\/span><span style=\"color: silver\">.<\/span><span style=\"color: maroon\">[bigTransactionHistory]<\/span>\u00a0<span style=\"color: maroon\">(<\/span>\u00a0<span style=\"color: maroon\">[transactiondate]<\/span>\u00a0<span style=\"color: blue\">ASC<\/span>\u00a0<span style=\"color: maroon\">)<\/span> <\/p>\n<p><span style=\"color: maroon\">go<\/span>\u00a0 <\/span><\/div>\n<p>After creating the index, this will be the new query plan:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1109\" height=\"170\" class=\"wp-image-94439\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-description-29.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<p>The table scan was replaced by an <strong>Index Scan<\/strong>, but the <strong>Sort<\/strong> operation is still present, and you may notice it takes 95% of the query cost.<\/p>\n<p>You may ask why the <strong>Sort<\/strong> was not solved by the index itself. If you check the query, you may notice the inner queries use a <em>Convert<\/em> function over the <em>TransactionDate<\/em> field to transform it to the <em>Date<\/em> type.<\/p>\n<p>The <strong>Order By<\/strong> is executed over the result of the <em>Convert<\/em>, so it can&#8217;t use the index. The <em>Convert <\/em>function needs to be executed first and the result needs to be ordered.<\/p>\n<p><strong>In Summary:<\/strong> The order of the transformations is affecting the query performance. If the data type were one of the last transformations, the query plan could be better. But before reaching the solution, we will need to solve another problem.<\/p>\n<h2>Reference vs Duplicate<\/h2>\n<p>The data type transformation is located on the <em>TransactionHistory<\/em> table. The queries to calculate the <em>MaxDate<\/em> and <em>MinDate<\/em> have reference to the <em>TransactionHistory<\/em> query, so they all contain the data type conversion.<\/p>\n<p>We could think about removing the data type conversion from the <em>TransactionHistory <\/em>query, but this would not work very well.<\/p>\n<p>On the result, the <em>TransactionHistory<\/em> table will need to be linked with the date dimension. Both date fields will need to have the same data type, so the <em>TransactionHistory<\/em> query will need the data type transformation.<\/p>\n<p>The solution for this problem is to use duplicate, instead of reference. If we duplicate the <em>TransactitonHistory<\/em> query before applying the data type transformation, we will have control of the data type transformation on the <em>MinDate<\/em> and <em>MaxDate<\/em> query and we will still be able to apply the same data type transformation on the <em>TransactionHistory<\/em> without affecting the other ones.<\/p>\n<p>This is a very interesting example because we can clearly see the difference between <em>Reference<\/em> and <em>Duplicate<\/em> of a query and this example will only have good performance if we duplicate the query.<\/p>\n<p>But when duplicating the query, aren&#8217;t we multiplying the execution time? If the queries are completely transformed in different native queries, the &#8220;duplication&#8221; of the execution time would happen anyway but isolating the queries with the Duplicate option we can optimize each one to make them faster.<\/p>\n<p><strong>In summary<\/strong>, on our example the secret is duplicate the <em>TransactionHistory<\/em> before changing the data type, implement each of the duplications, leaving the change of the data type for last and finally changing the data type of the <em>TransactionDate<\/em> field in the <em>TransactionHistory<\/em> query.<\/p>\n<h2>The Result<\/h2>\n<p>The sequence of the tasks is different, we leave the change data type and drill down for last. They will be executed over a single value and will not become part of the native query.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"328\" height=\"641\" class=\"wp-image-94440\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-description-30.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<p>The native query is simplified, without the type conversion.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"874\" height=\"264\" class=\"wp-image-94441\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-text-application-email-13.png\" alt=\"Graphical user interface, text, application, email\n\nDescription automatically generated\" \/><\/p>\n<p>This makes a way better query plan, making a good use of the index for the transformations and making the result way faster<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"538\" height=\"183\" class=\"wp-image-94442\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/text-description-automatically-generated-with-med.png\" alt=\"Text\n\nDescription automatically generated with medium confidence\" \/><\/p>\n<h2>After Match<\/h2>\n<p>After analysing and solving these performance problems, let&#8217;s complete the example creating the date dimension.<\/p>\n<p>We can use a function written in <strong>M<\/strong> by Chris Web. You can find the function on this link <a href=\"https:\/\/blog.crossjoin.co.uk\/2013\/11\/19\/generating-a-date-dimension-table-in-power-query\/\">https:\/\/blog.crossjoin.co.uk\/2013\/11\/19\/generating-a-date-dimension-table-in-power-query\/<\/a><\/p>\n<ol>\n<li value=\"16\">On the top menu, click the button <em>New Query-&gt; Blank Query<\/em><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"185\" height=\"591\" class=\"wp-image-94443\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/diagram-description-automatically-generated-with-1.png\" alt=\"Diagram\n\nDescription automatically generated with medium confidence\" \/><\/p>\n<ol>\n<li value=\"17\">Click on the menu <em>View-&gt;Advanced Editor<\/em><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"516\" height=\"151\" class=\"wp-image-94444\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-table-desc-3.png\" alt=\"Graphical user interface, application, table\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li value=\"18\">In the <em>Advanced Editor<\/em> window, paste the query copies from the Chris Webb blog<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1026\" height=\"720\" class=\"wp-image-94445\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/text-description-automatically-generated-1.png\" alt=\"Text\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li value=\"19\">Click the <em>Done<\/em> button<\/li>\n<li>Rename the function to <em>BuildDateDimension<\/em><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"260\" height=\"228\" class=\"wp-image-94446\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-description-31.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li value=\"21\">On the top menu, click the button <em>New Query-&gt; Blank Query<\/em><\/li>\n<li>Click on the menu <em>View-&gt;Advanced Editor<\/em><\/li>\n<li>Add the following Query:<\/li>\n<\/ol>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff;overflow: auto;width: auto;border: solid gray;border-width: .1em .1em .1em .8em;padding: .2em .6em\">\n<pre style=\"margin: 0;line-height: 125%\">let\r\n  Source <span style=\"color: #333333\">=<\/span> BuildDateDimension(MinDate,MaxDate)\r\n<span style=\"color: #000000;font-weight: bold\">in<\/span>\r\n  Source\r\n<\/pre>\n<\/div>\n<ol>\n<li value=\"24\">Click the <em>Done<\/em> button<\/li>\n<li>Rename the query to <em>DateDim<\/em><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"257\" height=\"261\" class=\"wp-image-94447\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/05\/graphical-user-interface-application-word-descr-8.png\" alt=\"Graphical user interface, application, Word\n\nDescription automatically generated\" \/><\/p>\n<h2>Lessons Learned<\/h2>\n<ul>\n<li>It&#8217;s important to know <strong>SQL Server<\/strong> and query optimization to work with Power BI<\/li>\n<li>Sometimes the optimization is beyond Power BI, it&#8217;s on the source system<\/li>\n<li>In Power bi ELT&#8217;s, if you make table level transformations and filters first and leave column level transformations for last, the native queries may be easier to optimize<\/li>\n<li>You need to take care with the decision between Reference and Duplicate<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In Power BI, when importing data with Power Query, one basic performance concept is the use of native queries. The transformations will perform better if they can be converted to a native query, especially a single native query for all transformations. However, this is just a starting point for the optimizations. Sometimes, native queries for&#8230;&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":94428,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,159166],"tags":[101611,143554,5993],"coauthors":[6810],"class_list":["post-94424","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blogs","category-powerbi","tag-power-bi","tag-power-query","tag-query-plans"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/94424","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=94424"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/94424\/revisions"}],"predecessor-version":[{"id":104260,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/94424\/revisions\/104260"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/94428"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=94424"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=94424"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=94424"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=94424"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}