{"id":89544,"date":"2021-01-18T22:09:15","date_gmt":"2021-01-18T22:09:15","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=89544"},"modified":"2024-09-03T20:15:21","modified_gmt":"2024-09-03T20:15:21","slug":"power-bi-reading-parquet-from-a-data-lake","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/power-bi-reading-parquet-from-a-data-lake\/","title":{"rendered":"Power BI reading Parquet from a Data Lake"},"content":{"rendered":"<p>Data Lakes are becoming more usual every day and the need for tools to query them also increases.<\/p>\n<p>While writing about querying a data lake using <strong>Synapse<\/strong>, I stumbled upon a <strong>Power BI<\/strong> feature I didn\u2019t know was there.<\/p>\n<p>When reading from a data lake, each folder is like a table. We store in the folder many files with the same structure, each file containing a piece of the data.<\/p>\n<p>Data Lake tools are prepared to deal with the data on this way and read the files transparently for the user, but <strong>Power BI<\/strong> required us to read one specific file, not the folder. That&#8217;s until last November. If we google (<em>verb: To google<\/em>) about <strong>Power BI<\/strong> and Parquet files we can find many work arounds to read Parquet files in Power BI, but no mention to the new Parquet connector released on last November (<a href=\"https:\/\/powerbi.microsoft.com\/en-us\/blog\/whats-new-in-power-query-dataflows-november-2020\/\">https:\/\/powerbi.microsoft.com\/en-us\/blog\/whats-new-in-power-query-dataflows-november-2020\/<\/a>), so I had to write about it.<\/p>\n<p>The feature I\u2019m illustrating on this article is in fact a combination of two features:<\/p>\n<ul>\n<li>The feature to combine multiple files from Azure Data Lake Gen 2 storage. This was in preview in October 2019 in is available for a while, but I was surprised I couldn\u2019t find any article really explaining the M code used to combine the files and how to customize the code.<\/li>\n<li>The Parquet connector is the responsible to read Parquet files and adds this feature to the Azure Data Lake Gen 2. This connector was released in November 2020.<\/li>\n<\/ul>\n<p>In order to illustrate how it works, I provided some files to be used in an Azure Storage. You can download the files here. You will also need to <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/storage\/common\/storage-account-create?tabs=azure-portal&amp;WT.mc_id=DP-MVP-4014132\">provision a new storage account<\/a> and it will need to be an Azure Data Lake Storage Gen 2.<\/p>\n<p>On the examples, I will use the address <a href=\"https:\/\/lakedemo.dfs.core.windows.net\/opendatalake\/trips\">https:\/\/lakedemo.dfs.core.windows.net\/opendatalake\/trips<\/a> for the storage, but you need to replace it with the DFS endpoint of your own storage.<\/p>\n<p>Let\u2019s make a step-by step:<\/p>\n<ol>\n<li>Open <strong>Power BI<\/strong><\/li>\n<li>Select <strong>Get Data<\/strong> option on the main screen<\/li>\n<li>Select <strong>Azure Data Lake Storage Gen2<\/strong>. We will test directly with one of the most efficient options<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"410\" height=\"452\" class=\"wp-image-89545\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/word-image-62.png\" \/><\/p>\n<p>There are 3 storage options:<\/p>\n<ul>\n<li>Azure Blob Storage<\/li>\n<li>Data Lake Storage Gen 1<\/li>\n<li>Azure Data Lake Storage Gen 2<\/li>\n<\/ul>\n<p>It\u2019s important to choose the correct option according your storage type, this affects the performance.<\/p>\n<ol>\n<li>On the URL box, type this URL: <a href=\"https:\/\/lakedemo.dfs.core.windows.net\/opendatalake\/trips\">https:\/\/lakedemo.dfs.core.windows.net\/opendatalake\/trips<\/a><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"310\" class=\"wp-image-89546\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/graphical-user-interface-text-application-descr.png\" alt=\"Graphical user interface, text, application\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li>Click <strong>Ok<\/strong> button<\/li>\n<li>Click the <strong>Combine<\/strong> button<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"590\" height=\"452\" class=\"wp-image-89547\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/graphical-user-interface-text-application-descr-1.png\" alt=\"Graphical user interface, text, application\n\nDescription automatically generated\" \/><\/p>\n<p>This screen has the traditional <strong>Transform<\/strong> and <strong>Load<\/strong> buttons but also has the <strong>Combine<\/strong> button, which has both options, <strong>Transform<\/strong> and <strong>Load<\/strong>, below it.<\/p>\n<p>The traditional <strong>Transform<\/strong> and <strong>Load<\/strong> will be dealing with the list of files inside the Azure Storage folder. From this point, it will be our decision what to do with each file.<\/p>\n<p>The Combine button, on the other hand, will bring to us a pre-built M script to combine all the files in the folder. It\u2019s easy to mistake this feature believing Power BI will read only the current files, but in fact the script is flexible in such a way to read all the files in the folder, even future files included there.<\/p>\n<ol>\n<li>Select the option <strong>Combine &amp; Transform<\/strong><\/li>\n<li>Click <strong>Transform Data<\/strong> button<\/li>\n<\/ol>\n<h2>The M Code \u2013 How it Works<\/h2>\n<p>On <strong>Power Query<\/strong> window, you may notice the pre-built steps in the <strong>Applied Steps<\/strong> window. It\u2019s also very interesting the way the queries were built: The final query is in a folder called <em>Other Queries<\/em> while you also have a folder called <em>Helper Function<\/em> containing a parameterized function.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"304\" height=\"326\" class=\"wp-image-89548\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/graphical-user-interface-application-word-descr.png\" alt=\"Graphical user interface, application, Word\n\nDescription automatically generated\" \/><\/p>\n<p>Let\u2019s analyze the M code to better understand how it works. Using the menu <strong>View<\/strong>-&gt; <strong>Advanced Editor<\/strong> you can access the M code.<\/p>\n<p>This is how our M code looks like:<\/p>\n<div class=\"m-result\">\n<div class=\"m-code\"><span class=\"constant keyword\" style=\"color: #0000ff\">let<\/span><br \/>\n\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">Source<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">AzureStorage.DataLake<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><span class=\"literal string\" style=\"color: #a31515\">&#8220;https:\/\/lakedemo.dfs.core.windows.net\/opendatalake\/trips&#8221;<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Filtered Hidden Files1&#8243;<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.SelectRows<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><span class=\"identifier\" style=\"color: #000000\">Source<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"constant keyword\" style=\"color: #0000ff\">each<\/span>\u00a0<span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">[<\/span><span class=\"identifier\" style=\"color: #000000\">Attributes<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">]<\/span><span class=\"constant operator\" style=\"color: #000000\">?<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">[<\/span><span class=\"identifier\" style=\"color: #000000\">Hidden<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">]<\/span><span class=\"constant operator\" style=\"color: #000000\">?<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0&lt;&gt;<\/span><span class=\"literal boolean\" style=\"color: #569cd6\">\u00a0true<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Invoke Custom Function1&#8243;<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.AddColumn<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Filtered Hidden Files1&#8243;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"literal string\" style=\"color: #a31515\">&#8220;Transform File&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"constant keyword\" style=\"color: #0000ff\">each<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Transform File&#8221;<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">(<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">[<\/span><span class=\"identifier\" style=\"color: #000000\">Content<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">]<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">)<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\"><br \/>\n\u00a0\u00a0)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Renamed Columns1&#8243;<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.RenameColumns<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><span class=\"identifier\" style=\"color: #000000\">#&#8221;Invoke Custom Function1&#8243;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">{<\/span><span class=\"literal string\" style=\"color: #a31515\">&#8220;Name&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"literal string\" style=\"color: #a31515\">&#8220;Source.Name&#8221;<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">}<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Removed Other Columns1&#8243;<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.SelectColumns<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Renamed Columns1&#8243;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">{<\/span><span class=\"literal string\" style=\"color: #a31515\">&#8220;Source.Name&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"literal string\" style=\"color: #a31515\">&#8220;Transform File&#8221;<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">}<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\"><br \/>\n\u00a0\u00a0)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Expanded Table Column1&#8243;<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.ExpandTableColumn<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Removed Other Columns1&#8243;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"literal string\" style=\"color: #a31515\">&#8220;Transform File&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.ColumnNames<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">(<\/span><span class=\"identifier\" style=\"color: #000000\">#&#8221;Transform File&#8221;<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">(<\/span><span class=\"identifier\" style=\"color: #000000\">#&#8221;Sample File&#8221;<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">)<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">)<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\"><br \/>\n\u00a0\u00a0)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Removed Columns&#8221;<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.RemoveColumns<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><span class=\"identifier\" style=\"color: #000000\">#&#8221;Expanded Table Column1&#8243;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">{<\/span><span class=\"literal string\" style=\"color: #a31515\">&#8220;Source.Name&#8221;<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">}<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Grouped Rows&#8221;<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.Group<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Removed Columns&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">{<\/span><span class=\"literal string\" style=\"color: #a31515\">&#8220;Month&#8221;<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">}<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">{<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">{<\/span><span class=\"literal string\" style=\"color: #a31515\">&#8220;Trips&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"constant keyword\" style=\"color: #0000ff\">each<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">List.Sum<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">[<\/span><span class=\"identifier\" style=\"color: #000000\">Trips<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">]<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"constant keyword\" style=\"color: #0000ff\">type<\/span><span class=\"type\" style=\"color: #267f99\">\u00a0number<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">}<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">}<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\"><br \/>\n\u00a0\u00a0)<\/span><span class=\"constant keyword\" style=\"color: #0000ff\"><br \/>\nin<\/span><br \/>\n\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Grouped Rows&#8221;<\/span><\/div>\n<\/div>\n<p>These are the steps this code is executing:<\/p>\n<ul>\n<li>Filter all files, making sure to not include hidden files<\/li>\n<li>Use the <strong>AddColumn<\/strong> method to call the function <strong>\u201cTransform File\u201d<\/strong> for each row<\/li>\n<li>Remove additional columns, leaving only the file name and result of the function<\/li>\n<li>Expands the column containing the result of the function<\/li>\n<\/ul>\n<p>This main script calls the Transform File function for each file in the folder. There is no fixed file name, all the files will be transformed and returned. This means that at any time new files are included in this data lake folder, a simple refresh will bring the data to the dashboard, leaving the solution flexible as a client solution for a data lake needs to be.<\/p>\n<p>The <strong>M<\/strong> code for the <strong>Transform File<\/strong> function is this:<\/p>\n<div class=\"m-result\">\n<div class=\"m-code\"><span class=\"constant keyword\" style=\"color: #0000ff\">let<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">Source<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Parquet.Document<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><span class=\"identifier\" style=\"color: #000000\">Parameter1<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Changed Type&#8221;<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.TransformColumnTypes<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><span class=\"identifier\" style=\"color: #000000\">Source<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">{<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">{<\/span><span class=\"literal string\" style=\"color: #a31515\">&#8220;DateID&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"constant keyword\" style=\"color: #0000ff\">type<\/span><span class=\"type\" style=\"color: #267f99\">\u00a0text<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">}<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">}<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Added Custom&#8221;<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.AddColumn<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><span class=\"identifier\" style=\"color: #000000\">#&#8221;Changed Type&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"literal string\" style=\"color: #a31515\">&#8220;Month&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"constant keyword\" style=\"color: #0000ff\">each<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Text.Middle<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">(<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">[<\/span><span class=\"identifier\" style=\"color: #000000\">DateID<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">]<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><span class=\"literal number\" style=\"color: #098658\">4<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><span class=\"literal number\" style=\"color: #098658\">2<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">)<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Grouped Rows&#8221;<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.Group<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><span class=\"identifier\" style=\"color: #000000\">#&#8221;Added Custom&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">{<\/span><span class=\"literal string\" style=\"color: #a31515\">&#8220;Month&#8221;<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">}<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">{<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">{<\/span><span class=\"literal string\" style=\"color: #a31515\">&#8220;Trips&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"constant keyword\" style=\"color: #0000ff\">each<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.RowCount<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><span class=\"identifier\" style=\"color: #000000\">_<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Int64.Type<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">}<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">}<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">)<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Changed Type1&#8243;<\/span><span class=\"constant operator operator-equality\" style=\"color: #000000\">\u00a0=<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Table.TransformColumnTypes<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">(<\/span><span class=\"identifier\" style=\"color: #000000\">#&#8221;Grouped Rows&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">{<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">{<\/span><span class=\"literal string\" style=\"color: #a31515\">&#8220;Month&#8221;<\/span><span class=\"constant\" style=\"color: #000000\">,<\/span>\u00a0<span class=\"identifier\" style=\"color: #000000\">Int64.Type<\/span><span class=\"constant bracket bracket-2\" style=\"font-weight: bold;color: darkgoldenrod\">}<\/span><span class=\"constant bracket bracket-1\" style=\"font-weight: bold;color: goldenrod\">}<\/span><span class=\"constant bracket bracket-0\" style=\"font-weight: bold;color: gold\">)<\/span><span class=\"constant keyword\" style=\"color: #0000ff\"><br \/>\nin<\/span><br \/>\n\u00a0\u00a0\u00a0\u00a0<span class=\"identifier\" style=\"color: #000000\">#&#8221;Changed Type1&#8243;<\/span><\/div>\n<div class=\"m-backlink\">\u00a0<\/div>\n<\/div>\n<p>The function is using the Parquet connector released in November to process the file.<\/p>\n<h2>Additional Transformations<\/h2>\n<p>Probably we would like to make additional transformations to the data. For that, we have a choice to make: If we make the transformations on the main query, all the files will be combined first and only after the combine our transformations will be executed.<\/p>\n<p>On the other hand, we have the option to make the transformations inside the function. If we do so, the transformations will be applied for each file before combining them. When they are combined, they will already be with the transformed result set.<\/p>\n<p>For each transformation, we will need to identify if it will perform better when executed for each file or when executed over the combined result.<\/p>\n<p>Let\u2019s compare both options.<\/p>\n<h3>Transformations on the Combined Result<\/h3>\n<ol>\n<li>Select the main query, <strong>Query1<\/strong><\/li>\n<li>Select the <strong>DateID<\/strong> column<\/li>\n<li>On the top bar, change <strong>Data Type<\/strong> to <strong>Text<\/strong><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"315\" height=\"359\" class=\"wp-image-89549\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/graphical-user-interface-application-description.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li>Click the <strong>Add Column<\/strong> menu<\/li>\n<li>Click <strong>Custom Column<\/strong> button<\/li>\n<li>On <strong>New Column Name<\/strong> box, set the name as <strong>Month<\/strong><\/li>\n<li>On <strong>Custom Column Formula<\/strong> box, set the expression as <em>=Text.Middle([DateID],4,2)<\/em><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"720\" height=\"452\" class=\"wp-image-89550\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/graphical-user-interface-text-application-descr-2.png\" alt=\"Graphical user interface, text, application\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li>Click <strong>Ok<\/strong><\/li>\n<li>Click the <strong>Group By<\/strong> button<\/li>\n<li>Select the <strong>Month<\/strong> column<\/li>\n<li>On <strong>New Column Name<\/strong> box type <strong>Trips<\/strong><\/li>\n<li>Keep the default Operation, Count Rows<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"868\" height=\"394\" class=\"wp-image-89551\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/graphical-user-interface-application-description-1.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li>Click Ok<\/li>\n<li>Select the <strong>Month<\/strong> column<\/li>\n<li>On the top bar, change <strong>Data Type<\/strong> to <strong>Whole Number<\/strong><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"452\" height=\"358\" class=\"wp-image-89552\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/table-description-automatically-generated-1.png\" alt=\"Table\n\nDescription automatically generated\" \/><\/p>\n<p>That\u2019s it, our ETL is ready to be used on the dashboards. Let\u2019s check the execution time of the ETLs<\/p>\n<ol>\n<li>Click <strong>Tools<\/strong> menu<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"532\" height=\"142\" class=\"wp-image-89553\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/graphical-user-interface-application-word-descr-1.png\" alt=\"Graphical user interface, application, Word\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li>Click the <strong>Start Diagnostics<\/strong> button<\/li>\n<li>Click <strong>Home<\/strong> menu<\/li>\n<li>Open the <strong>Refresh Preview<\/strong> drop down<\/li>\n<li>Click <strong>Refresh All<\/strong> menu item<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"202\" height=\"170\" class=\"wp-image-89554\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/graphical-user-interface-application-word-descr-2.png\" alt=\"Graphical user interface, application, Word\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li>Click <strong>Tools<\/strong> menu<\/li>\n<li>Click <strong>Stop Diagnostics<\/strong> button<\/li>\n<\/ol>\n<p>On the left side of the screen, in the query window, you will find a new folder called <strong>Diagnostics<\/strong> with two queries inside the folder, holding the results of the diagnostics.<\/p>\n<ol>\n<li>Click the <strong>Diagnostics_Aggregate<\/strong> query<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"384\" height=\"376\" class=\"wp-image-89555\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/graphical-user-interface-application-description-2.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li>On the<strong> StartTime<\/strong> column header, open the drop down<\/li>\n<li>Click the <strong>Sort Ascending<\/strong> menu item<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"260\" height=\"452\" class=\"wp-image-89556\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/graphical-user-interface-text-application-descr-3.png\" alt=\"Graphical user interface, text, application\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li>Take note of the time on the first record<\/li>\n<li>On the <strong>StartTime<\/strong> column header, open the drop down<\/li>\n<li>Click <strong>Clear Sort<\/strong> menu item<\/li>\n<li>On the <strong>EndTime<\/strong> column header, open the drop down menu<\/li>\n<li>Click the <strong>Sort Descending<\/strong> menu item<\/li>\n<li>Take note of the time on the first record<\/li>\n<li>Calculate the time difference between the first time and the 2<sup>nd<\/sup> time you took note<\/li>\n<\/ol>\n<p>In my example, the total time was 7 seconds. You may find slight differences.<\/p>\n<h3>Transformations on Each File<\/h3>\n<p>Let\u2019s build the example again, this time building the transformations inside the function, so they would be applied for each file instead of the final result.<\/p>\n<ol>\n<li>Repeat the steps 1-7 from previous steps again<\/li>\n<li>Select the <strong>Transform Sample File<\/strong> query<\/li>\n<li>With the <strong>Transform Sample File<\/strong> query selected, repeat the steps 9-23<\/li>\n<li>Select the main query, <strong>Query1<\/strong><\/li>\n<li>Remove the last step in <strong>Applied Steps<\/strong> window, the <strong>Change Type<\/strong> step<\/li>\n<li>Select the first column, <strong>Source.Name<\/strong><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"588\" height=\"452\" class=\"wp-image-89557\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/table-description-automatically-generated-2.png\" alt=\"Table\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li>Click the <strong>Remove Column<\/strong> button<\/li>\n<li>Click <strong>Group By<\/strong> button<\/li>\n<li>Select the <strong>Month<\/strong> column<\/li>\n<li>In the <strong>New Column Name<\/strong> box, type <strong>Month<\/strong><\/li>\n<li>On\u00a0<strong>Operation<\/strong> drop down, select <strong>Sum<\/strong><\/li>\n<li>On\u00a0<strong>Column<\/strong> drop down, select <strong>Trips<\/strong><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"390\" class=\"wp-image-89558\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/01\/graphical-user-interface-application-description-3.png\" alt=\"Graphical user interface, application\n\nDescription automatically generated\" \/><\/p>\n<ol>\n<li>Click <strong>Ok<\/strong><\/li>\n<li>Repeat Steps 24-39<\/li>\n<\/ol>\n<p>On my example, the execution time results in 4 seconds.<\/p>\n<h2>Conclusion<\/h2>\n<p>As you may notice, on this example the transformations made on each file performed better than the transformations made after the final combination. This illustrates how important it is to understand this structure and test the performance, deciding which one will perform better for your transformations.<\/p>\n<p>This new feature may not be so obvious, hidden in a <strong>Combine<\/strong> button and a complex M code structure, but it\u2019s still much better than possible work arounds for the problem.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data Lakes are becoming more usual every day and the need for tools to query them also increases. While writing about querying a data lake using Synapse, I stumbled upon a Power BI feature I didn\u2019t know was there. When reading from a data lake, each folder is like a table. We store in the&#8230;&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,159166],"tags":[5364,123646,4162,101611],"coauthors":[6810],"class_list":["post-89544","post","type-post","status-publish","format-standard","hentry","category-blogs","category-powerbi","tag-azure","tag-data-lake","tag-microsoft","tag-power-bi"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/89544","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=89544"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/89544\/revisions"}],"predecessor-version":[{"id":89559,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/89544\/revisions\/89559"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=89544"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=89544"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=89544"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=89544"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}