{"id":98261,"date":"2023-09-20T17:00:16","date_gmt":"2023-09-20T17:00:16","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=98261"},"modified":"2024-09-03T20:15:18","modified_gmt":"2024-09-03T20:15:18","slug":"microsoft-fabric-checking-and-fixing-tables-v-order-optimization","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/microsoft-fabric-checking-and-fixing-tables-v-order-optimization\/","title":{"rendered":"Microsoft Fabric: Checking and Fixing Tables V-Order Optimization"},"content":{"rendered":"<p><a href=\"https:\/\/bufaloinfo-my.sharepoint.com\/:u:\/g\/personal\/dennes_dtowersoftware_com\/ETivrFKefglNt8NpsgiPXmwBQAJRKpnBJr9psGRHbMloGQ?e=FP9kgk\">Download the notebook used on this blog<\/a><\/p>\n<p>I explained in a previous article <a href=\"https:\/\/www.red-gate.com\/simple-talk\/blogs\/fabric-lakehouse-convert-to-table-feature-and-workspace-level-spark-configuration\/\">how the Tables in a lakehouse are V-Order optimized<\/a>. We noticed this configuration depends on our settings, which can be enabled or not.<\/p>\n<p><strong><em>One question remains: How could we check if the tables are V-Order optimized or not?<\/em><\/strong><\/p>\n<p>The tables we will use in this example are the same provided by Microsoft as a sample and we described in details how to load them in <a href=\"https:\/\/www.red-gate.com\/simple-talk\/cloud\/azure\/microsoft-fabric-lakehouse-and-data-factory-in-power-bi-environment\/\">our article about lakehouse<\/a>.<\/p>\n<p>You can check more details about <a href=\"https:\/\/learn.microsoft.com\/en-us\/fabric\/data-engineering\/delta-optimization-and-v-order?tabs=sparksql\">V-ORDER Optimization on this link<\/a><\/p>\n<h2>Checking tables configuration<\/h2>\n<p>We can use the following <strong>PySpark<\/strong> script to check the configuration of the tables in our l<strong>akehouse<\/strong>:<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff;overflow: auto;width: auto;border: solid gray;border-width: .1em .1em .1em .8em;padding: .2em .6em\">\n<pre style=\"margin: 0;line-height: 125%\" class=\"crayon:false\"><span style=\"color: #008800;font-weight: bold\">import<\/span> <span style=\"color: #0e84b5;font-weight: bold\">pyarrow.dataset<\/span> <span style=\"color: #008800;font-weight: bold\">as<\/span> <span style=\"color: #0e84b5;font-weight: bold\">pq<\/span>\r\n<span style=\"color: #008800;font-weight: bold\">import<\/span> <span style=\"color: #0e84b5;font-weight: bold\">os<\/span>\r\n<span style=\"color: #008800;font-weight: bold\">def<\/span> <span style=\"color: #0066bb;font-weight: bold\">show_metadata<\/span>(delta_file_path,tablename):\r\n \r\n\r\n <span style=\"color: #888888\"># Print schema metadata<\/span>\r\n\r\n <span style=\"color: #008800;font-weight: bold\">print<\/span>(f<span style=\"background-color: #fff0f0\">\"<\/span><span style=\"color: #666666;font-weight: bold;background-color: #fff0f0\">\\n<\/span><span style=\"background-color: #fff0f0\">Schema Properties for : {tablename}\"<\/span>)\r\n\r\n <span style=\"color: #008800;font-weight: bold\">print<\/span>(<span style=\"background-color: #fff0f0\">\"--------------------\"<\/span>)\r\n\r\n schema_properties <span style=\"color: #333333\">=<\/span> pq<span style=\"color: #333333\">.<\/span>dataset(delta_file_path)<span style=\"color: #333333\">.<\/span>schema<span style=\"color: #333333\">.<\/span>metadata\r\n\r\n <span style=\"color: #008800;font-weight: bold\">if<\/span> schema_properties:\r\n\r\n     <span style=\"color: #008800;font-weight: bold\">for<\/span> key, value <span style=\"color: #000000;font-weight: bold\">in<\/span> schema_properties<span style=\"color: #333333\">.<\/span>items():\r\n         <span style=\"color: #008800;font-weight: bold\">print<\/span>(f<span style=\"background-color: #fff0f0\">\"{key.decode('utf-8')}: {value.decode('utf-8')}\"<\/span>)\r\n\r\n <span style=\"color: #008800;font-weight: bold\">else<\/span>:\r\n\r\n    <span style=\"color: #008800;font-weight: bold\">print<\/span>(<span style=\"background-color: #fff0f0\">\"No schema properties found.\"<\/span>)\r\n\r\n\r\n <span style=\"color: #888888\"># Test the function with a path to your delta file#<\/span>\r\n\r\nfull_tables <span style=\"color: #333333\">=<\/span> os<span style=\"color: #333333\">.<\/span>listdir(<span style=\"background-color: #fff0f0\">'\/lakehouse\/default\/Tables'<\/span>)\r\n\r\n<span style=\"color: #008800;font-weight: bold\">for<\/span> table <span style=\"color: #000000;font-weight: bold\">in<\/span> full_tables:\r\n  \r\n  show_metadata(<span style=\"background-color: #fff0f0\">'\/\/lakehouse\/default\/Tables\/'<\/span> <span style=\"color: #333333\">+<\/span> table,table)\r\n<\/pre>\n<\/div>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1540\" height=\"453\" class=\"wp-image-98262\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/a-screenshot-of-a-computer-description-automatica.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>This script is inspired on a script provided by a user in the <a href=\"https:\/\/community.fabric.microsoft.com\/\">Fabric Community<\/a> forums.<\/p>\n<p>What we should notice about how this code works:<\/p>\n<ul>\n<li>We have a procedure defined, called <em>show_metadata<\/em>.<\/li>\n<li>The procedure receives an array of table names as a parameter.<\/li>\n<li>In order to generate the array dynamically, we use <em>os.listdir<\/em>. In this way, the script works for all tables in the l<strong>akehouse<\/strong>.<\/li>\n<li>The <em>show_metadata<\/em> procedure is called in a loop for each one of the tables. Mind the standard path for the tables in the <strong>lakehouse<\/strong>.<\/li>\n<li>Inside the procedure, we generate a dataset and use the <strong>properties .schema.metadata<\/strong> to read the properties of the table.<\/li>\n<li>We make a loop on the table properties to show each one.<\/li>\n<li>The table properties need to be decoded from <strong>UTF-8<\/strong><\/li>\n<\/ul>\n<p>The image below shows a piece of the result after executing this script, highlighting the <strong>V-ORDER<\/strong> property on the tables:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1364\" height=\"505\" class=\"wp-image-98263\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/a-close-up-of-text-description-automatically-gene.png\" alt=\"A close-up of text\n\nDescription automatically generated\" \/><\/p>\n<h2>Using the script to test the configurations<\/h2>\n<p>Using this script, we can test the default configuration in different situation, and we will discover the following:<\/p>\n<ul>\n<li>The default configuration for spark sessions is to use <strong>V-ORDER<\/strong> when writing tables unless we explicitly disable it.<\/li>\n<li>This makes the <strong>V-ORDER<\/strong> also default when using the <strong>Convert-To-Table<\/strong> UI, unless we disable it on workspace levell (Why would we do such thing?)<\/li>\n<\/ul>\n<h2>Fixing a table without V-ORDER<\/h2>\n<p>Considering <strong>V-ORDER<\/strong> is default on spark sessions in Fabric, it will not be common to have tables not using <strong>V-ORDER<\/strong>.<\/p>\n<p>But let&#8217;s create one in this situation and discover how to fix it when needed.<\/p>\n<p>If the table <em>dimension_stock_item<\/em> already exists in your demo environment, you will need to drop it before running this script.<\/p>\n<p>A simple <strong>spark SQL<\/strong> statement solves it:<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff;overflow: auto;width: auto;border: solid gray;border-width: .1em .1em .1em .8em;padding: .2em .6em\">\n<pre style=\"margin: 0;line-height: 125%\" class=\"crayon:false\"><span style=\"color: #333333\">%%<\/span>sql\r\ndrop table dimension_stock_item\r\n<\/pre>\n<\/div>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"496\" height=\"93\" class=\"wp-image-98264\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/word-image-98261-3.png\" \/><\/p>\n<p>We can use the following <strong>pyspark<\/strong> code to create a table without the <strong>v-order<\/strong> optimization:<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff;overflow: auto;width: auto;border: solid gray;border-width: .1em .1em .1em .8em;padding: .2em .6em\">\n<pre style=\"margin: 0;line-height: 125%\" class=\"crayon:false\"><span style=\"color: #008800;font-weight: bold\">from<\/span> <span style=\"color: #0e84b5;font-weight: bold\">pyspark.sql.types<\/span> <span style=\"color: #008800;font-weight: bold\">import<\/span> <span style=\"color: #333333\">*<\/span>\r\n\r\nspark<span style=\"color: #333333\">.<\/span>conf<span style=\"color: #333333\">.<\/span>set(<span style=\"background-color: #fff0f0\">\"spark.sql.parquet.vorder.enabled\"<\/span>, <span style=\"background-color: #fff0f0\">\"false\"<\/span>)\r\n\r\n<span style=\"color: #008800;font-weight: bold\">def<\/span> <span style=\"color: #0066bb;font-weight: bold\">loadFullDataFromSource<\/span>(table_name):\r\n\r\n    df <span style=\"color: #333333\">=<\/span> spark<span style=\"color: #333333\">.<\/span>read<span style=\"color: #333333\">.<\/span>format(<span style=\"background-color: #fff0f0\">\"parquet\"<\/span>)<span style=\"color: #333333\">.<\/span>load(<span style=\"background-color: #fff0f0\">'Files\/'<\/span> <span style=\"color: #333333\">+<\/span> table_name)\r\n\r\n    df<span style=\"color: #333333\">.<\/span>write<span style=\"color: #333333\">.<\/span>mode(<span style=\"background-color: #fff0f0\">\"overwrite\"<\/span>)<span style=\"color: #333333\">.<\/span>format(<span style=\"background-color: #fff0f0\">\"delta\"<\/span>)<span style=\"color: #333333\">.<\/span>save(<span style=\"background-color: #fff0f0\">\"Tables\/\"<\/span> <span style=\"color: #333333\">+<\/span> table_name)\r\n\r\nfull_tables <span style=\"color: #333333\">=<\/span> [\r\n\r\n   <span style=\"background-color: #fff0f0\">'dimension_stock_item'<\/span>\r\n\r\n]\r\n\r\n<span style=\"color: #008800;font-weight: bold\">for<\/span> table <span style=\"color: #000000;font-weight: bold\">in<\/span> full_tables:\r\n\r\n    loadFullDataFromSource(table)\r\n<\/pre>\n<\/div>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"725\" height=\"324\" class=\"wp-image-98265\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/a-screenshot-of-a-computer-program-description-au.png\" alt=\"A screenshot of a computer program\n\nDescription automatically generated\" \/><\/p>\n<p>We should mind the following about this <strong>pyspark<\/strong> script:<\/p>\n<ul>\n<li>We have a <em>loadFullDataFromSource <\/em>procedure to make the load in a default way. It works unless the table has some specific partitioning.<\/li>\n<li>We disable the v-order configuration for the session on the start of the script.<\/li>\n<li>We create an array of the tables, but it has only one table. It&#8217;s only a standard way to build it.<\/li>\n<li>We make a for loop on the array, calling the <em>loadFullDataFromSource<\/em> procedure. Once again, it&#8217;s only a standard way to build it, since in our example it&#8217;s only for one table.<\/li>\n<\/ul>\n<p>Once the import is executed with the <strong>V-ORDER <\/strong>disabled, we can execute again the script to check the Tables metadata and we may notice the lack of <strong>V-Order<\/strong> property on this table. The image below illustrates this:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1260\" height=\"249\" class=\"wp-image-98266\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/word-image-98261-5.png\" \/><\/p>\n<h2>Fixing the lack of V-ORDER<\/h2>\n<p>We need to fix this problem without loading the entire table again. In order to do so, we can use the <strong>OPTMIZE<\/strong> statement. This statement can reorganize the <strong>PARQUET<\/strong> files to avoid the small file size problem and apply the <strong>V-ORDER<\/strong> when this one is missing.<\/p>\n<p>It&#8217;s very simple:<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff;overflow: auto;width: auto;border: solid gray;border-width: .1em .1em .1em .8em;padding: .2em .6em\">\n<pre style=\"margin: 0;line-height: 125%\" class=\"crayon:false\"><span style=\"color: #333333\">%%<\/span>sql\r\nOPTIMIZE dimension_stock_item VORDER;\r\n<\/pre>\n<\/div>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"490\" height=\"91\" class=\"wp-image-98267\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/a-white-rectangular-object-with-black-text-descri.png\" alt=\"A white rectangular object with black text\n\nDescription automatically generated\" \/><\/p>\n<p>We can execute the script to check the table\u2019s metadata again and there it is: The table will have the <strong>V-ORDER<\/strong> enabled.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1219\" height=\"258\" class=\"wp-image-98268\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/a-computer-screen-shot-of-text-description-automa.png\" alt=\"A computer screen shot of text\n\nDescription automatically generated\" \/><\/p>\n<h2>Disclaimer<\/h2>\n<p>On the documentation, the <strong>OPTIMIZE<\/strong> is used to reorganize the files and apply the <strong>V-ORDER<\/strong>. The <strong>ALTER TABLE<\/strong>, also present on the documentation, is used to set a <strong>TBLPROPERTY <\/strong>which should ensure future writes in the table use <strong>V-ORDER<\/strong>.<\/p>\n<p><strong>TBLPROPERTY<\/strong> seems to have no relation with the metadata extracted by the script illustrated here. The <strong>TBLPROPERTY<\/strong> could be set and the metadata not, and the opposite.<\/p>\n<p>Microsoft Fabric is still in preview, many of these details are still going to change and the documentation may not be precise.<\/p>\n<h2>Conclusion<\/h2>\n<p>All the defaults lead to the <strong>V-ORDER<\/strong> always being enabled in <strong>lakehouse<\/strong> (<strong>data warehouses<\/strong> are a completely different story). But it&#8217;s important to be able to check it every time a different scenario appears.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Download the notebook used on this blog I explained in a previous article how the Tables in a lakehouse are V-Order optimized. We noticed this configuration depends on our settings, which can be enabled or not. One question remains: How could we check if the tables are V-Order optimized or not? The tables we will&#8230;&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,159164,159166],"tags":[123648,159034,158998,158997,101611],"coauthors":[6810],"class_list":["post-98261","post","type-post","status-publish","format-standard","hentry","category-blogs","category-microsoft-fabric","category-powerbi","tag-data-platform","tag-delta","tag-lakehouse","tag-microsoft-fabric","tag-power-bi"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98261","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=98261"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98261\/revisions"}],"predecessor-version":[{"id":98273,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98261\/revisions\/98273"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=98261"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=98261"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=98261"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=98261"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}