{"id":102060,"date":"2024-05-06T16:07:53","date_gmt":"2024-05-06T16:07:53","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=102060"},"modified":"2024-09-03T20:04:45","modified_gmt":"2024-09-03T20:04:45","slug":"how-to-build-metadata-driven-pipelines-in-microsoft-fabric","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/how-to-build-metadata-driven-pipelines-in-microsoft-fabric\/","title":{"rendered":"How to Build Metadata Driven Pipelines in Microsoft Fabric"},"content":{"rendered":"<p>Over the past years, \u201ctraditional\u201d ETL development has morphed into <strong>data engineering<\/strong>, which has a more disciplined software engineering approach. One of the benefits of having a more code-based approach in data pipelines is that it has become easier to build <strong>metadata driven pipelines<\/strong>. What does this mean exactly?<\/p>\n<p>Say for example you need to copy a set of tables from a source database into data lake storage. If you have 50 source tables, you don\u2019t want to create 50 distinct pipelines manually. It\u2019s not only very time-consuming (and boring), but it\u2019s also hard to maintain. Suppose you need to add an audit column to each result set. This means you need to update 50 pipelines!<\/p>\n<p>In the \u201cold days\u201d of SSIS, you could use <a href=\"https:\/\/www.mssqltips.com\/sqlservertutorial\/9089\/introduction-to-the-biml-language\/?utm_source=KoenVerbeeck\">BIML<\/a> to generate your SSIS packages (you still ended up with 50 different packages, but at least you didn\u2019t have to create them yourself). In code-based pipelines like notebooks in Spark, you can write Python to extract metadata of the source tables and use that data to dynamically extract all the source data into different data frames and dump those into data lake storage. 
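<\/p>
<p>As a minimal sketch of that idea (the <code>tables<\/code> list and the <code>extraction_query<\/code> helper are hypothetical; in a real notebook the list would come from querying the source\u2019s metadata), the dynamic extraction loop could look like this:<\/p>

```python
# Hypothetical metadata describing which tables to extract; in practice this
# list would come from querying the source database's INFORMATION_SCHEMA.
tables = [
    {'schema': 'SalesLT', 'table': 'Address'},
    {'schema': 'SalesLT', 'table': 'Customer'},
]

def extraction_query(meta):
    """Build the SELECT statement used to extract one source table."""
    return f"SELECT * FROM [{meta['schema']}].[{meta['table']}]"

# One query per table; each result set would be loaded into a data frame
# (e.g. with spark.read or pandas) and written to data lake storage.
queries = [extraction_query(m) for m in tables]
```

<p>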
In SQL Server, you might use <em>dynamic SQL<\/em> to generate SQL statements on-the-fly to do the transformations dynamically.<\/p>\n<p>The goal of metadata driven code is that you <a href=\"https:\/\/sqlkover.com\/build-once-add-metadata\/\">build something only once<\/a>. You need to extract from relational databases? You build one pipeline that can connect to a relational source, and you <strong>parameterize everything<\/strong> (server name, database name, source schema, source table, destination server name, destination table et cetera). Once this parameterized piece of code is ready, all you have to do is enter metadata about the sources you want to extract. If at a later point an additional relational source needs to be extracted, you don\u2019t need to create a brand-new pipeline. All you need to do is enter a new line of data in your metadata repository.<\/p>\n<p>Aside from speeding up development \u2013 after you\u2019ve made the initial effort of creating your metadata driven pipeline \u2013 another benefit is that everything is consistent. You always tackle a certain pattern in the same way. If there\u2019s a bug, you need to fix it in one single location.<\/p>\n<p>In this article, we will guide you through creating metadata driven pipelines in <a href=\"https:\/\/www.microsoft.com\/en-us\/microsoft-fabric\">Microsoft Fabric<\/a>. Pipelines are a low-code to no-code environment, but we can still implement metadata driven patterns in them using parameterization. 
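<\/p>
<p>Conceptually, the metadata repository is just a list of records like the one below (the property names are illustrative); onboarding an extra source then really is a single new entry:<\/p>

```python
# Illustrative metadata record: every property the pipeline needs is stored
# as data, so adding a new source means adding a row, not building a pipeline.
new_source = {
    'server_name': 'myserver.database.windows.net',  # hypothetical server
    'database_name': 'AdventureWorksLT',
    'source_schema': 'SalesLT',
    'source_table': 'Product',
    'destination_table': 'AW_Product',
}
```

<p>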
Pipelines in Fabric are similar to pipelines in <a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/data-factory\">Azure Data Factory<\/a> or <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/data-factory\/concepts-pipelines-activities?context=%2Fazure%2Fsynapse-analytics%2Fcontext%2Fcontext&amp;tabs=synapse-analytics\">Azure Synapse Analytics Pipelines<\/a>, but there are differences.<\/p>\n<p>If you want to follow along with the examples, you can get a <a href=\"https:\/\/learn.microsoft.com\/en-us\/fabric\/get-started\/fabric-trial\">free trial of Fabric<\/a> (as long as the offer stands) and a <a href=\"https:\/\/azure.microsoft.com\/en-us\/free\">free Azure trial<\/a>. You can also create a <a href=\"https:\/\/www.mssqltips.com\/sqlservertip\/7754\/microsoft-fabric-capacities-and-cost-structure\/\">cheap pay-as-you-go Fabric capacity<\/a> in Azure, just don\u2019t forget to <a href=\"https:\/\/www.mssqltips.com\/sqlservertip\/7774\/save-money-microsoft-fabric-pause-start-capacity\/\">pause it<\/a> once you\u2019re done to save on costs.<\/p>\n<h2>Copying One Single Table<\/h2>\n<p>When you want to create a metadata driven framework, it\u2019s always a good idea to start simple. You create a pipeline that copies one single table and everything\u2019s hardcoded. If you try to jump directly into the parameterization and fetching of metadata, the solution might get too complex too fast, making it harder to debug. With a simple case, you can first validate that certain items work, like the connections to source and destination.<\/p>\n<p>First, make sure you have a Fabric-enabled workspace (either by having a capacity assigned to it or by using a trial). Next, we need some sample data to read from. 
The article <a href=\"https:\/\/www.mssqltips.com\/sqlservertip\/7565\/adventureworks-database-installation-azure-sql-database\/?utm_source=KoenVerbeeck\">How to Install the AdventureWorks Sample Database in Azure SQL Database<\/a> explains how to get a small sample database in Azure. Of course, you can use any other source to experiment yourself. In this article, we\u2019re going to write the data to a warehouse. If you don\u2019t have one yet, you can familiarize yourself with this compute type in the article <a href=\"https:\/\/www.mssqltips.com\/sqlservertip\/7833\/microsoft-fabric-build-data-warehouse\/?utm_source=KoenVerbeeck\">What are Warehouses in Microsoft Fabric?<\/a>.<\/p>\n<p>Let\u2019s log in to <a href=\"https:\/\/app.fabric.com\">Fabric<\/a> and switch to the <em>Data Factory<\/em> persona (bottom left corner of the screen):<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"303\" height=\"331\" class=\"wp-image-102061\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-1.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>This will take you to the Data Factory experience where you can choose to create a new pipeline:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"292\" height=\"218\" class=\"wp-image-102062\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-2.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>Give the new pipeline a name and hit <strong>Create<\/strong>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"316\" height=\"246\" class=\"wp-image-102063\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-3.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" 
\/><\/p>\n<p>A new pipeline canvas will be opened, and four options will be presented to you: adding a new pipeline activity, the Copy data wizard\/assistant, loading sample data, or using a pre-defined template.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"894\" height=\"345\" class=\"wp-image-102064\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-4.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>There\u2019s a template to bulk copy from a database (or to import files), which seems promising for our metadata driven pipeline:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"510\" height=\"331\" class=\"wp-image-102065\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-5.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>It seems to be using the pattern we\u2019re going to implement in this article:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1063\" height=\"639\" class=\"wp-image-102066\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-6.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>However, only the following destinations are supported by the template:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1110\" height=\"170\" class=\"wp-image-102067\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-7.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>Since we want to write to a Fabric warehouse, we can\u2019t use it. Instead, let\u2019s start from scratch by using the first option (Pipeline Activity). 
We get a blank canvas where we can add a new activity to the pipeline: the Copy data activity. In the <em>Source<\/em> pane, we need to configure the source connection by either selecting an existing one or creating a new one:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"723\" height=\"507\" class=\"wp-image-102068\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-8.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>Create a new connection and choose Azure SQL DB as the connection type.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"751\" height=\"509\" class=\"wp-image-102069\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-9.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>Once you\u2019ve verified the connection, you can select a table from the dropdown list. Later, we will parameterize this. In the <em>Destination<\/em> pane (which is called <em>Sink<\/em> in ADF), choose <strong>Workspace<\/strong> as the data store type and select your warehouse from the list:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"589\" height=\"403\" class=\"wp-image-102070\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-10.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>In the table option, choose the <strong>Auto create table<\/strong> option. 
To make sure we can run the pipeline multiple times, go to <strong>Advanced<\/strong> and enter the following SQL statement in the pre-copy script:<\/p>\n<pre class=\"lang:none theme:none\">IF OBJECT_ID(N'dbo.AW_Address', N'U') IS NOT NULL\nBEGIN;\n    DELETE FROM dbo.AW_Address;\nEND;<\/pre>\n<p>We\u2019re using a <code>DELETE<\/code> statement since <code>TRUNCATE TABLE<\/code> is not supported at the moment. If at some point it becomes available in the Fabric Warehouse, <code>TRUNCATE<\/code> will be the preferred option.<\/p>\n<p>The pipeline will run the pre-copy script before it actually creates the table. If we don\u2019t check for table existence, we get the following error:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"457\" height=\"298\" class=\"wp-image-102071\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-error-description-auto.png\" alt=\"A screenshot of a computer error\n\nDescription automatically generated\" \/><\/p>\n<p>The <em>Mapping<\/em> pane stays empty, which means columns will be mapped using their name (and if that fails, by ordinal). 
A best practice that I\u2019d like to introduce is that the <strong>source and destination have the exact same column names<\/strong>, to ensure a smooth mapping process.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"620\" height=\"139\" class=\"wp-image-102072\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-11.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>In the <em>Settings<\/em> pane, make sure <strong>Enable staging<\/strong> is selected (otherwise the pipeline fails because it doesn\u2019t support writing directly to the warehouse):<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"527\" height=\"302\" class=\"wp-image-102073\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-12.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>Once everything is configured, you can hit <strong>Run<\/strong> to test the pipeline. 
While the pipeline is running, you can click on the Copy data activity name in the <em>output<\/em> pane to get a more detailed view of the performance:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"630\" class=\"wp-image-102074\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-13.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>We can verify the data has landed in the warehouse by running a simple <code>SELECT *<\/code> statement:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"773\" height=\"402\" class=\"wp-image-102075\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-14.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<h2>Getting Metadata and Parameterizing the Pipeline<\/h2>\n<p>Once the pipeline works for a single table, it\u2019s time to put the metadata in action and to parameterize everything. The first step is to get the metadata of which tables we want to load. You can store this in a table, but most relational databases have the INFORMATION_SCHEMA tables, which allow you to query the metadata of the database. In the source database, we can run the following query (note that the delete statement targets the <em>AW_<\/em>-prefixed destination table name we\u2019ll create in the warehouse):<\/p>\n<pre class=\"lang:none theme:none \">SELECT\n     SourceSchema       = TABLE_SCHEMA\n    ,SourceTable        = TABLE_NAME\n    ,DestinationTable   = TABLE_NAME\n    ,DeleteStatement      = \n    'IF OBJECT_ID(N''dbo.AW_' + TABLE_NAME + ''', N''U'') IS ' \n        + 'NOT NULL'\n        + CHAR(13) + CHAR(10) \/* new line*\/\n        + 'BEGIN;'\n        + CHAR(13) + CHAR(10) \/* new line*\/\n        + '    DELETE FROM dbo.AW_' \n+ TABLE_NAME + ';'\n        + CHAR(13) + CHAR(10) \/* new line*\/\n        + 'END;'\nFROM INFORMATION_SCHEMA.TABLES\nWHERE   TABLE_TYPE = 'BASE TABLE'\n    AND TABLE_SCHEMA &lt;&gt; 'dbo';<\/pre>\n<p>This gives us everything we need to parameterize the Copy data activity. In the pipeline, add a <strong>Lookup activity<\/strong> to the canvas.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1281\" height=\"152\" class=\"wp-image-102076\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/word-image-102060-16.png\" \/><\/p>\n<p>In the General pane, give the activity a decent name and <strong>set a proper timeout<\/strong> (we don\u2019t want an activity hanging for hours; time is literally money with Fabric capacities).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"607\" height=\"473\" class=\"wp-image-102077\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-15.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>I also typically set the <strong>retry<\/strong> to at least one, because network failures do happen. In the <em>Settings<\/em> pane, set the <strong>data store type<\/strong> to <em>External<\/em> and reuse the source connection we created earlier.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"856\" height=\"547\" class=\"wp-image-102078\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-16.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>Set the type to <strong>Query<\/strong> and copy-paste the metadata query. Make sure <strong>First row only is not selected<\/strong>, otherwise the metadata driven pipeline is not going to do a lot of work. 
Hit <strong>Preview data<\/strong> to ensure everything works.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"769\" height=\"552\" class=\"wp-image-102079\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-17.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>Once we have the metadata, we need to iterate over it. This means that for every row returned by this result set, we will execute the Copy activity. We can achieve this by using the <strong>ForEach activity<\/strong>, which can be found on the <strong>Activities<\/strong> tab.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"756\" height=\"127\" class=\"wp-image-102080\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-18.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>Add a <code>ForEach<\/code> activity to the canvas, connect the Lookup to this activity using an <strong>On success<\/strong> constraint (the green arrow) and move the Copy activity to the inside of the ForEach. You can do this by cutting the activity, going to the edit pane of the <code>ForEach<\/code>, and then pasting it onto the canvas of the <code>ForEach<\/code>. Finally, your canvas should look like this:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"630\" height=\"294\" class=\"wp-image-102081\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-19.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>We need to tell the <code>ForEach<\/code> which items to iterate over, which would be the output of the Lookup. We can view what this output is like by running the pipeline. 
<a href=\"https:\/\/sqlkover.com\/fabulous-stuff-in-fabric-part-2-deactivating-activities-in-pipelines\/\">Deactivate the ForEach<\/a> (right-click and choose <strong>Deactivate<\/strong>) and hit <strong>Run<\/strong>. Once the pipeline is finished, you can click on the output to view the returned JSON.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"795\" height=\"273\" class=\"wp-image-102082\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-20.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>The output should look like this:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1080\" height=\"452\" class=\"wp-image-102083\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-close-up-of-a-computer-screen-description-autom.png\" alt=\"A close up of a computer screen\n\nDescription automatically generated\" \/><\/p>\n<p>We\u2019ve got a JSON containing an array named <em>value<\/em> of 10 items. In the settings pane of the <code>ForEach<\/code>, we\u2019ll reference this array in the <em>dynamic content<\/em> of the <strong>Items property<\/strong>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"880\" height=\"776\" class=\"wp-image-102084\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-21.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>You can use the expression builder to get the output of the Lookup. Just add <code>.value<\/code> at the end to retrieve the array. By default, the <code>ForEach<\/code> will run everything in parallel. You can control the number of parallel items by setting the <strong>Batch count<\/strong> property. If you don\u2019t want any parallelism, you can check <strong>Sequential<\/strong>. 
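<\/p>
<p>Assuming the Lookup activity is named <em>Lookup Metadata<\/em> (the name is an assumption; use whatever you called yours), the resulting expression for the Items property looks like this:<\/p>
<pre class=\"lang:none theme:none\">@activity('Lookup Metadata').output.value<\/pre>
<p>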
Make sure you have an ordered array to have predictable results.<\/p>\n<p>All that is left is to parameterize the <code>Copy<\/code> activity itself. In the <em>Source<\/em> pane, select <strong>Enter manually<\/strong> in the Table settings and then use the dynamic content to set the source schema and source table name.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"954\" height=\"409\" class=\"wp-image-102085\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-22.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>We can reference the current item of the iteration by using the <code>@item()<\/code> function, followed by the desired column name of the metadata result set. In the <em>Destination<\/em> pane, parameterize the <strong>destination table name<\/strong> and the <strong>Pre-copy script<\/strong> properties.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"961\" height=\"495\" class=\"wp-image-102086\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-23.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>I\u2019m prefixing the table name with <em>\u2018AW_\u2019<\/em>, which can be done with the <code>@concat<\/code> function. The entire expression:<\/p>\n<pre class=\"lang:none theme:none\">@concat('AW_',item().DestinationTable)<\/pre>\n<p>Like before, the column mapping stays empty. The configuration is now finished and we can finally test our metadata driven pipeline (don\u2019t forget to activate the <code>ForEach<\/code>). 
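<\/p>
<p>For reference, the dynamic content of the <strong>Pre-copy script<\/strong> property simply reuses the statement we generated in the metadata query:<\/p>
<pre class=\"lang:none theme:none\">@item().DeleteStatement<\/pre>
<p>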
In the Output pane you can observe that 10 parallel copies have been started:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"964\" height=\"665\" class=\"wp-image-102087\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-24.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>In the Fabric warehouse, the 10 tables are now created and populated with data, all with one single pipeline.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"460\" height=\"524\" class=\"wp-image-102088\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-25.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>There\u2019s a downside to the approach we used though. More specifically, the auto-create table option is sometimes a bit \u201cgenerous\u201d when it comes to sizing string data types. We can view this by again querying the <code>INFORMATION_SCHEMA<\/code> tables:<\/p>\n<pre class=\"lang:none theme:none\">SELECT\n\u00a0 \u00a0 \u00a0TABLE_NAME\n\u00a0 \u00a0 ,COLUMN_NAME\n\u00a0 \u00a0 ,DATA_TYPE\n\u00a0 \u00a0 ,CHARACTER_MAXIMUM_LENGTH\nFROM INFORMATION_SCHEMA.COLUMNS\nWHERE TABLE_NAME = 'AW_Address'\nORDER BY ORDINAL_POSITION;<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1070\" height=\"376\" class=\"wp-image-102089\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/04\/a-screenshot-of-a-computer-description-automatica-26.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>If you don\u2019t mind that every string column is 8000 characters in length, you can leave everything as-is. If not, you can construct a <code>CREATE TABLE<\/code> statement dynamically by using metadata from the source system. 
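<\/p>
<p>As a hedged starting point, such a statement could be assembled from the column metadata like in the sketch below (all names are illustrative, and the type mapping only covers a couple of cases):<\/p>

```python
# Hedged sketch: assemble a CREATE TABLE statement from the source's
# INFORMATION_SCHEMA.COLUMNS rows. Only string types get an explicit length
# here; mapping every source type to a supported warehouse type is the
# real work.
def create_table_ddl(table_name, columns):
    column_defs = []
    for col in columns:
        data_type = col['DATA_TYPE']
        if data_type in ('char', 'varchar'):
            data_type = f"{data_type}({col['CHARACTER_MAXIMUM_LENGTH']})"
        column_defs.append(f"[{col['COLUMN_NAME']}] {data_type}")
    return f"CREATE TABLE dbo.[{table_name}] ({', '.join(column_defs)});"

ddl = create_table_ddl('AW_Address', [
    {'COLUMN_NAME': 'AddressID', 'DATA_TYPE': 'int',
     'CHARACTER_MAXIMUM_LENGTH': None},
    {'COLUMN_NAME': 'City', 'DATA_TYPE': 'varchar',
     'CHARACTER_MAXIMUM_LENGTH': 30},
])
```

<p>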
This is left as an exercise for the reader.<\/p>\n<h2>Conclusion<\/h2>\n<p>In this article, we\u2019ve shown you how you can easily set up a metadata driven pipeline to import sources external to Microsoft Fabric. All you need is metadata, which you can either enter manually in some repository or retrieve from the source system if possible (like using the <code>INFORMATION_SCHEMA<\/code> tables in this example).<\/p>\n<p>You can apply this pattern to most sources, such as relational databases, but also to files and REST APIs, for example.<\/p>\n<p>By using the metadata driven approach, you not only save significantly on development time, but you also improve the consistency and maintainability of your data pipelines.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Over the past years, \u201ctraditional\u201d ETL development has morphed into data engineering, which has a more disciplined software engineering approach. One of the benefits of having a more code-based approach in data pipelines is that it has become easier to build metadata driven pipelines. What does this mean exactly? 
Say for example you need to&#8230;&hellip;<\/p>\n","protected":false},"author":110905,"featured_media":103103,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143528,53,159164],"tags":[158997],"coauthors":[159081],"class_list":["post-102060","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bi-sql-server","category-featured","category-microsoft-fabric","tag-microsoft-fabric"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/102060","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/110905"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=102060"}],"version-history":[{"count":3,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/102060\/revisions"}],"predecessor-version":[{"id":103104,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/102060\/revisions\/103104"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/103103"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=102060"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=102060"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=102060"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=102060"}],"curies":[{"name":"wp","h
ref":"https:\/\/api.w.org\/{rel}","templated":true}]}}