{"id":104727,"date":"2024-12-10T17:02:32","date_gmt":"2024-12-10T17:02:32","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=104727"},"modified":"2024-12-17T19:10:51","modified_gmt":"2024-12-17T19:10:51","slug":"a-gentle-introduction-to-dbt","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/business-intelligence\/general-analytics\/a-gentle-introduction-to-dbt\/","title":{"rendered":"A gentle introduction to dbt"},"content":{"rendered":"\n<p>In a typical data warehouse project, data is loaded, transformed and stored into a data store using an ETL (extract \u2013 transform \u2013 load) process. Recently, there has been a shift to ELT processing, where data is first loaded into a data store (this can be a database, but also a data lake or something similar) and then transformed in-place.<\/p>\n\n\n\n<p>Over the past years, the tool <a href=\"https:\/\/www.getdbt.com\/\">dbt<\/a> \u2013 short for <strong>data build tool<\/strong> \u2013 has become quite popular in the data engineering world for handling such an ELT process. dbt takes on the role of the \u201cT\u201d, meaning it\u2019s responsible for transforming the data in a certain data store. dbt is not meant for transferring data from one store to another, you\u2019ll need another tool for this, such as Azure Data Factory for example.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.getdbt.com\/\"><img loading=\"lazy\" decoding=\"async\" width=\"248\" height=\"90\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/word-image-104727-1.png\" alt=\"\" class=\"wp-image-104728\"\/><\/a><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>dbt is heavily centered around the SQL language and it aims to make the SQL development in an analytics project more efficient, more reusable and in the end better tested and documented. dbt calls people working with their product \u201canalytics engineers\u201d. In this article, you\u2019ll get an introduction to the dbt tool. It will by no means be a complete overview, but you\u2019ll get a good grip of what this tool is supposed to do.<\/p>\n\n\n\n<p><strong>Disclaimer<\/strong>: this article is not endorsed by dbt, nor is the author affiliated in any way with dbt. The author writes from personal experience from having used the tool in data projects.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-get-dbt\">How to get dbt<\/h2>\n\n\n\n<p>dbt comes in two flavors: an open-source and free-to-use, command-line version called <a href=\"https:\/\/docs.getdbt.com\/docs\/core\/installation-overview\">dbt core<\/a>, and a SaaS cloud version called <a href=\"https:\/\/www.getdbt.com\/product\/dbt-cloud\">dbt cloud<\/a>. The cloud version comes with more bells and whistles but has <a href=\"https:\/\/www.getdbt.com\/pricing\">paid plans<\/a>. However, there\u2019s a free developer license you can use to try the product out.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1116\" height=\"312\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-description-automatica-13.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" class=\"wp-image-104729\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>You can find a comparison between the dbt core and cloud versions on <a href=\"https:\/\/www.getdbt.com\/product\/dbt-core-vs-dbt-cloud\">here<\/a>. In this article, I\u2019m using the free dbt cloud license, as it\u2019s easier to use and explain with the user interface.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-create-your-first-project\">Create your first project<\/h2>\n\n\n\n<p>When you log in for the first time after you\u2019ve created your account, you will be asked to complete the project setup.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1042\" height=\"706\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/word-image-104727-3.png\" alt=\"\" class=\"wp-image-104730\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Let\u2019s create a connection to a database. When you want to create a new connection from the dropdown, you will be taken to a new tab where you can define the connection properties. Originally, dbt only supported a couple of databases out-of-the-box, such as Snowflake, Databricks and PostgreSQL, but nowadays there\u2019s a wide range of databases to choose from:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"839\" height=\"465\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-description-automatica-14.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" class=\"wp-image-104731\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>It\u2019s possible to connect other data stores as well (MySQL for example), if there\u2019s an adaptor available. You can find those in the <a href=\"https:\/\/docs.getdbt.com\/docs\/community-adapters\">community adaptors<\/a>. Let\u2019s create a connection to Microsoft Fabric. Since dbt is a SQL-based tool which will create tables and views, you are implicitly restricted to the warehouse. You\u2019ll be asked to specify the server name, which you can find the warehouse settings, and the database name. You can leave the default port as is.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"875\" height=\"494\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-description-automatica-15.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" class=\"wp-image-104732\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Once the connection is saved, you can select it in the previous screen:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"688\" height=\"241\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-description-automatica-16.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" class=\"wp-image-104733\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>In the next screen, you need to specify the credentials that will be used to authenticate against the Fabric warehouse. You can choose between an Active Directory (Entra ID) account with password (you might want to test out how this works with your Multi-Factor Authentication setup), or a service principal.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"823\" height=\"335\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-description-automatica-17.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" class=\"wp-image-104734\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Personally, I went for the user with password. I also changed the schema to the default <code>dbo <\/code>schema.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"413\" height=\"465\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-login-screen-description-automa.png\" alt=\"A screenshot of a login screen\n\nDescription automatically generated\" class=\"wp-image-104735\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The schema you specify here will be used to contain the objects created by dbt. This might be useful when you work with multiple developers on the same development database. Every developer can specify their own development credentials with their own specific schema. That way, each developer can work on their own schema in the database. Don\u2019t forget to test the connection at the end!<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"844\" height=\"220\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screen-shot-of-a-computer-description-automatic-1.png\" alt=\"A screen shot of a computer\n\nDescription automatically generated\" class=\"wp-image-104736\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Optionally, you can configure source control for the project. You can connect to one of the git providers or use a <a href=\"https:\/\/docs.getdbt.com\/docs\/collaborate\/git\/managed-repository\">managed repository<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1327\" height=\"472\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-chat-description-automatically.png\" alt=\"A screenshot of a chat\n\nDescription automatically generated\" class=\"wp-image-104737\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>And the project is now configured! Now we can get our hands dirty with some SQL. <br><img loading=\"lazy\" decoding=\"async\" width=\"971\" height=\"846\" class=\"wp-image-104738\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-description-automatica-18.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\"><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-create-your-first-model\">Create your first model<\/h2>\n\n\n\n<p>In dbt terminology, a <a href=\"https:\/\/docs.getdbt.com\/docs\/build\/models\">model<\/a> is typically a SELECT statement. Depending on the properties of the model, the result of that SELECT statement will either be a view, table or ephemeral. The last option basically means the model will not be persisted in some way, but rather will be used as a common table expression (CTE) when referenced by another model.<\/p>\n\n\n\n<p>Let\u2019s dive into the code. Go to the IDE to get started.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"839\" height=\"368\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screen-shot-of-a-computer-description-automatic-2.png\" alt=\"A screen shot of a computer\n\nDescription automatically generated\" class=\"wp-image-104739\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>You will be asked to initialize the project.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"332\" height=\"142\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/word-image-104727-13.png\" alt=\"\" class=\"wp-image-104740\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>This will create standard configuration files (such as the <code>dbt_project.yml<\/code> file) and a default folder structure.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"326\" height=\"550\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-description-automatica-19.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" class=\"wp-image-104741\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>In the changes section of the version control, you get an overview of all the changes tracked by the git repository. In the file explorer you can find the project structure, along with their current git status. By hitting <strong>Commit and sync<\/strong>, we can commit all the changes to the git repo. Don\u2019t forget to supply a meaningful commit message.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"487\" height=\"196\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/word-image-104727-15.png\" alt=\"\" class=\"wp-image-104742\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Once the changes are committed, the version control panel will allow you to create a new branch or to change a branch, refresh the git state (useful if you work with multiple developers on the same project), or revert any changes.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"327\" height=\"233\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/word-image-104727-16.png\" alt=\"\" class=\"wp-image-104743\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>In the default file structure, we can find some predefined folders, such as:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li><strong>macros<\/strong>: this will hold custom developed macros, which can be reused in the different models. Similar to user-defined functions in SQL Server, but with a sauce of dynamic SQL.<\/li>\n\n\n\n<li><strong>models<\/strong>: here you will find all the models of the project, which is the core of dbt. You can create subfolders to organize your project. After initiating a project, you can find two example models, along with a schema definition, in the <em>example<\/em> folder.<\/li>\n\n\n\n<li><strong>seeds<\/strong>: small reference datasets which you can upload to dbt.<\/li>\n\n\n\n<li><strong>tests<\/strong>: custom developed tests that can be used across models.<\/li>\n<\/ul>\n<\/div>\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"322\" height=\"349\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/word-image-104727-17.png\" alt=\"\" class=\"wp-image-104744\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Let\u2019s take a look at those sample models. The first one, <code>my_first_dbt_model.sql<\/code>, simply creates two records of data.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"591\" height=\"467\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-program-description-au-1.png\" alt=\"A screenshot of a computer program\n\nDescription automatically generated\" class=\"wp-image-104745\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The configuration line specifies how the model will be materialized. If you want the model to be created as a view, you can change it:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">{{ config(materialized='view')}}<\/pre>\n\n\n\n<p>The second model, <code>my_second_dbt_model.sql<\/code>, simply reads from the first model and applies a filter with a <code>WHERE<\/code> clause.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"488\" height=\"193\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-program-description-au-2.png\" alt=\"A screenshot of a computer program\n\nDescription automatically generated\" class=\"wp-image-104746\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>In the <code>FROM<\/code> clause, you can see the actual table name is not used. Instead, a <strong>reference<\/strong> to the fist model is created \u2013 using the model name \u2013 with <a href=\"https:\/\/docs.getdbt.com\/docs\/build\/jinja-macros\">Jinja<\/a>, a Python-esque templating language. This is one of the keys strengths of dbt. Instead of hard-coding object names, references are used instead, which allow dbt to construct a data lineage graph:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1091\" height=\"261\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/word-image-104727-20.png\" alt=\"\" class=\"wp-image-104747\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Okay, it doesn\u2019t seem like much when you only have two models, but for larger data warehouse projects it certainly is useful. Suppose a source column changes data type. With the lineage view you can easily determine which models upstream are possibly impacted by this change.<\/p>\n\n\n\n<p>When you schedule <a href=\"https:\/\/docs.getdbt.com\/docs\/deploy\/jobs\">jobs<\/a> in dbt (a dbt cloud feature) to execute the models, dbt can use the lineage to construct a DAG (directed acyclic graph). This DAG can be used to execute the models efficiently. By knowing which dependencies exist, parallelization can be optimized. Or, for example, when you want to reload a single fact table, dbt can determine which dimensions must be loaded first.<\/p>\n\n\n\n<p>There are some other files in the project that are important as well. The <code><strong>schema.yml<\/strong><\/code> in the example folder contains a schema definition for the two sample models.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"633\" height=\"536\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-program-description-au-3.png\" alt=\"A screenshot of a computer program\n\nDescription automatically generated\" class=\"wp-image-104748\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>For each model, you can specify a description and its corresponding columns. A column can have a description as well, and you can associate \u201ctests\u201d with it. All these descriptions can be useful when you want to generate documentation. In this example, each column has two tests: a <code>unique<\/code> and a <code>not_null<\/code> test. These are built-in tests, and in an RDBMS like SQL Server they would be implemented with constraints, such as the unique constraint (which is a unique index) and a <code>NOT NULL<\/code> constraint.<\/p>\n\n\n\n<p>However, not all databases support those constraints. For example, the Fabric warehouse doesn\u2019t have a unique constraint (it does \u201csupport\u201d primary key and foreign keys, but they are not enforced. See <a href=\"https:\/\/learn.microsoft.com\/en-us\/fabric\/data-warehouse\/table-constraints\">the documentation<\/a> for more information.). You can also create your own custom tests which are more complex than the built-in ones. More on that in a later article. Keep in mind that the language specification for the schema file is <a href=\"https:\/\/en.wikipedia.org\/wiki\/YAML\">YAML<\/a>, so indentation matters.<\/p>\n\n\n\n<p>Another file is <code>dbt_project.yml<\/code>. It contains project-specific configurations and most of the time you don\u2019t need to make many changes to this file. However, there\u2019s an interesting feature where you can force certain properties as default for a folder. We find this at the end of the file:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"988\" height=\"309\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-program-description-au-4.png\" alt=\"A screenshot of a computer program\n\nDescription automatically generated\" class=\"wp-image-104749\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>By specifying the following line, we can force the materialization of \u2018table\u2019 for all the models within that subfolder:<\/p>\n\n\n\n<p><strong>+materialized: table<\/strong><\/p>\n\n\n\n<p>You can still overrule this specification by explicitly setting the materialization in the model itself.<\/p>\n\n\n\n<p>Now it\u2019s time to actually build the models. Let\u2019s go to the second model <code>my_second_dbt_model.sql<\/code>. At the bottom, you can select <strong>compile<\/strong>. This will replace all of the Jinja code with the actual SQL code, so you can see what the final SQL will look like:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"570\" height=\"391\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-program-description-au-5.png\" alt=\"A screenshot of a computer program\n\nDescription automatically generated\" class=\"wp-image-104750\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>When you hit <strong>Build<\/strong>, it will try to create the table in the Fabric warehouse, which will fail because the first model doesn\u2019t exist yet.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"903\" height=\"215\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/word-image-104727-24.png\" alt=\"\" class=\"wp-image-104751\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>But we can click on the arrow right next to Build to get all the different Build options:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"266\" height=\"612\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-program-description-au-6.png\" alt=\"A screenshot of a computer program\n\nDescription automatically generated\" class=\"wp-image-104752\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>When we choose <strong>Build +model (Upstream)<\/strong>, dbt will try to create the table, but also all the other tables it might be dependent on. You can also choose to create all tables up- and downstream.<\/p>\n\n\n\n<p>We still get an error though, but this time because dbt tries to create the table, but it fails to do so because the <code>not_null<\/code> test fails.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1807\" height=\"615\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-chat-description-automatically-1.png\" alt=\"A screenshot of a chat\n\nDescription automatically generated\" class=\"wp-image-104753\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>In the left-hand pane you can also see the actual dbt cli commands that were used. In the system logs, you can find all the actual SQL statements dbt issued against the database.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1151\" height=\"554\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-program-description-au-7.png\" alt=\"A screenshot of a computer program\n\nDescription automatically generated\" class=\"wp-image-104754\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>You can also see in the log that dbt successfully created the first table, populated it with data (since tables are created with <a href=\"https:\/\/learn.microsoft.com\/en-us\/fabric\/data-warehouse\/ingest-data-tsql#creating-a-new-table-with-the-result-of-a-query-by-using-create-table-as-select-ctas\">CTAS statements<\/a>)<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1174\" height=\"192\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-chat-description-automatically-2.png\" alt=\"A screenshot of a chat\n\nDescription automatically generated\" class=\"wp-image-104755\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>But since a test failed, everything was rolled back. Let\u2019s change the first model to remove the <code>NULL<\/code> value:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"741\" height=\"555\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-description-automatica-20.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" class=\"wp-image-104756\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Don\u2019t forget to save the model and build everything again. This time, all the tables will be created, and the tests will succeed.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"864\" height=\"536\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-description-automatica-21.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" class=\"wp-image-104757\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>It\u2019s interesting to see how dbt actually does the tests. It will run a SQL statement and when it returns one or more rows, the test will fail. In the case of the unique test, it will do a group by on the unique field and return all rows with <code>HAVING COUNT(*)&gt;1<\/code>.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"518\" height=\"480\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-code-description-autom.png\" alt=\"A screenshot of a computer code\n\nDescription automatically generated\" class=\"wp-image-104758\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>When we look at the Fabric warehouse, we can see the two new tables:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"316\" height=\"283\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/11\/a-screenshot-of-a-computer-description-automatica-22.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" class=\"wp-image-104759\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-conclusion\">Conclusion<\/h2>\n\n\n\n<p>In this article, we give you a short introduction to the dbt tool. This software, which is available as an open-source CLI tool or as a Freemium SaaS cloud service, allows you to do the \u201cT\u201d of \u201cELT\u201d using a flair of dynamic SQL. It comes with tons of extra features (not all of them were shown in this article for sake of brevity), such as automated testing, git integration, documentation, data lineage, macros for code reuse, and efficient job scheduling using DAGS. You can also extend the tool with <a href=\"https:\/\/hub.getdbt.com\/\">packages<\/a> which can contain more functionality, additional connectivity or pre-defined macros. It\u2019s a bit like how extensions work in your browser, or plugins in Visual Studio.<\/p>\n\n\n\n<p>However, there are some downsides as well. It\u2019s an extra tool in your stack and if you have a larger development team you\u2019ll need to pay for licenses (unless you use the dbt core version). It also means that you need to switch contexts. For example, if you build your pipelines in Fabric, then write your SQL in dbt and then the reports in Power BI, it\u2019s no longer in one centralized environment. It boosts the power of SQL, but since you\u2019re a bit stuck with the CTE and CTAS structure, (very) complex SQL might not be as easily translated to dbt.<\/p>\n\n\n\n<p>For example, I had some issues with a model in a project where I needed to load the table incrementally, but I had to do some checks on the table itself first. This works fine as long as it\u2019s one SQL statement, since you can refer to a model itself (which might not exist yet at compile time) with the Jinja <code>{{this}}<\/code>. But since the SQL itself was very complex, I needed to split it up in multiple statements and then \u201ccircular\u201d logic broke down.<\/p>\n\n\n\n<p>People with quite some SQL experience might get the feeling that they don\u2019t need this tool, they can write the SQL themselves just fine. But dbt comes with extra features as mentioned before, and it can boost the productivity of your development team.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a typical data warehouse project, data is loaded, transformed and stored into a data store using an ETL (extract \u2013 transform \u2013 load) process. Recently, there has been a shift to ELT processing, where data is first loaded into a data store (this can be a database, but also a data lake or something&#8230;&hellip;<\/p>\n","protected":false},"author":110905,"featured_media":104760,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,159165],"tags":[159230,4824],"coauthors":[159081],"class_list":["post-104727","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-featured","category-general-analytics","tag-dbt","tag-etl"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104727","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/110905"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=104727"}],"version-history":[{"count":3,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104727\/revisions"}],"predecessor-version":[{"id":104876,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/104727\/revisions\/104876"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/104760"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=104727"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=104727"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=104727"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=104727"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}