Database Projects for Microsoft Fabric Data Warehouse

Database Projects for Fabric Data Warehouse is one of the most recent news of Fabric for SDLC (Software Development Life Cycle). Microsoft released the Database Projects around 2010 and have they evolved since that year. The original name was a tool called SSDT, released as part of SQL Server, but running inside a Visual Studio Shell.

If you had Visual Studio already installed, it would run inside it, otherwise it would install a shell to run.

What is a Database Project

A database project provides more than way of use it, so we need to define it carefully:

  • From an existing database, we can generate the scripts of the existing database
  • On a new database project, we can create the schema of the database and deploy

The main point is the possibility to keep the script of the database under source control, including the database project in the SDLC (Software Development Lifecycle).

Two methods of Database Source Control

We can consider in the market there are two methods of database source control:

  • Incremental Scripts
  • Differential Scripts

Incremental Scripts

Using this method, we keep incremental scripts of every change made in the database. The scripts have a sequential order. The execution is always done in this order.

This method requires a reverse. We need a script to reverse the change as well, so we can advance or reverse in time.

The image below shows an example of Migration scripts created by Entity Framework. Each file contains two methods, one for implement the changes and another to reverse them.

Bringing Control Back to Database Deployment with ReadyRoll and Entity  Framework Code First Migrations - CodeProject

Using this method we extract the differential scripts and store them. We can use tools for this purpose or create the scripts manually (but I wouldn’t like to be the guy who does it manually).

If the scripts get mixed, lost, or anything similar, the updates will not work.

Example: Most ORM’s (Object Relational Mappings), such as Entity Framework or NHibernate, work in this way.

Differential Scripts – The Database Projects

This is the method used by Database Projects.

Why You Should Use a SSDT Database Project For Your Data Warehouse — SQL  Chick

Using this method, we keep the database schema scripts. The scripts to create the objects.

This method requires a process call schema compare: The database project can be compared with the online version. The tool creates a differential script during the compare process, executes it and the script is discarded.

There are many configuration rules in relational about how the tool will proceed in relation to the differential script, the most important are:

  • The scripts need to be transactional. In complex DDL scripts, this may not be so simple to build, but the tool manages it. The point is: Or the entire update happens, or nothing happens.
  • Data movement may be needed. Some changes of structure may require an object to be dropped and created again and the differential script needs to manage the data movement required for this.
  • Data loss could happen. For example, you could reduce the size of a field or remove it. All tools I know have by default an option to stop in case of data loss and let the DBA analyze and order to proceed or not on a case-by-case basis.

What database projects can do

Using Schema Compare instead of incremental scripts makes it easier to identify the changes from one version to another, comparing the script of the objects instead of analyzing what the incremental scripts are doing.

The schema compare feature brings the following possibilities:

  • Compare two different versions of a project to find the differences and apply the differences from one to another
  • Compare two different live versions of a database to find the differences and apply the differences from one to another
  • Compare a project with a live version. The update can happen from the project to the live version or from the live version to the project. Of course, the last one is not good SDLC practice, but it can happen.

Which one to choose

In my personal experience, the Incremental scripts are the preferred choice by developers, because it’s close to their ORM work.

However, it leaves the database structure tied with development code. Because of that, the Differential Scripts are usually the preferred choice from DBAs.

Considering developers work mostly in production systems while the analytics is left for data specialists, it’s natural the first option for Microsoft Fabric Data Warehouse appears in the form of a Differential Script – The database projects

Database Projects for Fabric Data Warehouse and Synapse Serverless

The focus is so much on Fabric the fact the database projects are supporting Synapse Serverless as well will not be noticed. This is a considerable improvement on the SDLC (Software Development Lifecycle) for Synapse Serverless environments.

Over the years, extensions were created for Visual Studio Code and Azure Data Studio to support the database projects on them as well.

Using a difficult to understand strategy, Microsoft teams often release features on the extensions for these tools – Visual Studio Code and Azure Data Studio – and much later the feature is extended to Visual Studio itself.

These extensions are called SQL Database Project and are still in preview.

A screenshot of a computer

Description automatically generated

Each software has an insider version. The Insider version is like a version released directly from GitHub to you. I found less bugs on these extensions when testing the insider’s version, although, in my opinion, they are still a promise.

You cand download Azure Data Studio Insiders Version from this link

Creating the Database Project for Microsoft Fabric Data Warehouse

My tests were done with Azure Data Studio insiders’ version. The extension is available on the regular version, but I faced strange error messages which are pointed by links on the web to be related to compatibility level.

It doesn’t make much sense, because although we can see, we have no control over the compatibility level in Microsoft Fabric Data Warehouse.

Once the extension is installed, we get a new left bar icon for the database projects.

A screenshot of a computer

Description automatically generated

A screenshot of a computer

Description automatically generated

The main options are:

Create a Project : This will be an empty project

Open a Project : What the name implies

Create a Project from Database: Create the project and import the schema from the database

Update a project from the database: This should have the effect of a schema compared with the project as a target. It doesn’t seem a needed option.

Create a SQL Project from an OPENAPI/Swagger spec: This one surprises me. For someone my age, it seems like doing things in reverse order. On what scenarios would you use this option?

When we choose the option to create a project, we can choose what kind of project we would like to create and here are the new features. Under SQL Server project, we can find a Azure Synapse Data Warehouse in Fabric project and a Synapse Serverless SQL Pool, two new types.

A screenshot of a computer

Description automatically generated

However, when you use the option Create a Project from Database, it asks immediately for the connection, not offering a list of types of projects.

A screenshot of a computer

Description automatically generated

 

Default Option when No Project Type Selected

During my tests, sometimes it creates an Azure Synapse Serverless SQL Project by default. This made me edit the “.sqlproj” file on the notepad and discover the codename used for both type of projects, so I can change from one to the other one needed to solve bugs:

Microsoft.Data.Tools.Schema.Sql.SqlDwUnifiedDatabaseSchemaProvider : Fabric Data Warehouse

Microsoft.Data.Tools.Schema.Sql.SqlServerlessDatabaseSchemaProvider : Synapse Serverless Project

A screenshot of a computer code

Description automatically generated

On the final version of Azure Data Studio, I faced some instable bugs. After some tests, only one persisted, during the schema compare:

A white background with black text

Description automatically generated

A google research mentions this with some relation with compatibility level, but although we can see compatibility level on Fabric, we can’t change it.

After all the instability, I gave up and went directly to the Insiders version.

Managing Connection with Fabric Data Warehouse

We need to go to Power BI and use the option Copy Connection String on the Data Warehouse.

A screenshot of a computer

Description automatically generated

Every developer will recognize this is not a good choice of names. Connection String has a different meaning for developers, while in Power BI it contains only the server’s name, not the entire connection string.

The mismatch starts on Azure Data Studio by itself. When creating the connection, we can choose to input parameters one by one or the entire connection string. We need to choose one by one, because the connection string copied from Power BI is not a connection string, but only the server’s name, captche?

A screenshot of a computer

Description automatically generated

Active Directory was very recently renamed to Microsoft Entra. I found some inconsistency between the usage of Active Directory and Entra ID on the dropdown Authentication Type. Both options mean the same thing.

If I’m not mistaken, Synapse Serverless projects are using Entra ID, while Fabric Data Warehouse is using Active Directory. Or something similar.

When you make a login in active directory (I will take a long time to get used to Entra. Does this have a meaning in English? Did anyone research in other languages?) you get a token. Azure Data Studio is making a reasonable service storing this token for you. Sometimes it complains you have duplicated tokens, and if it happens, you need to close it and open it again. But it could be worse, way worse.

A screenshot of a computer error

Description automatically generated

Azure Data Studio gives us the option to save the connection in a folder. This is a good way to organize our server connections.

A screenshot of a computer

Description automatically generated

I don’t understand why it ignores the name we chose for the connection and saves it outside the folder we request, leaving the folder empty.

A screen shot of a computer

Description automatically generated

 

Importing the Database Schema

The connection is made to the server, it doesn’t contain a database name. Once the connection is established, the database names on the same server are retrieved and listed for us.

Some surprises appeared at this point:

  • A list of several databases was displayed
  • They are all databases in the same workspace
  • There are lakehouses and Data warehouses mixed

We can easily conclude there is a different server for each workspace.

A screenshot of a computer

Description automatically generated

The last one was the biggest surprise. Is there a plan to make Database Projects work with lakehouses SQL Endpoints?

Warehouse Importing Result

Each object from the database is created in a different .SQL file using a folder structure. The files contain the simple .SQL script to create the file, nothing incremental or more complex. This makes it easy to make version comparisons in source control, identify exactly which objects were changed.

A screenshot of a computer

Description automatically generated

We can make changes to the structure offline. For example, we can create a new table. In the example below, I’m creating a new table called TestTable.

A screenshot of a computer

Description automatically generated

A screenshot of a computer

Description automatically generated

I will also change the size of one of the fact_sale table fields

Comparison and Comparison results

We start the schema compare using a right-click over the project and choosing Schema Compare menu item.

A screenshot of a computer

Description automatically generated

The project will already be filled as source, and the target will be empty.

The choice of the target connection for the schema compare use similar windows as when we created the project from the database, no news at this point.

The most common scenario is the source being the project and target being the server. However, on the top button bar you have an option to reverse the order, what will invert the comparison and application of changes.

The comparison is also triggered by a button on the top bar.

It takes some minutes to be completed and it will point out the differences on each file. The column Action tell us if the file was changed or if it’s a new file.

A screenshot of a checklist

Description automatically generated

A small bug is causing all tables to be made as difference. During the import, some special character is included in the end of the files, and this is pointed out as a difference. The image below shows the difference pointed in empty rows when there is not a real difference.

A white rectangular object with a black border

Description automatically generated

We can also see the difference of the fact_sale table highlighted for us:

A screenshot of a computer

Description automatically generated

But the real differences will be pointed out as well, making it easy to decide what to update or not. The update application can be selective, you can decide what to update on the target side by checking or unchecking the column Include

Applying the Results

Unfortunately, when applying changes, we receive a mix of two messages: “Not supported” and an error about the roles of the user. It’s not clear if there is already a work around or if we will need to way a future version. The message also includes a mention to “Alter Table”, but when we are only adding tables the message continues to be the same.

Summary

This is a very promising feature. It seems we will have it available for us in a bunch of weeks, and this will make the SDLC with Microsoft Fabric easier.