A Fabric pipeline uses JSON as its source code, and pipelines are also saved in repositories as JSON. The first idea we get is to edit the pipeline in JSON format: we can copy the JSON and create new pipelines with small variations, making the changes directly in the JSON. However, at first sight we are disappointed, because the … Read more
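The copy-and-edit idea can be sketched in Python by treating the pipeline JSON as a template and producing variations from it. This is a minimal sketch only. It assumes a hypothetical, heavily simplified pipeline structure: the keys `activities`, `typeProperties` and `source.table` are illustrative and are not the full Fabric pipeline schema. It shows only the mechanics of producing variants.

```python
import copy
import json

# Hypothetical, simplified pipeline definition. Real Fabric pipeline JSON
# has many more properties; only the shape needed for this sketch is modeled.
TEMPLATE = {
    "properties": {
        "activities": [
            {
                "name": "CopySales",
                "type": "Copy",
                "typeProperties": {"source": {"table": "sales"}},
            }
        ]
    }
}


def make_variant(template: dict, suffix: str, table: str) -> dict:
    """Return a deep copy of the template with renamed activities and a new source table."""
    variant = copy.deepcopy(template)  # never mutate the shared template
    for activity in variant["properties"]["activities"]:
        activity["name"] = f"{activity['name']}_{suffix}"
        activity["typeProperties"]["source"]["table"] = table
    return variant


if __name__ == "__main__":
    # Print one variant per (hypothetical) source table.
    for table in ["orders", "customers"]:
        print(json.dumps(make_variant(TEMPLATE, table, table), indent=2))
```

Using `copy.deepcopy` matters here: a shallow copy would share the nested activity dictionaries, so editing one variant would silently change the template and every other variant.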
Over the past years, “traditional” ETL development has morphed into data engineering, which takes a more disciplined software engineering approach. One of the benefits of a more code-based approach to data pipelines is that it has become easier to build metadata-driven pipelines. What does this mean exactly? Say, for example, you need to … Read more
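To make "metadata-driven" concrete, here is a minimal sketch in which one generic routine builds the extraction step for every table described in a metadata list, so there is no separate hand-written pipeline per table. The metadata fields (`load_type`, `watermark_column`) and table names are hypothetical. The SQL is built with string formatting for readability only; real code should use parameterized queries.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical metadata row; in practice these rows might live in a
# control table or a JSON file rather than in code.
@dataclass(frozen=True)
class SourceMetadata:
    source_table: str
    target_table: str
    load_type: str  # "full" or "incremental"
    watermark_column: Optional[str] = None


METADATA = [
    SourceMetadata("dbo.Customers", "customers", "full"),
    SourceMetadata("dbo.Orders", "orders", "incremental", "ModifiedDate"),
]


def build_query(meta: SourceMetadata, last_watermark: Optional[str] = None) -> str:
    """Build the extraction query for one metadata row."""
    query = f"SELECT * FROM {meta.source_table}"
    # Incremental loads only fetch rows changed after the last watermark.
    if meta.load_type == "incremental" and meta.watermark_column and last_watermark:
        query += f" WHERE {meta.watermark_column} > '{last_watermark}'"
    return query


def plan_loads(metadata: List[SourceMetadata],
               watermarks: Dict[str, str]) -> List[Tuple[str, str]]:
    """One generic routine handles every table described in the metadata."""
    return [(m.target_table, build_query(m, watermarks.get(m.target_table)))
            for m in metadata]
```

Adding a new table then means adding a metadata row, not a new pipeline.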
PySpark is a powerful tool for data manipulation, and it’s full of tricks. Let’s discover some of them. Control the Type of a NULL Column. If you are creating a PySpark DataFrame but one of the columns contains only null values (None), how can you control the type of the column? There is an interesting … Read more
When implementing real-time ingestion, we usually use an architecture called the lambda architecture. In a lambda architecture, KustoDB in Microsoft Fabric is always recommended for the speed layer. Do you know why? Let’s analyze it in detail. 1 – KustoDB uses SSD: KustoDB uses internal SSD storage, while lakehouses use ADLS as their backend. In this way, Kusto … Read more
I have been talking about Data Exploration in Power BI in many of my sessions, especially the sessions about Data Marts. The new data exploration feature is one more addition to this expanding set of data exploration options, and it brings some interesting details. We start using this feature from a query. The feature will allow … Read more
Finally, mirroring is available for Fabric! You can mirror an Azure SQL database to Fabric. It works for Cosmos DB and Snowflake as well, but in this article I will focus on Azure SQL. Is it 100% complete? No, but it is definitely a great feature even now. Before getting into a step-by-step of the … Read more
Let’s consider a simple statement to partition and save a table in a lakehouse: df.write.mode("overwrite").format("delta").partitionBy("Year","Month","Day").save("Tables/" + table_name) Suppose we load the data daily, with all the transactions from that day. The table will save each day's transactions in a different partition. We could expect the table to keep the partitions from previous days, … Read more
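With default settings, an overwrite of a Delta table replaces the entire table, not only the partitions present in the incoming DataFrame. Recent Delta Lake versions also offer a dynamic partition overwrite, which replaces only the partitions that appear in the new data. The toy model below contrasts the two behaviors using plain Python dictionaries. It is an illustration of the semantics under that assumption, not Spark code and not how Delta stores files.

```python
from typing import Dict, List, Tuple

# Toy model: a table maps a partition key (Year, Month, Day) to the rows
# stored in that partition.
Partition = Tuple[int, int, int]
Table = Dict[Partition, List[str]]


def overwrite_static(table: Table, new_data: Table) -> Table:
    """Default overwrite: the existing table is discarded and replaced by new_data."""
    return {key: list(rows) for key, rows in new_data.items()}


def overwrite_dynamic(table: Table, new_data: Table) -> Table:
    """Dynamic partition overwrite: only partitions present in new_data are replaced."""
    result = {key: list(rows) for key, rows in table.items()}
    for key, rows in new_data.items():
        result[key] = list(rows)
    return result


existing: Table = {(2024, 1, 1): ["tx1", "tx2"]}
daily_load: Table = {(2024, 1, 2): ["tx3"]}

print(sorted(overwrite_static(existing, daily_load)))   # only the 2024-01-02 partition remains
print(sorted(overwrite_dynamic(existing, daily_load)))  # both days are kept
```

In a notebook, the dynamic behavior is expected to correspond to adding `.option("partitionOverwriteMode", "dynamic")` to the write statement above. Check that the Delta Lake version in your runtime supports this option before relying on it.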
In the blog post Fabric Notebook and Deployment Pipelines, I explained a technique to keep notebook configuration values in JSON files in lakehouses, a good solution from many different points of view. What if we need to maintain the JSON configuration file using notebooks? The first problem is the fact that the typical statement to … Read more
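One way to maintain such a file from a notebook is to treat it as a plain document with Python's standard `json` module: read the whole file into a dict, change a key and write it back. This is a minimal sketch. It assumes the notebook has a default lakehouse attached whose Files area is reachable through the local mount path `/lakehouse/default/Files`. The file name `config.json` and the helper names are hypothetical.

```python
import json
from pathlib import Path

# Assumed location: the Files area of the notebook's default lakehouse,
# exposed as a local mount. The file name is hypothetical.
DEFAULT_CONFIG_PATH = Path("/lakehouse/default/Files/config.json")


def load_config(path: Path = DEFAULT_CONFIG_PATH) -> dict:
    """Read the whole JSON document as a dict (empty dict if the file is missing)."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def update_config(key: str, value, path: Path = DEFAULT_CONFIG_PATH) -> dict:
    """Set one key and rewrite the entire file with the updated document."""
    config = load_config(path)
    config[key] = value
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, `update_config("environment", "test")` would add or change a single entry while preserving every other key already in the file.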
Dataflows Gen2 are the new version of Power BI dataflows. There are so many changes compared to the previous version that they are considered a new feature. The main difference is the possibility of setting a data destination for the result of each query in the dataflow. In this way, it can be used as … Read more
In my article about Source Control with GIT, Power BI and Microsoft Fabric, I illustrate how to use the PBIP file format to include Power BI reports and semantic models in a source control process and establish an SDLC (Software Development Lifecycle) for Power BI. However, the complete explanation is based on saving the development using … Read more
When organizing our SDLC (Software Development Lifecycle) in Power BI/Fabric, we use Deployment Pipelines and create rules to change connection configurations every time we promote an object from one environment (dev, for example) to another (test, for example). Kusto connections, on the other hand, are not so simple. You can check more about Deployment Pipelines … Read more
Eventstream differs in many ways from the technologies it proposes to replace: Event Hubs, Stream Analytics, Streaming Dataflows and more. We can compare these technologies, but Eventstream in Microsoft Fabric has some specific differences from all of them. One of the differences is how the transformation of the input data is linked to the … Read more
Power BI and Fabric are implementing source control support. It’s a long-awaited feature for Power BI. However, it’s important to highlight some basic principles that should be followed as source control best practices. Some of them apply to any project in source control, some are specific to this environment, and some are specific to this … Read more
Nikola Ilic, better known as Data Mozart, published a great article and video about how to make semantic model data available in Microsoft Fabric. This allows the data to be used in lakehouses or data warehouses. One major question that arises is: “should we use a top-down or bottom-up (or both) approach in Microsoft Fabric?” … Read more
Recently, Azure Resource Graph was announced as a new connector in Power BI. Azure Resource Graph provides access to almost all resources inside a company's Azure environment. Why is this important? Resource Graph by itself is a very important tool to analyze the resources provisioned in an Azure environment without losing control of … Read more
We can say Fabric is the evolution of the Power BI environment. Power BI is a self-service environment, and so is Fabric. This allows the implementation of very interesting architectures, which will be the subject of future videos and articles. However, it’s not something free-and-easy, and it shouldn’t be. Using the Fabric Admin Portal (or Power … Read more
The animation at the top of this article tries to track the evolution of enterprise architecture since SQL Server 7.0 introduced tools for ETL, semantic models and much more. Some of you probably remember these tools as SSIS and SSAS. At that time they had even older names, but no one wants to confess remembering … Read more
I have published videos and articles about lakehouse maintenance before. In this article I want to address a point many Fabric administrators are missing: how to maintain multiple lakehouses located in different workspaces. One of the videos I have published explains the maintenance of multiple lakehouses, but only addresses … Read more
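A minimal sketch of the cross-workspace idea: generate Delta `OPTIMIZE` and `VACUUM` statements for every table of every lakehouse, addressing each table by its OneLake ABFS path so that it does not need to be attached to the notebook. The inventory is hypothetical (in practice it might come from the Fabric REST API or a configuration file). Workspace and lakehouse names are assumed to contain no spaces. Running the statements, for example with `spark.sql(stmt)` in a notebook, requires permissions on each workspace. This is not necessarily how the videos mentioned above implement it.

```python
from typing import Dict, List

# Hypothetical inventory: workspace name -> lakehouse name -> table names.
INVENTORY: Dict[str, Dict[str, List[str]]] = {
    "Sales-Dev": {"SalesLakehouse": ["orders", "customers"]},
    "Finance-Dev": {"FinanceLakehouse": ["ledger"]},
}

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def table_path(workspace: str, lakehouse: str, table: str) -> str:
    """OneLake ABFS path of a managed Delta table in a lakehouse."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"


def maintenance_statements(inventory: Dict[str, Dict[str, List[str]]],
                           retain_hours: int = 168) -> List[str]:
    """Build OPTIMIZE and VACUUM statements for every table in every workspace."""
    statements: List[str] = []
    for workspace, lakehouses in inventory.items():
        for lakehouse, tables in lakehouses.items():
            for table in tables:
                # Path-based table identifier understood by Delta SQL commands.
                target = f"delta.`{table_path(workspace, lakehouse, table)}`"
                statements.append(f"OPTIMIZE {target}")
                statements.append(f"VACUUM {target} RETAIN {retain_hours} HOURS")
    return statements


for stmt in maintenance_statements(INVENTORY):
    print(stmt)
```

The default retention of 168 hours (7 days) is kept deliberately. A shorter retention can remove files that time travel or concurrent readers still need.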
In my article about Fabric source control extended features, I explained how Microsoft included notebooks in source control. This way, we can include notebooks in a Software Development Lifecycle (SDLC) for Power BI objects, which means the notebooks need to flow from the development environment to the test and production environments. However, … Read more
The source control features in Microsoft Fabric are evolving every day. The PBIP feature included in Power BI allowed us to include source control in an SDLC process for Power BI, supporting reports and datasets linked directly from the portal to a repository. The New Source Control Features: Recently, without much announcement, Microsoft extended the … Read more