Microsoft Fabric: Data Warehouse Ingestion using T-SQL and more

In this blog I will illustrate how to ingest data from Azure Blob Storage into a Microsoft Fabric Data Warehouse using T-SQL.

Create a new workspace

  1. On the left ribbon, click Workspaces.
  2. Click the New Workspace button.


  3. Provide a name for the workspace.


  4. Under the advanced configuration, ensure you choose Premium by Capacity and select a capacity.


  5. Click the Apply button.


Creating the Microsoft Fabric Data Warehouse

  1. On the left ribbon, click the Experiences button and select the Data Warehouse experience.


  2. Click the Warehouse button to create a new Warehouse.
  3. Define the name wwiSample for the Data Warehouse and click Create.


Creating the schema to import the data

  1. Click the New SQL query button.

--dimension_city
DROP TABLE IF EXISTS [dbo].[dimension_city];

CREATE TABLE [dbo].[dimension_city]
(
    [citykey] [INT] NULL,
    [wwicityid] [INT] NULL,
    [city] [VARCHAR](8000) NULL,
    [stateprovince] [VARCHAR](8000) NULL,
    [country] [VARCHAR](8000) NULL,
    [continent] [VARCHAR](8000) NULL,
    [salesterritory] [VARCHAR](8000) NULL,
    [region] [VARCHAR](8000) NULL,
    [subregion] [VARCHAR](8000) NULL,
    [location] [VARCHAR](8000) NULL,
    [latestrecordedpopulation] [BIGINT] NULL,
    [validfrom] [DATETIME2](6) NULL,
    [validto] [DATETIME2](6) NULL,
    [lineagekey] [INT] NULL
);

--fact_sale
DROP TABLE IF EXISTS [dbo].[fact_sale];

CREATE TABLE [dbo].[fact_sale]
(
    [salekey] [BIGINT] NULL,
    [citykey] [INT] NULL,
    [customerkey] [INT] NULL,
    [billtocustomerkey] [INT] NULL,
    [stockitemkey] [INT] NULL,
    [invoicedatekey] [DATETIME2](6) NULL,
    [deliverydatekey] [DATETIME2](6) NULL,
    [salespersonkey] [INT] NULL,
    [wwiinvoiceid] [INT] NULL,
    [description] [VARCHAR](8000) NULL,
    [package] [VARCHAR](8000) NULL,
    [quantity] [INT] NULL,
    [unitprice] [DECIMAL](18, 2) NULL,
    [taxrate] [DECIMAL](18, 3) NULL,
    [totalexcludingtax] [DECIMAL](29, 2) NULL,
    [taxamount] [DECIMAL](38, 6) NULL,
    [profit] [DECIMAL](18, 2) NULL,
    [totalincludingtax] [DECIMAL](38, 6) NULL,
    [totaldryitems] [INT] NULL,
    [totalchilleritems] [INT] NULL,
    [lineagekey] [INT] NULL,
    [month] [INT] NULL,
    [year] [INT] NULL,
    [quarter] [INT] NULL
);

  2. Click the Run button.

The Data Warehouse has full DDL support to prepare tables for data ingestion or create table aggregations.
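For example, the same DDL support can build an aggregation with CREATE TABLE AS SELECT. A minimal sketch, assuming the fact_sale table defined above has already been populated (the aggregation table name is hypothetical):

```sql
-- Hypothetical aggregation table built from fact_sale (run after the data is loaded).
CREATE TABLE [dbo].[fact_sale_by_year]
AS
SELECT [year],
       SUM([profit]) AS total_profit
FROM [dbo].[fact_sale]
GROUP BY [year];
```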

  3. On the Explorer, open Schemas -> dbo -> Tables.
  4. Click the three dots on the right and select Refresh.


  5. Double-click the query name at the top of the window and give it a new name, Create Tables.


  6. You will be able to see your query on the left side, in the Explorer.


The Queries area in the Explorer is similar to the queries area in datamarts, but the Data Warehouse also has Shared Queries, allowing us to distinguish between our personal queries and queries shared with other users.

You can move a query to Shared Queries by clicking the three dots and selecting the option Move to Shared Queries.


Ingesting the data

We can use the COPY statement to ingest the data from Azure Blob Storage into the data warehouse.

The COPY statement reads files from a source location into a target table, with a WITH clause that controls options such as the file type and credentials.
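A simplified sketch of the syntax, based on the documented COPY INTO options (the account, container, and path are placeholders, and the CREDENTIAL clause is only needed for secured storage):

```sql
COPY INTO [schema].[table_name]
FROM 'https://<account>.blob.core.windows.net/<container>/<path>'
WITH (
    FILE_TYPE = 'PARQUET'  -- or 'CSV'
    -- , CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);
```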

The COPY statement is capable of:

  • Loading data from Blob Storage or ADLS Gen2
  • Loading Parquet or CSV data
  • Connecting to secure resources using a credential

Many features documented for the COPY INTO statement in Synapse also work with the Synapse Data Warehouse inside Microsoft Fabric. You can discover more about COPY INTO here.
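For instance, the CSV parsing options and the credential syntax documented for Synapse carry over. A hedged sketch, using a hypothetical secured container (the storage URL and SAS token are placeholders, not part of this walkthrough):

```sql
-- Hypothetical example: load a CSV with a header row from a secured container.
COPY INTO [dbo].[dimension_city]
FROM 'https://<account>.blob.core.windows.net/<container>/dimension_city.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2,  -- skip the header row
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);
```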

  1. Click the New SQL Query button.
  2. Paste the following code in the new query window:
COPY INTO [dbo].[dimension_city]
FROM 'https://azuresynapsestorage.blob.core.windows.net/sampledata/WideWorldImportersDW/tables/dimension_city.parquet'
WITH (file_type = 'PARQUET');

COPY INTO [dbo].[fact_sale]
FROM 'https://azuresynapsestorage.blob.core.windows.net/sampledata/WideWorldImportersDW/tables/fact_sale.parquet'
WITH (file_type = 'PARQUET');

This storage account is public sample storage provided by Microsoft.

  3. Click the Run button.

Interface gráfica do usuário, Texto, Aplicativo, Email

Descrição gerada automaticamente

The result is 50 million records loaded in 1 minute and 26 seconds. Not bad.
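To confirm the load, a quick row count over the tables created above is enough (a minimal sanity check):

```sql
SELECT COUNT(*) AS fact_sale_rows      FROM [dbo].[fact_sale];
SELECT COUNT(*) AS dimension_city_rows FROM [dbo].[dimension_city];
```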

  4. Rename the query to Loading Data.

Building the Model

The Data Warehouse allows us to create a model which will be shared by all datasets created from the data warehouse. This is similar to the creation of a model in a Power BI Datamart.

  1. On the tabs under the explorer, click the Model tab.

  2. Link fact_sale with dimension_city on the CityKey field by dragging and dropping the field from fact_sale onto dimension_city.


  3. The Create Relationship window will, in general, show the correct settings; you only need to click the Confirm button.

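Before moving to the report, you can preview the same relationship directly in T-SQL. A sketch joining the two tables on the city key (the column names match the DDL created earlier):

```sql
-- Profit by state/province, mirroring the relationship just created.
SELECT TOP 10
    dc.[stateprovince],
    SUM(fs.[profit]) AS total_profit
FROM [dbo].[fact_sale] AS fs
JOIN [dbo].[dimension_city] AS dc
    ON fs.[citykey] = dc.[citykey]
GROUP BY dc.[stateprovince]
ORDER BY total_profit DESC;
```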

Creating a Report

The data warehouse is completely integrated with Power BI. All the data is stored in OneLake, as explained in the article https://www.red-gate.com/simple-talk/cloud/azure/data-intelligence-on-light-speed-microsoft-fabric/

Power BI reports access the data in OneLake using a technology called Direct Lake, which is faster than the usual ways of connecting to SQL sources or data lakes, whether through Import or DirectQuery.

  1. Click the New Report button at the top of the window.


The data window on the right side of the screen shows the tables with the relationship already established according to the model.


  2. On the Visualizations pane, select ArcGIS Maps for Power BI.


  3. Drag the StateProvince field from dimension_city to the Location box in the Visualizations pane.


  4. Drag the Profit field from fact_sale to the Size box in the Visualizations pane.


  5. Resize the ArcGIS visual to fill the report area.


  6. Click the File -> Save menu.


  7. Type the name Profit by Region and click the Save button.


  8. On the left bar, click DataWarehouseSimple to return to the workspace view.


Besides the data warehouse and the report you created, the model created in the data warehouse was saved as a dataset.


Conclusion

In this blog, you found a walkthrough to create a data warehouse, load data from blob storage using T-SQL, create the model, and build a report.

About the author

Dennes Torres


Dennes Torres is a Data Platform MVP and Software Architect living in Malta who loves SQL Server and software development and has more than 20 years of experience. Dennes can improve Data Platform architectures and transform data into knowledge. He moved to Malta after more than 10 years leading the devSQL PASS Chapter in Rio de Janeiro and is now a member of the leadership team of the MMDPUG PASS Chapter in Malta, organizing meetings, events, and webcasts about SQL Server. He is an MCT and an MCSE in Data Platforms and BI, with more titles in software development. You can get in touch on his blog https://dennestorres.com or at his work https://dtowersoftware.com
