How to Stream Data from Azure Event Hubs into a Fabric Eventhouse


At Microsoft Build 2024, Real-Time Intelligence was announced, with the Real-Time hub as its centralized location for all data-in-motion. This shows Microsoft’s commitment to making Fabric your one-stop shop for all things data analytics, both streaming and batch. There are several ways to work with streaming data in Fabric, such as the eventstream and the eventhouse.

In this article, we’ll show you how you can send streaming data through Event Hubs directly into an eventhouse. An eventhouse is a collection of one or more KQL databases (which are very similar to Azure Data Explorer databases); a KQL database is a data store optimized for handling real-time data and particularly suited for time-series analysis. For more information about KQL databases, check out this article.

This setup can be useful when you want to send data from your event sources – IoT devices, clickstream data, events from social media, and so on – to your KQL database for further analysis, without doing any transformation on the event data itself. You can still transform the data in your KQL queries, though. If you want to transform the data in-flight, eventstreams are a better option.

To generate real-time data, I’m using the open-source Real-Time Data Simulator, created by Data Platform MVP Kamil Nowinski. It can generate different kinds of payloads; the default payload contains data about generation by fuel type.


There’s also a new Generate Data tool in Azure Event Hubs itself. Both tools have their pros and cons.


The Azure tool comes with a bunch of pre-defined datasets, such as weather data, taxi data, vehicle toll booth data and so on, while the Real-Time Data Simulator gives you more control over how the messages are sent, as you can configure the number of threads and batches. For the purposes of this article, you can use either tool to generate data and follow along.
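
To make the rest of the walkthrough concrete, here is a minimal sketch of the kind of JSON payload such a simulator might emit. The field names (fuelType, megawatts, timestamp) are purely illustrative assumptions; the actual schema depends on how you configure your data generator.

```python
import json
import random
from datetime import datetime, timezone

def make_sample_event() -> str:
    """Build one hypothetical 'generation by fuel type' event as a JSON string."""
    event = {
        "fuelType": random.choice(["coal", "gas", "nuclear", "wind", "solar"]),
        "megawatts": round(random.uniform(50, 500), 2),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(event)

print(make_sample_event())
```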

Create an Event Hub in Azure

Before we can send our real-time data, we need an Azure Event Hub first. In the Azure Portal, choose “Create a resource”.


In the search box, enter “eventhubs”, search, and then click on Create.


This will take you to an overview page, where you can select Create again.


In the configuration screen, you need to either select an existing resource group or create a new one. In this example, I created a new resource group named “fabricevents”.


Next, you’ll need to specify a name for the Event Hubs namespace. This name needs to be globally unique. An Event Hubs namespace is a bit similar to a logical SQL Server in Azure: it’s a container for the various event hubs, just like the logical server is a container for the various Azure SQL databases. You also need to choose a location (preferably the one where your other resources and your Fabric tenant are located). Keep in mind that some locations might be cheaper than others.


Since I’m creating this event hub only for demo purposes, I’m choosing the Standard tier with a throughput of only one unit. There’s an even cheaper tier – Basic – but that tier doesn’t allow us to create a custom consumer group.

You can skip to the validation page and hit Create to get the Event Hubs namespace provisioned.


Once the resource is ready, we can find the host name in the Overview pane (we will need this later). At the top, we can find a link to create a new Event Hub.


In the configuration screen of the new event hub, you’ll need to specify a name, a partition count (you can leave this at the default, as we won’t be dealing with that much streaming data) and the retention policy.


The retention policy dictates how long the events are kept around, which might be important if, for example, the Fabric capacity is paused. The default is 1 hour, but you can set it to a maximum of 168 hours (7 days).

You can skip to the validation page again and hit Create.


In the Event Hub namespace, the new Event Hub will be added in the Entities section.


Click on the link to go to the Event Hub instance.


At the top, click on the link to create a new Consumer group. Give it a name and hit Create.


In the Settings section, go to Shared access policies and add a new policy.


Give the policy a name and select Manage from the options.


The policy will generate some keys, which can be used by other apps to connect to the Event Hub. Later on, we will need the policy name – mypolicy – and the primary key to connect the Fabric Eventhouse to the Event Hub. In the data generation tool we will need the “connection-string-primary key”.
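
For reference, the connection string that the policy generates follows a fixed pattern, so you can also assemble it yourself from the namespace host name, the policy name and the primary key. A rough sketch (all values below are placeholders):

```python
# All values below are placeholders; replace them with your own namespace,
# policy name and primary key from the shared access policy.
namespace_host = "mynamespace.servicebus.windows.net"  # host name from the Overview pane
policy_name = "mypolicy"                               # the shared access policy name
primary_key = "<primary key from the policy>"

# This matches the "connection string-primary key" shown in the portal.
# The event hub name can be appended as ";EntityPath=<event hub name>"
# or passed separately to whichever client uses the connection string.
connection_string = (
    f"Endpoint=sb://{namespace_host}/;"
    f"SharedAccessKeyName={policy_name};"
    f"SharedAccessKey={primary_key}"
)
print(connection_string)
```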


To recap, we created the following in Azure:

  • An Event Hubs namespace using the Standard tier
  • An Event Hub instance with a specific retention policy
  • A consumer group, which keeps track of which events have already been consumed by maintaining a pointer into the stream (see the sketch after this list)
  • A shared access policy which provides us with the keys to connect to the Event Hub instance
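
To illustrate what the consumer group is for, here is a minimal sketch of reading events from the hub with the azure-eventhub Python package using that consumer group. In this article, Fabric will be the actual consumer, so this is only for illustration; the connection string, event hub name and consumer group name are placeholders.

```python
from azure.eventhub import EventHubConsumerClient

CONNECTION_STRING = "<connection string-primary key from the policy>"
EVENT_HUB_NAME = "myeventhub"
CONSUMER_GROUP = "mygroup"  # the custom consumer group created above

def on_event(partition_context, event):
    # Print each event as it arrives. Without a checkpoint store, the
    # "pointer" into the stream only lives for the duration of this run.
    print(f"Partition {partition_context.partition_id}: {event.body_as_str()}")

consumer = EventHubConsumerClient.from_connection_string(
    conn_str=CONNECTION_STRING,
    consumer_group=CONSUMER_GROUP,
    eventhub_name=EVENT_HUB_NAME,
)
with consumer:
    # Blocks and reads from the beginning of the stream; stop with Ctrl+C.
    consumer.receive(on_event=on_event, starting_position="-1")
```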

Create an Eventhouse in Microsoft Fabric

In Microsoft Fabric, make sure you have a Fabric-enabled workspace (using either the Fabric trial or an Azure capacity). In the bottom left corner, switch to the Real-Time Intelligence workload.


This will take you to an overview page with standard actions and templates for handling real-time data. From the item list, choose Eventhouse and give it a name.


By default, the eventhouse will come provisioned with a KQL database of the same name.


When you click on the database in the left-hand pane, you will be taken to an overview page for that specific database.


In the menu, select Get Data and then Event Hubs.


Since we don’t have any tables yet, select New table and name it GenerationByFuelType.


In the Configure the data source section, choose Create new connection. Enter the name of the Event Hubs namespace and the name of the Event Hub instance.


Change the name of the new connection to something more meaningful (and preferably shorter), enter the name of the shared access policy, and copy-paste the primary key from the policy into the Shared Access Key field. Save the connection. Once the connection has been made successfully, you will be able to select the desired consumer group from the dropdown.


In the next screen, Fabric will try to parse the incoming events. First, change the format from TXT to JSON.


Now we will need to generate some events before we can move on. Download the Real-Time Data Simulator from GitHub and launch the application, either by starting the debugger from Visual Studio or by compiling the code to an .exe file and running that.

The app can send events with a customized JSON payload by using multiple threads and batches. You can configure the wait time between the different batches. Don’t forget to change the name of the Event Hub and to replace the connection string with the connection string (primary key) from your access policy.
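
If you’d rather send events programmatically than through the simulator, a small producer script can do the same job. The sketch below uses the azure-eventhub Python package (pip install azure-eventhub); the connection string and event hub name are placeholders that you’d replace with the values from your shared access policy and Event Hub instance.

```python
import json
import time
from azure.eventhub import EventHubProducerClient, EventData

# Placeholders: paste the "connection string-primary key" from your policy
# (or the string assembled earlier) and the name of your Event Hub instance.
CONNECTION_STRING = "Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=mypolicy;SharedAccessKey=<primary key>"
EVENT_HUB_NAME = "myeventhub"

def send_events(count: int, delay_seconds: float = 0.5) -> None:
    """Send `count` JSON events to the event hub, pausing briefly between sends."""
    producer = EventHubProducerClient.from_connection_string(
        conn_str=CONNECTION_STRING, eventhub_name=EVENT_HUB_NAME
    )
    with producer:
        for i in range(count):
            batch = producer.create_batch()
            # Any JSON payload works here; see the sample event sketch earlier.
            batch.add(EventData(json.dumps({"messageNumber": i, "fuelType": "wind"})))
            producer.send_batch(batch)
            time.sleep(delay_seconds)

if __name__ == "__main__":
    send_events(10)  # 10 events is enough for the Fabric wizard to infer the schema
```

Sending one event per batch keeps the sketch simple; for larger volumes you would add multiple events to a batch before calling send_batch.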


Let’s send 10 events so that the wizard in Fabric can parse the JSON and determine the schema for the table.


In the monitoring pane of the Event Hub instance, we can see that 10 messages were received and that the same number of messages were sent.


Hit Finish to end the wizard and get the table created.


Keep in mind that those initial 10 messages will not be stored in the newly created table; they are lost. In the database overview, we can now see the new table.


Let’s send some more events through Event Hubs to the KQL database.


We can keep track of all the messages in the monitoring pane of the Event Hub.


You can also see the events being stored in the table.


Once all the messages have been processed, the table has a compressed size of about 5 KB (this might differ on your end if you use a different payload).
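
If you want to verify the ingested rows outside of the Fabric portal, one option is the azure-kusto-data Python package, which can run KQL against the database’s Query URI (you can copy that URI from the KQL database details page). The URI and database name below are placeholders, and device-code authentication is just one of several supported options.

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

QUERY_URI = "https://<your-eventhouse>.kusto.fabric.microsoft.com"  # placeholder Query URI
DATABASE = "<your KQL database>"                                    # placeholder database name

# Authenticate with a device code; other authentication methods are available.
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(QUERY_URI)
client = KustoClient(kcsb)

# Count the rows that have been ingested into the new table.
response = client.execute(DATABASE, "GenerationByFuelType | count")
for row in response.primary_results[0]:
    print(row)
```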


What happens if the Fabric capacity is paused?

We have successfully set up a real-time connection between an app sending messages through Event Hubs to a KQL database in Fabric. But what happens when we pause our Fabric capacity?


If our publisher is still sending events, will they be dropped while the capacity is paused? Luckily, no. Thanks to the retention settings we configured earlier for the Event Hub instance, the events will be kept around for the duration of the retention period. If the capacity stays paused longer than the retention period, though, you might miss out on events.

Let’s test this out. First, we need to empty the KQL table to make sure we can count the events correctly. This can be done with a KQL management command such as .clear table GenerationByFuelType data.


Now we can pause our capacity (from the Fabric capacity resource in the Azure portal) and start sending events again. Let’s send 1,000 messages.
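
If you’re following along with the Python producer sketch from earlier instead of the simulator, sending the 1,000 test messages is a single call (lower delay_seconds if you don’t want to wait).

```python
send_events(1000, delay_seconds=0.1)  # reuses the hypothetical helper defined earlier
```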


Wait until all messages have arrived in the Event Hub instance. Then, resume the capacity. Fabric will start reading all the messages it missed while it was paused. This can be verified in the Event Hubs monitoring pane, where we will see two distinct spikes: one for the incoming messages and one for the outgoing messages (which seem to be higher for some reason).


Conclusion

In this article, we have shown you that it is quite simple to set up a real-time streaming pipeline between a source, Azure Event Hubs and a Microsoft Fabric Eventhouse. No code at all was required to get everything configured. The data in the KQL database can be analyzed with KQL statements, but you can also build a Power BI report directly on top of this data. It’s also possible to sync this data to a delta table in OneLake so that other workloads – like the lakehouse or the warehouse – can use this data as well.

This real-time pipeline directly stores the data in a KQL table without any transformations. If you do wish to transform the data inside the messages, you might want to look into Fabric Eventstreams. They come with their own built-in Event Hub, as explained by Patrick Leblanc in this Guy in a Cube video.


About the author

Koen Verbeeck


Koen Verbeeck is a data professional working at AE. He helps organizations get insight into their data and improve their analytics solutions. Koen has over a decade of experience in developing data warehouses, models and reports using the Microsoft data platform, and he has been a Microsoft Data Platform MVP since 2017. He blogs at http://www.sqlkover.com, writes articles for MSSQLTips.com and is a frequent speaker at Microsoft Data Platform events.