At Microsoft Build 2024, Real-Time Intelligence was announced, with the Real-Time hub as its centralized location for all data-in-motion. This underscores Microsoft's commitment to making Fabric your one-stop shop for all things data analytics, for both streaming and batch data. There are several methods to work with streaming data in Fabric, such as eventstreams and eventhouses.
In this article, we’ll show you how you can send streaming data through Azure Event Hubs directly into an eventhouse. An eventhouse is a collection of one or more KQL databases (which are very similar to Azure Data Explorer databases); it’s a data store optimized for handling real-time data and particularly well suited for time-series analysis. For more information about KQL databases, check out this article.
This setup can be useful when you want to send data from your event sources – such as IoT devices, clickstream data, social media events, and so on – to your KQL database for further analysis, without doing any transformation on the event data itself. You can still transform the data at query time in KQL, though. If you want to transform the data in-flight, eventstreams are a better option.
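To give an idea of what this query-time analysis and transformation looks like, here is a minimal KQL sketch that aggregates raw events into a time series. The table and column names (SensorReadings, Timestamp, Value) are purely illustrative and don’t refer to anything we’ll create later in this article.

// illustrative only: aggregate raw events into 15-minute buckets and plot them
SensorReadings
| where Timestamp > ago(1d)
| summarize AvgValue = avg(Value) by bin(Timestamp, 15m)
| render timechart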
To generate real-time data, I’m using the open-source Real-Time Data Simulator, created by Data Platform MVP Kamil Nowinski. It can generate different kinds of payloads, and the default payload is about fuel types.
There’s also a new Generate Data tool in Azure Event Hubs itself. Both tools have their pros and cons.
The Azure tool comes with a number of pre-defined datasets, such as weather data, taxi data, vehicle toll booth data, and so on, while the Real-Time Data Simulator gives you more control over how the messages are sent, as you can configure the number of threads and batches. For the purposes of this article, you can use either tool to generate data and follow along.
Create an Event Hub in Azure
Before we can send our real-time data, we need an Azure Event Hub first. In the Azure Portal, choose “Create a resource”.
In the search box, enter “eventhubs”, search, and then click on Create.
This will take you to an overview page, where you can select Create again.
In the configuration screen, you need to either select an existing resource group, or create a new one. In the following screenshot, I created a new resource group with the name “fabricevents”.
Next, you’ll need to specify a name for the Event Hubs namespace. This needs to be globally unique. An Event Hubs namespace is a bit similar to a logical SQL server in Azure: it’s a container for the various event hubs, just like the logical server is a container for the various Azure SQL databases. You also need to choose a location (preferably where your other resources and your Fabric tenant are located). Keep in mind that some locations might be cheaper than others.
Since I’m creating this event hub only for demo purposes, I’m choosing the Standard tier with a throughput of only one unit. There’s an even cheaper tier – Basic – but that tier doesn’t allow us to create a custom consumer group.
You can skip to the validation page and hit Create to get the Event Hubs namespace provisioned.
Once the resource is ready, we can find the host name in the Overview pane (we will need this later). At the top, we can find a link to create a new Event Hub.
In the configuration screen of the new event hub, you’ll need to specify a name, a partition count (you can leave this at the default, as we won’t be dealing with that much streaming data), and the retention policy.
The retention policy dictates how long the events are kept around, which might be important if, for example, the Fabric capacity is paused. The default is 1 hour, but you can set this to a maximum of 168 hours (7 days).
You can skip to the validation page again and hit Create.
In the Event Hub namespace, the new Event Hub will be added in the Entities section.
Click on the link to go to the Event Hub instance.
At the top, click on the link to create a new Consumer group. Give it a name and hit Create.
In the Settings section, go to Shared access policies and add a new policy.
Give the policy a name and select Manage from the options.
The policy will generate a set of keys, which other applications can use to connect to the Event Hub. Later on, we will need the policy name – mypolicy – and the primary key to connect the Fabric eventhouse to the Event Hub. In the data generation tool, we will need the “connection string–primary key” value.
To recap, we created the following in Azure:
- An Event Hubs namespace using the Standard tier
- An Event Hub instance with a specific retention policy
- A consumer group, which gives a consumer (Fabric, in our case) its own view of the event stream and a pointer to keep track of which events have already been consumed
- A shared access policy which provides us with the keys to connect to the Event Hub instance
Create an Eventhouse in Microsoft Fabric
In Microsoft Fabric, make sure you have a Fabric-enabled workspace (using either the Fabric trial or an Azure capacity). In the bottom left corner, switch to the Real-Time Intelligence workload.
This will take you to an overview page with standard actions and templates for handling real-time data. From the item list, choose Eventhouse and give it a name.
By default, the eventhouse will come provisioned with a KQL database of the same name.
When you click on the database in the left-hand pane, you will be taken to an overview page for that specific database.
In the menu, select Get Data and then Event Hubs.
Since we don’t have any tables yet, select New table and name it GenerationByFuelType.
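The wizard will infer the table schema from sample events in a later step, so you don’t have to define any columns yourself. Just as an illustration, the manual alternative would be a KQL management command like the one below; the column names and types are hypothetical and not the actual schema of the simulator’s payload.

// hypothetical columns, shown only to illustrate the manual alternative to the wizard
.create table GenerationByFuelType (Timestamp: datetime, FuelType: string, GenerationMW: real)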
In the Configure the data source section, choose Create new connection. Enter the name of the Event Hubs namespace and the name of the Event Hub instance.
Change the name of the new connection to something more meaningful (and preferably shorter), enter the name of the shared access policy, and copy the primary key from the policy into the Shared Access Key field. Save the connection. Once the connection is successfully made, you will be able to select the desired consumer group from the dropdown.
In the next screen, Fabric will try to parse the incoming events. First, change the format from TXT to JSON.
Now we will need to generate some events before we can move on. Download the Real-Time Data Simulator from GitHub and launch the application, either by starting the debugger from Visual Studio or by compiling the code in Visual Studio to an .exe file and running that file.
The app can send events with a customized JSON payload by using multiple threads and batches. You can configure the wait time between the different batches. Don’t forget to change the name of the Event Hub and to replace the connection string with the connection string (primary key) from your access policy.
Let’s send 10 events so that the wizard in Fabric can parse the JSON and determine the schema for the table.
In the Event Hub instance monitoring pane, we can see that 10 messages were received and the same number of messages were sent.
Hit Finish to end the wizard and get the table created.
Keep in mind that those initial 10 messages will not be stored in the newly created table; they are lost. In the database overview we can now see the new table:
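If you prefer to verify this from the KQL query editor instead of the database overview, a quick count of the table should come back with zero rows at this point:

GenerationByFuelType
| count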
Let’s send some more events through Event Hubs to the KQL database:
We can keep track of all the messages in the monitoring pane of the Event Hub:
You can also see the events being stored in the table:
Once all the messages have been processed, the table has a compressed size of about 5KB (this might be different on your end if you have another payload).
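If you’d rather check the size with a query instead of the database overview, the .show table management command returns the extent statistics for the table, including the original and compressed sizes:

.show table GenerationByFuelType details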
What happens if the Fabric capacity is paused?
We have successfully set up a real-time connection between an app sending messages through Event Hubs to a KQL database in Fabric. But what happens when we pause our Fabric capacity?
If our publisher is still sending events, will they be dropped while the capacity is paused? Luckily, no. Thanks to the retention settings we configured earlier for the Event Hub instance, the events will be kept around for the duration of the retention period. If the capacity is paused for longer than the retention period, however, you might miss out on events.
Let’s test this out. First, we need to empty the KQL table to make sure we can count the events correctly. This can be done with the following KQL statement:
.clear table GenerationByFuelType data
Now we can pause our capacity (see two screenshots above on how to do this) and start sending events again. Let’s send 1,000 messages:
Wait until all messages have arrived in the Event Hub instance. Then, resume the capacity. Fabric will start reading all the messages it missed while it was paused. This can be verified in the Event Hubs monitoring pane, where we will see two distinct spikes: one for the incoming messages, and one for the outgoing messages (which for some reason seems to be higher).
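You can also verify on the Fabric side that nothing was lost. Once the backlog has been processed, counting the rows in the table should land on the 1,000 events that were sent while the capacity was paused:

GenerationByFuelType
| summarize Events = count()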
Conclusion
In this article, we have shown you that it is quite simple to set up a real-time streaming pipeline between a source, Azure Event Hubs and a Microsoft Fabric Eventhouse. No code at all was required to get everything configured. The data in the KQL database can be analyzed with KQL statements, but you can also build a Power BI report directly on top of this data. It’s also possible to sync this data to a delta table in OneLake so that other workloads – like the lakehouse or the warehouse – can use this data as well.
This real-time pipeline directly stores the data in a KQL table without any transformations. If you do wish to transform the data inside the messages, you might want to look into Fabric Eventstreams. They come with their own built-in Event Hub, as explained by Patrick Leblanc in this Guy in a Cube video.