When implementing real-time ingestion, we usually adopt the lambda architecture. In a lambda architecture, KustoDB in Microsoft Fabric is the commonly recommended choice for the speed layer.
Do you know why? Let’s analyze the reasons in detail.
1 – KustoDB uses SSD
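KustoDB keeps its hot data cached on the local SSDs (and in memory) of its compute nodes, while Lakehouses read their data from ADLS-based storage. As a result, Kusto can deliver better query performance on recent, frequently accessed data.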
2 – Ingestion Mappings
If you ingest data directly into KustoDB, you can use an ingestion mapping, which maps complex JSON documents (including nested properties) to the columns of a table.
Read more about this on https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/create-ingestion-mapping-command
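To make the mapping idea concrete, here is a minimal Python sketch. The table name RawTelemetry, the mapping name TelemetryMapping and the JSON field names are hypothetical. The sketch only builds the text of a .create table ... ingestion json mapping command, following the syntax shown in the documentation linked above (you would still submit it with your own Kusto client or query editor), and adds a simplified local preview of how each JSONPath pulls a nested value into a column. The preview handles only plain dot paths and is not Kusto's ingestion engine.

```python
import json

# Hypothetical mapping: (table column, Kusto data type, JSONPath in the incoming event).
MAPPING = [
    ("DeviceId", "string", "$.device.id"),
    ("Temperature", "real", "$.reading.temperature"),
    ("EventTime", "datetime", "$.timestamp"),
]


def kql_string(text: str) -> str:
    """Quote text as a single-quoted Kusto string literal (escaping \\ and ')."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def create_json_mapping_command(table: str, mapping_name: str, mapping) -> str:
    """Build the text of a '.create table ... ingestion json mapping' command."""
    entries = [
        {"column": column, "datatype": datatype, "Properties": {"Path": path}}
        for column, datatype, path in mapping
    ]
    return (
        f'.create table {table} ingestion json mapping "{mapping_name}" '
        + kql_string(json.dumps(entries))
    )


def preview_row(event: dict, mapping) -> dict:
    """Simplified local preview: follow each '$.a.b' path; missing keys become None."""
    row = {}
    for column, _datatype, path in mapping:
        value = event
        for key in path.removeprefix("$.").split("."):
            value = value.get(key) if isinstance(value, dict) else None
        row[column] = value
    return row


if __name__ == "__main__":
    print(create_json_mapping_command("RawTelemetry", "TelemetryMapping", MAPPING))
    event = {"device": {"id": "d-01"}, "reading": {"temperature": 21.5},
             "timestamp": "2024-01-01T00:00:00Z"}
    print(preview_row(event, MAPPING))
```

Keeping the mapping as data (a list of column, type and path triples) makes it easy to review and version alongside the table definition.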
3 – KustoDB Update Policies
On the speed layer, we don’t load the descriptions from the dimensions. We load only the foreign keys to the dimensions, which keeps the stream lighter and faster.
However, when displaying the data to the end user, we need the descriptions from the dimensions.
The solution for this is Update Policies. They work as a kind of trigger: once rows are ingested into a source table, a query runs over the newly ingested rows, merges them with the dimensions, and writes the result, a record the end user can understand, into a target table.
You can read more about update policies on https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/update-policy
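The following Python sketch shows one way this could be wired up, assuming hypothetical tables RawSales (ingestion table holding only foreign keys), Products (dimension) and Sales (enriched target), plus a hypothetical function EnrichSales. It builds the text of the KQL function and of the .alter table ... policy update command; the policy property names follow the update policy documentation linked above. The simulate_update_policy helper is only a local imitation of the enrichment, not how Kusto executes it. Verify the exact command syntax against the docs before running it.

```python
import json

# Hypothetical names: RawSales (ingestion table with foreign keys only),
# Products (dimension table), Sales (enriched target table).
ENRICH_FUNCTION = """.create-or-alter function EnrichSales() {
    RawSales
    | lookup kind=leftouter Products on ProductId
    | project SaleTime, ProductId, ProductName, Quantity
}"""


def update_policy_command(target: str, source: str, query: str,
                          transactional: bool = False) -> str:
    """Build the text of an '.alter table ... policy update' command."""
    policy = [{
        "IsEnabled": True,
        "Source": source,
        "Query": query,
        "IsTransactional": transactional,
        "PropagateIngestionProperties": False,
    }]
    # Verbatim @'...' literal: a single quote inside it is escaped by doubling it.
    return (f".alter table {target} policy update @'"
            + json.dumps(policy).replace("'", "''") + "'")


def simulate_update_policy(new_rows, products):
    """Local imitation of the enrichment: only the newly ingested batch is processed,
    and a missing dimension key yields None (like a left outer lookup)."""
    return [
        {"SaleTime": r["SaleTime"], "ProductId": r["ProductId"],
         "ProductName": products.get(r["ProductId"]), "Quantity": r["Quantity"]}
        for r in new_rows
    ]


if __name__ == "__main__":
    print(ENRICH_FUNCTION)
    print(update_policy_command("Sales", "RawSales", "EnrichSales()"))
    print(simulate_update_policy(
        [{"SaleTime": "2024-01-01T10:00:00Z", "ProductId": 7, "Quantity": 2}],
        {7: "Coffee"}))
```

The sketch uses the lookup operator rather than join because lookup is intended for enriching a fact table with a (usually small) dimension table. The output columns of the function must match the schema of the target table.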
4 – Retention Policies
Retention policies have two different uses, both very important. The most obvious one is to establish a time window for the data in the speed layer.
The speed layer is intended to hold data for a very limited time and then discard it. A developer’s first thought is to create an automated routine to delete the old data.
This is where the Retention Policy comes in: it automatically discards data once it is older than the configured time window, so there is no need for custom development.
The second use is to support the Update Policies. Once the ingested data has been merged with the dimensions, you no longer need the original rows. Setting the retention of the ingestion tables to 0 achieves this: the update policy still processes the incoming rows, but they are not kept in the ingestion table.
Read more about the retention policies on https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/retention-policy
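Both uses can be sketched with the same command. This minimal Python example builds the text of .alter-merge table ... policy retention softdelete = ... commands; the table names (Sales, RawSales) and the 7-day window are hypothetical, and the syntax should be checked against the retention policy documentation linked above. The helper formats a Python timedelta as a Kusto timespan literal such as 7d or 36h.

```python
from datetime import timedelta


def kusto_timespan(span: timedelta) -> str:
    """Format a non-negative, whole-second timedelta as a Kusto timespan literal."""
    seconds = int(span.total_seconds())
    if seconds < 0 or span != timedelta(seconds=seconds):
        raise ValueError("expected a non-negative whole number of seconds")
    # Use the largest unit that divides evenly: days, hours, minutes, else seconds.
    for unit, size in (("d", 86_400), ("h", 3_600), ("m", 60)):
        if seconds % size == 0:
            return f"{seconds // size}{unit}"
    return f"{seconds}s"


def retention_command(table: str, soft_delete: timedelta) -> str:
    """Build the text of an '.alter-merge table ... policy retention' command."""
    return (f".alter-merge table {table} policy retention "
            f"softdelete = {kusto_timespan(soft_delete)}")


if __name__ == "__main__":
    # Usage 1: keep the enriched speed-layer table for a short, hypothetical window.
    print(retention_command("Sales", timedelta(days=7)))
    # Usage 2: keep nothing in the ingestion (staging) table feeding the update policy.
    print(retention_command("RawSales", timedelta(0)))
```

With these two commands, the enriched Sales table keeps a rolling window of recent data, while RawSales acts purely as a landing zone for the update policy.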
Summary
Together, these features (SSD-based caching, ingestion mappings, update policies and retention policies) make KustoDB in Fabric a strong choice for the speed layer of a real-time ingestion architecture.