Azure Event Hubs is an event processing cloud service that can ingest millions of events per second and make them available for storage and analysis. Whether the data originates from websites, applications, machinery, or mobile devices, Event Hubs can log events in near real-time. It does so using technologies that support low latency and high availability, while providing flexible throttling, authentication, and scalability.
Event Hubs is often touted for its ability to support high throughput data sources, millions of devices, or a combination of both, making the service an ideal fit for the anticipated influx of Internet of Things (IoT) data. Indeed, Event Hubs appears to be just what we need to handle the endless streams of information coming in from the dispersed world of connected devices.
But before we make any decisions about Event Hubs, we should at least take a closer look at how all the pieces fit together so we have some sense of how to get data from point A to point B, and what it will cost us to get there.
The Service Bus namespace
Event Hubs is a managed service that can handle message streams from a wide range of connected devices and systems. It is not, however, a traditional messaging solution. Event Hubs was built with size and speed in mind (massive volumes of data in near real-time) and is streamlined for one-way event processing and nothing else. Think cloud. Think IoT. Think about all that information no one knows what the heck to do with.
Event Hubs, the service, is built around the event hub managed classes within the Service Bus namespace (part of the Microsoft.ServiceBus.dll assembly). Service Bus provides a messaging infrastructure for relaying data between various devices, services, and other endpoints. In this sense, you can also think of Service Bus as a multi-tenant cloud service, where developers create their own namespaces for defining the communication mechanisms they need.
Currently, the Service Bus infrastructure supports the following four communication mechanisms:
- Queue: A broker that stores messages until recipients can retrieve them. Each message travels in only one direction and can be received by only one recipient.
- Topic: A broker that can support multiple subscriptions for unidirectional messaging. Each subscription can be filtered to match specific criteria.
- Relay: A bidirectional communication node that simply passes messages on to their destinations.
- Event hub: An ingestion point for unidirectional event and telemetry data coming in from distributed endpoints and earmarked for analysis and storage.
It is, of course, the event hub objects we’re most concerned with and what they mean to the Event Hubs service. The event hub classes are aimed at processing event data at high throughput, which means, for example, that they don’t implement the full messaging capabilities available to Service Bus topics. Event hubs also use a partitioned consumer pattern in which each consumer reads only a subset of the message stream. Queues and topics instead follow a competing consumer model, in which multiple consumers attempt to read from the same resource, which can lead to scale limits and additional complexity.
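To make the distinction concrete, here is a minimal sketch (plain in-memory lists, not the Event Hubs API) contrasting the two consumption patterns described above:

```python
from collections import deque

# Partitioned consumer: each consumer owns exactly one partition and reads
# every event in it, preserving per-partition order with no coordination.
partitions = {0: ["a0", "a1"], 1: ["b0", "b1"], 2: ["c0"]}

def partitioned_read(partition_id):
    """Each reader sees only its own partition's events, in order."""
    return list(partitions[partition_id])

# Competing consumers (queue model): all consumers pull from one shared
# resource, so each message goes to whichever consumer grabs it first.
queue = deque(["m0", "m1", "m2", "m3"])

def competing_read(n_consumers):
    """Simulate consumers racing on the shared queue (round-robin here);
    no single consumer sees the whole stream."""
    received = {i: [] for i in range(n_consumers)}
    i = 0
    while queue:
        received[i % n_consumers].append(queue.popleft())
        i += 1
    return received
```

In the partitioned model, scaling out means adding partitions; in the competing model, consumers contend for the same resource, which is where the scale limits come from.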
If you’re developing an Event Hubs solution, you’ll definitely want to dig deeper into the Service Bus architecture and the event hub classes, but for now, let’s take a step back and look at the service from a broader perspective and examine how data flows from one point to the next.
The Event Hubs workflow
The data-flow story starts with the publishers, the original sources generating all that event information just waiting to be consumed. The data can come from any device or system capable of generating event or telemetry data, whether cars, online games, web farms, machinery, aircraft, mobile apps, system hardware, or just about anything with an electronic heartbeat.
To send the data to Event Hubs, developers must publish the events via the Advanced Message Queuing Protocol (AMQP) or the secure version of the Hypertext Transfer Protocol (HTTPS). AMQP is an open-standard application layer protocol specific to messaging and used by such products as RabbitMQ. HTTPS facilitates secure Internet communication by protecting the HTTP traffic with Transport Layer Security (TLS) or Secure Sockets Layer (SSL) encryption.
Because both AMQP and HTTPS are such widely implemented protocols, Event Hubs can support just about any of today’s event-generating devices. Event Hubs also includes a number of native client libraries to provide direct support for platforms such as Android and iOS.
Event publishers use Shared Access Signature (SAS) tokens to identify themselves to the Event Hubs service. Each publisher can have its own identity, or multiple publishers can share a common SAS token, depending on the solution’s requirements. Whichever scenario you use, the publisher’s name must match the token when publishing an event so publishers can be uniquely identified.
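The SAS token itself is an HMAC-SHA256 signature over the URL-encoded resource URI and an expiry timestamp, following the documented Service Bus token format. The sketch below shows how such a token can be generated; the namespace, hub, and policy names are hypothetical placeholders:

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def generate_sas_token(resource_uri, key_name, key, ttl_seconds=3600):
    """Build a Service Bus / Event Hubs SAS token: an HMAC-SHA256
    signature over the URL-encoded resource URI plus an expiry time."""
    expiry = int(time.time() + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    signature = base64.b64encode(
        hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    )
    return (
        f"SharedAccessSignature sr={encoded_uri}"
        f"&sig={urllib.parse.quote_plus(signature)}"
        f"&se={expiry}&skn={key_name}"
    )

# Hypothetical names, for illustration only:
token = generate_sas_token(
    "sb://mynamespace.servicebus.windows.net/myhub", "send-policy", "secretkey"
)
```

A publisher presents this token with each send; the service validates the signature against the named policy's key.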
An Event Hubs solution can publish events individually or in batches. In either case, a single event or batch can be no larger than 256 KB; anything beyond that will generate an error. If you can keep within the size limit, batching events can significantly increase throughput. You can also send your events asynchronously to increase the event rate. However, asynchronous sends can continue transmitting events even after the event stream has been throttled, which can result in failures or lost messages if improperly implemented.
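A sketch of what staying under that cap might look like on the client side (illustrative only; the real SDK batch classes also account for protocol framing overhead):

```python
MAX_BATCH_BYTES = 256 * 1024  # the 256 KB per-request limit described above

def batch_events(events):
    """Greedily pack UTF-8 event payloads into batches under the size cap.
    Raises if a single event alone exceeds the cap, since the service
    would reject it outright."""
    batches, current, current_size = [], [], 0
    for event in events:
        size = len(event.encode("utf-8"))
        if size > MAX_BATCH_BYTES:
            raise ValueError("single event exceeds the 256 KB limit")
        if current_size + size > MAX_BATCH_BYTES:
            batches.append(current)
            current, current_size = [], 0
        current.append(event)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

For example, five 100 KB events pack into batches of two, two, and one, turning five round trips into three.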
From the Event Hubs perspective, an event publisher is a virtual endpoint that can do nothing but send messages. Event Hubs cannot transmit messages back to the publisher. The service’s purpose is to collect event streams at high throughput for use in real-time and batch processing. Event Hubs is essentially a managed event ingestor service, able to process events from millions of devices while still preserving event order on a per-device basis.
Event Hubs uses partitions to manage the massive data streams. Each partition holds an ordered sequence of events, with new events added to the end of the sequence. Partitions act as commit logs that retain the data for a configurable amount of time. An Event Hubs solution must include at least 8 partitions, but no more than 32, with the default set at 16. That said, organizations can request up to 1,024 partitions by submitting a support ticket with Microsoft, and there have been suggestions that this number can be pushed higher still.
Once you’ve created the Event Hubs solution, you cannot change the partition count, so you must think long-term when planning the initial project. You’ll have to determine how many events per second you’ll need to stream, the size of those events, and the number of publishers you plan to support. Events from a given publisher are assigned to a partition, which is how per-publisher ordering is preserved. Be sure to refer to the Event Hubs documentation for specifics on how to estimate your partition needs.
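The routing idea can be sketched as a stable hash over a partition key (a simplified model, not the service's actual hash function): events sharing a key always land in the same partition, so their order is preserved across a partition count that is fixed at creation time.

```python
import hashlib

PARTITION_COUNT = 16  # fixed when the event hub is created

def partition_for(key):
    """Map a partition key (e.g. a device ID) to a stable partition index
    by hashing the key and reducing it modulo the partition count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PARTITION_COUNT
```

Because the mapping is deterministic, `partition_for("device-42")` returns the same index on every call, while different keys spread across the available partitions.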
Event Hubs makes it possible to stream millions of events per second and send the data to multiple destinations, whether for analysis or storage (or both). The service uses a scalable event ingestor that sits between the event publishers and subscribers, decoupling event production from consumption.
To help facilitate this process, events support stream offsets, a type of client-side cursor that identifies the position of an event within its partition. Consumers can use the offsets to specify a point in the event stream to begin reading events. Consumers read the events directly from the partition, without impacting or needing direct access to the publishing operations.
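The offset mechanism can be modeled as an append-only log per partition (a behavioral sketch, not the real client API): consumers pass an offset to resume reading, without ever touching the publish path.

```python
class Partition:
    """Toy model of a partition: an ordered, append-only event log."""

    def __init__(self):
        self.events = []  # new events are always appended at the end

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # the event's offset in this partition

    def read_from(self, offset):
        """Return all events at or after the given offset; reading has
        no effect on the log or on publishers."""
        return self.events[offset:]

p = Partition()
for e in ["e0", "e1", "e2", "e3"]:
    p.append(e)
```

A consumer that last stopped at offset 2 simply calls `p.read_from(2)` to pick up `["e2", "e3"]` and continue.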
The subscribing consumers must use AMQP to connect to Event Hubs in order to access the event stream. AMQP supports higher throughput and lower latency than mechanisms such as HTTP GET. Only one active consumer at a time should be connected to a partition, although a partition will support up to five connections to allow for the lag time between when a reader disconnects and the service recognizes that it’s disconnected.
The connections themselves are facilitated by consumer groups, which provide views into the data. Multiple applications can use the same consumer group to access the data they need, at the rate they need it, and based on their own offsets. Event Hubs provides a default consumer group, but developers can create up to 20 of their own groups, depending on their subscription plan.
As with publishers, a consumer group request must be accompanied by a token that grants the necessary rights for accessing the event data. You can secure all consumer groups with a common SAS key, or you can assign keys to the individual groups.
When a consumer connects to a partition, it is responsible for tracking its current position within the event stream, a process known as checkpointing. Checkpointing uses the stream offsets within the partition to mark the consumer’s last position. Each consumer is responsible for maintaining a record of its own checkpoints. When the consumer reconnects to Event Hubs, it passes in the offset to specify where to start reading the data.
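A hedged sketch of that responsibility split: the consumer, not the service, records the last offset it processed and passes it back on reconnect. A plain dict stands in here for whatever durable store a real solution would use.

```python
class Consumer:
    """Toy consumer that checkpoints its own progress per partition."""

    def __init__(self, checkpoint_store):
        # checkpoint_store stands in for durable storage; here it is
        # just a dict keyed by partition id.
        self.store = checkpoint_store

    def resume_offset(self, partition_id):
        """On (re)connect, start where the last checkpoint left off."""
        return self.store.get(partition_id, 0)

    def process(self, partition_id, events, start_offset):
        """Read events from start_offset onward, checkpointing as we go."""
        handled = []
        for offset in range(start_offset, len(events)):
            handled.append(events[offset])
            self.store[partition_id] = offset + 1  # next offset to read
        return handled
```

If the consumer disconnects and a new instance attaches with the same store, it resumes exactly where the old one stopped, reading nothing twice.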
The cost of doing business
Subscription rates for the Event Hubs service are based on a combination of throughput units and ingress events. Throughput units refer to units of capacity, based on a maximum of 1 MB/s ingress (events sent into Event Hubs) and 2 MB/s egress (events consumed from Event Hubs). A single partition can support up to one throughput unit, which means the number of partitions should be equal to or greater than the number of throughput units.
For the purpose of pricing, an ingress event is a unit of data that can be up to 64 KB. Each unit represents a billable event. Egress events and management operations are not counted against the ingress event pricing, but do count toward the throughput unit pricing. Regardless of the type and extent of your Event Hubs solution, you must take both calculations into account.
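A back-of-the-envelope estimate combining the two charges might look like the following. The ingress figure comes from the text; the hourly throughput-unit rate below is a hypothetical placeholder, not a published price.

```python
INGRESS_PRICE_PER_MILLION = 0.028       # from the pricing described above
HYPOTHETICAL_TU_PRICE_PER_HOUR = 0.03   # placeholder; check current pricing

def monthly_cost(events_per_second, throughput_units, hours=730):
    """Estimate a month's bill from a steady event rate and a fixed
    throughput-unit count: ingress is billed per million events, and
    throughput units are billed for every hour they are provisioned."""
    total_events = events_per_second * 3600 * hours
    ingress_cost = (total_events / 1_000_000) * INGRESS_PRICE_PER_MILLION
    tu_cost = throughput_units * hours * HYPOTHETICAL_TU_PRICE_PER_HOUR
    return round(ingress_cost + tu_cost, 2)
```

At 1,000 events per second with two throughput units, the ingress charge dominates; the exercise is mainly useful for seeing how the two meters interact as your event rate grows.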
Currently, Azure supports two Event Hubs pricing plans, Basic and Standard, broken down as shown in the following table:
| | Basic | Standard |
| --- | --- | --- |
| Ingress events | $0.028 per million events | $0.028 per million events |
The Basic plan supports only one consumer group and up to 100 Service Bus brokered connections. The Standard plan supports up to 20 consumer groups and 1,000 brokered connections. The Standard plan also comes with publisher policies and supports up to seven days of additional event storage. Publisher policies make it easier to implement large numbers of independent event producers. Support packages are also available for both plans.
These prices are based on US dollars and are, of course, subject to change. There are also plenty of other details to take into account when estimating costs, such as additional event storage that might be required, how throughput units are billed (hourly), or the total number of permitted ingress events, management operations, and API calls per second. Be sure to read all the fine print before setting off on your Event Hubs journey. The key is to balance the necessary number of throughput units against the required number of partitions, which means knowing what you need before you need it.
Azure Event Hubs
If you find yourself in a position where you need to handle a lot of event or telemetry data in near real-time, then Event Hubs might be worth considering. It can help you get started quickly while avoiding the upfront costs an in-house solution would demand. Whether the service would be useful as a long-term strategy would be difficult to determine without a careful cost analysis. Even if you can predict all factors, arriving at a cost is never as straightforward as we would hope, whether implementing an on-premises solution or going with a service such as Event Hubs. The much-anticipated IoT tsunami will only add to the confusion, if and when it arrives.
Even so, what Microsoft is offering with Event Hubs shows promise. It is still a new technology though, and it might be worth waiting for the service to run through a full cycle before fully committing to the platform. At the very least, you might want to start out small, giving you a chance to get your feet wet and Microsoft a chance to prove out the technology with those who can afford to wade through that awkward startup phase.