In the previous article in this series, I introduced the concept of IoT and provided a solution taxonomy as framing for thinking about IoT solutions: Ingest, Stage, Process and Act. In this article, I will elaborate on the Ingest stage, starting with an agnostic perspective and then mapping the capabilities in the Microsoft Azure platform best suited to the objectives and challenges inherent in ingesting large volumes of data exhaust.

In order to glean the insights that IoT promises, it is of course necessary to communicate with the devices that comprise an IoT solution. This aspect deals with messages that originate from the device, namely telemetry and inquiries. Clemens Vasters, who works on IoT in Microsoft engineering, distills device messaging into the following four patterns: telemetry, inquiries, notifications and commands. The first two originate from the device; the latter two flow to it.

As the name suggests, telemetry entails messages that the device voluntarily shares with an API via a one-way message exchange pattern. A good analogy is to think of telemetry as turning on logging and/or instrumentation in an application so that you can proactively monitor its health and status. As you can imagine, just as with instances of a software solution, contending with the volume and frequency of data transmitted from the device requires robust infrastructure capable of handling load at scale. Telemetry messages tend to be finely grained to keep packet size on the wire small, and they are commonly quite chatty by nature.

Alerts are a special kind of telemetry that require a higher Quality of Service (QoS) than regular telemetry. Think of alerts as the difference between logging an error versus simply logging information or warnings. If a machine on the factory floor is about to run out of a raw material, or a car is sending an SOS message because it believes it’s being broken into, you want those messages processed much more quickly than regular telemetry streaming job velocity heuristics or location data.

Inquiries also originate from the device but typically follow a request-response messaging pattern or an asynchronous messaging pattern (assuming the protocol used supports callbacks; more on this in a moment). An inquiry is simply a question that the device asks of an external API. Examples include requesting a policy refresh or obtaining a fact that will be used to inform a decision on some kind of action.

Of course, the ability to ingest messages is predicated on the ability to communicate from the device to the API. While this problem is much easier than dealing with the other message types I’ll cover under “Action”, the myriad network and application protocols in use in brownfield solutions, as well as the options available for greenfield solutions, are nothing short of mind-boggling. Figure 1 provides a broad snapshot of the most common network and application protocols and doesn’t even attempt to address all of the custom, esoteric combinations you will likely encounter.


Figure 1: The number of networking and application protocols in IoT is massive.

What’s more, this space is evolving so quickly that new protocols and standards emerge constantly, and it will take some time for the dust to settle.

Fortunately, Microsoft has been working on this problem for some time and recognizes that today there is no single right answer when it comes to protocols. At the same time, the Azure Service Bus engineering, Azure Customer Advisory Team (CAT) and Microsoft Patterns & Practices teams have provided, and continue to provide, prescriptive guidance on how to implement messaging gateways by embracing a Service Assisted Communication (SAC) model, which Clemens Vasters covers in detail in a post entitled “Service Assisted Communication for Connected Devices”.

In a SAC model, the device sends messages to an endpoint (or cloud gateway) that presents an “outbox”, just as you might (very loosely) think about how outbound email works using SMTP. The idea is that the device consumes an endpoint via an IP-routable protocol like HTTP, TCP or UDP. Conversely, and as I’ll cover in more detail under “Action”, the device receives messages via a dedicated “inbox”, kind of like the POP protocol. Figure 2 shows a logical diagram of the inbox/outbox metaphor.


Figure 2: The Service Assisted Communication Model.

It turns out that Azure Service Bus Brokered Messaging is a natural fit for realizing this metaphor. Brokered Messaging defines two newer Service Bus entities (Queues and Topics) that were added after Relay. You can learn more about Azure Service Bus by visiting the official Microsoft Azure Service Bus documentation page or checking out my article in CODE Magazine titled “Introducing Queues and Topics in Azure Service Bus”. While Azure Service Bus has been leveraged to implement countless modern application designs supporting both hybrid and cloud-scale solutions, it is a perfect fit for enabling communication from and to devices because of its very low barrier to communication. For example, to send a message to an Azure Service Bus Queue, a simple HTTP request can be made that sends a payload (and/or request headers) to a queue called “telemetry_queue”:
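A minimal sketch of such a request, assuming Shared Access Signature (SAS) authorization; the namespace, policy name, token fields and JSON payload are all placeholders, not values from a real solution:

```http
POST https://contoso-ns.servicebus.windows.net/telemetry_queue/messages HTTP/1.1
Authorization: SharedAccessSignature sr=contoso-ns.servicebus.windows.net&sig={signature}&se={expiry}&skn=SendPolicy
Content-Type: application/json

{"DeviceId":"FLOOR1-MACHINE-0042","Temp":212.4}
```

Any HTTP-capable client, however constrained, can issue a request like this, which is precisely what makes the barrier to communication so low.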

Once the message arrives in the queue, a process or pump can consume the message.

Topics extend queues in that they can have one or many subscriptions. As you might guess, this is very helpful when communicating to devices and I’ll discuss in detail how this applies to the Action stage in the article that follows.

Of course, you are not limited to the HTTP API. Microsoft ships rich SDKs for .NET, Java, NodeJS, PHP, Python and Ruby. The unit of communication for Queues and Topics is known as a Brokered Message. At a high level, a Brokered Message includes a header containing several properties, such as content type and routing semantics, as well as a property bag and, of course, the body.

Figure 3 below shows an example of code you might write in a C-based language like C# or Java that instantiates a Brokered Message and uses the Properties property in the Header to keep the payload of the message to an absolute minimum:
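A minimal C# sketch along these lines, assuming the classic Microsoft.ServiceBus.Messaging API; the device identifier, property names and readings are illustrative placeholders:

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

class TelemetryMessageFactory
{
    static BrokeredMessage CreateTelemetryMessage()
    {
        var message = new BrokeredMessage();

        // Tag the message type so downstream consumers can route on
        // the Label without touching the body.
        message.Label = "tlm";

        // Use the property bag rather than a serialized body to keep
        // the payload of the message to an absolute minimum.
        message.Properties["DeviceId"] = "FLOOR1-MACHINE-0042"; // hypothetical device
        message.Properties["Time"] = DateTime.UtcNow.ToString("o"); // always think global
        message.Properties["Interval"] = 30;  // seconds between telemetry sends
        message.Properties["RPM"] = 1800;
        message.Properties["Temp"] = 212.4;   // Fahrenheit

        return message;
    }
}
```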

Figure 3: Code to create a BrokeredMessage.

This message might resemble telemetry from an expensive machine on the factory floor that creates high margin widgets. This voluntary information flowing from the device might be sent every 30 seconds to allow monitoring of stress and internal temperature to proactively identify potential issues such as deviation from normally observed data. With the benefit of trending and rules, an abnormal reading over a window of time might suggest a problem such as the use of a raw material with the wrong viscosity or that the machine is at risk of overheating due to elevated temperature on the factory floor on a hot summer day (very common in Arizona, for example).

Notice the Label property. I’m setting it to “tlm” to denote that this message is a regular telemetry message. This is very useful in the Stage and Process stages because decisions on storage as well as how to process the message can be made by inspecting the label. For example, if I instead set the Label property to something like “alert”, I might have a subscription that is dedicated to alerts backed by several competing consumers (workers) whose job it is to pop the message ASAP, essentially giving messages with an “alert” label priority processing.
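As a sketch of that routing idea, and assuming the classic .NET NamespaceManager API (the topic and subscription names here are hypothetical), a subscription dedicated to alerts could be created with a SqlFilter on the Label system property:

```csharp
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

class AlertSubscriptionSetup
{
    static void Main()
    {
        // The connection string is a placeholder; obtain the real one
        // from the Azure Portal.
        var ns = NamespaceManager.CreateFromConnectionString(
            "Endpoint=sb://contoso-ns.servicebus.windows.net/;" +
            "SharedAccessKeyName=manage;SharedAccessKey={key}");

        // Route only messages whose Label is "alert" into this
        // subscription; competing consumers can then drain it with priority.
        if (!ns.SubscriptionExists("telemetry_topic", "alerts"))
        {
            ns.CreateSubscription("telemetry_topic", "alerts",
                new SqlFilter("sys.Label = 'alert'"));
        }
    }
}
```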

As you can probably guess from the key/value pairs, I’m also sending a unique device identifier, the current time in UTC (always think global), the interval, in seconds, at which the device is set to send telemetry, the current RPM and the current temperature in Fahrenheit. Suppose a process monitoring this telemetry, perhaps applying Azure Stream Analytics (which I’ll cover in the Process topic in a later article), recognizes that a threshold signaling a potential anomaly has been hit. I might then have a control message sent to the device to reduce the interval from 30 seconds to 5 seconds, yielding finer-grained data for a decision to perhaps stop production or send a technician to the factory floor to investigate, thereby avoiding damage to the equipment or, worse, the ruin of an expensive production run.

Azure Service Bus Queues and Topics also support AMQP 1.0. The Advanced Message Queuing Protocol is an open standard that defines the shape and structure of messages on the wire, allowing brokers to implement support for the standard without proprietary glue. As within the enterprise, vendor neutrality is highly relevant to IoT scenarios, which take integration and messaging both behind the firewall and beyond to unprecedented levels. AMQP is well suited to the type of messaging you will encounter in these kinds of solutions because it is a connection-oriented, binary protocol and thus very efficient on the wire. It is also reliable in that it provides delivery guarantees, and it supports various messaging topologies including client to client, client to broker (and vice versa) and broker to broker.

But what happens when the device doesn’t support IP networking? This is a very common scenario in which smaller devices (some no larger than the head of a pencil eraser!) lack the resources, let alone the capability, to talk over an IP protocol and instead communicate using very lightweight protocols like ZigBee or NFC. In this case, the device communicates with the external endpoint via a bridge, or field gateway. The role of this gateway is to receive messages over a specific protocol and forward them over, say, HTTP or AMQP. As you might imagine, commercial appliances that address this very problem are hitting the market, but I’ve seen a handful of solutions where more maker-oriented devices like the Raspberry Pi or Netduino Plus 2, shown in Figure 4, are being used for this very purpose.


Figure 4: Prototyping boards like this are not just a great way to prove out a solution but also make good field gateways.

In fact, the vast majority of wearables such as smart watches and fitness bands rely on the smart phone in your pocket to terminate protocol and enable communication with the outside world.

Figure 5 depicts these two scenarios, which we’ll refer to as “Direct” and “Field or Custom Gateway”:


Figure 5: Service Assisted Communication Model Patterns.

Note that the gateway can also be a soft layer that supports additional protocols common within IoT solutions, such as MQTT and CoAP (RFC 7252), as well as your own implementations or those you will encounter in brownfield scenarios. Either way, the role of the field/custom gateway is to terminate the native protocol on ingress and translate to the required network and application protocol on egress. Microsoft has a reference architecture code-named Reykjavik that addresses gateways holistically. You can learn more about Reykjavik in my 2014 Cloud Conf presentation Service Assisted Device Communications on Microsoft Azure on Channel9.

With a good foundation in the SAC model and why Azure Service Bus messaging is a perfect fit, no discussion of how to effectively manage data exhaust at scale would be complete without covering the newest entity to ship on Azure Service Bus: Event Hubs!

A Gentle Introduction to Azure Service Bus Event Hubs

Event Hubs is a sibling to Queues and Topics and was designed for one purpose: scale.


Figure 6: Event Hubs are designed for high-burst, low-latency messaging for high-scale ingest.

While it is possible to design a Queue- or Topic-based solution that scales to arbitrary rates (hundreds of thousands of messages per second and more) by partitioning across entities, this requires some careful thought around implementation and, regardless, still carries overhead due to the messaging guarantees offered by Queues and Topics. For this reason, Queues and Topics make an excellent choice for handling alerts and, more importantly, inquiries, commands and notifications. I’ll cover this extensively in an upcoming article focusing on the Action stage.

Event Hubs take a partition-based approach to support the ingestion of millions of events per second from a variety of devices without you having to worry about the implementation details at all.

You can think of a partition as a sequence of events resembling a commit log. As new events are received they are added to the partition as shown in Figure 7:


Figure 7: A partition is a temporary log of events.

Devices (or applications) are referred to as event publishers, and downstream processes that consume the events are referred to as consumers. Consumers are then organized into Consumer Groups. For example, a consumer within a Consumer Group might be a worker role that reads the stream and moves the events to Azure Table Storage or any other durable backing store.

Note, however, that another distinction between Event Hubs and their siblings is that there is no concept of popping messages off of a partition. Rather, a partition more closely resembles a temporary tape (if you’re old enough to remember those), meaning that you can read ahead and go back and re-read events if you need to, but there is no “erase” button. See Figure 8:


Figure 8: Logical view of partitions within an Event Hub entity.
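To illustrate the re-readable nature of a partition, here is a hedged C# sketch using the classic Microsoft.ServiceBus.Messaging API; the connection string, hub name and partition ID are placeholders:

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

class PartitionReader
{
    static void Main()
    {
        // Placeholder connection information; obtain the real values
        // from the Azure Portal.
        var client = EventHubClient.CreateFromConnectionString(
            "Endpoint=sb://contoso-ns.servicebus.windows.net/;" +
            "SharedAccessKeyName=listen;SharedAccessKey={key}",
            "telemetry_hub");

        // Open a receiver on partition "0" from the beginning of the
        // retained stream ("-1" denotes the start of the partition).
        var receiver = client.GetDefaultConsumerGroup()
                             .CreateReceiver("0", startingOffset: "-1");

        var eventData = receiver.Receive(TimeSpan.FromSeconds(10));
        if (eventData != null)
        {
            // Nothing is removed from the partition; re-creating a
            // receiver at the same offset would yield this event again.
            Console.WriteLine("Read event at offset {0}", eventData.Offset);
        }
    }
}
```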

This makes Event Hubs a natural fit for complex event processing (CEP) scenarios (hopping, sliding and tumbling window patterns are common) and we’ll dive into how Azure Stream Analytics makes for a great event consumer in the Process topic which will be covered in a later article.

The maximum retention period for events is 7 days; however, outside of testing and machine learning experiment scenarios, most solutions targeting Event Hubs will not require a retention period anywhere near that long. Additionally, it’s important to understand that while with Queues and Topics you achieve read/consumer performance by implementing multiple consumers that compete for messages on a single entity, Event Hubs achieve read performance by parallelizing reads, which, while not trivial to implement, is key to achieving performance. I’ll cover a special helper created by the Azure Service Bus team called Event Processor Host in the next article to demonstrate exactly this scenario.

Provisioning and Pricing

I won’t take you step by step through creating an Event Hub on a Service Bus namespace, as this is covered very nicely in the official Getting Started documentation.

It is important to note, however, that when provisioning an Event Hub, you define up front how many partitions you would like (up to 32 total). By default, Azure allocates 8 partitions, which is also the minimum number of partitions required. Once you configure the partition count, you cannot change it later without contacting Microsoft support.

Partitions are primarily a data organization concept. They are really about the maximum degree of downstream parallelism you expect your solution to require; a simple way of thinking about this is the maximum number of servers you expect to read from an Event Hub for a given consumer group. Scale and throughput are controlled via “Throughput Units”, each a reservation of capacity that entitles you to ingress of up to 1MB/s or 1,000 events per second, and egress of up to 2MB/s. An event is defined by Microsoft as a unit of data with a payload of 64KB or smaller. By applying some simple math, we can see that with the default configuration and a single Throughput Unit, each of the 8 partitions will receive approximately 125 ingress events per second (assuming constant load). This can be scaled up by adding Throughput Units, which as of this writing can be configured up to 20, for a theoretical throughput of up to 20K messages per second for a single Event Hub. Further throughput can be obtained by working with Microsoft and filing a support ticket.

Event Hubs have a pricing model distinct from that of Queues and Topics, but the two share similarities and both are very reasonable. Event Hubs charge an hourly cost per Throughput Unit and a message cost of 2.8 cents per million ingress events. Because you can change the number of Throughput Units hourly and are charged only for messages actually ingested, costs stay very low. Complete pricing details are available on the Event Hubs pricing page.

A Closer Look

Let’s look at some examples to highlight the ingest/send side of the equation.

Event Hubs support both HTTP and AMQP 1.0. When using HTTP, you send messages to your Event Hub entity using POST:
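A minimal sketch of such a request, again assuming SAS authorization; the namespace, hub name, token fields and payload are placeholders:

```http
POST https://contoso-ns.servicebus.windows.net/telemetry_hub/messages HTTP/1.1
Authorization: SharedAccessSignature sr=contoso-ns.servicebus.windows.net&sig={signature}&se={expiry}&skn=SendPolicy
Content-Type: application/atom+xml;type=entry;charset=utf-8

{"DeviceId":"FLOOR1-MACHINE-0042","Time":"2014-10-01T17:32:00Z","Interval":30,"RPM":1800,"Temp":212.4}
```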

The option to use HTTP is very important for small devices, and there are already some nice examples of using the REST API on boards with finite compute resources, as well as in solutions written in languages for which the Azure Service Bus SDK has not yet been implemented.

You will notice that the URI is very similar to the Queue example above, but remember that Event Hubs are very different from Azure Service Bus Queues and Topics. Unlike Queues and Topics, Event Hubs are designed to be a very low-latency, high-throughput buffer for large volumes of data and thus give up features like dead-lettering, sequencing and support for transactions in favor of raw ingest performance.

Also notice that, as with Azure Service Bus, Event Hubs require HTTPS. This limits the ability of non-SSL/TLS-capable devices to send messages to an Event Hub, but again, this is another good use case for a field gateway. As a side note, this is a major challenge today because smaller devices lack the compute resources to support this critical protocol for securing messages; don’t let that be an excuse for releasing non-secure solutions into the wild. Instead, use a field gateway and ensure that the gateway always lives behind your firewall. Microsoft and others are hard at work supporting the realization of RFC 4279, which offers an alternative to certificates by leveraging much more compact pre-shared keys.

Here’s an end-to-end example of taking the telemetry message I modeled earlier and implementing it for Event Hubs.
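A hedged reconstruction of such an implementation in C#, using the classic Microsoft.ServiceBus.Messaging API; the connection string, hub name and telemetry values are placeholders:

```csharp
using System;
using System.Threading;
using Microsoft.ServiceBus.Messaging;

class TelemetrySimulator
{
    static void Main()
    {
        // Connection information comes from the Azure Portal; the
        // values here are placeholders.
        var client = EventHubClient.CreateFromConnectionString(
            "Endpoint=sb://contoso-ns.servicebus.windows.net/;" +
            "SharedAccessKeyName=send;SharedAccessKey={key}",
            "telemetry_hub");

        while (true)
        {
            // Empty body; the key/value pairs travel in the
            // Properties collection to keep the payload minimal.
            var eventData = new EventData();
            eventData.Properties["DeviceId"] = "FLOOR1-MACHINE-0042";
            eventData.Properties["Time"] = DateTime.UtcNow.ToString("o");
            eventData.Properties["Interval"] = 30;
            eventData.Properties["RPM"] = 1800;
            eventData.Properties["Temp"] = 212.4;

            // Synchronous send, as described in the text.
            client.Send(eventData);

            // Simulate the reporting interval.
            Thread.Sleep(TimeSpan.FromSeconds(30));
        }
    }
}
```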

Telemetry message in Event Hub

You start by creating an instance of EventHubClient and passing in the connection information (which you obtain from the Azure Portal) and the name of the Event Hub.

From there, you’ll notice that I’m not using BrokeredMessage. As mentioned above, the design goals for Event Hubs are very different from those of Queues and Topics, and as such, a new unit of communication, EventData, is used to encapsulate messages for an Event Hub entity.

Of course, as with BrokeredMessage, I could use the body, which accepts an object, an array of bytes or a stream, but in this case it really isn’t necessary since all I’m sending is key/value pairs.

Next, I call the synchronous version of the Send method to send the EventData instance. I repeat this process in a loop to simulate a device that sends telemetry at a given interval.

Note that the code above is intentionally simplified to keep the example approachable and would benefit from many improvements in a real-world, production scenario, including async/await and exception management in case something goes wrong. You would likely never put the loop to sleep, instead using some kind of timer based on the device’s clock with a delegate, but hopefully you get the idea from this very simplistic surface area.

Wrapping Up

So far in this article, I focused on the ingestion side of things. In the next article, I’ll cover the receive side in more detail, along with a look at the Event Processor Host, which will cover how to move telemetry events from Event Hub partitions to storage such as Azure Table Storage, and how Stream Analytics fits in to give you real-time insights using different windowing patterns.

Lastly, the Patterns & Practices team at Microsoft has released a new project called “Data Pipeline Guidance”. It is an OSS project, a work in progress, focused on providing guidance that demonstrates proven practices for high-scale, high-volume data ingestion in a typical event-processing pipeline. You can follow the project on GitHub here: https://github.com/mspnp/data-pipeline.

Special thanks to Dan Rosanova, PM on the Azure Service Bus team, for reviewing and adding clarity to this piece.