In my last article I covered the foundational aspects of diagnostics in Azure Cloud Services (web/worker roles). We took a look at how, in a Cloud Service, we can continue to use familiar tools such as custom logs (NLog, Log4Net, etc.), application tracing, performance counters, and the Windows Event Log to instrument our application. We then wrapped up our introductory look at Cloud Service diagnostics by diving deeper into how the diagnostic agent actually works – first buffering data locally and then transferring it to an Azure storage account.

In this article we’ll continue to broaden our understanding of Cloud Service diagnostics by looking at the key aspects of configuring the diagnostic agent to collect the data we want, and to transfer that data to storage on a schedule we define. Oh, and if you are already a wiz at configuring Azure diagnostics, feel free to save some time and jump down to “Some practical advice” towards the end of the article.

Next steps

You may recall that an Azure storage account is used as a location to store the data collected by the Azure diagnostic agent running on the web/worker role instance(s). The diagnostic agent will save the data into either an Azure table or blob container depending on the structure of the data, as detailed below:

Table 1 – Diagnostic items and where to find them

Since we understand what the diagnostic agent can save, and where it saves it, we can now turn our attention to the following questions:

  • How does the agent ‘know’ the specifics of the data to save?
  • Does it save all the data points, or just some of the data?
  • How frequently should the data be saved?

The Azure diagnostic agent has a default configuration – locally collect the Azure logs, IIS logs, and Azure diagnostic infrastructure logs, but persist nothing to an Azure storage account. It’s not really that helpful, but it is a starting point. There are two ways to configure the Azure diagnostic agent to do useful work. The agent can be configured imperatively via code, or declaratively via a configuration file. Let’s take a look at both approaches, and then discuss a few reasons why one approach may be preferred over the other.

Imperative configuration

Most examples of Azure diagnostic configuration will likely show the imperative, code-first approach to configuring the diagnostic agent. Why? Because it’s quick and relatively easy to understand. Plus, configuration via code was how Microsoft first showed developers how to configure Azure diagnostics.

To configure the Azure diagnostic agent via code, you’ll need to add a few lines of code to your web or worker role’s OnStart method, which you’ll typically find in the WebRole.cs or WorkerRole.cs file in your project. The configuration code below takes care of a few things:

  1. Gets the initial default configuration.
  2. Defines the performance counter data to collect, including the sample rate and transfer period.
  3. Defines the trace log verbosity level and transfer period.
  4. Defines the verbosity level and transfer period for the Azure diagnostic infrastructure logs.
  5. Defines which Windows Event Logs to collect and the transfer period.
  6. Starts the diagnostic monitor (the agent) using the new configuration and a connection string for the desired Azure storage account.

Setting diagnostics in code:
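
A minimal sketch of what that OnStart code might look like is shown below. The specific counter, log levels, and intervals are purely illustrative, and the connection string name assumes the standard Diagnostics plugin setting; treat this as a starting point rather than a definitive configuration.

```csharp
using System;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // 1. Get the initial default configuration.
        DiagnosticMonitorConfiguration config =
            DiagnosticMonitor.GetDefaultInitialConfiguration();

        // 2. Performance counters: sample CPU usage every 30 seconds (illustrative),
        //    transfer the collected samples to storage every 3 minutes.
        config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
        {
            CounterSpecifier = @"\Processor(_Total)\% Processor Time",
            SampleRate = TimeSpan.FromSeconds(30)
        });
        config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(3);

        // 3. Trace (Azure) logs: verbosity filter and transfer period.
        config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Warning;
        config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

        // 4. Azure diagnostic infrastructure logs: verbosity filter and transfer period.
        config.DiagnosticInfrastructureLogs.ScheduledTransferLogLevelFilter = LogLevel.Error;
        config.DiagnosticInfrastructureLogs.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

        // 5. Windows Event Logs: collect the Application and System channels.
        config.WindowsEventLog.DataSources.Add("Application!*");
        config.WindowsEventLog.DataSources.Add("System!*");
        config.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

        // 6. Start the diagnostic monitor (the agent) using the new configuration
        //    and the storage connection string defined in the service configuration.
        DiagnosticMonitor.Start(
            "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

        return base.OnStart();
    }
}
```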

There’s much more that can be configured here, and you’re encouraged to read the documentation on MSDN to get a detailed view of all the available options.

Once the DiagnosticMonitor.Start() method is executed, Azure will perform a few steps to configure the diagnostic agent. And now I can hear you all muttering “Say what?”

Right . . . let’s back up and briefly explain how the agent is able to configure itself. Once per minute, the agent will automatically poll a special container, wad-control-container, in the configured Azure storage account. In that container will be a special configuration file which the diagnostic monitor agent will use to configure itself.

Figure 1 – Diagnostic Configuration Polling

If it is the very first time the role starts, Azure will use the imperative code to create the configuration file (essentially translating the code into an XML structure the agent understands).

There will be a blob in the wad-control-container container for each role instance. If you were to use Azure Management Studio to inspect the storage account, you would see something similar to the screenshot below.

Figure 2 – Diagnostic Configuration Files in Blob Storage

A few things to note about the image above:

  • The GUID immediately under the wad-control-container represents the deployment ID for the Cloud Service
  • “ImperativeWorker” is the name of the role
  • There is one file for each role instance – the file name corresponds to the name of the instance.

Figure 3 – Role Instances

Ok, let’s get back to our story. The use of DiagnosticMonitor.Start will override the default configuration, but it will not override anything already in the wad-control-container container. Additionally, DiagnosticMonitor.Start only affects the agent on the local instance – it doesn’t impact other instances in the current role. Read that last sentence again. Got it? We’ll come back to this important point later.

This approach is very effective for new deployments, but not so much for updating existing deployments. So what’s a developer to do instead?

Declarative configuration

Since we already know that the agent uses a diagnostic configuration file located in the wad-control-container container associated with each role instance, we can use this knowledge to create our own configuration file. Knowledge is power!

When a role instance starts, the Diagnostics module is imported into the service model. If the web/worker role contains a diagnostic configuration file named diagnostics.wadcfg, the diagnostic monitor agent will configure itself using that file. Genius! The location of the file varies by role type:

  • Web Role – located in the bin directory at the root of the role
  • Worker Role – located at the root of the role

Diagnostics.wadcfg is an XML file which conforms to the schema at %ProgramFiles%\Microsoft SDKs\Windows Azure\.NET SDK\[VERSION]\schemas\DiagnosticsConfig201010.xsd, where [VERSION] corresponds to the Azure SDK version being used. For example, C:\Program Files\Microsoft SDKs\Windows Azure\.NET SDK\v2.3\schemas\DiagnosticsConfig201010.xsd.

If you like the pointy sharp elements of XML, feel free to roll your own file. You should end up with a diagnostics.wadcfg file that looks something like the sample below:
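
Here is an illustrative example of what such a file could contain; the buffer quotas, counters, levels, and transfer periods are example values only, and the full set of available elements and attributes is defined by the schema mentioned above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<DiagnosticMonitorConfiguration
    xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration"
    configurationChangePollInterval="PT1M"
    overallQuotaInMB="4096">
  <DiagnosticInfrastructureLogs scheduledTransferPeriod="PT5M"
                                scheduledTransferLogLevelFilter="Error" />
  <Logs bufferQuotaInMB="1024"
        scheduledTransferPeriod="PT5M"
        scheduledTransferLogLevelFilter="Warning" />
  <PerformanceCounters bufferQuotaInMB="512" scheduledTransferPeriod="PT3M">
    <PerformanceCounterConfiguration counterSpecifier="\Processor(_Total)\% Processor Time"
                                     sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\Memory\Available MBytes"
                                     sampleRate="PT1M" />
  </PerformanceCounters>
  <WindowsEventLog bufferQuotaInMB="512"
                   scheduledTransferPeriod="PT5M"
                   scheduledTransferLogLevelFilter="Error">
    <DataSource name="Application!*" />
    <DataSource name="System!*" />
  </WindowsEventLog>
</DiagnosticMonitorConfiguration>
```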

Most of the diagnostics.wadcfg file is fairly easy to understand, but I do want to call out the values for the sampleRate and scheduledTransferPeriod attributes. As an example, look at the scheduledTransferPeriod on the PerformanceCounters section – what on earth is PT3M? That value is an ISO 8601 duration. In this case, PT3M indicates a time duration of 3 minutes (similarly, PT30S would be 30 seconds and PT1H one hour). If you want to learn a bit more about this, there’s plenty of information on ISO 8601 available on Wikipedia.

If crafting a large XML document by hand doesn’t excite you, then an easier approach is to let Visual Studio help you create the configuration file. To configure diagnostics in Visual Studio, open the Properties for the desired role, and select the Configuration tab.

Figure 4 – Enable Diagnostics via Visual Studio

From there, you’ll want to make sure the Enable Diagnostics check box is selected. Now you can select one of the available options – Errors only, All information, or Custom plan. I would encourage you to explore each option as it is a great learning experience. For starters, editing the custom plan is a good way to learn about the available diagnostic sources that can be configured. For example, using this approach it is fairly easy to add various performance counters. Personally, I find this a lot easier because I can never remember the correct syntax for the performance counters I need! Here I can just select the counter for which I want to sample data, and enter the sample rate:

Performance counters

Once you’ve finished configuring the data sources to collect (including the verbosity, sample rate, and transfer period), you’ll find a diagnostics.wadcfg file added to your project:

Figure 5 – Diagnostic Configuration File per Role

Remember how we earlier mentioned that diagnostics.wadcfg had to be in a specific location for Web and Worker roles? Do we actually have to remember to manually put the diagnostics.wadcfg file in the right spot? Nope! Recent updates to the Visual Studio tooling will handle that for us. Ahhh … isn’t it great when technology saves us from having to remember those pesky details? If you were to dive into the folder structure of the project, you’d find a specific folder structure for the roles, and the diagnostics.wadcfg file already placed in the correct folder:

Figure 6 – Diagnostic Configuration File in Worker Role

Bonus – Azure Diagnostics for non-.NET applications

The same diagnostics.wadcfg file we’re referring to here for .NET applications can also be used for non-.NET applications running in a Cloud Service. For instance, if you were to deploy a node.js application hosted in an Azure web role, you would place the diagnostics.wadcfg file in the \bin folder of your node.js project.

Figure 7 – Diagnostics configuration for node.js application

Also, be sure to import the Diagnostics module in your ServiceDefinition.csdef file, and set the Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString in your ServiceConfiguration.cscfg file.
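
For reference, the relevant fragments might look something like the snippet below. The role name and the storage account name and key are placeholders for your own values.

```xml
<!-- ServiceDefinition.csdef: import the Diagnostics module for the role -->
<WebRole name="MyNodeWebRole">
  <Imports>
    <Import moduleName="Diagnostics" />
  </Imports>
</WebRole>

<!-- ServiceConfiguration.cscfg: tell the agent which storage account to use -->
<Role name="MyNodeWebRole">
  <ConfigurationSettings>
    <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString"
             value="DefaultEndpointsProtocol=https;AccountName=[youraccount];AccountKey=[yourkey]" />
  </ConfigurationSettings>
</Role>
```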

There’s a precedence

An order of precedence that is. As we’ve discussed, there are a few different ways to configure the Azure diagnostics monitor agent. In a future article we’ll take a look at another approach – remotely changing the diagnostic monitor agent configuration. However, before we get into that, it’s important to understand the order of precedence for how the diagnostic monitor agent applies configuration:

  1. wad-control-container container – the diagnostic configuration file created for each role instance has the highest order of precedence for configuring diagnostic data sources. This is the first place the diagnostic monitor agent checks for its configuration.
  2. Imperative Code
    1. RoleInstanceDiagnosticManager.SetCurrentConfiguration() – this will update the configuration file in the wad-control-container for the specific instance only (see the sketch just after this list).
    2. DiagnosticMonitor.Start() – this also applies to the specific instance only, but will not overwrite an existing configuration file in the wad-control-container.
  3. Declarative Configuration – the diagnostics.wadcfg file at the root of the worker role or bin directory of the web role is used to configure the diagnostics monitor agent.
  4. Default Configuration – used only if none of the above configuration options were implemented. The default configuration locally (on the role instance itself) collects Azure logs, IIS logs, and infrastructure logs, but does not transfer them to Azure table or blob storage.
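
To make the difference between 2.1 and 2.2 a little more concrete, here is a minimal sketch of the RoleInstanceDiagnosticManager approach. It assumes the standard Diagnostics connection string setting is defined for the role, and the change to the trace log level is purely illustrative.

```csharp
using System;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.Diagnostics.Management;
using Microsoft.WindowsAzure.ServiceRuntime;

// Connection string for the storage account used by the diagnostic agent.
string connectionString = RoleEnvironment.GetConfigurationSettingValue(
    "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString");

// Manager scoped to the *current* role instance.
var instanceManager = new RoleInstanceDiagnosticManager(
    connectionString,
    RoleEnvironment.DeploymentId,
    RoleEnvironment.CurrentRoleInstance.Role.Name,
    RoleEnvironment.CurrentRoleInstance.Id);

// Read the configuration the agent is currently using (falling back to the
// defaults if no configuration blob exists yet), change it, and write it back
// to this instance's blob in the wad-control-container.
DiagnosticMonitorConfiguration config =
    instanceManager.GetCurrentConfiguration()
    ?? DiagnosticMonitor.GetDefaultInitialConfiguration();

config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;  // illustrative change
config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

instanceManager.SetCurrentConfiguration(config);
```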

For the most recent updates, the best place to look is the MSDN documentation.

Some practical advice

Enough with the “how to do X” stuff. Let’s get into some practical details around Azure diagnostic configuration.

More is less

Only transfer the data that is needed. The more diagnostic data you transfer to storage, the more money you’ll pay to store it, and the extra, largely unneeded data just creates more noise. Instead, turn up the volume only when needed. Verbose or Debug is OK for development, but likely too much noise for production. In Production, set to Warn or Error initially and – only when needed – change to a more verbose level of logging.

When working on-premises with a single machine, it’s easy to want to view diagnostic data all the time. When running in the cloud, likely with N instances, that’s just not realistic. Don’t try to sample performance counter data every second and transfer every 30 seconds – that’s going to result in a lot of noise that you’re just not going to be able to respond to, especially when you’re running across multiple instances.

Instead, set the sample rate to every 1 or 2 minutes, and the transfer period to every 5 minutes. Of course, don’t just take my word for it – try it out and validate the setting works for your scenario and needs.
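
In diagnostics.wadcfg terms, that advice might look something like this (the counter choice and buffer quota are illustrative):

```xml
<PerformanceCounters bufferQuotaInMB="512" scheduledTransferPeriod="PT5M">
  <PerformanceCounterConfiguration counterSpecifier="\Processor(_Total)\% Processor Time"
                                   sampleRate="PT2M" />
</PerformanceCounters>
```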

I do declare

Always use the declarative approach. The declarative approach is by far the easiest and safest approach to configuring diagnostics, so much so that it is actually getting hard to find information on the imperative code approach via any official Microsoft documentation. With the recent updates to Visual Studio tooling, there really isn’t any reason not to use the declarative syntax.

The really “fun” part comes if you start to mix declarative and imperative configuration. Because we know the order of precedence, we know that the wad-control-container is the first place checked by the agent and, if there’s a file there, it uses it. Then comes any configuration set via code.

So what would happen if the file in the wad-control-container contained a different configuration from the one in code? Chaos, I tell you . . . sheer and utter chaos! When a role instance reboots, the configuration set via code would be applied for that specific instance, not what’s in the wad-control-container. Now you’ve got a situation where some role instances are running with the configuration from the wad-control-container, and others with what is set in the role’s OnStart() method. They’re out of sync, which is likely not a good situation.

Some instances will be collecting and transferring diagnostic data according to the configuration set via code, while others will be operating according to the configuration in the wad-control-container. Having role instances out of sync with regard to their diagnostic configuration is not going to be helpful in troubleshooting, should a problem arise!

Let’s wrap it up already

Last time we reviewed some of the foundational aspects of how Azure diagnostics works in Cloud Services. In this article we dived a little deeper and reviewed two approaches for configuring Azure diagnostics. The imperative code approach is fairly easy to write and understand, but it opens us up to a few problems when it comes to updating the configuration. Alternatively, we can use the declarative configuration approach, and let the Visual Studio tooling help create a configuration file that will work well for both new deployments and subsequent updates of the deployed solution.

Next time we’ll explore a few advanced scenarios with Azure diagnostics. We’ll take a look at how to remotely update the diagnostic configuration, so we don’t have to redeploy the solution to change what is collected. We’ll also take a look at on-demand transfers, including the pros and cons of such an action.