One of the most touted benefits of the cloud is scalability: the ability to quickly move from a few machines handling your web sites and service requests to dozens, hundreds, or even thousands when necessary. It’s great, but it can also have a dark side. If you’ve ever had a bug that just randomly appears in your solution then you’ll know that those non-repeatable, every-once-in-a-while issues can be ridiculously hard to diagnose on even a single machine. Now imagine that same bug when you have dozens, hundreds or thousands of machines. Troubleshooting more than just a handful of machines means that, for any chance of success, you need centralized logs so that you can correlate times and identify which machine(s) might be involved.
The story up till now
Microsoft Azure has a Platform as a Service (PaaS) offering that is now called Cloud Services. This was originally called Hosted Services, as you’ll notice from the references in documentation or lower level commands. From the start, it has had a diagnostics feature which collected useful evidence such as IIS logs, trace messages, and performance counters for the virtual machines running in Cloud Services. This made it much easier to gather the data from all the instances in a Cloud Service. You would create some configuration and direct all the data to be pushed to an Azure storage account for dissection and analysis. Until recently, that feature was only available for Cloud Services and, to a much lesser degree, Azure Web Sites. Now you can also take advantage of Azure Diagnostics for the Infrastructure as a Service (IaaS) Azure Virtual Machines.
Even before this new development, you had a few options if you wanted to know what the IaaS Virtual Machines were doing in your Azure deployments. You could, of course, use a number of 3rd party solutions, such as New Relic or Azure Watch. Some of these tools used agents running on the machines in order to push data out to a monitoring system; others used PowerShell or remote connections to monitor the machines. These are good services and offer quite a few features and options beyond what Microsoft provides. They are definitely worth a look.
Another option, though still in preview, was the Azure Monitoring Service API. This service exposes data on several of the Azure platform services, including the IaaS Virtual Machines feature. For Virtual Machines, the majority of this data is obtained at the hypervisor level, so you can pull values such as CPU utilization, network usage, etc., but there isn’t much customization or expandability at the moment. Neil McKenzie has recently written about using the Monitoring Service API with Azure Virtual Machines, which can fill in more of the details on this service.
Alternatively, if your organization utilizes System Center, you can monitor your Azure VMs as you handle your other virtual server assets. You can read all sorts of documentation and help about using System Center to monitor your Azure environments at the Microsoft website.
But that’s the story up till now. Now we have the ability to use Azure Diagnostics directly with IaaS Azure Virtual Machines.
Leveraging VM Extensions for Diagnostics
The concept of Extensions for Virtual Machines was recently introduced. The idea of a VM Extension is that an agent can be installed on an Azure Virtual Machine that supports an extension plug-in model. When the agent starts up, it can be configured to execute one or more extensions so as to allow for a type of plug-and-play capability for the services and features of the platform.
It turns out that this is also very similar to the existing VM agent. This has been running on Cloud Service virtual machines in order to host extensions for Cloud Services such as the remote desktop forwarding capability and, surprise, surprise, Azure Diagnostics. For Cloud Services, these were called modules and are initially configured as part of the service definition, though they can now also be configured on the fly using REST requests or PowerShell cmdlets.
For a Virtual Machine, you can elect to have the agent installed when you provision it, or you can install the agent at a later time. The Virtual Machine agent has several more options for extensions beyond just diagnostics, such as BGInfo, anti-malware tools, Chef integration, Puppet integration and even custom scripts. These are extensions from both Microsoft and others, so you can expect the options for extensions to grow over time. You can find out more about the Virtual Machine extensions on the Azure Blog and MSDN documentation, but for the purposes of this article we just need to know that Azure Diagnostics is now a VM extension that we can enable.
In order to enable Diagnostics for Virtual Machines, the VM agent must be installed on the target Virtual Machine. If you create the VM via the Azure Management Portal, the agent will be included automatically if you use Quick Create, and it is an option if you choose to create the VM through the gallery wizard.
If you create the VM using PowerShell, then the agent will be installed by default unless you specifically opt out by passing the DisableGuestAgent parameter to the Add-AzureProvisioningConfig cmdlet when using New-AzureVM or New-AzureQuickVM. If for some reason the agent wasn’t installed when the machine was originally provisioned, you can still install it using a package provided by Microsoft. Sandrino Di Mattia even shows you how to automate the VM Agent MSI installation using Remote PowerShell on his blog.
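As a rough illustration of that opt-out, the sketch below wraps the provisioning pipeline in a helper function. The function name New-VMWithAgentChoice, its parameters, the 'Small' instance size and the 'West US' location are all hypothetical choices made for this example; only the cmdlets and the DisableGuestAgent switch come from the Azure PowerShell module, and the image name is something you would have to supply yourself.

```powershell
# Sketch only: the helper name, its parameters and the default values are
# illustrative assumptions, not part of Azure PowerShell.
function New-VMWithAgentChoice {
    param(
        [Parameter(Mandatory = $true)] [string] $ServiceName,
        [Parameter(Mandatory = $true)] [string] $VMName,
        [Parameter(Mandatory = $true)] [string] $ImageName,
        [Parameter(Mandatory = $true)] [string] $AdminUsername,
        [Parameter(Mandatory = $true)] [string] $Password,
        [string] $Location = 'West US',
        [switch] $SkipAgent
    )

    # Build the provisioning arguments; the agent is installed unless
    # DisableGuestAgent is passed explicitly.
    $provisioning = @{
        Windows       = $true
        AdminUsername = $AdminUsername
        Password      = $Password
    }
    if ($SkipAgent) { $provisioning['DisableGuestAgent'] = $true }

    # -Location creates a new Cloud Service; omit it when deploying into
    # a Cloud Service that already exists.
    New-AzureVMConfig -Name $VMName -InstanceSize 'Small' -ImageName $ImageName |
        Add-AzureProvisioningConfig @provisioning |
        New-AzureVM -ServiceName $ServiceName -Location $Location
}
```

Calling the function without -SkipAgent leaves the default behaviour in place, which is what you want if you plan to enable diagnostics afterwards.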
Getting Diagnostics Switched On
Once the VM agent is installed on the target Virtual Machine, it is just a matter of enabling and configuring the diagnostics extension. The extensions for a VM can be modified and configured, not only by a REST API, but also with some Microsoft-provided PowerShell Cmdlets that actually call the REST API under the hood for you. The PowerShell Cmdlets will be used for the examples in this article.
To use the PowerShell Cmdlets you’ll need to install the Azure PowerShell Cmdlets (version 0.8.7.1 is the current version at the time of writing). You may want to read How to Install and Configure Azure PowerShell if you aren’t already familiar with it.
To see which extensions have been enabled for a VM, you can use the Get-AzureVM cmdlet. In the example below, the result is piped to a format cmdlet to get a nicer view (you’ll already need to have imported your publish settings file or signed in using the Add-AzureAccount cmdlet).
(Get-AzureVM -Name iaasdiag -ServiceName iaasdiag).ResourceExtensionStatusList | Format-table -Property HandlerName, Version, Status
This will output something like this:
HandlerName Version Status
----------- ------- -------
Microsoft.Compute.BGInfo 1.1 Ready
In the cmdlet call above, the name of the Cloud Service hosting the VM and the name of the VM happen to be the same (iaasdiag), but that isn’t always the case. Substitute your own virtual machine name and Cloud Service name. If you aren’t sure what these are you can run the Get-AzureVM cmdlet with no parameters to see the list of deployed virtual machines in your current subscription.
As we can see, the BGInfo extension is already installed; that is because it gets added by default. Before we turn on Azure Diagnostics for this VM, let’s think about the configuration of what we want to collect.
As already mentioned above, the Azure Diagnostics feature for Cloud Services has been around for quite some time. There is, therefore, a lot of documentation on what can be collected. There are also several ways you can configure the diagnostics for Cloud Services, but the one that will be the most applicable to Virtual Machines is providing the configuration through a diagnostics.wadcfg file. This is an XML configuration file that details all of the types of information you want to collect, how often to push it to the storage account, etc. The MSDN documentation describes the contents of the configuration file in detail, though, as it turns out, the IaaSDiagnostics extension uses a new version of Azure Diagnostics, so the schema is a bit different. In fact, in the Cloud Services extensions (use Get-AzureServiceAvailableExtension to see the list) you’ll see that there are two for diagnostics: one called simply ‘Diagnostics’ and the other called ‘PaaSDiagnostics’. The one named PaaSDiagnostics also uses the newer version of the Azure Diagnostics schema. The differences between the current Azure Diagnostics and the newer version are beyond the scope of this article but can be found in the Azure documentation. For the purposes of this article just be aware that the extension being described here uses the newer one.
We’ll use a very simple example of capturing the average CPU performance counters and the Windows Application Event Log, so our configuration looks like this:
<DataSource id="Application!*" />
Processor Time" sampleRate="PT1M" />
Briefly, this configuration samples the CPU usage every minute. It is set to transfer the Application Event Log and the CPU performance counter data to a storage account every 5 minutes. For the purposes of the script below, the above configuration is saved into a file named iaas-diagnostics.wadcfg.
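A configuration matching that description might look roughly like the sketch below, which writes the file from PowerShell and then reads it back as a sanity check. Treat the element and attribute names as assumptions based on the newer Azure Diagnostics schema and verify them against the schema documentation; the overallQuotaInMB value is a placeholder, and the file is written to the temp directory here rather than C:\Temp so the snippet runs anywhere.

```powershell
# Sketch only: element and attribute names are assumptions about the newer
# Azure Diagnostics schema; the quota value is a placeholder.
$configPath = Join-Path ([System.IO.Path]::GetTempPath()) 'iaas-diagnostics.wadcfg'

$config = @'
<WadCfg>
  <DiagnosticMonitorConfiguration overallQuotaInMB="4096">
    <WindowsEventLog scheduledTransferPeriod="PT5M">
      <DataSource name="Application!*" />
    </WindowsEventLog>
    <PerformanceCounters scheduledTransferPeriod="PT5M">
      <PerformanceCounterConfiguration counterSpecifier="\Processor(_Total)\% Processor Time" sampleRate="PT1M" />
    </PerformanceCounters>
  </DiagnosticMonitorConfiguration>
</WadCfg>
'@

Set-Content -Path $configPath -Value $config -Encoding UTF8

# Read the file back to confirm it is well-formed XML before handing it to Azure.
$parsed  = [xml](Get-Content -Path $configPath -Raw)
$monitor = $parsed.WadCfg.DiagnosticMonitorConfiguration

# The intervals are ISO 8601 durations; XmlConvert turns them into TimeSpan values.
$counterConfig = $monitor.PerformanceCounters.PerformanceCounterConfiguration
$sampleRate = [System.Xml.XmlConvert]::ToTimeSpan($counterConfig.GetAttribute('sampleRate'))
$transfer   = [System.Xml.XmlConvert]::ToTimeSpan($monitor.PerformanceCounters.GetAttribute('scheduledTransferPeriod'))

"Sampling every $($sampleRate.TotalMinutes) minute(s), transferring every $($transfer.TotalMinutes) minute(s)"
```

Parsing the file locally catches malformed XML and mistyped durations before the extension ever sees the configuration, which is cheaper than finding out from the VM’s event log.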
You can read more about setting up Azure Diagnostics for Cloud Services from an article series by Michael Collier over on Justazure.com: Microsoft Azure Diagnostics.
Just as with the Cloud Service diagnostics, the data we collect can be pushed to an Azure Storage Account, so we will also need to make sure we have a Storage Account set up and ready for us to use. It is simple to set up a storage account and the process is well documented, so we’ll skip the details here. Be aware, though, that we do need the Azure Storage Account Key as well as the Azure Storage Account Name to complete the setup, both of which can be obtained from the Azure Management Portal once the storage account is created.
Now that we have our configuration and know where we want the diagnostics data to be stored, we need to enable the diagnostics extension on the Virtual Machine and provide this configuration. We do this by using a combination of the Set-AzureVMDiagnosticsExtension and Update-AzureVM cmdlets.
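# Cloud Service that hosts the VM, and the name of the VM itself
$serviceName = "IaasDiag"
$vmname = "IaasDiag"
# Diagnostics configuration file and the storage account that receives the data
$diagnosticsFilePath = "C:\Temp\iaas-diagnostics.wadcfg"
$storageName = "mikewovm"
$storageKey = (Get-AzureStorageKey $storageName).Primary
$storageAccount = New-AzureStorageContext -StorageAccountName $storageName -StorageAccountKey $storageKey
# Add the extension to the local VM model, then push the change with Update-AzureVM
(Get-AzureVM -ServiceName $serviceName -Name $vmname) | Set-AzureVMDiagnosticsExtension -DiagnosticsConfigurationPath $diagnosticsFilePath -StorageContext $storageAccount -Version 1.1 | Update-AzureVM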
This PowerShell code assumes that the configuration you want to use resides as a text file called iaas-diagnostics.wadcfg in the c:\temp\ directory. The first few lines of script set up the name of the Cloud Service the VM is hosted in, the name of the VM, the path to the configuration file and the name of the storage account to save the diagnostics data to. In the next lines, the script gets a reference to the storage account where the diagnostics data will be saved using the New-AzureStorageContext cmdlet.
The last line of the script retrieves the VM that you wish to work with using the Get-AzureVM cmdlet in order to apply the change. In this case our VM has a name of iaasdiag and also happens to be hosted at iaasdiag.cloudapp.net (these correspond to the -Name and -ServiceName parameters respectively, and do not need to match). The Set-AzureVMDiagnosticsExtension cmdlet is then used to add the IaaSDiagnostics extension to the configuration of the VM.
Note that this script is just updating the local configuration model that was retrieved using the Get-AzureVM cmdlet, and it won’t apply any changes until you call Update-AzureVM which is at the end of the script. This allows you to modify multiple aspects of a VM configuration but only push the configuration when all your changes have been made. Finally, the Update-AzureVM cmdlet is called as part of the pipeline and the platform will handle the installation and configuration of the extension.
You can monitor the status of the changes by using the same call as shown above to get the status of the resource extensions, but the script above doesn’t return until it has some sort of status on the changes requested. In this case after executing the above cmdlets the status would show:
HandlerName Version Status
----------- ------- ------
Microsoft.Azure.Diagnostics.Iaa... 126.96.36.199 Ready
Microsoft.Compute.BGInfo 1.1 Ready
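If you would rather have a script wait for the extension than re-run the status query by hand, a small polling helper along these lines could do it. This is a sketch under stated assumptions: Wait-ExtensionReady and its parameters are hypothetical names invented for this example, and it only relies on the HandlerName and Status columns shown in the output above. The status lookup is passed in as a script block so that the helper itself has no dependency on the Azure cmdlets.

```powershell
# Sketch only: Wait-ExtensionReady is a hypothetical helper, not an Azure cmdlet.
function Wait-ExtensionReady {
    param(
        [Parameter(Mandatory = $true)] [scriptblock] $GetStatusList,  # returns extension status entries
        [Parameter(Mandatory = $true)] [string] $HandlerName,
        [int] $MaxAttempts = 20,
        [int] $DelaySeconds = 15
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        # Pick out the entry for the handler we care about.
        $entry = & $GetStatusList | Where-Object { $_.HandlerName -eq $HandlerName }
        if ($entry -and $entry.Status -eq 'Ready') { return $true }

        # Pause between attempts, but not after the final one.
        if ($attempt -lt $MaxAttempts) { Start-Sleep -Seconds $DelaySeconds }
    }
    return $false
}

# Example call (requires the Azure module and a signed-in session):
# Wait-ExtensionReady -HandlerName 'Microsoft.Azure.Diagnostics.IaaSDiagnostics' -GetStatusList {
#     (Get-AzureVM -ServiceName iaasdiag -Name iaasdiag).ResourceExtensionStatusList
# }
```

The function returns $true as soon as the handler reports Ready and $false if it gives up, so a calling script can decide whether to continue or stop.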
What Happens at the VM Level?
So, what exactly happened when the extension was installed? How did the VM change to start gathering the data we wanted? First, when the update was sent after the Update-AzureVM cmdlet was called, the VM Agent was told to install that extension if it wasn’t already present. The VM Agent looks up where the installer for the extension resides and pulls it down locally for installation. The configuration is also copied down to the machine at that time. Once the extension is installed, it is started up and the configuration is read.
If you remote into the Virtual Machine you can actually see the C:\Packages\Plugins\Microsoft.Azure.Diagnostics.IaaSDiagnostics\188.8.131.52 folder where the extension was installed. The configuration you provided in the parameters gets buried in an encoded configuration file in the C:\Packages\Plugins\Microsoft.Azure.Diagnostics.IaaSDiagnostics\184.108.40.206\RuntimeSettings directory. This bit of trivia may change over time, so don’t count on these file paths or conventions staying the same forever. I’m pointing it out here only because you may be wondering what is hidden inside the black box. You might find this information useful for troubleshooting extensions in the future, but the IaaSDiagnostics extension does a pretty good job of writing to the Windows Application log as it starts up, so you can mostly use that for troubleshooting.
Getting at the Data
Once the extension is up and running, it pushes data to the same places in the Azure Storage account as the PaaSDiagnostics extension. This means that any tooling that can look into Azure storage tables, such as Azure Management Studio from Cerebrata or Visual Studio’s Azure tooling in Server Explorer, will also be able to view the data from the IaaS VMs.
Some Things to Think About
You may think, from reading this, that it will soon be as easy to gather diagnostics from your VMs in a consistent way as it is from Cloud Services. However, there are a few things you may want to think about as you set these up.
You don’t need to create a new storage account for each VM because you can have multiple VMs, including machines from Cloud Services, all pushing diagnostics data to the same storage account. Storage accounts do have throughput limits, so make sure you aren’t overtaxing any single account by assigning too many machines to it. This is especially true if you are gathering a LOT of data regularly.
If you change the storage keys for security reasons, you’ll need to update the key on every VM that uses that storage account. This task is not as nice as it is for Cloud Services, where you can update many VMs at once just by updating the serviceConfiguration file in a rolling update. You’ll have to update this for each Virtual Machine individually. This sounds like a job for a good PowerShell script to me. Be prepared to have to do this.
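Such a script might take a shape like the following sketch. It assumes that re-applying the diagnostics extension with a freshly built storage context is enough for a VM to pick up the new key, which is an assumption you should verify before relying on it. The function name Update-DiagnosticsStorageKey and its parameters are hypothetical; the cmdlets are the same ones used in the script earlier in this article.

```powershell
# Sketch only: assumes re-applying the extension with a new storage context
# is enough for each VM to start using the changed key.
function Update-DiagnosticsStorageKey {
    param(
        [Parameter(Mandatory = $true)] [object[]] $Targets,   # objects with ServiceName and Name
        [Parameter(Mandatory = $true)] [string] $StorageName,
        [Parameter(Mandatory = $true)] [string] $ConfigPath,
        [string] $Version = '1.1'
    )

    # Build one storage context from the current primary key and reuse it.
    $key     = (Get-AzureStorageKey -StorageAccountName $StorageName).Primary
    $context = New-AzureStorageContext -StorageAccountName $StorageName -StorageAccountKey $key

    foreach ($target in $Targets) {
        # Same pipeline as the single-VM script, applied to each VM in turn.
        Get-AzureVM -ServiceName $target.ServiceName -Name $target.Name |
            Set-AzureVMDiagnosticsExtension -DiagnosticsConfigurationPath $ConfigPath -StorageContext $context -Version $Version |
            Update-AzureVM | Out-Null

        # Report which VM was processed.
        '{0}/{1}' -f $target.ServiceName, $target.Name
    }
}

# Example call (requires the Azure module and a signed-in session):
# $targets = @([pscustomobject]@{ ServiceName = 'iaasdiag'; Name = 'iaasdiag' })
# Update-DiagnosticsStorageKey -Targets $targets -StorageName 'mikewovm' -ConfigPath 'C:\Temp\iaas-diagnostics.wadcfg'
```

Keeping the list of targets explicit, rather than sweeping every VM in the subscription, makes it harder to accidentally reconfigure machines that point at a different storage account.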
I wish that one could point the extension to a file in BLOB storage that contains the configuration so that many VMs can share the same configuration. This would reduce the need for updating many machines if you need to make changes, but then also allow you to do specific one-offs as necessary.
Wrapping it Up
Microsoft has been saying for a while now that the distinction between IaaS and PaaS is getting very blurred. This move towards a unified way of gathering and storing the diagnostics information between the Virtual Machines features and Cloud Services certainly continues this trend.