Every journey starts with a single step, namely the first. (Or in IT: step 0, just to get all geeky.) Since this is the first article, I’m not going to get hands-on just yet. To fully understand the Azure Virtual Machines offering, we’ll need to take a look at how everything is connected and co-exists. I’ll do this by first giving a little history lesson, and hopefully only confusing you a little.
In IT we went from large rooms filled with one single computer (ENIAC style), to personal computers, to workgroup servers, to datacenters filled with racks of hardware servers, winding up with everything reconsolidated onto virtualized machines. And yet not much has changed except for one thing: the way we provision our necessary resources today. It’s no longer a painfully long-running process (oh yes sir, one server? Not vanilla flavored? ETA delivery 3 to 6 months…); platforms like Microsoft Azure are fast and agile in giving you what you need, and this has changed our IT service landscape dramatically.
A Common Mistake
To continue my story I also need to get something out of my system. Since its early days, Microsoft Azure has had a couple of technologies on board, but no Virtual Machines. Or so it seemed. To be honest: everything in Microsoft Azure is based upon Virtual Machines. Whether it is PaaS or IaaS, it’s all VMs on which you run your software. The only differentiator is that for Cloud Services roles (PaaS, through Web and Worker Roles) the VMs are not persistent and are recreated upon every update and deployment (more on that later). Diving deeper, we can tell that today’s Azure services like Websites also run on Virtual Machines (they are also PaaS roles, to be honest). I’d even go so far as to suggest that most of Microsoft’s SaaS products actually run on Hyper-V machines in order to cope with load and growth.
Microsoft’s first attempt at something like a real, exposed Virtual Machine came in 2010, when Redmond announced the availability of the now-deprecated VM Role: a bring-your-own, non-persisted VM.
Confused? Well, read on and it will become clearer. If we take a look at today’s layout of the way we use resources, we can break them down into several offerings we’re familiar with: IaaS, PaaS, and SaaS.
In traditional Data Centres (DCs), we used to have just the left-hand side of this diagram, requiring us to manage every little nut and bolt. In the early years of Data Center optimization we started virtualizing everything, thus running more servers on less hardware (do note that Mainframe folks have known this technology since pre-2000, but that’s a different story). We then started doing things like web farm automation, using platform busses, and so on and so on…
Now why am I telling you all this? Because no matter what technology, hosting or platform you’re using as a Cloud solution, you’ll always run up against some limits at some point in time, whether it’s throttling, IO issues, or bandwidth. And no matter how well you choose your base VM, you’re always depending on the underlying hardware (it really is the weakest link next to your ISP, but that’s yet another different story). So keep that in mind when choosing your VMs: always check them against your workload.
That said, since TechEd US 2014, we also have dedicated hardware for our Virtual Machines on Azure (the brand new A8 and A9). By the latter I mean that the new SKUs/VM sizes have their own scale units and that they pack an additional set of features bound to the hypervisor hardware itself (InfiniBand NICs, IO, memory, etc.). Now that you’re aware of this, it should help you think of how to architect your solutions in a cloudy and (to quote Scott Klein) more IaaS-ier model.
Another consideration, when looking at the different cloud models, is the segregation of duties. I’m often asked: “So when it comes to patching, when does Microsoft patch, and do we have control?” This question is often asked the wrong way, because people always mean the VMs by it. As in your own datacenter, you are responsible for patching your Virtual Machines, from the OS level up through the entire application layer (SQL, SharePoint, etc.). So yes, Microsoft patches, but only at the level of the host, not at the level of the guests. You are responsible for your own deployment, and that is not to be taken lightly.
Luckily you can easily include Azure VMs in your own patching rotation / scheme / mechanism, which eases this particular pain. There’s one exception to that rule: when it comes to Cloud Services, Microsoft manages the base OS images, and thus also the patching of that entire layer (because in a PaaS model you only want to maintain your application, not the entire infrastructure).
The entire Microsoft Azure ecosystem (I really do see it as more than just a platform, although it is called one) is complex in its foundations, yet simple in its end use, and that’s the beauty of it all. Most of the underlying mechanisms are transparent to those who use the services offered by our friendly Redmondians, but that doesn’t mean there can’t be complexity hidden in the details. Host patching is one of those details, as it has an impact on many things.
Without elaborating on this too much (I’m saving that for another article), you should know that patching may have an impact on HA (High Availability) and performance. Now, Microsoft never patches unannounced (unless it’s really necessary), so when an element of the platform is about to be patched they’ll notify you, the customer, with a nice mail explaining what, how, where and when this will happen. That way, you can take the proper precautions or even plan some of your own upgrades in that particular timeframe.
As you can see, Microsoft tries to be as transparent as possible when it comes to patching and upgrading / maintenance, but one can never avoid a certain amount of downtime when it comes to single instances.
That’s another difference between VMs and Cloud Services (PaaS), for instance: Cloud Services warn you that any deployment with fewer than 2 instances will not be covered by the 99.95% Service Level Agreement (SLA). In the case of virtual machines there is no such warning, unless you configure an availability set after the creation of the VM! (This has to do with the previously mentioned updates and will become clearer later in this series.)
When considering SLAs for Virtual Machines, they offer you the 99.95% guarantee only when the machines are load-balanced, at least 2 instances exist, and they are internet-facing. For any other Virtual Machine, it used to be 99.95% for a single instance but, since General Availability (GA), this was taken down to 99.9%. But then again this is exactly the same as on-premises if you don’t scale and deliver high availability by duplicating the environment.
A little on the nasty side of things – The SLA Salad
When talking SLAs, do note that with Virtual Machines you don’t only have to consider the VM SLA, but a couple more, since VMs are impacted by several SLAs. Storage, compute, and bandwidth all come to mind and need to pass the review process when it comes to safeguarding your application or services.
The key in the SLA statement on the site is the bit about “two or more instances in the same availability set.” Like I said above, this is actually the same as in a normal on-premises situation. Admittedly, we all know that the cloud is resilient and highly available, but a virtual machine stays a virtual machine 🙂
My point is this: in an IaaS offering, you are responsible for everything except the environment on which your VM runs. If you want an HA environment on-premises, you need to deploy at least two VMs to guarantee resilience and availability, either in a clustered or non-clustered setup (depending on what you are doing). The same applies to Microsoft Azure; you just need to see it as a large Hyper-V infrastructure (without the hypervisor management hassle on your side, and with just a tad more complexity).
So how does that add up for your rights under the SLA? Well, Microsoft promises you internet-facing connectivity availability of 99.95% (which comes down to roughly 21.6 minutes of downtime per month, maximum, or 4.38 hours a year). This means that Microsoft will guarantee you the availability of the VM service, but not of your machine. For that you (or your team) are solely responsible, although Microsoft does provide you with mechanisms for safeguarding those numbers (more on those in future articles).
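Those downtime figures follow directly from the SLA percentage; here’s a quick sanity check of the arithmetic (assuming a flat 30-day month and a 365-day year):

```python
# Convert an SLA percentage into the maximum allowed downtime per period.
def max_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Minutes of downtime allowed per period at a given SLA percentage."""
    return (1 - sla_percent / 100) * period_hours * 60

# 99.95% over a 30-day month: 0.05% of 43,200 minutes = 21.6 minutes
monthly = max_downtime_minutes(99.95, 30 * 24)

# 99.95% over a 365-day year, expressed in hours: 4.38 hours
yearly_hours = max_downtime_minutes(99.95, 365 * 24) / 60

print(round(monthly, 2))       # 21.6
print(round(yearly_hours, 2))  # 4.38
```

The same function makes it easy to see what dropping to the 99.9% single-instance SLA costs you: about 43.2 minutes per month instead of 21.6.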
Batteries not included (Everything else is)
Well, then again, maybe not everything, but it comes darn close! Let’s go over what we have available in the stack. When atomizing Virtual Machines, we can distinguish 3 core components of the Microsoft Azure ecosystem:
The basis of all-things-VM lies under the storage component banner. Since VMs are Hyper-V guests running on a fabric-controlled host cluster, you need some storage to put your VHDs on. For this you need a storage account, which can host Tables, Queues and BLOBs. It’s the latter format on which the VHDs will be stored, within a dedicated container (comparable to a folder structure under the Windows Platform) helpfully called “VHD.”
As with many things in Microsoft Azure, Storage forms an integral part of the entire ecosystem. We’ll get into the depths of storage when we talk about the full architecture of the Virtual Machine platform in a future article. In the meantime, it’s enough to know that a couple of things will be stored in the Storage layer – Disks, Images and Virtual Machines – and we’ll start off by defining them:
Actually there are two different kinds of these disks: the OS disk and the data disk. Although a VM basically comes with a couple of disks upon its creation, they’re not always as persistent as you would like them to be – so what is a VHD file, ultimately? Well, these files are stored as page BLOBs (one of the two available BLOB types in your storage account – the other is the block BLOB), which are collections of 512-byte pages optimized for random reading and writing.
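To make that 512-byte page granularity concrete, here’s a toy in-memory model (my own illustration, not the real Azure storage API) of how page blob writes must align to page boundaries while reads of untouched ranges come back zero-filled:

```python
PAGE_SIZE = 512  # page blobs are addressed in fixed 512-byte pages

class PageBlobModel:
    """Toy in-memory model of a page blob: a sparse dict of 512-byte pages."""
    def __init__(self, size: int):
        if size % PAGE_SIZE != 0:
            raise ValueError("page blob size must be a multiple of 512 bytes")
        self.size = size
        self.pages = {}  # page index -> bytes; unwritten pages read as zeros

    def write(self, offset: int, data: bytes) -> None:
        # Writes must start on a page boundary and cover whole pages.
        if offset % PAGE_SIZE != 0 or len(data) % PAGE_SIZE != 0:
            raise ValueError("writes must be 512-byte aligned")
        for i in range(len(data) // PAGE_SIZE):
            idx = offset // PAGE_SIZE + i
            self.pages[idx] = data[i * PAGE_SIZE:(i + 1) * PAGE_SIZE]

    def read(self, offset: int, length: int) -> bytes:
        # Random reads: unwritten ranges come back zero-filled.
        out = bytearray()
        for pos in range(offset, offset + length):
            page = self.pages.get(pos // PAGE_SIZE)
            out.append(page[pos % PAGE_SIZE] if page else 0)
        return bytes(out)

blob = PageBlobModel(4096)               # an 8-page blob
blob.write(1024, b"\xff" * 512)          # update page 2 only
assert blob.read(1024, 4) == b"\xff" * 4
assert blob.read(0, 4) == b"\x00" * 4    # untouched pages read as zeros
```

This sparse, randomly addressable layout is exactly why page blobs suit VHDs: a guest OS scribbling a few sectors in the middle of a large disk only touches the affected pages.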
Images are prepared disks from which you can create a new VM at your convenience. They differ from a plain VHD in that they contain a Sysprepped, launch-ready Virtual Machine, with all the software you’ll need to run the applications you’re hosting on the VM, and anything else besides (custom tooling, agents for monitoring, server roles and features, etc.). You can consider this a template which you’ll use in server scaling, provisioning automation, or just as a snapshotting mechanism (new and improved since TechEd and Build 2014). If you’re a developer, consider it your base class. In the end this is a specialized VHD, already capable of running by means of a VM configuration file (see the next section on Virtual Machines).
The Virtual Machine
On the storage layer this is basically a configuration file containing all the specifics for a machine definition, including the VM size (formerly described as T-shirt sizes, and recently upgraded to Audi© models A0-A9). This can be seen as a VM definition file as we know it in Hyper-V.
And additionally: Azure Files, a new SMB-supported service
Although this is a pure storage solution, this feature enables file share scenarios. It enables High Availability (HA) and Disaster Recovery (DR) scenarios, which is why it’s also significant and vital for the VM service.
The compute part for the Virtual Machines is the running VM on top of the Fabric layer, which manages the hypervisor for Microsoft Azure. This mechanism is entirely based on Hyper-V and some of the mechanisms found in Windows Azure Pack and System Center private cloud principles (more on that in another article, hopefully).
At this level we deal with running or initialized (when in de-allocated mode) versions of the image: either a running virtual device based upon any of the images you’ve used, or even a migrated-to-the-cloud VM. The machine will act as a virtualized device with all the known capabilities and limitations from any virtualization perspective, although there are a couple of more specific factors for Microsoft Azure VMs which we’ll touch upon.
Remember when I told you that everything in Azure is a VM? Well here in the compute layer you can also see that there is a large affinity between Cloud Services and VMs, such that for each VM there’s always a Cloud service created. The reason for this will be elaborated on in one of the next articles.
Some features in this aspect of Azure will facilitate the use of Virtual machines, and help you overcome some of the difficulties of managing virtual machines / IaaS in the cloud, so let’s take a look:
A cloud wouldn’t be a cloud if it didn’t feature the main concepts of cloud computing: elasticity and (unlimited) scaling. Virtual Machines do support this, although the implementation is somewhat different from the Cloud Services and Websites scenarios.
For keeping your files safe and sound you need a backup solution, and luckily Azure also has a service for this. It’s not a “pure” VM service, but it’s one you cannot live without and it can be nicely integrated into your backup scenario and Disaster Recovery Process (DRP). Since this integrates with tools like System Center DPM and the normal Windows Server Backup tools, this can all be managed centrally (if you want to know more about this then I’ve actually written a blogpost diving into Azure backup details.)
Hyper-V Recovery Manager
Consider it a live migration tool for the cloud. This service will soon be renamed Site Recovery in its vNext incarnation (announced and demonstrated at TechEd NA 2014). Before the name change it was just a “basic” orchestration service, but vNext will be more of a handling and pick-up system, and you’ll be able to restore your VMs into Azure as well as going on-prem to on-prem.
For those familiar with System Center, these are runbooks similar to the ones you already know. We can easily create PowerShell Workflows and execute them on a schedule from within this service.
Last but certainly not least, we come to the networking component. This will clearly be the basis of your Azure connectivity, movement and hybrid environments when it comes down to communication. For me this is a very complex matter, but one that can be managed really granularly if you’re aware of its workings and capabilities. In networking we can distinguish the following services and elements:
A way to “group” resources by proximity in an Azure Data Centre for optimal internal performance. Although this was the only way to achieve this in the past, it has become rather obsolete due to the new SKUs available in the Azure ecosystem. The coming of the new A8 and A9 units (with their own Azure scale units, on which the available space within a datacenter is calculated) will make it harder to use these affinity groups in the future. Instead, a new concept has been introduced: Region Wide networks, which can now span a complete region / geographic location.
Although more related to the DR and compute areas, this is most commonly confused with the affinity group above. Hence I place it here: not to confuse you, but to make sure you understand the distinction. Availability sets make sure your VMs are upgraded and fault-handled correctly, by placing your machines in separate update and fault domains so that host updates and hardware failures never hit all of your instances at once, guaranteeing the highest possible uptime.
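To picture the idea (this is my own sketch, not Azure’s actual placement algorithm), an availability set can be thought of as round-robining instances across fault domains (separate racks, power and network) and update domains (groups rebooted one at a time during host updates); the classic defaults were 2 fault domains and 5 update domains:

```python
# A minimal sketch of how an availability set spreads VM instances
# across fault and update domains (illustrative numbers only).
FAULT_DOMAINS = 2    # separate racks / power / network (classic default)
UPDATE_DOMAINS = 5   # groups rebooted one at a time during host updates

def place(vm_index: int) -> tuple:
    """Round-robin a VM instance into a (fault domain, update domain) pair."""
    return (vm_index % FAULT_DOMAINS, vm_index % UPDATE_DOMAINS)

vms = {f"web-{i}": place(i) for i in range(4)}

# With two or more instances, no single rack failure or host update
# can take every VM down at once.
fault_domains_used = {fd for fd, _ in vms.values()}
assert len(fault_domains_used) == FAULT_DOMAINS

print(vms)  # {'web-0': (0, 0), 'web-1': (1, 1), 'web-2': (0, 2), 'web-3': (1, 3)}
```

This is also why a lone VM outside an availability set can’t get the 99.95% SLA: with a single instance there is only one fault domain and one update domain in play.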
This service has recently been expanded, and we now have both internal and public load balancers available. The public ones allow you to balance traffic over internet-facing endpoints, while the internal ones can be used for scenarios such as SharePoint farms and SQL Server high-availability requirements.
Your direct link to Azure! This solution allows you to have a direct line to Azure data centres by means of an MPLS (Multiprotocol Label Switching) VPN or through an ExpressRoute location (exchange provider facilities). This enables hybrid cloud scenarios, especially for the purposes of HA and DR. The biggest benefit is that you’re not going over a public internet connection, but rather a direct, safe line to the cloud.
Made generally available at the beginning of the year, Traffic Manager has been one of the more stable services within the Azure platform since its preview release almost 3 years ago. This load balancing solution can operate in round robin, failover or performance “modes” (the latter based on latency and proximity), and has the additional benefit that it keeps your public VIP (virtual IP) even when your cloud service has gone down.
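As a rough illustration of those three modes (a toy sketch with made-up endpoint names and latency figures, not the service’s real implementation):

```python
import itertools

# Hypothetical endpoints: names, health state and measured latency from
# one particular client. All values are invented for illustration.
endpoints = ["eu-west", "us-east", "asia-east"]
healthy = {"eu-west": True, "us-east": False, "asia-east": True}
latency_ms = {"eu-west": 25, "us-east": 90, "asia-east": 180}

rr = itertools.cycle(endpoints)

def round_robin() -> str:
    # Hand out endpoints in turn, regardless of health or distance.
    return next(rr)

def failover() -> str:
    # Always prefer the first healthy endpoint in priority order.
    return next(ep for ep in endpoints if healthy[ep])

def performance() -> str:
    # Send the client to the lowest-latency healthy endpoint.
    return min((ep for ep in endpoints if healthy[ep]), key=latency_ms.get)

assert [round_robin() for _ in range(3)] == endpoints
assert failover() == "eu-west"      # us-east is skipped: it is unhealthy
assert performance() == "eu-west"   # lowest latency among healthy endpoints
```

Note that the real service makes these decisions at the DNS level (it answers name lookups with a different endpoint address), rather than proxying the traffic itself.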
Windows Azure Active Directory
Although not really related to the VNet (Virtual Network), I’m still going to mention Active Directory (AD) in this section. As with the availability sets, this can be used as an additional asset to help you on the front-end of your applications and services. What I (rather unsurprisingly) hear a lot is that people think they can use this service as a replacement for their on-prem AD. NEWSFLASH: this isn’t the case! This solution helps you with Single Sign-On (SSO) for your publicly available apps, or other scenarios which require federated access, but at the moment it is not a replacement for AD in a network setting – it’s an extension of it.
Conclusion, and a wrap-up
Did I give you a headache? I hope not, but if I gave you a bit more insight into the Azure platform, then I’ve achieved my goal. People think VMs are easy and that there’s not much to it but, considering all the options and the possibilities even at this shallow depth, hopefully you get the idea that there’s plenty to learn. The goal here was to set out a primer. A basic vocabulary. A base layer of paint, if you will. And this will make your understanding and usage of virtual machines within Azure that bit stronger and clearer. This was all about easing into the basic theory, so the next parts will be more hands-on, with deeper information on what we’ve already covered, and even more. Hold on tight!