Exchange 2010 High Availability

In April 2009 Microsoft released a public beta of Exchange 2010, the latest version of the messaging server in its unified communications family of products. In August 2009, a feature-complete Release Candidate followed for public download. In this article Neil Hobson takes a look at some of the high availability features of Exchange 2010.

For many years Exchange only offered a high availability solution based on the shared storage model, whereby use of Microsoft clustering technologies protected against server-based failure but did nothing to protect against storage failures. Although there were improvements to this form of high availability in Exchange 2007, where it was known as a Single Copy Cluster (SCC), the real changes to high availability in Exchange 2007 came with the introduction of a technology known as continuous replication. With this technology, transaction logs are shipped from one copy of a database to another, which allows an organization to deploy an Exchange high availability solution that also deals with storage failure. This storage failure protection was available on a single server with the use of Local Continuous Replication (LCR) and was also available across servers with the use of Cluster Continuous Replication (CCR). Therefore, with LCR, CCR and SCC, Exchange 2007 administrators had three different high availability methods open to them. It was also possible to cater for site failure with an extension to the continuous replication technology that was known as Standby Continuous Replication (SCR). I won’t go into detail on these Exchange 2007 solutions here as I’ve covered them in a previous article called Exchange 2007 High Availability here on Simple-Talk. However, the bottom line is that many organizations have deployed technologies such as CCR in order to provide high availability and technologies such as SCR to provide site resilience.

In my experience, more organizations have deployed CCR than SCC, so it comes as no surprise to learn that SCC has been dropped entirely from Exchange 2010. As you will shortly see, the continuous replication technology lives on in Exchange 2010 but there are many changes in the overall high availability model.

With Exchange 2007 Service Pack 1, a full high availability solution was generally deployed using a total of four servers. Two servers were installed as a single CCR environment, giving high availability for the users’ mailboxes. The other two servers were deployed as combined Hub Transport and Client Access Servers, and were configured as a load-balanced pair. The reason for this was simply that if the mailbox server role was clustered, it was not possible to install the additional Exchange 2007 server roles, such as the Hub Transport and Client Access Server roles, on the server running the mailbox role. For larger enterprises this wasn’t an unreasonable approach, but for smaller organizations a total of four servers sometimes seemed to be overkill for an internal messaging system. To address this specific issue, Microsoft has designed Exchange 2010 such that all server roles can be fully redundant with as few as two servers, provided that you have deployed an external load balancer for incoming Client Access Server connections. In other words, it’s now possible to combine the mailbox server role with other roles such as the Hub Transport and Client Access Server roles, even when the mailbox servers are part of a high availability configuration. Of course, larger organizations will still be likely to implement dedicated servers running the various server roles, but this is something that will definitely help smaller organizations to reduce costs. Remember, though, the external load balancer requirement for incoming Client Access Server connections.

With this in mind, let’s get going and look at some of the high availability features of Exchange 2010.  Don’t forget that this is a high-level look at the new features; in later articles here on Simple-Talk, we’ll be diving much more deeply into these features and how they work.  Right now, the idea with this article is to get you to understand the concepts behind these new features and to allow you to do some initial planning on how you might use them in your organization.

Database Availability Groups

Perhaps one of the most important new terms to understand in Exchange 2010 is the Database Availability Group (DAG). The DAG is essentially a collection of as few as one (although two is the minimum to provide a high availability solution) and up to 16 mailbox servers that allows you to achieve high availability in Exchange 2010. DAGs use the continuous replication technology that was first introduced in Exchange 2007 and are effectively a combination of Cluster Continuous Replication (CCR) and Standby Continuous Replication (SCR). DAGs make use of some of the components of Windows Failover Clustering to achieve high availability but, to reduce overall complexity, these cluster elements are installed automatically when a mailbox server is added to a DAG and are managed completely by Exchange. For planning reasons, it’s important to understand that the DAG forms the boundary for replication in Exchange 2010. This is a key difference from SCR in Exchange 2007, where it was possible to replicate outside of a CCR environment to a standalone server in a remote data center. However, you should also be aware that DAGs can be split across Active Directory sites if required, meaning that DAGs can offer high availability within a single data center as well as between different data centers.

An important component of a DAG is the file share witness, a term that you will be familiar with if you have implemented a CCR environment in Exchange 2007. As its name suggests, the file share witness is a file share on a server outside of the DAG. This additional server acts as the witness to ensure that quorum is maintained within the cluster. There are some changes to the file share witness operation as we shall discuss later in this section. When creating a DAG, the file share witness share and directory can be specified at the time; if they are not, default witness directory and share names are used. One great improvement over Exchange 2007 is that you do not necessarily need to create the directory and the share in advance as the system will automatically do this for you if necessary. As with Exchange 2007, the recommendation from Microsoft is to use a Hub Transport server to host the file share witness so that this component will be under the control of the Exchange administrators. However, you are free to host the file share witness on an alternative server as long as that server is in the same Active Directory forest as the DAG, is not itself a member of the DAG, and is running either the Windows Server 2003 or Windows Server 2008 operating system.

A DAG can be created via the New-DatabaseAvailabilityGroup cmdlet or via the New Database Availability Group wizard in the Exchange Management Console.  The DAG must be created before any mailbox servers are added to it, meaning that effectively an empty container is created which is represented as an object in Active Directory.  For example, Figure 1 shows a newly created DAG, called DAG1, in Active Directory as viewed using ADSIEdit. 

Figure 1: ADSIEdit DAG View

You can see that a DAG has an object class of msExchMDBAvailabilityGroup and that the actual Database Availability Group container location is found under the Exchange 2010 administrative group container.  Bringing up the properties of the DAG object in ADSIEdit reveals the important configuration items such as the file share witness share and directory names as you can see in Figure 2.

Figure 2: DAG Properties
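
As a rough sketch of the shell route mentioned earlier, the command below creates an empty DAG and names the witness explicitly. The DAG name DAG1 comes from the figures; the witness server HUB1 and the directory C:\DAG1FSW are placeholders of my own, and the parameter names are those of the released Exchange 2010 shell, so check them against the build you are running.

```powershell
# Create an empty DAG object in Active Directory.
# HUB1 and C:\DAG1FSW are hypothetical values; if the witness
# parameters are left out, default witness settings are used.
New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer HUB1 -WitnessDirectory C:\DAG1FSW
```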

Once a DAG has been created, mailbox servers can be added to it as required.  This is another simple process that can be achieved by right-clicking the DAG object in the Exchange Management Console and choosing the Manage Database Availability Group Membership option from the context menu. The corresponding Exchange Management Shell cmdlet is the Add-DatabaseAvailabilityGroupServer cmdlet.  For example, to add the mailbox server called E14B1S1 to a DAG called DAG1, you’d run the following cmdlet:
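
Assuming the parameter names used by the released Exchange 2010 shell (-Identity for the DAG and -MailboxServer for the server being added; pre-release builds may differ), that command would look something like this:

```powershell
# Add the mailbox server E14B1S1 to the existing DAG named DAG1
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer E14B1S1
```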

Since DAGs make use of several Windows Failover Clustering components, it comes as no surprise to see that the Enterprise Edition of Windows Server 2008 is required on mailbox servers that are added to a DAG, so do ensure that you take this into account when planning your Exchange 2010 implementation.

When creating a DAG, there are options around network encryption and compression that can be set.  This is possible because Exchange 2010 uses TCP sockets for log shipping whereas Exchange 2007 used the Server Message Block (SMB) protocol.  For example, it’s possible to specify that the connections that occur using these TCP sockets are encrypted.  Equally, it’s also possible to decide that these same connections also use network compression.
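
In the released product these two settings are exposed as properties of the DAG that can be changed with Set-DatabaseAvailabilityGroup; the build described here may surface them differently. A minimal sketch, assuming the DAG is called DAG1:

```powershell
# Encrypt and compress all replication traffic between DAG members.
# Accepted values in the released product are Disabled, Enabled,
# InterSubnetOnly and SeedOnly.
Set-DatabaseAvailabilityGroup -Identity DAG1 -NetworkEncryption Enabled -NetworkCompression Enabled
```

Restricting compression to traffic that crosses subnets (InterSubnetOnly) may be a reasonable compromise where LAN bandwidth is plentiful but WAN links are not.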

Mailbox Servers and Databases

Inside each DAG there will normally exist one or more mailbox servers, although it is possible to create an empty DAG as discussed earlier within this article. On each mailbox server in the DAG, there will typically exist multiple mailbox databases. One of the key differences between Exchange 2010 mailbox servers and their Exchange 2007 counterparts is that Exchange 2010 mailbox servers can host active and passive copies of different mailbox databases; remember that in Exchange 2007, an entire server in a CCR environment, for example, was considered to be either active or passive. In Exchange 2010, by contrast, the unit of failover is now the database and not the server, which is a fantastic improvement in terms of failover granularity. Consider the diagram below in Figure 3.

Figure 3: Database Copies

In Figure 3, you can see that a DAG named DAG1 consists of two mailbox servers called MBX1 and MBX2. There are a total of three active mailbox databases, shown in green, across both servers and each active mailbox database has a passive copy, shown in orange, stored on the alternate server. For example, the active copy of DB1 is hosted on the server called MBX1 whilst the passive copy of DB1 is hosted on the server called MBX2. The passive copies of mailbox databases are kept up-to-date via log shipping, in the same way as between the two cluster nodes within a single Exchange 2007 CCR environment. As you might expect, the active copy of the mailbox database is the one which is used by Exchange. Within a DAG, multiple passive copies of a mailbox database can exist but there can only be a single active copy. Furthermore, any single mailbox server in a DAG can only host one copy of any particular mailbox database. Therefore, the maximum possible number of passive copies of a mailbox database is one less than the number of mailbox servers in the DAG, since there will always be one active copy of the mailbox database. For example, if a DAG consisted of the maximum of 16 mailbox servers, then there could be a maximum of 15 passive copies of any single mailbox database. However, not every server in a DAG has to host a copy of every mailbox database that exists in the DAG. You can mix-and-match between servers however you wish.
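
To make the Figure 3 layout concrete, here is a sketch of how the passive copy of DB1 might be created on MBX2 and the result checked. The database and server names are taken from the figure; the cmdlet and property names are those of the released Exchange 2010 shell and are an assumption as far as earlier builds are concerned.

```powershell
# Create a passive copy of DB1 on MBX2 (the active copy already lives on MBX1)
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX2

# List every server that now holds a copy of DB1, active or passive
Get-MailboxDatabase -Identity DB1 | Format-List Name, Servers
```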

As mentioned earlier in this section, the unit of failover in Exchange 2010 is now the database. However, if an entire mailbox server fails, all active databases on that server will need to fail over to alternative servers within the DAG.
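
The same database-level granularity applies to planned moves. As a hedged sketch using the released cmdlet names (assumed, not confirmed for pre-release builds) and the servers from Figure 3, an administrator could move a single database, or everything active on one server, ahead of maintenance:

```powershell
# Activate the copy of DB1 held on MBX2, leaving other databases where they are
Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer MBX2

# Move every database that is currently active on MBX1 to copies elsewhere in the DAG
Move-ActiveMailboxDatabase -Server MBX1
```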

One other vital piece of mailbox database information that you should consider in your planning for Exchange 2010 is the fact that database names must now be unique across the forest in which Exchange 2010 is installed. This could be a problem in organizations that have deployed Exchange 2007 with the default database name of Mailbox Database on more than one server. Therefore, if you are going to be transitioning from Exchange 2007 to Exchange 2010 in the future, take time now to investigate your current database naming standards.
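
One way to check for clashes is to group the existing databases by name and list any name that occurs more than once. This is a sketch to be run from the Exchange Management Shell of the existing organization; it uses only Get-MailboxDatabase and standard PowerShell cmdlets.

```powershell
# Group all mailbox databases in the organization by name and
# show only the names that are used by more than one database
Get-MailboxDatabase |
    Group-Object -Property Name |
    Where-Object { $_.Count -gt 1 } |
    Select-Object Count, Name
```

Any name returned here would need to be changed before the corresponding databases could coexist with Exchange 2010.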

The Active Manager

At this point, you might be wondering how Exchange 2010 determines which of the mailbox databases is considered to be the active copy. To manage this, each mailbox server in a DAG runs a component called the Active Manager. Specifically, one mailbox server in the DAG will be the Primary Active Manager (PAM) whilst the remaining mailbox servers in the DAG will run a Secondary Active Manager (SAM). We will discuss the relationship between clients, Client Access Servers and the active copy of the mailbox database in the next section, as there are some significant changes in this area too. To view the Active Manager information, you can use the Get-DatabaseAvailabilityGroup cmdlet and pipe the results into the Format-List cmdlet. In other words, you will need to run the following command:
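
Assuming the -Status switch of the released Exchange 2010 shell, which asks for real-time status in addition to the configuration stored in Active Directory, the command would look something like this (DAG1 is the example DAG used earlier):

```powershell
# Retrieve DAG1 with its real-time status and show every property as a list
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status | Format-List
```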

Some of the information returned by the Get-DatabaseAvailabilityGroup cmdlet is real-time status information about the DAG, and one of the properties returned is ControllingActiveManager. This property shows you which server is currently the PAM. It’s the job of the PAM to decide which of the passive copies of the mailbox database should become the active copy in the event of an issue with the current active copy. In an environment consisting of many passive copies of mailbox databases, there will naturally be many candidate copies available to the PAM. As might be expected, the PAM is able to determine the best copy of the mailbox database available for use, and it does this via many different checks in order to minimize data loss. Each SAM also has an important part to play, as it informs other services within the Exchange 2010 infrastructure, such as Hub Transport servers, which mailbox databases are currently active.
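
The exact checks the PAM performs are not covered here, but the health of each copy can be inspected by hand. The sketch below assumes the released Get-MailboxDatabaseCopyStatus cmdlet and its CopyQueueLength and ReplayQueueLength properties, and uses DB1 from Figure 3; a passive copy with short queues is one that has few logs left to copy or replay.

```powershell
# Show the state of every copy of DB1, including how many logs are
# waiting to be copied to, and replayed into, each passive copy
Get-MailboxDatabaseCopyStatus -Identity DB1 |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength -AutoSize
```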

Client Access Server Changes

In Exchange 2007, Outlook clients connect directly to the mailbox servers whilst other forms of client access, such as OWA, Outlook Anywhere, POP3, IMAP4 and so on, connect via a Client Access Server. The Client Access Server is then responsible for making the connection to the mailbox server role as required. In Exchange 2010, another fundamental change from previous versions of Exchange is that Outlook clients no longer connect directly to the mailbox servers.

On each Client Access Server, there exists a new service known as the RPC Client Access Service that effectively replaces the RPC endpoint found on mailbox servers and also the DSProxy component found in legacy versions of Exchange. The DSProxy component essentially provides the Outlook clients within the organization with an address book service either via a proxy (pre-Outlook 2000) or referral (Outlook 2000 and later) mechanism. A likely high availability design scenario will therefore see a load-balanced array of Client Access Servers deployed, using technologies such as Windows Network Load Balancing or third-party load balancers, which will connect to two or more mailbox servers in a DAG as shown below in Figure 4.

Figure 4: CAS Array
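
In the released product, a load-balanced set of Client Access Servers like the one in Figure 4 is represented by a Client Access array object. The following is a sketch only: the array name, the FQDN outlook.example.com and the site name are placeholders, DB1 is borrowed from Figure 3, and the cmdlets may not exist in this form in pre-release builds.

```powershell
# Define an array object whose FQDN resolves to the load balancer's virtual IP
New-ClientAccessArray -Name "CASArray1" -Fqdn "outlook.example.com" -Site "Default-First-Site-Name"

# Point Outlook clients of DB1 at the array rather than at an individual Client Access Server
Set-MailboxDatabase -Identity DB1 -RpcClientAccessServer "outlook.example.com"
```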

When an Outlook client connects to an Exchange 2010 Client Access Server, the Client Access Server determines where the active copy of the user’s mailbox database is located and makes the connection with the relevant server.  If that particular mailbox database becomes unavailable, such as when the administrator wishes to take the database offline or it fails, one of the passive copies will become the new active copy as previously described within this article.  It’s the Client Access Server that loses the connection to the old active copy of the mailbox database; the actual connection from the client to the Client Access Server is persistent which is obviously good from a user experience point of view.  Then, the Client Access Server will fail over to the new active mailbox database in the DAG as directed by the PAM.

Summary

In this article we’ve taken a high-level look at some of the new Exchange 2010 high availability features and how they come together to provide an overall high availability solution.  If you’re planning on looking at Exchange 2010, it makes sense to start understanding these new features and how they can benefit your organization.  Also, there are other interesting features available in Exchange 2010 that further serve to increase the overall high availability and reliability of the messaging environment, such as shadow redundancy in the Hub Transport server role.  In future articles here on Simple-Talk, we’ll be covering these areas in much more detail.