How SQL Monitor Monitors Azure-based SQL Server Failover Cluster Instances
Jon Hayman shows you how to set up SQL Monitor to monitor Azure-based clustered SQL Server instances right alongside your on-premise clusters.
If you’re working with SQL Server, one of the most appealing features of cloud-based computing services, such as Azure and AWS, is that they make it so much easier to design systems that are both highly-available and offer disaster recovery. No longer do you have to worry about the myriad issues arising from setting up geographically-diverse datacenters, and there is much more flexibility in scaling up or down the computational resources available to your instances.
However, you’ll need to closely monitor your SQL Server instances running on cloud-based clusters. You’ll need to know when you’re about to hit a resource limit, so that you can scale-up; you’ll want to know when servers fails over, so that you can ensure all instances are running; and you’ll need to be sure that the architecture of your system allows adequate performance.
This article will show you how to set up SQL Monitor (version 7.1.6 or later) to monitor Azure-based clustered instances and discusses some differences in the way that clusters are constructed in Azure that mean the SQL Monitor set-up is somewhat different than for SQL Server instances on traditional Windows clusters.
Setting up the SQL Server FCI on Azure
We’re going to assume that you’re reasonably familiar with Windows Server Failover Clusters (WSFCs), SQL Server Failover Cluster Instances (FCIs), as well as the basics of Azure Virtual Machines and Azure networking. We assume that you have configured WSFC on your Azure VMs, and that you have created a SQL Server FCI running on these Azure VMs.
The SQL Monitor configuration that we describe a little later specifically applies to FCIs that were set up as described in the Microsoft documentation.
FCI set up
“Some differences are fine, of course: it doesn’t matter if, for example, you use a different witness mechanism or don’t use S2D. However, if your setup is significantly different to the one described there and you have problems getting going with SQL Monitor, please get in touch. There are lots of ways in which your environment might differ from that described there, and we need to understand those differences, and how we can make it easier to adapt the SQM setup to accommodate them.
–Jon Hayman, SQL Monitor Software Engineer
Traditional Windows Clusters versus Azure clusters
There are a few differences in the way that clusters are constructed in Azure (also in EC2, which we’ll cover in a separate post) that affect monitoring. I’ll review briefly some of the underlying technical differences. The most significant of these comes from the way that an Azure Load Balancer is used to route traffic associated with cluster IP addresses.
Windows Clusters
In a traditional Windows cluster environment, both the WSFC and the SQL Server FCI have their own DNS entries and their own IP addresses. If you want to access your SQL Server FCI, you would do so via to the FCI’s domain name, which resolves to the cluster’s IP address, and traffic to this address is intercepted by the active cluster node.
In the following simple example, we can see machine SQLNode1 with IP address 10.1.0.8 and an additional IP address, 10.1.0.102, and machine SQLNode 2 resolving to 10.1.0.9 with additional IP address 10.1.0.101.
The address 10.1.0.101 is the IP address for associated with the WSFC, and the address 10.1.0.102 is the IP address for the SQL Server FCI, so SQLNode1 is the active FCI node and SQLNode2 is the primary WSFC node: this example shows that the primary node of the underlying Windows Server Failover Cluster is not necessarily the same as the active node of the SQL Server Failover Cluster Instance.
Inside the WSFC, the SQL Server FCI is installed into a separate “group”, which contains a set of resources such as disks, services and IP addresses. The group and its resources are owned by a single node at any one time. On failover, the whole group moves to the failover node: SQL Server stops on one machine and starts on the new active machine. The FCI IP address is only associated with the node that’s currently running SQL Server. This allows applications to use the cluster IP address 10.1.0.102 (or its corresponding domain name) and not worry about the fact that they are being served by one of several servers.
Here’s a screenshot of the output of ipconfig showing that SQLNode1 is listening for traffic to 10.1.0.102 through Windows associating an additional IP address with the network adapter.
Only one node within the cluster intercepts traffic to the SQL Server FCI IP address at any point in time (and similarly for the WSFC IP address). Here’s the screenshot for the SQLNode2 – note that it won’t intercept traffic to 10.1.0.102 but it will intercept traffic to 10.1.0.101 since it’s the primary WSFC node.
In a traditional environment, this works because the operating system can determine what IP addresses the network card should intercept traffic from: it relies on the network infrastructure allowing both machines in the cluster seeing all traffic addressed to the cluster IP addresses.
Azure clusters and the load balancer
In Azure, everything works in much the same way, but the problem is that any network interface in Azure can be associated with only one IP address, so the Azure infrastructure isn’t aware of how to route traffic to the cluster IP addresses: traffic from any other virtual machine to them is simply lost. This means that traffic from any other virtual machine to 10.1.0.102 will be discarded since the Azure configuration of the virtual machine records only that its network card is associated with the address 10.1.0.8.
Here’s where the load balancer steps in: it redirects traffic to what it believes to be the active node.
If your installation follows the setup in the Microsoft documentation, you’ll have a load balancer with an internal IP address the same as that for the FCI, so any traffic to the FCI’s IP address is redirected by the load balancer to the IP address of the active machine (in this case, 10.1.0.8).
The load balancer, being a general-purpose tool, doesn’t do anything sophisticated or cluster-specific to determine which is the active node, like following the witness file or issuing a WMI query: it simply determines whether a port is open, as is configured by the PowerShell commands in the Microsoft documentation under the “Configure cluster for probe” section.
The Azure load balancer only redirects SQL Server traffic, so only port 1433 is redirected, and it only redirects the IP address FCI; the WSFC’s IP address isn’t redirected.
Implications of the Azure load balancer for setting up SQL Monitor
All of this has a couple of effects when setting-up SQL Monitor. Firstly, we’ve had to introduce a small new feature that’s going to let you use the FCI’s domain name instead of the WSFC’s domain name when adding a new cluster. Secondly, since SQL Monitor needs to collect performance information from the active node using WMI, we need to redirect the WMI port 135 through the load balancer.
Behind the scenes, we’ve also made some changes to the way that SQL Monitor maintains connections to the monitored machine to make it more robust through failover; these were needed since the load balancer will keep a connection to a node open even if, due to a failover, it will no longer route new traffic to that node, a situation that’s impossible in a normal Windows failover cluster.
Finally, our setup assumes that you have internal load balancer in Azure. We don’t think this is a limitation since you can’t specify the IP address that a public load balancer takes, so you can’t make it correspond to the FCI’s IP address, which means that you wouldn’t be able to speak through a public load balancer with the active node. For this reason, we anticipate that you’ll run SQL Monitor on an Azure virtual machine connected to the same virtual network as your FCI (if you’re running Monitor elsewhere, don’t forget that you can now connect the SQL Monitor web interface to multiple base monitors), or establish a VPN connection between your SQL Monitor machine and the FCI’s virtual network.
Using an existing load balancer
“If you’re using SQL Monitor to connect using a public load balancer that you already have working, we’d really like to hear how you’ve got it working!
–Jon Hayman, SQL Monitor Software Engineer
Setting up SQL Monitor for Azure VM-based SQL Server FCIs
We’re going to assume that you’re starting with a running FCI. You’ll need to have Monitor installed on a machine on the internal network of the cluster: this could be by installing Monitor inside a virtual machine in Azure itself or creating a VPN connection into Azure.
Configure the load balancer
Configure the load balancer you already have set up to forward port 135 on the frontend IP address (which is the same as the IP address for the SQL Server FCI) to port 135 in the backend pool of cluster nodes according to the health probe that’s already configured. Floating IP should be disabled.
Check that everything’s working by running the following commands in PowerShell from your SQL Monitor machine (assuming SQL Server is listening on port 1433):
Test-NetConnection <the domain name of your SQL Server FCI> -p 135
Test-NetConnection <the domain name of your SQL Server FCI> -p 1433
Add the SQL Server FCI to SQL Monitor
In SQL Monitor, go to the Configuration > Monitored servers page and enter the Add SQL Server dialog, then follow these steps:
- Enter the domain name of the SQL Server FCI as the machine name and enter appropriate credentials.
- Enter the Connection properties dialog window by clicking the Edit properties link
- Make sure you’ve selected TCP/IP communication for SQL Server and select port 1433, and importantly select Fix cluster name. This will prevent SQL Monitor from attempting to connect via the domain name associated to the WSFC, which resolves to an inaccessible IP address.
- Click ‘OK’ and ‘Add’, and within a minute or so SQL Monitor should begin monitoring your FCI in Azure.
Conclusion
The necessity of monitoring SQL server when running in the cloud doesn’t go away. SQL Monitor provides all the critical diagnostic data you need to monitor and troubleshoot your cloud-based SQL Server instances. Better still, if only part of your estate is cloud-based, SQL Monitor presents brings all this vital information into one place, alongside its ability to alert and report.
Hopefully, this short guide will get you past any stumbling-blocks in using SQL Monitor with Azure FCIs, and give you with some useful background information on the very interesting world of Azure networking. The changes we’ve made behind the scenes to SQL Monitor now allow it to monitor a comprehensive range of other cloud-based high-availability systems, such as Availability Groups in Azure and both FCIs and Availability Groups in EC2: stay tuned for detailed steps covering these.
Useful References
- Stairway to AlwaysOn Level 1: What is SQL Server AlwaysOn?
http://www.sqlservercentral.com/articles/Failover+Clustered+Instance+(FCI)/107536/ - Always On Failover Cluster Instances (SQL Server)
https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/always-on-failover-cluster-instances-sql-server - Azure Load Balancers and SQL Server
https://www.red-gate.com/simple-talk/cloud/cloud-data/azure-load-balancers-sql-server/
Tools in this post
Redgate Monitor
Real-time SQL Server and PostgreSQL performance monitoring, with alerts and diagnostics