How to Provision a Set of Databases to Multiple Azure-based Servers

Grant Fritchey shows how to provision a group of interdependent databases, masked to protect sensitive or personal data, to each machine in an Azure-based test cell.

Many applications need multiple databases to support them, and there will be cross-database dependencies. Sometimes these databases are co-located on the same instance; sometimes there are linked server dependencies. This means that the team needs to run regular tests on a server, or set of servers, set up much like the real production system, with all the databases and dependencies in place. This can be an expensive task, both in terms of time and hardware resources (servers, disk space), so how do we do it? In this article, I’ll show how to use SQL Provision to deploy all the databases on a single production SQL Server instance to multiple Azure Virtual Machines.

This solution would be ideal in cases where, for example, setting up the test cell requires a lot of hardware that is only needed now and again, for a debugging or testing session on a set of inter-dependent databases. With some refinement, it might also allow teams to build a short-term “replica” production system in Azure, perform the tests and then, once done, simply remove it.

The processes used here, with a few changes to the details (a network share instead of Azure Storage, for example), are equally applicable to setting up local SQL Server instances, on either physical or virtual machines. The key is in automating the provisioning of a set of databases to each test server.

Provisioning database copies with SQL Provision

SQL Provision consists of two tools: SQL Clone and Data Masker. SQL Clone allows users to create copies of SQL Server databases quickly, using minimal disk space. It creates an ‘image’ of a source database, and from that image it deploys multiple ‘clones’, which are lightweight but fully functional database copies with access to all the data.

For general development and test work, the source database might be the latest build from source control, stocked with the required test data, such as by importing standard test data sets, or by generating artificial data. SQL Clone will create an image of this database and then deliver to each team member their own database copy, or clone, very quickly. Each developer can make rapid, often experimental, changes to the data or structure of the database, or run destructive tests, without affecting the work of others.

For certain types of database test, or for debugging issues that you’ve struggled to replicate on your test servers, you’ll want to work with data that is closely representative of the real thing. In these cases, you can use SQL Clone and Data Masker to create a masked image from a production database, or database backup, and from it deploy a set of ‘clean’ clones, such that sensitive or private data is obfuscated before it leaves a secure server.

However you create the image, a clone can be reset to its original state in seconds, is easy to customize, and quick to refresh, once a new build or backup becomes available.

In this demonstration, all my SQL Server instances are hosted on Azure Virtual Machines. I have one production SQL Server instance with five databases. I’ll use SQL Provision to create masked images of each database, from its respective full database backup file, and then deliver ‘clean’ clones of all five databases, as a unit, to each of three development and test servers.

The Azure setup

My test rig uses a series of Azure virtual machines. One machine is named Production and is meant to simulate a production server, and three other machines (devMaria, devAchmed, devTodd) represent the development and test servers.

Production hosts five databases (InventoryManagement, InventoryReporting, Sales, SalesSupport, Shipping), which represent various aspects of a business model and, because of cross-database dependencies, must be moved together into each of the non-production servers.

The virtual machines are all members of a single Resource Group in Azure. They can communicate with each other just as if they were within a single network. In order to keep things relatively simple, I’ve not set up Azure Active Directory for this demonstration, but generally, we’d have two AAD domains, one for the production system and the other for development and test systems. This doesn’t affect the practicalities of the demo, except for the need to copy the backup files to local storage on one of the Azure VMs (I’ll explain this a little later).

SQL Clone Server

The devMaria Azure virtual machine also doubles as the SQL Clone Server (where I install SQL Provision). Generally, you’d have a dedicated Clone Server, to separate some of the processing from where the databases are being consumed, but it does not otherwise affect the demo.

Installing SQL Provision is very straightforward, and nothing special is needed to do this within an Azure VM. You simply need to register each of the servers (devMaria, devTodd and devAchmed) with SQL Clone by installing a SQL Clone Agent on each. This allows SQL Clone to create the clones on any of these servers.

Azure blob and file storage for backups and images

The other two elements are an Azure blob storage container, which is the destination for the database backups, taken using Backup to URL, and an Azure file share, which acts as a centralized place to store SQL Clone images.

For the Azure blob storage, you’ll need to create a SQL Credential (called MyBackuptoURL in my case) on any SQL Server instance that needs to deliver backups to the blob storage container. You can then use this named resource in the backup scripts, as you’ll see later, when I present the PowerShell script.
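For reference, creating that credential on the production instance might look something like the following sketch, using Invoke-Sqlcmd from the SqlServer module; the access key value is just a placeholder:

```powershell
# Create a SQL credential for key-based Backup to URL. The access key is a placeholder.
Import-Module SqlServer

$createCredential = @"
IF NOT EXISTS (SELECT 1 FROM sys.credentials WHERE name = 'MyBackuptoURL')
    CREATE CREDENTIAL MyBackuptoURL
    WITH IDENTITY = 'provisiondiag932',          -- storage account name
         SECRET   = '<storage account access key>';
"@

Invoke-Sqlcmd -ServerInstance 'Production' -Query $createCredential
```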

The Azure file share for storing the images needs to be accessible to every machine. In my case, I placed it within the default Storage Account, called provisiondiag932, which I used to build out all the virtual machines (\\provisiondiag932.file.core.windows.net\images).

In order to allow each server to read from the file storage, you’ll need to configure the storage account security. The PowerShell script in Listing 1 is one provided by Microsoft, customized for my own use. It uses the cmdkey utility to store my storage account credentials for provisiondiag932 within Windows, so that each server can read the Azure file share without needing to specify credentials.

Listing 1
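The script follows the pattern of the connection script that the Azure portal generates for a file share. A minimal sketch, with the access key as a placeholder, and run on each of the three dev/test servers, looks like this:

```powershell
# Check that the VM can reach the storage account over SMB (port 445)
$connection = Test-NetConnection -ComputerName 'provisiondiag932.file.core.windows.net' -Port 445

if ($connection.TcpTestSucceeded) {
    # Persist the storage account credentials in Windows, so each server can
    # read the Azure file share without being prompted for credentials
    cmdkey /add:"provisiondiag932.file.core.windows.net" `
           /user:"AZURE\provisiondiag932" `
           /pass:"<storage account access key>"
}
else {
    Write-Error "Unable to reach provisiondiag932 over port 445."
}
```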

Set Up Data Masking

Since we’re copying ‘production’ databases in this demo, we need to be very careful that the data we deliver to the non-production environments could not be used to identify any individuals, nor accidentally reveal any other sensitive or private information. For this, we use the Data Masker tool in SQL Provision to generate data masking sets, one for each of the five databases. So, for the Sales database, for example, we’ll create a $SalesMask masking set, which will then be applied automatically by SQL Clone when it creates an image from the Sales database backup file. By doing the data cleansing automatically, during image creation, we ensure that at no point in the process could we accidentally release confidential information to non-production environments.

I’ve stored each of the five masking sets locally, on devMaria, but I could also have put them into the shared file storage location, in this case provisiondiag932. The latter could be a good choice if you’re managing this process from more than one server or need to access the masking files from multiple locations.

I’m only doing a single ‘data cleanup’ step here, during image creation, but you can add another cleanup step to the clone creation process, if needed.

Automating the Process

With all the Azure servers in place, and having ensured that every SQL Server instance can access both the backup location and the shared file storage for SQL Clone images, we’re ready to automate the process of provisioning our development and test servers with clean copies of the five production databases.

There are five main parts to the automated process.

  1. Initial set-up, to establish the necessary variables, and their values, used throughout the script
  2. Create the database backups for each of the 5 databases
  3. Remove any existing clones, and then images, on each of the 3 dev/test servers
  4. Create 5 new images, applying the appropriate data masking set in each case
  5. Create 3 clones of each of the 5 images and deliver a set of 5 clones to each of the 3 dev/test servers

I’ve chosen to automate every step of the process in a single script. In your systems, you’ll probably have one process for backing up your databases. You may choose to add a step to that process to create images from the backups, or, you may create the images as a secondary process. Either is a good choice. It’s also likely that you’ll have a mix of processes for setting up clones from the images. Some of those will be automated centrally as I’m doing in my script, while others will be self-service setups for individual developers’ machines. Dividing or sub-dividing different aspects of this process will be very simple once you understand how the whole process is put together.

Listing 2 shows the complete script, automating the entire process. I’ve broken it down by section and will explain what’s going on in each one, after the listing. This is just one possible path to this result, and there are many ways you might choose to write your PowerShell. All that really matters is that you arrive at a point where you can automate the entire process.

Listing 2

With the script in place, the following sections describe each of the major steps and explain why I made the choices I did.

1. Setup

The set-up section is generally just to establish the necessary variables for use in the script. However, at the end of the set-up section, I also remove the existing backups prior to creating the new ones.

This was necessitated by an issue I ran into with the backup-to-URL process. I was not able to use INIT to overwrite existing backups in blob storage. I should have been using FORMAT. Rather than spend a lot of time troubleshooting that issue, I just removed the files prior to creating new ones. This is obviously not something I recommend anyone do in a real production environment!
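As a rough sketch, assuming the Az.Storage PowerShell module and a blob container named backups, the set-up section looks something like this:

```powershell
# Databases to provision, and the dev/test servers that will receive the clones
$databases  = 'InventoryManagement', 'InventoryReporting', 'Sales', 'SalesSupport', 'Shipping'
$devServers = 'devMaria', 'devAchmed', 'devTodd'

# Storage details (the container name and key shown here are placeholders)
$storageAccount = 'provisiondiag932'
$container      = 'backups'
$storageKey     = '<storage account access key>'

# Remove the previous backups from blob storage before taking new ones
$context = New-AzStorageContext -StorageAccountName $storageAccount -StorageAccountKey $storageKey
foreach ($db in $databases) {
    Remove-AzStorageBlob -Container $container -Blob "$db.bak" -Context $context -ErrorAction SilentlyContinue
}
```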

2. Create the Database Backups

These are normal backup commands, as supplied by Microsoft, using the Backup-SqlDatabase PowerShell cmdlet, with the destination being the Azure blob storage container, identified by the MyBackuptoURL credential.

I run this basic command once for each database, through the array that was set up earlier.

After creating each backup, I use AzCopy to copy it to local storage on devMaria (the SQL Clone Server). As I note in the code, this step would not be necessary if you had set up Azure Active Directory and domain name services. Since I didn’t do that for this demo, SQL Clone cannot access the backup files in the blob storage container directly.
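A sketch of this part of the script, reusing the variables from the set-up section and the older AzCopy /Source /Dest syntax, might look like this:

```powershell
foreach ($db in $databases) {
    # Backup to URL, using the MyBackuptoURL credential created earlier
    Backup-SqlDatabase -ServerInstance 'Production' -Database $db `
        -BackupFile "https://provisiondiag932.blob.core.windows.net/backups/$db.bak" `
        -SqlCredential 'MyBackuptoURL'

    # Copy the backup down to local storage on devMaria, so that SQL Clone can read it
    & 'C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe' `
        /Source:"https://provisiondiag932.blob.core.windows.net/backups" `
        /Dest:'C:\Backups' `
        /SourceKey:$storageKey /Pattern:"$db.bak" /Y
}
```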

3. Clean Up Old Clones and Images

Initially, when I created the scripts for managing the images and clones across all my servers, I used a Get-Date command to give each image a unique name. Then, I realized all the headaches that managing this methodology could create. How many days’ worth of images do I keep online? How old do I allow developers’ databases to be? All sorts of management questions came up that could sideline the core of the discussion, which is how to get multiple databases to multiple servers.

So, instead, I decided that when I refreshed an image from a new copy of the associated database backup, I would first remove all clones associated with that image, and then remove the parent image. Other choices here could be to keep some number of active images, say two or three, and only remove clones and images that are older than that.

The first time I run the script, as currently configured, it doesn’t find any images matching the values I supply, so it generates errors when it attempts to locate them. I just ignored these errors during the first run.

In this demo, we create multiple clones from each image and install a clone from each image on each of the three dev/test servers. This means that, in each case, we need to remove one clone from each of the three servers before we can delete the parent image. Each image and clone is named after the source database.

The Get-SqlClone command can get a specific clone on a specific instance or, as I’ve done here, it can get all clones that match a given name, which we then pipe to the Remove-SqlClone command. This removes all clones with that name in a single step. We can then remove the parent image in the same way:
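Assuming the SQL Clone PowerShell cmdlets are installed, and we’ve connected to the SQL Clone Server on devMaria, a minimal sketch of this section looks something like this:

```powershell
# Connect to the SQL Clone Server (default port shown; adjust for your own set-up)
Connect-SqlClone -ServerUrl 'http://devMaria:14145'

foreach ($db in $databases) {
    # Remove every clone of this image, across all three dev/test servers, in one step
    Get-SqlClone -Name $db | Remove-SqlClone | Wait-SqlCloneOperation

    # Then remove the parent image itself
    Get-SqlCloneImage -Name $db | Remove-SqlCloneImage
}
```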

4. Create a masked image from each backup

First, we establish the name and location of the masking set for each database:
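For example, with the masking set files in a local folder on devMaria (the folder and file names here are illustrative), the paths can be built from the database names:

```powershell
# One Data Masker masking set per database, e.g. SalesMask.DMSMaskSet for Sales
$maskSetFolder = 'C:\MaskingSets'
$masks = @{}
foreach ($db in $databases) {
    $masks[$db] = New-SqlCloneMask -Path (Join-Path $maskSetFolder "$($db)Mask.DMSMaskSet")
}
```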

We need to create five images, one for each database, so we call the New-SqlCloneImage command five times through the foreach, using the array.

In order to create an image from a backup, SQL Clone requires a SQL Server instance, so that it can perform what’s termed a ‘pass-through’ restore operation. Here, I’m using the default instance on devMaria to help me create the Images, hence $devMaria[0]. After that, I just specify the backup file, the image destination, and the masking set to apply.
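Putting that together, the image-creation loop might look something like this sketch, reusing the variables from the earlier snippets:

```powershell
# The default instance on devMaria performs the 'pass-through' restore;
# the image files themselves are written to the Azure file share
$devMaria         = Get-SqlCloneSqlServerInstance -MachineName 'devMaria'
$imageDestination = Get-SqlCloneImageLocation -Path '\\provisiondiag932.file.core.windows.net\images'

foreach ($db in $databases) {
    New-SqlCloneImage -Name $db `
        -SqlServerInstance $devMaria[0] `
        -BackupFileName "C:\Backups\$db.bak" `
        -Destination $imageDestination `
        -Modifications @($masks[$db]) |
        Wait-SqlCloneOperation
}
```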

Image creation is the most expensive part of the process, but once it’s done, creating the clones is lightning fast.

5. Create New Clones

Finally, I create the clones. To do this I first retrieve each of the five images I just created. Then, I connect to each of the three dev/test servers in turn and dispatch the clone to the default SQL Server instance, in each case.
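A sketch of that final loop, again reusing the earlier variables, might look like this:

```powershell
foreach ($db in $databases) {
    $image = Get-SqlCloneImage -Name $db

    foreach ($server in $devServers) {
        # Deliver the clone to the default instance on each dev/test server
        $instance = Get-SqlCloneSqlServerInstance -MachineName $server
        $image | New-SqlClone -Name $db -Location $instance[0] | Wait-SqlCloneOperation
    }
}
```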

Running the Script

The output from this PowerShell is quite dull. In fact, if the script is successful, all you will see is the output of the AzCopy command. However, in the SQL Clone GUI you should see 5 images and 15 clones:

Even though all the work is being done through scripts, you can see the results in the GUI, providing you a way to report, track, and understand what your scripts are doing for you.

On the Activity page in the SQL Clone console, we can watch the script run. This image shows the Sales image being created:

For each of the actions, there is a set of logs available:

If you click on any one of the logs it will open in a web page, or you can simply download them. This is the log output from the successful creation of the Sales image:

With all this, you don’t have to spend a lot of time putting instrumentation and logging into your scripts. It’s built into SQL Clone.

Conclusion

Copying all the databases required to support an application to a set of development and test servers sounds like a painful and time-consuming task. However, SQL Provision makes it straightforward.

Setting up a script like this does involve some labor: you’ll need to adapt it for your environment, cater for the types of authentication you use, and add error handling. However, it’s quite straightforward if you first plan out how you want to manage your images and clones. This article shows the basics of a script that automates the process of delivering a set of inter-dependent databases in a fast, lightweight manner, while helping to ensure that your DevOps process complies with any data management laws that restrict the movement of sensitive or private data.
