SQL Provision and Azure SQL Database: Creating Local Development and Test Databases
As more businesses start to use cloud-based platforms, such as Azure SQL Database, as their primary database solution, they find that they still need to support on-premise workflows for development and test databases. This is a perfect task for SQL Provision.
Your organization, like many others, is probably moving towards use of cloud-hosted database platforms, such as Azure SQL Database. However, even if some of your production databases are on Azure SQL Database, you still need to provide development and test copies of those databases, whether they are hosted on a local physical or virtual machine, on a virtual machine in the Azure cloud, or in Azure SQL Database itself. The problem with hosting lots of copies of a database in Azure SQL Database is that it quickly becomes cost prohibitive, it places more restrictions on developers than they often like, and network connections and bandwidth can sometimes cause problems. This means you’ll sometimes need to move a copy of the database from Azure SQL Database to a local system, for development and test work.
This article will describe how you can create a local copy of your production database, from a BACPAC export of the Azure SQL Database, and then use SQL Provision to deploy as many development and test copies (clones) of that database as you need. If your database contains any non-public data, you can also use the Data Masker component of SQL Provision to ensure this data is obfuscated, before deploying the clones.
While the technique I describe will reduce both the storage footprint and the time required to deploy all the required development and test databases, it is still partly a manual process. It will be useful in one-off cases, such as where you need to debug a production problem that you can’t reproduce locally. However, to transform it into a general-purpose development technique, you would need both to automate it, and to have a way of updating the clones with the latest development changes. I’ll discuss approaches for both these tasks.
Moving databases to the cloud and back again
Most organizations migrate to the cloud in stages. Some databases are easy to migrate while others are harder, or perhaps will never be migrated. The result is a hybrid architecture.
Even if the production copy of a database is hosted on Azure SQL Database, many organizations still need to host on-premise instances (physical machines or VMs) or IaaS-VM-based instances, for development and testing work.
It’s important that developers and testers can work with data that reflects as closely as possible the characteristics of the production database. For all the reasons described here, this is the best way to ensure that the team can deploy tested, reliable changes to its cloud databases as efficiently as possible, and so deliver value to end users continuously.
In a hybrid architecture, such as the one described above, this might mean that we’ll regularly need to copy our PaaS databases locally and make copies available for our development team. Of course, traditionally, this has many challenges:
- The cost of maintaining multiple copies of any SQL Server database escalates very quickly, as the size of the database grows – we end up copying terabytes of information around and taking up a huge amount of space.
- The time it takes to create the copies, and to wait on the responsible person or people to do so, which can become a bottleneck.
- Finding a reliable, automated way to sanitize the data, before making it available outside environments that are “production secure” – copying live sensitive data or Personally Identifiable Information (PII) into environments other than Production is a workflow that IT leadership can’t allow, given global data privacy legislation and best practices for protecting our customers’ data.
Copying Azure SQL databases using SQL Provision
We would like to be able to take a copy of the Azure SQL database and create copies of it that are safe for our developers and testers, because all sensitive data has been removed or masked. Our data masking process must be automated and repeatable, it must entirely obfuscate all sensitive and personal data, and it must also produce data that retains the ‘look and feel’ of the original. We also need to avoid taking up the time and space that multiple restores would require.
Fortunately, SQL Provision is designed to do all this. It uses SQL Clone to create an image of the source database, and then distribute lightweight database copies, or clones, to the various team servers, and Data Masker to ensure that all sensitive data is protected before the clones are created and deployed. If you’re not already familiar with SQL Provision you can see it in action here and get started with the tool yourself here.
The complicating factor when using SQL Provision with Azure SQL Database is that the tool relies on a service having full access to the machine that is hosting the SQL Server instance. With Azure SQL Database, you are a co-tenant and have access only to the database, so we cannot create an image directly from a PaaS database. We’ll need another way to get our data down from the cloud.
Step 1 – Exporting a BACPAC in Azure
The first thing we need is a copy from which we can ultimately create our masked image with SQL Provision, which requires either a live SQL Server database or a backup (native .bak or Redgate SQL Backup .sqb) as the source. Unfortunately, with Azure SQL DB, there’s no backup mechanism that we can leverage to restore anywhere other than within Azure.
However, there are several alternative ways to copy an Azure SQL database, summarized nicely in this article. If we just require the data, we can use BCP or SSIS. However, we want the database and the data, so we’ll need to export the Azure SQL DB to a BACPAC. The BACPAC file will contain both schema and data. It is not transactionally consistent, but for the purposes of development and testing, this is the easiest way of moving the copy of the database around.
I’ll show a manual way to export to a BACPAC (and then, in step 2, import it locally). However, if you need to perform this task more frequently, you’ll want to automate it, and will prefer to export the schema and data to a BACPAC or DACPAC (given that DACPACs can now deploy data too) using the SQLPackage utility, and then import it to a local server. More on this later.
I’ll use the Azure Portal to export the source cloud database to a BACPAC, and save it to Azure BLOB storage (you could equally well connect to the database from SSMS, and use Tasks > Export Data-Tier Application…).
Log into the Azure Portal, then select the database and existing BLOB store you want to use, in my case DMDatabase_Production and dmblobstore respectively:
To create the BACPAC file, navigate to your production database, select Export, and then configure all the required settings:
Hopefully, you will receive a message that your export request was successfully submitted. After a short while, depending on the size of the database, the BACPAC file should show up in the BLOB store:
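If you’d rather script this export than click through the portal, the Az PowerShell module can submit the same request. The following is just a minimal sketch; the resource group, server name and credentials are placeholders for your own values, and you should check the New-AzSqlDatabaseExport parameters against the version of the Az.Sql module you have installed.

```powershell
# Requires the Az.Sql and Az.Storage modules; resource group, server name and credentials are placeholders
Connect-AzAccount

# Retrieve an access key for the BLOB storage account that will hold the BACPAC
$storageKey = (Get-AzStorageAccountKey -ResourceGroupName 'MyResourceGroup' -Name 'dmblobstore')[0].Value

# Submit the export request; Azure writes the BACPAC into the nominated container
$export = New-AzSqlDatabaseExport `
    -ResourceGroupName 'MyResourceGroup' `
    -ServerName 'my-azure-sql-server' `
    -DatabaseName 'DMDatabase_Production' `
    -StorageKeyType 'StorageAccessKey' `
    -StorageKey $storageKey `
    -StorageUri 'https://dmblobstore.blob.core.windows.net/bacpacs/DMDatabase_Production.bacpac' `
    -AdministratorLogin 'myadmin' `
    -AdministratorLoginPassword (Read-Host -AsSecureString 'Azure SQL admin password')

# Check progress until the export completes
Get-AzSqlDatabaseImportExportStatus -OperationStatusLink $export.OperationStatusLink
```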
Step 2 – Importing a local database copy from the BACPAC
We’ve generated a BACPAC file from our source database, but this still isn’t something that SQL Provision can consume to create the image. So, we’re going to need to bring the BACPAC file into either an on-premise or IaaS-based SQL Server instance using SSMS to “Import a Data Tier Application”, and create a new, local copy of the database.
Here, I’ll be using my own copy of SQL Server Express 2016 and SSMS 17.9, but if your production database contains sensitive or private data, this local copy would need to be hosted on a locked-down SQL Server instance, in a supervised server room.
In SSMS, right-click in Object Explorer and choose “Import a Data Tier Application“, to create the database from our BACPAC:
Just follow through the wizard steps and import the BACPAC from Windows Azure by providing the Storage Account, Container and Filename, which we can easily select once we connect to the storage account:
I give the copy of the database the name DMDatabase_Temp, to indicate that this copy will not persist, because we don’t want any sensitive data to remain after we have imaged it.
Let the wizard carry out the import; again, it may take some time depending on the size of the file. Eventually, we should see the local copy on the nominated instance:
The following figure shows the top 10 rows of DM_Customer, in our temporary local copy:
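If you’d prefer to script this step rather than use the wizard, SqlPackage.exe can import the same BACPAC. Here’s a minimal sketch, assuming the BACPAC has been downloaded from the BLOB store to a local folder (reusing the storage key retrieved in the export sketch above); the paths and instance name are placeholders.

```powershell
# Download the BACPAC from the BLOB container, then import it with SqlPackage.exe
# (which ships with SSMS / SQL Server Data Tools); paths and instance name are placeholders
$ctx = New-AzStorageContext -StorageAccountName 'dmblobstore' -StorageAccountKey $storageKey
Get-AzStorageBlobContent -Container 'bacpacs' -Blob 'DMDatabase_Production.bacpac' `
    -Destination 'C:\Temp\' -Context $ctx

# Import the BACPAC as a new local database, DMDatabase_Temp
& SqlPackage.exe /Action:Import `
    /SourceFile:"C:\Temp\DMDatabase_Production.bacpac" `
    /TargetServerName:".\SQLEXPRESS" `
    /TargetDatabaseName:"DMDatabase_Temp"
```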
Step 3 – Creating a masked image and clones for developers
From here on, SQL Provision takes over. You can use it to mask and obfuscate any sensitive data, as part of its image creation process, before the data leaves this secure environment, and then deploy clones of this ‘sanitized’ image to all the Dev/Test environments.
The first step is to define the data masking rules that will ensure that the data is anonymized, before distribution. I’m not going to describe the masking set in detail, but it’s available to download and study, and it uses a table-to-table synchronization rule to retain the relationship between the DM_Customer and DM_Customer_Notes tables. If you need help getting started with data masking, check out our getting-started videos on Redgate University, or for technical demos on masking different types of data, see our Product Learning articles.
We can then go ahead and create an image (our centralized copy) using DMDatabase_Temp as the source:
When we get to the image modifications step, we select our Data Masker masking set:
We place the masked image onto a fileshare that everyone in the team can access. We give the image a name; in this example, we’ll go with DMDatabaseImage-24-12-18.
The final step is to create, from this masked image, the clones we need for development and testing. These clones are differencing disks that we can deploy to any SQL Server instance; they work just like any normal database, but reference the masked image for all data, except for any data that is modified locally using the clone. At the time of creation, they will only be around 50 MB each (for any size of image):
At this point, you can see the development and test databases in SSMS and can work with them as normal.
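The same image and clone creation can also be scripted, using SQL Clone’s PowerShell cmdlets; this is what we’ll lean on later when we discuss automation. The sketch below is illustrative only: the SQL Clone Server URL, machine names, file-share paths and database names are placeholders, and you should verify the cmdlet names and parameters (particularly the masking-set modification) against the SQL Clone PowerShell documentation for your installed version.

```powershell
# Minimal sketch using the SQL Clone PowerShell cmdlets; server URL, machine/instance
# names, paths and clone names are all placeholders
Connect-SqlClone -ServerUrl 'http://sqlclone-server:14145'

$sourceInstance = Get-SqlCloneSqlServerInstance -MachineName 'DEVSQL01' -InstanceName 'SQLEXPRESS'
$imageLocation  = Get-SqlCloneImageLocation -Path '\\fileshare\SQLCloneImages'

# Create the masked image from the temporary local copy, applying the Data Masker masking set
$maskingSet = New-SqlCloneMask -Path '\\fileshare\MaskingSets\DMDatabase.DMSMaskSet'
$imageOperation = New-SqlCloneImage -Name 'DMDatabaseImage-24-12-18' `
    -SqlServerInstance $sourceInstance -DatabaseName 'DMDatabase_Temp' `
    -Destination $imageLocation -Modifications $maskingSet
Wait-SqlCloneOperation -Operation $imageOperation

# Deploy one clone per development or test environment from the masked image
$image = Get-SqlCloneImage -Name 'DMDatabaseImage-24-12-18'
foreach ($cloneName in 'DMDatabase_Dev1','DMDatabase_Dev2','DMDatabase_Test') {
    $image | New-SqlClone -Name $cloneName -Location $sourceInstance | Wait-SqlCloneOperation
}
```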
If you query any of the clones, you’ll see that all personal and identifying information has been swapped for ‘fake’ data, but in a way that maintains consistency, and the correct relationships between the related tables.
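A quick way to spot-check this is to rerun the DM_Customer query against one of the clones, for example with Invoke-Sqlcmd from the SqlServer PowerShell module (the instance and clone names below are placeholders).

```powershell
# Spot-check the masked data in one of the clones and compare it with what we saw
# in the temporary local copy earlier (instance and clone names are placeholders)
Invoke-Sqlcmd -ServerInstance '.\SQLEXPRESS' -Database 'DMDatabase_Dev1' `
    -Query 'SELECT TOP 10 * FROM dbo.DM_Customer;'
```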
Finally, we drop the temporary copy of the database because we don’t want any sensitive data to persist.
Automating and extending the process
The solution as it stands is useful for one-off cases only, such as the need to troubleshoot and fix a production problem that you can’t reproduce without the real production schema, code and data. It is a manual process, and it provides a copy of the production system, rather than the new release that the developers are working on.
To make it a more useful technique, we’d need to automate it. The best way to do this is probably to use PowerShell with SQLPackage, to export the database to a DACPAC and then import it. You can also use SQL Provision’s PowerShell cmdlets to automate the cloning process.
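As a rough sketch of the export half of that automation, SqlPackage can connect directly to the Azure SQL database and produce the BACPAC from the command line; combined with the import and SQL Clone snippets shown earlier, the whole chain could then be scheduled. The server name, credentials and paths below are placeholders.

```powershell
# Export the Azure SQL database straight to a local BACPAC with SqlPackage.exe
# (server name, credentials and target path are placeholders)
& SqlPackage.exe /Action:Export `
    /SourceServerName:"my-azure-sql-server.database.windows.net" `
    /SourceDatabaseName:"DMDatabase_Production" `
    /SourceUser:"myadmin" `
    /SourcePassword:"$env:AZURE_SQL_PASSWORD" `
    /TargetFile:"C:\Temp\DMDatabase_Production.bacpac"
```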
To make the technique more useful for developers, the database would also need to be updated to the current development version. You could use SQLPackage to do this (as above), but it’s easier to use SQL Compare. You can compare the schema of the masked ‘production clone’ to the latest development build, in version control, and apply the latest development changes to the clone. At that point, the team have a clone that represents the latest build, plus real (but obfuscated) data for testing.
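SQL Compare also has a command line, so this step can be scripted alongside the rest. The following is only a sketch, assuming the latest development build lives in a source-controlled scripts folder; the paths, server and database names are placeholders, and you should check the switch names against your installed SQL Compare version.

```powershell
# Bring a clone up to the latest development version using the SQL Compare command line
# (installation path, scripts-folder path, server and database names are placeholders)
& 'C:\Program Files (x86)\Red Gate\SQL Compare 13\sqlcompare.exe' `
    /Scripts1:"C:\VersionControl\DMDatabase" `
    /Server2:".\SQLEXPRESS" /Database2:"DMDatabase_Dev1" `
    /Synchronize
```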
To take this even further, you can link clones to Source Control using the Redgate SQL Toolbelt, ensure these copies are always up to date, and then include them as part of a Source Control, Continuous Integration and Deployment pipeline.
Conclusion
Using an Azure SQL Database as a source, we made available four brand-new development databases and a test database on our local SQL Server instance. Each clone was around 50 MB in size, took seconds to spin up from the image, and was safe to use for development work, since any sensitive data had been obfuscated (assuming, of course, that you’ve set up the masking sets correctly, and know exactly where all sensitive data might be “hidden”).
Our developers and testers can work on databases that are accurate copies of Production, while taking up a fraction of the disk space and time that would otherwise be required. To find out how SQL Provision can automate and streamline your own dev/test workflows, you can read more about the tooling here.
Tools in this post
SQL Provision
Provision virtualized clones of databases in seconds, with sensitive data shielded