How to Automatically Create and Refresh Development and Test Databases using SQL Clone and SQL Toolbelt

To deliver database changes more quickly, there are several tasks that must be automated. It can be a daunting job to ensure that the whole team has the latest database build when copies proliferate and the database is big. Phil illustrates a solution that uses a set of Redgate tools together, via PowerShell, to build a database from object-level source, stock it with data, document it, and then provision any number of test and development servers with the build, taking care to save any DDL changes made to the existing copies of the database.

This article presents a PowerShell automation script that creates and refreshes the various databases, on one or more servers, that are required for test and development work. It uses Redgate’s SQL Clone, SQL Compare and a few other optional tools from the SQL Toolbelt.

A basic premise of this solution is that a database is being developed, and its source is stored in the version control system (VCS). The automation script will build the database from the latest source in version control, stock it with data if wanted, and document the database if required. In the next stage, it uses SQL Clone to create an image of this database and, from that image, it creates ‘clones’ of the database, or refreshes existing clones, on each of the SQL Server development and test instances that you specify. The net result is that all the development and test databases should end up with metadata and data that are identical to those specified in the source, for that database version.

The automation script comes with a few safety checks. For example, before dropping existing clones and creating new ones from a new image, it writes any changes made to the local clones to a directory in the VCS, as a safeguard.

Background

I’ve been in IT long enough to remember an innocent time in database development, when we generally had just one development database on one server, and we all developed and tested on it.

In many respects, we managed quite well because, after all, a relational database is a multi-user, multi-process device, and is its own development environment. We stored the database object scripts in source control, and would check out objects, such as tables or views, to edit them, do our work and then perform a check-in to include the changes in the next development version. There were complications when one developer decided to alter an object with interdependencies on the work of another developer, so it was just as well we worked in the same room, within shouting distance. The advent of schemas put an end to the obvious collisions.

Each nightly build would create a new version of the database, purely from the scripts in source control. In theory, this meant that nothing unwanted, nothing that hadn’t been specifically included, could creep into the build. In practice, even database developers are human, and mistakes were common. Nightly integration tests were run to prove that each build could produce a functioning database.

Of course, having a tested, functioning database in development did not necessarily mean we had a database that was deliverable to production. Generally, we would scrabble around to ensure that the updated tested version of the database worked with any existing production system that relied on the database, practicing in staging until everything went right. It wasn’t pretty, and could take a long time. We would miss things that we assumed would be in place, and we’d find conflicts that had to be resolved.

Nowadays, we like to build, integrate and deploy the database automatically, via a script. We practice this process as often as we can. There are no halcyon days when the database is too immature to build and deploy: instead, we prefer to start taking the database through to staging as soon as we’ve written anything more than a couple of ‘hello world’ database objects. This means we get immediate feedback on any problem, and we can also say, at any point in time, whether we have a version that is potentially deliverable. We want to be able to deliver a database quickly and reliably, at any stage of the database lifecycle, whether the application developers are waiting for a database change, or there is an important security fix to put in place.

The more rapid the delivery cycle, the more requirements there are for copies of the latest build. We have developers who need to work on their own isolated copy of the database, but we also need several copies of the database for development and testing, rather than sharing just one. We often need a test cell with servers and databases to support a wide variety of tests, performed in parallel, and in some tests we need to validate the build against other versions of the database.

The more rapidly we must deliver, the harder it is to keep the database documentation up-to-date, and the better we need to get at automating that process. If parts of the data that we use to develop with are sensitive in terms of privacy, company security or are financial, we need to mask, obfuscate or create from scratch the data that we develop and test on. All these factors mandate an automated build process.

A PowerShell automation solution

At some point, one must stop talking in generalities, cease waving one’s arms whilst glossing over real technical problems, and produce a practical script that deals as sensibly as possible with these challenges, and can reliably automate database provisioning to the development and test servers.

There are several dangers inherent in this, of course. Firstly, no two development teams have ever done database development in the same way, so scripts must be changed to conform to your development environment. Secondly, real scripts are too idiosyncratic. They reveal too much about the quirks of the way you go about the process of developing databases.

To try to overcome these problems, I’ve created a PowerShell script that, I hope, is easy to adapt but deals with all the issues in a general way. The general principles are:

  • All databases that are specified in a list must, if the build succeeds, end up being at the version of the current successful build.
  • Before we update a development database, we preserve changes just in case the developer has forgotten to save their work in source control.
  • We audit as much of the provisioning process as possible.
  • We need to allow for arbitrary pre- and post-build scripts, mainly for inserting data.
  • It should be possible to inspect the entire database build script.
  • The PowerShell script to automate the whole process should have no hard-coded variables. These should be held in a separate build data file.

What the solution does, at a glance

Figure 1 lays out, broadly, what the script does at each stage of the database provisioning process.

Figure 1

What you need to do to get it to run

You will need, at minimum, a couple of SQL Server instances, a copy of SQL Compare, and a copy of SQL Clone. You need to install SQL Clone, and set up the SQL Clone Server and the client development and test servers on which you will create the clones.

If you have the SQL Toolbelt, you’ll also be able to use SQL Data Compare, SQL Data Generator and SQL Doc, as required.

You will need to download the PowerShell automation solution from GitHub. This consists of two PowerShell files: a build data file (MyInstallationData.ps1) and a process script (RedgateProvisioning.ps1). The process script takes the instructions for the build and provisioning, and all parameters, from the build data file. No configuration information is held in the process script.

You will need to change the values stored in the build data file to define your database, server environment, network paths, list of clones, and so on. I’ll describe the structure of the build data file later in the article.

How the solution works

The solution has two broad phases of operation, the build phase, and the provisioning phase.

The build phase creates the latest database version on the build server, documents it and stocks it with test data. To do this, it uses various SQL Toolbelt tools, namely SQL Compare, SQL Data Compare, SQL Doc and SQL Data Generator. Only SQL Compare is essential.

The provisioning phase provides each designated development or test instance with a copy of the build database, produced by the build phase. Traditionally, this might mean taking a backup of the build database and restoring it to each of the development and test servers, or copying across the MDF file to each server and attaching it to the instance. In either case, this means that you need to copy the same bytes of data many times over the network. This becomes increasingly expensive in terms of both time and disk space as the size of the build database grows, and the copies proliferate.

In this solution, the provisioning phase uses SQL Clone, which copies the bytes only once, and then uses disk storage virtualization technologies, built into Windows, to virtualize the data on each target instance.

SQL Clone creates one full copy of all the data and metadata that makes up the source database: the image. From this image, it can then create clones on each of the development SQL Server instances. Each clone will be only a few tens of MB in size, but has access to all those same bytes by reading from the image. The only data stored locally for each instance are data pages containing changes made directly to the local clone databases.

The Build phase

The script will:

  1. Generate the build script

    We check to see if build scripts already exist in the database’s Source directory (at DatabasePath). If they don’t, then the process script will create them from a designated development server (current.DevServerInstance) and database (current.Database), using SQL Compare, and will then store them at DatabasePath\Source. This allows you to put an existing database into source control. This was useful for testing the script, but I left it in just in case one or two of you haven’t got your database in source control! We assume that this Source directory will already be a source control directory, or will be turned into a GitHub repository. Whichever way it happens, you specify the directory in MyInstallationData.ps1.

    Listing 1
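
    To give the general idea, here is a minimal, hypothetical sketch of this step; the real code is in Listing 1 and the GitHub project. It assumes that $data holds the structure imported from MyInstallationData.ps1, with the key names described later in this article, and uses SQL Compare’s /makeScripts switch:

        # a hypothetical sketch: script out an existing database into the Source directory
        $SQLCompare = "$($data.Tools.SQLComparePath)\SQLCompare.exe"
        $SourcePath = "$($data.Source.DatabasePath)\Source"
        if (-not (Test-Path "$SourcePath\*.sql"))  # no object-level scripts yet?
        {
            & $SQLCompare /server1:"$($data.Current.DevServerInstance)" `
                          /database1:"$($data.Current.Database)" `
                          /makeScripts:"$SourcePath" /force  # /force overwrites existing files
            if ($LASTEXITCODE -ne 0) { throw 'SQL Compare could not script out the database' }
        }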

Similarly, the script will check for the existence of a source data directory, which will hold the INSERT script for static data and any other data needed for testing. If it doesn’t exist, the script will, if required, generate a data INSERT script (DataSyncFile.sql) from the existing data in the current database, using SQL Data Compare. Be careful with this: it is designed merely for the small amount of static data that is required for a database to run. Test data needs to be loaded from native BCP format; INSERT statements would take too long.

    In reality, there will be a number of different data sets for various forms of testing. As an alternative to using SQL Data Compare, or native BCP, we can specify that we need to generate the data using a SQL Data Generator project file (.sqlgen) as a post-build step.

Once the object-level build scripts exist in the database’s Source directory, the script uses them to generate a single database build script. It uses SQL Compare to read the Source directory of scripts, compare them with the model database on the target build server instance, and generate a synchronization script that will synchronize the two. In effect, this produces a database build script (Database.sql), with all the objects in the correct dependency order.

    Listing 2
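
    As a hypothetical sketch of what Listing 2 does (reusing $SQLCompare and $SourcePath from the earlier sketch): comparing the scripts directory with the near-empty model database yields a synchronization script that is, in effect, a complete build script:

        # compare the Source scripts directory with 'model' to produce a full build script
        & $SQLCompare /scripts1:"$SourcePath" `
                      /server2:"$($data.Build.NewBuildServerInstance)" /database2:model `
                      /scriptFile:"$($data.Source.DatabasePath)\Database.sql" /force
        # 0 = differences scripted; 63 is SQL Compare's 'data sources identical' exit code
        if ($LASTEXITCODE -notin 0, 63) { throw 'SQL Compare could not generate the build script' }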

  2. Build the latest version of the database, from version control

    This stage builds the new database version, on the designated build server (build.NewBuildServerInstance) using the build script from the previous stage.

If a build database already exists, representing the previous day’s build, the script will destroy it and then build the new version, having first checked for any existing clones (created from the image of this database – see the Provisioning phase). If any clones exist, it compares the build database to each clone that was created from it, using SQL Compare, and writes any changes made to the local clone as a “diff” script (the clone database name with the date appended) to a Changes directory, just in case they weren’t checked in, before we deploy new clones.

    Listing 3
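
    In outline, and again purely as a hypothetical sketch of Listing 3 (reusing variables from the earlier sketches), the drift check might look like this. The cmdlets are from the SQL Clone PowerShell module, but the property names (ImageId, LocationId, MachineName, InstanceName) are assumptions you should check against your version:

        # script out any drift on each clone of the old image, before the clones are dropped
        Connect-SqlClone -ServerUrl $data.Image.CloneServerURL
        $OldImage = Get-SqlCloneImage -Name $data.Image.Name -ErrorAction SilentlyContinue
        if ($OldImage)  # wrap in try/catch if your version throws when the image is missing
        {
            foreach ($clone in (Get-SqlClone | Where-Object ImageId -eq $OldImage.Id))
            {
                $instance = Get-SqlCloneSqlServerInstance | Where-Object Id -eq $clone.LocationId
                $server = "$($instance.MachineName)\$($instance.InstanceName)".TrimEnd('\')
                & $SQLCompare /server1:"$server" /database1:"$($clone.Name)" `
                              /server2:"$($data.Build.NewBuildServerInstance)" `
                              /database2:"$($data.Build.Database)" `
                              /scriptFile:"$($data.Source.ChangesPath)\$($clone.Name)$(Get-Date -Format yyyyMMdd).sql"
            }
        }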

    Having done this, it checks for any active user processes on the build database, and if it’s OK to do so, drops the existing build database. It then creates a new database with the same name and then executes the build script.

    Listing 4
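
    A hypothetical sketch of the drop-and-rebuild step, here using Invoke-Sqlcmd from Microsoft’s SqlServer module rather than whatever Listing 4 actually uses:

        # check for active user sessions, then drop, recreate and rebuild the database
        Import-Module SqlServer   # provides Invoke-Sqlcmd
        $Build = $data.Build
        $Active = Invoke-Sqlcmd -ServerInstance $Build.NewBuildServerInstance -Query "
            SELECT Count(*) AS Sessions FROM sys.dm_exec_sessions
             WHERE database_id = Db_Id('$($Build.Database)') AND is_user_process = 1;"
        if ($Active.Sessions -gt 0) { throw "users are still connected to $($Build.Database)" }
        Invoke-Sqlcmd -ServerInstance $Build.NewBuildServerInstance -Query "
            IF Db_Id('$($Build.Database)') IS NOT NULL DROP DATABASE [$($Build.Database)];
            CREATE DATABASE [$($Build.Database)];"
        Invoke-Sqlcmd -ServerInstance $Build.NewBuildServerInstance -Database $Build.Database `
                      -InputFile "$($data.Source.DatabasePath)\Database.sql"  # handles GO batches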

There is also a “build check” step, where it uses SQL Compare to compare the new build with the source code directory to make sure that the build was successful (i.e. that the build database matches exactly the database described by the scripts in the Source directory).

    Listing 5
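
    SQL Compare’s /assertIdentical switch does this check in one line. A hypothetical sketch; my understanding is that, with this switch, a non-zero exit code means the data sources differ:

        # fail the build if the new database doesn't exactly match the scripts in Source
        & $SQLCompare /scripts1:"$SourcePath" `
                      /server2:"$($data.Build.NewBuildServerInstance)" `
                      /database2:"$($data.Build.Database)" /assertIdentical
        if ($LASTEXITCODE -ne 0) { throw 'the build does not match the scripts in Source' }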

  3. Fill the database with test data

    If the script finds a SQL Data Generator project file, DataGenerator.sqlgen, in the database’s Source directory, it will use it to load the new build database with generated test data. Otherwise, it will use the DataSyncFile.sql script containing the INSERT statements.

    Listing 6
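
    A hypothetical sketch of that choice. A SQL Data Generator project file records its own target server and database, so only the project path need be passed:

        # prefer generated data if a .sqlgen project exists; otherwise run the INSERT script
        $SqlGenProject = "$SourcePath\DataGenerator.sqlgen"
        if (($data.Tools.SQLDataGeneratorPath) -and (Test-Path $SqlGenProject))
        {
            & "$($data.Tools.SQLDataGeneratorPath)\SQLDataGenerator.exe" /project:"$SqlGenProject"
        }
        else
        {
            Invoke-Sqlcmd -ServerInstance $data.Build.NewBuildServerInstance `
                          -Database $data.Build.Database -InputFile $data.Source.DataSyncFile
        }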

  4. Generate database documentation

The script will generate database documentation (HTML) for the new build database, or refresh the existing documentation, using SQL Doc, if we specify a path to the SQL Doc executable in the build data file, and will save the documentation to version control.

    Listing 7
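
    A hypothetical sketch, assuming a saved SQL Doc project (.sqldoc) sits alongside the source; the switch names are from SQL Doc’s command line, but verify them against your version:

        # regenerate the HTML documentation from a saved SQL Doc project
        if ($data.Tools.SQLDocPath)
        {
            & "$($data.Tools.SQLDocPath)\SQLDoc.exe" `
                /project:"$($data.Source.DatabasePath)\$($data.Build.Database).sqldoc" `
                /outputfolder:"$($data.Source.ReportPath)" /filetype:Html /force
        }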

The Provisioning phase

During this phase, the script uses SQL Clone to create an image of the build database and then, from that image, creates or refreshes each of the clones listed (under Clones) in MyInstallationData.ps1, on target servers that have been registered with the SQL Clone Server.

  5. Create an image of the build database

    Listing 8
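
    A hypothetical sketch of the imaging step, using the SQL Clone PowerShell cmdlets (a default SQL Server instance on the build server is assumed here, for brevity):

        # create the new image from the freshly-built and verified database
        Connect-SqlClone -ServerUrl $data.Image.CloneServerURL
        $BuildInstance = Get-SqlCloneSqlServerInstance -MachineName $data.Build.NewBuildServerInstance
        $Destination = Get-SqlCloneImageLocation | Where-Object Id -eq $data.Image.ImageLocationId
        New-SqlCloneImage -Name $data.Image.Name `
                          -SqlServerInstance $BuildInstance `
                          -DatabaseName $data.Build.Database `
                          -Destination $Destination |
            Wait-SqlCloneOperation   # image creation runs asynchronously on the agent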

  6. Create clones

The script creates as many clones as you specify from the image. It goes through the list of clones that you provide in the data file and, if a clone already exists, removes it; we have already saved any alterations. It then creates a fresh clone from the new image.

    Listing 9
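
    A hypothetical sketch of the clone loop; the NetName and Database key names on each entry in $data.Clones are my assumptions, and a default SQL Server instance is assumed on each clone server:

        # drop each listed clone (any changes were saved earlier) and recreate it
        $Image = Get-SqlCloneImage -Name $data.Image.Name
        foreach ($CloneSpec in $data.Clones)
        {
            $Location = Get-SqlCloneSqlServerInstance -MachineName $CloneSpec.NetName
            $Existing = Get-SqlClone -Name $CloneSpec.Database -Location $Location `
                                     -ErrorAction SilentlyContinue
            if ($Existing) { Remove-SqlClone -Clone $Existing | Wait-SqlCloneOperation }
            New-SqlClone -Name $CloneSpec.Database -Image $Image -Location $Location |
                Wait-SqlCloneOperation
        }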

The Installation data file (MyInstallationData.ps1)

You will need to change the build data file to define your database, server environment, network paths, and list of clones, all in a PowerShell PSON data structure. This is in several sections and subsections, described below; a hypothetical skeleton of the whole file follows the descriptions.

Tools

A list of the locations of the various tools:

• The path to where we have SQL Compare installed
• The path to where we have SQL Data Compare installed (leave $null if not wanted)
• The path to where we have SQL Data Generator installed (leave $null if not wanted)
• The path to where we have SQL Doc installed (leave $null if not wanted)

Source

The various directories in which you want to store files and logs:

• The path to the Source directory for this database
• The location of the executable SQL data-insertion script
• The location of the SQL Data Generator file for any columns you need to obfuscate (leave $null if not wanted)
• Where you want to put the reports for a database
• Where changes between clone and build are stored

Current

The details of the current shared development server, if you don’t yet have a VCS-based source code directory (leave as $null, or delete, if you don’t want to use this; it is only used if the system can’t find your source code directory):

• The Server Instance of the development SQL Server to get the source from (optional)
• The name of the database

Build

The details of the database that you want to build:

• The SQL Server instance you want to use for the build
• The name you want to give the build database
• The SQL Data Generator project you want to use for the data (optional)

Image

The details of the image that you want to create:

• The name of the image we want to create
• The clone server URL
• The ID of the SQL Clone image location (usually 1)

Clones

A list of all the cloned databases, referenced by the NETNAME of the clone servers and the name of the clone database.
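
To make the shape of the file concrete, here is a hypothetical skeleton; the key names are my assumptions, based on the descriptions above, and the copy of MyInstallationData.ps1 in the GitHub project is authoritative:

    # MyInstallationData.ps1 (hypothetical skeleton): all configuration lives here
    $data = @{
        Tools = @{
            SQLComparePath       = 'C:\Program Files (x86)\Red Gate\SQL Compare 13'
            SQLDataComparePath   = $null   # leave $null if not wanted
            SQLDataGeneratorPath = $null
            SQLDocPath           = $null
        }
        Source = @{
            DatabasePath = '\\MyShare\MyDatabase'            # Source directory lives below this
            DataSyncFile = '\\MyShare\MyDatabase\Data\DataSyncFile.sql'
            ReportPath   = '\\MyShare\MyDatabase\Reports'
            ChangesPath  = '\\MyShare\MyDatabase\Changes'    # clone 'diff' scripts go here
        }
        Current = @{ DevServerInstance = 'MyDevServer'; Database = 'MyDatabase' }  # optional
        Build   = @{ NewBuildServerInstance = 'MyBuildServer'; Database = 'MyDatabase' }
        Image   = @{ Name = 'MyDatabaseImage'; CloneServerURL = 'http://MyCloneServer:14145'
                     ImageLocationId = 1 }
        Clones  = @(
            @{ NetName = 'DevBox1';  Database = 'MyDatabase' },
            @{ NetName = 'TestBox1'; Database = 'MyDatabase' }
        )
    }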

Alternative Strategies

Normally, when I write a PowerShell script that uses a proprietary tool, I like to provide an alternative version that uses whatever is provided by Microsoft, or has a free software license, but in this case it would be rather impractical. It is possible to do a build from object scripts: I’ve illustrated how to do so in my article, How to Build and Deploy a Database from Object-Level Source in a VCS.

Cloning presents a much bigger problem. One alternative mechanism for provisioning is to copy the MDF file onto each machine and attach it on each server. One can, of course, run a backup of the database you want to copy, and restore it to each target. Both these techniques are slow, move a mass of data around the network, and use a lot of disk space. They can also complicate security, because of the file permissions that are necessary. The editor tells me that Simple Talk is due to publish an article by Grant Fritchey that tells you how to do this.

For further information

The script itself is rather too long to be embedded in an article. Besides, I keep adding features and trying to improve it; everyone who tries it wants additional features. I’ve therefore placed it on GitHub, in the hope that someone else will take it and improve it:

https://github.com/Phil-Factor/DeployViaToolBeltAndClone