For large databases, it’s a hard and time consuming task to provide database developers with development or test copies of that database that resemble the production database, in terms of the size, distribution and nature of the data. It can take a lot of time to produce a copy of the latest database build, provisioned with realistic volumes of data. It takes further time and a lot of hardware resources to provide multiple copies of this database to each of the development and test environments.
Nevertheless, many developers and testers would like to be able to work on their own isolated copy of the database. They often also need to be able to perform test runs quickly, in parallel, to verify that the application integrates, and behaves correctly, and that any database changes they make don’t cause a deterioration in application performance. In short, the team need to be able to perform integration and acceptance testing with realistic “production-like” data from as early as possible in the development cycle.
Fortunately, it turns out that we can use the technological advances that underlie virtualization to make the database provisioning process a simpler, quicker task. SQL Clone uses established disk virtualization technologies that are built into 64-bit Windows to allow us to ‘clone’ a SQL Server database within seconds, and using very little additional disk space. Under the covers, the actual database image is held only once, in the network, and never updated: all changes are held locally with the clone, using existing virtualization technologies within Windows. From SQL Server’s perspective, the ‘database clone’ we create is just a normal database, complete with data, and we can query it and update it just as we would any other database.
Challenges of provisioning development and test with realistic data
Developers would often like to be able to “spin up” quickly in their environment a database that resembles production. For example, let’s say customers have reported a flaw with the current application and the business would like the development team to diagnose the problem quickly and issue a hotfix.
Often, such problems only occur in the production data, so before they can even get started they will need to wait for their environment to be “refreshed” with the latest production data, which usually means waiting for the DBA to restore a backup. This can slow down development and cause frustration to the business.
Likewise, it can slow down testing. Database integration, load and performance tests aim to ensure that the all application processes supported by the database behave correctly, and perform and scale to expectations.
With unit tests, we test isolated units of code, often with the database “mocked”, but end-to-end processes will need access to many related tables, and to any objects associated with these tables. We need to test these processes against data that resembles the production data, and against sufficient volumes of data to mimic the production ‘payload’. For many teams, this might again mean that the DBA needs to refresh the test environment with a production database, by restoring a recent backup. If that’s not permissible from a regulatory standpoint, then the team need some other way of loading data that is representative of the production data. This could mean cleaning and masking a copy of the production data, while retaining a realistic data distribution, or alternatively it could mean using standard data sets.
The larger the database, the larger the volume of data we need to create or move, the slower the process of database provisioning. Also, since the ‘average’ database can reach hundreds of gigabytes in size, provisioning disk space also becomes an issue.
For example, let’s say you’ve developed a thorough suite of tests for each of your application processes (create purchase order, generate invoice, etc.). To test each process one after the other, against realistic volumes of test data, will take 20 hours. You’d like all tests to complete overnight. The obvious solution to the problem is to parallelize the tests. It’s a relatively straightforward task to spin up a number of virtual test environments, with the latest builds of both the application and database installed and configured. The hard part is provisioning the database in each environment with volumes of realistic data.
This is where SQL Clone can really help. It uses standard disk virtualization technologies in Windows to allow us to create a database ‘clone’ that behaves just like a normal database, but has a vastly smaller footprint.
How SQL Clone works
From SQL Server’s perceptive, a ‘clone’ is just a normal database and we can work on it just like any other database.
However, the original database in Figure 1 is 1 GB and its clone is a mere 30 MB, so how does this ‘magic’ work? In fact, it uses no magic. Creating a clone is a simple, 2-step process each of which relies only on standard virtualization technology.
First, we create a data image, which is the ‘immutable source’ of clone production for a given database. The data image is a full byte copy of the database (data and log files) at a given point in time. We can take the image directly from a SQL Server database or from a database backup.
When we create the image we’re essentially creating a Virtual Hard Disk (VHD) to which we capture a page-by-page copy of the database. SQL Clone uses the Volume Shadow Copy Service (VSS) to capture a data image from a live database, and simply uses the native database restore process when creating a data image from a backup. Producing the data image will take about as much time as performing a database restore, and the image will be roughly the same size as the source database (MDF plus LDF files).
Now, we can create ‘clones’ of the source database, from the data image. When we create a clone, we’re simply creating a local differencing disk (also a type of VHD) with a connection to the remote data image. SQL Clone “virtually attaches” the MDF and LDF files in the data image to the operating system on the server where we are creating the clone. The bytes that make up the clone database are presented through the VHD in a way that is transparent to SQL Server – virtualization is taking place at windows file system level.
The data image cannot be manipulated by clones. Instead SQL Server writes any changes we make to the clone database to the local differencing disk file. Therefore, any change we make to a clone has no impact on other clones, the data image or the source database.
How to start creating database clones
Figure 3 shows the basic components we need to have in place in order to create data images and clones using SQL Clone.
The first step is to install SQL Clone itself, which consists of two major components, the SQL Clone Server and SQL Clone Agents.
We use the SQL Clone Server to create and manage data images and clones. We only need to install this once, on a machine that has access to all the machines hosting SQL Server instances from which we wish to create data images or to which we wish to deliver clones, as well as to the file share location where the data image will be stored.
The SQL Clone Server performs all its work through agents and so we also need to install a SQL Clone Agent on each machine on which we need to create data images or to which we wish to deliver clones.
Finally, we need to set up a Windows network share location, for storing the data images. Again, this will need to be accessible to each machine from which we need to create images or on which we need to create clones.
Install SQL Clone
Download SQL Clone and run the installer on a machine that has access to all SQL Server instances on which you wish to use clones.
The first job is to configure the SQL Clone management service. This service will need to create a SQL Server database, called
sqlclone_config by default, which it will use to store details of all data images and clones. You’ll need to create the SQL Clone management service under a service account with the correct privileges to access the database.
Once the install completes, the SQL Clone Server web application (http://SQLCloneServerName.DomainName:14145/) will open automatically and you’ll need to authenticate using your Windows credentials.
Logically, the next job is to create a data image of the database we wish to clone. However, clicking Create New Image in Figure 4 will quickly divert us into installing a SQL Clone agent on the server hosting the source database, since it’s the agent that needs to create the data image.
Therefore, we’ll start with the task of downloading and installing an agent.
Install the SQL clone agent
To install SQL Clone Agent directly on the server hosting the database you wish to clone, navigate from that machine to the clone management service, at http://SQLCloneServerName.DomainName:14145/, and then click on the Download Agent link in the bottom left of Figure 4. This will auto-download a single-use clone agent installer.
Run the executable to install the Agent, and the SQL Clone Agent Setup wizard will start automatically (give it a few seconds if it doesn’t appear straight away). We need to define the service account for the clone agent. Also on the clone agent setup screen, we have the option to customize some settings, namely the clone location and server connection settings.
This account will need administrative rights on the local machine, and to have read/write access to the file shares used for sharing data images. to have access to the SQL Server instances with permissions to
CREATE DATABASE and
VIEW ANY DEFINITION, granted either via the Agent account’s Windows login, or via a SQL Server login (see the Requirements section of the documentation for further details).
Click finish and the agent should install. This should normally only take a few seconds, but may take significantly longer if it first has to download .NET 4.5.2 and the VC++ redistributables. In any event, if the installation completes, we’ll see an “Agent Setup Complete” message and at this point we’re ready to start creating a data image. Unless you see this screen and click OK, the Agent is not set up.
The alternative approach, if you’re accessing the SQL Clone management service from local machine where you installed it is to download an installer locally, copy it across to the server hosting the database you want to clone, and run it.
Create the data image from a SQL Server database
The next job is to create a data image of the database we wish to clone. Open up the SQL Clone web interface (if it’s unavailable, check that the SQL Clone service is running). Click Create New Image. We are presented with the options to create the data image either from a SQL Server database or from a SQL Server full database backup.
In this article, we’ll only discuss creating a data image from a live database. A follow-up article will cover creating data images and clones from a backup.
Behind the scenes, creating a data image from a SQL Server database proceeds in four basic stages, the first three of which are depicted in Figure 8.
- Create an initial, small VHD, at the data image location (shown as 10MB in size in Figure 8)
- Create a mount point to this VHD on the local operating system of the machine hosting the source database
- Use the VSC service to “remote copy” the MDF and LDF files to the data image location
- Un-mount the VHD
From the SQL Clone web interface, the first step is to add the SQL Server instance hosting the database we wish to clone. The Agent dropdown box should list all instances on which you installed the Clone Agent. Fill in the instance name and the authentication details and click Add.
SQL Clone will attempt to establish a connection to the instance, and if it succeeds, you’ll reach the screen shown in Figure 10, where you can select the source database for the data image. You can also add a new SQL Server instance from this screen, should you prefer to add all your SQL Server instances before starting to create images and clones.
In this example, we’ll create a data image from a database called MyAW2012 (a developer copy of AdventureWorks2012) on a SQL Server 2012 instance.
Next, enter the path and network share where you wish to store the data image. The Clone Agent will need read and write access to this location.
Depending on how you wish to use the clones, it might be beneficial to choose a high-speed, high-bandwidth network location for the data image. Initially, before we make any modifications to the clone database, all the data will be retrieved from the data image, and so its location will affect the query performance of each Clone created from that image. This is a factor to consider if, for example, you are performing load or performance testing using a Clone.
Give the data image a meaningful name, probably including the date the image was taken.
Click Create Image and the image creation process will start. Remember that this will take about as long as a database backup or restore operation, and will depend on the speed of the network over which you’re creating the image, as well as the speed of the disks and operating system for the storage location.
When the image creation process completes, you’ll see a screen similar to Figure 13.
As you can see, SQL Clone reports the data image size as 906 MB, which we can verify by inspecting the VHD file itself, at the data image location. In this case, Figure 14 shows the exact size of the VHD file as about 885 MB (close enough!).
The size of the source database in this case was about 1 GB, which we can verify in a number of ways, such as simply by opening the Properties screen for the database in SSMS, as shown in Figure 15, or by inspecting the data and log files
This confirms that the data image is roughly the same size as the source database. Occasionally I’ve seen cases where SQL Clone appears to report the size of the data image as being substantially larger than the original database (for example, reporting a data image of 320MB for a database of 230 MB), but this appears to be simply due to the dynamic VHD sizing mechanism.
Let the cloning commence
Having created a data image, we’re ready to start cloning. Figure 16 offers a simplified depiction of creating a clone. The database clone is essentially a small, local VHD file called a differencing disk, with a connection to the data image VHD, accessible to the clone via a mount point.
The database and clone sizes quoted in the image are illustrative only, the point being that the clone for a source database of many GB, will only be tens of MB in size.
As noted earlier, the data and log files for the clone database are presented through the data image VHD in a way that is transparent to SQL Server. We can query and update the clone database just as for any other database. Any changes we make are written to the local differencing file and so these changes are isolated, and any change made to clone have no affect at all on the integrity or performance of the source database, nor do they affect the data image.
Upon initial clone creation, the diff file is essentially “empty” so when we query the clone, SQL Clone retrieves the builds data pages from original image. As we make changes, it will reconstruct the required pages from the diff file plus the original data file.
To create a clone, go to the Dashboard in SQL Clone, which will display our existing data images and the clones associated with each image. For each data image, SQL Clone displays the source database, the source server, data image location and the data image size.
Click “Clone” on one of the data images, in this example we’ll create a clone of the MyAW2012 database. Confirm the data image selection on the next screen, and then choose a destination SQL Server instance on which to create the clone, and give it a name. Here, we’re going to create the clone on a SQL Server 2016 instance.
That’s it; now we just create the clone! This will take just a few seconds. Once created, we can query and work with it from SSMS, just as any normal database.
We can create multiple clones from the same data image, for example to perform parallel database testing, or simply so that each developer can work with his or her own local copy of the database, with real data.
Figure 20 shows the MyAW2012_Clone, created on a SQL Server 2016 instance, right alongside a second clone of the same data image, MyAW2012_MyClone, created on the same SQL Server 2012 instance as the source database.
Notice that, compared to a source data image size of close to 1 GB, the MyAW2012_MyClone clone, on SQL 2012, is 23 MB, and the MyAW2012_Clone clone, on SQL 2016 is 97 MB. The size difference is accounted for by the fact that SQL Server had to “upgrade” the database, created on a SQL 2012 instance, to run on a SQL 2016 instance, and those changes will have been written into the local differencing file. If we’d been cloning in the other direction, trying to create a clone of a SQL 2016 database on a SQL 2012 instance, we’d have seen an error during clone creation.
By default, SQL Clone creates the clones at %localappdata%\Red Gate\SQL Clone, although this may change in future, and it’s also possible to use a different location using the Advanced Settings section of the Clone Agent installer (see Figure 5).
Navigate to this location, we find a folder holding each of the clones created on that machine. Figure 21 shows the VhdMount point and the local differencing disk,
disk.vhd, for the MyAW2012_MyClone clone, on the SQL Server 2012 instance.
Don’t be perturbed by the apparent 2 TB size of the VhdMount folder; this is just a reference to the storage, which is actually in the data image repository on the file share you’re using. This is the ‘mounted volume’ approach in Windows, which uses ‘symbolic linking’ (which you may be familiar with from Unix or Linux).
If we click into the VhdMount folder, there is a Data folder containing the data and log files for the database.
Again, remember that these are made accessible to SQL Server via the VHD mount point to the remote data image; this is entirely transparent to SQL Server.
Troubleshooting SQL Clone
If you encounter any errors in installing the Clone agent, or in creating images or clones, you should hopefully get a helpful error message SQL Clone web app, but you can refer to a number of places for further information:
- Open Windows event log viewer on the SQL Clone server (for example, by typing eventvwr.msc from a PowerShell command window) and look at the application errors – this is often the best place to start
- Also check the Application event log on the machine where the Clone Agent is installed
- Query the sqlclone_config database, which contains an
OperationMessagestable that may be helpful.
Administering clones and images
Most clones will have a finite lifespan. For example, during testing, we might spin up a clone, run the tests, then immediately dispose of it. We can keep track of what clones exist for what data images via the Dashboard section of the SQL Clone server web app. From here, we can delete clones, and also delete data images, having first deleted all associated clones.
SQL Clone was designed by people who had to rapidly deliver database improvements. It was done to get over the time and network resources needed to do the necessary testing. It was designed for other teams in the same predicament but is more generally useful within development where every developer is working on an isolated instance of the database that has to be kept up to date with the version in source control. It does this by exploiting the existing virtualization technology within Windows in a creative way to nail a specific DevOps problem.
This article described the basics of how to install and set up SQL Clone, and the basics of how it works. Once all the parts are installed and configured, the process of creating data images and clones is straightforward. Image creation will take about as long as restoring a database and storing the image will require about as much space as the original database. However, once we have this image, we can create from it any number of database clones, and each developer and tester can work on their own isolated clone database, without affecting the work of others.
Also in Blog
I’ve read through a number of the industry thought leaders to get an understanding of how DevOps is being communicated out there. As with so much else in life, you can start at Wikipedia to get a ge...
Also in Database DevOps & DLM
In this three-part series, guest bloggers from DevOpsGuys look at the real role of Ops in DevOps. Where it changes, how it changes, and why Ops has an important part to play in the brave new world of ...
Also in Redgate products
My previous article in this series explained why it's important for a development team to adopt a common standard for formatting SQL code. It also gave a broad overview of the styles and actions withi...
Also about SQL Clone
To celebrate the launch of SQL Clone, we had a bit fun building a skill for Alexa that provisions a database in seconds using just your voice and SQL Clone. Check out Grant Fritchey’s video showing ...