With a database, many people can view the data at the same time. If someone makes a change and commits it, then the change is visible not only for that connection, but all the others as well. Everyone sees the effects of that change; that is what databases are for.
But what if you want to make changes to the data or even the structure of the database, and ensure that the change affects only you, as if you were the only person using the database? What if you want to be even more subtle, and make these changes visible to one team, while the other users are unaffected? Or, you want to make changes that affect only your version of the database, and then subsequently roll back these changes, so that, as far as the database is concerned, they never happened?
Databases struggle to do all this; it isn’t how they are designed to work. So, under what circumstances would you need to stand the way that a database usually works on its head like this?
Uses for clones
You’re a developer and you need to work with a database that corresponds in scale and appearance to the real data. You need a particular version of the database, but you want the freedom to be able to make changes and run tests that, on a shared server, would affect other developers or even overwrite their work. Then, having saved what you need to, you want to revert those changes instantly.
How do you do it? On a shared development server, you’d generally restore a backup to that point in time or detach the current version and re-attach to the original. Both approaches take everyone offline, and the ensuing delay affects everybody’s work. The alternative of giving every developer their own copy of the current ‘trunk’ database doesn’t scale easily and can present a provisioning nightmare, if you are practicing continuous integration.
Instead, you need a ‘database service’ that gives every developer or team their own, ephemeral view of an identical database that then reflects just the changes made by that person, or team. SQL Provision includes a tool called SQL Clone that was designed to meet these requirements. Having created one complete image of the source database, it then allows you to take many other databases (clones) to the same version of the database, with the same data, very quickly. Having done so, each developer can then migrate his or her own individual clone, to a different version or with different data, without affecting anyone else.
Both developers and testers dream of a clean way of doing this. Imagine, for example, that you are developing SQL code for the part of a business process that amends a customer’s details. You need to start from a pristine copy of the database with a ‘known’ dataset and state, amend the customer’s data, run a test to make sure the result is exactly as expected, and then revert quickly to the original state, ready to make the next change and run the next test. SQL Clone makes this very easy to do.
Similarly, you need to run an integration test that checks the entire business process, end-to-end. Again, you need to start from a known database version, with a known dataset and then run the entire process and check the data against a criterion that represents the correct result. Having done that, you need to revert the database to its original state. Because you are likely to run all this as a scripted overnight process, you need to be able to revert automatically, maybe several times in a test run. Again, SQL Clone accommodates this because it was designed to fit in with a DevOps approach. It easily scriptable in PowerShell, using the principle that anything that was achievable in the Clone web console could also be scripted, and vice versa. This allows much of Database CI to be reliably automated.
You might also want to test the behavior of two or more different implementations of the same function or stored procedure. SQL Clone would allow to spin up multiple identical copies of the database, side by side, install different versions of the procedure in each one and then run the tests simultaneously. It can do this without using up excessive disk space on the SQL Server instance.
Of course, it’s not just developers who occasionally need to work with databases in this way. Imagine you are running a training course for staff to use a database application. In training you follow a scripted example where each trainee needs perform the business operation that amend customer details. After the results are checked, each trainee needs the database returned to the initial state for the next training exercise.
So what technology does SQL Clone use that makes all this possible?
Working the magic with virtual technology
Nowadays, any 64-bit operating system supports virtual hard disks (VHDs). These look like real hard drives to the system but are, in fact, just files. They are usually needed for virtual machines, but SQL Clone uses them to allow SQL Server instances to unwittingly share database files. There is nothing new to the idea of the hardware of drives being different to what is presented to the user. RAID, cloud storage and NAS technology has prepared the way.
Virtual hard disks
VHDs come in three flavors: fixed, expandable (dynamic), and differencing. In the first, a fixed part of the ‘backing store’ real-estate is allocated, in the second, the storage is allocated only when required, and in the third type – the differencing VHD – a parent virtual disk is used as the basis but never actually modified. Any subsequent writes are written as differences to a ‘differencing VHD image file’. The drive as seen by each user is a combination of the two.
Even with contemporary hardware, it isn’t sensible to have many duplicated copies of essential components, such as the operating system, or common applications, eating up disk resources. However, the introduction of virtual machines caused precisely this problem. It was solved by the idea of a ‘differencing’ VHD. The parent VHD contains the Windows operating system image, which each virtual machine uses as its system boot disk.
Each VM user can write to disk, unaware that their changes are written instead to the new differencing VHD. The changes appear only to the machine that made them, so you can use a shared image of the operating system but each VM user can configure it, and write files to it, as required.
Although the main purpose of this device is to save in storage by preventing duplication, it has other uses. One interesting use is that the differencing disk can be side-stepped to reveal the unmodified parent virtual disk image in its original state. This is handy where a mis-configuration of the operating system has accidentally prevented a reboot. By reverting to the original parent image, any individual VM can be recovered quickly in an emergency by doing a clean system bootup of the operating system, without affecting any other machine.
How SQL Clone uses VHDs
VHDs are therefore just abstract disks that are, in fact files of a special format (either VHD or the later VHDX). This technology can be used for any shared files which must remain immutable so that they can be shared, and yet can be altered locally by each individual virtual process. It is ideal for use in cases where large files, or large collections of files, must be shared.
SQL Clone uses this same technology to make it possible to share very large database files, without needing to copy them to every machine that needs to use them. An installation of SQL Server that hosts a clone sees that it has a database that is huge and stored locally, which it can read, and to which it can write. It behaves as any other database.
However, the reality is rather different. SQL server is seeing a combination of pages stored locally that represent changes, and the unchanged pages inside the shared VHD image. SQL Server reads any changed data pages from the differencing file on the server hosting the clone, and all other data pages from the remote image. Each clone is just a differencing VHD mount point, with a connection to the shared Image VHD, holding copies of the MDF and LDF files from the source database. This ‘mount point’ is a physical location on the local drive that is created in the default data location for the SQL Server instance. SQL Clone loads onto these ‘mount points’ the root directory of another volume, in this case the Image VHD. The image VHD is a file on a file share, which can be hosted elsewhere on the network.
There is more to it, of course, for cloning a database. The VHD image must be created using the appropriate Windows API call and, if a live database is being cloned, the database MDF and LDF files must be copied into the Image VHD using the Windows Volume Shadow Copy Service. By doing so, you achieve a transactionally-consistent ‘on disk’ state, in much the same way as a backup or a SQL Server Snapshot is done. Open transactions will not be applied to the image. Alternatively, you can create the Image VHD as before, and then copy in the database files by restoring from a backup, if that is more appropriate, such as when creating an image from a production system.
Each local clone can be treated like any normal database, but one that takes up far less space than the original database. In much the same way that a differencing VHD can be used to reboot the operating system, you can, in seconds, get a database clone back to its original image merely by removing and re-creating the difference file. This is a mouse-click, or a line of PowerShell code.
Setting up SQL Clone
SQL Clone has several different components. It is network-based and relies on a server-based Clone Server, with a configuration database, in order to coordinate things. SQL Clone Server resides on a Windows Server. To install SQL Clone, you don’t need to use a Windows domain within your local network, but if you’re not on a Windows domain, the only restriction is that you need to store images on a file share on the same server. SQL Clone Server uses a browser to provide an interactive console. Since it hosts its own web service, it can be accessed, with the right credentials, anywhere on the network from a workstation or mobile device. This allows you to check SQL Clone and do any operation from anywhere in your network.
SQL Clone Server communicates with Clone Agents, as well as PowerShell, via web services. The Clone Agents do the job of creating and deleting image and clones. The Clone Server coordinates them and does bookkeeping in the config database. The SQL Clone Server needs a connection to a SQL Server, with enough privileges to create the config database.
Other than its own configuration database, SQL Clone Server doesn’t touch any database or file; everything is done by Agents. Each SQL Server instance that will host one or more clones, or from which you wish to create images, needs to have a Clone Agent installed. The Clone Agent that writes the image needs to be able to write to a network location to provide the VHD image files. This file share must be visible to the Clone Agents on any on machines hosting SQL Server instances on which you want to create images or clones. The settings for each agent are held in a JSON file, which can, with a great deal of caution, be altered if it becomes essential to change the location of the management service of the Clone Server.
SQL Clone creates one image and can then set up clones on all SQL Server instances that have a local agent. It can also delete images via a PowerShell script when they have no active clones created from them. SQL Clone Server does the check to make sure that an image is never deleted until all the clones that use the image are deleted first. There are more details of setting up SQL Clone, and its network architecture in particular, in the Best practices for SQL Provision guide.
Just as in a virtual machine, the operating system has no idea that it is accessing a virtual drive, so SQL Server is entirely unaware that there is anything different about a clone. In fact, a clone looks and behaves so much like a normal database that it is tricky to determine, as in Blade Runner, that it is, in fact, a clone. As a precaution therefore, when SQL Clone sets up each clone, using the image, it adds an extended property to the database object so that any process that manages all the databases on a server such as maintenance, backup or audit, is aware that the database is a clone.
The PowerShell module can be run anywhere on the same network, with the right credentials. This allows the users of a clone to revert their copy of the database to the original image without having to pester an admin to do it for them. It also allows provisioning to be done from a single location.
Using clones in development work
For SQL Server, a clone is just another database, and for the developer using a clone, the same largely applies, although there are a few points where developers might be unsure, which I’ll address.
A clone is robust. It is just a VHD mount, so it carries on working even if the SQL Clone Server in unavailable for a brief period. Queries executed on a clone will generally get the same execution plans as queries on the source database; the clone contains all the same data and statistics as the original database. There may, however, be differences if any of the instance-specific settings, such as the
parallelism are different from the source.
There will be performance differences, though. The database file is read from an image located on network storage, and even in the best of local area networks the increased latency means it is likely to be slower than reading from a local hard drive.
If the developer runs operations or statements that make many changes, then the performance may be noticeably slower, as all the affected pages must be served over the network. Also, the differences recorded in the local differencing disks can become enormous and will take up space. However, an individual difference file is unlikely to get beyond the original image size because each page is represented either in the VHD or the differencing file.
In creating a clone, you must be aware of the version of SQL Server that was used to create the image. You can only create a clone on a SQL Server instance that is the same version or newer than the version that was used to create the image. Therefore, it makes sense to keep the SQL Server instances that host the clones upgraded to a recent version.
A clone can be removed or updated by the agent on the server. If a clone is accidentally removed, it is unlikely to be a disaster in testing, and if the developer is working in source control, it is unusual to lose anything important. However, as a precaution, it is perfectly possible to script a routine that regularly saves any changes in the code for the development database. For the developer, it is really a matter of changing habits and rituals to accommodate the slight change in the nature of the database.
If you need to try out a routine that has large repercussions to the data, but then quickly return the data to its previous state, working with a clone is ideal. There comes a time when the alternative of running the routine within a transaction that is then rolled back rather than committed is difficult to achieve. With a clone, the easiest approach for the developer or tester is probably to use the reset-clone PowerShell command, or do it from the SQL Clone Server dashboard. Suddenly, the database is back in its pristine state. Automated tests will be able to tear-down a test run just from a PowerShell command.
If you aren’t working on the trunk development of the database, but maybe a special version for a different country or legislative area, you can still clone the same image, but you’ll need also to create a script that migrates the trunk to your version. This is stored in a template within the Clone Server and applied during the creation of a clone. This way, each developer can have their particular migration script, as defined in a template, during the creation of the clone so that the clone database is configured exactly as required.
SQL Clone allows developers and testers, as well as admins and trainers, to use SQL Server in unconventional ways, for special purposes. It has obvious uses whenever you need many copies of a database, to which you need to make ephemeral changes. It is ideal for training staff to use an application, or to run several tests where the database must start with a particular dataset. It will allow a team of developers to work on a large database at the same time without the danger of affecting each other’s work and will allow developers to run unit tests without permanently altering the data.
The larger the database, the more effective SQL Clone will be in making life easier for all the people responsible for ensuring that a database works well and reliably. It will make it easier to develop and test, to meet the changing requirement of the organisation that uses it. It will make it easier for trainers to ensure that the database is used correctly.
This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.
Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.
He is a regular contributor to Simple Talk and SQLServerCentral.