How to Create Ten 200GB Test Databases in 60 Seconds

In a recent survey run on community site SQL Server Central, 58% of respondents admitted that their test database is an exact copy of their production database. Many of these respondents concede that they should be removing sensitive data, but aren’t.

It seems that the desire to have realistic data will for many trump the desire to keep sensitive data out of development and test environments.

The reality is that DBAs, developers and testers benefit immensely from access to production-like databases for testing application changes, testing schema changes, and testing ad-hoc queries. But the average production database is between 100Gb and 1TB, and often locked down for data privacy reasons.

So what to do?

1)Test against a hand-crafted test database. Hand-crafting test data is fiddly, tedious and incredibly slow, with the data rarely resembling real data. And it will never be big enough for load-testing.

2)Test against automatically-generated data, e.g. using Redgate’s SQL Data Generator. The data quantity is realistic, but generating this database will take a long time and still requires the full disk storage requirements. Also, this data will not be realistic or meaningful. For instance, your application may have a feature where it shows a Google Map of a customer’s address. If the customer’s address is just a random string, rather than a valid address, then this feature cannot be tested.

3)Restore a backup of production. The data is now perfectly realistic, but you still have problems. Restoring takes lots of time and disk space, particularly if you have to provision many copies. Also, testing is often destructive, so you’ll need to keep re-restoring for all your testers. There’s also the data sensitivity angle – you probably want to restrict developers and testers from having unlimited access to production backups – a leak like this caused the recent Patreon hack.

Today I’d like to talk about a new technology we’re working on at Redgate – with the working-title SQL Clone. With SQL Clone, you can take a ‘data image’ of a database, then rapidly create an unlimited number of clone databases. These clone databases take only a few seconds to create and occupy about 50Mb each, but have the same schema and data as the original.

SQL Clone

My database, ProdData is 12Gb is size. I want the ten testers in my team to each have a copy of the production database, but I don’t want to spend ages restoring databases or use loads of disk space. So I decide to use SQL Clone.

I go into SSMS, and select my database in the Object Explorer:

ss10.png

I click ‘New Snapshot’ to create an image of ProdData. This snapshot has to make a copy of the existing database, which takes 2 minutes and uses 12Gb. This is an expensive operation time-wise and storage-wise, but it’s a one-off. Once completed, we have a data image snapshot ready to clone.

ss20.png

Now I can click New Clone and create a test database. Each takes about 5 seconds to create, and uses 44Mb of hard-disk space, nothing like the 12Gb original! Even if your production database is 500Gb, SQL Clone will still only take 5 seconds and use tens of megabytes for a clone.

Here I have created three test databases: ProdData_Test01, ProdData_Test02 and ProdData_Test03. Note that they appear in the Object Explorer alongside the original ProdData database.

ss30.png

In the database properties, you can see that SQL Server believes ProdData_Test01 is just a normal 12Gb database. SQL Clone is using Microsoft’s virtual-disk technology to just store the differences between the test database and the data image I created earlier.

ss40.png

These test databases are effectively sandboxes. They can be read and written without touching the original database or other clones. They can be trivially created and destroyed as you wish. A query against a clone behaves identically as a query against ProdData:

ss50.png

Data Sensitivity

But what if your original database has sensitive data? For instance, you may have a CreditCardNumber column in your Customers table which you want to hide from QA and developers.

You can mask this data using Redgate’s SQL Data Generator. When creating the data image, you can use SQL Data Generator to generate random (but realistic-looking) data on top of the sensitive data. This isn’t yet integrated into SQL SQL Clone, but the feedback we’ve gathered so far indicates that fast, secure, easy data-masking should be integral to SQL Clone.

Roadmap

We’re currently still working on the alpha. SQL Clone won’t be production ready until 2016. In the meantime, sign up to the Early Access Program and let us at Redgate know which features you want.