It isn’t easy to get enough meaningful data to work with when developing databases. As a DBA, I would prefer developers to use very small databases consisting of their own data or a well-defined set of test data. Smaller data sets make it easier for the DBA to manage the myriad development and QA environments. However, testers and developers would rather not spend time creating dummy data when perfectly good production data is available. They have a point: the best data to develop against and test with is the data that the application’s users will actually see. How do you resolve this disparity in approach?
Many, perhaps most, organizations just go with using production data in their development environments. It speeds up development and makes testing more accurate. But there are snags. First, you can’t expose all the company’s data to everyone in the company, or risk exposing it to anyone outside the company, which means you must put mechanisms in place to remove or modify the sensitive data. Second, frankly, it takes up a lot of space. In my experience, sometimes more (much more) space was dedicated to development and QA than to production, because many copies of the production database existed for development, multiplying the storage requirements.
There can be another problem: You can hit trouble if you don’t have an established and thoroughly documented methodology for deploying your database. It’s very easy to set up a deployment for a development sandbox machine when you’re building it from scratch and supplying your own data. However, when you introduce the complication of bringing the production database and its data into your development sandbox, you must come up with a much more sophisticated approach.
Given these problems, what can you do to make a difference?
SQL Virtual Restore and SQL Source Control
There are two core problems: you need a way to get at production data without needing terabytes of storage, and you need a deployment mechanism that updates the production database schema from source control. Red Gate’s SQL Virtual Restore and SQL Source Control are a match made in heaven for the development sandbox, answering both questions directly. These tools will also help to solve more detailed problems, such as cleaning up sensitive data.
In this article, I’ll show several ways that these tools can be used together to define a sandbox server and deploy to it as part of a development process. For simplicity, we’ll imagine a database that has been under development for some time. It’s checked into source control through SQL Source Control and has been deployed to production. Now we want to move the data and structure from production back down to our sandbox environment, in order to begin a new phase of development against the database. The first step is to set up a database in SQL Source Control.
I’m starting with a blank database, BookInventory. Since I like to work directly out of source control at all times, I’m going to link to it right away. This is the initial SQL Source Control screen, prior to linking the database:
If I click on the “Link database to source control…” link, highlighted in blue on the page above, it will open the “Link to Source Control” window. In this window, I can select different source control systems to hook up to. Red Gate even provides a version of SVN that can run for you during evaluation, so you can try the system out. The built-in systems are SVN, TFS and Vault. If you have a source control system other than these, and it has a command line interface, you can configure it to work with SQL Source Control.
On the “Link to Source Control” window we’re also going to pick whether or not this database is a shared resource, or a dedicated database. Since we’re looking at setting up a sandbox machine, it’s a dedicated database. Clicking on the OK button will create the link. The SQL Source Control screen should now look like this:
Although this database is linked, there is nothing to pull out of source control in this instance, because I’ve just added a blank database. So I’ll add a couple of tables to my database like this:
CREATE TABLE Book
(
    BookId INT NOT NULL,
    BookName VARCHAR(150) NOT NULL,
    CONSTRAINT BookPk PRIMARY KEY CLUSTERED (BookId)
);

CREATE TABLE Author
(
    AuthorId INT NOT NULL,
    AuthorName VARCHAR(150) NOT NULL,
    CONSTRAINT AuthorPk PRIMARY KEY CLUSTERED (AuthorId)
);

CREATE TABLE AuthorBook
(
    BookId INT NOT NULL,
    AuthorId INT NOT NULL,
    CONSTRAINT AuthorBookPk PRIMARY KEY CLUSTERED (BookId, AuthorId)
);

ALTER TABLE AuthorBook WITH CHECK ADD CONSTRAINT AuthorBookFKAuthor FOREIGN KEY (AuthorId)
    REFERENCES Author (AuthorId);

ALTER TABLE AuthorBook WITH CHECK ADD CONSTRAINT AuthorBookFKBook FOREIGN KEY (BookId)
    REFERENCES Book (BookId);
Once the objects needed for production are complete, I’ll check them into source control:
I’m going to go ahead and load some data into the database using SQL Data Generator, just to provide something to query against, and so that the database is larger than the default. I inserted just 15,000 rows into the tables and grew the database to 15 MB.
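If you want to sanity-check what was loaded, a quick sketch like this shows the row counts and the overall size; sp_spaceused reports the database’s reserved and data space:

```sql
-- Quick sanity check on the generated test data (illustrative).
USE BookInventory;

SELECT COUNT(*) AS BookRows FROM dbo.Book;
SELECT COUNT(*) AS AuthorRows FROM dbo.Author;

-- Overall database size, as reported by SQL Server
EXEC sp_spaceused;
```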
This database now gets deployed to production and, as the application it supports gets used, it becomes quite large, so that copying it into our development environment will be far too expensive. Instead, I’m going to apply SQL Virtual Restore to a recent backup of the production database to create a virtual database that will become the new development database. The restore operation is very simple to show in T-SQL:
RESTORE DATABASE [BookInventory_Virtual]
FROM DISK = 'd:\bu\BookInventory.bak'
WITH MOVE N'BookInventory' TO N'C:\Data\BookInventory_Virtual.vmdf',
     MOVE N'BookInventory_log' TO N'C:\Data\BookInventory_Log_Virtual.vldf';
It’s basically a normal restore statement, but the addition of the letter “v” to the file extensions acts as a trigger for the HyperBac Service – the Windows service through which SQL Virtual Restore runs – to create this as a virtual database. We’ll call it BookInventory_Virtual, SQL Virtual Restore’s default suggestion. When the restore completes, the space used by the database itself is only 1 MB, 14 MB less than the original. This saving becomes even more dramatic as the size of the original database goes up: in my own testing, I’ve seen a database of over 100 GB come out as a 3 MB virtual database. The saving comes about because we’re not actually creating a new database. Instead, we’re creating a virtual database that keeps only small file stubs, while all the data stays in the backup file. As you change data, the changes are stored in the stubs, so over time they can grow well beyond their initial size.
Once the virtual database is created in our sandbox environment, we will need to link it to the existing database in our source control system through SQL Source Control. The linking process is exactly the same as in the initial example. At this point in our tests, there are no differences between what we’re referring to as production, our backup of the original database, and the new virtual database just created. This means that, after linking the database to source control, we won’t need to move anything into or out of source control.
Now then, let’s assume that we need to add a stored procedure to the database. Here’s a good one:
CREATE PROC ListBookAndAuthor
AS
SELECT a.AuthorName,
       b.BookName
FROM dbo.Author AS a
JOIN dbo.AuthorBook AS ab ON a.AuthorId = ab.AuthorId
JOIN dbo.Book AS b ON ab.BookId = b.BookId;
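A quick way to confirm the procedure works in the sandbox before checking it in:

```sql
-- Execute the new procedure against the virtual database (illustrative).
USE BookInventory_Virtual;
EXEC dbo.ListBookAndAuthor;
```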
And of course this is created in our sandbox environment, BookInventory_Virtual. Once it’s created, we’ll check it in through SQL Source Control. Now let’s assume that another developer is working on the same system. He goes through the same process, creating a virtual database and then hooking it up to source control through SQL Source Control. Once he’s set up, he’ll get an alert that there are changes available in the Get Latest window:
This other developer can click on the ‘Get Latest’ button and just move forward from there. Clearly, in a normal development environment, this sort of thing can go round and round for a bit. But what’s going on with the backup file in the meantime?
The backup file itself is not changing. However, the very small placeholder files created by the HyperBac Service, which define your virtual database, will grow as you add objects or modify data. They store all your modifications over time, and that’s something to keep an eye on. You wouldn’t want to keep a virtual database in place for too long, because these files can grow to the size of the original database, and beyond, depending on what you do with it.
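One way to keep an eye on this is to check the stub files’ actual size on disk. From within SQL Server, sys.master_files at least tells you where they live (a sketch; note that the sizes SQL Server reports for a virtual database reflect the logical database, so check the physical files themselves for real on-disk usage):

```sql
-- Illustrative: locate the virtual database's stub files so their
-- on-disk size can be checked in Explorer or with dir.
SELECT name,
       physical_name
FROM sys.master_files
WHERE database_id = DB_ID('BookInventory_Virtual');
```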
At some point, we’ll want to get a new copy of the database from production. This could be because of changes to the production data that we want available for testing, or because the virtual files have started to grow. If we simply rerun the restore operation that created our virtual database in the first place, replacing the existing database this time, we’ll get a new virtual database in its place. But what happens to SQL Source Control when we do this? Nothing at all. It recognizes that it’s still the same database that was linked previously, but the next time we open that database with the SQL Source Control window, we will see all outstanding changes waiting for us in the Get Latest window.
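Refreshing the sandbox is just the original restore with the REPLACE option added, something like this sketch (same paths as before, pointed at a newer backup):

```sql
-- Illustrative: refresh the sandbox from a fresh production backup,
-- replacing the existing virtual database in place.
RESTORE DATABASE [BookInventory_Virtual]
FROM DISK = 'd:\bu\BookInventory.bak'
WITH REPLACE,
     MOVE N'BookInventory' TO N'C:\Data\BookInventory_Virtual.vmdf',
     MOVE N'BookInventory_log' TO N'C:\Data\BookInventory_Log_Virtual.vldf';
```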
Using SQL Virtual Restore and SQL Source Control together to create and maintain a development environment based on your production database gives you the best of both worlds. You can quickly and easily get a copy of production onto your development machine, using very little disk space. Then, you can develop and deploy your database directly out of source control, achieving tight integration across your development teams.
To watch the recording of Grant’s webinar ‘A Sandbox Development Process’, click here.