Supporting Large Scale Team Development

With large-scale development of a database application, the task of supporting a large number of development and test databases, and keeping them up to date with different builds, can soon become enormously complex and costly. Grant Fritchey demonstrates a novel solution that can reduce the storage requirements dramatically, and allow individual developers to work on their own version, using a full set of data.

I recently helped a DBA team that had to maintain about 30 different virtual environments for a large-scale development effort. There were so many environments because the application in question depended on a very large number of other systems, which in turn depended on it. All these cross-dependencies meant that the different development teams, working on different applications, were at varying levels of completion. With all that going on, we had to keep three versions of SQL Server (2000, 2005, and 2008) up to date in all 30 environments. There were multiple databases on each version, all at different stages of completion. Any given database might have between three and five near-identical installations, but exactly which databases were present, and at what stage, varied quite widely across the environments.

Maintaining the servers, while a serious task that took lots of time, was not very hard; thanks to the virtual environments, it was easy to keep patches and settings consistent across them all. The truly difficult part was the two things we had to do to maintain a common set of databases, some of which were refreshed from production and some deployed by the various applications. First, we had to deploy database changes to all those environments. Second, we had to refresh the databases in the different environments, on demand. The term “refresh” in this case simply meant restoring a backup from production to one of the development environments. To say the least, all this could keep a person hopping.

The problem of deploying the databases was largely solved at development time: all databases were built out of source control, following techniques very similar to those outlined in the Team Development book. That left only the task of refreshing the databases on demand.

At the time, the only option we were aware of was to perform full restores of the databases, so we spent a lot of time developing automation around those restore operations, to make the refreshes easier and more repeatable. We built PowerShell routines to check for available disk space prior to a restore, to avoid filling up the physical drives we had to maintain all over the place. We were even working on a mechanism for monitoring database size in production and available disk space in the environments, generating daily reports to show where we were short of space before anyone even attempted a restore. In short, these processes required a lot of manual labor, all because we couldn’t get away from physical storage, despite our virtual environments.
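
To give a flavor of those checks, here is a minimal T-SQL sketch of the kind of space test involved. Our real routines were in PowerShell, and the drive letter, database name, and threshold logic here are all illustrative:

-- Free space per drive, in MB.
CREATE TABLE #FreeSpace (Drive CHAR(1), MBFree INT);
INSERT INTO #FreeSpace EXEC master.dbo.xp_fixeddrives;

-- Size of the most recent full backup of the database to be restored,
-- from the backup history in msdb on the source server. The backup
-- size is only a rough stand-in for the space the restore will need.
DECLARE @RequiredMB INT;
SELECT TOP (1) @RequiredMB = CAST(backup_size / 1048576 AS INT)
FROM msdb.dbo.backupset
WHERE database_name = 'AdventureWorksLT'   -- illustrative name
  AND type = 'D'                           -- full backups only
ORDER BY backup_finish_date DESC;

-- Fail loudly if the target drive can't hold the restored database.
IF NOT EXISTS (SELECT 1 FROM #FreeSpace
               WHERE Drive = 'D' AND MBFree >= @RequiredMB)
    RAISERROR('Not enough space on D: for this restore.', 16, 1);

DROP TABLE #FreeSpace;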

All this was necessary because it’s easier for developers to use production data for testing than it is to create and maintain large amounts of very specific test data. Yes, setting up valid business test data is possible (and I’d recommend you check out Red Gate SQL Data Generator for some good ways to automate it), but it is a lot of work, and if you don’t do that work as you develop, you have to fall back on production data.

That’s where Red Gate SQL Virtual Restore would have come in handy. SQL Virtual Restore can mount a backup file on a server as if it were actually restoring the database. A virtual restore is clearly useful for all sorts of scenarios: disaster recovery, restoring an individual table or row from a backup, reporting, offline consistency checks, and lots more. But just because you’ve set up one server to access the backup file doesn’t mean another server can’t access it at the same time. That’s right: you can have multiple databases on multiple servers, all served by the same backup file, when using SQL Virtual Restore. And this isn’t just for reads; you can do writes too. Here’s a quick run-through of how easy it is to set up SQL Virtual Restore so that multiple servers access a single backup file.

I have one instance of SQL Server 2008 R2 running on my laptop, and a second instance running within a virtual machine hosted on the laptop. SQL Virtual Restore is installed on both. From there, it’s almost fun. First, I take a standard, native, full backup of the AdventureWorksLT database; I’m using the “lite” version of AdventureWorks just because the file management is a lot easier. The backup command is nothing special (the path is illustrative):
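
-- A plain, native full backup; WITH INIT overwrites any previous
-- backup in the file. The path is illustrative.
BACKUP DATABASE AdventureWorksLT
TO DISK = N'C:\Backups\AdventureWorksLT.bak'
WITH INIT;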

That backup folder on my machine is shared on the network, so I can access it through a UNC path from other machines, including my virtual machine. With the backup in place, I can create a virtual restore of the database. You can launch SQL Virtual Restore from the Red Gate folder in the Start menu, or directly from the Tools menu in SQL Server Management Studio (SSMS) on any machine where you’ve installed it. The SQL Virtual Restore GUI opens as shown in Figure 1:

Figure 1: Select SQL Server

You can use this screen to select the instance you’re interested in accessing. I’m going with the defaults, since it’s my local machine. Clicking “Next” moves to the “Select backup files” window, shown in Figure 2:

Figure 2: Select backup files

The “Add Files…” button at the bottom of the screen opens a standard file-selection window, which you can use to navigate to your backups, whether through a UNC path or a local path on the machine running the software. SQL Virtual Restore can work with SQL HyperBac or SQL Backup Pro files and, as I’m demonstrating here, native backup files. Once I’ve selected the backup I created earlier, it appears in the list of available backups, as in Figure 2. Clicking the “Next” button moves me to Step 3, shown in Figure 3:

Figure 3: Specify destination

The options available here largely speak for themselves. The first section allows you either to create a new database (the default) or to overwrite an existing database. The restore will create files for this database, but they are tiny, around 128 KB, which demonstrates the best thing about using SQL Virtual Restore: it saves an enormous amount of disk space.

If the backup being restored contains outstanding changes, such as transactions committed during the backup process, these will also be written to the virtual files, which may affect their size. Think back to the scenario at the beginning of the article: multiple environments, each with multiple versions of SQL Server, all with databases requiring storage space. Upwards of a terabyte was used just for that development environment. Now imagine if each full restore (say 5.3 MB, the original size of the data file in our AdventureWorksLT test) could instead have been 128 KB of virtual files plus the size of a single shared backup. With three to five copies of a given database, the savings multiply: five conventional restores need five full data files, while five virtual restores need five sets of tiny virtual files and just one copy of the backup. Yes, that’s a huge saving.

Anyway, back to the task at hand. In the third section, you can have the SQL Virtual Restore process run DBCC CHECKDB against the database it creates. Finally, because you can use log backups and differentials with full backups to restore a database to a point in time, you have the option of skipping the recovery action at the end of the restore, which is what brings the database online. Keeping the defaults and clicking “Next” opens Step 4 of the process, as shown in Figure 4:

Figure 4: View summary

This fourth step summarizes what you are about to do. You can click the “View Scripts…” button to check out the script being used; more on that in a minute. The summary lists all the pertinent information about what’s about to occur. At the bottom of the screen, the HyperBac Control Service configuration shows that I don’t have “.bak” files associated with HyperBac, but that the engine will happily register this backup type for me. You can configure the HyperBac Service manually to recognize this type of backup file, but I find that simply letting the process configure it for me works well enough in most cases. Clicking “Restore” at this point will create a new database through the restore process: a virtually restored database that is actually still just a backup file.

The question you might be asking yourself is, “How the heck is that happening?” Well, let’s take a quick peek at the script used to perform the restore. By the way, because the whole thing is driven by scripts, it’s entirely possible to automate this process, so if you need to maintain 30 different environments, you can still use this approach. Here’s the script, with the database name and file paths as they appear on my machine (yours will differ):
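
-- "AdventureWorksLT_Virtual" is the new database name chosen in
-- Step 3; the logical file names come from the original database and
-- the physical paths are the ones from my machine.
RESTORE DATABASE [AdventureWorksLT_Virtual]
FROM DISK = N'C:\Backups\AdventureWorksLT.bak'
WITH MOVE N'AdventureWorksLT_Data'
         TO N'C:\Data\AdventureWorksLT_Virtual.vmdf',
     MOVE N'AdventureWorksLT_Log'
         TO N'C:\Data\AdventureWorksLT_Virtual_log.vldf',
     STATS = 10;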

Look through it carefully because, except for one very minor difference, this is just a SQL Server restore operation. Spot the key point? Take a look at the file extensions: .vmdf, not .mdf. What’s happening is that the HyperBac service intercepts these files and performs all the work behind the scenes, in effect spoofing SQL Server into believing it’s getting a newly restored database, when nothing of the kind is occurring. What you get in the end is a full-fledged SQL Server database; it’s just that HyperBac is working at a lower level to manage these particular data files in conjunction with SQL Server. For a more detailed overview of how this works, check out Brad’s Sure Guide to SQL Virtual Restore.

Clicking on the “Restore” button will run the process and result in a final screen (Figure 5) that shows the completed process, the sizes of the data, and the space saved:

Figure 5: Performing virtual restore

As shown, only a small amount of space was saved, because only a single instance is being served by this backup file. But when two, three, or five instances are served by that one backup, the savings multiply.

Running the exact same process on the virtual machine results in a copy of the database being available there as well. Here’s the magic moment. I’ll run a simple SELECT statement on both machines, something along these lines:
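
-- Any query against the restored data will do; this one uses the
-- standard AdventureWorksLT schema and the database name chosen
-- during the virtual restore.
SELECT TOP (5) CustomerID, FirstName, LastName
FROM AdventureWorksLT_Virtual.SalesLT.Customer
ORDER BY CustomerID;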

You can see the output of this query on my laptop instance and the virtual instance in Figure 6. The results are identical:

Figure 6: Select statements from two machines

In the foreground are the query and its results in my virtual machine; behind it, you can see the same query run from the other machine. Now you can get excited, but let me provide some more excitement. I can modify data within this new “database” that I’ve created. But, and again this is a magic moment, the updates on one machine do not affect the other machine. On my laptop, I ran an update; the exact statement doesn’t matter, but it was something along these lines:
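
-- An illustrative update, run only on the laptop instance; the new
-- value and the key don't matter for the demonstration.
UPDATE AdventureWorksLT_Virtual.SalesLT.Customer
SET LastName = 'Fritchey'
WHERE CustomerID = 1;

Figure 7 shows the result: the update is visible on the laptop, while the old data is still in place on the virtual machine: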

Figure 7: Updates that are different

While this does look like magic, TANSTAAFL (There Ain’t No Such Thing As A Free Lunch) still applies. If you recall, only a very small set of files is created for these virtual databases. When you begin modifying data within the virtual database, those changes have to be stored somewhere. Committed transactions are stored in the virtual log and then written to the virtual data file that was used to define your database; the backup itself is read-only the whole time, so you won’t see changes there. As you modify data, the virtual files will grow, so the more you modify your data, the less a virtual restore will help you. In fact, beyond a point it hurts, since the changes have to be reconciled on the fly with what’s in the backup. Like I said, there is no free lunch here, but there are some very well-defined processes with results you can anticipate. To date, Red Gate has tested this approach with 10 databases created from one backup, and more testing is planned.
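
You can keep an eye on how much those virtual files have grown with an ordinary catalog query; the database name here assumes the one used in this walkthrough:

-- sys.master_files reports size in 8 KB pages; multiply by 8 for KB.
SELECT name, physical_name, size * 8 AS SizeKB
FROM sys.master_files
WHERE database_id = DB_ID('AdventureWorksLT_Virtual');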

Just remember, these uses of SQL Virtual Restore really shouldn’t be considered for permanent databases. In volatile development environments, such as the one I described earlier, virtual databases can work well. In a more traditional shared development environment, the disk space saved will quickly be lost as data and structures are modified. If databases are going to be maintained for more than straightforward integration testing, you’re much better off looking to save space using SQL Storage Compress; then you also get all the other benefits that SQL Storage Compress offers.

Back to the multiple environments and multiple database copies for a moment. You can see how, had SQL Virtual Restore been available, huge amounts of space could have been saved. Instead of having to maintain terabytes of storage for a development environment, gigabytes might have sufficed, more than paying for the licensing cost of this software.

It’s very easy to automate this process for development environments in general, because you can take advantage of standard T-SQL scripts to drive it. While there is likely to be some contention as multiple servers access data from a single source, that shouldn’t be a showstopper, because this is a solution for development environments. It could also be used for QA or Production Support environments, again giving people access to data without having to maintain large amounts of storage space.
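
As a sketch of what that automation might look like, here is the earlier restore parameterized with SQLCMD variables. Run it in SQLCMD mode; the server name, paths, and file-naming convention are all placeholders:

:setvar DatabaseName AdventureWorksLT_Virtual
:setvar LogicalName AdventureWorksLT
:setvar BackupFile \\BuildServer\Backups\AdventureWorksLT.bak
:setvar DataPath C:\Data

-- Assumes the logical file names follow the <source database>_Data
-- and _Log convention; adjust the MOVE clauses if yours differ.
RESTORE DATABASE [$(DatabaseName)]
FROM DISK = N'$(BackupFile)'
WITH MOVE N'$(LogicalName)_Data' TO N'$(DataPath)\$(DatabaseName).vmdf',
     MOVE N'$(LogicalName)_Log'  TO N'$(DataPath)\$(DatabaseName)_log.vldf',
     STATS = 10;

Point that at each environment in turn, and a refresh becomes a scheduled task rather than a day’s work.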