4 July 2018

Deploying Multiple Databases from Source Control using SQL Change Automation

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Quite often, in a database development project, you need to create several copies of the database under development. They need to be at the current version of the build, or at a specific previous version. You need to fill them with a version of the development data that is anonymized. For regression or integration testing, you often need a standard unchanging version of the data against which you can test the results of a process. We’ll use SQL Change Automation PowerShell cmdlets to do this.

I’ve described how to set up SCA in a previous article, and demonstrated a simple process that builds a version of a database, publishes it as a NuGet package, and uses it to synchronize a target database. It is best to refer to that article for instructions on setting things up.

Here, we’ll get slightly more ambitious, and show how to maintain a group, or cell, of databases. As this is quite a long article, we’ll deal separately, in a later article, with topics such as how you anonymize the data. We will also duck any questions about specific source control systems by using a code directory.

Parallel database testing

Let’s take a simple, imaginary example. The lead developer of AdventureWorks wants to be able to implement a test cell that will allow the team to run not only the current version of AdventureWorks through the many batches of integration and regression tests, but also one of the previous versions. Tests are being added, in the light of bugs that have appeared, and the developers would like to know when a bug was introduced, and what changed at that point.

The team need to develop using a standard, anonymized version of the actual database data, which is kept in a database on a server. This database has the same table structure as the current build. The databases must end up stocked with data, and ready for work. If the build fails, the process must stop at that point, and everything must be logged, so that the process can run overnight.

The team decide to implement a PowerShell-scripted process that will take the build from source control, and use it to update a list of databases, on various servers, to be at the required version, and then stock each with the standard test data set. The solution will, ultimately, allow them to keep an archive of the build components for each version, as a zipped NuGet package, in a configuration management archive, so that they can build any previous version from its NuGet package. Obviously, all this must be a relatively quick and easy task.

How the script works

SCA PowerShell components coexist easily with the standard PowerShell components. I use SQL Server Management Objects (SMO), as exposed within the sqlserver PowerShell module, to do all the routine stuff that it is so good for, and use the SCA Cmdlets for the specialized things.

Together, these two modules provide a comprehensive toolkit. Much of this script is similar to the version in my previous article, and so I will not repeat here the descriptions of the various processes, such as checking for connections and valid paths, validating the build, and so on.

Validation

To check or validate a build, SCA takes the source code, uses it to build a temporary database on a server, runs the checks, and then deletes the database. SCA is relaxed about the form of the source code. You can build from a variety of sources, such as a single build script file or a source code directory.

For simple databases, you can even use LocalDB for the temporary database, but don’t count on it. The server needs to be at the same version of SQL Server, or a higher one, and have all the extra components, such as Full-Text Search (FreeText), that are used in the database. This temporary database needs care but once it is set up for the project, it looks after itself.

Since we’re building a database that is tested, and then disposed of, we don’t need the complication of building within a single transaction that can be rolled back on error. If your database uses Full-Text Search, it can’t be done within a single transaction anyway.

Kill ‘n’ Fill

The only good reason for doing a migration of a database is to preserve the existing data. Test databases are unlikely to be able to preserve the correct data for long, by the very nature of the tests, and the same is usually true of development databases. Unit tests, and any other types of test of a complete process, are likely to change the data. For this exercise, we’ll use a different approach, which is to delete the existing database along with its data, and create it all anew. You can script a backup first in PowerShell, but I haven’t shown that here (see instead How to Automatically Create and Refresh Development and Test Databases using SQL Clone and SQL Toolbelt).

So, the PowerShell routine will kill the destination databases before building the new databases and then filling them with data. The best and fastest way of copying the data is to use native-mode BCP. You use BCP both for the copying of data into a file directory, and for inserting it into a newly-built database. Each table resides in its own file. For this demonstration, we will keep the data in a file directory rather than attempt to store the data in the NuGet package, because of the distraction and overhead of zipping and unzipping.
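The export and import halves of this step use the same bcp call with the direction reversed. As a minimal sketch, assuming Windows authentication and one file per table, the argument list could be built like this. The function name, the file-naming scheme and the server and database names are all hypothetical and are not taken from the script itself; only the bcp switches (-n, -T, -S, -E) are real.

```powershell
# Hypothetical helper: builds the argument list for one native-mode bcp call.
function Get-BcpArguments {
    param (
        [Parameter(Mandatory)] [string] $Server,
        [Parameter(Mandatory)] [string] $Database,
        [Parameter(Mandatory)] [string] $Schema,
        [Parameter(Mandatory)] [string] $Table,
        [Parameter(Mandatory)] [string] $DataDirectory,
        [Parameter(Mandatory)] [ValidateSet('out', 'in')] [string] $Direction
    )
    # One data file per table (assumed naming scheme: Schema_Table.bcp)
    $file = Join-Path $DataDirectory "$($Schema)_$($Table).bcp"
    $arguments = @(
        "[$Database].[$Schema].[$Table]", $Direction, $file,
        '-n',           # native format
        '-T',           # trusted (Windows) connection
        '-S', $Server
    )
    if ($Direction -eq 'in') {
        $arguments += '-E'   # keep the identity values stored in the file
    }
    return $arguments
}

# Export from the data source, then import into a freshly built copy.
$exportArgs = Get-BcpArguments -Server 'DataServer' -Database 'AdventureWorks' `
    -Schema 'Person' -Table 'Address' -DataDirectory 'C:\Data\AdventureWorks' -Direction out
$importArgs = Get-BcpArguments -Server 'TestServer1' -Database 'AdventureWorksCopy1' `
    -Schema 'Person' -Table 'Address' -DataDirectory 'C:\Data\AdventureWorks' -Direction in

# On a machine with bcp installed, each list can be splatted onto the utility:
# & bcp @exportArgs
# & bcp @importArgs
$exportArgs -join ' '
$importArgs -join ' '
```

Building the arguments separately from running bcp makes it easy to log exactly what was executed for each table, which helps when the process runs unattended overnight.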

If you have anonymized the original data in-place, then the kill-and-fill technique will make it far less likely that any remains of the original data will be left after it has been copied across to the new databases. Conversely, if you anonymize the data and then copy the databases across to each destination, either using backup-and-restore, or by detaching, copying the MDF/LDF files and attaching, then both the database transaction log and the data pages are likely to retain vestiges of the data you were trying to mask.

A complication of using BCP native-mode imports is that, by default, they bypass CHECK and FOREIGN KEY constraints and leave them marked as untrusted. All of these constraints will need to be re-enabled, with checking, once the data imports are finished. This can introduce a subtle bug if you had deliberately disabled one or more of those constraints, because re-enabling everything switches them back on too. Re-enabling the constraints requires a SQL script because the SMO (sqlserver) way of doing it is clunky. On the plus side, this script should serve as an illustration of how to do SQL calls within PowerShell, when using SMO.
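The re-enabling step can be sketched as follows. This is only an illustration: the function name and the table, server and database names are hypothetical, and the statement it generates re-enables every FOREIGN KEY and CHECK constraint on each table, which is exactly how a deliberately disabled constraint would get switched back on.

```powershell
# Hypothetical helper: emits one ALTER TABLE statement per table name.
function Get-EnableConstraintsSql {
    param (
        [Parameter(Mandatory)] [string[]] $TableName
    )
    foreach ($name in $TableName) {
        # WITH CHECK revalidates the existing rows, so the constraints
        # are trusted again as well as enabled.
        "ALTER TABLE $name WITH CHECK CHECK CONSTRAINT ALL;"
    }
}

$sql = Get-EnableConstraintsSql -TableName '[Person].[Address]', '[Sales].[SalesOrderHeader]'
$sql

# With the sqlserver module loaded and a reachable server (both assumed),
# the batch could be sent through SMO like this:
# $server = New-Object Microsoft.SqlServer.Management.Smo.Server 'TestServer1'
# $server.Databases['AdventureWorksCopy1'].ExecuteNonQuery($sql -join "`n")
```

If some constraints must stay disabled, one option is to record their names before the import and leave them out of the statements that are generated afterwards.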

Each of the destination servers must have the correct version of SQL Server, as well as all the additional features required for the database to work. A build is rather more sensitive to this than a restore or an attach. AdventureWorks can easily be restored onto a server, even if full-text search isn’t installed, but if you try to build it from the source code on such a server, the build will fail with an error.

The PowerShell script

Once again, I’ve provided the configuration data in the body of this script. This isn’t done because I think it is a good idea, but to keep things simple. It is best kept separately, so the same process can be used in several different contexts. Otherwise, any change in a path, database name, or the like, would require a change in version in your source control system. Also, I don’t even like keeping usernames in PowerShell scripts. In my previous article, I showed how to read the information into the $config PowerShell variable, as a hashtable, and how to ensure that the data was all there.
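As a rough illustration of the idea, a configuration hashtable and a completeness check might look like the sketch below. Every key name, path and server name here is a placeholder chosen for the example, not the layout used by the actual script.

```powershell
# Placeholder configuration: all keys and values are illustrative only.
$config = @{
    SourcePath = 'C:\Projects\AdventureWorks\Source'
    DataPath   = 'C:\Projects\AdventureWorks\Data'
    PackageId  = 'AdventureWorks'
    Databases  = @(
        @{ ServerInstance = 'TestServer1'; Database = 'AdventureWorksCopy1' },
        @{ ServerInstance = 'TestServer2'; Database = 'AdventureWorksCopy2' }
    )
}

# Hypothetical helper: stops the run if any required setting is absent or empty.
function Assert-ConfigComplete {
    param (
        [Parameter(Mandatory)] [hashtable] $Config,
        [Parameter(Mandatory)] [string[]] $RequiredKeys
    )
    $missing = @($RequiredKeys | Where-Object { -not $Config[$_] })
    if ($missing.Count -gt 0) {
        throw "Configuration is missing: $($missing -join ', ')"
    }
}

Assert-ConfigComplete -Config $config -RequiredKeys 'SourcePath', 'DataPath', 'PackageId', 'Databases'
```

When the settings live in their own .psd1 file, Import-PowerShellDataFile can load them into the same kind of hashtable, so the check itself does not need to change.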

The script starts by reading the configuration data into a hashtable. Once it has initialized everything, and made sure that all the file paths exist, it creates the serverConnection objects for each server in the list, and adds them to the hashtable. This provides all the login information for the various cmdlets, SCA, SMO and BCP. With this login information, we can then kill any existing versions of the target databases and create new empty databases. We go through all the databases, check that we can connect to them via SCA, and store the SCA connection information.
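The kill-and-create step can be sketched with SMO as shown below. It assumes the sqlserver module is installed and that Windows authentication is enough to reach the server; the function name and the server and database names are hypothetical.

```powershell
Import-Module SqlServer   # assumption: the sqlserver module is installed

# Hypothetical helper: removes any existing copy, then creates an empty database.
function Reset-TargetDatabase {
    param (
        [Parameter(Mandatory)] [string] $ServerInstance,
        [Parameter(Mandatory)] [string] $DatabaseName
    )
    $server = New-Object Microsoft.SqlServer.Management.Smo.Server -ArgumentList $ServerInstance

    # The Databases collection returns $null for a name that does not exist
    if ($server.Databases[$DatabaseName]) {
        # KillDatabase closes open connections before dropping the database
        $server.KillDatabase($DatabaseName)
    }

    $database = New-Object Microsoft.SqlServer.Management.Smo.Database -ArgumentList $server, $DatabaseName
    $database.Create()
    return $database
}

# Example call against a placeholder server:
# Reset-TargetDatabase -ServerInstance 'TestServer1' -DatabaseName 'AdventureWorksCopy1'
```

Because this destroys whatever is in the target, it is worth keeping the list of target databases well away from anything that holds data you want to keep.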

We now read the source control directory. We use SCA to build the temporary database on a server and use it to validate the build. We create a NuGet file and store the validated project object for the build stage.
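In outline, the validate-and-package step uses three SCA cmdlets in sequence. The sketch below assumes the SqlChangeAutomation module is installed; the paths, connection string, package ID and version number are placeholders.

```powershell
Import-Module SqlChangeAutomation   # assumption: SCA PowerShell components are installed

# Placeholder values for illustration only.
$sourcePath   = 'C:\Projects\AdventureWorks\Source'
$tempServer   = 'Data Source=BuildServer;Integrated Security=True'
$artifactPath = 'C:\Projects\AdventureWorks\Artifacts'

# Build the source into a temporary database on the server and validate it.
# NoTransactions avoids wrapping the build in a single transaction.
$validatedProject = Invoke-DatabaseBuild $sourcePath `
    -TemporaryDatabaseServer $tempServer `
    -SQLCompareOptions 'NoTransactions'

# Package the validated build as a NuGet file and write it to disk.
$buildArtifact = $validatedProject |
    New-DatabaseBuildArtifact -PackageId 'AdventureWorks' -PackageVersion '1.0.0'
$buildArtifact | Export-DatabaseBuildArtifact -Path $artifactPath
```

The $validatedProject object is kept in a variable because the later synchronization stage uses it as its source.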

Before we build and stock all the databases, we must export all the data from the data source. In reality, this would contain your masked and anonymized data, but here we just use the data from the live AdventureWorks database, the source of which we already have in source control. We use BCP to export the data from each table in native format into files in a file directory, one for each table.

We use the validated project object to synchronize the database schema with each empty database that we have created. This is, in effect, a clean build. We then fill each database table with the data for it, and enable constraints.
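The synchronization loop might be sketched like this, assuming the SqlChangeAutomation module is loaded, Windows authentication is in use, and a validated project object is already available from the build stage. The function name and the shape of the target list are hypothetical.

```powershell
# Hypothetical helper: brings each empty target database up to the validated build.
function Sync-TargetDatabases {
    param (
        [Parameter(Mandatory)] $ValidatedProject,
        # Each target is a hashtable such as @{ ServerInstance = '...'; Database = '...' }
        [Parameter(Mandatory)] [hashtable[]] $Targets
    )
    foreach ($target in $Targets) {
        $connection = New-DatabaseConnection `
            -ServerInstance $target.ServerInstance `
            -Database $target.Database

        # Fail early if the target cannot be reached
        $connection | Test-DatabaseConnection | Out-Null

        # Against an empty database, this synchronization is in effect a clean build
        Sync-DatabaseSchema -Source $ValidatedProject -Target $connection `
            -SQLCompareOptions 'NoTransactions'
    }
}

# Example call with placeholder targets:
# Sync-TargetDatabases -ValidatedProject $validatedProject -Targets @(
#     @{ ServerInstance = 'TestServer1'; Database = 'AdventureWorksCopy1' },
#     @{ ServerInstance = 'TestServer2'; Database = 'AdventureWorksCopy2' }
# )
```

Once each target has been synchronized, the native-mode data files can be imported table by table and the constraints re-enabled.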

Conclusions

The kill ‘n’ fill database build process, which under the covers is a synchronization with an empty database, is an approach that is fine for a relatively small-scale database, as an alternative to a migration approach, or synchronization with a full database of a different version. I use it after an anonymization process, as it leaves no trace of the original data.

For bigger databases, you will need to adopt the method of restoring a backup, or creating a VM, using Docker or using SQL Clone. There is no single ‘best’ way of copying databases. It is always good, though, to have plenty of alternative techniques to meet your requirements.

In our simple imaginary example, we have started down the road of achieving what the fictitious developers of the IT Department of AdventureWorks wanted. They also had the ideal of being able to deploy database changes contained in a NuGet package, basically to provide a test database at a previous version as well as the current one. We’ll tackle this in the next article.
