14 March 2017
14 March 2017

Automatically build-and-fill multiple development databases using PowerShell and SQL Data Generator

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

When you are working with a database, you always need data. This is why you need to stock the database with data after you build it. Sometimes, you just want a large number of made-up customer details, sales figures or the like. You also will need columnar data, sometimes known as ‘static data’ or ‘enumerations’, saved in a file. Occasionally, you need data that is generated statistically using a suitable language such as Python.

My previous article about automatically filling your SQL Server test databases with data, explained how you can use PowerShell and SQL Data Generator (SDG) to automate data provisioning for test databases. The assumption there was that you want to build the database, fill it with data, run the tests, and tear it down again. You’ll never want to modify the test database.

Here, I’m going to show how to extend this build-and-fill method to development databases, where each developer will subsequently want to alter the data or metadata in his or her copy of the database. To do that though, we’ll need a .sqlgen file for every development copy of the database.

Working on a single, shared database

Let’s say the development team uses a single, shared database for development work. You can use the SDG GUI to specify where you want the data taken from, or how you want it generated, for the shared database. Having devised a data generation plan for each table, you can save the associated .sqlgen project file into source control, with the same version number as the database. In this way, you can build the latest version of the database, from a build script in the VCS, and then fill it with data, all in one step.

Each time you make a change to a table, or add a new one, you will also need to use SDG in interactive mode to update the method of generating the data for the changed table, or create a new data generation strategy for the new table. It is a very good idea to get into the habit of treating as a single operation the table changes and the strategy for re-filling the altered tables. In other words, don’t do the first without working out how you going to do the second, and then commit the table changes and the changes to the SDG file to source control at the same time.

This allows the team to work on any previous version of the database from source control, merely by building a previous version from the VCS, and using the corresponding version of the .sqlgen project file to fill it.

Generating a SDG file for each database copy

Things get a little more complicated if, for example, each developer has a dedicated development database that they need to be able to subsequently modify. The problem with SQL Data Generator is that it assumes you are using a single database on a specific server, both of which are specified in the project file.

Let’s say you have the ‘master’ copy of the SDG file, for a specific version of a database, saved in GitHub. You’ve used the script in my previous article to build-and-fill a couple of new copies of this database, using this master file, but you subsequently need to alter the schema of one of the database copies. When you try to open the SDG project to update the data generation plan accordingly, you’ll receive an error.

What you need is a separate SDG project file for each copy of each version of the database you are working on, whether you are using it for test, sandboxing or development. Each of these copies of the SDG file would also be saved to the VCS.

The script in Listing 1 will generate a separate SDG project file for each of the specified database copies. It doesn’t matter what particular server or database (or password) details are stored in the master SDG file, in GitHub, because we can override them from the command line.

Listing 1

You must include in this script each location of the database, when creating a new version of the database. Having done so, you can then use any of these SDG project files to access the database within the SQL Data Generator GUI. The one you choose depends on which copy of the database you need to access to tweak the data generation process. Your tweaks will then need to be merged into the master copy of the SDG project file, for that database version, otherwise they won’t be saved.

Adding the build-and-fill processes to the routine

You will probably have spotted that it is easy to modify this code so that you can build each copy of the database, and fill it with data, at the same time as generating the its associated project file.

So, here is the script. Note that the database build script deletes any existing versions of the objects as part of the build.

Listing 2

Of course, because you are using SQLPS to get at the invoke-SQLCMD command, you can use SMO for all sorts of operations that are specific to your build. If you do so, it is worth creating a server object for each server you install on, and pass that object to the invoke-SQLCMD command.

Summary

The technique described in this article is simple but quite powerful. It allows you to automate the process of building, and then filling, multiple copies of the same version of the same database.

By creating a separate SDG project file, per database copy, developers can make changes to their own copy of the database and then, in the same step, update the data generation plan to reflect that change. They will then commit the database change and the changes to the data generation plan, in a single step. As long you subsequently merge changes to individual SDG project files back into the master SDG file, for that database version, you’ll always be able to build and fill any version of the database.

If you’d like to explore more of what you can do with PowerShell and SQL Data Generator, download a fully-functional free trial.

Tools in this post

SQL Data Generator

Create large volumes of realistic test data with a couple of clicks in SQL Server Management Studio.

Find out more

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter

You may also like

  • Article

    Pseudonymizing a Database with Realistic Data for Development Work

    For this demonstration, we will take AdventureWorks and produce a pseudonymized copy for development work. The aim is to surgically alter just the data that can identify individuals but leave everything else intact. I’ve shown how to do something similar before, for a subset of the data, in Pseudonymizing your data with SQL Data Generator.

  • Article

    How to generate realistic text data using SQL Data Generator

    SQL Data Generator (SDG) is very handy for making a database come alive with what looks something like real data, and, once you specify the empty database, it will do its level best to oblige. To get the best results though, you need to provide SDG with some hints on how the data ought to

  • Article

    Realistic, simulated data for testing, development and prototypes

    Generating realistic test data, which reflects accurately the nature and distribution of the data it is emulating, is a challenging task. The task is made more complex if you need to generate that data in different formats, for the different database technologies in use within your organization. This article proposes a scheme for using Redgate

  • Article

    Protecting production data in non-production environments

    A traditional IT problem in many organizations is that the development and operations teams each work in their own silo, and each tends to regard the other as having different, and often conflicting, goals. Developers are focused on the needs of just one application. Operations must keep the whole maze of production systems running throughout

  • Article

    Generating realistic dates using SQL Data Generator and Python

    When you’re generating test data, you have to fill in quite a few date fields. By default, SQL Data Generator (SDG) will generate random values for these date columns using a datetime generator, and allow you to specify the date range within upper and lower limits. This is fine, generally, but occasionally you need something

  • Forums

    SQL Data Generator Forum

    Test data generator for SQL Server databases