9 March 2017
9 March 2017

Automatically filling your SQL Server test databases with data

Guest post

This is a guest post from Phil Factor.  
Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

This article explains how you can use SQL Data Generator (SDG) to automate data provisioning for test databases, during the database development cycle.

Having first created an SDG project file for a database, to define the data generation strategy, we can write a command line batch script, or PowerShell script, which uses a build script in source control, to create a copy of the database in the test environment, and then the SDG project file to fill it with data.

There is an underlying assumption that you will never need to subsequently alter the test database, because that would mean altering the SDG project file to reflect the schema changes, and for that you need one project file per copy of the database; my next article will show how to achieve this.

However, if you just need to build a new test database, from the latest version in source control, fill it with data, run the tests, and tear it down again, then the technique shown in the article will do the job. It follows on from some preceding articles, which described how to generate realistic text, dates and addresses for individual tables in a customer database.

Using SQL Data Generator from the command line

Your first task is to use the SDG’s GUI to design a strategy for filling each of the tables in your database with data.

If you already have test data, SDG can load that from a text or CSV file. If you have an existing database, you can use BCP to export the table data, and then use SDG to import that data, after whatever obfuscation or cleansing you do to it. SDG takes care of inserting it in the right order, and the other complexities of interdependent data.

Alternatively, you can just let SDG generate the data for you. This strategy can range from accepting the ‘default’ data generated by the tool to using advanced regexes to generate fake data that is almost indistinguishable from the real data (see the previously-referenced articles for some examples).

Whichever way you do it, once you’re done, simply save into source control your SDG project, to capture in the .sqlgen project file all your data generation settings and options for that database.

We can then call SDG from a batch script or from PowerShell, and SDG will use this project file to fill a new copy of the database with data, as long as it can reach the server, and your Windows login has the necessary access rights.

The .sqlgen project file is an XML file that can be read and updated in PowerShell but this is probably never necessary. Usually the only items that need to be specified are the database and the server instance.

A command line ‘build and fill’ example

You have a database that you are developing. It could be that it hasn’t yet gone live, or it might be that either you can’t use, or won’t accept the risk of using, live production data to run your tests. You have to create data or load pre-prepared data.

Every working day during development, you make changes to the database, unit test them and save them to your Version Control System. The overnight build creates the database, and fills it with data before running your integration tests. I’ll show you how to build and fill your database using SDG.

Listing 1 shows a simple PowerShell script that builds the database from a script stored in source control, then fills it with data, according to the .sglgen project file. To keep to the topic of creating the data, I’ll leave to one side how we created the T-SQL Database build script (in this example, I wrote it as one script).

Listing 1

You will, if all works well, have a database that is already stocked with data, ready for integration tests and performance tests.

Troubleshooting data generation problems

Things can go wrong, of course. If SDG finds that it cannot, for some reason, insert the data, the whole table insertion is rolled back.

The error is not returned as a DOS error by the application. Instead, you have to check the report produced by SQL Data Generator (an extract from one is shown in Figure 1 below):

SQL Data Generator 1

Figure 1

You can export the report to a file and search it with a regex. The most obvious way is to search for the word ‘exception’.

Conclusions

If you regularly need to stock a database with faked data, or a standard data set, in order to conduct any sort of testing, then SDG is designed to be the tool to do this. It will work happily with any Build system such as Jenkins or TFS, and is just as amenable to use with PowerShell.

I like to store a version of the SDG project with the build scripts in source control so that it is possible to build, at any time, any version of the database, along with its data. To do this, you need to add any additional files used by SQL Data Generator, along with the script, into source control.

The technique I describe in this article is fine for test databases that you don’t need to subsequently modify, but what if you want to build the latest version on your development workstation, and stock it with data?

You’ll want to be able to make subsequent alterations to the database schema, and then you’ll need to be able to adapt the data generation plan for the affected tables. To do that though, you’ll need a .sqlgen file specifically for every development copy of the database.

That’s the topic of my next article, coming soon, Automatically filling your development databases with data.

To find out how you can generate realistic test data fast, download a free, fully-functional 14-day trial of SQL Generator.

Tools in this post

SQL Data Generator

Create large volumes of realistic test data with a couple of clicks in SQL Server Management Studio.

Find out more

Guest post

This is a guest post from Phil Factor.  
Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter

Related posts

Also in Hub

Misuse of the scalar user-defined function as a constant (PE017)

Do not use a scalar user-defined function (UDF) in a JOIN condition, WHERE search condition, or in a SELECT list, unless the function is schema-bound. Scalar UDFs are often used without a parameter to...

Also in Product learning

Customizing the SQL Prompt built-in snippets: a better 'ata' snippet

Snippets are a great feature of SQL Prompt. They save coding time, and introduce standards and consistency to the way you build code modules. They have multiple replacement points (placeholders) for p...

Also in SQL Data Generator

Realistic, simulated data for testing, development and prototypes

Generating realistic test data, which reflects accurately the nature and distribution of the data it is emulating, is a challenging task. The task is made more complex if you need to generate that dat...

Also about SQL Data Generator

Protecting production data in non-production environments

A traditional IT problem in many organizations is that the development and operations teams each work in their own silo, and each tends to regard the other as having different, and often conflicting, ...