How to Automatically Fill your SQL Server Test Databases with Dummy Data
A PowerShell script to automatically populate a SQL Server database with dummy data, for database development and testing work.
Let’s say that you need to build a new test database, from the latest version in source control, populate the tables with fake but realistic test data, run the tests, and then tear down the database. I’ll show you how to do this using PowerShell, a database build script, and a data generation tool for SQL Server, in this case SQL Data Generator.
You can’t use, or won’t accept the risk of using, live production data to run your tests, so you need to create some fake data, or load pre-prepared data. First, then, you use the data generator to fill the database with randomly generated dummy data and save the project file and then you run it from a PowerShell script (or alternatively, you can write a command line batch script). The script uses a build script in source control, to create a copy of the database on the test server, and then the data generator (.sqlgen) project file to fill it with data.
There is an underlying assumption that you will never need to subsequently alter the test database, because that would mean altering the project file to reflect the schema changes, and for that you need one project file per copy of the database (see Automatically build-and-fill multiple development databases using PowerShell and SQL Data Generator for how to do this).
Generating fake but realistic data
Your first task is to use the SQL Data Generator’s GUI to design a strategy for filling each of the tables in your database with data.
If you already have some dummy test data, SQL Data Generator can load that from a text or CSV file. If you have an existing database, then you can use BCP to export the table data, and then use SDG to import that data, after whatever obfuscation or cleansing you do to it. SDG takes care of inserting it in the right order, and the other complexities of interdependent data.
Alternatively, you can just let SQL Data Generator generate the data for you. This strategy can range from just accepting the ‘default’ data generated by the tool to using advanced regexes to generate fake data that is almost indistinguishable from the real data. I’m going to assume you’ve read some of the previous articles on this topic that described how to generate realistic text, dates and addresses for individual tables in a Customers database, such as:
- How to start producing realistic test data with SQL Data Generator
- How to generate realistic text data using SQL Data Generator
- Generating realistic dates using SQL Data Generator and Python
- Generating test data with localized addresses using SQL Data Generator
Whichever way you do it, once you’re done simply save into source control your SDG project, to capture in the .sqlgen project file all your data generation settings and options for that database.
Automatically fill a database with fake data using PowerShell
Every working day during development, you make changes to the database, unit test them and save them to your Version Control System. The overnight build creates the database and fills it with fake data before running your integration tests. I’ll show you how to build and fill your database using SQL Data Generator.
Listing 1 shows a simple PowerShell script that builds the database, from a script stored in source control. It then calls SQL Data Generator, specifying the saved project file, and use it to fill the new copy of the database with test data. To keep to the topic of creating the data, I’ll leave to one side how we created the T-SQL Database build script (see Maintaining a Database Build Script with SQL Compare).
This should all work fine as long as SQL Data Generator can reach the server, and your Windows login has the necessary access rights. The .sqlgen project file is an XML file that can be read and updated in PowerShell but this is probably never necessary. Usually, the only items that need to be specified are the database and the server instance.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
pushd; Import-Module sqlps -DisableNameChecking #load the SQLPS functionality popd; # Firstly, we create the aliases to make the execution of command-line programs easier. Set-Alias SQLDataGenerator 'C:\Program Files (x86)\Red Gate\SQL Data Generator 3\SQLDataGenerator.exe' -Scope Script # Now we set the script variables to do the job we want $MyDatabase='Customers' $TargetServer='MyServer' #The SQL Server instance $MyReportDirectory="$($env:HOMEDRIVE)$($env:HOMEPATH)\Documents\GitHub\$MyDatabase\" $MyBuildScript="$($env:HOMEDRIVE)$($env:HOMEPATH)\Documents\GitHub\$MyDatabase\$MyDatabase-build.sql" $MySQLDataGeneratorFile="$($env:HOMEDRIVE)$($env:HOMEPATH)\Documents\GitHub\$MyDatabase\$MyDatabase.sqlgen" # Firstly we create the script directory (normally you get these from source control Invoke-Sqlcmd -serverinstance $TargetServer -InputFile $MyBuildScript | Out-File -filePath "$MyReportDirectory\Build.rpt" sqldatagenerator /project:$MySQLDataGeneratorFile /server:$TargetServer /database:$MyDatabase | Out-File -filePath "$MyReportDirectory\Build.rpt" |
Listing 1
You will, if all works well, have a database that is already stocked with data, ready for integration tests and performance tests.
Troubleshooting data generation problems
Things can go wrong, of course. If SQL Data Generator finds that it cannot, for some reason, insert the data, then that whole table insertion is rolled back.
The error is not returned as a DOS error by the application. Instead, you have to check the report produced by SQL Data Generator (an extract from one is shown below).
[Customer].[Phone] The value '' cannot be inserted into column End_date RedGate.SQLDataGenerator.Engine.DataGeneration.InvalidColumnDataException: The value '' cannot be inserted into column End_date at _?__.??__._?__() at _?__.??__.Read() at RedGate.SQLDataGenerator.Engine.DataGeneration.TypeTranslationDataReader.Read() at System.Data.SqlClient.SqlBulkCopy.ReadFromRowSource() at System.Data.SqlClient.SqlBulkCopy.WriteToServerInternal() at System.Data.SqlClient.SqlBulkCopy.WriteRowSourceToServer(Int32 columnCount) at System.Data.SqlClient.SqlBulkCopy.WriteToServer(IDataReader reader) at _?__.?__.??__(CancellableController ?___, Int32 __?_, Int32 __?_, ConnectionProperties _?__, GenerateAction ?_?_, SqlConnection _??, SDGProject ?_?_, GenerationReport ?_?_) Inserted 0 rows Generation started at 21 February 2017 14:24:10, taken: 00:00:02 (hh:mm:ss)
You can export the report to a file and search it with a regex. The most obvious way is to search for the word ‘exception’.
Conclusions
If you need to regularly need to stock a database with faked data, or a standard data set, in order to conduct any sort of testing, then SDG is designed to be the tool to do this. It will work happily with any Build system such as Jenkins or TFS and is just as amenable to use with PowerShell.
I like to store a version of the SDG project with the build scripts in source control so that it is possible to build, at any time, any version of the database, along with its data. To do this, you need to add any additional files used by SQL Data Generator, along with the script, into source control.
The technique I describe in this article is fine for test databases that you don’t need to subsequently modify, but what if you want to build the latest version on your development workstation, and stock it with data? You’ll want to be able to make subsequent alterations to the database schema, and then you’ll need to be able to adapt the data generation plan for the affected tables. To do that though, you’ll need a .sqlgen file specifically for every development copy of the database (see my next article: Automatically build-and-fill multiple development databases using PowerShell and SQL Data Generator).
To find out how you can generate realistic test data fast, download a free, fully-functional 14-day trial of SQL Generator.