Generating test data in JSON files using SQL Data Generator

Phil Factor shows how to use SQL Data Generator to produce as much pseudonymized data as you are likely to need for your testing, and then convert it to JSON so that you can also use it to test your MongoDB and Azure Cosmos databases, or to test out a new web service.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

SQL Data Generator is adept at filling SQL Server databases with ‘spoof’ data, for use during development and testing activities. However, what if instead of a SQL Server database full of fake data, you need a JSON file? Perhaps you need to run some tests in MongoDB, or Azure Cosmos DB. Maybe you need a sample data file to test a new web service.

The solution in this article builds a SQL Server database, fills it with sample data using SQL Data Generator, extracts the data into a JSON data file, using FOR JSON, and then drops the database. In other words, we just add an extra phase on top of producing a version of your database filled with fake data, and because we are scripting the whole process using PowerShell, this extra processing is of no consequence.

Filling the Customers SQL Server database

We need to start by creating a database that provides the relational version of the metadata and data we need. We’ll use the same sample Customers database that we’ve used throughout our series of SQL Data Generator “how-to” articles, but with an added CreditCard table.

Figure 1

Of course, we wouldn’t normally want to hold the credit cards like this, and then pass the data around to JSON. Certainly, doing it this way might give a passing hacker some momentary but unnecessary excitement.

Other than that, though, this is a standard design for a NAD database; it records changes in address so that you track the address at any time. This design accounts for the fact that several people can share the same address, and one person can have several addresses (invoice address, work, home, and so on.). We’ve also allowed notes to be applied to several people, so that we can attach arbitrary information to them.

So, we create and test a build script for this database, and once we have it producing a viable database, fill it with data using a SQL Data Generator project file. Unfortunately, I can’t provide my copy of the SQL Generator project file as it has my credentials in it, but I’ve written articles that show you how to generate data, adapt the data format to suit your needs using regular expressions and python, as well as to automate data generation for development and test databases using PowerShell.

When you’ve fine-tuned the SQL Data Generator output to your heart’s content, you are ready to generate the JSON

Converting the data to JSON

First, you need a good idea of the format in which you want the JSON data. As an example, I’ll use this common format of names and addresses (styled using the JSON Formatter and Validator):

Figure 2

We’ll use the query in Listing 1 to convert our relational NAD data into a JSON string of this format, using the FOR JSON function. This technique represents the easiest way of representing JSON nested objects, though it isn’t the fastest way of doing it because it relies on correlated subqueries. However, it is fast enough for our purposes.

Listing 1

If you previously ran your SDG project to fill the database with sample data, then execute Listing 1, and you’ll see that it will produce JSON records in the following format.

The PowerShell Routine

Now we have everything in place, and just need to automate the process of creating the JSON file, using PowerShell. The script will drop the database then create a new version ‘on the fly’, using the name that you provide. It will then run the latest database object creation script, then call SQL Data Generator to stock it with data, using the SDG project file. After it has queried the database and saved the JSON file that resulted, it removes the database from the SQL Server instance. Of course, if you require data in both relational and JSON formats, this is easy to change.

This ExportToJSON script is designed to cope with both Windows Authentication and SQL Server authentication. To do this, I use a credentials object which I can use not only to authenticate my access to SQL Server SMO, but also to pass those credentials to another application, such as to SQL Compare or to BCP, or to access a resource on Azure.

Listing 2

So, there we have it; as much random anonymized JSON data as you are likely to need for your testing. I’ve included all the scripts with the article, so that you can try this out.