Generating test data in JSON files using SQL Data Generator
Phil Factor shows how to use SQL Data Generator to produce as much pseudonymized data as you are likely to need for your testing, and then convert it to JSON so that you can also use it to test your MongoDB and Azure Cosmos databases, or to test out a new web service.
SQL Data Generator is adept at filling SQL Server databases with ‘spoof’ data, for use during development and testing activities. However, what if instead of a SQL Server database full of fake data, you need a JSON file? Perhaps you need to run some tests in MongoDB, or Azure Cosmos DB. Maybe you need a sample data file to test a new web service.
The solution in this article builds a SQL Server database, fills it with sample data using SQL Data Generator, extracts the data into a JSON data file, using FOR JSON, and then drops the database. In other words, we just add an extra phase on top of producing a version of your database filled with fake data, and because we are scripting the whole process using PowerShell, this extra processing is of no consequence.
Filling the Customers SQL Server database
We need to start by creating a database that provides the relational version of the metadata and data we need. We’ll use the same sample Customers database that we’ve used throughout our series of SQL Data Generator “how-to” articles, but with an added CreditCard table.
Figure 1
Of course, we wouldn’t normally want to hold the credit cards like this, and then pass the data around to JSON. Certainly, doing it this way might give a passing hacker some momentary but unnecessary excitement.
Other than that, though, this is a standard design for a NAD database; it records changes in address so that you track the address at any time. This design accounts for the fact that several people can share the same address, and one person can have several addresses (invoice address, work, home, and so on.). We’ve also allowed notes to be applied to several people, so that we can attach arbitrary information to them.
So, we create and test a build script for this database, and once we have it producing a viable database, fill it with data using a SQL Data Generator project file. Unfortunately, I can’t provide my copy of the SQL Generator project file as it has my credentials in it, but I’ve written articles that show you how to generate data, adapt the data format to suit your needs using regular expressions and python, as well as to automate data generation for development and test databases using PowerShell.
When you’ve fine-tuned the SQL Data Generator output to your heart’s content, you are ready to generate the JSON
Converting the data to JSON
First, you need a good idea of the format in which you want the JSON data. As an example, I’ll use this common format of names and addresses (styled using the JSON Formatter and Validator):
Figure 2
We’ll use the query in Listing 1 to convert our relational NAD data into a JSON string of this format, using the FOR JSON function. This technique represents the easiest way of representing JSON nested objects, though it isn’t the fastest way of doing it because it relies on correlated subqueries. However, it is fast enough for our purposes.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
USE Customers; DECLARE @MyJSON NVARCHAR(MAX); SELECT @MyJSON = ( SELECT Coalesce(Title + ' ', '') + FirstName + ' ' + Coalesce(MiddleName + ' ', '') + Coalesce(Suffix, '') AS [full name], Title AS [name.title],FirstName AS [name.First name],MiddleName AS [name.middle name],LastName AS [name.Last Name],Suffix AS [name.suffix], ( SELECT TypeOfAddress AS [type], Coalesce(AddressLine1 + ' ', '') + ' ' + Coalesce(AddressLine2 + ' ', '') + ', ' + City + ', ' + County + ' ' + PostCode AS [Full Address], Start_date AS [dates.moved in], End_date AS [dates.moved out] FROM Customer.Abode AS A INNER JOIN Customer.Address AS A2 ON A2.AddressID = A.Address_id WHERE A.Person_id = Person.person_ID FOR JSON PATH ) AS Addresses, (SELECT N.Note AS [text], N.InsertionDate AS [date] FROM Customer.Note AS N INNER JOIN Customer.NotePerson AS NP ON N.Note_id = NP.Note_id WHERE NP.Person_id = Person.person_ID FOR JSON PATH ) AS Notes, (SELECT TypeOfPhone, DiallingNumber, Start_date AS [dates.from], End_date AS [dates.to] FROM Customer.Phone AS P INNER JOIN Customer.Person AS pp ON pp.person_ID = P.Phone_ID WHERE pp.person_ID = Person.person_ID FOR JSON PATH ) AS Phones, (SELECT CardNumber, ValidFrom, ValidTo, CVC FROM Customer.CreditCard AS CC INNER JOIN Customer.Person AS ppc ON ppc.person_ID = CC.Person_id WHERE ppc.person_ID = Person.person_ID FOR JSON PATH ) AS Cards FROM Customer.Person FOR JSON path ); SELECT @MyJSON; |
Listing 1
If you previously ran your SDG project to fill the database with sample data, then execute Listing 1, and you’ll see that it will produce JSON records in the following format.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 |
[ { "full name": "Mr Jeremey Jarrod D.Phil.", "name": { "title": "Mr", "First name": "Jeremey", "middle name": "Jarrod", "Last Name": "Nolan", "suffix": "D.Phil." }, "Addresses": [ { "type": "Work", "Full Address": "91 New Close Tarbert , Newcastle upon Tyne, Merseyside DL79 9SX", "dates": { "moved in": "2004-10-27T08:33:06.320" } } ], "Notes": [ { "text": "Sir, I just wanted to let you know that I am very satisfied with your service, showing good style and correct terminology.", "date": "2012-02-16T08:36:16.070" } ], "Phones": [ { "TypeOfPhone": "Office", "DiallingNumber": "44544", "dates": { "from": "2010-02-24T05:42:50.430" } } ], "Cards": [ { "CardNumber": "371555023929064", "ValidFrom": "2017-07-14", "ValidTo": "2016-11-18", "CVC": "243" }, { "CardNumber": "4774444451642157", "ValidFrom": "2017-03-15", "ValidTo": "2018-11-06", "CVC": "654" }, { "CardNumber": "372237411022765", "ValidFrom": "2014-01-21", "ValidTo": "2018-06-08", "CVC": "970" } ] }/* and so on */ ] |
The PowerShell Routine
Now we have everything in place, and just need to automate the process of creating the JSON file, using PowerShell. The script will drop the database then create a new version ‘on the fly’, using the name that you provide. It will then run the latest database object creation script, then call SQL Data Generator to stock it with data, using the SDG project file. After it has queried the database and saved the JSON file that resulted, it removes the database from the SQL Server instance. Of course, if you require data in both relational and JSON formats, this is easy to change.
This ExportToJSON script is designed to cope with both Windows Authentication and SQL Server authentication. To do this, I use a credentials object which I can use not only to authenticate my access to SQL Server SMO, but also to pass those credentials to another application, such as to SQL Compare or to BCP, or to access a resource on Azure.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 |
# Firstly, we create the aliases to make the execution of command-line programs easier. Set-Alias SQLDataGenerator 'C:\Program Files (x86)\Red Gate\SQL Data Generator 3\SQLDataGenerator.exe' -Scope Script <# Now we specify the names of the various files, where to get files, and where to put them. #> $directory='C:\MyScriptsDirectory' # Your working directory $SQLServer = 'MyServer' $BuildScript='CustomersDatabase.SQL' # name of the build script $TempDatabase = 'Customers' $Login='' $MySQLDataGeneratorFile="MyGithubDirectory/Customers.sqlgen" $MyJSONFile="$directory\$TempDatabase.json" $FileType=''; Trap { # Handle the error $err = $_.Exception write-host $err.Message while( $err.InnerException ) { $err = $err.InnerException write-host $err.Message }; # End the script. break } #Load SMO assemblies Import-Module sqlps -DisableNameChecking #load the SQLPS functionality set-psdebug -strict $ErrorActionPreference = "stop" # # and some handy constants $MS='Microsoft.SQLServer' $My="$MS.Management.Smo" # If necessary, create the working directory if (-not (Test-Path -PathType Container $directory)) { # we create the directory if it doesn't already exist New-Item -ItemType Directory -Force -Path $directory; } #create a connection object as we can reuse this for all the operations $mySrvConn = new-object "$MS.Management.Common.ServerConnection" $mySrvConn.ServerInstance = $SQLServer # is this Windows Authentication of UserID/Password credentials? if ($Login -ne '') { $Credentials = get-credential -Credential $Login $mySrvConn.LoginSecure = $false $mySrvConn.Login = $Credentials.UserName $mySrvConn.SecurePassword = $Credentials.Password } # now log into the server and get the server object $My="$MS.Management.Smo" # $s = new-object ("$My.Server") $mySrvConn if ($s.Version -eq $null ){Throw "Can't find the instance $SQLServer"} if ($s.Version.Major -lt 13) {Throw "$SQLServer at $($s.Version) doesn't support JSON"} #We read in the build script if (-not (Test-Path -PathType Leaf "$directory\$BuildScript")) { Throw "Can't find the SQL Build script $($directory)\$($BuildScript)" } $SQL=[IO.File]::ReadAllText("$directory\$BuildScript") #if the database exists kill it If (@($s.Databases | % { $_.Name }) -contains $TempDatabase) {$s.KillDatabase($TempDatabase) } $db = New-Object ("$My.Database") ($s, $TempDatabase) $db.Create() $handler = [System.Data.SqlClient.SqlInfoMessageEventHandler] {param($sender, $event) Write-Host $event.Message}; $s.ConnectionContext.add_InfoMessage($handler); $result=$db.ExecuteNonQuery("$SQL") #execute the SQL $s.ConnectionContext.remove_InfoMessage($handler); if ($Login -ne '') { sqldatagenerator /project:$MySQLDataGeneratorFile /Server:$SQLServer /database:$TempDatabase /username:$($mySrvConn.Login) /password:$($mySrvConn.password) } Else { sqldatagenerator /project:$MySQLDataGeneratorFile /Server:$SQLServer /database:$TempDatabase } $SQL=@' DECLARE @MyJSON NVARCHAR(MAX); SELECT @MyJSON = ( SELECT Coalesce(Title + ' ', '') + FirstName + ' ' + Coalesce(MiddleName + ' ', '') + Coalesce(Suffix, '') AS [full name], Title AS [name.title],FirstName AS [name.First name],MiddleName AS [name.middle name],LastName AS [name.Last Name],Suffix AS [name.suffix], ( SELECT TypeOfAddress AS [type], Coalesce(AddressLine1 + ' ', '') + ' ' + Coalesce(AddressLine2 + ' ', '') + ', ' + City + ', ' + County + ' ' + PostCode AS [Full Address], Start_date AS [dates.moved in], End_date AS [dates.moved out] FROM Customer.Abode AS A INNER JOIN Customer.Address AS A2 ON A2.AddressID = A.Address_id WHERE A.Person_id = Person.person_ID FOR JSON PATH ) AS Addresses, (SELECT N.Note AS [text], N.InsertionDate AS [date] FROM Customer.Note AS N INNER JOIN Customer.NotePerson AS NP ON N.Note_id = NP.Note_id WHERE NP.Person_id = Person.person_ID FOR JSON PATH ) AS Notes, (SELECT TypeOfPhone, DiallingNumber, Start_date AS [dates.from],End_date AS [dates.to] FROM Customer.Phone AS P INNER JOIN Customer.Person AS pp ON pp.person_ID = P.Phone_ID WHERE pp.person_ID = Person.person_ID FOR JSON PATH ) AS Phones, (SELECT CardNumber, ValidFrom, ValidTo, CVC FROM Customer.CreditCard AS CC INNER JOIN Customer.Person AS ppc ON ppc.person_ID = CC.Person_id WHERE ppc.person_ID = Person.person_ID FOR JSON PATH ) AS Cards FROM Customer.Person FOR JSON path ); SELECT @MyJSON; '@ #execute this script, using the existing connection $sqlConnection = new-object System.Data.SqlClient.SqlConnection($mySrvConn.connectionString) $conn = new-object Microsoft.SqlServer.Management.Common.ServerConnection($sqlConnection) #$cmd=New-Object System.Data.SqlClient.SqlCommand($sql,$Conn) $conn.ExecuteScalar("USE $tempDatabase") $conn.ExecuteScalar($sql) | Out-File $MyJSONFile #clear up by deleting the database If (@($s.Databases | % { $_.Name }) -contains $TempDatabase) {$s.KillDatabase($TempDatabase) } |
Listing 2
So, there we have it; as much random anonymized JSON data as you are likely to need for your testing. I’ve included all the scripts with the article, so that you can try this out.