8 August 2018
8 August 2018

Creating Multiple Masked Databases with SQL Provision

Sometimes developer teams need access to a copy of the database containing live data. However, if that database contains sensitive or personal data, then it cannot be used for testing and development work, unless all appropriate security measures are in place. The data protection regulations make no distinction between development and production databases, in the event of a data breach.

SQL Provision offers a lightweight, low-impact way to create ‘clones’ of SQL Server databases, for use in pre-production development and testing workflows. Each clone can be created in seconds, requires very little disk space, and can be consumed like any regular copy of the database. SQL Provision allows users to create custom-defined masking rulesets that will ensure that the data is protected, before it is delivered to the pre-production servers.

Previous articles have discussed how you can use SQL Provision to create and manage non-production database copies, and have demonstrated how to start automating the process of creating a database image, with the required masking rules applied during image creation, and then distributing ‘compliant clones‘. In this article, I’ll extend those concepts to demonstrate how you can use SQL Provision to create masked copies of multiple databases, delivered as a group, for example when you are working with a Data Warehouse that contains several cross-database relationships.

The Problem

As a simple example of the problem, let’s say we have two databases, DMDatabase2018 and DMDatabaseDW2018. They share the same structure, and store the same information, except that the former stores only the current year’s data, and the latter also stores all previous years.

Figure 1: Customer Table in DMDatabaseDW2018 and DMDatabase2018

We need to deliver these databases to pre-production, as a group, with the data appropriately and consistently masked in each. How do we achieve this, given that SQL Provision can only capture an image of one database, at a time, with separate masking scripts applied to each?

The Approach

To solve this problem, we’ll first have to restore temporary, full copies of each one to a secondary instance of SQL Server, so that we can connect to them in a physical location and mask each one consistently. This secondary instance will need to be secure. For example, we’d remove all non-essential permissions to this instance, before performing the restore, so that we limit any access to PII, in any state.

The subsequent data masking approach relies on data masker’s support for cross-database, table-to-table synchronization rules. These allow us to copy the contents of a table in one database, masked according to the rules defined by the Rule Controller, and update the corresponding rows in a table in another database, so that the data is the same in each. Finally, we use SQL Clone to create the images of each of the masked databases and deliver clones of each one to our pre-production server.

The approach uses PowerShell to chain together each of the required steps in an end-to-end automated process.

Step 1. Restore the Databases

The first task is to restore temporary copies of the DMDatabase2018 and DMDatabaseDW2018 databases, from their latest nightly backups. We can do this manually, but I prefer to automate the process using PowerShell:

Listing 1

At this point, we can run the SQL Scripts to carry out immediately any additional processing of the databases, to prepare them for moving into a Dev/Test workflow. For example, we might remove Production level user logins and replace or remap them to the necessary Development permissions and logins.

Step 2. Run the Data Masking process

Now, we need to define a single masking set that masks the data in both the DMDatabase2018 and DMDatabaseDW2018 database, in the same way. We’ll add a rule controller for DMDatabase2018, define the required masking rules, add an additional rule controller for each target database, DMDatabaseDW2018 in this case, and use a Table-To-Table rule to synchronize the masked data from the former to the latter database.

Figure 2: Using table-to-table synchronization rules in data masking

In Figure 2, we’re using the table-table rules in Rule Block 10 to take the masking rules and synchronize them over to DMDatabaseDW2018. Note the additional Rule Controller for the latter database, in Rule# 02-0008. More information on setting up your masking set to connect to multiple schemas can be found here.

We can now use PowerShell to call Data Masker via the command line, using the “&” operator, and apply the masking set automatically, as soon as the previous restore stage completes. In Listing 1, I piped the output to null, which means that Listing 2 will only kick off once the database restores have completed.

Listing 2

Step 3. Create the Images and Clones

Having masked the restored databases, we can create the necessary images of each. We can, of course, do this manually through the SQL Clone dashboard, but again, I prefer to automate the process using the SQL Clone PowerShell cmdlets. Note in Listing 3 below I show only the ‘bare bones’ of the code, but you can easily build in your own error handling and so on.

Listing 3

Figure 3 shows the SQL Clone dashboard, with the newly-created images of our obfuscated databases.

Figure 3: The SQL Clone dashboard displaying the consistently-masked images

Finally, we drop the temporary database copies, using the sqlps module, then create clones of each image on the required development and test workstations.

Listing 4

Conclusion

The approach I describe in the article can ensure consistent masking across a group of databases that we need to deliver into Development and Testing. When we need to query data from multiple sources, we’re able to do so in a way that is repeatable, automated and never compromises on the security of our Personally Identifiable Information.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter

Related posts

Also in Hub

Documenting your Database with SQL Change Automation

It would be wrong to portray SQL Change Automation (SCA) as being suitable only for epic project deployments, of the sort described in my previous article. It can do smaller tasks as well. To demonstr...

Also in SQL Clone

Getting Started with Database Development Using SQL Provision

Developers, when working on databases rather than the application code, often find they have less freedom to experiment than they are used to. It's not necessarily easy to set up a database for testin...

Also in SQL Provision

Spoofing Realistic Credit Card Data for your Test Systems using Data Masker

Data protection and privacy regulations, ranging from GDPR to HIPPAA to PCI, among many others, put strict compliance requirements on the storage and use of personal and sensitive data, in any of your...

Also in Product learning

SQL Prompt Code Analysis: INSERT INTO a permanent table with ORDER BY (PE020)

The SQL query that is used to produce the result that is inserted into the permanent table has its order specified by an ORDER BY statement. Relational tables are not ordered, so the ORDER BY is meani...

Also in Data Masker

The ‘Right to be Forgotten’ and Data Masker for SQL Server

The right to be forgotten is one of the main features of new data protection legislation across the globe. Under Article 17 of one such piece of legislation in Europe, the GDPR, individuals have the r...

Also about PowerShell

SQL Change Automation with PowerShell Scripts: getting up-and-running

In this article, I'll demonstrate how to use SQL Change Automation, a PowerShell script, to take the contents of a source control directory for a database, check that it is possible to build from it, ...

Also about automated database deployment

Practical PowerShell Processes with SQL Change Automation

This article uses the PowerShell cmdlets of SQL Change Automation (SCA), formerly DLM-Automation, to take the source code of a database from a directory, validate it, document it and then create a NuG...