8 August 2018
8 August 2018

Creating Multiple Masked Databases with SQL Provision

Sometimes developer teams need access to a copy of the database containing live data. However, if that database contains sensitive or personal data, then it cannot be used for testing and development work, unless all appropriate security measures are in place. The data protection regulations make no distinction between development and production databases, in the event of a data breach.

SQL Provision offers a lightweight, low-impact way to create ‘clones’ of SQL Server databases, for use in pre-production development and testing workflows. Each clone can be created in seconds, requires very little disk space, and can be consumed like any regular copy of the database. SQL Provision allows users to create custom-defined masking rulesets that will ensure that the data is protected, before it is delivered to the pre-production servers.

Previous articles have discussed how you can use SQL Provision to create and manage non-production database copies, and have demonstrated how to start automating the process of creating a database image, with the required masking rules applied during image creation, and then distributing ‘compliant clones‘. In this article, I’ll extend those concepts to demonstrate how you can use SQL Provision to create masked copies of multiple databases, delivered as a group, for example when you are working with a Data Warehouse that contains several cross-database relationships.

The Problem

As a simple example of the problem, let’s say we have two databases, DMDatabase2018 and DMDatabaseDW2018. They share the same structure, and store the same information, except that the former stores only the current year’s data, and the latter also stores all previous years.

Figure 1: Customer Table in DMDatabaseDW2018 and DMDatabase2018

We need to deliver these databases to pre-production, as a group, with the data appropriately and consistently masked in each. How do we achieve this, given that SQL Provision can only capture an image of one database, at a time, with separate masking scripts applied to each?

The Approach

To solve this problem, we’ll first have to restore temporary, full copies of each one to a secondary instance of SQL Server, so that we can connect to them in a physical location and mask each one consistently. This secondary instance will need to be secure. For example, we’d remove all non-essential permissions to this instance, before performing the restore, so that we limit any access to PII, in any state.

The subsequent data masking approach relies on data masker’s support for cross-database, table-to-table synchronization rules. These allow us to copy the contents of a table in one database, masked according to the rules defined by the Rule Controller, and update the corresponding rows in a table in another database, so that the data is the same in each. Finally, we use SQL Clone to create the images of each of the masked databases and deliver clones of each one to our pre-production server.

The approach uses PowerShell to chain together each of the required steps in an end-to-end automated process.

Step 1. Restore the Databases

The first task is to restore temporary copies of the DMDatabase2018 and DMDatabaseDW2018 databases, from their latest nightly backups. We can do this manually, but I prefer to automate the process using PowerShell:

Listing 1

At this point, we can run the SQL Scripts to carry out immediately any additional processing of the databases, to prepare them for moving into a Dev/Test workflow. For example, we might remove Production level user logins and replace or remap them to the necessary Development permissions and logins.

Step 2. Run the Data Masking process

Now, we need to define a single masking set that masks the data in both the DMDatabase2018 and DMDatabaseDW2018 database, in the same way. We’ll add a rule controller for DMDatabase2018, define the required masking rules, add an additional rule controller for each target database, DMDatabaseDW2018 in this case, and use a Table-To-Table rule to synchronize the masked data from the former to the latter database.

Figure 2: Using table-to-table synchronization rules in data masking

In Figure 2, we’re using the table-table rules in Rule Block 10 to take the masking rules and synchronize them over to DMDatabaseDW2018. Note the additional Rule Controller for the latter database, in Rule# 02-0008. More information on setting up your masking set to connect to multiple schemas can be found here.

We can now use PowerShell to call Data Masker via the command line, using the “&” operator, and apply the masking set automatically, as soon as the previous restore stage completes. In Listing 1, I piped the output to null, which means that Listing 2 will only kick off once the database restores have completed.

Listing 2

Step 3. Create the Images and Clones

Having masked the restored databases, we can create the necessary images of each. We can, of course, do this manually through the SQL Clone dashboard, but again, I prefer to automate the process using the SQL Clone PowerShell cmdlets. Note in Listing 3 below I show only the ‘bare bones’ of the code, but you can easily build in your own error handling and so on.

Listing 3

Figure 3 shows the SQL Clone dashboard, with the newly-created images of our obfuscated databases.

Figure 3: The SQL Clone dashboard displaying the consistently-masked images

Finally, we drop the temporary database copies, using the sqlps module, then create clones of each image on the required development and test workstations.

Listing 4

Conclusion

The approach I describe in the article can ensure consistent masking across a group of databases that we need to deliver into Development and Testing. When we need to query data from multiple sources, we’re able to do so in a way that is repeatable, automated and never compromises on the security of our Personally Identifiable Information.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter

You may also like

  • Webinar

    6 Principles of the GDPR and SQL Provision

    Teams often want up to date realistic data, on demand to test code and deliver value quickly. However, DBAs have to account for all database copies and ensure personal information is protected and sanitized, specifically for non-consented use. Join our webinar to learn from MVP Steve Jones how SQL Provision, Redgate’s newest offering, can help organizations comply with the 6 principles of the GDPR, without slowing down the team.

  • Article

    Masking Data in Practice

    Even small extracts of data need to be created with caution, if they are for public consumption. Sensitive data can 'hide' in unexpected places, and apparently innocuous data can be combined with other information to expose information about identifiable individuals. If we need to deliver an entire database in obfuscated form, the problems can get harder. Phil Factor examines some of the basic data masking techniques, and the challenges inherent in masking certain types of sensitive and personal data, while ensuring it still looks like the real data, and preserving its referential integrity, and distribution characteristics.

  • Event

    WinOps London 2018

    The world’s only dedicated conference to ‘DevOps in a Windows World’. The conference is about discovering and sharing experiences of using products and tools within the Microsoft DevOps world such as: PowerShell, TeamCity, Octopus Deploy, Azure, Vagrant, Chocolatey, AppDynamics, ScriptRock, Chef, Puppet, Ansible, Docker etc… Register for the event and meet the Redgate team.

  • Article

    How to reset your development database in seconds using SQL Clone

    Let’s say you’re making experimental changes to your development database and, to explore a hypothesis, you’ve just dropped a table. How long does it take you to restore the database to its previous state, so you’re ready to continue testing? If it’s long enough to go fetch a coffee, then it’s too long. When developing

  • Webinar

    How to keep your delivery processes secure with Database DevOps

    Extending DevOps practices to the database, brings additional advantages, ensuring you protect personal data across your SQL Server estate, while also improving the efficiency and quality of software delivery. We’ll explore the impact database DevOps has on regulatory and compliance requirements and how approaches such as automation, can improve accuracy, transparency, and faster recovery across the entire database development cycle.

  • University

    Take the SQL Provision course

    In this course, you’ll learn about the challenges of masking data and moving it around, and how to overcome them by using a combination of SQL Clone and SQL Data Masker.