Quickly Find and Mask all Sensitive Data with Data Masker for SQL Server
Khie Biggs, a software developer on the Data Masker team at Redgate, explains how a recent set of Data Masker improvements should make it significantly easier and faster to determine what data needs to be masked, implement a masking plan, and then apply the masking operation to protect sensitive and personal data in all the tables and columns of your SQL Server databases.
The Data Masker development team have made significant improvements to Data Masker v7, on two major fronts:
- Masking all sensitive and personal data:
  - Making it much easier for users to define what data is sensitive – ideally by creating a masking plan based directly on documented data classifications
  - Much simpler tracking of the masking plan – significant improvements to the GUI make it much easier to rapidly identify sensitive data, set up appropriate rules, see which masking rules are applied to which column, and what sensitive data you’ve yet to mask
- Improving the performance of commonly-used masking rules – so that even complex masking operations run as quickly and efficiently as possible:
  - At least 2.5 to 3 times faster, according to early customer reports
Masking all sensitive and personal data
Data Masker is a powerful tool for obfuscating personal and sensitive data in your SQL Server databases, so that it is safe for use in development work, training, testing, research, analysis and reporting. With regulations such as GDPR and CCPA, of course, removing or otherwise protecting this data is now a legal obligation; you cannot afford to ‘miss bits’. Data Masker v7 should give users a lot more confidence both that their masking plans capture all the data that needs masking, and that they have configured their masking operation correctly, so that it provides full masking coverage of every piece of personal or sensitive data in the database.
All the new functionality is exposed in the GUI through a revamped Tables page, and its right-click context menu.
Importing sensitivity classifications
Every effective data masking strategy starts with comprehensive knowledge of the data that is personal or otherwise sensitive, where that data resides, and how it is used. Many of our customers already have information about sensitive columns of their database tables, ideally documented in a data catalog, but sometimes stored in applications such as Microsoft Excel.
Using the new Export/Import Plan button at the bottom of the Tables tab, you can import either of the following into Data Masker as a Masking Plan:
- SQL Server classification extended properties – including classifications for SQL Server 2019 and Azure SQL Database (see the T-SQL sketch after this list)
- Your own custom sensitivity categorizations for tables and columns, stored in a CSV file
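If you’re unsure whether your database already carries these classifications, you can inspect or add them directly in T-SQL. The following is just a minimal sketch against a hypothetical dbo.Customers table; the table, column, label and information type are illustrative, not anything Data Masker requires:

```sql
-- SQL Server 2019 / Azure SQL Database: add a first-class sensitivity classification
ADD SENSITIVITY CLASSIFICATION TO dbo.Customers.Email
WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Contact Info');

-- List every classified column in the database
SELECT SCHEMA_NAME(o.schema_id) AS [schema],
       o.name                   AS [table],
       c.name                   AS [column],
       sc.label,
       sc.information_type
FROM   sys.sensitivity_classifications sc
JOIN   sys.objects o ON o.object_id = sc.major_id
JOIN   sys.columns c ON c.object_id = sc.major_id
                    AND c.column_id = sc.minor_id;
```

On versions before SQL Server 2019, SSMS stores the same classification information as extended properties on the column, which is what the import option above picks up.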
If you use SQL Data Catalog, you can now export its data classifications to a CSV file and then import them into Data Masker, so that your masking plan is based directly on the agreed and documented classifications for your organization. Alternatively, you can simply import your own custom CSV files.
Assigning basic sensitivity classifications within Data Masker
Importing a predefined masking plan is the recommended route but, if you don’t have a CSV containing classification information, we’ve also made it easier to apply sensitivity settings directly within the Tables tab. If you select multiple columns or tables, the right-click context menu will bulk-apply the chosen sensitivity classification to all of them.
Implementing your masking plan
All columns should be classified, so you know which need masking and which don’t. This is the basis of your masking plan. Next, you need to build your masking set to ensure that all sensitive and personal columns have appropriate masking rules applied to them to protect their data. We’ve made some improvements to make it much easier to track all this.
Which tables and columns are classified and masked?
The Plan Summary column of the Tables page is now a lot more descriptive and automatically updates as you add Sensitivity classifications and apply rules to columns, so it always indicates clearly how many columns have been masked appropriately.
If any columns marked sensitive haven’t had rules applied to them, or if you haven’t specified whether a column should be masked or not, the cell will be marked in bold.
Filtering by sensitivity
We’ve added a filter to the Sensitivity column of the Tables page to make it easier to locate sensitive columns and tables that require attention, or tables and columns that don’t have sensitivity settings configured, especially when working with large tables with many columns. Simply right-click on the Sensitivity column and select Filter, and at the top of the grid you’ll see checkboxes that you can use to filter the data as required.
Creating new rules from the Tables tab
Previously, you had to switch to the Rules tab to add new rules that would target the sensitive columns in your tables. Now, you can simply right-click on a highlighted column, select “Add Rule” and then choose the type of rule you wish to apply. This functionality also works with multiple columns or tables selected so you can, for example, select all sensitive columns in a table and ‘bulk apply’ substitution rules for each of them.
Which rules apply to which columns?
In masking sets containing a lot of rules, it can be difficult to understand which rules affect each column. We’ve made this a lot easier too. Simply right-click on a selected column and choose Show Rules Targeting Column from the right-click context menu:
You’ll see a list of the rules that target that column, and you can simply click on each one to see its details.
Exploring your tables and their relationships
We’ve made other improvements to the design of the right-click context menu, reorganizing it so the options you need are grouped into distinct categories, with appropriate sub-menus. For example, you can right-click on a table to view its schema information, revealing which columns are nullable, which are included in indexes or participate in PK-FK relationships, and so on.
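If you prefer to cross-check those schema details outside the tool, the standard SQL Server catalog views hold the same information. A minimal sketch, assuming a hypothetical dbo.Orders table:

```sql
-- Which columns of a table are nullable, and which participate in FK relationships?
SELECT DISTINCT
       c.name        AS [column],
       c.is_nullable,
       CASE WHEN fkc.parent_column_id IS NOT NULL THEN 1 ELSE 0 END AS in_foreign_key
FROM   sys.columns c
LEFT JOIN sys.foreign_key_columns fkc
       ON  fkc.parent_object_id = c.object_id
       AND fkc.parent_column_id = c.column_id
WHERE  c.object_id = OBJECT_ID('dbo.Orders');
```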
Improved performance of masking rules
Data Masker supports a range of masking rules, from simple substitution and shuffling rules to more complex command and synchronization rules that will help you mask your data fully and consistently, even in less well-designed databases that lack the proper keys and constraints. However, if you need to apply a complex sequence of masking rules to a large database, possibly containing millions of rows, then the masking operation will require significant SQL Server processing power and can take several hours.
With Data Masker v7, you should now find that running these rules on large tables takes significantly less time. Our Substitution, Shuffle, Row-Internal Synchronization and Search Replace rules all use a new algorithm that is much faster at mapping information from datasets over to the final table. We’ve improved the efficiency of the bulk import process that uploads values from the masking dataset to a temporary table, as well as the MERGE queries that apply the relevant masking and replace the sensitive data.
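To picture the pattern (this is only an illustrative sketch of the bulk-load-then-MERGE approach, with invented table, key and column names, not Data Masker’s actual internal SQL), the replacement values are staged and then applied in one set-based statement:

```sql
-- 1. Stage replacement values in a temporary table, keyed to the target rows
CREATE TABLE #Replacements
(
    CustomerId BIGINT        NOT NULL PRIMARY KEY, -- key of the row being masked
    Email      NVARCHAR(256) NOT NULL              -- replacement value from the dataset
);
-- ... bulk import of masking dataset values into #Replacements ...

-- 2. Apply every replacement in a single MERGE, rather than row-by-row updates
MERGE dbo.Customers AS tgt
USING #Replacements AS src
   ON tgt.CustomerId = src.CustomerId
WHEN MATCHED THEN
    UPDATE SET tgt.Email = src.Email;
```

Doing the work as one bulk load plus one set-based MERGE is what lets SQL Server process large tables efficiently.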
In recent beta tests, telemetry data from customers indicated that, for the same masking set, substitution rules ran 2.5 to 3 times faster. Our internal testing produced even better results, up to 18 times faster for the shuffle rule.
Imagine that you need to mask a terabyte-sized database for development use, but the masking operation takes 10 hours, so you’re only able to refresh it with the latest changes once a month. With Data Masker v7, at 2.5 times faster, the same operation should run in about 4 hours, so you can run it during the nightly maintenance window.
Any new masking sets created in v7 will automatically use the improved versions of the rules. For masking sets created in earlier versions and then upgraded, you will need to head to the Options tab and turn off legacy performance mode.
Conclusion
All the improvements we’ve made to Data Masker v7 have been in direct response to customer feedback on the challenges of designing masking sets that cover all the required data, and then of running the resulting masking operation on a timescale that fits in with their DevOps automation.
If you’re currently using a version of Data Masker before v7, it’s well worth upgrading and, as always, we’d love to hear your feedback.
Tools in this post
Data Masker
Shield sensitive information in development and test environments, without compromising data quality