Continuous Data Protection with SQL Data Catalog 2.0 and Data Masking
SQL Data Catalog 2.0 provides a simple, policy-driven approach to data protection through data masking. It can now automatically generate the static masking sets that Data Masker uses to protect your entire database, directly from the data classification metadata held within the catalog.
Organizations are now obliged, by law, to ensure that all sensitive or personal data is removed or protected before data is made available for use in development, testing, analysis, training, or other activities. First, we need to identify and classify which columns, of which tables, of which databases, hold data that needs protecting and how. This is the data protection plan. We then need to implement this plan in a way that guarantees all sensitive and personal data is always removed or obfuscated, in every database that is made available for use outside a secure production environment. Finally, we need to maintain this data protection plan as our databases and their data expand and change.
This is a big challenge for many organizations, reflected in the scale of the research, resources, planning and time it requires. It is a hard process to get right the first time, and impossible, without automation, to get right every time we need to create or refresh data in a non-production system.
SQL Data Catalog 2 marks a “step change” in this process. It automatically generates the data masking sets that Data Masker can use to sanitize the data directly from the classification metadata in the catalog. As databases are added, and existing databases modified, the data classifications are automatically maintained in the catalog, and the masking sets can be updated, accordingly, on demand. This approach significantly reduces the time it takes to go from identification and classification to protection and makes maintenance far simpler. It also ensures that you’ve accounted for the risk of inadvertently leaking PII, every single time you provision sanitized copies of a database.
Smoothing the path to data protection
SQL Data Catalog creates the data classification metadata for all the data stored in your SQL Server databases. You can define a taxonomy of classification tags to describe all the different types of data. The following figure (navigate to Classify | Taxonomy in the web app) shows all the built-in categories and classification tags:
When you connect SQL Data Catalog to a SQL Server instance, it automatically examines both the schema and data of each database to determine where personal or sensitive data is stored. An extensive set of built-in classification rules uses string-matching on column names to suggest Information Type and Information Classification tags for any columns containing data that needs protection. In addition, data scanning identifies sensitive information that might not be immediately revealed by the column name.
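As a rough sketch of how this kind of name-based classification works, the following Python snippet matches column names against wildcard patterns to suggest tags. The patterns and tag names here are illustrative assumptions, not SQL Data Catalog's actual built-in rules:

```python
# Hypothetical sketch of name-based classification rules.
# The patterns and tag names are illustrative, not the product's real rules.
import fnmatch

# Each rule maps a column-name pattern to a suggested Information Type tag
RULES = {
    "*email*": "Email Address",
    "*firstname*": "Given Name",
    "*first_name*": "Given Name",
    "*surname*": "Family Name",
    "*dob*": "Date of Birth",
}

def suggest_tags(column_name: str) -> list:
    """Return suggested Information Type tags for a column name."""
    name = column_name.lower().replace(" ", "")
    return [tag for pattern, tag in RULES.items()
            if fnmatch.fnmatch(name, pattern)]

print(suggest_tags("Customer_Email"))   # ['Email Address']
print(suggest_tags("FirstName"))        # ['Given Name']
```

Name matching alone can miss sensitive data hiding behind uninformative column names, which is why the catalog supplements it with data scanning.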
We can apply these suggested tags automatically, and customize existing rules or add new rules to apply any of the other categories of tag shown above. For example, if we need to protect people’s names using static data masking, then we apply a tag of “Static Masking”, for Treatment Intent, to any columns containing names or email addresses:
You can automatically classify and protect every database in this way. You identify all your sensitive data and then set up a data catalog rule (see Classify | Rules) to auto-tag with “Static Masking” any data that you want to protect using data masking.
SQL Data Catalog’s “Protect” workflow
For any columns in your database that have one of the built-in Information Type tags, and a Static Masking tag, SQL Data Catalog will automatically generate a mapping file (.DMMap) that Data Masker can import directly and use for data masking.
The following figure shows the new Protect page in SQL Data Catalog, with download links for the mapping files for two databases, each of which Data Masker can use to create a masking set for that database:
These mapping files are generated ‘on demand’, by clicking download, ensuring that they always reflect the current state of your data. Each mapping file indicates to Data Masker which of the built-in mapping ‘templates’ should be used for each column in the database that needs masking. For example, every column with a Given name information type will map to the Given name template, which will mask that column with a 50/50 split of male and female first names (first masking 100% of rows with female first names then re-masking 50% with male).
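The two-pass substitution described above (mask every row with a female first name, then re-mask half the rows with male names) can be sketched in Python. The name lists are dummy data and the function is an illustration of the technique, not Data Masker's implementation:

```python
# Illustrative sketch of the "mask 100% with female names, then re-mask 50%
# with male names" substitution described above. Name lists are dummy data.
import random

FEMALE = ["Alice", "Beth", "Carol", "Dana"]
MALE = ["Adam", "Ben", "Carl", "Dave"]

def mask_given_names(rows, seed=0):
    rng = random.Random(seed)
    # Pass 1: replace every value with a random female first name
    masked = [rng.choice(FEMALE) for _ in rows]
    # Pass 2: re-mask a random 50% of the rows with male first names
    half = rng.sample(range(len(masked)), k=len(masked) // 2)
    for i in half:
        masked[i] = rng.choice(MALE)
    return masked

original = ["Gemma", "Hank", "Ivy", "Jack"]
print(mask_given_names(original))
```

The two-pass approach guarantees every original value is overwritten at least once, while the re-masking pass produces a realistic mix of male and female names.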
These mapping files specify the substitution rules that will “desensitize” PII records such as names, addresses, emails, bank account details and so on. They also ensure full database coverage by covering every schema in a single masking set, so that all columns in all schemas are protected.
Importing mapping files into Data Masker
Data Masker uses each mapping file to generate the data masking set (the set of controllers, masking rules and datasets) it needs to ensure that every column containing PII is protected, while keeping the masked data realistic and useful.
Simply download the mapping file, double-click it to open it in Data Masker for SQL Server (requires 7.1.22 or later), and connect to the database to which you wish to apply masking, which might be a backup or clone of your production system.
Data Masker will generate and open the masking set, and you can adjust existing rules and add any additional rules, if required, or simply run the masking process straight away. You’ll see that the masking set contains controllers and rules for each of the schemas that contain data with the “Static Masking” tag.
Maintaining your masking sets
SQL Data Catalog periodically scans for additions and changes to your database metadata and data and will automatically apply and update its classifications accordingly, before applying any rules that tag these columns for static masking.
You can then synchronize the latest data classifications with the relevant data masking sets simply by opening them in Data Masker. This ensures your masking sets are always kept up to date.
From data classification to masking to provisioning
By combining SQL Data Catalog 2.0 with Data Masker, we have a simple, policy-driven approach to protecting sensitive data. It is also one that is easily maintained and therefore will scale as more or larger databases are added to the estate.
We classify the data, using Data Catalog, implementing rules that automatically assign the correct tags to each type of sensitive or personal data, and indicate how that data should be protected. This, in effect, is the data protection policy. Any data that requires masking will automatically be included in the mapping file that Data Masker uses to implement this policy. Once you have a sanitized copy of the data, you can use SQL Clone to automatically distribute or refresh as many copies as you need, rapidly.
Smoothing the path to data classification
The success of the Protect workflow, described above, relies on all sensitive columns being classified, so you know which need masking and which don’t. If all the PII in your data is held in a single table, then identifying and classifying it will be relatively easy. Most databases aren’t like that. Many will have been denormalized, to varying degrees, so that the same sensitive data appears in numerous and sometimes unexpected places. You can’t afford to miss any of it.
SQL Data Catalog Version 2.0 introduces several other improvements to help ensure you locate every bit of PII data, and to make the data classification process easier to follow, and faster.
A step-by-step guide to data classification
Click on the “mortar board” icon in SQL Data Catalog, and it will open an “onboarding” panel to guide you through the steps for classifying your databases, right through to integration with Data Masker, where you can mask your sensitive data.
Data classification improvements
We’ve made several improvements to the data classification process, to help make it easier to identify all data that may need masking, as well as to exclude all columns that contain non-sensitive data.
Improved classification and descoping rules
SQL Data Catalog now ships with even more rules to generate classification suggestions for you, so you automatically find and tag data as Country, Gender, Occupation, First name, Title, Address and so on. There is also a new “Descoping system columns” rule to help you descope non-sensitive data such as primary key and foreign key columns, empty tables and ‘system generated’ data types.
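To illustrate what a descoping rule does, the following Python sketch marks columns as out of scope for masking when they match typical "no personal data" criteria. The column metadata fields and criteria here are assumptions for illustration, not the rule's actual definition:

```python
# Illustrative sketch of a "descoping system columns" rule.
# The metadata fields and criteria are assumptions, not the product's rule.
def is_out_of_scope(col):
    """True for columns that typically hold no personal data."""
    system_types = {"uniqueidentifier", "timestamp", "rowversion"}
    return bool(
        col.get("is_primary_key")
        or col.get("is_foreign_key")
        or col.get("data_type") in system_types
        or col.get("table_row_count") == 0   # column in an empty table
    )

cols = [
    {"name": "CustomerID", "is_primary_key": True, "data_type": "int",
     "table_row_count": 500},
    {"name": "Email", "is_primary_key": False, "data_type": "nvarchar",
     "table_row_count": 500},
]
print([c["name"] for c in cols if not is_out_of_scope(c)])  # ['Email']
```

Descoping matters because every column left unclassified is a column you have to review by hand; automatically excluding keys and system columns narrows the review to data that could genuinely be sensitive.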
Data preview query statements
To help you understand the underlying content of your data, you can inspect sample values of a column. On the Classifications page, you can now generate a SQL query for a column by Ctrl-clicking it. You can then paste that SQL into SQL Server Management Studio to edit or execute the query.
Exclusion filters
Version 2.0 supports exclusion filters, both on the Classifications page and the Create/Edit Rule page. For every dropdown filter, holding down the Shift key when selecting a value adds it as an exclusion. For example, on the Classifications page, you can add a filter (%id) for column names ending with “id”. However, if you hold down the Shift key when hitting Enter, this becomes a Not ending with “id” filter.
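The ending-with filter and its Shift-key exclusion can be sketched in Python using SQL-style `%` wildcards. The function names and wildcard translation are illustrative assumptions about the filter's behavior:

```python
# Hypothetical sketch of an "ending with" filter and its exclusion variant,
# using SQL-style LIKE wildcards (% = any run of characters).
import re

def like_to_regex(pattern: str) -> str:
    """Convert a simple %-wildcard pattern to an anchored regex."""
    return "^" + re.escape(pattern).replace("%", ".*") + "$"

def filter_columns(columns, pattern, exclude=False):
    rx = re.compile(like_to_regex(pattern), re.IGNORECASE)
    if exclude:
        return [c for c in columns if not rx.match(c)]   # "Not ending with"
    return [c for c in columns if rx.match(c)]           # "Ending with"

cols = ["CustomerID", "OrderID", "FirstName", "Email"]
print(filter_columns(cols, "%id"))                 # ['CustomerID', 'OrderID']
print(filter_columns(cols, "%id", exclude=True))   # ['FirstName', 'Email']
```

The exclusion form is what makes descoping practical: rather than listing every sensitive column, you can filter out the obviously non-sensitive ones (ID columns, for instance) and classify what remains.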
Data Catalog and Data Masker are now closely integrated, providing a smoother path to masking which will help you protect your SQL Server estate’s sensitive data and maintain a compliant DevOps process.
You can take this for a spin by downloading a free trial of Data Catalog and Data Masker and, hopefully, see how easy and fast it is to protect your data with Catalog 2.0. We also have a series of webinars you can attend to see this in action:
- Available On Demand – Setting up masking sets in Data Masker
- Upcoming – Classify and expand masking sets to a database – Thursday March 31 11-11.30am CT (US Central time) / 5-5.30pm BST