Unmasking SQL Server Dynamic Data Masking – Part 1 – Introduction

This is part of a series on Dynamic Data Masking by Ben Johnston. For the rest of the series, click here

This is the beginning of a series on SQL Server Dynamic Data Masking. Dynamic Data Masking is a concept familiar with all developers and users of sensitive data. It is implemented in SQL Server with simplicity and elegance, requiring minimal changes to front end applications, including reporting, and almost no changes to queries. The series includes:

  1. An in-depth introduction and use cases
  2. Setup, examples, testing recommendations, coding alternatives to Dynamic Data Masking
  3. Side-channel attacks
  4. Attack mitigations and general architectural-considerations

The dynamic data masking feature was introduced in SQL Server 2016. Data masking addresses gaps in exposing data to end users when direct access is allowed via reporting tools, data analysis, machine learning, or any query tool. Masking is used to protect sensitive or proprietary data from users not authorized to view that data by obfuscating the actual data with a mask as it is returned to the user. Stated simply, actual data is not returned to the user but rather replaced with a traditional style mask hiding all or part of the configured column.

For instance, instead of seeing your coworker’s actual salary, something like 000.00 would be returned. Instead of seeing the proprietary name for a product, the mask of xxx would be returned. The mask can be all zeroes, in the case of a number, all x’s for characters, a pre-designated mask, or a completely custom mask.

When using front end applications with built in masking capabilities or custom code, this need has long been addressed. SQL Server dynamic masking instead addresses the masking need directly in the data engine. Implementing masking in the engine ensures data is protected regardless of the access method, reducing the work necessary to mask data in multiple user interfaces and reducing the chance of exposing unmasked data. The engine only presents data based on the security model, including masked or unmasked data.

In this introduction blog, I want to do two things. First, introduce masking as a general topic. Then as an appendix to this post, include the code to setup the rest of the series.

Competing Solutions

Other products exist that actually modify the data in the database, typically referred to as static data masking. This can be useful for moving production data to non-production environments. Since the data is actually modified, even users with escalated privileges in these environments can’t override security and view the proprietary data. Data masked dynamically makes no actual changes to your data

The prime use case for physically masking the data is providing a cleansed copy of production data to non-production environments. This contrasts with dynamic data masking, with a use case of hiding sensitive data in production environments. They address very different use cases and some architectures may require both types of products.

Developers and other users with elevated privileges can easily bypass dynamic data masking in non-production environments. Since non-production environments aren’t typically monitored as closely as production environments, physically masking the data in these environments is a good option.

Static masking can be especially important if government regulations are involved or any other situation where the chance of a data breach has to be reduced to the absolute minimum, which as a data professional, should be practically always. With static data masking, the original data is secure since it has been modified. But because the data is modified, some functionality is lost. Connections to external systems, some front-end application functionality, and some testing scenarios are difficult if not impossible.

Considering that the primary function of a database engine is maintaining data and ensuring the integrity of that data, there aren’t many scenarios when static data masking is appropriate for production. Dynamic data masking requires careful planning and a dedicated plan to be sure it works. Modifying the data requires more planning and time.

Alternate Solutions

Dynamic masking is presented as a possible solution when users are allowed to directly query data. There are caveats to this recommendation, including security concerns presented in later sections. Direct ad-hoc access to data isn’t typically something you should allow your users to do against production transactional systems, so that wouldn’t be a recommended use case. That leaves us with warehouses and reporting systems as the obvious candidates.

Using production data is often necessary to thoroughly test systems, especially system integrations. It can take an unfeasible amount of effort to setup data in all necessary interconnected systems and it is likely to still leave gaps in testing scenarios. It is possible to lock down the target data by means other than data masking. Separate schemas, alternate column names, encrypted columns, lookup tables with alternate keys, or simply excluding the target data are all possibilities depending on the business requirements. The primary strength of dynamic data masking is the low effort to implement. Testing and automatic upgrades done by Microsoft is another big strength. It is much harder to justify a custom solution when something is built into the product, patched, and enhanced by the vendor. Dynamic data masking can also be combined with restricting access via views and stored procedures to make it a more secure solution.

SQL Server’s Always Encrypted is another useful solution that meets a different need. The data is encrypted before it reaches the database engine. Client applications have the encryption key, usually in a key store, and send and receive encrypted data to the engine. It is up to the client application to mask data if that functionality is needed. One benefit of Always Encrypted is that data is encrypted before it hits the data engine. This means that even privileged users, such as administrators, can’t see an unencrypted version of the data. Only authorized application users see the data.

Dynamic Data Masking Security

When configured for a column, dynamic data masking is applied for all users unless they have specific authorization to view the unmasked data. Users need to be granted the UNMASK, ALTER ANY MASK, or CONTROL permission on the database to view unmasked data. Starting with SQL Server 2022, the UNMASK permission is more granular and allows unmasking down to a column level.

Supersets of these permission sets are included in the dbo database role and the sysadmin server role. If users only have SELECT permissions and they don’t have super user permissions, the data will be masked. When implementing a new security layer it is always important to verify that security is working as expected. In general, SQL Server security is additive, so if a user is accidentally put into a group or role with permission to unmask data, they will have those permissions.

Dynamic Data Masking Use Cases

In this section I want to cover some of the primary use cases for the Dynamic Data Masking feature. While it isn’t perfect (as I will cover in a later blog about how it can be defeated if you aren’t careful), it has some very exciting uses.

Simply Hiding Sensitive Data from Users

The primary purpose of dynamic data masking is to obfuscate proprietary data from unauthorized users. The data isn’t changed, rather it is simply presented to users in a modified format or as a static value, such as 0 for integers. You can choose your own format if desired, so you might choose to include the first characters of a name or number, part of an email address including the @ sign so the data in the column is clear, or almost any user defined format.

It can also be used to hide data for authorized users in non-secure or public environments. Dynamic masking does this at the database engine level, thus making raw data access more secure. So, if data scientists, report users, testers, business analysts, product owners or other users are able to access data directly, it can still be protected. Dynamic queries, reports, data extracts are all protected. Users with unmasking privileges to the table or column see unmodified data.

Note that as noted, there are some security concerns with using Dynamic Data Masking in ad-hoc scenarios.

 

No Syntax Restrictions

The dynamic portion of dynamic data masking is what makes it powerful. Columns configured with dynamic masking functions exactly like other columns, with the exception that the output differs depending on who accesses the data.

The masked columns can be used to JOIN and for GROUP clauses (the actual value is used, not the masked value you see). It can also be used for WHERE clauses without using the masked values. It can be used in aggregations, functions like substring, in comparison operators, and in joins.

If a column with sensitive data is used as the primary key for a table, and that column is masked, it can still be used as the lookup column in your INNER JOIN while keeping it secure. Histograms based on the salaries of employees while keeping the salaries confidential are possible. Data can be limited to a geographic region without allowing users to see the specific geographic data. All the data can be used for comparison operations, aggregations, joins, any operation normally used by data, while still keeping it reasonably confidential (Later in this series I will show examples of what I mean by “reasonably”.)

This is generally why the feature is mostly for building user interfaces, and not to secure data from nosy ad-hoc users,

Summary

SQL Server Dynamic Data Masking was created to make it easier to obfuscate sensitive data. The relative ease of implementation makes it a nice additional layer when presenting SQL Server data. Future sections include detailed setup examples, caveats for use and security considerations.

 

——————

Appendix, Preparing the test database:

The database I will be using in subsequent blogs is the sample database WideWorldImporters, available in github, using the latest version available. In this section I will provide instructions for setting up and initializing the database for the following blog entries.

Dynamic data masking was added to multiple tables and columns. The columns chosen were determined purely for demonstration purposes and were not evaluated for business utility.

  1. Restore a copy of WideWorldImporters
    1. For an on-Prem installation, use WideWorldImporters-STANDARD.bak
    2. For an Azure database installation, use WideWorldImporters-STANDARD.bacpac
      1. For verification in Azure, the lowest standard tier was used
      2. The FULL version can be used but requires a higher service tier due to the inclusion of memory optimized tables. If you try to import the FULL version to a low tier model in Azure, you will get the following error:

Warning SQL0: A project which specifies SQL Server 2016 as the target platform may experience compatibility issues with Microsoft Azure SQL Database v12.

Error SQL72014: Framework Microsoft SqlClient Data Provider: Msg 40536, Level 16, State 2, Line 1 ‘MEMORY_OPTIMIZED tables’ is not supported in this service tier of the database.

  1. Add users for testing scripts.
    1. MaskedReader for viewing data in a masked state
    2. UnmaskedReader for viewing data in an unmasked state

  1. Mask columns
    1. As installed, WideWorldImporters has 5 masked columns in the Purchasing.Suppliers table and 5 in the corresponding temporal table, Purchasing.Suppliers_Archive.
    2. The following script adds masking to
      1. Application.Countries
      2. Additional columns to Purchasing.Suppliers
      3. Purchasing.SupplierTransactions
      4. Sales.Customers
      5. Sales.Orders
    3. Note that corresponding columns in temporal tables are automatically masked when the parent table is masked.

 

Setup Notes

https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0

WideWorldImporters-STANDARD.bacpac

WideWorldImporters-STANDARD.bak

 

References

https://learn.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking?view=sql-server-ver16

http://wiki.gis.com/wiki/index.php/Decimal_degrees

https://learn.microsoft.com/en-us/sql/t-sql/statements/execute-as-transact-sql?view=sql-server-ver16

https://learn.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver16

https://learn.microsoft.com/en-us/sql/relational-databases/security/row-level-security?view=sql-server-ver16

https://learn.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-tutorial-getting-started?view=sql-server-ver16&tabs=ssms

https://learn.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-database-engine?view=sql-server-ver16