SQL Server 2016 introduced dynamic data masking (DDM) as a way to prevent unauthorized users from viewing certain types of sensitive information. The database engine masks the data when it is retrieved from the database, based on masking rules defined on the column schema. The data stored within the database remains unchanged.
When a user queries the database, the database engine determines whether that user account has the permissions necessary to access the data in its unmasked state. If the account does not, the engine applies the masking rules to the data when it is returned as part of the query. In this way, you can mask all or part of sensitive data such as national identification numbers, credit card numbers, birth dates, phone numbers, or other types of information.
To implement DDM, you define masking rules on the columns that contain the data you want to protect. For each column, you add the MASKED WITH clause to the column definition, using the following syntax:
1 |
MASKED WITH (FUNCTION = '<em><function></em>(<em><arguments></em>)') |
You start by replacing the <function> placeholder with one of four functions: default, email, random, or partial. For the random and partial functions, you must also provide parameter values, as specified by the <arguments> placeholder. We’ll get into the specifics of how all this works as we go through the examples.
You can add the MASKED WITH clause to your column definition when you create a table or afterwards by using an ALTER TABLE statement. It does not matter whether the column already contains data. There are some limitations, however. For example, you cannot apply DDM to computed columns or Always Encrypted columns.
While we’re on the topic of Always Encrypted, it’s worth noting that this is the fourth article in a series related to SQL Server encryption. The following links point to the first three articles.
- Encrypting SQL Server: Using an Encryption Hierarchy to Protect Column Data
- Encrypting SQL Server: Transparent Data Encryption (TDE)
- SQL Server Encryption: Always Encrypted
I’ve included DDM in this series because Microsoft documentation (SQL Server Encryption) implies that DDM is a type of SQL Server encryption. It is not. The feature simply masks data for non-privileged users upon querying a protected column. No data is being encrypted at rest or in motion. The database engine merely replaces the sensitive data with non-identifying characters. Even Microsoft admits that “unprivileged users with ad-hoc query permissions can apply techniques to gain access to the actual data.”
Although DDM is not really encryption, it still seems worth including it in this series, if for no other reason than to be complete. The DDM feature is easy to implement and requires no changes to the queries themselves. You need only update your column definitions and perhaps tweak the permissions on certain accounts. But keep in mind that DDM should be used only as part of a much larger strategy for protecting data, a strategy that will likely include real encryption.
Applying a default mask
The first DDM function we’ll tackle is default. When you use this function to mask data, and a user with read-only privileges queries that data, the database engine masks the entire value and returns a replacement value. The exact nature of the replacement value depends on the column’s data type.
It is important to note that the default function does not take any arguments. Microsoft’s documentation is a bit confusing in this regard because it suggests otherwise, but you need only specify the function name and an empty set of parentheses.
To demonstrate how this works, we’ll start by creating a table and populating it with data from the AdventureWorks2014 database. When creating the table, we’ll use the default function to mask four of the columns, as shown in the following T-SQL script:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
USE master; GO CREATE DATABASE EmpData4; GO USE EmpData4; GO CREATE TABLE EmpInfo( EmpID INT PRIMARY KEY, FirstName NVARCHAR(50) NOT NULL, LastName NVARCHAR(50) MASKED WITH (FUNCTION = 'default()') NOT NULL, Birthdate DATE MASKED WITH (FUNCTION = 'default()') NOT NULL, CurrentFlag BIT MASKED WITH (FUNCTION = 'default()') NOT NULL, SalesLastYear MONEY MASKED WITH (FUNCTION = 'default()') NOT NULL, EmailAddress NVARCHAR(50), SickLeave INT, SalesYTD MONEY, NatID NVARCHAR(15), PhoneNumber NVARCHAR(25)); GO INSERT INTO EmpInfo SELECT e.BusinessEntityID, sp.FirstName, sp.LastName, e.BirthDate, e.CurrentFlag, sp.SalesLastYear, sp.EmailAddress, e.SickLeaveHours, sp.SalesYTD, e.NationalIDNumber, sp.PhoneNumber FROM AdventureWorks2014.HumanResources.Employee e INNER JOIN AdventureWorks2014.Sales.vSalesPerson sp ON e.BusinessEntityID = sp.BusinessEntityID WHERE sp.CountryRegionName = 'United States'; |
The table includes a number of extra columns, which we’ll be using in subsequent examples. For now, let’s focus on the four DDM columns.
The definition for each of these columns includes the MASKED WITH clause, which must come before the NULL or NOT NULL option. Within the clause, we specify the DDM function, in this case, default, enclosed in single quotes.
With our table in place and populated, let’s query the masked columns, using the same privileged account we used to create the table:
1 2 3 4 |
SELECT TOP 5 EmpID, FirstName, LastName, Birthdate, CurrentFlag, SalesLastYear FROM EmpInfo; |
Because we’re running the query under a privileged account, we have full access to the data, despite having implemented the masking rules. For this reason, the SELECT statement will return the same results (shown in the following table) that we would get if the masking rules had not been added to the column definitions.
EmpID | FirstName | LastName | Birthdate | CurrentFlag | SalesLastYear |
274 | Stephen | Jiang | 1951-10-17 | 1 | 0.00 |
275 | Michael | Blythe | 1968-12-25 | 1 | 1750406.4785 |
276 | Linda | Mitchell | 1980-02-27 | 1 | 1439156.0291 |
277 | Jillian | Carson | 1962-08-29 | 1 | 1997186.2037 |
279 | Tsvi | Reiter | 1974-01-18 | 1 | 1849640.9418 |
Notice that we did not have to change our query in any way and that the data is unchanged. When the database engine saw that we had access rights to that data, it returned the values just like they’re stored in the database.
Controlling access to masked data
The key to controlling access to masked data is in the privileges granted to the user accounts. When I created the table and then queried the data, I was logged in with an administrative account. However, an account that has been granted only SELECT permission on the table will see the data as masked.
To test this out, let’s start by creating a local user account on the EmpData4 database and granting that account the SELECT permission:
1 2 3 4 5 |
USE EmpData4; GO CREATE USER user1 WITHOUT LOGIN; GRANT SELECT ON OBJECT::dbo.EmpInfo TO user1; GO |
This should all be fairly straightforward. We’ve created an account in the database that is not linked to a login (to keep things simple for our testing), and then we granted the SELECT permission to that account to allow read access to the EmpInfo table.
Now let’s run our SELECT statement again, only this time, we’ll do it within the context of the new user:
1 2 3 4 5 6 |
EXECUTE AS USER = 'user1'; SELECT TOP 5 EmpID, FirstName, LastName, Birthdate, CurrentFlag, SalesLastYear FROM EmpInfo; REVERT; |
We’re simply using the EXECUTE AS and REVERT statements to get the data that the new user would see. As the following table shows, the values in the four columns are now very different from how they’re stored in the database.
EmpID | FirstName | LastName | Birthdate | CurrentFlag | SalesLastYear |
274 | Stephen | xxxx | 1900-01-01 | 0 | 0.00 |
275 | Michael | xxxx | 1900-01-01 | 0 | 0.00 |
276 | Linda | xxxx | 1900-01-01 | 0 | 0.00 |
277 | Jillian | xxxx | 1900-01-01 | 0 | 0.00 |
279 | Tsvi | xxxx | 1900-01-01 | 0 | 0.00 |
To begin with, the database engine returns each LastName value as xxxx. The LastName column is defined with the NVARCHAR(50) data type. The database engine returns the xxxx value for all masked character data, unless the column is defined at a smaller size. For example, a CHAR(2) column will return a masked value of xx.
The database engine handles date/time values differently. Consider the Birthdate column, which is configured with the DATE data type. Each value is returned as 1900-01-01 (or whatever format is consistent with your regional settings). If this had been some variation of the DATETIME data type, such as DATETIME2, the returned values would be 1900-01-01 00:00:00:0000000.
For numeric columns, the database engine returns 0 or 0.00, if decimals are involved. In this case, the CurrentFlag column is configured with the BIT data type, which qualifies as numeric, and the SalesLastYear is configured with the MONEY data type.
Regardless of the data type, the default function results in the entire value being somehow masked, which means an unauthorized user would not be able to infer the actual value other than to know the type of data, such as character or date.
In some cases, you might want to allow a user to view the data as unmasked. To do so, you would grant the UNMASK permission to the account:
1 2 |
GRANT UNMASK TO user1; GO |
Now when you run the SELECT statement as User1, the results will show the unmasked data. However, if you then decide that the account should not be able to access the data as unmasked, you can revoke the permission:
1 2 |
REVOKE UNMASK TO user1; GO |
After you run this statement, the user will again see the data as masked, as defined by the masking rules in the column definitions.
Verifying column masks
If you want to view which columns are masked in your database and how that masking is implemented, you can use the sys.masked_columns system view, as shown in the following SELECT statement:
1 2 3 4 5 |
SELECT OBJECT_NAME(object_id) TableName, name ColumnName, masking_function MaskFunction FROM sys.masked_columns ORDER BY TableName, ColumnName; |
If you run the statement against our example database, you should see the following results.
TableName | ColumnName | MaskFunction |
EmpInfo | Birthdate | default() |
EmpInfo | CurrentFlag | default() |
EmpInfo | LastName | default() |
EmpInfo | SalesLastYear | default() |
In this case, we’ve defined masking rules on only four columns in the EmpInfo table, with all rules using the default function. If we had used functions that require arguments, those arguments would be shown here as well.
Applying an email mask
The next function we’ll look at is email, which is used specifically to mask email addresses. The email function is as simple to implement as the default function, except that it returns somewhat different results. The default function masks the entire value, whereas the email function masks all but the first letter and, in some cases, the Internet domain (e.g., .com, .org, .edu).
The behavior of the email function might seem a bit odd because it returns the .com domain for all email addresses. For example, if the email address ends in .org or .edu, the masked value will read .com, as it would if the domain actually were .com. Another way to look at this is that the function masks the domain for all email addresses except those that end with .com.
Regardless of the domain issue, the process of defining the masking rule is quite straightforward:
1 2 3 |
ALTER TABLE EmpInfo ALTER COLUMN EmailAddress NVARCHAR(50) MASKED WITH (FUNCTION = 'email()') NULL; |
In this case, we’re simply modifying the EmailAddress column to include the masking rule, with the email function specified. Now let’s query the table as our privileged user:
1 2 |
SELECT TOP 5 EmpID, EmailAddress FROM EmpInfo; |
As expected, the SELECT statement returns the results shown in the following table.
EmpID | EmailAddress |
274 | stephen0@adventure-works.com |
275 | michael9@adventure-works.com |
276 | linda3@adventure-works.com |
277 | jillian0@adventure-works.com |
279 | tsvi0@adventure-works.com |
This time, we’ll run the SELECT statement as user1:
1 2 3 4 |
EXECUTE AS USER = 'user1'; SELECT TOP 5 EmpID, EmailAddress FROM EmpInfo; REVERT; |
Now our results now look quite different.
EmpID | EmailAddress |
274 | sXXX@XXXX.com |
275 | mXXX@XXXX.com |
276 | lXXX@XXXX.com |
277 | jXXX@XXXX.com |
279 | tXXX@XXXX.com |
That’s all there is to masking email addresses. You might decide it’s just as easy to go with the default function to mask the data. In most cases, the value xxxx will work as effectively as tXXX@XXXX.com. The only advantage to the latter is that it telegraphs to the user that this value is a masked email address, and perhaps that’s exactly what you want to do.
Applying a random mask
The next function available to the masking rules is random, which replaces numerical values with a random value within a specified range. For example, we can apply the function to the SickLeave column, which is configured with the INT data type:
1 2 3 |
ALTER TABLE EmpInfo ALTER COLUMN SickLeave INT MASKED WITH (FUNCTION = 'random(1, 5)') NOT NULL; |
In this case, we want the returned random value to fall between 1 and 5. We define the range by adding the range limits as arguments to the function. We can do something similar with the SalesYTD column, which is configured with the MONEY data type:
1 2 3 |
ALTER TABLE EmpInfo ALTER COLUMN SalesYTD MONEY MASKED WITH (FUNCTION = 'random(101, 999)') NOT NULL; |
This time we want the returned values to fall between 101 and 999. Now let’s run a query against these columns using our privileged account:
1 2 |
SELECT TOP 5 EmpID, SickLeave, SalesYTD FROM EmpInfo; |
Not surprisingly, we get the results shown in the following table.
EmpID | SickLeave | SalesYTD |
274 | 27 | 559697.5639 |
275 | 39 | 3763178.1787 |
276 | 33 | 4251368.5497 |
277 | 32 | 3189418.3662 |
279 | 34 | 2315185.611 |
Next, we’ll query the table as user1:
1 2 3 4 |
EXECUTE AS USER = 'user1'; SELECT TOP 5 EmpID, SickLeave, SalesYTD FROM EmpInfo; REVERT; |
Now the returned values fall within the specified ranges, as shown in the following table.
EmpID | SickLeave | SalesYTD |
274 | 2 | 335.2532 |
275 | 1 | 814.2846 |
276 | 3 | 142.2145 |
277 | 2 | 428.1147 |
279 | 5 | 290.1132 |
Notice that for the SalesYTD column, the database engine preserves the decimal, but still returns values within the specified range.
Be careful when using the random function so you don’t create a situation in which users that need accurate data assume that the information they’re seeing is correct. As with the email function, you might be better off using the default function. At least that way, users are seeing all zeroes and will likely realize that the data is inaccurate. On the other hand, if your intention is to deceive non-privileged users, then by all means, use the random function.
Applying a partial mask
The final function available to the masking rules is partial, which allows you to specify exactly which values are masked. This function works best for column data that is consistently formatted, such as credit card numbers or national IDs.
To demonstrate how the partial function works, we’ll start by applying it to the NatID column, which stores social security numbers, without the hyphens. For these values, we want to mask all but the last four digits, as shown in the following ALTER TABLE statement:
1 2 3 |
ALTER TABLE EmpInfo ALTER COLUMN NatID NVARCHAR(15) MASKED WITH (FUNCTION = 'partial(0, "xxxxx", 4)') NOT NULL; |
To implement a partial masking rule, we must specify three arguments: how many opening characters to reveal, how many middle characters to mask and how to mask them, and how many trailing characters to reveal. In this case, we are revealing no opening characters (0), masking the first five characters with x values (xxxxx), and revealing the last four characters (4).
Now we’ll do something similar for the PhoneNumber column, only this time, we’ll reveal the first four characters, apply the value xxx-xxxx to the next seven characters, and reveal no trailing characters:
1 2 3 |
ALTER TABLE EmpInfo ALTER COLUMN PhoneNumber NVARCHAR(25) MASKED WITH (FUNCTION = 'partial(4, "xxx-xxxx", 0)') NULL; |
To see what the data looks like, we’ll query the table with our privileged account:
1 2 |
SELECT TOP 5 EmpID, NatID, PhoneNumber FROM EmpInfo; |
As the following table shows, the results include the same data that’s stored in the database.
EmpID | NatID | PhoneNumber |
274 | 502097814 | 238-555-0197 |
275 | 841560125 | 257-555-0154 |
276 | 191644724 | 883-555-0116 |
277 | 615389812 | 517-555-0117 |
279 | 716374314 | 664-555-0112 |
Next, let’s query the table as user1:
1 2 3 4 |
EXECUTE AS USER = 'user1'; SELECT TOP 5 EmpID, NatID, PhoneNumber FROM EmpInfo; REVERT; |
This time, the NatID values and PhoneNumber values are masked according to our masking rules.
EmpID | NatID | PhoneNumber |
274 | xxxxx7814 | 238-xxx-xxxx |
275 | xxxxx0125 | 257-xxx-xxxx |
276 | xxxxx4724 | 883-xxx-xxxx |
277 | xxxxx9812 | 517-xxx-xxxx |
279 | xxxxx4314 | 664-xxx-xxxx |
The partial function can be a particularly useful tool, as long as the column values are in a consistent format. If they’re not, you might need to separate those values into different columns or come up with another strategy.
Verifying column masks, again
Now that we have all our masking rules in place, let’s query the sys.masked_columns system view once more:
1 2 3 4 5 |
SELECT OBJECT_NAME(object_id) TableName, name ColumnName, masking_function MaskFunction FROM sys.masked_columns ORDER BY TableName, ColumnName; |
Our results should now look similar to those shown in the following table.
TableName | ColumnName | MaskFunction |
EmpInfo | Birthdate | default() |
EmpInfo | CurrentFlag | default() |
EmpInfo | EmailAddress | email() |
EmpInfo | LastName | default() |
EmpInfo | NatID | partial(0, “xxxxx”, 4) |
EmpInfo | PhoneNumber | partial(4, “xxx-xxxx”, 0) |
EmpInfo | SalesLastYear | default() |
EmpInfo | SalesYTD | random(101, 999) |
EmpInfo | SickLeave | random(1, 5) |
As pointed out earlier, the results include the function arguments, where applicable, making it easier to pinpoint where you might be having issues should your masked data not be exactly what you expect.
Masking columns data
The DDM feature in SQL Server 2016 is a handy way to mask data without needing to modify your queries or applications. The four functions supported by the masking rules—default, email, random, and partial—are simple to implement within your column definitions. And because masking is controlled through SQL Server permissions, you can determine exactly who gets to see the unmasked data and who does not. Again, just be sure that masking is part of a larger security strategy and not your sole solution for protecting sensitive data.
Load comments