Dynamic Data Masking – Part 1 – The Mechanism

Comments 0

Share to social media

This is part 1 of my Dynamic Data Masking blogs for my upcoming book project. Part 2 is here.

An interesting feature that is being added to SQL Server 2016 is Dynamic Data Masking. What it does is, allow you to show a user a column, but instead of showing them the actual data, it masks it from their view. Like if you have a table that has email addresses, you might want to mask the data so most users can’t see the actual data when they are querying the data. It falls under the head of security features in Books Online (https://msdn.microsoft.com/en-us/library/mt130841.aspx), but as we will see, it doesn’t behave like classic security features, as you will be adding some code to the DDL of the table, and (as of this writing in CTP3.2, the ability to fine tune who can and cannot see unmasked data isn’t really there.)

In a 2016 database, I will be executing my code. I am using a database named SimpleDemos, but any database will do.  The version I am using is 3.2, running on a Hyper-V VM on my laptop.:

SELECT @@version;

If you haven’t seen it yet, note that they have added a lot of stuff to @@version these days. Great if you are only using it for purposes like this.

Microsoft SQL Server 2016 (CTP3.2) – 13.0.900.73 (X64)   Dec 10 2015 18:49:31   Copyright (c) Microsoft Corporation  Enterprise Evaluation Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor)

Now, say we have the following simple table structure (note that I would probably use a domain table for StatusCode values in a real system, but I wanted this to be a single table.

CREATE SCHEMA Demo;
GO
CREATE TABLE Demo.Person –warning, I am using very small column datatypes in this example to make formatting of the output easier
(
    PersonId    int NOT NULL CONSTRAINT PKPerson PRIMARY KEY,
    FirstName    nvarchar(10) NULL,
    LastName    nvarchar(10) NULL,
    PersonNumber varchar(10) NOT NULL,
    StatusCode    varchar(10) CONSTRAINT DFLTPersonStatus DEFAULT (‘New’)
                            CONSTRAINT CHKPersonStatus CHECK (StatusCode in (‘Active’,’Inactive’,’New’)),
    EmailAddress nvarchar(40) NULL,
    InceptionTime date NOT NULL, –Time we first saw this person. Usually the row create time, but not always
    –a number that I didn’t feel could insult anyone of any origin, ability, etc that I could put in this table
    YachtCount   tinyint NOT NULL CONSTRAINT DFLTPersonYachtCount DEFAULT (0)
                            CONSTRAINT CHKPersonYachtCount CHECK (YachtCount >= 0),
);

Into which I will be loading in some simple (silly) data:

INSERT INTO Demo.Person (PersonId,FirstName,LastName,PersonNumber, StatusCode, EmailAddress, InceptionTime,YachtCount)
VALUES(1,’Fred’,’Flintstone’,’0000000014′,’Active’,’fred@flintstone@slatequarry.net’,’1/1/1959′,0),
      (2,’Barney’,’Rubble’,’0000000032′,’Active’,’barneyrubble@aol.com’,’8/1/1960′,1),
      (3,’Wilma’,’Flintstone’,’0000000102′,’Active’,NULL, ‘1/1/1959’, 1);

Next, we want to let anyone in the company see this data, and we have decided that row level security is too cumbersome, and
(for argument’s sake,) that a view wasn’t the right method to use. (Note: I am not currently sure whether this method is better
than a view, but I can see a few advantages, certainly for a reporting database).

So we have the following rules:

We have three users (which in real life, would be roles, but keeping it simple for the non-relevant details:
    1. Those who can see all data (Test User: "BigHat")
    2. Those who can only see the first and last name of the person. Test User: ("SmallHat")
   
The goal is that we set up this system such that this scenario is handled. There are (currently) four different types of masks we can apply to data:

  • Default – Takes the default mask of the datatype (NOT the default of the column)
  • Email – Masks the email so you only see a few meaningful characters
  • Random – Puts a random number in place of an actual number (which is kind of weird, as we will see)
  • Partial – where you control what characters to keep and what to replace them with

So let’s take a look at each:

Default

Datatypes have different default masks they will apply, for example: string types are X characters, integers are 0 (which as we will see is confusing.)  I will start by giving every column (other than the names) default masks:

ALTER TABLE Demo.Person ALTER COLUMN PersonNumber
    ADD MASKED WITH (Function = ‘default()’);
ALTER TABLE Demo.Person ALTER COLUMN StatusCode
    ADD MASKED WITH (Function = ‘default()’);
ALTER TABLE Demo.Person ALTER COLUMN EmailAddress
    ADD MASKED WITH (Function = ‘default()’);
ALTER TABLE Demo.Person ALTER COLUMN InceptionTime
    ADD MASKED WITH (Function = ‘default()’);
ALTER TABLE Demo.Person ALTER COLUMN YachtCount
    ADD MASKED WITH (Function = ‘default()’);

Note that you can do this in the CREATE statement if you wish as well. Now, to check to see what this looks like, execute:

SELECT *
FROM    Demo.Person;

Which returns:

PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1           Fred       Flintstone 0000000014   Active     fred@flintstone@slatequarry.net          1959-01-01    0
2           Barney     Rubble     0000000032   Active     barneyrubble@aol.com                     1960-08-01    1
3           Wilma      Flintstone 0000000102   Active     NULL                                     1959-01-01    1

You are probably thinking, what a ripoff (and eventually you will be thinking, why isn’t this changing and it will drive you a bit crazy…) but as the dbo, you are not going to see the masking. So we need to create the users we initially set, and grant them rights:

CREATE USER BigHat WITHOUT LOGIN;
CREATE USER SmallHat WITHOUT LOGIN;

We will get a bit deeper into security in the next entry in this series, but to start with we need to give our users the ability to select from the table.

GRANT SELECT ON Demo.Person TO BigHat;
GRANT SELECT ON Demo.Person TO SmallHat;

Now we can run the following statements and see the effect:

EXECUTE AS User=’SmallHat’;
GO
SELECT *
FROM   Demo.Person;
GO
REVERT;

Which now returns:

PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1           Fred       Flintstone xxxx         xxxx       xxxx                                     1900-01-01    0
2           Barney     Rubble     xxxx         xxxx       xxxx                                     1900-01-01    0
3           Wilma      Flintstone xxxx         xxxx       NULL                                     1900-01-01    0

Note that the NULL value still shows up as NULL (meaning ‘UNKNOWN’) and the other values show up as ‘xxxx’, meaning ‘UNKNOWN, other than there IS a value’, which may be all that many users need to actually see. Everyone seems to have started in the database on 1900-01-01 (which may be an issue for some applications, but is at least an unreasonable value for most databases), and YachtCount is all 0s, which was actually the value for Fred, but not the others. Note that I included a default in the table DDL of ‘Unknown’ for StatusCode, but it was ignored for the ‘xxxx’ value, which is mildly disappointing. We will be able to set it to whatever value we want later, but it is nice when you can use the table’s structure to make the data easier.

Email

Next we have a simple mask for Email, so we remove the masking from the EmailAddress column, and add it back (You can currently change the masking without dropping the current masking, but I have seen it act weird… this is just CTP3.2).

ALTER TABLE Demo.Person ALTER COLUMN EmailAddress DROP MASKED;

ALTER TABLE Demo.Person ALTER COLUMN EmailAddress
    ADD MASKED WITH (Function = ’email()’);

Now query the data:

EXECUTE AS User=’SmallHat’;
go 
SELECT * 
FROM   Demo.Person;
go 
REVERT;

And you can see the email is masked by showing the first character of the email address, then XXX@XXXX.com:

PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1           Fred       Flintstone xxxx         xxxx       fXXX@XXXX.com                            1900-01-01    0
2           Barney     Rubble     xxxx         xxxx       bXXX@XXXX.com                            1900-01-01    0
3           Wilma      Flintstone xxxx         xxxx       NULL                                     1900-01-01    0

Looking at the base data, notice that Fred’s email address is ‘fred@flintstone@slatequarry.net’, so the only meaningful character you are giving the user is the first character.

Random

Random is a very interesting function, mostly because it would take a very specific use (that I can’t think of right now) to make it make sense. I added YachtCount to the model so I could demonstrate (and not offend anyone with a count of body parts, cars, etc. I figure the intersection of Yacht owners and people reading this blog to be low enough to risk it.)

So, let’s try the random(start, end) function, that will replace a numeric value with a value between two values:

ALTER TABLE Demo.Person ALTER COLUMN YachtCount
    ADD MASKED WITH (Function = ‘random(1,100)’); –make the value between 1 and 100. You could make it always the same value pretty easily by using the same value for start and end

Now, check out the data:

EXECUTE AS User=’SmallHat’;
GO
SELECT *
FROM   Demo.Person;
GO
REVERT;

Which will return something along these lines:

PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1           Fred       Flintstone xxxx         xxxx       fXXX@XXXX.com                            1900-01-01    65
2           Barney     Rubble     xxxx         xxxx       bXXX@XXXX.com                            1900-01-01    92
3           Wilma      Flintstone xxxx         xxxx       NULL                                     1900-01-01    64

Not the most useful function, but it is there if you need it. If you are like me (pity), then you are wondering about numeric types (I know I was). So let’s create a quick table, named for the Blackadder fans in the crowd:

CREATE TABLE demo.bob
(
    value decimal(15,5) MASKED WITH (Function = ‘random(-999999999,999999999)’)
);
GRANT SELECT ON demo.bob to SmallHat;
INSERT INTO demo.bob
VALUES (1),(2),(3);
GO

EXECUTE AS User=’SmallHat’;
GO
SELECT *
FROM   Demo.Bob;
GO
REVERT;

Returns:

value
—————————————
222199941.87482
-137196271.70996
-498355155.12580

Partial

The most configurable of the masking functions is partial. It lets you mask the characters you want. For example, our person number. If you want to keep 1 leading character, and 2 trailing, you make a mask such as:

ALTER TABLE Demo.Person ALTER COLUMN PersonNumber
    ADD MASKED WITH (Function = ‘partial(1,"——-",2)’); ––note the double quotes on the text

Looking at the data as user Smallhat,

EXECUTE AS User=’SmallHat’;
GO
SELECT *
FROM   Demo.Person;
GO
REVERT;

you will see:

PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1           Fred       Flintstone 0——-14   xxxx       fXXX@XXXX.com                            1900-01-01    99
2           Barney     Rubble     0——-32   xxxx       bXXX@XXXX.com                            1900-01-01    6
3           Wilma      Flintstone 0——-02   xxxx       NULL                                     1900-01-01    89

The size of the dashes is controlled by you. If you said (1,”-“2), the first row would be 0-14. If you said (1,”——————-“,14), the first row would be 0———, as the returned value always fits in the original datatype.

Finally, we want StatusCode to default to ‘Unknown’ when the value is masked. You can’t pass parameters to default(), so you need to use partial to replace the entire value (slight dig on the name, pattern might have
been better, but it is what it is.)

ALTER TABLE Demo.Person ALTER COLUMN StatusCode
    ADD MASKED WITH (Function = ‘partial(0,"Unknown",0)’);

Checking out the data:

EXECUTE AS User=’SmallHat’;
GO
SELECT *
FROM   Demo.Person;
GO
REVERT;
GO

You see that the data now says: ‘Unknown’:

PersonId    FirstName  LastName   PersonNumber StatusCode EmailAddress                             InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1           Fred       Flintstone 0——-14   Unknown    fXXX@XXXX.com                            1900-01-01    39
2           Barney     Rubble     0——-32   Unknown    bXXX@XXXX.com                            1900-01-01    18
3           Wilma      Flintstone 0——-02   Unknown    NULL                                     1900-01-01    12

Summary, Blog 1

Of course, whether that, or any of the methods we have seen here make sense is really a matter of design. If a value doesn’t look masked, that may signal a user that it isn’t masked. This particular blog is all about how the data masking feature works in doing the masking. In the next blog we will leave the data masked as it is, and we will look at what we can do via security. How do we let non-dbo users access the data, and what happens with WHERE clauses and UPDATE statements? What happens when the masked table is accessed through a view or stored procedure? See you real soon with answers to these and other questions.

Continue reading to Part 2

Load comments

About the author

Louis Davidson

Simple Talk Editor

See Profile

Louis is the editor of this Simple-Talk website. Prior to that, has was a corporate database developer and data architect for a non-profit organization for 25 years! Louis has been a Microsoft MVP since 2004, and is the author of a series of SQL Server Database Design books, most recently Pro SQL Server Relational Database Design and Implementation.