This is part 1 of my Dynamic Data Masking blogs for my upcoming book project. Part 2 is here.
An interesting feature that is being added to SQL Server 2016 is Dynamic Data Masking. What it does is, allow you to show a user a column, but instead of showing them the actual data, it masks it from their view. Like if you have a table that has email addresses, you might want to mask the data so most users can’t see the actual data when they are querying the data. It falls under the head of security features in Books Online (https://msdn.microsoft.com/en-us/library/mt130841.aspx), but as we will see, it doesn’t behave like classic security features, as you will be adding some code to the DDL of the table, and (as of this writing in CTP3.2, the ability to fine tune who can and cannot see unmasked data isn’t really there.)
In a 2016 database, I will be executing my code. I am using a database named SimpleDemos, but any database will do. The version I am using is 3.2, running on a Hyper-V VM on my laptop.:
SELECT @@version;
If you haven’t seen it yet, note that they have added a lot of stuff to @@version these days. Great if you are only using it for purposes like this.
Microsoft SQL Server 2016 (CTP3.2) – 13.0.900.73 (X64) Dec 10 2015 18:49:31 Copyright (c) Microsoft Corporation Enterprise Evaluation Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor)
Now, say we have the following simple table structure (note that I would probably use a domain table for StatusCode values in a real system, but I wanted this to be a single table.
CREATE SCHEMA Demo;
GO
CREATE TABLE Demo.Person –warning, I am using very small column datatypes in this example to make formatting of the output easier
(
PersonId int NOT NULL CONSTRAINT PKPerson PRIMARY KEY,
FirstName nvarchar(10) NULL,
LastName nvarchar(10) NULL,
PersonNumber varchar(10) NOT NULL,
StatusCode varchar(10) CONSTRAINT DFLTPersonStatus DEFAULT (‘New’)
CONSTRAINT CHKPersonStatus CHECK (StatusCode in (‘Active’,’Inactive’,’New’)),
EmailAddress nvarchar(40) NULL,
InceptionTime date NOT NULL, –Time we first saw this person. Usually the row create time, but not always
–a number that I didn’t feel could insult anyone of any origin, ability, etc that I could put in this table
YachtCount tinyint NOT NULL CONSTRAINT DFLTPersonYachtCount DEFAULT (0)
CONSTRAINT CHKPersonYachtCount CHECK (YachtCount >= 0),
);
Into which I will be loading in some simple (silly) data:
INSERT INTO Demo.Person (PersonId,FirstName,LastName,PersonNumber, StatusCode, EmailAddress, InceptionTime,YachtCount)
VALUES(1,’Fred’,’Flintstone’,’0000000014′,’Active’,’fred@flintstone@slatequarry.net’,’1/1/1959′,0),
(2,’Barney’,’Rubble’,’0000000032′,’Active’,’barneyrubble@aol.com’,’8/1/1960′,1),
(3,’Wilma’,’Flintstone’,’0000000102′,’Active’,NULL, ‘1/1/1959’, 1);
Next, we want to let anyone in the company see this data, and we have decided that row level security is too cumbersome, and
(for argument’s sake,) that a view wasn’t the right method to use. (Note: I am not currently sure whether this method is better
than a view, but I can see a few advantages, certainly for a reporting database).
So we have the following rules:
We have three users (which in real life, would be roles, but keeping it simple for the non-relevant details:
1. Those who can see all data (Test User: "BigHat")
2. Those who can only see the first and last name of the person. Test User: ("SmallHat")
The goal is that we set up this system such that this scenario is handled. There are (currently) four different types of masks we can apply to data:
- Default – Takes the default mask of the datatype (NOT the default of the column)
- Email – Masks the email so you only see a few meaningful characters
- Random – Puts a random number in place of an actual number (which is kind of weird, as we will see)
- Partial – where you control what characters to keep and what to replace them with
So let’s take a look at each:
Default
Datatypes have different default masks they will apply, for example: string types are X characters, integers are 0 (which as we will see is confusing.) I will start by giving every column (other than the names) default masks:
ALTER TABLE Demo.Person ALTER COLUMN PersonNumber
ADD MASKED WITH (Function = ‘default()’);
ALTER TABLE Demo.Person ALTER COLUMN StatusCode
ADD MASKED WITH (Function = ‘default()’);
ALTER TABLE Demo.Person ALTER COLUMN EmailAddress
ADD MASKED WITH (Function = ‘default()’);
ALTER TABLE Demo.Person ALTER COLUMN InceptionTime
ADD MASKED WITH (Function = ‘default()’);
ALTER TABLE Demo.Person ALTER COLUMN YachtCount
ADD MASKED WITH (Function = ‘default()’);
Note that you can do this in the CREATE statement if you wish as well. Now, to check to see what this looks like, execute:
SELECT *
FROM Demo.Person;
Which returns:
PersonId FirstName LastName PersonNumber StatusCode EmailAddress InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1 Fred Flintstone 0000000014 Active fred@flintstone@slatequarry.net 1959-01-01 0
2 Barney Rubble 0000000032 Active barneyrubble@aol.com 1960-08-01 1
3 Wilma Flintstone 0000000102 Active NULL 1959-01-01 1
You are probably thinking, what a ripoff (and eventually you will be thinking, why isn’t this changing and it will drive you a bit crazy…) but as the dbo, you are not going to see the masking. So we need to create the users we initially set, and grant them rights:
CREATE USER BigHat WITHOUT LOGIN;
CREATE USER SmallHat WITHOUT LOGIN;
We will get a bit deeper into security in the next entry in this series, but to start with we need to give our users the ability to select from the table.
GRANT SELECT ON Demo.Person TO BigHat;
GRANT SELECT ON Demo.Person TO SmallHat;
Now we can run the following statements and see the effect:
EXECUTE AS User=’SmallHat’;
GO
SELECT *
FROM Demo.Person;
GO
REVERT;
Which now returns:
PersonId FirstName LastName PersonNumber StatusCode EmailAddress InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1 Fred Flintstone xxxx xxxx xxxx 1900-01-01 0
2 Barney Rubble xxxx xxxx xxxx 1900-01-01 0
3 Wilma Flintstone xxxx xxxx NULL 1900-01-01 0
Note that the NULL value still shows up as NULL (meaning ‘UNKNOWN’) and the other values show up as ‘xxxx’, meaning ‘UNKNOWN, other than there IS a value’, which may be all that many users need to actually see. Everyone seems to have started in the database on 1900-01-01 (which may be an issue for some applications, but is at least an unreasonable value for most databases), and YachtCount is all 0s, which was actually the value for Fred, but not the others. Note that I included a default in the table DDL of ‘Unknown’ for StatusCode, but it was ignored for the ‘xxxx’ value, which is mildly disappointing. We will be able to set it to whatever value we want later, but it is nice when you can use the table’s structure to make the data easier.
Next we have a simple mask for Email, so we remove the masking from the EmailAddress column, and add it back (You can currently change the masking without dropping the current masking, but I have seen it act weird… this is just CTP3.2).
ALTER TABLE Demo.Person ALTER COLUMN EmailAddress DROP MASKED;
ALTER TABLE Demo.Person ALTER COLUMN EmailAddress
ADD MASKED WITH (Function = ’email()’);
Now query the data:
EXECUTE AS User=’SmallHat’;
go
SELECT *
FROM Demo.Person;
go
REVERT;
And you can see the email is masked by showing the first character of the email address, then XXX@XXXX.com:
PersonId FirstName LastName PersonNumber StatusCode EmailAddress InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1 Fred Flintstone xxxx xxxx fXXX@XXXX.com 1900-01-01 0
2 Barney Rubble xxxx xxxx bXXX@XXXX.com 1900-01-01 0
3 Wilma Flintstone xxxx xxxx NULL 1900-01-01 0
Looking at the base data, notice that Fred’s email address is ‘fred@flintstone@slatequarry.net’, so the only meaningful character you are giving the user is the first character.
Random
Random is a very interesting function, mostly because it would take a very specific use (that I can’t think of right now) to make it make sense. I added YachtCount to the model so I could demonstrate (and not offend anyone with a count of body parts, cars, etc. I figure the intersection of Yacht owners and people reading this blog to be low enough to risk it.)
So, let’s try the random(start, end) function, that will replace a numeric value with a value between two values:
ALTER TABLE Demo.Person ALTER COLUMN YachtCount
ADD MASKED WITH (Function = ‘random(1,100)’); –make the value between 1 and 100. You could make it always the same value pretty easily by using the same value for start and end
Now, check out the data:
EXECUTE AS User=’SmallHat’;
GO
SELECT *
FROM Demo.Person;
GO
REVERT;
Which will return something along these lines:
PersonId FirstName LastName PersonNumber StatusCode EmailAddress InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1 Fred Flintstone xxxx xxxx fXXX@XXXX.com 1900-01-01 65
2 Barney Rubble xxxx xxxx bXXX@XXXX.com 1900-01-01 92
3 Wilma Flintstone xxxx xxxx NULL 1900-01-01 64
Not the most useful function, but it is there if you need it. If you are like me (pity), then you are wondering about numeric types (I know I was). So let’s create a quick table, named for the Blackadder fans in the crowd:
CREATE TABLE demo.bob
(
value decimal(15,5) MASKED WITH (Function = ‘random(-999999999,999999999)’)
);
GRANT SELECT ON demo.bob to SmallHat;
INSERT INTO demo.bob
VALUES (1),(2),(3);
GO
EXECUTE AS User=’SmallHat’;
GO
SELECT *
FROM Demo.Bob;
GO
REVERT;
Returns:
value
—————————————
222199941.87482
-137196271.70996
-498355155.12580
Partial
The most configurable of the masking functions is partial. It lets you mask the characters you want. For example, our person number. If you want to keep 1 leading character, and 2 trailing, you make a mask such as:
ALTER TABLE Demo.Person ALTER COLUMN PersonNumber
ADD MASKED WITH (Function = ‘partial(1,"——-",2)’); ––note the double quotes on the text
Looking at the data as user Smallhat,
EXECUTE AS User=’SmallHat’;
GO
SELECT *
FROM Demo.Person;
GO
REVERT;
you will see:
PersonId FirstName LastName PersonNumber StatusCode EmailAddress InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1 Fred Flintstone 0——-14 xxxx fXXX@XXXX.com 1900-01-01 99
2 Barney Rubble 0——-32 xxxx bXXX@XXXX.com 1900-01-01 6
3 Wilma Flintstone 0——-02 xxxx NULL 1900-01-01 89
The size of the dashes is controlled by you. If you said (1,”-“2), the first row would be 0-14. If you said (1,”——————-“,14), the first row would be 0———, as the returned value always fits in the original datatype.
Finally, we want StatusCode to default to ‘Unknown’ when the value is masked. You can’t pass parameters to default(), so you need to use partial to replace the entire value (slight dig on the name, pattern might have
been better, but it is what it is.)
ALTER TABLE Demo.Person ALTER COLUMN StatusCode
ADD MASKED WITH (Function = ‘partial(0,"Unknown",0)’);
Checking out the data:
EXECUTE AS User=’SmallHat’;
GO
SELECT *
FROM Demo.Person;
GO
REVERT;
GO
You see that the data now says: ‘Unknown’:
PersonId FirstName LastName PersonNumber StatusCode EmailAddress InceptionTime YachtCount
———– ———- ———- ———— ———- —————————————- ————- ———-
1 Fred Flintstone 0——-14 Unknown fXXX@XXXX.com 1900-01-01 39
2 Barney Rubble 0——-32 Unknown bXXX@XXXX.com 1900-01-01 18
3 Wilma Flintstone 0——-02 Unknown NULL 1900-01-01 12
Summary, Blog 1
Of course, whether that, or any of the methods we have seen here make sense is really a matter of design. If a value doesn’t look masked, that may signal a user that it isn’t masked. This particular blog is all about how the data masking feature works in doing the masking. In the next blog we will leave the data masked as it is, and we will look at what we can do via security. How do we let non-dbo users access the data, and what happens with WHERE clauses and UPDATE statements? What happens when the masked table is accessed through a view or stored procedure? See you real soon with answers to these and other questions.
Continue reading to Part 2
Load comments