{"id":97181,"date":"2023-06-27T18:56:37","date_gmt":"2023-06-27T18:56:37","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=97181"},"modified":"2023-07-24T18:57:40","modified_gmt":"2023-07-24T18:57:40","slug":"unmasking-sql-server-dynamic-data-masking-part-4-unmasking-formatted-and-unformatted-text","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/unmasking-sql-server-dynamic-data-masking-part-4-unmasking-formatted-and-unformatted-text\/","title":{"rendered":"Unmasking SQL Server Dynamic Data Masking, Part 4, Unmasking Formatted and Unformatted Text"},"content":{"rendered":"<p style=\"margin: 0in; font-family: Calibri; font-size: 11.0pt;\"><p><strong>This is part of a series on Dynamic Data Masking by <a href=\"https:\/\/www.red-gate.com\/simple-talk\/author\/ben-johnston\/\">Ben Johnston<\/a>. For the rest of the series, <a href=\"https:\/\/www.red-gate.com\/simple-talk\/tag\/BenJohnston_DynamicDataMasking\/\">click here<\/a><\/strong><\/p>\n<\/p>\n<p>Continuing from the previous entry in this series on security concerns with using Dynamic Data Masking, in this blog I want to close out that discussion showing how you can unmask different sort of data that has been dynamically masked, even if you don&#8217;t have access to unmasked data.<\/p>\n<p>Clearly this is NOT a suggestion for how you might break the text but is far more of an exercise to show you how a bad actor may attempt to look at your data in ways that would generally not cause red flags.<\/p>\n<p>It is especially important to reinforce the sentiment that Dynamic Data Masking less of a security tool to prevent attacks, but more to hide data from general viewing, and as a tool for building applications where the data still is accessible in some scenarios and not others.<\/p>\n<p>In this blog I will focus first on known formatted values, then will show a technique for brute force unmasking data. The big thing to understand about dynamic masking is that unmasking by a bad actor is only necessary if they are accessing the data as a normal user. If they have acquired your database, they will have unfettered access to the data, unlike if you use any kind of static masking. Static masking is used to hide data values permanently, as the data is overwritten. That makes it great for development data, but not when the masked data has a meaning to other users.<\/p>\n<h2>Formatted Text<\/h2>\n<p>The process for unmasking IP addresses, social security numbers (SSN), or any specialized format, is to understand the valid values for the format, break the column down into manageable sections, join to a table containing unmasked versions of the entire target set, and show the results using a combination of the masked table and the unmasked table. When joining to the unmasked table, the join will typically happen multiple times, one join for each masked section. The multiple joins to these small sections make it very fast but they also complicate the query a little. Considering that unmasking data isn&#8217;t normal functionality, the complexity is to be expected, and is like any other join.<\/p>\n<h3>IP Addresses<\/h3>\n<p>For the IP address unmasking a simple numbers table is used. Each octet is separated using a substring and joined to the numbers table. This is where the data exploration becomes critical. IP addresses are represented using a standard format, but that doesn&#8217;t mean the database developers storing the IP address followed those standards. If someone was trying to save space or just used a different format for unknown reasons, that format will need to be determined.<\/p>\n<p>Changing the delimiter is not a substantive change to the unmasking process since the delimiter is skipped in this sample. In this first script will set up objects to contain some fake IP addresses to unmask. (Note, setup is the same as used in <a href=\"https:\/\/www.red-gate.com\/simple-talk\/blogs\/unmasking-sql-server-dynamic-data-masking-part-3-security-concerns\/\">part 3<\/a>)<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE WideWorldImporters\r\nGO\r\nSET NOCOUNT ON\r\nGO\r\n--NOTE: Run with an account with create objects rights, such as db_owner\r\n\/*\r\n* Script to setup a table with fake IP addresses\r\n*\r\n* Creates the dbo.Host table\r\n* Masks the IPAddress column\r\n*\/\r\nCREATE TABLE dbo.Host (\r\n    HostID  int NOT NULL  \r\n       CONSTRAINT PK_dboHost PRIMARY KEY CLUSTERED        IDENTITY\r\n    ,IPAddress                char(15)        NULL\r\n)\r\nGO\r\nDECLARE @IPList TABLE (\r\n    IPOctet char(3) PRIMARY KEY CLUSTERED\r\n)\r\nINSERT INTO @IPList (\r\n    IPOctet\r\n)\r\nSELECT TOP 255\r\n    REPLICATE('0',3-LEN(ROW_NUMBER() OVER(ORDER BY object_id))) + CONVERT(varchar(3),ROW_NUMBER() OVER(ORDER BY object_id)) OCTET\r\nFROM sys.objects\r\n;\r\nWITH OCTECT1_CTE AS (\r\n    SELECT\r\n        ROW_NUMBER() OVER(ORDER BY NewID()) SELECT_CRITERIA\r\n        ,IPOctet\r\n    FROM @IPList\r\n)\r\n,OCTECT2_CTE AS (\r\n    SELECT\r\n        ROW_NUMBER() OVER(ORDER BY NewID()) SELECT_CRITERIA\r\n        ,IPOctet\r\n    FROM @IPList\r\n)\r\n,OCTECT3_CTE AS (\r\n    SELECT\r\n        ROW_NUMBER() OVER(ORDER BY NewID()) SELECT_CRITERIA\r\n        ,IPOctet\r\n    FROM @IPList\r\n)\r\n,OCTECT4_CTE AS (\r\n    SELECT\r\n        ROW_NUMBER() OVER(ORDER BY NewID()) SELECT_CRITERIA\r\n        ,IPOctet\r\n    FROM @IPList\r\n)\r\n--SELECT * FROM OCTECT1_CTE\r\nINSERT INTO dbo.Host (\r\n    IPAddress\r\n)\r\nSELECT TOP 100\r\n    IP1.IPOctet + '.' + IP2.IPOctet + '.' + IP3.IPOctet + '.' + IP4.IPOctet\r\nFROM OCTECT1_CTE IP1\r\n    INNER JOIN OCTECT2_CTE IP2\r\n        ON IP1.SELECT_CRITERIA        = IP2.SELECT_CRITERIA\r\n    INNER JOIN OCTECT3_CTE IP3\r\n        ON IP1.SELECT_CRITERIA        = IP3.SELECT_CRITERIA\r\n    INNER JOIN OCTECT4_CTE IP4\r\n        ON IP1.SELECT_CRITERIA        = IP4.SELECT_CRITERIA\r\nGO\r\nALTER TABLE dbo.Host\r\nALTER COLUMN IPAddress\r\nADD MASKED WITH (FUNCTION = 'default()')\r\nGO<\/pre>\n<p>Next, I will use the following script to unmask the IP address values we just created and captured.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE WideWorldImporters\r\nGO\r\nSET NOCOUNT ON\r\nGO\r\n\/*\r\n* Example showing how to unmask a known format.\r\n* Each section is unmasked separately which allows\r\n* for a very fast unmasking process.\r\n*\/\r\nEXECUTE AS USER = 'MaskedReader'\r\nGO\r\nDECLARE @IPList TABLE (\r\n    IPOctet char(3) PRIMARY KEY CLUSTERED\r\n)\r\nINSERT INTO @IPList (\r\n    IPOctet\r\n)\r\nSELECT TOP 255\r\n    REPLICATE('0',3-LEN(ROW_NUMBER() OVER(ORDER BY object_id))) + CONVERT(varchar(3),ROW_NUMBER() OVER(ORDER BY object_id)) OCTET\r\nFROM sys.objects\r\nSELECT\r\n    H.*\r\n    ,IP1.IPOctet\r\n    ,IP2.IPOctet\r\n    ,IP3.IPOctet\r\n    ,IP4.IPOctet\r\nFROM dbo.Host H\r\n    LEFT JOIN @IPList IP1\r\n        ON SUBSTRING(H.IPAddress,1,3) = IP1.IPOctet\r\n    LEFT JOIN @IPList IP2\r\n        ON SUBSTRING(H.IPAddress,5,3) = IP2.IPOctet\r\n    LEFT JOIN @IPList IP3\r\n        ON SUBSTRING(H.IPAddress,9,3) = IP3.IPOctet\r\n    LEFT JOIN @IPList IP4\r\n        ON SUBSTRING(H.IPAddress,13,3) = IP4.IPOctet\r\nORDER BY H.HostID\r\nGO\r\nREVERT\r\nGO<\/pre>\n<p>Output from the above script is shown below. The IP octets are broken apart and attacked individually, making the processing much faster.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"301\" height=\"134\" class=\"wp-image-97182\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/a-picture-containing-text-font-screenshot-numbe.png\" alt=\"A picture containing text, font, screenshot, number\n\nDescription automatically generated\" \/><\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE WideWorldImporters\r\nGO\r\n--Cleanup the IP address table sample\r\nDROP TABLE IF EXISTS dbo.Host\r\nGO<\/pre>\n<h3>Social Security Numbers<\/h3>\n<p>Social security numbers are typically stored as a string and will either contain dashes with a format of 000-00-0000, or without the dashes as a string of 9 numeric characters. This can be broken down and managed in sections, like the IP address example, or it could be done as a single numeric string.<\/p>\n<p>The numeric string can be done via a very large rainbow table. A very rough estimate of the size of the table 999999999 (rows) * 9 bytes. Using my handy SQL Server calculator, that is over 9.3 GB. It will get noticed by administrators or developers watching the database. It will also take some time to create. This is a good example of the issue with rainbow tables. They also get more complicated as you add characters. The example above uses 0-9 only. To add basic Latin characters, you go from 10 to the power of 9 possible combinations to 32 to the power of 9. Clearly using brute force via a rainbow table is not a good option for anything larger than a <code>varchar(5)<\/code> or maybe <code>varchar(6)<\/code>, depending on your space.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">                         --KB \/ MB \/ GB\r\nSELECT 999999999 * 10.0 \/ 1024.0 \/ 1024.0 \/ 1024.0 <\/pre>\n<p>\u00a0As you can see below, it quickly gets out of hand.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<p><strong>Character Set<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>Maximum Column Size<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>MB<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>GB<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>0-9 (10)<\/strong><\/p>\n<\/td>\n<td>\n<p>9<\/p>\n<\/td>\n<td>\n<p>&nbsp;<\/p>\n<\/td>\n<td>\n<p>9.3<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>0-9, a-z (36)<\/strong><\/p>\n<\/td>\n<td>\n<p>10<\/p>\n<\/td>\n<td>\n<p>&nbsp;<\/p>\n<\/td>\n<td>\n<p>33.5<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><strong>0-9, a-z (36)<\/strong><\/p>\n<\/td>\n<td>\n<p>10<\/p>\n<\/td>\n<td>\n<p>&nbsp;<\/p>\n<\/td>\n<td>\n<p>335.3<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>Unmasking with a single table and join is simpler, but the setup takes longer and requires much more space. This falls in the category of suspicious activity and may get DBAs looking at your queries much faster. This is the kind of activity we will discuss in countermeasures. If you are using this to unmask data where you have legitimate access, it is much easier to just run the query as a different user. If you find a table like this on your database, especially a temp table, you should do some investigations and wonder about the purpose of this table.<\/p>\n<p>The example below shows unmasking via a single join against a very large table and breaking it down into sections. The version using sections is very specialized but executes faster and doesn&#8217;t require a huge support table.<\/p>\n<p>First, I will create some sample data to work with:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE WideWorldImporters\r\nGO\r\nSET NOCOUNT ON\r\nGO\r\n--NOTE: Run with an account with create objects \r\n--rights, such as db_owner\r\n\/*\r\n* Script to setup a table with fake SSN numbers\r\n*\r\n* Creates the dbo.Person table\r\n* Populates the SSN column with randomish numbers\r\n* Masks the SSN column\r\n*\/\r\nCREATE TABLE dbo.Person (\r\n    PersonID  int   NOT NULL\r\n      CONSTRAINT PK_dboPerson \r\n           PRIMARY KEY CLUSTERED  identity\r\n    ,FirstName  varchar(50)  NULL\r\n    ,LastName  varchar(50) NULL\r\n    ,SSN  varchar(11)  NULL\r\n)\r\nGO\r\nINSERT INTO dbo.Person (\r\n    FirstName\r\n    ,LastName\r\n)\r\nSELECT TOP 300\r\n    P.PreferredName\r\n    ,P.FullName\r\nFROM Application.People P\r\n;\r\nWITH RAW_SSN_CTE AS (\r\n    SELECT TOP 20000\r\n        CONVERT(varchar(40),NEWID()) RAWGUID\r\n    FROM dbo.Person P\r\n        CROSS JOIN sys.objects\r\n)\r\n,SSN_CTE AS (\r\n    SELECT\r\n      ROW_NUMBER() OVER(ORDER BY RAND()) SELECT_CRITERIA\r\n     ,CONVERT(varchar(3),LEFT(CONVERT(varchar(40),RAWGUID),3))\r\n                                                     PART1\r\n    ,CONVERT(varchar(2),LEFT(CONVERT(varchar(40),RAWGUID),2))\r\n                                                     PART2\r\n    ,CONVERT(varchar(4),LEFT(CONVERT(varchar(40),RAWGUID),4))\r\n                                                     PART3\r\n    FROM RAW_SSN_CTE\r\n    WHERE TRY_CONVERT(int,(LEFT(RAWGUID,4))) IS NOT NULL\r\n)\r\nUPDATE dbo.Person\r\nSET\r\n    SSN = SSN1.PART1 + '-' + SSN2.PART2 + '-' + SSN3.PART3\r\nFROM dbo.Person P\r\n    INNER JOIN SSN_CTE SSN1\r\n        ON P.PersonID            = SSN1.SELECT_CRITERIA\r\n    INNER JOIN SSN_CTE SSN2\r\n        ON P.PersonID + 300        = SSN2.SELECT_CRITERIA\r\n    INNER JOIN SSN_CTE SSN3\r\n        ON P.PersonID + 600        = SSN3.SELECT_CRITERIA\r\nGO\r\nALTER TABLE dbo.Person\r\nALTER COLUMN SSN\r\nADD MASKED WITH (FUNCTION = 'default()')\r\nGO<\/pre>\n<p>Using the table we just created, will try to brute force match the data: (Note, read on before executing this script as it will take a very long time and use a lot of resources.)<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE WideWorldImporters\r\nGO\r\nSET NOCOUNT ON\r\nGO\r\n\/*\r\n* Example showing how to unmask the SSN column\r\n* by comparing as a number. \r\n* Takes much longer than the next version which\r\n* breaks the SSN into sections.\r\n*\/\r\nEXECUTE AS USER = 'MaskedReader';\r\nGO\r\nDECLARE @Numbers TABLE (\r\n    Number char(9) PRIMARY KEY CLUSTERED\r\n)\r\n;\r\nWITH NUMBERS_CTE AS (\r\n    SELECT 1 AS N\r\n        ,CONVERT(char(9),REPLICATE('0',9 - LEN(1)) \r\n                           + CONVERT(varchar(9),1)) FullN\r\n        UNION ALL\r\n    SELECT N + 1\r\n        ,CONVERT(char(9),REPLICATE('0',9 - LEN(N + 1)) \r\n                           + CONVERT(varchar(9),N + 1))\r\n    FROM NUMBERS_CTE\r\n    WHERE N &lt; 1000000000\r\n)\r\nINSERT INTO @Numbers (\r\n    Number\r\n)\r\nSELECT FullN\r\nFROM NUMBERS_CTE\r\nOPTION(MAXRECURSION 0)\r\nSELECT\r\n    P.PersonID\r\n    ,P.FirstName\r\n    ,P.LastName\r\n    ,P.SSN\r\n    ,N.Number\r\nFROM dbo.Person P\r\n    LEFT JOIN @Numbers N\r\n        ON REPLACE(REPLACE(P.SSN,'-',''),'-','')\r\n                                      = N.Number\r\nORDER BY P.PersonID\r\nGO\r\nREVERT\r\nGO<\/pre>\n<p>As noted, the script is very resource intensive. The basic tier in Azure will run out of room in the transaction log after about 3 hours of attempting the script.<\/p>\n<p><code>Msg 9002, Level 17, State 4, Line 19<\/code><\/p>\n<p><code>The transaction log for database 'tempdb' is full due to 'ACTIVE_TRANSACTION' and the holdup lsn is (250:976:12).<\/code><\/p>\n<p>It could be modified to use a cursor or a standard table and work in Azure and not fail for resource utilization and be quite efficient.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE WideWorldImporters\r\nGO\r\nSET NOCOUNT ON\r\nGO\r\n\/*\r\n* Example showing how to unmask a known format.\r\n* Each section is unmasked separately which allows\r\n* for a very fast unmasking process.\r\n*\/\r\nEXECUTE AS USER = 'MaskedReader';\r\nGO\r\nDECLARE @SSN1 TABLE (\r\n    SSN1 char(3) PRIMARY KEY CLUSTERED\r\n);\r\nDECLARE \r\n    @SSN1Loop1        int = 0\r\n    ,@SSN1Loop2        int = 0\r\n    ,@SSN1Loop3        int = 0\r\nWHILE @SSN1Loop1 &lt; 10\r\nBEGIN\r\n    SELECT @SSN1Loop2 = 0\r\n    WHILE @SSN1Loop2 &lt; 10\r\n    BEGIN\r\n        SELECT @SSN1Loop3 = 0\r\n        WHILE @SSN1Loop3 &lt; 10\r\n        BEGIN\r\n            INSERT INTO @SSN1 (SSN1)\r\n            SELECT CONVERT(char(1),@SSN1Loop1) \r\n                    + CONVERT(char(1),@SSN1Loop2) \r\n                    + CONVERT(char(1),@SSN1Loop3)\r\n            SELECT @SSN1Loop3 += 1\r\n        END\r\n        SELECT @SSN1Loop2 += 1\r\n    END\r\n    SELECT @SSN1Loop1 += 1\r\nEND\r\n--SELECT * FROM @SSN1\r\nDECLARE @SSN2 TABLE (\r\n    SSN2 char(2) PRIMARY KEY CLUSTERED\r\n)\r\nDECLARE \r\n    @SSN2Loop1        int = 0\r\n    ,@SSN2Loop2        int = 0\r\nWHILE @SSN2Loop1 &lt; 10\r\nBEGIN\r\n    SELECT @SSN2Loop2 = 0\r\n    WHILE @SSN2Loop2 &lt; 10\r\n    BEGIN\r\n        INSERT INTO @SSN2 (SSN2)\r\n        SELECT CONVERT(char(1),@SSN2Loop1) \r\n                  + CONVERT(char(1),@SSN2Loop2)\r\n        SELECT @SSN2Loop2 += 1\r\n    END\r\n    SELECT @SSN2Loop1 += 1\r\nEND\r\n--SELECT * FROM @SSN2\r\nDECLARE @SSN3 TABLE (\r\n    SSN3 char(4) PRIMARY KEY CLUSTERED\r\n)\r\nDECLARE \r\n    @SSN3Loop1        int = 0\r\n    ,@SSN3Loop2        int = 0\r\n    ,@SSN3Loop3        int = 0\r\n    ,@SSN3Loop4        int = 0\r\nWHILE @SSN3Loop1 &lt; 10\r\nBEGIN\r\n    SELECT @SSN3Loop2 = 0\r\n    WHILE @SSN3Loop2 &lt; 10\r\n    BEGIN\r\n        SELECT @SSN3Loop3 = 0\r\n        WHILE @SSN3Loop3 &lt; 10\r\n        BEGIN\r\n            SELECT @SSN3Loop4 = 0\r\n            WHILE @SSN3Loop4 &lt; 10\r\n            BEGIN\r\n                INSERT INTO @SSN3 (SSN3)\r\n                SELECT CONVERT(char(1),@SSN3Loop1)  \r\n                     + CONVERT(char(1),@SSN3Loop2) \r\n                     + CONVERT(char(1),@SSN3Loop3) \r\n                     + CONVERT(char(1),@SSN3Loop4)\r\n                SELECT @SSN3Loop4 += 1\r\n            END\r\n            SELECT @SSN3Loop3 += 1\r\n        END\r\n        SELECT @SSN3Loop2 += 1\r\n    END\r\n    SELECT @SSN3Loop1 += 1\r\nEND\r\nSELECT\r\n    P.PersonID\r\n    ,P.FirstName\r\n    ,P.LastName\r\n    ,P.SSN\r\n    ,T1.SSN1\r\n    ,T2.SSN2\r\n    ,T3.SSN3\r\nFROM dbo.Person P\r\n    LEFT JOIN @SSN1 T1\r\n        ON SUBSTRING(P.SSN,1,3) = T1.SSN1\r\n    LEFT JOIN @SSN2 T2\r\n        ON SUBSTRING(P.SSN,5,2) = T2.SSN2\r\n    LEFT JOIN @SSN3 T3\r\n        ON SUBSTRING(P.SSN,8,4) = T3.SSN3\r\nORDER BY P.PersonID;\r\nGO\r\nREVERT\r\nGO<\/pre>\n<p>Unmasking the social security number by attacking the individual pieces happens very quickly, about 5 seconds on the lowest Azure tier (Basic, 5 DTU) for 300 rows. Note that the unmasked data results of this query will vary as the sample data is randomly generated, but the same 7 rows will be leading the result set.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"441\" height=\"134\" class=\"wp-image-97183\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/a-screenshot-of-a-computer-description-automatica-3.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated with low confidence\" \/><\/p>\n<p>Next I will clean up the table I created (note that if you have cancelled one of the previous queries, you may not have executed the right number of <code>REVERT<\/code> statements and could get an error that the object does not exist (or you do not have proper rights to drop the object. Execute <code>REVERT<\/code> again a time or two until <code>SELECT SUSER_SNAME();<\/code> returns your advanced security principal you are operating as.:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE WideWorldImporters\r\nGO\r\n--Cleanup the SSN example by dropping the dbo.Person table\r\nDROP TABLE IF EXISTS dbo.Person\r\nGO<\/pre>\n<h3>Dates<\/h3>\n<p>In this section I will show a few methods of showing the data in a masked date column.<\/p>\n<p>Dates are effortless to unmask, especially if a date dimension is available. A date dimension is a table with one row per date, usually for every row that could possibly show up in a reporting query. It allows you to include the day of the week, month, etc, that you could fetch from system objects, but also custom things like company holidays.<\/p>\n<p>If a date dimension doesn&#8217;t already exist in the target database, it can be created with a recursive CTE. Select the primary key column, the date column from the dimension table, then join the target table to the date dimension. The date is unmasked at this point. A similar process can be used for time data.<\/p>\n<p>You wouldn&#8217;t want to unmask the entire column as a datetime, but they could be tackled in the same query by unmasking them separately. As in previous examples, small sections are much more efficient. More than 70 thousand rows can be unmasked in a second and with a very small data footprint by targeting the date and time sections separately.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE WideWorldImporters\r\nGO\r\nSET NOCOUNT ON\r\nGO\r\n\/*\r\n* Script showing how to unmask date fields.\r\n* A date dimension is first created. \r\n* The unused columns are left for demonstration\r\n* purposes and show how various date formats\r\n* or parts could be matched.\r\n*\r\n* The datetime column is converted to a date\r\n* and an exact match happens in the join.\r\n*\r\n* A time dimension could be created and matched\r\n* or converted to seconds and matched to a standard\r\n* numbers table if the time is also needed.\r\n*\/\r\nEXECUTE AS USER = 'MaskedReader'\r\nGO\r\nDECLARE @TBLDateDimension TABLE (\r\n    CalendarID            int    --4\r\n    ,CalendarDATE        date                --8\r\n    ,CalendarYEAR        int    --4\r\n    ,CalendarMONTH        tinyint                --2\r\n    ,CalendarDAY        tinyint                --2\r\n    ,CalendarQUARTER    tinyint                --2\r\n    ,CalendarWEEK        tinyint                --2\r\n    ,CalendarDAYOFYEAR    int    --4\r\n    ,CalendarDAYWEEK    tinyint                --2\r\n    ,CalendarDAYNAME    varchar(10)            --10\r\n    ,CalendarMONTHNAME    varchar(10)            --10\r\n)\r\nDECLARE @MAX datetime\r\nSELECT @MAX = '2030-12-31'\r\n;\r\nWITH TIME_CTE (\r\n    cDATE\r\n)\r\nAS (\r\n    SELECT CONVERT(datetime,'1970-01-01')\r\n    UNION ALL\r\n    SELECT cDate + 1\r\n    FROM TIME_CTE\r\n    WHERE cDATE &lt; @MAX\r\n)\r\nINSERT INTO @TBLDateDimension (\r\n    CalendarDATE\r\n    ,CalendarYEAR\r\n    ,CalendarMONTH\r\n    ,CalendarDAY\r\n    ,CalendarQUARTER\r\n    ,CalendarWEEK\r\n    ,CalendarDAYOFYEAR\r\n    ,CalendarDAYWEEK\r\n    ,CalendarDAYNAME\r\n    ,CalendarMONTHNAME\r\n)\r\nSELECT\r\n    cDate        CalendarDATE\r\n    ,DATEPART(yyyy,cDate)        CalendarYEAR\r\n    ,DATEPART(mm,cDate)            CalendarMONTH\r\n    ,DATEPART(dd,cDate)            CalendarDAY\r\n    ,DATEPART(qq,cDate)            CalendarQUARTER\r\n    ,DATEPART(wk,cDate)            CalendarWEEK\r\n    ,DATEPART(dy,cDate)            CalendarDAYOFYEAR\r\n    ,DATEPART(dw,cDate)            CalendarDAYWEEK\r\n    ,DATENAME(dw,cDate)            CalendarDAYNAME\r\n    ,DATENAME(mm,cDate)            CalendarMONTHNAME\r\nFROM TIME_CTE\r\nOPTION (MAXRECURSION 0)\r\nSELECT\r\n    SO.OrderID\r\n    ,SO.OrderDate\r\n    ,D.CalendarDATE OrderDateUnmasked\r\nFROM Sales.Orders SO\r\n    INNER JOIN @TBLDateDimension D\r\n        ON CONVERT(date,SO.OrderDate) = D.CalendarDATE\r\nGO\r\nREVERT\r\nGO<\/pre>\n<p>Dates are unmasked very quickly by comparing against a date dimension.by simply joining to the date column (casting any date values with a date part to a date datatype).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"232\" height=\"101\" class=\"wp-image-97184\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/a-picture-containing-text-font-screenshot-numbe-1.png\" alt=\"A picture containing text, font, screenshot, number\n\nDescription automatically generated\" \/><\/p>\n<p>The same could be done with the time portion if needed by removing the date part of the column and creating a table with the grain of the data you are interested in by creating a table with a row per second (there are just 86400, and if you needed milliseconds, it could be done with just 8.64 million rows). Finding the seconds since midnight for a date and time value is done using:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT DATEDIFF(SECOND,CAST(SYSDATETIME() AS date), \r\n                                       SYSDATETIME());<\/pre>\n<h2>Brute Force for Unknown Data Format<\/h2>\n<p>The method to unmask entire number columns quickly and efficiently was answered by exploring methods to unmask character columns. The examples above were easily expanded to separate each byte in a string and perform the comparison. To do this with a number, you simply convert the number to a string first. Doing this with numbers is even easier than strings because of a greatly reduced comparison set to be traversed (0-9 and maybe A-Z.).<\/p>\n<p>The next unmasking challenge is to make this a generic process. Masked column information is available in the standard system views as shown above, in the Exploratory Scripts section. If each column is converted and treated as a string, they can be brute forced using a single pattern. This includes a lot of code that you wouldn&#8217;t run in a production script but for a script of this nature it&#8217;s tolerable. This wouldn&#8217;t be deployed as part of a project but used as a demonstration of vulnerabilities or to tune and test audit scripts.<\/p>\n<p>This solution builds on the previous specialized solutions. If you&#8217;ve been following along closely, you probably saw this solution sooner than I found it. The trick to making the solution dynamic is splitting each column into a series of characters and attacking each character individually. This can be tested with a single column of known maximum length, preferably a short column. After proving that it can be done, the rest is just simple SQL coding.<\/p>\n<p>To perform a proof-of-concept, a <code>varchar(3)<\/code> column is used as the target column. A temp table is filled with ASCII values from 1-255 (the rainbow table), each of the 3 characters in the column are separated into individual columns using substring, each substring of the column is joined to the newly created rainbow table, and results are displayed using <code>CONCAT<\/code>. For a proof of concept, the unmasked values could also just be shown individually, but the <code>CONCAT<\/code> makes reading easier.<\/p>\n<p>There are a few things to note in this POC. The collation is specified as Latin, Case Sensitive. This eliminates the duplicates from the upper-and lower-case char values that would be generated using the full ASCII set via a Cartesian join. It also preserves the case of the masked data. To simplify the code, the <code>CHARACTERS_CTE<\/code> could be joined rather than using the <code>CHAR<\/code> function for each join. The code executes in less than 1 second, so optimizations aren&#8217;t needed. But for larger tables and wider columns, every optimization helps.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE WideWorldImporters\r\nGO\r\nSET NOCOUNT ON\r\nGO\r\n\/*\r\n* Script showing how the first 3 characters of a string\r\n* can be unmasked completely by breaking the string into sections.\r\n* This is the key to unmasking an entire column\r\n* but shows the manual version.\r\n*\r\n* The generated numbers table and characters table is \r\n* the same standard CTE as used in previous samples.\r\n*\/\r\nEXECUTE AS USER = 'MaskedReader'\r\nGO\r\n;\r\nWITH NUMBERS_CTE AS (\r\n    SELECT TOP 255\r\n        ROW_NUMBER() OVER(ORDER BY object_id) Number\r\n    FROM sys.objects\r\n)\r\n,CHARACTERS_CTE AS (\r\n    SELECT\r\n        CHAR(Number) CharValue\r\n    FROM NUMBERS_CTE N\r\n)\r\nSELECT\r\n    C.CountryID\r\n    ,C.CountryName\r\n    ,C.IsoAlpha3Code\r\n    ,N1.Number\r\n    ,CHAR(N1.Number)        UnmaskedCharacter1\r\n    ,CHAR(N2.Number)        UnmaskedCharacter2\r\n    ,CHAR(N3.Number)        UnmaskedCharacter3\r\n    ,CONCAT(CHAR(N1.Number),CHAR(N2.Number),CHAR(N3.Number))\r\n                               Unmasked_ISOAlpha3Code\r\nFROM Application.Countries C\r\n    LEFT JOIN NUMBERS_CTE N1\r\n        ON SUBSTRING(C.IsoAlpha3Code,1,1) \r\n              COLLATE SQL_Latin1_General_CP1_CS_AS \r\n          = CHAR(N1.Number)  \r\n              COLLATE SQL_Latin1_General_CP1_CS_AS\r\n    LEFT JOIN NUMBERS_CTE N2\r\n        ON SUBSTRING(C.IsoAlpha3Code,2,1) \r\n              COLLATE SQL_Latin1_General_CP1_CS_AS \r\n          = CHAR(N2.Number)  \r\n              COLLATE SQL_Latin1_General_CP1_CS_AS\r\n    LEFT JOIN NUMBERS_CTE N3\r\n        ON SUBSTRING(C.IsoAlpha3Code,3,1) \r\n              COLLATE SQL_Latin1_General_CP1_CS_AS \r\n          = CHAR(N3.Number)  \r\n              COLLATE SQL_Latin1_General_CP1_CS_AS;\r\nGO\r\nREVERT;\r\nGO<\/pre>\n<p>The above script is a steppingstone to complete, automated unmasking. It is manually configured, including each character in the column.<\/p>\n<p>\u00a0 <img loading=\"lazy\" decoding=\"async\" width=\"789\" height=\"118\" class=\"wp-image-97185\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/a-screenshot-of-a-computer-description-automatica-4.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated with low confidence\" \/><\/p>\n<p>The above example shows how to unmask almost any column, but it is a little cumbersome due to the manual nature of the script. The table and column were both selected by hand and the query was crafted specifically for that column. The next step to automation is unmasking a column without specifying the length of the column beforehand.<\/p>\n<p>The more automated version took 1 second to run. That is a very small difference but on large tables it would clearly get larger. The automation and extra functions come at a cost. Much of the base code is the same, but the size of the column is determined dynamically, and the columns are joined automatically. The <code>STUFF<\/code> function with <code>FOR XML<\/code> will be familiar to any SQL developers looking to automatically concatenate columns. It greatly simplifies the code needed to unmask columns. The primary key for the table, <code>CountryID<\/code>, is used in the <code>ORDER BY<\/code>, as is the Number column. The Number column is simply the order for each character. The Number column is also used to generate the CHAR values used for each comparison.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE WideWorldImporters\r\nGO\r\nSET NOCOUNT ON;\r\nGO\r\n\/*\r\n* Script showing how the first 3 characters of a string\r\n* can be unmasked completely by breaking the string \r\n* into sections.\r\n*\r\n* The generated numbers table and characters table is \r\n* the same standard CTE as used in previous samples.\r\n*\r\n* A small, 3 character, column was used for this test.\r\n* This uses a join to the numbers table to separate each character.\r\n* The next join then matches the column separated in the previous join\r\n* to the CHAR values to get an exact mask (confined \r\n* to the ASCII character set)\r\n*\r\n* STUFF with FOR XML PATH are used to dynamically \r\n* recombine the individual unmasked characters back \r\n* to a single column. The newer syntax, STRING_AGG \r\n* is discussed in the final brute force script.\r\n*\/\r\nEXECUTE AS USER = 'MaskedReader';\r\nGO\r\n;\r\nWITH NUMBERS_CTE AS (\r\n    SELECT TOP 255\r\n        ROW_NUMBER() OVER(ORDER BY object_id) Number\r\n    FROM sys.objects\r\n)\r\n,CHARACTERS_CTE AS (\r\n    SELECT\r\n        CHAR(Number) CharValue\r\n    FROM NUMBERS_CTE N\r\n)\r\n,MASK_ISO_CTE AS (\r\n    SELECT \r\n        AC.CountryID\r\n        ,AC.CountryName\r\n        ,N.Number\r\n        ,C.CharValue\r\n    FROM Application.Countries AC\r\n        INNER JOIN NUMBERS_CTE N\r\n            ON N.Number &lt;= \r\n                 LEN(CONVERT(varchar(11),AC.IsoAlpha3Code))\r\n        LEFT JOIN CHARACTERS_CTE C\r\n            ON SUBSTRING(AC.IsoAlpha3Code,N.Number,1) \r\n                    COLLATE SQL_Latin1_General_CP1_CS_AS \r\n                = C.CharValue \r\n                    COLLATE SQL_Latin1_General_CP1_CS_AS\r\n)\r\nSELECT\r\n    AC.CountryID\r\n    ,AC.CountryName\r\n    ,AC.IsoAlpha3Code\r\n    ,STUFF((SELECT ISNULL(MIC.CharValue,'')\r\n              FROM MASK_ISO_CTE MIC    \r\n                WHERE MIC.CountryID = AC.CountryID\r\n                ORDER BY \r\n    MIC.CountryID,MIC.Number\r\n              FOR XML PATH('')), 1, 0, '')  \r\n                             IsoAlpha3Code_FromMasked\r\nFROM Application.Countries AC\r\nORDER BY CountryID\r\nGO\r\nREVERT\r\nGO<\/pre>\n<p>The output for this script is the same as the previous script, but it is fully automated for this table. The table and key columns are specified manually, but the matching happens based on column position.\u00a0<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"398\" height=\"118\" class=\"wp-image-97186\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/a-screenshot-of-a-computer-description-automatica-5.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated with low confidence\" \/><\/p>\n<h3>Automated Brute Force Attack<\/h3>\n<p>Creatin a more fully automated dynamic brute force attack builds on standard dynamic SQL and the techniques shown above. This script is more complicated than the previous scripts, so each section is broken down. Variables are set at the top of the script for each run. This includes the table schema, table name, if the script should run or just print to screen, if <code>NOLOCK<\/code> should be used, if only the masked column names should be shown, and any columns that should be excluded from the process. If the table that is specified does not exist, is in a different database, or doesn&#8217;t contain masked columns, the script will exit. Internal variables are also created in this section.<\/p>\n<p>Masked columns are identified next, including names and sizes. Text columns are differentiated from numeric columns to potentially speed the process and the columns specified for exclusion are left out of the result set. If any of the columns contain text characters, the full ASCII character set is used. Otherwise only 0-9, period and comma are included.<\/p>\n<p>Primary keys are identified in the next section. This is needed to join the unmasked columns back to the target table. It is also needed to order the unmasked characters in each column. The primary keys are then used to create the join statements and order statements. These statements are used later in the dynamic stuff commands.<\/p>\n<p>The next section is where everything is organized, and the SQL statement is created. Numbers and character CTEs are defined first. They are used multiple times in the script, for both the column CTEs and the stuff commands.<\/p>\n<p>A separate CTE is created for each masked column, named <code>ColumnName_CTE<\/code>. This would be a problem if the data team allowed crazy characters such as spaces in the column name. If that&#8217;s the case, a clean string function could be added or the replace function could be used to target specific issues. The numbers CTE is first used to parse each character in the column, then the characters CTE is used to match each masked character to the unmasked character list.<\/p>\n<p>At this point, the data is unmasked. The remainder of the script is focused on presentation. The server, database, schema, and table are listed. The base information is followed by the default columns, including the masked data. Finally, each masked column CTE is used in a stuff statement to present the unmasked column. The stuff section is identical to the proof of concept, it&#8217;s just dynamically repeated for each column.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE WideWorldImporters;\r\nGO\r\nSET NOCOUNT ON;\r\nGO\r\n\/*\r\n* Complete unmasking of all columns in a table.\r\n* \r\n*\/\r\nEXECUTE AS USER = 'MaskedReader';\r\nGO\r\n\/*********************************************************\r\n* Declare and set the variables for this run \r\n*********************************************************\/\r\nDECLARE\r\n    @ServerName  varchar(255)  = 'localhost' --@@SERVERNAME\r\n    ,@SchemaName  varchar(255)  = 'Purchasing'\r\n    ,@TableName  varchar(255) = 'SupplierTransactions' \r\n              --'SuppliersTest' --'SupplierTransactions'\r\n    ,@PrimaryKey  varchar(255) = NULL --'SupplierName'\r\n    ,@EXECUTE  tinyint  = 1 -- 1 = execute, 0 = print\r\n    ,@NOLOCK  tinyint    = 1 \r\n       --1 = WITH(NOLOCK), 0 = default lock (committed)\r\n    --,@strStandardFlag  varchar(20)\r\n    --,@strStandardCode  varchar(20)\r\n    ,@SHOW_COLUMNS_ONLY    bit   = 0  \r\n            --1 = columns only and exit, \r\n            --0 = don't show column list\r\n    ,@Exclude    varchar(500) --= \r\n          --'BankAccountBranch,BankAccountCode' \r\n          --CSV list of columns that will be ignored\r\n\/*******************************************************\r\n* Temporary table definitions\r\n********************************************************\/\r\nDECLARE @Columns TABLE (\r\n    ColumnID    int\r\n    ,ColumnName    varchar(255)\r\n    ,ColumnLength                smallint\r\n    ,HasText    tinyint\r\n)\r\n\/****************************************************\r\n* Variables set internally\r\n****************************************************\/\r\nDECLARE\r\n    @HasText    tinyint\r\n    ,@SQL        varchar(max)\r\n    ,@InternalWhere                varchar(max)\r\n    ,@OrderList    varchar(max)\r\n\/****************************************************\r\n* Set the columns to include\r\n*\r\n* Find masked columns in the table specified and add them to \r\n* the @Columns table variable\r\n*\r\n* Set the HasText column to 1 if it is a string based \r\n* column. The  comparison dataset can be reduced \r\n* considerably if only numeric masked columns are present.\r\n******************************************************\/\r\nINSERT INTO @Columns\r\nEXEC (\r\n    'SELECT \r\n        SC.column_id\r\n        ,SC.name\r\n        ,CASE \r\n            WHEN sc.max_length = -1 THEN 100\r\n            --WHEN sc.max_length = 0 THEN 100\r\n            WHEN ST.name IN (' + '''' + 'nchar' + '''' + ',' + \r\n'''' + 'nvarchar' + '''' + ') THEN SC.max_length    \/ 2\r\n            WHEN ST.name IN (' + '''' + 'char' + '''' + ',' + \r\n'''' + 'varchar' + '''' + ') THEN SC.max_length\r\n            ELSE 40 --hard code numbers to 40 wide. Verify this\r\n                    --with different scenarios\r\n        END    MaxLength\r\n        ,CASE\r\n            WHEN ST.name IN (' + '''' + 'varchar' + '''' \r\n                + ',' + '''' + 'nvarchar' + ''''\r\n                + ',' + '''' + 'char' + ''''\r\n                + ',' + '''' + 'nchar' + '''' + ') THEN 1\r\n            ELSE 0\r\n        END HasText\r\n    FROM sys.objects SO\r\n        INNER JOIN sys.columns SC\r\n            ON so.object_id = sc.object_id\r\n        INNER JOIN sys.schemas SS\r\n            ON so.schema_id = ss.schema_id\r\n        INNER JOIN sys.types ST\r\n            ON SC.system_type_id    = ST.system_type_id\r\n            AND SC.user_type_id        = ST.user_type_id\r\n        LEFT JOIN string_split(' + '''' + @Exclude + '''' +\r\n ',' + '''' + ',' + '''' + ') SSTR\r\n            ON SC.name COLLATE SQL_Latin1_General_CP1_CI_AS\r\n             = SSTR.value COLLATE SQL_Latin1_General_CP1_CI_AS \r\n    WHERE so.name COLLATE SQL_Latin1_General_CP1_CI_AS = ' + \r\n'''' + @TableName + '''' + ' COLLATE \r\nSQL_Latin1_General_CP1_CI_AS \r\n        AND ss.name COLLATE SQL_Latin1_General_CP1_CI_AS = ' + \r\n'''' + @SchemaName + '''' + ' COLLATE SQL_Latin1_General_CP1_CI_AS \r\n        AND ST.system_type_id NOT IN (240)\r\n        AND SC.is_masked                = 1\r\n        AND SSTR.value    IS NULL\r\n    ORDER BY sc.name'\r\n)\r\n--Determines character set to use\r\nSELECT @HasText = MAX(HasText)\r\nFROM @Columns\r\nIF @SHOW_COLUMNS_ONLY = 1\r\nBEGIN\r\n    SELECT * FROM @Columns\r\n    GOTO NO_EXECUTE\r\nEND\r\nIF (SELECT COUNT(*) FROM @Columns) = 0\r\nBEGIN\r\n    RAISERROR('No columns present: Check that you are running the script from the correct database, that the name of the table is correct and that you have permission to select from the table.',5,1)\r\n    GOTO NO_EXECUTE\r\nEND\r\n\/********************************************************\r\n* Primary Key Columns\r\n*\r\n* The sample script requires a primary key column for it \r\n* to work automatically. The primary key columns for the \r\n* table specified are dynamically inserted into @PrimaryKeys \r\n* using system tables.\r\n*\r\n* A candidate key can be specified via the @PrimaryKey variable\r\n* if it is not specified on the table but the key is known.\r\n*\r\n* The same purpose could be achieved with ROW_NUMBER() \r\n* for tables  without a primary key. \r\n**********************************************************\/\r\nDECLARE @PrimaryKeys TABLE (\r\n    PrimaryKeyName                sysname\r\n    ,ColumnName    sysname\r\n    ,RowNumber    int            identity\r\n)\r\nINSERT INTO @PrimaryKeys (\r\n    PrimaryKeyName\r\n    ,ColumnName\r\n)\r\nEXEC('\r\nSELECT\r\n    si.name    PrimaryKeyName\r\n    ,sc.name    ColumnName\r\nFROM sys.objects so\r\n    INNER JOIN sys.schemas ss\r\n        ON so.schema_id = ss.schema_id\r\n    INNER JOIN sys.indexes si\r\n        ON so.object_id = si.object_id\r\n    INNER JOIN sys.index_columns ic\r\n        ON so.object_id = ic.object_id\r\n        AND si.index_id    = ic.index_id\r\n    INNER JOIN sys.columns sc\r\n        ON ic.object_id = sc.object_id\r\n        AND ic.index_column_id = sc.column_id\r\nWHERE so.name COLLATE SQL_Latin1_General_CP1_CI_AS = ' + '''' + \r\n@TableName + '''' + ' COLLATE SQL_Latin1_General_CP1_CI_AS\r\n        AND ss.name COLLATE SQL_Latin1_General_CP1_CI_AS = ' + \r\n'''' + @SchemaName + '''' + ' COLLATE \r\nSQL_Latin1_General_CP1_CI_AS\r\n        AND SI.is_primary_key = 1\r\nORDER BY \r\n    sc.column_id\r\n')\r\nIF (SELECT COUNT(*) FROM @PrimaryKeys) = 0\r\nBEGIN\r\n    IF @PrimaryKey IS NULL GOTO NO_EXECUTE\r\n    ELSE\r\n    BEGIN\r\n        INSERT INTO @PrimaryKeys (PrimaryKeyName,ColumnName)\r\n        VALUES (@PrimaryKey,@PrimaryKey)\r\n    END\r\nEND\r\n\/*****************************************************\r\n* Internal WHERE Clause for JOINS\r\n*\r\n* Each column has a STUFF statement during the unmasking process.\r\n* This creates the WHERE clause for the STUFF statements\r\n*\r\n* The same @InternalWhere and @OrderList is used for each masked column\r\n*******************************************************\/\r\nSELECT @InternalWhere = ''\r\nSELECT\r\n    @InternalWhere += \r\n    CASE RowNumber\r\n        WHEN 1 THEN '            WHERE MCI.' \r\n               + ColumnName + ' = MC.' + ColumnName\r\n        ELSE '            AND MCI.' \r\n               + ColumnName + ' = MC.' + ColumnName\r\n    END\r\nFROM @PrimaryKeys;\r\n\/*******************************************************\r\n* Internal ORDER List\r\n*\r\n* The ORDER list for the STUFF statements for each column.\r\n********************************************************\/\r\nSELECT @OrderList = ''\r\nSELECT\r\n    @OrderList += \r\n    CASE RowNumber\r\n        WHEN 1 THEN 'MC.' + ColumnName\r\n        ELSE ',MCI.' + ColumnName\r\n    END\r\nFROM @PrimaryKeys\r\n\/********************************************************\r\n* SQL Statement\r\n*\r\n* The Numbers CTE is created first.\r\n* This example is limited to ASCII characters.\r\n* The same thing could be done with a wide character set. \r\n* It would take longer and exploration of the characters \r\n* used would be recommended.\r\n*\r\n* The @HasText variable is used to determine if only numeric \r\n* characters are included in the comparison or if all 255 \r\n* ASCII characters are checked.\r\n* This could be done in the creation below for each column, \r\n* but is just done once for simplicity in this example.\r\n*********************************************************\/\r\nSELECT @SQL = '\r\n' + CASE @NOLOCK WHEN 1 THEN \r\n      'SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED\r\n'\r\nELSE ''\r\nEND\r\nSELECT @SQL += '\r\n;\r\nWITH NUMBERS_CTE AS (\r\n    SELECT TOP 255\r\n      ROW_NUMBER() OVER(ORDER BY object_id, column_id) Number\r\n    FROM sys.all_columns\r\n)\r\n,CHARACTERS_CTE AS (\r\n    SELECT\r\n      CHAR(Number) CharValue\r\n    FROM NUMBERS_CTE N\r\n'\r\n    + CASE @HasText\r\n        WHEN 0 THEN \r\n         'WHERE Number IN (45,46) OR Number BETWEEN 48 AND 57 '\r\n        WHEN 1 THEN \r\n         'WHERE Number BETWEEN 1 AND 255 '\r\n        \r\n        END\r\n    + '\r\n)'\r\n\/*\r\n* A CTE is created for each column to unmask each character\r\n* individually. The join to NUMBERS_CTE is used to iterate \r\n* through each character. The join the ChARACTERS_CTE is used\r\n* to determine the unmasked value for each character.\r\n* COLLATE is used to simplify the comparisons by assuming the \r\n* same collation. It is also  used to be sure the case and \r\n* accents are preserved.\r\n*\r\n* Sample generated query section for a CTE follows:\r\n,TransactionDate_CTE AS (\r\n    SELECT \r\n        *\r\n    FROM [Purchasing].[SupplierTransactions] MT WITH (NOLOCK)\r\n        INNER JOIN NUMBERS_CTE N\r\n            ON N.Number &lt;= \r\n              LEN(CONVERT(nvarchar(40),MT.TransactionDate))\r\n        LEFT JOIN CHARACTERS_CTE C\r\n            ON SUBSTRING(CONVERT(nvarchar(40), \r\n                      MT.TransactionDate),N.Number,1) \r\n                         COLLATE SQL_Latin1_General_CP1_CS_AS \r\n           = C.CharValue COLLATE SQL_Latin1_General_CP1_CS_AS\r\n)\r\n*\/\r\nSELECT @SQL += '\r\n,' + ColumnName + '_CTE AS (\r\n    SELECT \r\n        *\r\n    FROM [' + @SchemaName + '].[' + @TableName + '] MT WITH \r\n(NOLOCK)\r\n        INNER JOIN NUMBERS_CTE N\r\n            ON N.Number &lt;= LEN(CONVERT(nvarchar(' + \r\n                  CONVERT(varchar(10),ColumnLength) + '),\r\n                          MT.' + ColumnName + '))\r\n        LEFT JOIN CHARACTERS_CTE C\r\n            ON SUBSTRING(CONVERT(nvarchar(' + \r\nCONVERT(varchar(10),ColumnLength) + '),MT.' + ColumnName + \r\n'),N.Number,1) COLLATE SQL_Latin1_General_CP1_CS_AS = \r\nC.CharValue COLLATE SQL_Latin1_General_CP1_CS_AS\r\n)'\r\nFROM @Columns\r\n\/* \r\n* Static columns are specified, then the primary key columns\r\n*\/\r\nSELECT @SQL += '\r\nSELECT\r\n    ' + '''' + @ServerName + '''' + ' SERVER_NAME\r\n    ,' + '''' + DB_NAME() + '''' + ' DATABASE_NAME\r\n    ,' + '''' + @SchemaName + '''' + ' SCHEMA_NAME\r\n    ,' + '''' + @TableName + '''' + ' TABLE_NAME'\r\nSELECT @SQL += '\r\n    ,MC.' + ColumnName\r\nFROM @PrimaryKeys\r\nSELECT @SQL += '\r\n    ,' + ColumnName\r\nFROM @Columns\r\n\/*\r\n* A STUFF function call is created for each masked column. \r\n* Note the use of the @InternalWhere and the @OrderList created \r\n* earlier in the script. The SELECT statement uses a correlated \r\n* subquery to join the internal query back to the \r\n* parent query. \r\n*\r\n* STRING_AGG could be used in place of STUFF, but the order of \r\n* columns isn't guaranteed. It was tested and worked, but given \r\n* that the order is guaranteed it was not used. Likely\r\n* instances when the order would differ is when the clustered \r\n* index does not align with the primary key.\r\n*\r\n* The query section generated for STUFF format is as follows:\r\n    ,STUFF((SELECT ISNULL(MCI.CharValue,'')\r\n            FROM TransactionDate_CTE MCI\r\n            WHERE MCI.SupplierTransactionID = \r\n                               MC.SupplierTransactionID\r\n            ORDER BY \r\n                MC.SupplierTransactionID,MCI.Number\r\n            FOR XML PATH, TYPE).value(N'.[1]', \r\n                  N'nvarchar(max)'), 1, 0,'') \r\n                            TransactionDate_FromMasked\r\n* The same query section using STRING_AGG follows:\r\n    ,(SELECT STRING_AGG(ISNULL(CharValue,''),'')\r\n            FROM TransactionDate_CTE MCI\r\n            WHERE MCI.SupplierTransactionID    = \r\n                       MC.SupplierTransactionID) \r\n                                TransactionDate_FromMasked\r\n*\/\r\nSELECT @SQL +=\r\n    '\r\n    ,STUFF((SELECT ISNULL(MCI.CharValue,' + '''' + '''' + ')\r\n            FROM ' + ColumnName + '_CTE MCI\r\n'\r\n            + @InternalWhere\r\n            + '\r\n            ORDER BY \r\n                ' + @OrderList + ',MCI.Number\r\n            FOR XML PATH, TYPE).value(N' + '''' + '.[1]' + '''' \r\n+', N' + '''' + 'nvarchar(max)' + '''' + '), 1, 0,' + '''' + \r\n'''' + ') ' + ColumnName + '_FromMasked'\r\nFROM @Columns\r\nSELECT @SQL += '\r\nFROM [' + @SchemaName + '].[' + @TableName + '] MC WITH \r\n(NOLOCK)'\r\nSELECT @SQL += '\r\nORDER BY'\r\nSELECT @SQL += \r\n    CASE RowNumber\r\n        WHEN 1 THEN '\r\n    MC.' + ColumnName \r\n        ELSE '\r\n    ,MC.' + ColumnName\r\n    END\r\nFROM @PrimaryKeys\r\nBEGIN TRY\r\n    IF @EXECUTE = 1\r\n    BEGIN\r\n        EXEC(@SQL)\r\n    END\r\n    ELSE SELECT @SQL\r\nEND TRY\r\nBEGIN CATCH\r\n    SELECT\r\n        ERROR_NUMBER() AS ErrorNumber\r\n        ,ERROR_SEVERITY() AS ErrorSeverity\r\n        ,ERROR_STATE() AS ErrorState\r\n        ,ERROR_PROCEDURE() AS ErrorProcedure\r\n        ,ERROR_LINE() AS ErrorLine\r\n        ,ERROR_MESSAGE() AS ErrorMessage;\r\nEND CATCH\r\nNO_EXECUTE:\r\nGO\r\nREVERT\r\nGO<\/pre>\n<p>The results of this query is un maskeddata.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"2047\" height=\"134\" class=\"wp-image-97187\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/06\/word-image-97181-6.png\" \/> \u00a0<\/p>\n<p>The script demonstrates the ease with which data can be unmasked. Unmasking data requires direct access in addition to technical knowledge, but it can be unmasked. The generic approach to unmasking isn&#8217;t as efficient as specialized strategies but it requires virtually no exploratory work and is fast enough for a one-time procedure. If a bad actor is going to maliciously unmask a table it likely isn&#8217;t something they would perform multiple times. The data would be unmasked and saved off to a file or table. This makes efficiency less important and it also makes it more difficult to detect an attack.<\/p>\n<h2>Summary<\/h2>\n<p>SQL Server Dynamic Data Masking is potentially susceptible to side channel attacks. This will likely limit the scenarios where you will want to allow direct access to data. Additional implications of this and mitigations are explored in the final section of this series.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Continuing from the previous entry in this series on security concerns with using Dynamic Data Masking, in this blog I want to close out that discussion showing how you can unmask different sort of data that has been dynamically masked, even if you don&#8217;t have access to unmasked data. Clearly this is NOT a suggestion&#8230;&hellip;<\/p>\n","protected":false},"author":19670,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[159004],"coauthors":[98702],"class_list":["post-97181","post","type-post","status-publish","format-standard","hentry","category-blogs","tag-benjohnston_dynamicdatamasking"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97181","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/19670"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=97181"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97181\/revisions"}],"predecessor-version":[{"id":103156,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97181\/revisions\/103156"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=97181"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=97181"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=97181"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=97181"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}