18 July 2018

2 Comments

18 July 2018

2 Comments

The ‘Right to be Forgotten’ and Data Masker for SQL Server

The right to be forgotten is one of the main features of new data protection legislation across the globe. Under Article 17 of one such piece of legislation in Europe, the GDPR, individuals have the right to have personal data erased from all systems and data the company may be storing about them. Similarly, in the USA the California Consumer Privacy Act will require companies to respond to requests for data access, deletion and portability within 45 days.

While there may be many pieces of unstructured data littering our systems that hold customer data, perhaps one of the hardest problems to tackle is the erasure of personal information from our structured data, and how we also handle database backups. whenever we need to restore this information either back into Production or other environments.

In some cases, we may be able to simply remove a user from the database manually or by other means on an ad-hoc basis. But where we have important data referentially depending on these contacts e.g. order history, which we may be using for analytics or other important workflows, it is not possible to do so.

So how do we “redact” user information, while maintaining database integrity, especially where we may restore sensitive data over the top of this redaction?

For the purposes of this article I’m going to be using an example of my Forums-Redgate-Com database, specifically the Contacts table, as the target to tackle with the right to be forgotten:

The right to be forgotten 1

Figure 1: The top 10 rows of the Contacts table

Approach

There are several approaches we can take to tackling this problem. The first such solution is never restoring from a backup that was taken prior to receiving the request to be forgotten. This in itself is the least ideal workflow because it renders all prior backups useless and effectively means we can just delete them.

The second approach is to write a script and maintain this moving forward that we run against environments once the backup has been restored:

But this approach also has several drawbacks. We are only running a SQL script and in case of audit it is generally preferred to have some kind of log or report that demonstrates what steps were taken to comply with the request. It would also be beneficial to have something that we can programmatically keep updated instead of having to maintain a full “Right to be Forgotten” script in the future.

The first step we should take is to have a central list of which users have requested the right to be forgotten. By keeping a list of not-personally-identifiable identifiers, like Customer_IDs, we can easily update this as and when new requests come in. This could be in the form of a CSV file, a table in a database or however else we can securely store it. For this example, I have created a simple table on a separate database:

The right to be forgotten 2

Figure 2: The table of forgotten Customer_IDs

From here we can use Redgate’s Data Masker for SQL Server to define a masking set which will redact all users that appear in this table:

The right to be forgotten 3

Figure 3: We have defined a new substitution rule for the Contacts table to carry out a similar operation to the code we wrote above

The right to be forgotten 4

Figure 4: Next we apply a SQL WHERE clause to fetch the redacted IDs from the relevant table. This could be from another database as in this example or could have been loaded into the database programmatically as part of the restore job.

Now we can update the RedactedUsers table and if we go to run the Data Masker tool it will carry out the redaction for us. In this case, let’s try ContactID number 6:

The right to be forgotten 5

Figure 5: Customer 6 has requested the right to be forgotten

The right to be forgotten 6

Figure 6: After the run of the masking rules we can see only the relevant row has been redacted

Data Masker outputs reports as part of every masking operation and we can see below that as part of this run only one row was processed., As we know that there is only one row in the “Right to be Forgotten” table with 12 columns to be redacted, we know that the masking has completed successfully.

The right to be forgotten 7

Figure 7: An extract of the Data Masker Masking Report

Now that we have the foundation of our redaction process, all that is left to do is automate it as part of a restore. In this case, we can script out the calling of the Data Masker command line and call it as part of the SQL Agent job:

start/wait "" "C:\Program Files\Red Gate\Data Masker for SQL Server 6\DataMasker.exe" "C:\Users\chris.unwin\Documents\Data Masker(SqlServer)\Masking Sets\RedactForgottenUsers.DMSMaskSet" -R -X

This means that once the database is restored to a state before we initially redacted the information, we can then immediately redact it again (but we can also just maintain the list of forgotten users, in this case we’ve added rows 2 and 5 over time):

The right to be forgotten 8

Figure 8: Result of the redaction after running the command line for Data Masker after restoring

Conclusion

In summary, there are many ways we can handle actioning the “Right to be Forgotten” when it is requested, whether this is handling the update manually, scripting the change or using Redgate’s Data Masker for SQL Server to achieve the end result with the level of logging and reporting we would like to retain for audit purposes.

The goal is always to ensure that we are compliant with the necessary legislation, yet we can respond to requests we receive with the lowest possible overhead both to the data subject and our own teams. The solution outlined in this article is one such approach to achieve this.

Tools in this post

Data Masker

Replace sensitive data with realistic, anonymized, test data

Find out more

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter

You may also like

  • Jonlee Lockwood

    Hi Chris,

    Nice article. However could you see a benefit in following the same steps, but leveraging the true power of Data Maskers for the Address Lines, email etc. Then tag the records so that you know it has been redacted.

    Doing this would allow you to keep some form of insights in the DB, but without the PII.

    P.S. I’m not a “Right to be forgotten” expert, so I may look stupid in your response 🙂

    • Chris Unwin

      Hi Jonlee,

      Thank you for your comment! 🙂 I don’t think anyone in the world can call themselves a “Right to be forgotten” expert, and from my perspective it’s definitely not a stupid reply – we can all throw something into the melting pot and see what comes out!

      I think you’re absolutely right – with this article it was more that I was focussing on the process and the benefits that having a third party tool offers over manual scripting, but yes absolutely, given that I was using a substitution rule to carry out the redaction, I think you could easily leverage Data Masker more heavily to retain an adequate spread of more useful business information, especially where a copy of this Database was later used for something like Business Intelligence / Reporting – the change itself wouldn’t be a difficult one but as i say, it’s getting the process right in the first place!

      Thank you very much for reading 🙂