So what is data mapping and why is it the key to GDPR compliance?
I’ve read a lot about the General Data Protection Regulation (GDPR) because I’m interested in it from both a professional and a personal perspective. I’ve been working with data for over 20 years, as an IT manager in financial services and as a product manager in a data software company, and in many ways the GDPR was inevitable.
Think about it. The amount of information flowing through organizations is rising at an astonishing rate. The way that data is processed and used is also changing – sometimes in bad ways, as the recent debacle involving Facebook and Cambridge Analytica demonstrates. We’ve moved from gathering data about individuals to wanting data that is also linkable to them. Something had to give.
The GDPR is a direct response, and its introduction, while unwelcome in some respects, will actually help organizations cope with the growing size and complexity of data in a smarter, more efficient way.
Right now, protecting data can be difficult. It’s often spread across and copied to a number of different environments, and it’s hard to know where it’s located and who access to it should be restricted to. This data sprawl inevitably leaves organizations open to data breaches, and not just from hackers either. In its 2017 end of year review of data breaches, the Identity Theft Resource Center revealed that ~10% of breaches were caused by employee error or negligence, ~7% were a result of accidental exposure, and ~5% were down to insider theft.
Having a way to identify and manage data will help to protect it, and the GDPR puts in place the framework for doing so. Article 30 of the regulation places a legal requirement on organizations to maintain a record of processing activities under their responsibility, and make it available to the relevant supervisory authority on request. The following information now needs to be documented:
- The purposes of processing data (customer management, marketing, etc)
- The categories of the individuals involved (customers, patients, etc)
- The categories of personal data being processed (financial information, health data, etc)
- The categories of any recipients of the data (suppliers, credit reference agencies, etc)
- Details of any transfers to other countries
- How long the data will be kept for
- The technical and organizational security measures in place (encryption, access controls, etc)
One proviso here: organizations which employ fewer than 250 people need only document processing activities that are regularly undertaken, or are likely to result in a risk to the rights and freedoms of individuals, or involve special category data or data related to criminal convictions and offences.
For everyone else, Article 30 is the key to being compliant – and demonstrating compliance – and will also help meet other aspects of the GDPR. It will aid in drafting the privacy notice, for example, that is now required whenever personal data is collected. It will also enable organizations to respond more quickly and easily to requests from individuals for access to their data, or for its rectification or erasure.
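The Article 30 items listed earlier can be pictured as a simple structured record. The following is a minimal sketch in Python, not an official schema: the class name, field names and the `missing_fields` helper are all assumptions made for illustration, and here an empty value is treated as "not yet documented".

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class ProcessingRecord:
    """One entry in a record of processing activities (illustrative field names)."""
    purpose: str
    data_subject_categories: list[str]
    personal_data_categories: list[str]
    recipient_categories: list[str]
    international_transfers: list[str]
    retention_period: str
    security_measures: list[str]


def missing_fields(record: ProcessingRecord) -> list[str]:
    """Return the names of fields that are still empty, so gaps can be chased up."""
    return [name for name, value in asdict(record).items() if not value]


# Hypothetical example entry; the values echo the examples in the list above.
record = ProcessingRecord(
    purpose="Customer management",
    data_subject_categories=["customers"],
    personal_data_categories=["contact details", "financial information"],
    recipient_categories=["credit reference agencies"],
    international_transfers=[],   # empty: treated as not yet documented
    retention_period="",          # empty: treated as not yet documented
    security_measures=["encryption", "access controls"],
)

print(json.dumps(asdict(record), indent=2))
print("Still to document:", missing_fields(record))
```

Keeping each entry in a structured form like this makes it straightforward to export the record on request and to spot the entries that are incomplete.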
Perhaps most importantly, the record will give organizations an accurate picture of what data they hold, where it is, and whether it is data which needs to be protected. That knowledge, in turn, will immediately flag up any access controls that are required, and where measures like pseudonymization, encryption, anonymization and aggregation should be adopted. If copies of databases are used in development and testing, for example, personal data should be masked.
This is where data mapping comes in – the process of discovering and classifying data so that it can then be protected and managed in a consistent, reliable way.
Discover your data
This may sound like an easy step, but data within organizations tends to leak into lots of different places. Alongside the production database, there will probably be other environments like development, UAT and staging. Developers could be using copies of the database as development sandboxes. There will be backups, and other data may be held on legacy systems that have yet to be updated, or on cloud services like Azure or AWS. Different business units may also have their own databases tailored to their specific requirements.
By identifying every database, and every instance of it, you’ll know the full extent of all of the data that is being used and accessed. It’s also useful at this stage to literally map out every location, with arrows showing how data flows between them.
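The map of locations and arrows can also be kept in a machine-readable form. The sketch below assumes a hand-maintained inventory; every location name and flow in it is hypothetical. It shows two checks: which locations data from production can reach, and which known locations do not yet appear on the map at all.

```python
from collections import defaultdict

# Hypothetical inventory: every known location that holds a copy of the data.
locations = {
    "prod-db": "production",
    "staging-db": "staging",
    "dev-sandbox": "development",
    "nightly-backup": "backup",
    "legacy-crm": "legacy system",
}

# Hypothetical flows: (source, destination) pairs, i.e. the arrows on the map.
flows = [
    ("prod-db", "staging-db"),
    ("prod-db", "nightly-backup"),
    ("staging-db", "dev-sandbox"),
]


def downstream(source: str, flows: list[tuple[str, str]]) -> set[str]:
    """Return every location that data from `source` can reach, directly or indirectly."""
    graph = defaultdict(list)
    for src, dst in flows:
        graph[src].append(dst)
    reached: set[str] = set()
    stack = [source]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached


# Locations in the inventory that no arrow touches yet: gaps in the map.
unmapped = set(locations) - {name for pair in flows for name in pair}

print("Reachable from production:", sorted(downstream("prod-db", flows)))
print("Not yet on the map:", sorted(unmapped))
```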
Classify your data
The next step is crucial in fulfilling the requirements of the GDPR. A taxonomy needs to be applied to identify which data is personal, like names and addresses, and which is special or sensitive personal data, like ethnic origin or data concerning a person’s health.
If you use SQL Server, it’s a good idea to follow the same taxonomy used in Microsoft SQL Server Management Studio 17.5. Columns can then be tagged to identify what kind of data they contain.
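As a rough illustration of what tagging columns involves, the sketch below classifies columns by name alone. The patterns and the three labels are assumptions made for this example; they are not the taxonomy used by SQL Server Management Studio, and name matching is only a first pass that still needs human review.

```python
import re

# Illustrative rules only: these patterns and labels are assumptions,
# not the SSMS taxonomy. More sensitive categories are checked first.
RULES = [
    ("special category", re.compile(r"ethnic|health|religio|diagnos", re.IGNORECASE)),
    ("personal", re.compile(r"name|address|email|phone|birth", re.IGNORECASE)),
]


def classify(column: str) -> str:
    """Return the first label whose pattern matches the column name."""
    for label, pattern in RULES:
        if pattern.search(column):
            return label
    return "unclassified"


# Hypothetical column names from a customer database.
columns = ["CustomerName", "EmailAddress", "HealthNotes", "OrderTotal"]
tags = {column: classify(column) for column in columns}

for column, tag in tags.items():
    print(f"{column}: {tag}")
```

A name-based pass like this will produce false positives (a `FileName` column, for instance) and will miss personal data held in vaguely named columns, so the results are best treated as suggestions for someone who knows the data to confirm.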
Reduce the attack surface of the data
At this point, you should have a good understanding of what data needs to be protected under the GDPR. If you also look at how it flows through your organization, the gaps where it is open to compromise will be clear.
If developers are using copies of the production database for testing, for example, steps will need to be taken to safeguard it by masking the personal data it contains. Similarly, individuals will need permission to view, modify or delete only the personal data that is relevant to their job role, and for which appropriate consent has been obtained. Encryption is also necessary when data is at rest, in transit, and in use.
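To make the masking step concrete, here is a minimal sketch of one possible approach: replacing values in columns classified as personal with a keyed hash before a copy is handed to developers. The key, column names and helper functions are hypothetical, and dedicated masking tools offer far more than this (realistic substitute values, referential integrity across tables, and so on).

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would be stored securely,
# never in source control.
SECRET_KEY = b"replace-me"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a value with a keyed hash: the same input always gives the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_row(row: dict, personal_columns: set[str]) -> dict:
    """Return a copy of the row with every personal column pseudonymized."""
    return {
        column: pseudonymize(str(value)) if column in personal_columns else value
        for column, value in row.items()
    }


# Hypothetical row and classification results.
row = {"CustomerName": "Jane Doe", "EmailAddress": "jane@example.com", "OrderTotal": 42.5}
masked = mask_row(row, personal_columns={"CustomerName", "EmailAddress"})

print(masked)
```

Because the hash is deterministic, the same customer receives the same token wherever they appear, so joins between masked tables still work. Anyone holding the key could recompute the tokens, which is why this counts as pseudonymization rather than anonymization.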
Manage your data
Finally, you need to manage your data, because GDPR compliance is an ongoing requirement, not a one-off exercise. That’s easier than it sounds because organizations often process data in the same way, repeatedly. So the knowledge gained in the mapping exercise will enable you to record it – and then maintain the same record for future processing.
That way, you can follow Article 30 and, at the same time, have a true and living record of your data processing activity that can be used in future decision-making. When someone wants to implement a backup and availability plan, for example, the level of privacy the data requires will already be known. The Information Commissioner’s Office, the supervisory authority for the GDPR in the UK, has a very good downloadable template for just such a record. A similar template is available for companies which act as third party processors and also need to keep a record.
Automate data mapping where you can
Discovering, classifying and protecting data under the GDPR is a big responsibility for every organization, especially those employing 250 people or more. While it can appear daunting, the burden can be lightened by automating parts of the process.
At the discovery phase, for example, there are software solutions out there that can quickly build an inventory of your server estate and remove the guesswork in understanding which tables or columns contain sensitive data for compliance. The data discovery and classification feature in the newly released SQL Server Management Studio 17.5 can also help.
Protecting your data and demonstrating compliance can similarly be made easier with solutions that let you control permissions across your server estate from one UI, and generate documentation to show who has access to what data.
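The kind of documentation such tools generate can be pictured with a small sketch. It assumes the permissions have already been exported as a list of grants; the principals, tables and the `access_report` function are hypothetical and do not represent any particular product.

```python
from collections import defaultdict

# Hypothetical export of permissions: (principal, table, permission).
grants = [
    ("alice", "Customers", "SELECT"),
    ("alice", "Orders", "SELECT"),
    ("bob", "Customers", "UPDATE"),
    ("carol", "Orders", "SELECT"),
]

# Tables that the classification step marked as holding personal data.
personal_tables = {"Customers"}


def access_report(grants, personal_tables):
    """List who holds which permission on each table containing personal data."""
    report = defaultdict(set)
    for principal, table, permission in grants:
        if table in personal_tables:
            report[table].add((principal, permission))
    return {table: sorted(entries) for table, entries in report.items()}


for table, entries in access_report(grants, personal_tables).items():
    for principal, permission in entries:
        print(f"{table}: {principal} has {permission}")
```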
Finally, if your organization gives developers copies of databases for use in development, there are a number of data masking tools that can automate the provisioning and masking of the copies. Because they offer a consistent, reliable and repeatable way of masking data, they make compliance a natural part of the process, rather than an unwelcome addition to it.
The GDPR brings new responsibilities for organizations which need to store and process personal data. Plan for it in the right way, however, and perhaps make use of the tools that are now available, and the journey to compliance will be a lot easier.
You can find out more about keeping sensitive data secure on Redgate’s Data Privacy and Protection pages.
If you’d like to gain a deeper understanding of the GDPR, you can also read Richard’s other posts on the topic:
So what is GDPR, and why should Database Administrators care?
So what is GDPR, and why should your customers care?
So what is a Data Protection Impact Assessment and why should organizations care?
This data mapping blog post was first published on Dataversity on 26 April 2018