Data classification: understanding and protecting your data

A data discovery and classification research project from Foundry

In Foundry, we’re responsible for developing new products and technology to support the changing needs of our customers. We’ve seen a huge shift in our customers’ needs: driven by new and constantly evolving regulations, there’s an ever-growing demand on their time for data governance tasks.

As a result, we’re developing prototype software in SQL Data Mask (to de-sensitize databases) and SQL Census (to help you explain your user access permissions to auditors).

Both of these projects got us thinking: how do you know what’s sensitive enough to need masking or restricted user access?

What is data classification?

Enter ‘data classification’, the process of fully understanding your data so you can protect it accordingly. This process is often considered foundational, given its impact on higher-level projects (e.g. obscuring sensitive data and reporting on user access permissions).

1. Discovery and classification

The generally accepted process starts with the discovery and classification of data. Classification is dependent on understanding the content, context and the users of the data. The classification categories will largely depend on the data you hold and the risk to your organization that it carries.
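As a rough illustration of the discovery step, here is a minimal Python sketch that suggests a category from a column's name alone. The category labels, the patterns and the classify_column helper are all hypothetical, invented for this example, and are not part of any Foundry tool. Name matching only covers a small part of the "content, context and users" picture, and it will produce false positives.

```python
import re

# Hypothetical rules: each pairs a category label with a column-name pattern.
# Both the labels and the patterns are illustrative assumptions; your own
# categories will depend on the data you hold and the risk it carries.
RULES = [
    ("Confidential - financial", re.compile(r"card|iban|salary", re.I)),
    ("Confidential - personal", re.compile(r"email|phone|birth|surname|address", re.I)),
]

DEFAULT_CATEGORY = "Unclassified"


def classify_column(column_name):
    """Return the first category whose pattern matches the column name."""
    for category, pattern in RULES:
        if pattern.search(column_name):
            return category
    return DEFAULT_CATEGORY


def classify_schema(columns):
    """Map each (table, column) pair to a suggested category."""
    return {(table, column): classify_column(column) for table, column in columns}


if __name__ == "__main__":
    sample = [
        ("Customers", "EmailAddress"),
        ("Customers", "CustomerId"),
        ("Payments", "CardNumber"),
    ]
    for key, category in classify_schema(sample).items():
        print(key, "->", category)
```

Anything left as "Unclassified" is a prompt for a person to review the column, not a statement that the data is safe.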

2. Definition of responsibilities

Once data has been classified, the process continues with the definition of responsibilities. At this stage, you’re adding metadata to the information that’s held within the database. It will be important later on to understand who created the data, who owns it, who’s using it and who’s responsible for it at audit time.
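To make the idea of responsibility metadata concrete, the record below is one possible shape for it, sketched in Python. The ColumnResponsibility class and its field names are assumptions chosen to mirror the questions above (who created the data, who owns it, who uses it, who answers for it at audit time); they do not describe an existing product or schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnResponsibility:
    """Hypothetical metadata record for one classified column."""
    table: str
    column: str
    category: str
    created_by: str = ""
    owner: str = ""
    audit_contact: str = ""
    users: List[str] = field(default_factory=list)

    def missing_fields(self):
        """List the responsibility fields that have not been filled in yet."""
        required = ("created_by", "owner", "audit_contact")
        return [name for name in required if not getattr(self, name)]


if __name__ == "__main__":
    record = ColumnResponsibility(
        table="Customers",
        column="EmailAddress",
        category="Confidential - personal",
        owner="data-team",
    )
    print(record.missing_fields())
```

A check like missing_fields gives a simple way to spot gaps before an audit rather than during one.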

3. Addressing the data

The final step is to actually address the data, making decisions about what needs to be done with it.


Why go to the effort of classifying data?

On the face of it, a data classification project might seem comparable to spending the weekend ordering the books in your bookcase by their color (time I don’t regret spending). But it is a foundational project, and one which is likely to have a positive impact on the work that follows: becoming compliant with regulation, delivering reports during an audit, moving data around or just understanding your organization’s exposure to risk through the data it holds.

These are the next steps we’ve already begun to cover with SQL Data Mask and SQL Census, but the implications of having gained a better understanding of the data your organization holds may be even further reaching for you.

So what’s the problem?

Running a data classification project may sound simple in theory, but we think that it’s easier said than done for SQL Server. We’re preparing to run our own data classification project and so far we’ve only managed to find tools and support for documents, file systems and emails. The support for data classification with SQL Server appears scant.

It’s also likely that a classification project will only capture a snapshot in time. New development projects will come and go and, over time, you’ll need to solve the problem again for any new data that appears. A proactive approach, classifying new data as changes occur, seems more sensible, but building one is difficult given the complexity of existing processes.
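One simple way to picture a proactive check is to compare the columns that exist today against the columns that already have a classification. The Python sketch below does this with plain sets. The find_unclassified helper and the sample data are hypothetical, and how you obtain the current column list from SQL Server is left out.

```python
def find_unclassified(current_columns, classified):
    """Return (table, column) pairs that exist but have no classification.

    current_columns: iterable of (table, column) pairs in the schema today.
    classified: mapping of (table, column) pairs to a category label.
    """
    return sorted(set(current_columns) - set(classified))


if __name__ == "__main__":
    # Illustrative data only: the classification recorded by an earlier project.
    classified = {
        ("Customers", "EmailAddress"): "Confidential - personal",
        ("Customers", "CustomerId"): "Internal",
    }
    # The schema after a new development project added a column.
    current = [
        ("Customers", "EmailAddress"),
        ("Customers", "CustomerId"),
        ("Customers", "PhoneNumber"),
    ]
    for table, column in find_unclassified(current, classified):
        print("Needs classification:", table, column)
```

Run as part of a deployment or on a schedule, a comparison like this would surface new columns soon after they appear, so they are not left for the next full project.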

Tell us about your experience

As with every Foundry research project, we rely on your expertise and experience to help us understand the problem and begin to build a solution. Are you about to begin your own classification project? Have you recently completed one? We’d like to hear about it.

We’ll always endeavour to compensate you for your time, be it with an Amazon Gift Card, particular attention to your requirements or even a free license for the software that’s developed as a result. So please sign up to participate in this data classification research project.

Sign up to the data classification research project