Classifying the data within an organization is not just something nice to do. It’s critical for complying with regulations such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), the Health Insurance Portability and Accountability Act (HIPAA), and a host of other privacy laws enacted throughout the world. By classifying data, you can understand, at a high level, what needs to be protected, where the risks are, where the data comes from, and where it flows throughout the organization. If you don’t know what data your organization stores, you can’t protect it.
Many companies make the mistake of thinking that a spreadsheet is a good enough tool for classifying data. Collecting the data (or, more precisely, the metadata) this way makes the process completely manual, tedious, and difficult to keep up to date. Organizations often have tens or hundreds of database servers, with databases containing thousands of potentially sensitive columns that fall under one or more of these regulations. Classifying data this way can be overwhelming.
When you need to classify SQL Server data, the smart move is to find a cataloguing tool that is designed specifically for the task and can do most of the work for you. Built-in suggestion features and bulk actions can find and classify up to 70% of columns. In an ERP system, for example, there could be thousands of unused tables, and a bulk action could automatically mark those tables as non-sensitive. Likewise, any in-use columns with large string data types could be automatically marked as sensitive. Once the suggestions and bulk actions are complete, the remaining tables will still take some work, but the problem is much easier to solve. Automation can shorten a classification project from months to weeks!
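To make the idea of suggestion rules concrete, here is a minimal Python sketch of the two bulk actions described above: columns in unused tables are marked non-sensitive, and large string columns in tables that are in use are marked sensitive. Everything else lands in a manual review queue. The `Column` type, the `table_in_use` flag, the 255-character threshold, and the label names are hypothetical assumptions for illustration, not SQL Data Catalog's actual rules or API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: string columns at least this long (or MAX) count as "large".
LARGE_STRING_THRESHOLD = 255
STRING_TYPES = {"char", "varchar", "nchar", "nvarchar", "text", "ntext"}


@dataclass
class Column:
    schema: str
    table: str
    name: str
    data_type: str
    max_length: int      # declared length in characters; -1 stands for (MAX)
    table_in_use: bool   # hypothetical flag, e.g. derived from row counts or usage stats


def suggest_classification(col: Column) -> Optional[str]:
    """Return a suggested label, or None if the column needs manual review."""
    if not col.table_in_use:
        # Bulk action 1: anything in an unused table is treated as non-sensitive.
        return "Non-sensitive"
    is_string = col.data_type.lower() in STRING_TYPES
    is_large = col.max_length == -1 or col.max_length >= LARGE_STRING_THRESHOLD
    if is_string and is_large:
        # Bulk action 2: large free-text columns in live tables are flagged as sensitive.
        return "Sensitive"
    return None


def bulk_classify(columns):
    """Split columns into auto-classified ones and a manual review queue."""
    classified, needs_review = {}, []
    for col in columns:
        key = f"{col.schema}.{col.table}.{col.name}"
        label = suggest_classification(col)
        if label is None:
            needs_review.append(key)
        else:
            classified[key] = label
    return classified, needs_review


# Made-up sample metadata for illustration.
columns = [
    Column("dbo", "LegacyAudit", "Notes", "varchar", 500, False),
    Column("dbo", "Customer", "Address", "nvarchar", -1, True),
    Column("dbo", "Customer", "CustomerID", "int", 4, True),
]
classified, review = bulk_classify(columns)
print(classified)
print(review)
```

Only `dbo.Customer.CustomerID` is left for a person to review; the other two columns are labelled by the rules. A real tool would combine many more rules, but the split between automatic suggestions and a smaller review queue is the point.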
It’s also vital to know where the data comes from and where it ends up. Is there an ETL process that copies personally identifiable information from a production database to a data warehouse? Do exports of data end up in Excel spreadsheets or Access databases? While this type of data movement may be expected, one thing that is easily overlooked is the copies of production databases created for development, quality assurance, or testing. Sensitive data must be masked in these downstream databases to comply with regulations. A classification tool that also integrates with a masking tool can make this seamless.
Once the data is classified in the initial effort, it’s critical to keep the catalog up to date. What happens when new tables or columns are added? A classification tool that can be automated through PowerShell can discover and flag those new columns during the acceptance phase of a deployment pipeline, or alert the team through email notifications. That way, the catalog stays evergreen.
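At its core, detecting new columns is a set difference between the columns that exist in the database and the columns already in the catalog. The Python sketch below assumes you can obtain both lists as `(schema, table, column)` tuples, for example by querying `INFORMATION_SCHEMA.COLUMNS` and exporting the catalog contents. The function names `find_unclassified` and `acceptance_gate` are hypothetical, and the sample data is made up.

```python
import sys
from typing import Iterable, List, Tuple

ColumnKey = Tuple[str, str, str]  # (schema, table, column)


def find_unclassified(live_columns: Iterable[ColumnKey],
                      catalogued_columns: Iterable[ColumnKey]) -> List[ColumnKey]:
    """Columns present in the database but missing from the catalog, sorted for stable output."""
    return sorted(set(live_columns) - set(catalogued_columns))


def acceptance_gate(live_columns: Iterable[ColumnKey],
                    catalogued_columns: Iterable[ColumnKey]) -> int:
    """Return an exit code: 0 if the catalog covers every column, 1 otherwise."""
    missing = find_unclassified(live_columns, catalogued_columns)
    for schema, table, column in missing:
        print(f"Unclassified column: {schema}.{table}.{column}", file=sys.stderr)
    return 1 if missing else 0


# Hypothetical inputs: a new Phone column was deployed but never catalogued.
live = [("dbo", "Customer", "Email"), ("dbo", "Customer", "Phone")]
catalogued = [("dbo", "Customer", "Email")]
code = acceptance_gate(live, catalogued)
print("exit code:", code)
```

In an actual pipeline step, the returned code could be passed to `sys.exit()` so that a non-zero value fails the acceptance stage; that wiring, like the rest of the sketch, is an assumption rather than a description of any specific product.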
Of course, every industry and organization has different requirements. A classification tool should be flexible, enabling you to control the taxonomy and rules and to change them as your company changes or new regulations are introduced. The tool should also have auditing capabilities so that you can show how the data was classified yesterday, six months ago, or last year.
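Point-in-time auditing is commonly built on an append-only history of classification changes. The hypothetical `ClassificationHistory` class below is a minimal Python sketch of that idea, not how any particular product stores its audit trail: each change is recorded with a timestamp, and `as_of` uses a binary search (`bisect.bisect_right`) to return the label that was in effect on a given date.

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import Optional


class ClassificationHistory:
    """Hypothetical append-only audit log of classification changes per column."""

    def __init__(self) -> None:
        # column key -> parallel lists of timestamps and labels, kept in time order
        self._times = defaultdict(list)
        self._labels = defaultdict(list)

    def record(self, column: str, label: str, when: datetime) -> None:
        """Append a change; entries must arrive in chronological order."""
        times = self._times[column]
        if times and when < times[-1]:
            raise ValueError("audit entries must be recorded in time order")
        times.append(when)
        self._labels[column].append(label)

    def as_of(self, column: str, when: datetime) -> Optional[str]:
        """Label in effect at `when`, or None if the column was not yet classified."""
        times = self._times.get(column, [])
        idx = bisect.bisect_right(times, when)  # number of changes at or before `when`
        return self._labels[column][idx - 1] if idx else None


history = ClassificationHistory()
history.record("dbo.Customer.Email", "Non-sensitive", datetime(2023, 1, 10))
history.record("dbo.Customer.Email", "Sensitive", datetime(2023, 6, 1))
print(history.as_of("dbo.Customer.Email", datetime(2023, 3, 1)))   # Non-sensitive
print(history.as_of("dbo.Customer.Email", datetime(2023, 7, 1)))   # Sensitive
print(history.as_of("dbo.Customer.Email", datetime(2022, 12, 1)))  # None
```

Because entries are never overwritten, answering "how was this column classified six months ago?" is a lookup rather than a reconstruction.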
Redgate’s SQL Data Catalog can be fully automated through its REST API and PowerShell cmdlets. You can customize it to fit your organization and the regulations that apply to it. It also integrates with SQL Data Masker to automatically create masking sets that mask sensitive data for development. In fact, SQL Data Masker has a new interface for integrating with your data catalog, so you can automate masking without needing PowerShell skills. Together, these tools make up a complete DPP process.
SQL Data Catalog makes the overwhelming job of classifying data and keeping it up to date manageable.