Protecting Data During Development

In the years before joining Redgate, I was a SQL Server database consultant. I spent a lot of time doing SQL Server health checks, query tuning, and ETL (extract, transform, and load) development. Every customer was different and getting to the resources I needed to do the job at each shop varied as well. I usually had to remote into a server or virtual workstation and work with a development database.

Providing databases for developers and database professionals is not easy, and there are several decisions to make. You must decide if all members of the team will share a dev database or if each member will have their own copy. Will these databases reside on a development server or will they live locally on developer workstations? You may also have to provide databases for QA and test. Even having enough disk storage for all these databases, often including multiple versions and backups, can be a challenge.

The data in the non-production databases should resemble production data. Imagine a query that runs blindingly fast on 100 rows of test data but is miserably slow against production tables where millions of rows are queried. Often the simplest thing to do is to give developers backups of the production databases for development. This becomes impossible as databases approach terabyte sizes, but there is another issue. Copies of production databases could contain sensitive information that should be protected like names and addresses, health insurance claims, or credit cards. The developers should not see this data, and nightmare scenarios like lost laptops holding private data happen all too frequently.

Private data given to developers must be obfuscated in some way, and it must be obscured enough so that identities can’t be figured out by looking at clues in the data and putting the pieces back together. And protecting this data is not just a nice thing to do. Regulations like GDPR and HIPAA as well as new or proposed legislation in the individual US states have steep penalties for failing to protect private data. Database administrators must have processes in place to protect sensitive data and to make sure that it does not make its way onto non-production servers.

I remember spending so much time pulling down backup files and restoring databases after every sprint a few years ago at some of my customer engagements. I never seemed to have enough storage space on my virtual desktops, either. It was quite the time-consuming juggling act. Luckily, today there are some tools that can help solve these problems like Redgate’s SQL Provision. The DBAs can create masking sets to obfuscate sensitive data, create an image, and then allow developers to provision their own clones. These clones take up a few megabytes of space on the developer’s workstation, and it just takes seconds to create the clone once the image is in place. Working with the clones is just like working with a regular database. Everything can be automated with PowerShell, saving time and protecting the data after every sprint.

Providing data to developers is not easy unless you have the proper tools. Technologies like database cloning and masking can ensure that developers have databases that resemble production but without the sensitive data and without taking up a lot of precious space or time. Protecting sensitive data is serious business, but having the right tools can make it less painful and more efficient.

Commentary Competition

Enjoyed the topic? Have a relevant anecdote? Disagree with the author? Leave your two cents on this post in the comments below, and our favourite response will win a $50 Amazon gift card. The competition closes two weeks from the date of publication, and the winner will be announced in the next Simple Talk newsletter.