The challenges – and rewards – of cataloging and masking data at Republic Bank
At Republic Bank, we’ve spent more than five years on a mission to be the most technically advanced community bank in the US. With 1,200 employees and more than $5 billion in assets, we might not be the biggest bank, but we like to see ourselves as one of the most innovative.
It’s strategically important to us to stay ahead of the curve and deliver value to the business and our customers quickly with pioneering apps and services that make banking easier. To meet that demand, 10% of our people are in IT and we have over 1,900 databases and more than 100 production servers.
That’s the sales pitch, if you like, but I’m the first to admit it’s not all plain sailing. Like every financial services business, we have to balance our desire to adopt the latest technology or practice with the need to comply with regulations and keep our customers’ data absolutely safe.
Those regulations aren’t static either. They’re updated or new ones are introduced and sometimes they bring with them unexpected challenges that affect the way we gather, process or store data. Two really good examples are the General Data Protection Regulation (GDPR) that came into effect in the EU in 2018, and the recent California Consumer Privacy Act (CCPA).
Both require businesses to identify and categorize the Personally Identifiable Information (PII) they hold and put in place measures to protect sensitive data. With so many databases and servers, we’d been interested in some type of data classification and cataloging process for a while but the GDPR and the CCPA were the driving force that made finding an enterprise-wide solution even more important. We also had to do it quickly. Fortunately, the solution was already in plain sight.
Cataloging data
We were already very familiar with Redgate’s database DevOps tools at the bank. We’d invested in the SQL Toolbelt a couple of years before to resolve problems we were having with database deployments, and we were in the process of introducing SQL Provision, which automates the provisioning of masked database copies for use in development.
That was fortuitous in a way because the close relationship with Redgate prompted some of our people to take part in the beta testing of SQL Data Catalog when it was being developed. We knew it was what we needed, but we still had to get stakeholders across the business on board. It took a lot of conversations about how it would help us comply with data privacy laws, but one of the deciding factors came when a third-party auditor mentioned Redgate tools.
That was the kind of endorsement we needed and when we introduced SQL Data Catalog and started the classification process, three big advantages emerged.
Firstly, it provides automatic classification suggestions out of the gate, so it didn’t take hours and hours for us to go in and make modifications to lots of columns. The tool identifies and tags those that contain data like names, addresses, zip codes, and credit card numbers, and we only had to look at the ones that contain data specific to our particular requirements. With so many databases and servers, that was a big win for us and it allowed us to move across the business units quickly.
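To give a feel for what automatic classification suggestions involve, here is a minimal Python sketch that suggests tags from column names and leaves everything else for manual review. It is only an illustration under stated assumptions: the rule patterns, tag names, and the `suggest_tags` and `needs_review` helpers are hypothetical, and this is not how SQL Data Catalog itself is implemented.

```python
import re

# Hypothetical rules: tag -> pattern matched against column names.
# Illustrative only; these are not SQL Data Catalog's actual rules.
RULES = {
    "Name": re.compile(r"(first|last|full)_?name", re.IGNORECASE),
    "Address": re.compile(r"address|street", re.IGNORECASE),
    "Zip code": re.compile(r"zip|postal", re.IGNORECASE),
    "Credit card number": re.compile(
        r"credit_?card|card_?(number|num|no)", re.IGNORECASE
    ),
}


def suggest_tags(columns):
    """Return {column: tag} for every column whose name matches a rule."""
    suggestions = {}
    for column in columns:
        for tag, pattern in RULES.items():
            if pattern.search(column):
                suggestions[column] = tag
                break  # first matching rule wins
    return suggestions


def needs_review(columns, suggestions):
    """Return the columns no rule matched, for a data steward to classify."""
    return [column for column in columns if column not in suggestions]


columns = ["FirstName", "StreetAddress", "ZipCode", "CardNumber", "BranchRiskScore"]
suggestions = suggest_tags(columns)
manual = needs_review(columns, suggestions)
```

In this sketch, the four common columns receive a suggested tag and only `BranchRiskScore`, a made-up business-specific column, is left for someone to review by hand.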
Secondly, as we rolled out the classification process, it helped us to identify who the data stewards are, the ones who are responsible for and care about the data for their part of the business. That meant we could explain why the data was important, why it needed to be classified, and which columns needed to be masked. We got buy-in from everyone we talked to and we’ve built strong relationships with them.
Thirdly, and going back to the third-party auditor who first mentioned Redgate tools, everyone across the business now knows that when our internal processes are audited in the future, we’ll have everything that’s needed to demonstrate how we protect data before the auditors even walk through the door.
Masking data
Like many businesses in the financial services sector, we’ve always had problems providing data sets for use in development and testing. It’s an important part of database development because you want to test changes against data that is realistic, with the same referential integrity and distribution characteristics, rather than anonymous data.
To resolve that, we’d already decided to invest in Redgate’s SQL Provision. It provides truly representative copies of the production database which are a fraction of the size of the original, with the sensitive data masked. It also integrates with SQL Data Catalog, which can provide the masking set necessary to protect sensitive data.
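One way to picture masking that keeps referential integrity is deterministic masking, where the same input always maps to the same masked value, so joins between tables still line up. The Python sketch below is an assumption-laden illustration, not SQL Provision's method: the key, the table and column names, and the `mask_value` and `mask_rows` helpers are all hypothetical, and it does not attempt to preserve distribution characteristics.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would be stored securely.
SECRET_KEY = b"example-key-not-for-production"


def mask_value(value, prefix):
    """Deterministically replace a value with a keyed hash (same in, same out)."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:10]}"


def mask_rows(rows, columns_to_mask):
    """Return copies of the rows with the listed columns masked.

    columns_to_mask maps a column name to the prefix used for its masked values.
    """
    masked = []
    for row in rows:
        new_row = dict(row)  # leave the original row untouched
        for column, prefix in columns_to_mask.items():
            if new_row.get(column) is not None:
                new_row[column] = mask_value(str(new_row[column]), prefix)
        masked.append(new_row)
    return masked


# Made-up sample data: two tables that share a sensitive identifier.
customers = [{"tax_id": "123-45-6789", "segment": "retail"}]
accounts = [{"account": 1, "tax_id": "123-45-6789"}]

masked_customers = mask_rows(customers, {"tax_id": "TAX"})
masked_accounts = mask_rows(accounts, {"tax_id": "TAX"})
```

Because both tables are masked with the same key and prefix, the masked `tax_id` in `masked_accounts` still matches the one in `masked_customers`, while the original value no longer appears in either copy.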
This has brought huge advantages to our development process. When there’s an issue or a problem with our production database, we can now spin up masked copies that look and work just like the original. Rather than spending hours troubleshooting issues to find the cause, developers can usually find it in minutes.
Perhaps more importantly, that appeals to our Chief Information Officer as well as our developers. He knows that when we use data in development environments, all of the PII is masked, no sensitive data is revealed, and there’s nothing to worry about.
That’s not the end of the story
We’ve already achieved significant gains from cataloging and masking our data, but I’m looking forward to the next stage in our journey. A new feature has just been introduced that integrates the workflow between SQL Data Catalog and the data masking capability of SQL Provision. This will be another game changer for us because it automatically masks new data that is sensitive at the point when it’s classified and cataloged. That will make the challenge of complying with regulations like the GDPR and the CCPA even easier.
Chris Yates is a Vice President and Director of Data and Architecture at Republic Bank in the US, with over 19 years of experience in the SQL industry. A Microsoft MVP and Friend of Redgate, his experience includes the design and implementation of OLTP and OLAP solutions as well as the assessment and implementation of SQL Server environments for best practice, performance, and high availability.
To find out more about cataloging and masking data, visit our solution pages online.