Why test data management is becoming increasingly important to the C-suite


We recently sat down with James Phillips, Senior IT Leader, to talk about test data management (TDM) and the growing attention it’s getting from the C-suite. That attention has been prompted by the recognition that provisioning test and development environments with realistic, production-like data improves the quality of the code being developed, reduces errors, and delivers new features to customers faster.

But what do development environments look like without test data management in place, what challenges does that present, what are the real quantifiable benefits and, most importantly, how do you introduce it? Let’s find out from James.

Can you walk us through what a typical release process looks like for your teams when taking changes from test to deployment?

At a very high level, our developers write code, commit it to a source repository, and it’s deployed into a development environment where basic unit tests take place. The test coverage varies for each of the applications we use, but for the most part there’s some very high-level testing around things like stored procedures.

There is some data management involved in this process, but it’s minimal. It’s very challenging, especially because we deal with five different platforms, and the number of customers and use cases varies massively between them.

Once the code passes those initial tests, it moves into the QA environment. Even though more data is introduced at this point, it’s more from the application standpoint – entering data in the fields and checking it functionally flows the right way. The only way to test against data that is fully representative is to go into production, take a backup, and restore a copy of the whole database. We’d then need to do some data cleansing and ensure certain operations, like running payments, don’t happen. It’s not really feasible for us because we have customers with multi-terabyte databases. We also run everything on Azure SQL Managed Instances, which is a very expensive solution, so we can’t replicate the whole thing in our QA environment.
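To make that concrete, here’s a minimal sketch of the restore-and-safeguard step James describes, in T-SQL. The database, URL, and table names are all hypothetical stand-ins; the real process would be specific to each platform.

```sql
-- Hypothetical sketch: restore a production backup into QA, then
-- immediately neutralise risky operations before anyone touches it.
-- All names (CustomerDB_QA, dbo.SystemConfig, ...) are illustrative.

-- 1. Restore the nightly production backup from blob storage
--    (the FROM URL form used on Azure SQL Managed Instance).
RESTORE DATABASE CustomerDB_QA
FROM URL = N'https://prodbackups.blob.core.windows.net/backups/CustomerDB_full.bak';
GO

-- 2. Make sure operations like payment runs and outbound email
--    can't fire from the restored copy.
USE CustomerDB_QA;
UPDATE dbo.SystemConfig
SET    SettingValue = '0'
WHERE  SettingName IN (N'PaymentsEnabled', N'OutboundEmailEnabled');
GO
```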

What are the main challenges you’re seeing with this process?

I would say it’s a couple of different things. Currently, the lower development environment, where developers are writing code, is a shared environment that everybody works against. So obviously the data scenarios in there, and what’s being captured, are very limited. The challenge in solving this is cost: how do you give developers their own environments without blowing up your costs?

In QA, we don’t have this challenge – we have multiple environments, which means less sharing. We have about seven different client profiles that the QA team performs regression tests on, running the same tests on every single one. However, the data in these QA environments doesn’t encompass all the different scenarios, and our biggest hurdle is the cost, space, and time it takes, because restoring multi-terabyte databases isn’t the fastest thing in the world.

We have certainly had issues in the past that reached production because data scenarios weren’t handled or highlighted in those lower environments. I’d say on average we get one or two of those issues on every single release. We’ve done a lot of work around improving release quality, so it’s not that the code doesn’t work, or the code is failing; it’s a data scenario that couldn’t be tested properly in our QA environment.

Is there a way to quantify the impact of those challenges from a team, leadership and even a business perspective?

One of the things that we measure in each of our releases, whether that’s our monthly scheduled release or a non-regular release like a hotfix, is the issues that make it to production. We log them and give them a severity impact, on a scale from one to five: from a small piece of system functionality being unavailable up to the whole system being unavailable, with variations in between. We then take all of those and factor in the severity and the revenue it put at risk to understand the financial impact.

That is a calculation that we do retrospectively after every release, and we make it available so the whole company is aware of the impact. The number of release issues is something we’ve improved dramatically over the past 18 months but last month, for example, we had two issues that made their way out and put about $50,000 of monthly revenue at risk.
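As a sketch of what that retrospective roll-up could look like, assuming a hypothetical ReleaseIssues log table with a one-to-five severity column and a revenue-at-risk figure per issue:

```sql
-- Hypothetical roll-up of logged release issues into revenue at risk.
-- dbo.ReleaseIssues(ReleaseId, Severity, MonthlyRevenueAtRisk) is illustrative.
SELECT ReleaseId,
       COUNT(*)                  AS IssueCount,
       MAX(Severity)             AS WorstSeverity,     -- 1 (minor feature down) to 5 (whole system down)
       SUM(MonthlyRevenueAtRisk) AS TotalRevenueAtRisk -- the figure shared with the whole company
FROM   dbo.ReleaseIssues
GROUP BY ReleaseId
ORDER BY TotalRevenueAtRisk DESC;
```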

What benefits would a test data management process or solution provide?

I think there are a few major, quantifiable benefits. Frequently when we have release issues, it’s because we didn’t have the data to produce the scenario in our testing process. There isn’t a proper test data management process with cleansed data in our lower environments, which is a really important aspect of this.

One of the things that we haven’t really talked about yet is also compliance. GDPR, PCI, SOC, ISO, you name it, we are subject to all of it. We’ve spent a lot of energy on compliance and are in the process of becoming ISO and SOC compliant. Right now, we’re getting near the audit process and this has a huge impact from a reputation and financial standpoint. Failing those audits or coming up with issues is a really big deal and has a direct correlation to our sales. We cannot sell to large enterprises without compliance in place. Test data management is a major aspect of every one of these regulatory frameworks. Being able to prove that you have sanitized data in your lower environments is crucial, and the solution today is writing a lot of custom scripts and hoping you catch everything.

I mentioned the reputational risk, and this is becoming increasingly important for organizations because if you have a data breach you may never recover from it. I think you are more heavily criticized for having a data breach in your lower environment than you are in your production environment. In the production environment, clients expect that data to be there, but they’re not expecting that data to be in lower environments.

When customers find out their data is part of the profile data in our QA instances, they obviously start asking questions. Why are you doing that? How are you protecting our data? How are you cleansing the data? We have to provide documentation on our processes and how we’re handling all those things. About a year ago, we had an incident where our data cleansing scripts didn’t catch something and an auto debit processed. We had to tell the customer it was because we copied data to a lower environment and we missed the cleansing. This was not a fun conversation and it had a direct financial impact as we had to give them a credit off their monthly bill to repair the reputational damage.

As a CIO, how does something like test data management become a priority for you?

Compliance is probably the biggest reason. I also oversee the infrastructure teams, so all my teams are stakeholders in implementing a test data management solution. The DBA teams are thinking: what is this going to do to us, and how much overhead is it going to create? Meanwhile, the infrastructure teams are asking how it will affect the network. From my standpoint, I have to factor in the impact a solution will have on all of those stakeholders.

Ultimately though, when I’m scoring a solution and figuring out what to go forward with, I’ve got to check off a lot of boxes first, and many of these are around compliance. How do I securely move that data? How do I create a reliable and repeatable process to find data that needs to be sanitized?
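One way that “find the data that needs to be sanitized” step often starts – sketched here as an assumption, not James’s actual process – is a pattern-matching pass over the schema metadata:

```sql
-- A rough first pass at finding columns likely to hold sensitive data,
-- by pattern-matching column names in the ANSI-standard system views.
-- Real classification needs far more than name matching; this is a sketch.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM   INFORMATION_SCHEMA.COLUMNS
WHERE  COLUMN_NAME LIKE '%email%'
   OR  COLUMN_NAME LIKE '%phone%'
   OR  COLUMN_NAME LIKE '%card%'
   OR  COLUMN_NAME LIKE '%birth%'
ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME;
```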

We try to create safeguards, and front-end applications have certainly advanced over the years in putting controls into place, but they don’t catch it all. Sensitive data still makes its way to the database layer of development and, once it’s there, it becomes my team’s problem. There are things a generic SQL script can fix, such as masking or deleting the email field, but it can’t handle every scenario. Trying to write traditional scripts to do that, honestly, is a huge performance and time problem. It takes forever to get that database presentable for testing.
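For illustration, the kind of generic masking script he means might look like the sketch below; the table and column names are invented, and the comments spell out why it can’t handle every scenario.

```sql
-- The classic hand-written masking script: simple, but it only cleans
-- the columns someone remembered to list. Names here are illustrative.
UPDATE dbo.Customers
SET    Email       = CONCAT('user', CustomerId, '@example.invalid'),
       PhoneNumber = '555-0100';

-- What it silently misses: the same address pasted into a free-text
-- notes column, copies in staging tables, or columns added after the
-- script was last written. That is the "hoping you catch everything" problem.
```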

What advice do you have for companies that are struggling in this area and want to improve their test data management processes, and DevOps practices as a whole?

I think the biggest thing, if you’re not particularly senior in your role, is getting top-level leadership buy-in. I’m one of the executives and have responsibility for our infrastructure, DevOps, and database management team. So in our scenario you’d already have my support, as improving our test data management and DevOps processes will make the life of my teams – and my life – easier. I appreciate it’s not like this everywhere though and it can be a constant uphill battle, with criticism about how much a solution costs and how long it’s going to take to implement.

Once you have that top level buy-in, you need to expand this to every stakeholder that’s going to be involved. If one person or one part of the organization doesn’t buy into it, they will always look for a way around it. That’s been my experience when the product, engineering, or developers were not all bought into a solution. You’ve got to explain the importance of it, you need to find those key people in those groups to champion the changes, and they need to continue to champion it, so that it just becomes a regular part of what they need to do every single day.

I say this about DevOps and test data management, but it’s applicable to any large change you’re trying to implement. It’s a proven methodology that, if you follow it, will help you implement a solution and get buy-in for it. It will also help limit how much you need to justify the cost of the tool. When I get that question around costs, I always come back with, ‘Here’s what it would cost if we didn’t do it and we had a breach. Which bill do you want to pay?’.

As you start, you’re probably going to need to work at selling it within your organization. Some will be more receptive, while others will worry about the impact on the engineers and whether it’s going to slow them down. Initially, the answer is yes: it probably will slow them down while they’re learning the process, but once they’ve learnt it, they’re actually going to go a lot faster. We certainly follow the crawl, walk, run approach, and our end goal is always to increase speed and efficiency with automation.

My advice for anyone considering working on their test data management process is: you’ve just got to do it. Stop talking about it and dive in. Once you do, I’m highly confident you will not regret it. The value test data management brings to your organization, whether we’re talking from a compliance or customer confidence standpoint, is huge. It’s hard to put a dollar value on it, but when you look at the dollar value of not doing it, and the consequences of what can happen, the alternative is much worse.

If you’d like to follow James’s advice, find out how you can introduce, streamline and automate test data management workflows while keeping data safe across PostgreSQL, SQL Server, MySQL and Oracle databases. Discover more about Redgate Test Data Manager.