Where’s the money? The ROI of test data management
You may have heard of test data management (TDM). It’s part of the software delivery process – some would say a crucial part, involving the creation, management, and maintenance of environments for software development and testing. By provisioning fresh, production-like data, it allows developers to test their proposed changes early, thoroughly, and repeatedly with the right test data, when they need it and where they need it. All of which results in robust, reliable database deployments, fewer bugs, and shorter lead times for changes.
It’s not new either. As far back as 2016, the definitive guide to DevOps, The DevOps Handbook, outlined the four constraints that typically cause blockages in software development. And the first? It’s environment creation, about which it says:
We cannot achieve deployments on-demand if we always have to wait weeks or months for production or test environments. The countermeasure is to create environments that are on demand and completely self-serviced, so that they are always available when we need them.
Once those test environments are in place, the blockages are removed and software development teams see immediate advantages in four areas.
Firstly, the speed and efficiency of testing goes up. The ready availability of up-to-date copies of production databases that are truly representative of the original enables developers to accurately validate the performance impact of their changes. Quite simply, the testing works, and works faster, so changes can be released sooner.
Introducing a dedicated TDM solution typically saves a minimum of 15% of developer time by providing developers with dedicated development environments, and by streamlining testing processes with improved test data. DBAs aren’t left out either. They can expect to save 10% of their time, and often much more, by automating the provisioning of development and test environments with personal and sensitive data already sanitized and masked.
The result? Developers get six or more hours of their time back every week to spend on the task they like best: writing quality code. And DBAs? They have at least four more hours a week to optimize database performance, ensure availability, and keep on top of data security.
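As a rough back-of-the-envelope check, assuming a standard 40-hour working week, those percentages translate into hours like this:

```python
# Back-of-the-envelope check: converting the time-savings percentages
# above into hours per week, assuming a standard 40-hour working week.
WEEKLY_HOURS = 40

developer_saving = 0.15 * WEEKLY_HOURS  # 15% of developer time
dba_saving = 0.10 * WEEKLY_HOURS        # 10% of DBA time

print(f"Developer time reclaimed: {developer_saving:.0f} hours/week")  # 6
print(f"DBA time reclaimed: {dba_saving:.0f} hours/week")              # 4
```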
Secondly, and as a direct result, the quality of coding improves, with errors being caught earlier in the development process so that they can be corrected long before they cause problems. This is often called shift-left testing and is a major contributor to reducing change failure rates in deployments and the expensive reworking of code.
Thirdly, there are further savings to be gained beyond testing sooner, releasing changes faster, and stopping errors reaching production environments. With automation in place, the time required to provision database copies for use in development and testing falls from hours to literally seconds. When virtualization technology is used to create those copies and make them a fraction of the size of the original, there are also big infrastructure savings, whether on-premises or in the cloud.
Finally, and the icing on the cake for many organizations, data is more secure. While data can be shared more widely, with more developers, in more ways, security can be built into the TDM process. Personally Identifiable Information (PII) can be protected through identification, classification, data masking, and anonymization, supporting compliance with any relevant data protection legislation. And with sensitive data automatically classified and de-identified as part of the provisioning process, a demonstrable audit trail can be provided.
In doing so, it also mitigates the reputational risk and financial penalty of a data breach, the average cost of which is now $4.45m according to IBM’s 2023 Cost of a Data Breach Report.
The signs you need TDM
There are lots of gains to be made from introducing TDM, but another measure of the ROI it will bring is the pains it will resolve. Typically, these center on the Achilles’ heel of software development, change failure rates, along with pressures on DBA and developer time, the storage space test environments demand, and the safeguarding of data.
Problems with deployments
Software development is complicated, errors can creep in, and deployments can fail. The benchmark is the change failure rate, frequently used as one of the measures of software delivery performance. In the 2023 Accelerate State of DevOps Report, for example, the highest performing teams have a change failure rate of 5%. For just under a third, the change failure rate is 10%, while another third sees this rise to 15%. For the lowest performing 17%, meanwhile, it jumps to a surprising 64%.
Where organizations sit on this scale depends largely on the quality of the code being written and the rigor of the testing it undergoes. As we’ve seen, TDM improves code quality by providing developers with testing environments that look and behave like production, highlighting problems while they are writing the code, not when it hits production.
Inefficient development practices
Make no mistake – developers do want to write good code, and DBAs do want to provide them with the resources to do so, like accurate, up-to-date copies of the production database. Internal processes and practices can hold them back, however.
Provisioning a single copy of a large production database for use in testing can be long and laborious, and developers often wait hours, sometimes days or even weeks, for their own copy, delaying development and slowing releases. To avoid this, a shared development environment is a common workaround, but it causes knock-on issues when a proposed change from one developer breaks the environment and it has to be refreshed for everyone.
TDM changes this by streamlining and automating the provisioning of development and testing environments, saving hours of manual effort. It also means those fragile shared environments are replaced with dedicated environments for each developer, so they can develop and test faster without stepping on each other’s toes.
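To make the idea concrete, here is a minimal sketch of self-service provisioning, assuming SQL Server, a pre-masked backup file, and a per-developer naming convention. The server name, file paths, and logical file names are illustrative assumptions, not a description of Redgate Test Data Manager:

```python
# Minimal sketch: restore a pre-masked production backup into a dedicated
# database per developer. Server, paths, and logical file names are
# illustrative assumptions, not any specific product's behavior.
import pyodbc

SERVER = "dev-sql01"                          # hypothetical dev server
MASKED_BACKUP = r"\\backups\prod_masked.bak"  # hypothetical masked backup

def provision_copy(developer: str) -> None:
    db = f"AppDb_{developer}"  # one dedicated database per developer
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={SERVER};Trusted_Connection=yes;TrustServerCertificate=yes;",
        autocommit=True,  # RESTORE cannot run inside a transaction
    )
    conn.cursor().execute(f"""
        RESTORE DATABASE [{db}]
        FROM DISK = '{MASKED_BACKUP}'
        WITH MOVE 'AppDb'     TO 'D:\\data\\{db}.mdf',
             MOVE 'AppDb_log' TO 'D:\\log\\{db}.ldf',
             REPLACE""")
    conn.close()

for dev in ["alice", "bob", "carol"]:
    provision_copy(dev)  # each developer gets an isolated, masked copy
```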
Surgical Information Systems used Redgate technology to create virtualized database copies that are a fraction of the size of the original and can be created, updated, and refreshed in minutes. This saved a minimum of 12 hours a day across its teams, equating to savings of $268,320 per year [1].
Infrastructure constraints
One of the more practical issues when provisioning testing environments is the physical disk space they require. Production databases can reach tens of gigabytes in size, sometimes a terabyte or more, and providing copies to multiple developers adds up to a lot of extra disk space very quickly.
This constraint, incidentally, applies whether databases are on-premises or in the cloud. On-premises, it comes down to the size of the installed physical infrastructure. In the cloud, most organizations don’t have the footprint to replicate large production databases in development and testing environments, and the costs of doing so would be prohibitive.
Here, the virtualization technology used in solutions like Redgate Test Data Manager reduces the size of a database for use in development and testing environments by up to 99%, removing the constraint.
Using Redgate technology, South Africa’s leading general insurer Santam delivers sanitized production data to its development teams in seconds. The teams can self-serve reliable, up-to-date data environments 720 times faster while also saving 95% of storage space.
Concerns about compliance
Lastly, there are the thorny issues around protecting personal and sensitive data. Redgate’s State of the Database Landscape survey showed that 43% of organizations use a full-size backup of the production database in development and test environments, and 28% use a subset of production data. It also highlighted that data masking and anonymization are used in only 35% of development and testing environments.
The best approach is to classify and mask personal data during the provisioning of database copies. Redgate Test Data Manager, for example, was developed from the ground up to resolve the issue by automating classification and masking, making compliance a natural, always-on part of the development process.
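As a simplified illustration of that classify-then-mask principle (a generic sketch, not how Redgate Test Data Manager works internally, with the PII patterns and masking rule chosen purely for the example):

```python
# Simplified illustration of classify-then-mask: flag columns whose names
# match common PII patterns, then replace their values with placeholders.
# The patterns and masking rule here are illustrative assumptions.
import re

PII_PATTERNS = re.compile(r"(name|email|phone|ssn|address|dob)", re.IGNORECASE)

def classify(columns):
    """Return the subset of column names that look like PII."""
    return {c for c in columns if PII_PATTERNS.search(c)}

def mask_row(row, pii_columns):
    """Replace PII values with same-length placeholders."""
    return {
        col: ("x" * len(str(val)) if col in pii_columns else val)
        for col, val in row.items()
    }

rows = [{"id": 1, "full_name": "Ada Lovelace", "email": "ada@example.com", "plan": "pro"}]
pii = classify(rows[0].keys())           # {'full_name', 'email'}
print([mask_row(r, pii) for r in rows])  # id and plan untouched, PII masked
```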
Conclusion
TDM is one of the key ways organizations can deploy changes on demand, reliably and safely, while reducing their change failure rate. And the ROI to be gained is not just the dollar amount to be saved. We’ve seen, for example, that Surgical Information Systems saved over $250,000 a year. As Ryan Burg, DevOps Manager, commented:
Faster feature release is certainly one benefit. Others have been standardized environments, reduction in management and maintenance overheads, and improved efficiencies.
There is also the reputational risk to be considered because, by default, a good TDM solution will classify, mask and anonymize sensitive data used in development and testing. I like how James Phillips, Senior IT Leader, puts it in his blog post, Why test data management is becoming increasingly important to the C-suite:
The value TDM brings to your organization, whether we’re talking from a compliance or customer confidence standpoint, is huge. It’s hard to put a dollar value on, but when you look at the dollar value of not doing it, and being faced with the consequences of what can happen, it’s much worse.
The ROI will also change for different organizations. For some, the physical infrastructure savings alone that come from virtualized environments are enough. Others will see improvements across the whole of the software development and database management process. And for those where compliance is a concern, TDM makes it an integral and demonstrable part of that process.
To find out how you can introduce, streamline and automate TDM workflows while keeping data safe across PostgreSQL, SQL Server, MySQL and Oracle databases, discover more about Redgate Test Data Manager.
[1] Using data from the US Bureau of Labor Statistics, the average annual salary of a software developer in the US is $132,930, with benefits accounting for an additional cost of just under 30%. This results in an average annual cost per developer of around $172,000. Assuming 2,000 working hours in a year, the hourly cost of software development is $86. With a 5-day working week and 260 working days a year, the 12 hours saved each day equate to 3,120 hours a year, or $268,320.
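For anyone who wants to verify the working, the same calculation in a few lines of Python:

```python
# Reproducing the footnote's arithmetic with the figures quoted above.
salary = 132_930               # average US developer salary (BLS)
annual_cost = salary * 1.295   # benefits add just under 30% -> ~$172,000
hourly_cost = 172_000 / 2_000  # 2,000 working hours a year  -> $86/hour
hours_saved = 12 * 260         # 12 hours/day across 260 working days
print(hourly_cost * hours_saved)  # 268320.0 -> $268,320 per year
```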