8 March 2024

Redgate Test Data Manager: A Guide for the DevOps Manager

Redgate Test Data Manager will prepare, manage and deliver the test data required to support the full range of development and test activities for any database project. It allows for extensive and automated testing that leads to reliable, online deployment of database changes.

Most database projects struggle with managing their test data. Why does it matter? Inadequate access to test data reduces the productivity of the team, often obliging them to use a single, shared development database, where they work on one feature at a time, to avoid conflicts. This results in slower software delivery. Insufficient and poor-quality test data also results in less reliable testing, with reduced coverage, meaning lower quality software, with more bugs reaching the production system, and database deployment failures becoming commonplace.

Redgate Test Data Manager is designed to tackle these problems. It will allow an organization to implement a standardized and automated test data management strategy that works across a range of different database systems and scales smoothly to support even complex, terabyte-scale database systems.

What does Redgate Test Data Manager do?

It provides a standardized mechanism, using virtualized data containers, or clones, to deliver test data into any environment, quickly. When the test data originates in the production system, the delivery process will automatically incorporate data ‘de-sensitization’ or masking to ensure sensitive and personal data is removed. Equally, the users can choose to use purely synthetic test data, prepared using data generation techniques.

In either case, the team establishes a database, or a set of databases on an instance, at the right version and containing the required test data. Redgate Test Data Manager saves the database files as a ‘data image’. It can then deliver multiple copies, or clones, of the image, quickly and with minimal overhead. Each individual database clone is a fully functioning database, complete with the test data, but one that can be instantly created, reset or removed. The container automatically has the correct database software, operating system, and configuration.
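
To illustrate why clones of an image can be created and reset so cheaply, here is a minimal sketch of the general copy-on-write idea behind data virtualization. It is not Redgate's implementation: the `DataImage` and `Clone` classes, and the notion of numbered "pages", are hypothetical names chosen for illustration only.

```python
from types import MappingProxyType


class DataImage:
    """A read-only snapshot of database pages (hypothetical model)."""

    def __init__(self, pages):
        # MappingProxyType gives a read-only view, so clones cannot alter the image.
        self.pages = MappingProxyType(dict(pages))


class Clone:
    """A lightweight copy that stores only its own changes."""

    def __init__(self, image):
        self.image = image
        self.changes = {}  # page_id -> data written by this clone only

    def read(self, page_id):
        # Prefer the clone's own change; otherwise fall back to the shared image.
        if page_id in self.changes:
            return self.changes[page_id]
        return self.image.pages[page_id]

    def write(self, page_id, data):
        self.changes[page_id] = data

    def reset(self):
        # Discarding local changes returns the clone to the image state.
        self.changes.clear()


image = DataImage({1: "customers v1", 2: "orders v1"})
dev_a = Clone(image)
dev_b = Clone(image)

# One developer changes a page; the other clone and the image are unaffected.
dev_a.write(2, "orders with new column")
```

In this model, creating a clone copies no data at all, and a reset only discards the clone's local changes, which is why both operations are fast regardless of the size of the image.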

All data images and clones are saved to a central test data repository and a team can share and manage all test data from a central dashboard. With appropriate permissions, and automation using the Redgate Test Data Manager CLIs and scripting, test data copies can be accessed in an automated, self-service ‘canteen’ system. They can also be delivered securely into automated DevOps pipelines.

What are the benefits?

Effective test data management benefits everyone. The Ops team benefit from a standard, secure and maintainable way to provide test data in all environments, safely. The developers and testers can ‘self-serve’ the test data they require, no longer facing the disruption and delay of preparing, refreshing or ‘resetting’ it. It allows them to work on databases independently of each other, and to use test data in far more productive ways, increasing test coverage and effectiveness. It delivers realistic test data that allows the team to perform accurate and automated testing and validation of application functionality at every stage of a DevOps pipeline, without compromising data privacy or security.

When testing becomes embedded in the development and release processes, the integration of new features for deployment causes far fewer issues. As a result, managers appreciate the faster and more frequent delivery of higher quality software.

Preparation of test data

Once the team has identified and documented its test data requirements, Redgate Test Data Manager aims to provide test data to support the full development cycle and deployment pipeline. It supports comprehensive test coverage, to ensure a database application always functions as expected. It also scales to support realistic integration, acceptance and performance testing, performed in CI/CD environments, even for large and complex database systems.

It provides several ways of preparing the test data.

Sanitized production database copies and subsets

Redgate Test Data Manager makes it possible to provide representative copies of production data to development teams, safely. It incorporates a data catalog to discover any columns containing PII or other sensitive information. It auto-classifies this data and then runs a data masking process that replaces it with synthetic values, while maintaining all the correct characteristics, relationships and distribution of the data. It delivers sanitized data copies as clones, with centralized control over where they are deployed, ensuring that only the relevant business processes, and users, have access to them. The team can also set up a regular, automated ‘refresh’ of the images while ensuring the data remains protected.
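
One general way a masking process can keep relationships intact is to make the replacement deterministic, so the same input always maps to the same masked value in every table. The sketch below illustrates that idea only; it is not the product's masking engine. The key, the table contents and the helper names (`mask_value`, `mask_email`) are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would be kept out of source control.
SECRET_KEY = b"example-only-key"

# Hypothetical pool of replacement names.
FAKE_NAMES = ["Alex Reed", "Sam Patel", "Jo Meyer", "Chris Lund", "Pat Okoro"]


def mask_value(value, choices):
    """Pick a replacement deterministically from a keyed hash of the input."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(choices)
    return choices[index]


def mask_email(email):
    """Replace an email with a synthetic one; equal inputs give equal outputs."""
    digest = hmac.new(
        SECRET_KEY, email.lower().encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"user_{digest[:10]}@example.com"


# Illustrative rows: orders refer to customers by email address.
customers = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@corp.test"},
    {"id": 2, "name": "Alan Turing", "email": "alan@corp.test"},
]
orders = [
    {"order_id": 10, "customer_email": "ada@corp.test"},
    {"order_id": 11, "customer_email": "alan@corp.test"},
]

masked_customers = [
    {**c, "name": mask_value(c["name"], FAKE_NAMES), "email": mask_email(c["email"])}
    for c in customers
]
masked_orders = [
    {**o, "customer_email": mask_email(o["customer_email"])} for o in orders
]
```

Because both tables are masked with the same keyed function, an order still joins to its customer after masking, even though neither table holds the original email address.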

Redgate Test Data Manager also has a subsetting feature, enabling teams to work with a smaller but representative portion of the production dataset, maintaining necessary entities and relationships. This minimizes the exposure of production data, enhances efficiency in test runs, and accelerates development cycles. For example, developers can avoid delays when testing a series of changes to a business report that takes a long time to run with the full production data.
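
The essential requirement of subsetting is that every row kept still has the rows it depends on. As a rough sketch of that idea, assuming a hypothetical three-table schema (customers, orders, order lines) held in memory, a subset can be built by choosing a few root rows and following the foreign keys outward. The function name `subset_by_customers` is invented for this example.

```python
# Hypothetical source data: orders reference customers, lines reference orders.
customers = {
    1: {"name": "Customer A"},
    2: {"name": "Customer B"},
    3: {"name": "Customer C"},
}
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 2},
    {"order_id": 102, "customer_id": 1},
    {"order_id": 103, "customer_id": 3},
]
order_lines = [
    {"line_id": 1, "order_id": 100, "product": "widget"},
    {"line_id": 2, "order_id": 101, "product": "gadget"},
    {"line_id": 3, "order_id": 102, "product": "widget"},
    {"line_id": 4, "order_id": 103, "product": "sprocket"},
]


def subset_by_customers(customer_ids):
    """Keep the chosen customers plus every row that depends on them."""
    keep_customers = {cid: customers[cid] for cid in customer_ids}
    # Follow the foreign key from orders to customers.
    keep_orders = [o for o in orders if o["customer_id"] in keep_customers]
    keep_order_ids = {o["order_id"] for o in keep_orders}
    # Follow the foreign key from order lines to orders.
    keep_lines = [line for line in order_lines if line["order_id"] in keep_order_ids]
    return keep_customers, keep_orders, keep_lines


sub_customers, sub_orders, sub_lines = subset_by_customers([1])
```

The result is smaller than the source but referentially complete: no order points at a missing customer, and no order line points at a missing order.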

When the process of extracting masked production data is secure and automated, it cannot be circumvented and so is auditable. This makes it much easier for the DevOps Manager to balance the strict need to safeguard data against the legitimate desire of the organization to make the most effective use of it, for development and testing work.

Fully synthetic, generated data

Testing database functionality during development often needs synthetic data and standardized datasets. Test-driven database development requires constant unit testing and integration tests, and this relies on test datasets that provide full test coverage and ensure the application functions as expected across diverse inputs, including unexpected or unusual ones.

Redgate Test Data Manager allows developers to independently manage synthetic datasets for such tests. It features a data generator tool that imports the database schema and creates fake data for each column. Customization allows the generation of entirely synthetic data while preserving the essential characteristics of real data.
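
As a simple illustration of schema-driven generation, the sketch below maps each column's type to a small generator and uses a fixed seed so that the same dataset can be rebuilt for repeatable tests. It does not represent the product's data generator; the `SCHEMA` dictionary, the type names and the functions are hypothetical.

```python
import random
import string
from datetime import date, timedelta

# Hypothetical schema: column name -> simplified type name.
SCHEMA = {
    "customer_id": "int",
    "full_name": "text",
    "signup_date": "date",
    "balance": "decimal",
}


def generate_value(col_type, rng, row_number):
    """Produce one fake value for a column of the given type."""
    if col_type == "int":
        return row_number  # sequential, so it can act as a unique key
    if col_type == "text":
        return "".join(rng.choices(string.ascii_lowercase, k=8)).capitalize()
    if col_type == "date":
        return date(2020, 1, 1) + timedelta(days=rng.randrange(0, 1461))
    if col_type == "decimal":
        return round(rng.uniform(0, 10_000), 2)
    raise ValueError(f"Unsupported column type: {col_type}")


def generate_rows(schema, count, seed=42):
    """Generate `count` rows; the same seed always yields the same rows."""
    rng = random.Random(seed)
    return [
        {col: generate_value(col_type, rng, n) for col, col_type in schema.items()}
        for n in range(1, count + 1)
    ]


rows = generate_rows(SCHEMA, 5)
```

Seeding the generator is a deliberate choice here: a test that fails against generated data can be rerun against exactly the same rows.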

Test data delivery and automation

When developers and testers need to submit a ticket to get a fresh copy of the data, get approvals, and then wait days or even weeks to get it, productivity will suffer. With Redgate Test Data Manager this process is secure and automated, and it takes minutes, rather than days or weeks. It supports automated delivery of realistic data, regardless of the size and complexity of the source database system, the flavor of database you need to test, or the volume of data you need.

Once the Data Ops team have established an automated process for delivering and refreshing data images, the development team can adopt a ‘self-service’ approach to setting up realistic development and test environments. They can also establish a process to auto-provision realistic data for a continuous integration and delivery (CI/CD) pipeline, such as a Flyway DevOps pipeline, even for complex, terabyte-scale databases.

The Redgate Test Data Manager Process

Self-service databases

For developers and testers, this means realistic dev and test environments can be created, and instantly reset, on demand and at a speed that fits easily into automated test-driven development. The database is no longer a shared resource that must be ‘protected’ once it is created. It moves even enterprise-scale databases towards being just another ‘standard resource’ that developers can create, import, use and reset, on demand. This enables a cultural shift in the team’s attitude and approach to database development and testing because it allows isolated and parallel work practices for the database and far more effective database testing.

Realistic environments, not just databases

Gene Kim (The Phoenix Project) identified “setting up environments” as one of the important issues that an effective test data management strategy must tackle. Redgate Test Data Manager makes it much simpler for the Data Ops team to provide realistic database environments for development work. It supports:

- Whole-server provisioning – create a data image for an entire instance of interdependent databases, rather than just individual databases.
- Accurate environment configuration – auto-configure the ‘virtualized’ data container in which the clones are delivered so that it reflects the production environment configuration as closely as possible.

With these features the team now has a test data delivery mechanism that will instantly create a test environment containing the right version of the source databases with the right data. These will be delivered in a data container running the right OS version, database software and having the correct instance and server-level configuration settings.

Putting realistic test data to work

Redgate Test Data Manager standardizes, simplifies and automates database environment setup for development work. It allows developers to self-serve dedicated database environments, promoting independent work and efficient testing.

Development teams can now split large developments into small, independently-releasable tasks. For example, when starting work on a new feature or hotfix branch, a developer simply selects the required ‘data image’ from the centrally managed data repository and the tool instantly delivers the right version of the database with the correct data, which they can work on independently, without the risk of compromising changes made by others. Using Flyway, developers execute migrations and create new features, and share updated test datasets in the Redgate Test Data Manager repository.

This quick test environment setup, with instant resets, means the team can conduct extensive testing early in the development cycle. Database testing becomes more efficient, with standardized environments for repeatable tests, rapid test sequence execution, and the ability to execute test runs in parallel, such as to test two different implementations of the same feature. Use of production-like database environments in continuous integration and delivery pipelines means early detection of data-related issues, improved overall software quality and increased confidence that production behavior will match that observed in testing.

Conclusions

Too many database developers are forced to adopt development practices that they realize aren’t best practice, often because the problems of creating and managing large quantities of test data seem insurmountable. Setting up and tearing down test environments can be tricky even in application development, but when the import of sufficient data can take hours, good test practices seem out of reach.

Redgate Test Data Manager uses established data virtualization technology and containers to drastically reduce hardware requirements and provide a system that simplifies setup and tear-down processes for test databases. It allows the team to use a simple ‘library’ or ‘canteen’ system for providing the test data sets, within images. From these images, they create a database environment for testing and development in the time it takes to drink a cup of coffee.

By having a central archive of data images, the task of preparing each set of test data, for each development task, or type of test, needs to be done just once, and the datasets can be subsequently shared. Whether the task is generating test data, carving out a subset of a database’s data, masking sensitive data or automatically finding personal data, Redgate Test Data Manager provides the tools for automating this process as much as possible.

If you’d like to know more, check out our Redgate Test Data Manager page here.
