Your Test Data Environment: Build vs Buy – a conversation we need to have
After three decades of working with databases, one thing I've seen over and over is this: we don't treat our development and test environments with the same respect we give our production systems.
Not because people don’t care. Far from it. It’s usually because teams are under pressure, everyone’s juggling multiple priorities, and the quickest path forward often wins the day.
Those lower environments usually contain copies of real customer data, though, and that makes them real targets for malicious actors.
Developers need realistic data to build reliable software. Testers need it to verify that logic is correct and performance is acceptable. And the business needs to stay compliant and avoid risk. That combination creates tension, and most organizations resolve it with what feels like the most straightforward fix.
The typical workarounds are anonymizing production data or generating data for test and development environments. Many organizations simply task someone on the team with building this process. It typically isn't that person's primary job, so a quick DIY script gets chosen as the quickest, easiest option, without much consideration of the risk or governance issues involved.
Why DIY scripts seem easy… until they aren’t
When someone gets asked to “sort out the masking,” they’re usually trying to make progress quickly so they can get back to their real job. That leads to decisions like:
- Shuffling data around, masking parts of strings, setting every row to the same value (see the sketch after this list)
- Reusing old logic in their masking scripts
- Copying and pasting scripts and forgetting to update all of them
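To make this concrete, here is a minimal sketch of the kind of quick-fix script that results. It is illustrative Python over an invented customers table (the same pattern turns up just as often in SQL or PowerShell), and every table and column name here is hypothetical.

```python
# A minimal sketch of a typical quick-fix masking pass over a hypothetical
# 'customers' table copied down from production. Names and columns are invented.
import random

customers = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@example.com", "phone": "07700900123"},
    {"id": 2, "name": "Alan Turing",  "email": "alan@example.com", "phone": "07700900456"},
]

# Shortcut 1: set every email to the same placeholder value.
for row in customers:
    row["email"] = "masked@example.com"

# Shortcut 2: shuffle names between rows so they "look" different.
names = [row["name"] for row in customers]
random.shuffle(names)
for row, name in zip(customers, names):
    row["name"] = name

# Shortcut 3: blank out part of a string and hope that is enough.
for row in customers:
    row["phone"] = row["phone"][:4] + "*" * (len(row["phone"]) - 4)

# Anything added to production after this script was written, say a
# date_of_birth or national_id column, passes through completely unmasked.
```

It works today, for this schema. The trouble starts when the schema stops standing still.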
The challenge is that databases keep evolving – new columns appear, new data types get introduced, the business adds new use cases – and unless masking scripts evolve at the same pace, they drift out of date quickly and quietly.
And then maybe the person who wrote them moves to another team. Or leaves the organization. Suddenly no one knows why something was masked a certain way or what the original intention was, and that’s when things get risky.
The outcome?
You end up with scripts that don't account for new schema changes, or worse, scripts that break, never get prioritized for fixing, and quietly stop being run. At that point you're effectively restoring production data into development with no protection at all, which dramatically increases the risk of regulatory penalties or, worse, reputational damage from a data breach.
Synthetic data has its own limits
Some teams avoid masking altogether and swing to the other extreme: synthetic data.
There’s nothing wrong with synthetic data – when it’s done well. But most of the time, teams generate something small, simple, and not very realistic.
Developers then test with data that doesn’t reflect real-world patterns. Edge cases disappear. Performance bottlenecks stay hidden.
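To illustrate, here is a sketch of what "small, simple, and not very realistic" often looks like in practice. The generator below is hypothetical: it produces a few hundred tidy, uniform rows that never exercise long names, accented characters, nulls, skewed volumes, or any of the other messiness real data carries.

```python
# A sketch of the kind of "good enough" synthetic data a busy team often
# produces. The shape, ranges, and volume here are invented for illustration.
import random
import string

def random_string(length=8):
    return "".join(random.choices(string.ascii_lowercase, k=length))

# A few hundred uniform rows: no nulls, no duplicates, no accented names,
# no customer with ten thousand orders, no skew in amounts or dates.
orders = [
    {
        "customer": random_string(),
        "email": f"{random_string()}@test.com",
        "amount": round(random.uniform(1, 100), 2),
        "items": random.randint(1, 3),
    }
    for _ in range(500)
]
```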
The problem isn't the intent. It's that generating high-quality synthetic data is a craft, and most DBAs and developers don't have the time to master it on top of everything else they're responsible for. Both DIY approaches solve the problem for today, but rarely for tomorrow.
What a good, sustainable approach to TDM actually looks like
If we want to protect data and help teams move faster, the solution must be easy to use, flexible enough to handle a variety of situations, and simple enough for our staff to understand and, more importantly, maintain across time and personnel changes.
The right approach to test data should be:
1. Simple to pick up and simple to maintain
Teams change and documentation gets lost, which is why this matters: comments in code are not enough. You need documentation, or better still automated processes, that enable others to extend the solution and apply it to new situations.
2. Smart enough to find sensitive data for you
We store sensitive data in more places than ever, and no one can keep track of it manually anymore. Automatic detection, with the flexibility to override, is essential (see the sketch after this list).
3. Capable of producing realistic, familiar data
Developers do their best work with data that “feels” real. Random strings and nonsense values slow everyone down.
4. Able to subset production safely
You don’t always need the entire production database. Often, a well-formed slice is enough – and it’s much faster to work with.
5. Built for modern databases
A solution also needs to work at scale and handle the complexities of modern databases: varied data types, large volumes of data, referential integrity (declared or not), and datasets in many different languages.
These are the main considerations for any solution that protects test data.
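To ground point 2 above, here is a toy sketch of what DIY discovery usually amounts to: pattern-matching on column names. The schema and patterns below are invented for illustration; a proper detection capability also samples the data itself and lets you confirm or override each classification, because sensitive values routinely hide in free-text fields and badly named columns.

```python
# A toy illustration of DIY sensitive-data discovery: scan column names for
# suspicious patterns. The in-memory schema below is invented for illustration.
import re
import sqlite3

SUSPICIOUS = re.compile(r"email|name|phone|address|dob|birth|ssn|national_id|card",
                        re.IGNORECASE)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, full_name TEXT, email TEXT, signup_date TEXT);
    CREATE TABLE payments  (id INTEGER, customer_id INTEGER, card_number TEXT, amount REAL);
""")

for (table,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"):
    for _, column, *_ in conn.execute(f"PRAGMA table_info({table})"):
        if SUSPICIOUS.search(column):
            print(f"Possible PII: {table}.{column}")
```

Column-name matching is a reasonable start, but it is exactly the kind of logic that quietly falls behind as the estate grows.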
The Build vs. Buy decision
I'll be the first to admit: you can build your own masking solution. Plenty of teams do. But much like a home-grown monitoring system, it demands a substantial ongoing investment of time and labor. Building the scripts is the easy part; maintaining them is the real challenge.
Our databases are always changing, and we often add new database platforms. Ensuring a masking solution works well across time and the entire database estate takes a great deal of effort.
That's why a lot of organizations are moving toward paid-for, vendor-supported solutions. Not because they couldn't build their own, but because:
- It reduces overhead
- It preserves institutional knowledge
- It scales with the environment
- It gives new joiners a fighting chance
- It ensures compliant test data and reduces risk
We often build software because we can’t buy something that works in the way our business works. However, as we’ve just walked through, for some tasks – like masking test data in a compliant way – a paid solution is better for several reasons, and Redgate has just the thing.
Where Redgate Test Data Manager fits in
Redgate Test Data Manager was built for exactly these challenges. It helps teams:
- Discover sensitive data automatically
- Mask it with realistic values
- Retain referential integrity
- Subset intelligently
- Reduce PII exposure
- Scale with changes in the database estate
- Ensure compliance is built in, not a bottleneck
If you want to see how it fits into your workflow, download the free trial and see it in action. It’s quick to set up, easy to learn, and takes the burden of DIY off your team’s plate.
FREE TRIAL
Try Redgate Test Data Manager
Get compliant test data in minutes, not months.