Introducing AI-Enhanced Data Generation to Redgate Test Data Manager

We’re excited to reveal our latest effort towards simplifying and accelerating the test data management process: AI Synthetic Data Generation, part of Redgate Test Data Manager. 

Officially introduced in a session at the recent PASS Data Community Summit, the capability uses machine learning to rapidly generate realistic yet entirely synthetic data – all while maintaining data integrity, with data privacy built in as a priority.

It’s a culmination of what you – the customer – want to see, and the efforts of our dedicated test data management engineering team here at Redgate. We’re now looking for participants in the beta program to help shape the future of the tech, and you can sign up now. 

In this article, I’m going to explain more about the new capability, from concept to execution, and how it will benefit you and your organization. 

How it all began 

Here at Redgate we keep our ears to the ground. Through listening to our customers and tracking the results of our annual State of the Database Landscape Survey, we discovered that some pain points were beginning to emerge in the test data management process. Just by looking at an early sample of our latest survey results, we found that:

  • Data isn’t being refreshed regularly; 50% of respondents take a month, or longer, to refresh their data 
  • Valuable time is being lost through data distribution; 58% of data teams have to manually supply test data to others in their organization 
  • 54% of survey respondents are using a full-size backup of production, with big, unwieldy databases being used to test

These findings have been corroborated through discussion with our customers. They’ve shared with us the struggles of hand-rolling their test data and trying to balance that with privacy, with comments such as: 

“The process for generating and using data is ad-hoc” 

“It must not be possible to recognize a client from the test data” 

With all this in mind, we asked ourselves the simple question: “how can we help our customers move faster while overcoming these manual process challenges?”

And, at the same time, “how can we ensure privacy is maintained every step of the way?”

In other words, we needed to find a way of bridging the gap between process and privacy. 

From our research, we picked out three key priorities to focus on – the things you all wanted to see:

  • Simplicity (“we need something that works without complexity”)  
  • Remove manual work (“too much time is spent on repetitive tasks”)  
  • Privacy first (“we need to ensure data security at every step”)

These priorities formed the basis of our early planning…

The initial concept 

At first, we were drawn to going with exclusively rules-based data generation. After all, it’s a tried and tested approach and offers advantages such as control and predictability of data, built-in datasets to help test new applications, and excellent privacy preservation (with original data being replaced with brand new data as it’s generated).  
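
To make the idea concrete, here is a minimal sketch of what rules-based generation can look like in Python. It is an illustration only, not how Redgate Test Data Manager is implemented: the column names, the value lists, and the `make_rules` and `generate_rows` helpers are all hypothetical.

```python
import random

# Hypothetical built-in datasets; illustrative values only.
FIRST_NAMES = ["Alice", "Bob", "Carla", "Deepak"]
LAST_NAMES = ["Nguyen", "Smith", "Okafor", "Rossi"]


def make_rules(rng):
    """Map each (assumed) column name to a rule that produces a fresh value."""
    return {
        "first_name": lambda: rng.choice(FIRST_NAMES),
        "last_name": lambda: rng.choice(LAST_NAMES),
        "age": lambda: rng.randint(18, 90),
        "email": lambda: f"user{rng.randint(1000, 9999)}@example.com",
    }


def generate_rows(n, seed=0):
    """Generate n rows; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    rules = make_rules(rng)
    return [{col: rule() for col, rule in rules.items()} for _ in range(n)]


rows = generate_rows(3)
for row in rows:
    print(row)
```

Every value comes from a rule rather than from an original record, which is where the control, predictability, and privacy of this approach come from.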

‘Great!’, we thought…but we don’t just rest on our laurels. We knew we could go one step further. 

‘How can we make it even better?’ 

We always have one eye on the future, so the potential of using AI/machine learning to help us bridge that gap between process and privacy was not just a pipe dream. We knew that, by asking the right questions and bolstering our already strong foundations by hiring a group of machine learning experts, this technology could open new doors for us and our customers. 

So, that’s exactly what we did, and we got to work on answering the burning question: can we use machine learning to understand and generate realistic test data, while maintaining its integrity? After all, we want the data to still look like production data, factoring in the likes of patterns and relationships.  

“As we enter this new era of AI-assisted database management, Redgate is committed to using AI’s potential while staying true to our core mission: making database work easier, safer, and more efficient.”
Jeff Foster – Director of Technology and Innovation at Redgate

Finding the right balance 

As we set to work on realizing our ambitions, we also looked to address some of the concerns we’d seen around AI and machine learning.  

Early data from our latest State of the Database Landscape Survey and discussions with customers told us that 61% are concerned about data security and privacy when it comes to AI. However, 42% are still keen to use AI and 15% are already doing so.

Furthermore, 84% of those already utilizing AI or machine learning in our sector have already seen improvements in productivity.  

Privacy is always top of mind at Redgate, which is why we made it one of our three priority points in developing this new capability: privacy-first, and the need to ensure data security at every step.  

To alleviate any concerns, we came up with a rock-solid solution which means your data stays yours, at all times. That’s just one of the many leading features of our final concept, which we’re immensely proud of. 

The final concept: the best of both worlds 

With AI Synthetic Data Generation in Redgate Test Data Manager, we’ve combined both rules-based and AI-based approaches to form this new capability. Whether you use just one approach or both is up to you – we give you the flexibility.

[Image: a basic overview of the rules-based and AI-based approaches]

Rules-based or AI-based, the choice is yours

This was the best solution to help us achieve the three key objectives we mentioned earlier: simplicity, remove manual work, and privacy first.  

The latter was an absolute priority for us so, with AI Synthetic Data Generation, the data you input is only ever used by your local version of the capability and never leaves your site.   

Additionally, your data is never seen or used by us at Redgate or any other organizations using the capability. There are no third parties, your data stays yours (everything lives in your environment), and no internet connection is required. 

This push for enhanced privacy also helps businesses that can’t use production data due to sensitive customer information. Our technology can generate entirely realistic, but completely fake, data based on patterns, relationships, and distribution in your own datasets.   
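
As a rough intuition for what “based on patterns, relationships, and distribution” means, the sketch below learns simple frequencies from a toy table and samples new rows from them. It is a deliberately simplified statistical stand-in, not the machine learning model used in AI Synthetic Data Generation: the `source` rows and the `fit` and `sample` helpers are made up for illustration, and it only covers non-sensitive categorical columns.

```python
import random
from collections import Counter, defaultdict

# Toy rows standing in for a real table (made-up values).
source = [
    {"country": "UK", "city": "Cambridge"},
    {"country": "UK", "city": "London"},
    {"country": "UK", "city": "London"},
    {"country": "US", "city": "Austin"},
]


def fit(rows):
    """Learn how often each country occurs and which cities occur within it."""
    countries = Counter(r["country"] for r in rows)
    cities = defaultdict(Counter)
    for r in rows:
        cities[r["country"]][r["city"]] += 1
    return countries, cities


def sample(model, n, seed=0):
    """Draw n new rows that follow the learned frequencies and pairings."""
    countries, cities = model
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        country = rng.choices(list(countries), weights=list(countries.values()))[0]
        by_city = cities[country]
        city = rng.choices(list(by_city), weights=list(by_city.values()))[0]
        out.append({"country": country, "city": city})
    return out


synthetic = sample(fit(source), 5)
for row in synthetic:
    print(row)
```

Because the city is sampled conditionally on the country, the generated rows keep the country-to-city relationship intact (no “Austin, UK”), while common values stay more common than rare ones.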

But what if you don’t have any data available? Not to worry – you can write yourself a small set of data, then use AI Synthetic Data Generation to create the rest.  

Won’t this be expensive to run? 

No! We’ve designed AI Synthetic Data Generation to run on CPU, not GPU, so there are no extreme costs involved in set-up. You can run it on local machines, at a budget to suit you and your organization.

Join the beta program today 

We’re looking for participants in our beta program. Here’s what’s involved: 

  • Shape the future of Redgate tech: Join a select group using this capability before it’s released to the public. Your feedback is a critical part of this process, and you’ll influence the product to better suit your needs.
  • What we need from you: Permission from your organization to use products in beta. You’ll also need to be using SQL Server, MySQL, or Postgres on-premises. 

Redgate’s Disclaimer: Registering your interest in the AI Synthetic Data Generation beta program does not create an obligation on you to join. Please note, registering also does not guarantee a place in said program. Redgate excludes any liability, warranty, condition and/or representation arising out of the registration of your interest. 

By registering you also grant permission and consent to be contacted by Redgate in order to share further details and updates about the program. 
