20 June 2018

Database subsetting. What it is, what it’s for, and why we’d love to hear from you.

We’re looking into database subsetting, and if you need it or are investigating it, we want to hear from you. Please share your thoughts in the comment section below.

What is database subsetting?

Database subsetting is creating a copy of a database that contains only a portion of the data, while still being referentially intact.

For example, suppose you have a table in a database containing millions of customers, but you only want to work with a database that contains thousands. Simple enough, just copy over a fraction of the rows, right?

But then suppose you also have an orders table that contains references to the customers. You’ll need to take only those orders that are related to customers in your sample. As your schema grows more complicated, so does the process of taking a valid subset of the data, and the need for automating the process becomes greater.

Why would I want a subset of my data?

We think there are many reasons organizations need subsets of data for development work, and right now we’re looking at three use-cases:

• Testing, development and training environments – developers often want to work with real data, but providing a copy of your entire production database would be too costly in time and disk space, especially if you need to provide a copy for each developer. Subsetting allows you to work with data that contains all the necessary links between tables for your programs to function, but for a fraction of the cost.

• Multiple organizations or departments – you may have a database that contains data for a number of different organizations or departments, and providing the whole database would expose data to individuals without the correct permissions. Instead, you can provide a subset containing just the data that is relevant to a particular organization or department.

• Old data – you may want to discard a portion of your data based on some criteria. For example, you might want to clear out data older than a certain date, or data which you can no longer keep in order to comply with regulations like the GDPR.

What do you think?

Do you agree with us? Can you think of any other scenarios where you might need database subsetting? Are there other aspects of the use-cases we’ve outlined that should be considered?

We want to gain a deeper and broader understanding of the demand for data subsetting, so if your business has a need for any of the above or you would be interested in data subsetting for any other reason, please share your thoughts in the comment section below.

10 Essential Things to Know Before Diving into Database DevOps

In today’s rapidly evolving world of development, Database DevOps is becoming an essential practice. It combines the agility of DevOps with the intricacies of databases, all with the goal of enhancing speed, stability and collaboration when it comes to database changes. However, before diving into Database DevOps there are some key concepts your team... Read more

18 August 2025

18 August 2025

Blog

Built for simplicity and scalability: why organizations choose Redgate Flyway

Redgate Flyway is renowned for its low-friction, SQL-first approach. Our customer stories share the wins seen in organizations worldwide for Redgate Flyway. And when it comes to community endorsement for Flyway OSS, the GitHub and Docker statistics prove it, with 9k+ GitHub stars and 50M Docker downloads advocating for its appreciation. Built with simplicity... Read more

Roseanna Mumby

9 June 2025

Roseanna Mumby

9 June 2025

Blog

What’s new in SQL Prompt Version 11.0

We’re excited to announce the release of SQL Prompt Version 11.0. SQL Prompt Version 11.0 adds support for SQL Server Management Studio 21 (SSMS 21) and includes compatibility with Azure Synapse Dedicated SQL Pools—building on our existing support for Azure Synapse Serverless SQL Pools. Read on to find out more about this new support... Read more

Redgate Test Data Manager

Redgate Flyway

Redgate Monitor

Overview

Protect

Automate

Monitor

Redgate Blog

Database subsetting. What it is, what it’s for, and why we’d love to hear from you.

What is database subsetting?

Why would I want a subset of my data?

What do you think?

Redgate Test Data Manager

Redgate Flyway

Redgate Monitor

Overview

Protect

Automate

Monitor

What is database subsetting?

Why would I want a subset of my data?

What do you think?

You may also like