Test Data Management and SOC 2 Compliance
Why SOC 2?
“SOC 2 evaluates whether an organization's systems and controls effectively protect and manage customer information” — Based on the AICPA Trust Services Criteria
The System and Organization Controls (SOC) framework was developed by the American Institute of Certified Public Accountants (AICPA) and SOC 2 is an independent attestation report based on AICPA's Trust Services Criteria (TSC). It is designed to help answer a simple question that any customer must ask of a service provider: can we trust you with our data?
SOC 2 compliance isn't a legal requirement in the way GDPR or HIPAA can be, but as more organizations move applications into the cloud and rely on SaaS providers, it has become a standard way for them to check that their chosen providers handle sensitive customer information safely. Many larger enterprises, especially in the U.S. market, won't engage SaaS, cloud, or B2B technology vendors unless they can provide a SOC 2 Type II report, which assesses not only that appropriate controls exist, but whether they operate effectively over time.
What SOC 2 Requires (Exec Summary)
The TSC cover Security, Availability, Processing Integrity, Confidentiality, and Privacy. The Security category is mandatory for every SOC 2 report, and it assesses an organization's system and data security controls against a set of Common Criteria (CC1–CC9) covering:
- CC1–CC5: The foundations of control – How an organization's controls are owned, managed, and monitored.
- CC6: Access control and data handling – Controlling access to systems and data; identifying and protecting sensitive data.
- CC7: System operations – Monitoring for, detecting, and responding to security incidents.
- CC8: Change management – Ensuring all system updates are authorized, documented, and properly tested.
- CC9: Risk mitigation – Identifying threats and planning for operational resilience.
The other four categories are included only if they match your service commitments. If you handle proprietary business data or PII, then the Confidentiality (C) or Privacy (P) criteria apply to ensure secure handling and disposal. Similarly, Availability (A) may be included for specific 'uptime' or 24/7 availability promises, while Processing Integrity (PI) applies where you make commitments about the accuracy, completeness, and timeliness of data processing outputs (for example, for financial transactions or data analytics).
Across all categories, SOC auditors don't just want to see the right technologies (encryption, access controls). They expect documented, repeatable processes for data handling and protection, and evidence those processes are followed consistently and kept under review. Evidence might include logs, reports, and change records showing controls are operating as intended and are updated as systems and requirements change.
The SOC 2 challenges that TDM helps solve for database teams
Production systems are often tightly controlled, but SOC 2 also expects customer data to be protected whenever it is copied, accessed, or reused outside production for development, testing, analytics, or AI workflows. Auditors will expect to see not only data protection and access control tools, but repeatable processes, and evidence that those processes operate consistently wherever this data is reused.
Test data must also remain useful for its intended purpose, without exposing sensitive information, whether that purpose is testing changes properly and catching regressions before release, in support of SOC 2 change management requirements, or running analytics and AI workloads that reflect real behavior.
Teams who rely on manual processes to identify, protect, and provision test data will struggle with these challenges. Manual approaches don't scale and often result in inconsistent protection: columns get missed and results vary between environments. They can also produce incomplete or unrealistic datasets, which makes it harder to test changes with confidence or trust analytical results.
By contrast, a TDM approach supports SOC 2 expectations by embedding controls directly into the test data lifecycle, while keeping data realistic and fit for purpose. It standardizes and automates the identification of sensitive and personal data, ensures that data is always protected before it moves outside production, and then automates data provisioning and cleanup as one controlled workflow, with traceable records of what was done, where data went, and when it was removed.
The rest of this article focuses on where TDM most directly supports SOC 2:
- Protecting data outside production – CC6 plus Confidentiality and Privacy criteria
- Supporting safe, realistic testing for change management – CC8
How TDM protects customer data outside production
"The entity identifies and maintains the confidentiality of information designated as confidential from its receipt or creation through its retention and disposal." (C1.1)
Woven throughout SOC 2's Trust Services Criteria is the expectation that confidential and personal data will be protected across its lifecycle. This means from creation, through retention and use, including reuse outside production, to secure removal when it's no longer needed for its intended purpose. In practice, that means an organization must first define and identify confidential/personal information (C1.1), and then apply controls that:
- Limit use of personal information to the identified purposes (P4.1)
- Retain it only as long as necessary and dispose of it securely (P4.2)
- Record and report unauthorized disclosures (P6.3)
- Classify sensitive information by its relevant characteristics (CC2.1 points of focus)
- Use logical access controls to restrict use of protected information, based on an inventory of information assets (CC6.1)
- Restrict transmission, movement, and removal of sensitive data, and protect it during handling (CC6.7)
- Test system changes while ensuring sensitive data remains protected (CC8.1)
This is a challenging part of SOC 2 compliance for any enterprise. Once customer data leaves production, different teams access it for different reasons, and safe handling controls become harder to enforce without a consistent, automated way to apply them across every environment. A TDM approach supports SOC 2 expectations by embedding controls into a repeatable workflow for identifying sensitive data, protecting it before reuse, and controlling how sanitized copies are provisioned and cleaned up.
Automated and controlled test data preparation
Redgate Test Data Manager makes it practical to treat test data provisioning as code. Teams define what must be protected and how in version-controlled configuration, then apply masking and subsetting consistently through automated runs. The result is safe, realistic test data with a clear audit trail showing what rules were applied, when, and where the data was provisioned.
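As a sketch of what "provisioning as code" can look like, the rules live as data in version control and each automated run produces its own audit record. The configuration format, column names, and rule names here are illustrative assumptions, not Redgate Test Data Manager's actual syntax:

```python
# Illustrative sketch: masking rules defined as version-controlled data,
# applied by an automated run that records an audit trail.
import datetime

# The "configuration": which columns are sensitive and how to mask each one.
# (Hypothetical column and rule names, chosen for the example.)
MASKING_CONFIG = {
    "customers.email": "fake_email",
    "customers.full_name": "fake_name",
    "orders.card_number": "nullify",
}

def run_masking(config, apply_rule):
    """Apply every rule in the config and return an audit record."""
    audit = {
        "started_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "rules_applied": [],
    }
    for column, rule in sorted(config.items()):
        apply_rule(column, rule)  # in a real run, an UPDATE against the copy
        audit["rules_applied"].append({"column": column, "rule": rule})
    return audit

# A stand-in for the real masking step, which would run against a database copy.
audit_record = run_masking(MASKING_CONFIG, lambda col, rule: None)
print(audit_record["rules_applied"])
```

Because the rules are data rather than scripts, the same run against two environments applies exactly the same protection, and the audit record shows what was applied and when.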
Identifying and classifying personal and sensitive data (C1.1, CC2.1, CC6.1)
In most database estates, sensitive data rarely comes neatly labelled. It is spread across many tables, often duplicated into reporting structures, and can hide behind obscure column names. It can also appear unexpectedly in free-text fields and attachments.
Manual classification is slow, labor-intensive, and error-prone. It also becomes unrealistic at scale, as data volumes grow and schemas evolve. A modern TDM approach automates discovery and classification as far as possible, using AI-assisted identification and a predefined taxonomy that teams can extend and customize to fit their requirements.
This gives teams a single source of truth for what data needs to be protected before any copies are distributed to non-production environments, supporting the expectation to classify information by relevant characteristics (CC2.1) and to apply access controls based on an inventory of information assets (CC6.1).
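A minimal sketch of the rule-based side of such discovery, assuming a toy two-entry taxonomy that matches on column names and sampled values (a real tool would combine this with AI-assisted identification and sampling at scale):

```python
# Illustrative sketch of rule-based sensitive-data discovery using a small,
# extensible taxonomy: classification -> patterns for names and sample values.
import re

TAXONOMY = {
    "email": {
        "name": re.compile(r"e.?mail", re.I),
        "value": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    },
    "phone_number": {
        "name": re.compile(r"phone|mobile|tel", re.I),
        "value": re.compile(r"^\+?[\d\s().-]{7,}$"),
    },
}

def classify_column(column_name, sample_values):
    """Return the first classification whose name or value pattern matches."""
    for label, patterns in TAXONOMY.items():
        if patterns["name"].search(column_name):
            return label
        if sample_values and all(patterns["value"].match(v) for v in sample_values):
            return label
    return None

print(classify_column("contact_email", []))            # matched by column name
print(classify_column("c_addr", ["bob@example.com"]))  # matched by sampled values
print(classify_column("notes", ["free text"]))         # unclassified
```

The value-based check is what catches sensitive data hiding behind obscure column names like `c_addr`, which name-only classification would miss.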
Consistent data protection before reuse (CC6, CC8)
Once sensitive data has been identified, SOC 2 expectations shift from visibility to control. Organizations need to restrict access to sensitive data (CC6.1), restrict its movement wherever possible (CC6.7), and protect it during development, testing, and change processes (CC8.1).
The 'old ways' of providing test data do not generally meet these compliance challenges. Copying live data to test environments creates PII exposure, even on a secure shared server, and relies on proper data handling practices from each team. Manual sanitization techniques that require writing and maintaining often-complex masking scripts are unreliable and don't scale well. These scripts can miss PII, produce different results across environments, and leave little evidence of what was protected and when. They are also brittle and hard to keep in step with schema changes.
A TDM approach replaces this with a controlled, repeatable workflow that produces a sanitized base copy of the data that can then be distributed for approved non-production uses. The data protection process will use one or both of the following techniques:
- Data minimization using Subsetting – minimal but representative datasets, excluding data that is not required for the intended use. This reduces both exposure and operational overhead.
- Static data masking – irreversible replacement of sensitive values while preserving referential integrity and realistic data distributions.
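A minimal sketch of how static masking can preserve referential integrity: a keyed hash maps the same input to the same replacement every time, so a customer ID that appears in several tables stays consistent after masking. The key handling, table layout, and replacement format are assumptions for the example:

```python
# Illustrative sketch: deterministic, irreversible masking that keeps
# foreign-key relationships intact across tables.
import hashlib
import hmac

SECRET_KEY = b"rotate-me-and-store-securely"  # assumption: held outside version control

def mask_value(value: str) -> str:
    """Irreversible, deterministic replacement for a sensitive value."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return "cust_" + digest[:12]

customers = [{"id": "C1001", "email": "alice@example.com"}]
orders = [{"customer_id": "C1001", "total": 42.50}]

masked_customers = [
    {**c, "id": mask_value(c["id"]), "email": "masked@example.invalid"}
    for c in customers
]
masked_orders = [{**o, "customer_id": mask_value(o["customer_id"])} for o in orders]

# The foreign-key relationship survives masking:
assert masked_orders[0]["customer_id"] == masked_customers[0]["id"]
```

Because the mapping is one-way (a keyed hash, not encryption), the original values cannot be recovered from the masked copy, yet joins between tables still behave as they did in production.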
Simplicity and automation are what make this process maintainable and scalable. If a masking run needs a custom dataset for a specialized data type, or a column-specific generator to satisfy complex constraints, these should be added as configuration options rather than as manually-applied custom scripts that teams then must maintain.
When classifications and protection definitions are stored as simple configuration, tracked in version control, the process becomes easier to automate, easier to maintain, easier to adapt as systems change, and far less reliant on ad-hoc scripts.
Critically, it ensures rules are applied based on the same classifications each time data is provisioned, so protection is consistent across environments.
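The subsetting technique mentioned above can be sketched in a few lines, assuming a toy customers/orders schema: start from a small set of "seed" rows and pull in only the rows that reference them, so the subset stays referentially complete while excluding everything else.

```python
# Illustrative sketch of foreign-key-aware subsetting.
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [
    {"id": 10, "customer_id": 1},
    {"id": 11, "customer_id": 2},
    {"id": 12, "customer_id": 3},
]

def subset(customers, orders, seed_customer_ids):
    """Keep the seed customers plus every order that references them."""
    kept_customers = [c for c in customers if c["id"] in seed_customer_ids]
    kept_ids = {c["id"] for c in kept_customers}
    kept_orders = [o for o in orders if o["customer_id"] in kept_ids]
    return kept_customers, kept_orders

small_customers, small_orders = subset(customers, orders, seed_customer_ids={1, 3})

# Every order in the subset still points at a customer in the subset.
assert all(
    o["customer_id"] in {c["id"] for c in small_customers} for o in small_orders
)
```

Real schemas have many more relationships to walk, which is exactly why this belongs in automated, declarative configuration rather than hand-written extraction scripts.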
Controlled, auditable test data provisioning and cleanup (P4.1, P4.2)
SOC 2's Confidentiality and Privacy criteria reinforce the simple expectation that customer data must stay protected wherever it is used. Of course, this isn't only about anonymization; no compliant data protection strategy can rely on one technique in isolation. It also covers purpose limitation, access restriction, retention, and disposal, and SOC auditors will assess how well these controls hold up across every environment where customer data is used.
This is another area where manual processes struggle, because it is difficult to produce a clear record of what data was provisioned, where it went, who had access to it, and when it was removed. Auditors generally look for systemic controls that operate consistently over time, rather than relying on manual steps.
A TDM approach tackles this by treating provisioning and cleanup as part of the same automated and controlled workflow:
- Automated provisioning and cleanup rules. Define where sanitized datasets can be provisioned, how long they can be retained, and when they must be removed — reducing sprawl and supporting retention and disposal expectations (P4.1, P4.2).
- Strictly controlled access to data provisioning – restrict who can access provisioning/refresh workflows using centralized authentication (for example, OIDC).
- Traceable definitions and repeatable execution. Store classifications and masking definitions as configuration, and keep an audit trail of what ran, when it ran, and what rules were applied.
- End-to-end auditability. Record how data was classified, protected, provisioned, accessed, and removed, supporting the "ongoing operational effectiveness" nature of SOC 2 Type II reporting.
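As an illustration of the retention and cleanup rules described above, a scheduled job can compare each provisioned dataset's age against its retention policy and record what it removed. The 14-day window, environment names, and dates are assumptions for the example:

```python
# Illustrative sketch: automated retention enforcement with an audit log.
import datetime

RETENTION = datetime.timedelta(days=14)  # assumption: a 14-day retention policy

def expired(provisioned_at, now, retention=RETENTION):
    """True if a provisioned dataset has outlived its retention window."""
    return now - provisioned_at > retention

UTC = datetime.timezone.utc
now = datetime.datetime(2024, 6, 30, tzinfo=UTC)
environments = [
    {"env": "qa-1", "provisioned_at": datetime.datetime(2024, 6, 1, tzinfo=UTC)},
    {"env": "dev-3", "provisioned_at": datetime.datetime(2024, 6, 25, tzinfo=UTC)},
]

audit_log = []
for e in environments:
    if expired(e["provisioned_at"], now):
        # In a real workflow this would drop or refresh the environment's copy.
        audit_log.append({"env": e["env"], "action": "removed", "at": now.isoformat()})

print(audit_log)  # qa-1 is past the 14-day window; dev-3 is not
```

The audit log produced by each run is the kind of evidence auditors look for: a record that retention rules were not just defined, but enforced on a schedule.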
How TDM supports SOC 2 testing expectations
"The entity authorizes, designs, develops or acquires, configures, documents, tests, approves, and implements changes to infrastructure, data, software, and procedures to meet its objectives" – CC8.1
Most database environments are changing at an increasingly rapid pace, driven by schema updates, application releases, patching, and configuration changes. Some changes are planned upgrades; others are urgent fixes in response to incidents.
SOC 2 expects all these changes to be tested properly, and it expects sensitive data to stay protected while that testing happens. That only works if teams can get test data that is safe to reuse and still behaves like the real thing.
This is where many test data strategies fail. If data protection produces datasets that are inconsistent, incomplete, or unrealistic, then testing becomes unreliable. The result is lower test coverage, more manual checks, and greater risk during releases and incident fixes.
A TDM approach will provide realistic datasets without exposing customer data. Relationships are maintained, data distributions stay realistic, and the same test dataset can be recreated consistently as changes are tested and retested.
Testing system changes thoroughly but safely (CC8.1)
A TDM solution such as Redgate Test Data Manager can provide test datasets required to support all the types of tests referenced by SOC 2, in CC8.1:
| Type of test | Example | Test data requirements |
|---|---|---|
| Unit testing | Validate a small change to a stored procedure or function | Small, purpose-built datasets that typically only need data that the object can reference directly |
| Integration and Regression testing | Verify an application workflow still works after a schema/API change | Realistic, immutable datasets where the result can be cross-checked by the business for validity |
| User acceptance / QA testing | Validate behavior end-to-end before release | Production-like datasets that reflect real workflows and edge cases, without PII exposure |
| Patch and update testing | Test database engine or application patches before rollout | Representative data volumes and distributions so performance and query plans reflect production |
| Bug fix/validation | Reproduce a production issue, test the fix, and confirm no new issues | A dataset that mirrors the failure conditions, delivered safely and repeatably |
Supporting incident recovery and resilience testing safely (CC7.5, CC8.1)
CC7.5 expects organizations to have a documented incident recovery plan and to test it on a periodic basis. These recovery tests are only meaningful when environments reflect real data volumes, relationships, and workload patterns.
A TDM approach supports this by providing production-like datasets that are safe to reuse, so teams can rehearse recovery procedures and validate outcomes without distributing raw customer data into non-production. It also supports the broader CC8.1 expectation that changes and recovery procedures can be tested safely as part of development and change processes.
Conclusions
SOC 2 is not just about whether an organization has documented controls for protecting customer data. Auditors look for evidence that those controls operate consistently over time, meaning that they are actively maintained, and built into everyday database work, rather than relying on manual intervention.
They will also expect to see those controls extending to any copies or derivatives of that data reused outside production, for development and testing, incident fixes, and analytics. This is where a TDM approach provides a consistent, automated way to meet these expectations. It will:
- Create safe, representative test datasets through data masking and subsetting
- Provision and refresh test data through an automated workflow, coordinated through version-controlled configuration so that data access, movement, retention, and cleanup can be enforced consistently across environments.
- Maintain traceable evidence that proves how sensitive information is protected throughout its lifecycle.
With a TDM approach in place, teams can test changes more thoroughly using realistic data, without risking exposure of customer information. That improves the quality and repeatability of testing while supporting SOC 2's expectation that sensitive data stays protected throughout its lifecycle.