How to find the PII hiding in your database
You’ve masked your database. You’re confident you’ve covered all your PII. But not all of it lives where you’d expect it to.
AI classification, the data scanning capability in Redgate Test Data Manager, identifies sensitive data by what it contains rather than what it’s called. That lets you catch it before it reaches your developers or test environments, and before your auditors find it.
The column name problem
Column-name classification tools work when columns are named for what’s in them, such as FirstName and DateOfBirth. They don’t work when a column called ‘col_7’ actually contains dates of birth, or when someone has been pasting customer home addresses into a ‘Notes’ column for years.
This isn't carelessness; it's the normal state of any database that has been developed over time. Schemas are built by different people, often without documentation. Columns get repurposed. Fields get renamed. Systems get inherited. Over time, the gap between what columns are called and what they actually contain grows.
That gap is what GDPR, SOX, and HIPAA auditors uncover. It’s the difference between what you thought you masked and what you actually masked.
What AI classification does differently
Redgate Test Data Manager already classifies columns using metadata such as names and data types. AI classification adds an extra layer of data discovery and classification on top: it samples your actual data and uses machine learning to identify what each column contains. A column full of values that look like home addresses gets flagged as addresses, whether it's called ‘Address’ or ‘misc_field_3’.
Column name rules cover the tidy parts of your schema. Data scanning with AI classification covers the rest.
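The two layers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Redgate Test Data Manager is implemented: the rule table, the labels, and the function names (`classify_by_name`, `classify_by_content`, `classify`) are hypothetical, and simple regular expressions stand in for the machine-learning models the product uses.

```python
import re
from typing import Optional

# Hypothetical name rules standing in for metadata-based classification:
# normalised column name -> label.
NAME_RULES = {
    "firstname": "first_name",
    "dateofbirth": "date_of_birth",
    "address": "address",
}

# Hypothetical content patterns. These crude regexes stand in for the ML
# models AI classification uses; they are for illustration only.
CONTENT_PATTERNS = {
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "address": re.compile(
        r"\b\d+\s+\w+(?:\s\w+)*\s(?:Street|St|Road|Rd|Avenue|Ave|Lane|Ln)\b",
        re.IGNORECASE,
    ),
}


def classify_by_name(column: str) -> Optional[str]:
    """Layer 1: look the normalised column name up in the rule table."""
    return NAME_RULES.get(column.replace("_", "").lower())


def classify_by_content(sample: list[str], threshold: float = 0.5) -> Optional[str]:
    """Layer 2: label the column if enough sampled values match a pattern."""
    values = [v.strip() for v in sample if v and v.strip()]
    if not values:
        return None
    for label, pattern in CONTENT_PATTERNS.items():
        hits = sum(1 for v in values if pattern.search(v))
        if hits / len(values) >= threshold:
            return label
    return None


def classify(column: str, sample: list[str]) -> Optional[str]:
    """Try the name rules first; fall back to scanning the data itself."""
    return classify_by_name(column) or classify_by_content(sample)


print(classify("FirstName", ["Ada", "Grace"]))  # first_name
print(classify("col_7", ["1985-03-14", "1990-11-02", "1978-07-30"]))  # date
print(classify("Notes", ["Moved to 12 High Street", "Call back Tuesday",
                         "New address: 4 Elm Road"]))  # address
```

The name rules alone return nothing for `col_7` or `Notes`; only the content layer flags them. Note that the sketch labels `col_7` as a generic date: telling a date of birth apart from, say, an order date needs more context than a pattern match provides, which is one reason regexes are only a rough stand-in for a trained classifier.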
AI classification vs column-name classification
| | Column-name classification | AI classification (data scanning) |
|---|---|---|
| Identifies FirstName, DateOfBirth | Yes | Yes |
| Identifies col_7 containing dates of birth | No | Yes |
| Identifies Notes containing pasted addresses | No | Yes |
| Requires correct column naming | Yes | No |
| Runs locally, no cloud dependency | Yes | Yes |
AI that runs in your environment: your data stays yours
AI classification runs its ML models inside your environment. Your data stays on your network throughout, with no external API calls and nothing in transit. For teams in financial services, healthcare, or government, local execution is usually a procurement requirement: a tool that touches sensitive data covered by GDPR, PCI-DSS, or HIPAA can’t be approved without it.
From discovery to protection
When Redgate Test Data Manager identifies sensitive data, it recommends a masking rule that preserves referential integrity across your schema. You review, adjust where needed, and start masking. No spreadsheet exports or manual column mapping.
Find and protect the sensitive data your column names miss. Try Redgate Test Data Manager.
Frequently asked questions
How do I automatically detect sensitive data in my database?
AI classification in Redgate Test Data Manager scans your actual data to find sensitive columns based on their content, not their names. It catches what metadata rules miss.
Can I detect PII without sending data to the cloud?
Yes. AI classification runs ML models locally inside your environment. No data leaves your network.
What happens if PII reaches a test environment?
A data breach in a test environment carries the same regulatory exposure as one in production. GDPR fines apply regardless of whether the environment was intended for testing. AI classification finds the data your column-name rules miss before that data reaches developers.
How do I mask my production database for testing?
Redgate Test Data Manager (TDM) classifies your schema using column metadata, then AI classification scans your actual data to catch columns those metadata rules miss. Once classification is complete, TDM applies compliant data masking rules that preserve referential integrity across your tables.
Does AI classification replace existing classification rules?
No. It works on top of metadata classification. Your existing rules still apply. Data scanning with AI classification adds a second layer that catches what metadata rules miss.
What is referential integrity in data masking?
Referential integrity means masked values stay consistent across related tables. If a name appears in five tables, TDM masks it to the same replacement value everywhere, so joins and foreign keys still work.
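One common way to get this consistency is deterministic masking: derive the replacement from the original value with a keyed hash, so the same input always yields the same output. The Python below is a minimal sketch of that general technique, assuming HMAC-SHA256 and a hypothetical `Person-` token format. It is not a description of how TDM generates its replacement values; the key, table shapes, and function names are illustrative only.

```python
import hashlib
import hmac


def mask_name(value: str, key: bytes) -> str:
    """Deterministically replace a name with a pseudonymous token.

    The same value and key always produce the same token, so a name that
    appears in several tables is masked identically in all of them.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"Person-{digest[:10]}"


def mask_column(rows: list[dict], column: str, key: bytes) -> list[dict]:
    """Return copies of the rows with one column masked."""
    return [{**row, column: mask_name(row[column], key)} for row in rows]


# Hypothetical key; in practice it would come from a secret store, not source code.
KEY = b"example-masking-key"

customers = [
    {"customer_id": 1, "name": "Jane Doe"},
    {"customer_id": 2, "name": "John Smith"},
]
orders = [
    {"order_id": 10, "customer_name": "Jane Doe"},
    {"order_id": 11, "customer_name": "John Smith"},
]

masked_customers = mask_column(customers, "name", KEY)
masked_orders = mask_column(orders, "customer_name", KEY)

# Joining on the masked name still pairs each order with the right customer.
joined = [
    (o["order_id"], c["customer_id"])
    for o in masked_orders
    for c in masked_customers
    if o["customer_name"] == c["name"]
]
print(joined)
```

Because `mask_name` depends only on the value and the key, the masked `customer_name` in `orders` still matches the masked `name` in `customers`, so the join pairs order 10 with customer 1 and order 11 with customer 2. Using a secret key rather than a plain hash matters: without one, anyone could hash a list of likely names and reverse the mapping.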