Red Gate forums :: View topic - Anonymizing data and preserving foreign key relationships
Return to www.red-gate.com RSS Feed Available

Search  | Usergroups |  Profile |  Messages |  Log in  Register 
Go to product documentation
SQL Data Generator 1
SQL Data Generator 1 forum

Anonymizing data and preserving foreign key relationships

Search in SQL Data Generator 1 forum
Post new topic   Reply to topic
Jump to:  
Author Message
BenDippenaar



Joined: 15 Oct 2011
Posts: 1

PostPosted: Sat Oct 15, 2011 12:27 am    Post subject: Anonymizing data and preserving foreign key relationships Reply with quote

We are using data generator to anonymize some existing data in a database.

Right now, when i use the SQL foreign key data generator, it looks like the data that is generated has to be unique, or I have to specify the amount of repeats from within the interface.

It would be really nice if the data generator could generate data that matches the referential ‘footprint’ of the data it is generating from.

for example, it doesn't seem like i can preserve existing many-to-one relationships in the anonymized database.
Back to top
View user's profile Send private message
james.billings



Joined: 16 Jun 2010
Posts: 1119
Location: My desk.

PostPosted: Mon Oct 17, 2011 12:26 pm    Post subject: Reply with quote

Hi, and thanks for your post.

Usually for anonymizing data, you'd work on the basis of "copying" your current database and then generating random data for just the columns you want to remove sensitive information on. We have a document describing the process here

I'd suggest having a look through that and seeing if it helps, if not, please let us know!
Back to top
View user's profile Send private message
Scriber



Joined: 17 Oct 2011
Posts: 2

PostPosted: Mon Oct 17, 2011 6:24 pm    Post subject: Reply with quote

Hey James,

What if your DB had 3 tables in it and in all three tables there was a column lets call it creditcardnumber the columns all need to be anonymized the same way so that the one to many or the one to one relationship can be maintained and you can query the new anonymizing data and get the same row count and reult set as the live data. Can the tool do that?

Thanks,
Scriber
Back to top
View user's profile Send private message
james.billings



Joined: 16 Jun 2010
Posts: 1119
Location: My desk.

PostPosted: Mon Oct 17, 2011 6:27 pm    Post subject: Reply with quote

For most of the random generators, you get the option of a "seed" value. AFAIK, the same seed should produce the same ordered data.

If the 3 tables are actually a master, and then there's a FK on the other two back to that, it should just bring the values in automatically using the FK generator.
Back to top
View user's profile Send private message
Scriber



Joined: 17 Oct 2011
Posts: 2

PostPosted: Mon Oct 17, 2011 7:10 pm    Post subject: Reply with quote

Thanks James I'll give it a try, and it should work in a one to many relationship? Were table A has one entry for creditcardnumber 123 and table B has say ten entries for the same creditcardnumber 123 when it changes table A to say ABC it will chnage the other ten entries in Table B to ABC so when I query for creditcard number ABC I'll get the proper row count?
Back to top
View user's profile Send private message
james.billings



Joined: 16 Jun 2010
Posts: 1119
Location: My desk.

PostPosted: Mon Oct 17, 2011 7:13 pm    Post subject: Reply with quote

Hmm, I'm not sure if it's that clever when just generating random records. If your FK relationships are set up on that column then you have the choice in the FK generator for whether to generate all rows at random, make them unique, or to repeat a certain number of times, but I'm not sure if that's going to help.

SQL Data Generator doesn't have any kind of cross-column rules, so you are limited in that sense.

If yuo get stuck, then a simple repro may be useful... you can email it to support with F0053299 in the subjectline
Back to top
View user's profile Send private message
mark__a



Joined: 10 Nov 2011
Posts: 2

PostPosted: Thu Nov 10, 2011 9:39 pm    Post subject: Re: Anonymizing data and preserving foreign key relationships Reply with quote

BenDippenaar wrote:
It would be really nice if the data generator could generate data that matches the referential ‘footprint’ of the data it is generating from.

for example, it doesn't seem like i can preserve existing many-to-one relationships in the anonymized database.


We are trying to do exactly the same thing. We have three tables:
Address, Customer, CustomerAddresses (contains only 2 FKs: AddressId and CustomerId). A Customer can have 0 to many Addresses.

The goal is to generate new address info and customer names but keep the same FKs in the CustomerAddresses table. I am using the "Use existing data source" option and am referencing a copy of the database to generate/import the data from Address and Customer. What I want to do is to import the CustomerAddresses table EXACTLY as it is. Same Id values in each row.

In the "real" database, the data looks like this:

Address/AddressId
1 (and then street1, city, state, zip, etc.)
2
3
...

Customer/CustomerId
1 (and then name, credit limit, etc.)
2
3

CustomerAddresses
CustomerId / AddressId
1 1
1 2
2 3
2 4
3 5
3 6

But when we generate the CustomerAddresses table with "Use existing data source" it forces us to use the Foreign Key Generator for these two columns. We can't change it. How can we just use the existing data and not generate anything?

We cannot de-select the CustomerAddresses table for generation because it chokes when trying to delete all the rows from Customer.
Back to top
View user's profile Send private message
james.billings



Joined: 16 Jun 2010
Posts: 1119
Location: My desk.

PostPosted: Thu Nov 10, 2011 9:48 pm    Post subject: Reply with quote

Unfortunately if the foreign keys are there, then it'll always use the FK generator. If you're copying records across for the other two tables does this matter though? Presumably, the primaries in the other two tables are what you want anyway?

The other option is to drop the FK's before starting up Data Generator; then you can use whatever generation option you like on those columns. You can recreate them afterwards then. This is a little more inconvenient of course, but the only way around it I can think of right now.
Back to top
View user's profile Send private message
mark__a



Joined: 10 Nov 2011
Posts: 2

PostPosted: Thu Nov 10, 2011 11:47 pm    Post subject: Re: Reply with quote

james.billings wrote:
Unfortunately if the foreign keys are there, then it'll always use the FK generator. If you're copying records across for the other two tables does this matter though? Presumably, the primaries in the other two tables are what you want anyway?


This is a serious limitation in your product. All I really want to do is to "UPDATE" a few columns to wipe out some of the production data columns so that we can have a realistic test database.

As a concrete example for why this is a serious limitation, I put forth the following:

Customer can have 0 or more related Address records (via CustomerAddresses). If the CustomerAddresses CustomerId and/or AddressId FKs are randomly assigned, then the Addresses won't make much sense at all. You'll have Address records from different states or countries. Which is fine if you're trying to test for all available inputs but terrible if your goal is a realistic-yet-cleansed set of data.

There are a number of other examples in our database where many-to-many relationships only make sense if you keep the same keys. As far as I can tell, the only way to do this is to run a post-SQL Generator script that deletes all the FKs in these m2n tables, sets IDENTITY INSERT and then re-imports the existing data.

Why can't we just import all of the CustomerAddresses FKs as-is? Can't we just set the generation order so that Customer and Address are generated before CustomerAddresses?
Back to top
View user's profile Send private message
james.billings



Joined: 16 Jun 2010
Posts: 1119
Location: My desk.

PostPosted: Fri Nov 11, 2011 3:57 pm    Post subject: Reply with quote

We do have a feature request to allow simple updates which would obviously help in your situation, but as yet it's not been scheduled for inclusion. The ref. is SDG-850 and I've added a link back to this thread on the job for when it's next evaluated.
Back to top
View user's profile Send private message
anna.p



Joined: 03 Jul 2012
Posts: 23

PostPosted: Wed Oct 17, 2012 12:21 am    Post subject: I second this!! Reply with quote

A feature to allow simple updates would be fantastic.

I'm getting around this limitation by creating views against the source tables that select only the columns I don't want to "update". I use the Table/View generator to populate those columns in the target and randomizing generators for the columns I want to "update". The only columns for which this doesn't work are those with a Foreign Key constraint and IDENTITY columns.

2 possible enhancements to address this, which would help my situation but possibly not the other posters', are:

    1. For columns with FK constraints, default to the "Foreign Key" generator, but allow us to change it to a different one.

    2. For IDENTITY columns, default to the "Server Assigned" generator, but allow us to change it to another integer-type generator with an option to SET IDENTITY_INSERT ON.
Back to top
View user's profile Send private message
Display posts from previous:   
Reply to topic All times are GMT + 1 Hour
Page 1 of 1

 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group