Spoofing Data Convincingly: Credit Cards

I haven’t seen a SQL Server table with real unencrypted credit card numbers for several years, and I don’t know of any good reasons to have them stored that way. However, I’ve needed them in the past for testing a web application that had to take credit card details. Generating credit cards in a way that conforms to a particular distribution is reasonably easy in SQL Server, though. The only difficult bit is the fact that there is a validation checksum. Otherwise it is a good example of how to solve the problem of spoofing the more specialist types of data.

Imagine that we need to create fake credit cards from a fairly large table. We don’t want, and can’t get, the real information, but our fake credit cards have to conform to the distribution of the real data.

All we need is an aggregation of the real data that tells us the distribution. Fortunately, the first six digits of a credit Card has only the details of the bank, currency and so on. We aren’t going to associate the fake data with an identifiable person. We are pretty safe. We are just generating a list of credit cards. We will draw from the list at random and not associate the randomly-generated suffix with the original person.

We start with the real data. OK, it isn’t real data, but we’ll pretend that it is. The data I used is provided with this blog so you can try it out. It conforms roughly with the proportion of providers that you’ll see in the States.

… A lot of rows omitted  here …

We just need to know the distribution. You must imagine that these cards are from a production system that you will never see. In reality, you would just receive from the Ops people the Generation Table #GenerationTable so you wouldn’t need access to the actual data

Now we can use the Generation Table to provide valid credit cards with the correct distribution. Note that, by this method, you will only get credit card numbers with the prefixes that come from the generation table.

With this table we can quickly throw together a routine that provides our friends in AdventureWorks (or your spoof customer names) with credit cards. We use the ‘zip’technique.

To do this, you would need to start with a view that gives you a twenty digit number when combined with the six-digit prefix

Now, we need the function that produces the spoof credit card that is correct according to Luhn’s algorithm

And finally, you have a view that contains just the number of random numbers that you need. I’ve bulked this up to 4000 just to test out and make sure the routine performed well. The source is with the blog.

… and so on down to the end … (it avoids iteration and is fast)

We can now compare the distribution of the old and new data, using the routine published here in ‘Visual Checks on How Data is Distributed in SQL Server’

 

Can we improve on this in order to get a smoother distribution of provider strings? At the moment we can’t create a six-digit provider string that isn’t in the original data. To do so isn’t easy, because each provider uses the string in a different way, and not every combination of digit is used even for the individual provider. It would require maintenance as well because provider strings can come and so, or change.

This following routine is designed for generating lists. You can check the routine that creates the card by using an existing routine, dbo.fnIsValidCard, that I wrote a while back in The Luhn Algorithm in SQL .

Downloads

BigRandomNumberView

SampleCreditCards