Combinations, permutations, and derangements

Joe Celko explains how several mathematical concepts, combinations, permutations, and derangements, relate to databases.

Before I get to the database stuff, here are a few mathematical preliminaries.

Factorials

The factorial function is usually written as (n)!, and it is defined as the product of the first (n) natural numbers. Thus, 5! evaluates to (5 · 4 · 3 · 2 · 1) = 120. As usual, zero becomes a special case, 0! = 1 which can be proven with a slightly different derivation of a factorial. Instead of defining it as a product, define it recursively, like this: n! = CASE WHEN n = 0 THEN 1 ELSE n · (n-1)! END.

Showing the process one step at a time, the recursion unrolls like this:

5! = 5 · 4!
4! = 4 · 3!
3! = 3 · 2!
2! = 2 · 1!
1! = 1 · 0!

Look at the last step of the recursion. Now divide both sides by one to get (1! / 1) = 0! or 1 = 0! Notice that everything done so far is procedural, not set-oriented. RDBMS folks prefer to get away from procedural code. For the rest of this article, you can think of n! as the number of ways to arrange (n) elements of a set into a sequence. Clearly, if you have one element, then you have only one arrangement. But likewise, if you have zero elements, you are also done. Just as you have only one empty set, so you also have only one empty sequence.

Yes, it is possible to define factorials for negative numbers, imaginary numbers, the gamma function for real numbers, and other things. Unless you’re a math major, you will have absolutely no use for any of these fancy tricks. Since SQL is a database language and not a computational one, you might want to use table lookup and not this recursive definition. You can find a table of the numbers one to hundred to populate the table. The factorial function gets big fast!

0! = 1
1! = 1
2! = 2
3! = 6
4! = 24
5! = 120
6! = 720
7! = 5,040
8! = 40,320
9! = 362,880
10! =3,628,800
11! = 39,916,800
12! = 479,001,600

Combinations

Combinations assume there is a set of (n) distinct elements, and you want to get the number of subsets you can pull from this set. There is no concern with exactly what elements go in a particular subset, just the count. The order is not important.

One common notation for this, though not the only notation, is nCr read as Set of (n) things, Choose (r) of them. Notice that (0 ≤ r ≤ n), and that you can choose an empty set. Given a set, the number of subsets in it is 2n ; For example, {a, b, c} has subsets {}, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c} for total count of 8.

I’ve never understood the term ‘combination locks’ when the order of the numbers that you put into it is very important. Perhaps traditional mechanical security features were not designed by mathematicians.

Permutations

Permutations also assume there is a set of (n) distinct elements, and the goal is to get the number of sequences from this set in any order. However, in this case, the order does matter; (a, b, c) is not the same permutation as (c, b, a).

Derangements

To be honest, I just think that the term ‘derangement’ is really cool. Given two sequences of the same length and with the same data elements, one is a derangement of the other when the data elements do not appear in the same sequence position in the two sequences. For example, (a, b, c) is not a derangement of (c, b, a) because the data element ‘b’ is in the second position in both sequences. Instead, there are two derangements, (c, a, b) and (b, c, a).

A derangement can also be called a permutation with no fixed points. The two notations for derangement of (n) elements are either !(n) or D(n). I find the first one with the leading exclamation point to be a bad choice. It looks too much like a factorial.

The number of derangements of a set with (n) distinct elements is given by the recursive formula : D(n) = (n-1)[D(n-1) + D(n-2)]. If you know D(1) = 0 and D(2) = 1, you can generate subsequent values for D(n). Or you would prefer : D(n) = n D(n-1) + (-1)n?

Obviously, the derangements will be fewer in number than the original permutation.

Derangements of n Elements

D(n)
D(0) = 1
D(1) = 0
D(2) = 1
D(3) = 2
D(4) = 9
D(5) = 44
D(6) = 265
D(7) = 1,854
D(8) = 14,833
D(9) = 133,496
D(10) = 1,334,961
D(11) = 14,684,570
D(12) = 76,214,841

Derangements provide a way to make sure that everyone gets a fair shake when making job assignments. Nobody is stuck in the same job from one assignment cycle to the next. The idea is that eventually, all your personnel would be assigned to every job, but you don’t necessarily know in which assignment cycle any employee will get a particular job.

My favorite use for derangement is when an office does a “Secret Santa” Christmas gift program. If you have never had one of these, the idea is that everyone brings a wrapped gift to work. The gift-giver then picks the name of another employee randomly, and the second employee becomes the recipient. The first rule is that you don’t give a gift to yourself. A second rule is that you’re not supposed to be able to figure out who is the gift giver.

Gift givers are the set, and the recipients are supposed to be an unpredictable derangement. You try doing this by drawing one name at a time, and very quickly a wind up with a situation where only a few possible derangements are left. So much for maintaining secrecy. To get an idea how this works, look at the prior example, with only three data elements in the original permutation. You really need to pick your derangement all at once.

A cute trick for doing this in the real world is to put everybody’s name on a card twice, once on the top and once on the bottom. Shuffle the card deck. Cut the cards in half, leaving the top and bottom halves paired. Take the top half-card from the top halves deck and put it on the bottom of the same top halves deck. Now draw the pairs of cards from the decks, assembling the tops and bottoms to make a new card. The result will be a new card deck with an unpredictable derangement.

Permutations, combinations, and derangements in databases

While all these are interesting, a database person is probably going to be more interested in actually generating the permutations, combinations, and derangements. The problem is that SQL is not really built for this kind of computation. The sets are in databases are in the form of tables, which all have a fixed number of columns. Databases don’t have arrays, link lists, or other data structures, so it’s faked with tables.

If all the columns are the same data element, then this is an example of de-normalizing a table with a repeated group. Here’s a simple example for (n = 3)

Remember that all tables must have a key. In these cases, the rows that represent a particular set all have to be unique, and so do all of the columns in that row. This means each row is a key.

The CHECK() constraint ensures that all columns are unique with a row. A lot of the ranges of the dimensions, constraints about membership, etc., in other data structures must be explicitly defined in SQL check constraints. This is one of the many reasons that I tell newbies that 80 to 90% of the work in SQL is done in the DDL, not the DML. Think about trying to write constraints into procedural code for 500 application programs; instead of putting it in one CHECK() constraint, you must duplicate the same code and magically hope that you get it right in every procedural chunk in your system. And when the rules change, you must go back through those 500 pieces of procedural code and update them. Lots of luck with that.

Modeling combinations is easy. Since order doesn’t matter, any simple list of the values will do. That’s a perfect description of a table with one column.

There are several algorithms for generating permutations. Two of the best known are the Heap algorithm (1963) and the Fike algorithm. Both are recursive and are based on the facts that

1) There are n! permutations of the set {1, 2, 3, …,}. This let you know how many times you have to cycle through a loop or how deep your recursion has to go.

2). The next permutation can be generated from the current permutation without fear of duplication.

Robert Sedgwick (you might know from his textbooks that are widely used) wrote a paper on the various methods for generating permutations. He classified them as

1 METHODS BASED ON EXCHANGES
Recursive methods
Adjacent exchanges
Factorial counting
“Loopless” algorithms

2 OTHER TYPES OF ALGORITHMS
Nested cycling
Lexicographic algorithms
Random permutation

References:

https://en.wikipedia.org/wiki/Heap%27s_algorithm

https://academic.oup.com/comjnl/article/19/2/156/408726. The Computer Journal, vol 19 Issue 2. pages 156-159

https://homepage.math.uiowa.edu/~goodman/22m150.dir/2007/Permutation%20Generation%20Methods.pdf