Celko’s SQL Stumper: Eggs in one Basket

Joe Celko reveals the winner of his Easter Stumper: the puzzle of designing an apparently simple database to deal with the process of packing eggs into cartons. It wasn't quite as easy as it looked.

Don’t Put All Your Eggs in One Basket – Use Cartons!

The Puzzle

You are the DBA for Farmer Brown, who runs a chicken farm. Or is it a chicken ranch? Here is Texas, that can be a legal issue. One side argues that it has animals and not plants, therefore it is ranch. The other side says it is a farm because a ranch has livestock, and livestock is defined by an animal that is herded to market. There are cattle drives, there are sheep drives but there are no chicken drives. There is a story about that disaster, but I digress.

Farmer Brown gathers eggs in baskets and the eggs are homogeneous. Each basket has an identifier which tells us where and when it was collected. Or are eggs harvested if it is a farm? But I digress again.

We do not put our eggs in one basket; we put them in dozen egg cartons. The dozen is an incredibly strong tradition. During the decimalization of the UK a dairy offered ‘decimal eggs’- a ten-pack of eggs. People would not buy it; it was just too weird, even when the cost per egg was less. I have a theory about the chickens being taught to lay eggs that way. However, Brits were happier to consider adjusting to metric beer when it was explained that a liter is more beer than a quart, before dismissing the idea as a Napoleonic trick.

We need table(s) for the baskets and the cartons. A basket comes in, and it is split into cartons with 12 or fewer eggs. This means that we will have one or zero cartons with fewer than 12 eggs in it. Example: the basket with 144 eggs becomes 12 cartons. The Basket with 255 eggs becomes 21 full cartons and one with 3 eggs.

  1. Your first job is to design the table(s)! Yes, this is a puzzle where you have to write DDL. Just doing SELECT/UPDATE/INSERT/DELETE is only part of SQL; most of the real work is in DDL. A bad schema design will force bad DML. A carton, for example,  must have CHECK (COUNT(egg_cnt BETWEEN 1 AND 12) constraint.
  2. Split the baskets of eggs into cartons, each carton having its own identifier. This can be a statement or a stored procedure. Call this procedure or function SplitBaskets().
  3. Once the baskets are split into cartons, write a procedure to split a carton. You will need parameters for the source carton and the number of eggs moved. You need to create a new carton. Call this procedure or function SplitCarton(). You need to watch the size rules.
  4. Once the baskets are split into cartons, write a procedure to consolidate two cartons. You will need parameters for the source carton and the destination carton. Assume that the destination carton gets filled and the source carton decremented by the appropriate amount. I.e. (c1 = 10, c2 = 4) => (c1 = 12, c2 = 2). Call this procedure or function ConsolidateCartons(). Watch out for the empty carton problem.

The answer needed include the DDL as well as the DML, as  the code is easier if you get the schema design  right.

The competitors had to…

  1. Solve the problem — Duh!
  2. Avoid proprietary features in SQL Server that will not port or be good across all releases, present and future.
  3. Use Standard features in SQL Server that will be good across all releases, present and future. Extra points for porting code.
  4. Be clever but not obscure.
  5. Explain what they were doing

A discussion about the Stumper

I really liked my Egg Puzzle because it was a chance to show off a lot of newer programming tricks. I usually write ANSI Standard SQL and then translate it into local dialect, so bear with me.

Begin by modeling Baskets of eggs. We can assume that there is a basket identifier of some kind (basket_id) and that it has a certain number of eggs in it. The collection date and other information is not needed for the puzzle, but put him in there to remind us about that data.

or in SQL Server

The number of possible full cartons is a computed value, as its the size of the partial, possibly empty, carton. But where do you compute these values? I think that Dkorzennik was correct to do this job in the table where the parameter lives. They can done with the new computed column feature; Standard SQL uses “GENERATED ALWAYS AS” instead of the “PERSISTED” syntax of SQL Server.

Integer math is preferred over using FLOOR() and CEILING() functions on decimal or floating point expressions.

Next, we need to model the cartons. I am making the assumption that a carton is identified by (basket_id, carton_id) because that is the pattern required by law for chickens, swine, shrimp, etc:  RFID tags. Basically, animals are sold in ‘lots’ and not individually identified.

Notice the DRI action on the key and a CHECK() on the egg count in each carton. The CHECK() might look redundant at first because the computed columns assure that dozen_cnt and partial_cnt are always between 0 and 12 when we get to the procedure to pack the cartons. Wrong. Firstly, this constraint assures that nobody can subvert the data. Secondly, the optimizer can use it, but cannot use the computations.

Now, let’s pack those cartons. Carton #0 will always be the partial carton from its basket that we cannot ship. That is a handy thing for a lot of queries, where we need to look at partial and full cartons. The parameter for the procedure is just the basket identifier; we have the other data we need already computed in the Baskets table. SQL Server people will need to turn the IN keyword into a @ prefix.

Or in SQL Server…

I assume we have the usual auxiliary table with a series of integers from 1 to (n). The first term of the UNION ALL created the partial, possibly empty, carton. I don’t want to have empty cartons, I can add a DELETE FROM statement after this or add a “.. AND partial_cnt > 0” predicate to the first SELECT.

The second SELECT returns the full cartons numbered 1 to dozen_cnt within the basket.

Here is a characteristic for good SQL programming: do as much work as you can in ONE statement, so the optimizer can have as much information as possible.

Now, the next procedure is supposed to move eggs from a source carton to the destination carton. Because SQL is a data language, I like data-driven solutions over computational ones. This is a state-change problem. There are “conservation of eggs” laws that have to be met:

  1. I cannot pack or unpack more than 12 eggs total to either carton.
  2. I cannot move more eggs than there are in the source carton.
  3. The total number of eggs is constant after they change locations.

So, we build a classic state-change table, with precondition and post-condition states.

Or, in SQL Server…

The first thought to use the Series table and load the table like this:

Nice, but it does not work! We need to allow for cartons that have or will have zero eggs during consolidation. We could use a CTE that UNIONs a zero the to Series table, but it is just as easy to do a little math and subtract one to adjust the range.

This will give us 169 rows to use. It might throw a new SQL programmer when he sees the two-part keys, but Standard SQL has row constructors; SQL Server programmers will have expand them. What we are going to do is find the egg count in the source table, the egg count in the destination table and use them to do a fancy look-up in the Consolidations. The MERGE statement is made for this.

In SQL Server, this would probably look like this…

The CASE expression puts the right post-condition numbers in the appropriate rows. If we cannot find our situation the Consolidations table, we do nothing because there are no matches.

You could make Consolidations into a derived table to get this into one statement. I am not sure that it is worth it; as the table is very small but since you can index a base table, there might be a slight performance boost.

Another point; this code was written in straight ANSI SQL without any proprietary features. Do you think that you would have any problems porting it to SQL Server? DB2? Oracle? Any SQL that has the SQL-99 standards or better?

And the winner is…

I was impressed by the ingenuity and energy of the contestants. As always, it is difficult to choose a winner because almost every entry had something of merit to offer. I  liked Peso’s ingenoius entry, which would probably take the speed award, if there was one. I am a fan of his — his stuff is usually clean and solid code.  When you earn a living fixing disasters, you really like to see that. However,  Phil and I both voted to give this one to Dkorzennik in the end. He actually does this in the Real World. His code was pretty clean. But he have IDENTITY when it was not needed, a WHILE which is evil and some minor dialect that a Petty Printer would clean up (INSERT instead of INSERT INTO, DELETE instead of DELETE FROM, etc).

What I liked was seeing computed columns for computed VALUES. Nobody else got that and this was a DDL problem more than anything else.

Putting it into ISO-11179 data element names and adding constraints, he had:

The trigger and its loop can be replaced by a one statement procedure. The partial carton is always #0.