Celko’s SQL Stumper: The Class Scheduling Problem

What can we use in SQL instead of E. F. Codd's T theta operators for best-fit? Joe Celko returns with another puzzle that isn't new, in fact it already features "Swedish", "Croatian" and "Colombian" solutions in chapter 17 of Joe's 'SQL for Smarties' book. These were all written before CTEs or the new WINDOW functions. Is there now a better solution? Was there one even then? We leave it to the readers to provide the answer!

In the book “The Relational Model, Version Two” (1990, ISBN13: 978-0201141924), Dr. E. F. Codd introduced a set of new theta operators, called T-operators, which were based on the idea of a best-fit or approximate equality. The name never caught on and nobody invented special syntax for them.

One problem is that they were defined by a procedural algorithm and not a set-oriented definition. It is easier to understand with an example modified from Dr. Codd.

The problem is to assign the classes to the available classrooms. We want (class_size < room_size) to be true after the assignments are made. This will allow us a few empty seats in each room for late students. We can do this in one of two ways. The first way is to sort the tables in ascending order by classroom size and the number of students in a class. We start with the following simplified tables:

These tables have the following rows in them:

The goal of the T-JOIN problem is to assign a class which is smaller than the classroom given it (class_size < room_size). Dr. Codd gives two approaches to the problem.

1) Ascending Order Algorithm

Sort both tables in ascending order. Reading from the top of the Rooms table, match each class with the first room that will fit.

This gives us …

2) Descending Order Algorithm

Sort both tables in descending order. Reading from the top of the Classes table, match each class with the first room that will fit.

and …

Notice that the answers are different! Dr. Codd has never given a definition in relational algebra of the T-Join, so I proposed that we need one. Informally, for each class, we want the smallest room that will hold it, while maintaining
the T-JOIN condition. Or for each room, we want the largest class that will almost fill it, while maintaining the T-JOIN condition. These can be two different things, because these are not “best fit” solutions “first fit” approach.

Other theta conditions can be used in place of the “less than” shown here. If “less than or equal” is used, all the classes are assigned to a room in this case, but not in all cases. Nor is the solution always unique. Here is a (class_size <= room_size) answer for this data. Room ‘r5’ is too small to match any class.

In the third edition of SQL FOR SMARTIES, I gave “Swedish”, “Croatian” and “Colombian” solutions, named for the countries of the submitters. They were all written without CTEs or the new WINDOW functions (that is the technical name for the OVER() clause on aggregate functions). Obviously, if two rooms or two classes are the same size, they can be swapped with each other. Let’s make life easier by adding a UNIQUE constraint to room_size and class_size.

The problem is write an SQL statement that gives a T-Join.

WARNING: some of you already know this, but finding an optimal solution is an NP-complete problem. The bigger the data set gets, the effort it takes to solve the problem increase beyond any reasonable limit. You can Google for “Knapsack Problem” or  “Bin Packing Problem” if you want the details.

The best answer to each stumper will be given a prize of a $100 Amazon voucher. The stumper will be run simultaneously on SQL Server Central and Simple-Talk. To see all the comments so far, you will need to visit both sites.

 We will take entries for a week after the first Monday of publication, posted as comments to the articles.

Two weeks after the challenge is sent out, the judge’s decision and comments will be sent out in the newsletter, and published on the site. Joe Celko and Phil Factor will judge the answers to this puzzle. Your answer should :Your answer should :

  1. Solve the problem — Duh!
  2. Avoid proprietary features in SQL Server that will not port or be good across all releases, present and future.
  3. Use Standard features in SQL Server that will be good across all releases, present and future. Extra points for porting code.
  4. Be clever but not obscure.
  5. Explain how you got your answer.

The challenge presented by ‘Celko’s SQL Stumper: The Class Scheduling Problem’  was to assign classes to the classrooms that were available. This meant that, for each assignment the class size had to be less than the room size after the assignments are made, so as to allow a few empty seats in each room for late students. During the competition, Joe decided to rule out problems such as duplicate class sizes and identical rooms, in order to avoid ‘combinatorial explosions’.

Joe had already provided some solutions to this problem in his ‘SQL for Smarties’ book but challenged contestants to come up with something better. However, if he was expecting mainly to see subtle variations on his own ideas, then he got a pleasant surprise; most of the contestants aimed for something new and different. Refreshingly, the competition turned into a real group effort.  The majority of the entries are to be found on the SQLServerCentral site, since their forum software makes it rather easier for the contestants. In many cases, interesting and imaginative solutions were submitted, and then subsequently refined and re-submitted, based on help and advice from fellow contestants. It made choosing a winner a very difficult task.

First of all, came Peso, who set a very high standard, as always, with his first and second Descending order algorithm. For a short while there was a stunned silence from other competitors, but then Absinth, Liam Caffrey, Cary Taylor and Ryan Randall came in with some fresh and original ideas. Pierre-702284 and Lattice, in particular, surprised Joe with the different ways of approaching the problem.

Joe was surprised that few contestants ran the check SUM(class_size) <= SUM(room_size) so as to see if it is impossible to stuff the classrooms; he was also expecting the ROW_NUMBER() OVER() technique to be used more. However, as more entries came in, it became apparent that the contestants wanted to experiment. Peso enlivened the competition by gently highlighting the flaws in the early submissions on SSC in order to guide contestants towards a better solution. He then created a much better data set (the initial one was far too small) and Ryan and Dennis Allen produced some very slick solutions that became much more linear with increasing data sizes.

In the end, Joe decided to award the prize to Pierre702284. He is the “new kid on the block”; he got there first and really worked on it. Granted there was a lot of feedback from Peso, but he kept at it and cleaned up the code to make sure it was readable. However, an honourable mention must go to Steriod1970 who tried something different and original (goodness of fit scoring) who might have won had he been able to fully carry it out. Last, but not least, is Peso, whose contribution was magnificent.