Bin Packing Problems: The SQL

The 'bin packing' problem isn't just a fascination for computer scientists, but comes up in a whole range of real-world applications. It isn't that easy to come up with a practical, set oriented solution in SQL that gives a near-optimal result.

The bin packing problem is an NP-complete problem. It is a great way to make computer science students do some work and it is also useful in the real world. The basic problem statement is that you are given a set of (n) items. The items all have different volumes. We then have a supply of bins or boxes of the same size. The goal is to put the items into the bins and to minimizes the number of bins used.

As a real world application for this algorithm, look at television advertisements of arbitrary durations. Given time slots of, say, five minutes how do we pack the the most commercials into each time slot and maximize our daily profits. Think of free public service announcements or promo spots as unfilled bins.

Another example is scheduling tasks with known execution times on a set of identical machines. Minimize the number of machines needed for the whole project. This is actually the simplest form of a more complex problem. The task usually have an order of execution. The wire has to be on the wheel before the lug nuts, etc. But is another article for later.

To make the math easier, we assume that the items are sized in integer values from 1 to (k) where (k <= bin size). And to keep thing simple, the bins are size (v). I happen to like (v =12) because 12 has a lot of divisors and because I can use egg cartons and colored plastic eggs when I teach this problem.

It is obvious that we can not have an item of size zero. In spite of what people believe about suitcases, you cannot cram an oversized load into a bin. It is also obvious that if an item is the size of the bin then you have filled that bin and need to get to the next one. We know that the number of bins is between ceiling(sum(item_size)/ bin size) and (k), the number of items.

If the items are all of size one, then we can simply fill bins and use that lower limit of ceiling(sum(item_size) / bin_size). There are sophisticated algorithms for particular item sets, too.

But in the general case, we have to fall back on heuristics. Frankly, the good ones are done with procedural code and not SQL. But there are some quick heuristics that work very well in the real world.

The First Fit Algorithm provides ie fast, but often gives a non-optimal solution. It is just what you chink and the way that people pack a suitcase. This assumes that the items are in a queue and you have enough bins. Grab an item and put it into the first bin in which it will fit.

The First Fit Algorithm requires θ(n log n) execution time, where (n) is the number of items to be packed. This is the same as a non-stable sorting algorithm. You can boost this algorithm by first sorting the list of items in decreasing order (First-Fit Decreasing Algorithm).

This does not guarantee an optimal solution, and for longer queues, it may increase the running time because of the extra sorting. It is known, however, that there always exists at least one ordering of the items queue that produces an optimal solution. You just cannot find it easily. In 1973, Jeffrey Ullman (a very important name in computer science) proved that this algorithm can differ from an optimal packing by as much at 70%.That same year, S. S. Johnson showed that this strategy is never suboptimal by more than 22%, and furthermore that no efficient bin-packing algorithm can be guaranteed to do better than 22%.

Another version is if this algorithm is the First-Fit Increasing Algorithm, which is the same queue sorted from smallest to largest item size. It has all the same problems and limits.

The Best Fit Algorithm needs some smarts. We you get the next item in the queue, you look over the partially filled bins for one which will be completely full if the item is added to it. This sounds like it ought to be optimal. Sorry, but it can fail, too. As an exercise, try writing a query to pick subsets of (n) items that total to the bin target bin size. If you cannot fill a bin, insert itnto the gin that will have the smallest space left; the best fit.

The (n = 1) case is trivial and it is one configuration that you have to have in the final solution. The cases for (n > 1) have to be done with self-joins.

The Worst Fit Algorithm has an unfortunate name. We look at the available partially filled bins and put the item into the bin with the most empty space.

There is a website with animations of of various algorithms from the University of Arizona’s computer science department. It is a demonstration of the ICON programming language. It is as high level language that started off as a string manipulation language to replace SNOBOL and then it evolved.

SQL and Bins

Getting a model for bin packing in SQL is not that hard, but the constraints are harder because they have to be at different level of aggregation. :et me jump into my DDL and then explain it.

The size and the item numbers are constants which we load the set of items. Bin zero is essentially loose items on the floor, waiting to get packed. A VIEW of the bins will be handy, so let’s do that:

The WITH CHECK OPTION is not well-known but it has been a part of SQL since the SQL-92 Standards. The updatable view is protected from any update that would change a row in violation of the WHERE clause. Try to do this insertion:

leads to this error message:

However, this is just fine:

This might be easier to see with actual data, Start with this queue {6, 6, 5, 5, 5, 4, 4, 4, 2, 2, 2, 2, 3, 3, 7, 7, 5, 5, 8, 8, 4, 4, 5}   which gives us 110 units to go into bins of size of 9 units. If we did a first fit algorithm, the table would look like this:

Doing the math, the theoretical lower bound is 13 boxes.  This gave us 17 boxes which looks like this:

Box nbr

Packing

1

6

2

6

3

5

4

5

5

9

6

8

7

8

8

7

9

3

10

7

11

7

12

5

13

5

14

8

15

8

16

8

17

5

Can we actually hit that lower bin count? Maybe, but the only way to find out is to try all possible packing arrangements in the general case.

One other estimation we can make is to try to try all combinations of (k) items and get a heretical upper bound.  The basic approach is a self join and hope that you do not a lot of small items.

I wonder if I can pack three items in a b, so we try that:

 Let’s try packing all combinations of four items. It still works. Push this to four items per bin with an extension of the pattern we have been using, we get this query.

Again, we can extend the query to five items with this:

but now we get a failures at five items in a bin. Obviously, it would be better to start with a high value of (k) and keep reducing the value until you get a result back.

As it works out, with this data the Decreasing First Fit algorithm gives an optimal result. That queue would look like this: {8, 8, 7, 7, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 3, 3, 2, 2, 2, 2}. The results are like this:

Bin nbr

Packing

1

8

2

8

3

9

4

9

5

9

6

9

7

9

8

9

9

9

10

9

11

9

12

9

13

4

Do not be fooled by this!  If we has a bin of size 10 and five items of size 6, the theoretical lower bound would be three bins using ((5 * 6)/10). But it requires five bins no matter which algorithm you pick.

Without generating all possible permutations, see if you can come up with a practical, set oriented solution that gives a near-optimal result.

References:

  • Albers, S. and Mitzenmacher, M. “Average-Case Analysis of First Fit and Random Fit Bin Packing.” Random Structures Alg. 16, 240-259, 2000.
  • Coffman, E. G. Jr.; Garey, M. R.; and Johnson, D. S. “Approximation Algorithms for Bin-Packing–An Updated Survey.” In Algorithm Design for Computer System Design. Vienna: Springer-Verlag, pp. 49-106, 1984.
  • Garey, M. R.; Graham, R. L.; and Ullman, J. D. “An Analysis of Some Packing Algorithms.” In Combinatorial Algorithms. New York: Algorithmics Press, pp. 39-47, 1973.
  • Graham, R. L. “Bounds on Performance of Scheduling Algorithms.” In Computer and Job-Shop Scheduling Theory (Ed. E. G. Coffman Jr.). New York: Wiley, pp. 165-227, 1976.
  • Johnson, D. S. “Approximation Algorithms for Combinatorial Problems.” In J. Comput. System Sci. 9, 256-278, 1974.
  • Johnson, D. S. “Approximation Algorithms for Combinatorial Problems.” In Fifth Annual ACM Symposium on Theory of Computing (Austin, Tex., 1973). New York: Assoc. Comput. Mach., pp. 38-49, 1973.
  • Hoffman, P. The Man Who Loved Only Numbers: The Story of Paul ErdÅs and the Search for Mathematical Truth. New York: Hyperion, 1998.
  • Lewis, R.; 2009. “A General-Purpose Hill-Climbing Method for Order Independent Minimum Grouping Problems: A Case Study in Graph Colouring and Bin Packing”; Computers