Two-Dimensional Interval Packing Challenge

Comments 0

Share to social media

Packing intervals is a classic SQL task that involves packing groups of intersecting intervals to their respective continuous intervals. In mathematics, an interval is the subset of all values of a given type, e.g., integer numbers, between some low value and some high value. In databases, intervals can manifest as date and time intervals representing things like sessions, prescription periods, hospitalization periods, schedules, or numeric intervals representing things like ranges of mile posts on a road, temperature ranges, and so on.

An example for an interval packing task is packing intervals representing sessions for billing purposes. If you conduct a web search of the term packing intervals SQL, you’ll find scores of articles on the subject, including my own.

I’ve dealt with many interval packing tasks in the past and used different techniques to handle those. But they all had something in common that kept the complexity level at bay—they were all what you can think of as one-dimensional interval packing tasks. That is, each input represented one interval. One-dimensional intervals can be depicted spatially as segments on a line, and the packing task can be illustrated as identifying the delimiters of the groups of intersecting segments.

During a T-SQL class, while discussing packing challenges, Paul Wilcox from Johns Hopkins inquired whether I dealt with two-dimensional packing tasks, and if I had any references to resources for addressing those. He pointed to a forum post he submitted a few years earlier with a problem that came about due to a scheduling issue he came across when working at a previous school district. I remember Paul trying to use hand gestures representing two dimensional polygons to illustrate the idea. Up to that point it didn’t even occur to me that two-dimensional packing problems even existed, so obviously I had nothing to give him at the time. I promised to look at it after class, though, and address it next morning.

I sometimes refer to solutions or techniques as being elegant. What was interesting about this problem is that the challenge itself was very elegant and compelling. Moreover, I could tell that Paul was very experienced and bright, so if after years since he first stumbled into this task, he was still looking for additional solutions, it had to be a very interesting one.

In this article I present Paul’s two-dimensional packing challenge and my solution. Some of the illustrations I use are like Paul’s own depictions of the logical steps that are involved in the solution.

As a practice tool, I recommend that you attempt to solve the challenge yourself before looking at Paul’s and my solutions. I recommend reading Paul’s post to understand the task, but will also provide full details of the challenge here.

Note: the code for this article can be downloaded here.

The Two-Dimensional Interval Packing Challenge

The task at hand involves student class schedules stored in a table called Schedule, which you create and populate using the following code:

Each row has the student in question, a begin date (fromdate) and end date (todate), and a period start time (fromperiod) and period end time (toperiod). Note that the sample data uses integers instead of date and time values for simplicity.

To help understand the current data, it can be convenient to depict it graphically. This can be achieved with the following spatial query, forming a rectangle from each row:

Figure 1 has the graphical output of this query as can be seen in the SSMS Spatial results tab.

Figure 1: Unnormalized Schedule Depicted Graphically

The X axis represents the date ranges, and the Y axis represents the period ranges.

Notice that a student may have overlapping schedules, as is the case with Amy’s schedules 1 and 2. You can think of the current data as representing an unnormalized form of the schedule.

Sometimes you need to query the data to identify a schedule that contains a certain date and period. For example, the following query looks for Amy’s schedule that contains date 4 and period 8:

This query generates the following output, showing multiple matches:

Multiple matches for a single (date, period) point are only possible due to the unnormalized nature of the schedule. The challenge is to create a normalized state of the schedule, with distinct, nonoverlapping date range and period range combinations. Assuming such a normalized state is stored in a table, you then have a guarantee for no more than one matching row per input (date, period) point.

The tricky part is that there may be multiple arrangements that can achieve this goal. For the sake of this challenge, you just need to pick one that produces a deterministic result. For example, form a result row for each consecutive range of dates with the same period range. Figure 2 has a graphical depiction of the desired normalized schedule.

Figure 2: Desired Normalized Schedule

Informally, you’re forming the smallest number of rectangles possible using only vertical cuts. Obviously, you could go for a strategy that uses horizontal cuts. Or even something more sophisticated that uses a combination of vertical and horizontal cuts. Anyway, assuming the vertical-cuts approach, following is the desired output of your solution:

Told you it’s a formidable challenge. Now, to work!

Unpack/Pack Solution for One Dimensional Packing

My approach to solving the two-dimensional packing problem is to rely on a classic technique for handling one-dimensional packing, which you can think of as the Unpack/Pack technique, in multiple steps of the solution. So first, let me describe the Unpack/Pack technique for one-dimensional packing.

Suppose that in our Schedule table you only had period ranges and needed to normalize them. This is of course a completely contrived example, but I’m using it for convenience since we already have sample data available to work with. Here’s a query showing just the periods:

This query generates the following output:

Suppose that the task was to pack intersecting periods per student. Here’s the desired output of the solution with the packed periods:

As mentioned earlier, there are many solutions out there for packing intervals. The Unpack/Pack solution involves the following steps:

  1. Unpack each interval to the individual values that constitute the set.
  2. Compute a group identifier using a classic islands technique.
  3. Group the data by the group identifier to form the packed intervals.

Let’s start with Step 1. You’re supposed to unpack each interval to the individual values that constitute the interval set. Recall that an interval is the set of all values between the low and high values representing the interval delimiters. For example, given that our sample data uses an integer type for the period delimiters, Amy’s interval with fromperiod 5 and toperiod 8 should be unpacked to the set {5, 6, 7, 8}.

If you’re using Azure SQL or SQL Server 2022 or above, you can achieve this with the GENERATE_SERIES function, like so:

This query generates the following output:

If you don’t have access to the GENERATE_SERIES function, you can either create your own, or use a numbers table.

Step 2 involves computing a unique group identifier per packed interval. Observe that each packed interval contains a consecutive range of unpacked p values, possibly with duplicates. As an example, see Amy’s range between 5 and 9: {5, 6, 7, 7, 8, 8, 9}. The group identifier can be computed as the p value, minus the dense rank value (because there may be duplicates) based on p value ordering, within the student partition. Here’s the query that achieves this, showing also the dense rank values separately for clarity:

This query generates the following output:

You get a group identifier (grp_p) that is the same per packed interval group and unique per group.

Step 3 then groups the unpacked data by the student and the group identifier, and computes the packed interval delimiters with the MIN(p) an MAX(p) aggregates, like so:

This query generates the following desired output:

This technique is obviously not very optimal when the sets representing the intervals can contain large numbers of members. But it works quite well for small sets and is essential for handling the two-dimensional packing challenge.

Solution for Two-Dimensional Packing

Armed with the Unpack/Pack technique, you are ready to tackle the two-dimensional packing task. The trick is to realize that you can use it multiple times in the solution—once per dimension.

In Step 1, you unpack the two-dimensional date range, period range values to the individual date (d), period (p) values. It’s probably easiest to think of this step in graphical terms, as pixelating (thanks Paul for the terminology!) the rectangles representing each student class schedule from Figure 1 shown earlier. Here’s the code to achieve this step:

This query generates the following output:

You can use the following helper query to depict Step 1’s result graphically:

Figure 3 has the graphical result from the Spatial results tab in SSMS.

Figure 3: Pixelated Schedule

Now you need to decide on the normalization strategy that you want to use. Assuming you decide to use what I referred to earlier as the vertical cuts approach, you can proceed to Step 2. To apply vertical cuts, you pack period ranges per student and date. This is achieved by assigning the group identifier grp_p to each distinct range of consecutive p values per student and d group. Here’s the code that achieves this:

Here’s the output of the inner query defining the CTE PixelsAndPeriodGroupIDs, showing the computed grp_p value per pixel, if you will:

And here’s the output of outer query, showing the packed period ranges after grouping:

What’s important to note here is that you created packed period ranges per student and date. This is convenient to depict graphically using the following spatial query:

The output from the Spatial results tab in SSMS shown in Figure 4.

Figure 4: Daily Period Ranges

What you achieved here is sort of daily binning (again, thanks Paul for the terminology) of the packed period ranges. What’s left then in Step 3, is to pack consecutive date ranges per student and period range. Here’s the code to achieve this:

The main trick here is that in the second CTE (PeriodRangesAndDateGroupIDs), where you apply the classic islands technique a second time, when computing the dense rank value, you partition it by the student and the period range (student, MIN(p), MAX(p)), and order it by d.

Here’s the output of the second CTE’s inner query:

And here’s the output of the outer query, showing the final desired result with the normalized schedule:

If you wish to illustrate the result graphically, you can use the following helper spatial query:

The output in the Spatial results tab in SSMS is the same as what I’ve shown earlier in Figure 2 as the graphical depiction of the desired result.

Conclusion

Thanks again to Paul Wilcox for introducing the two-dimensional packing challenge. It’s a beautiful puzzler and I enjoyed very much working on it. I hope that’s the case for you too.

You saw how important it is to develop a toolbox of fundamental techniques, such as the classic islands technique, and based on it, the Unpack/Pack technique. Those were essential for solving this more complex challenge. You also saw how useful it can be to think of some tasks in graphical terms, and even utilize T-SQL spatial tools for this purpose.

Happy querying!

 

Load comments

About the author

Itzik Ben-Gan

See Profile

Itzik Ben-Gan is an independent T-SQL Trainer. A Microsoft Data Platform MVP (Most Valuable Professional) since 1999, Itzik has delivered numerous training events around the world focused on T-SQL Querying, Query Tuning and Programming. Itzik is the author of several books including T-SQL Fundamentals, T-SQL Querying and T-SQL Window Functions. He has written articles for sqlperformance.com, ITPro Today and SQL Server Magazine. Itzik’s speaking activities include PASS Data Community Summit, SQLBits, and various events and user groups around the world. Itzik developed and is regularly delivering his Advanced T-SQL Querying, Programming and Tuning and T-SQL Fundamentals courses.