Creative Solutions by Using a Number Table

One irritating thing for the occasional SQL Server Developer is that the experts assume that you are familiar with the use of number tables when they write about techniques. We therefore asked Greg Larsen to spell it out in a way that would help the beginner and intrigue those already familar with the technique.

The SQL Server database engine does some things very well, but it is not optimized for iterative coding, such as stepping through a recordset one row at time: Therefore, it is always best to think of ways to perform a series of set-based operations to arrive at your resultset. In this article, I am going to explore the use of a number table, one that contains nothing more than a sequential set of numbers, to help resolve a couple of common business situations.

The Number Table

Every database should have a number table. A number table is just that; a table that contains a single column whose value is a unique number. The table contains a series of rows where the numbers start at 0, or 1, and increments by 1 to some define limit such as 1,000 or 10,000 depending how you plan to use the table. The range of numbers that you will need in your number table depends on what problem you are trying to solve.

There are a number of different methods to generate a table of numbers. Below is just one example that uses a CTE to populate a table named “Number” with 10,000 different numbers, where the first number starts at 1. I cannot take credit for developing this solution for generating a Number table. I originally got this code from a post by Itzik Ben-Gan.

This particular example can be use to populate a table that has number values from 1 to 4,924,967,296. All you need to do is change the “SELECT TOP” value in the INSERT statement to control the maximum count of numbers you want to generate. I think creating a 4 billion-row numbers table would be more than any one might need, but then again maybe not. Therefore, if you needed to create one with more the 5 billion rows just add a “L6” that does a CROSS JOIN to “L5”.

Now that we have a simple method to create a number table, let me go through a couple of examples that use this method to populate a number table, then use the table to quickly resolve a business problem using a set based solution.

Finding the Missing Dates

This example deals with how to find the missing dates in a set of records using a Number table. To provide some context to this example let us assume this is your business problem:

You are running a room rental business, known as “Rooms for Rent”. Each room is rented based on a specific date. Your database has a table named “RoomRent” that contains individual records for each day a room is rented. This table contains the name of the room and the date the room is rented. A new customer is trying to rent your “QuarterMaster” room and wants to know which dates are available, between 3/14/2011 and 3/18/2011. One last requirement is that you have is your can only make room rental reservations up to a year out (365 days).

To support this scenario let me first populate our RoomRent table and our Number table for this example. The RoomRent table will contain three different room rentals during the above-mentioned timeframe, two for the “QuarterMaster” room (3/15/2011, and 3/14/2011), and one for a room named “Commencement” (3/14/2011).

In order to solve this problem I will need a number table. The number table will make it easy to find days when the “Quartermaster” room is not rented. Your business requirement will dictate how many numbers you will need in your number table. Since rooms are only rented 365 days in advance, I will only need to populate my number table with 365 different number values. Here is the code to populate both the RoomRent table and the Number table:

Now that we have our tables to support our example, let me jump in and show you how to find the available rental dates for the “Quartermaster” room between 3/14/2010 and 3/18/2010. Here is the code to do that:

Now let me walk through this code.

To determine the available reservations dates, I needed to determine the number of days between the starting and ending period of the reservation request. In my scenario, the begindate is 3/14/2010 the enddate is 3/18/2010. To help to drive my query above I defined two local variables (@BeginDate and @EndDate) and set them to these dates. I then used these variables to select records from my Number table where the number of records returned was the number of dates between the @BeginDate and @EndDate, including the begin and ending dates in that calculation. In this example, that is 5 days, and the number values for the records returned start from 1 and go to 5. Here is the snippet of code from the query above that returned those Number table records:

Here you can see that I used the value of the N column (1, 2, 3, 4 or 5) in the Number table to calculate all the possible reservation dates, which I call CheckDate. I did this by calling the DATEADD function and using the N value from the Number table and the @BeginDate variable. To select only the first sequential five numbers from the Number table I used a WHERE clause. In that WHERE clause, I calculated the number of dates between @BeginDate and @EndDate and then selected a Number table records only if N was less than or equal to the number of dates between these dates. Which I already mentioned in this example is 5 days.

Now that I have all the possible reservation dates, it was a simple matter of performing a RIGHT OUTER JOIN with the RoomRent records. Anytime this join operation produced a NULL value for the “ReservationDate“, this was a day when the “QuarterMaster” room was not rented. Here are the results of the complete query above:

By using a number table to generate a few dates for the reservation date range in this scenario it became quite an easy task to identify the available dates when the “QuarterMaster” room would be available. A Number table can very useful in lots of different date range calculation. Let me move on and explore another situation where a number table helps to provide a simple solution.

Splitting a String based on Delimiter

If you have been writing T-SQL code for a while, you probably have come across a time when you need to take a string and split it apart into two different strings or a series of strings. In this example, I will be taking a string of people’s names and split them apart based on a delimiter character, a semicolon (“;”). Once again, let me put this example in context of the following business problem:

You have an event system that records events. For each event, your system records the individuals affected by the event. Each event might have one, two, or three individuals affected by an event. The individuals affected by an event are stored, unfortunately, in a single denormalised column within an event record, separated by a semicolon. Without being able to change the existing design of the database, you need to write a SELECT statement to display the event information where each individual is displayed on a separate row along with the event information.

This is extremely easy to do by using a Number table. The number values will be used to identify the positions of the semicolons. In order to set up this example the following script will be used to create three different event records. The first event record has 3 individual associated with it, the second only two individuals and the last event only has a single individual attached to it:

In addition to creating my event records, I also need to create a number table to support my business case. My Number table will be used to find semicolons. Since my code will need to use the Number table to look at each character in the “ImpactedIndividuals” column I need to have enough numbers to support the size of this column. Since this column is 150 characters in size, I will be generating a Numbers table that holds 200 different numbers, in reality I could have gotten by with a with only 151 numbers. Here is the code to generate my Number table:

To split apart the “ImpactedIndividuals” column, the Number table will be CROSS JOIN‘d with my Event records. I will only return Number records where the number value represents the column offset of the semicolon in the “ImpactedIndividuals” column of the Event record. Rather than show you the final code for this example, I will show you fragments of code that I will build upon to create the final example that will meet our requirement, as identified in the business case above.

This first code block will show you how to associate a number with the beginning position of each name in the “ImpactedIndividuals” column:

When I run this code, I get the following output:

The first character of each name was identify by performing a CROSS JOIN between my Event and Number table then using a WHERE constraint to only return those rows where “N” identified the positional offset of a semicolon. Note that I had to add a semi-column to the beginning of the “ImpactedIndividuals” column, so I could identify the character offset of the first name in this column, which will be 1 of course. This also allowed my code to identify the offset of each subsequent name as well. Keep in mind the value identified is the true column offset values for the semi-colon, without the concatenated semi-colon at the beginning.

Let me dig into this code a little closer so you can understand what is going on. First, the CROSS JOIN operation associated all 200 numbers, in my Number table, with each row in my Event table. Here is a sample of the first 15 rows returned from that CROSS JOIN operation:

If you looked at all the records, you would see that all 200 numbers would be associated with each one of my rows in my Event table. Thus, causing the result set of this CROSS JOIN operation to have 600 rows. Remember now, I only need to identify the starting offset of each name. Therefore in my query above I used a WHERE constraint to only return rows where the character offset identify by N (the value from the Number table) is a semicolon.  This constraint was performed using the following SUBSTRING function:

Having this WHERE constraint allowed me to create a record set that had one row for each name, and the value of N, which identified the starting offset of the name.

Now by modifying the above query slightly and adding a CHARINDEX function I can identify the ending offset of each name. Here is my slightly modified query:

In this query, I have identified the starting offset of each name by using the column alias of “BeginName” for the value “N”. To identify the ending offset of the name, I used the CHARINDEX function. In that CHARINDEX function I used N to identify where to start searching for the semi-colon. The position of “N” in the “ImpactedIndividuals” column just happens to be to first character of the next name immediately following the semi-colon. Since the last name in the “ImpactedIndividuals” column does not have a semicolon after it, I had to concatenate a semicolon to the end of this column value. This allowed me to find the ending offset of the last name. When this CHARINDEX function executes it will return the location of the semicolon after each name. Therefore, I subtract 1 from this value to identify the actual offset for the ending of the name, and then assign a column alias name of “EndName” to identify this value. Now we have both the beginning and ending offset of the impacted individuals.

I can now use the code above within my final query to meet our business requirement. This requirement was to return event information where each individual was identified on a separate row. Of course I had to modify the code above slightly so I could return the EventDate and EventComment columns. Here is my final query, which meets my business requirement:

Here is the output from this query:

By reviewing this query, you can see that I used the prior query as a table alias named “A“. I also used the “BeginName” and “EndName” columns within a SUBSTRING function to parse apart the “ImpactedIndividuals” column value to identify each impacted individual.

Using a number table made it quite simple to parse apart the “ImpactedIndividuals” column based on the presence of a semicolon. Without knowing this technique, it would be easy to fall into trying to use procedural logic to loop through the “ImpactedIndivdual” one character and records one at a time using a cursor.

Note: in SQL Server 2016, there is a built-in function to do this.

The Value of a Number Table

There are many creative uses for a Number table. I have demonstrated just two of those many uses. Using a number table can eliminate the complicated looping mechanisms needed to support your application logic with a set based solution. My simple examples show the value that a number table can bring to your arsenal of T-SQL tricks. Next time you are puzzled with how to identify, count and/or parse a set of records, see if using a number table provides you a more elegant solution.