03 August 2010

45,734 views

37 0

Creative Solutions by Using a Number Table

One irritating thing for the occasional SQL Server Developer is that the experts assume that you are familiar with the use of number tables when they write about techniques. We therefore asked Greg Larsen to spell it out in a way that would help the beginner and intrigue those already familar with the technique.

The SQL Server database engine does some things very well, but it is not optimized for iterative coding, such as stepping through a recordset one row at time: Therefore, it is always best to think of ways to perform a series of set-based operations to arrive at your resultset. In this article, I am going to explore the use of a number table, one that contains nothing more than a sequential set of numbers, to help resolve a couple of common business situations.

The Number Table

Every database should have a number table. A number table is just that; a table that contains a single column whose value is a unique number. The table contains a series of rows where the numbers start at 0, or 1, and increments by 1 to some define limit such as 1,000 or 10,000 depending how you plan to use the table. The range of numbers that you will need in your number table depends on what problem you are trying to solve.

There are a number of different methods to generate a table of numbers. Below is just one example that uses a CTE to populate a table named “Number” with 10,000 different numbers, where the first number starts at 1. I cannot take credit for developing this solution for generating a Number table. I originally got this code from a post by Itzik Ben-Gan.

IF OBJECT_ID('Number') IS NULL BEGIN

CREATE TABLE Number (

N INT CONSTRAINT Number_PK PRIMARY KEY CLUSTERED(N)

);

WITH

L0 AS(SELECT 1 AS C UNION ALL SELECT 1 AS O), -- 2 rows

L1 AS(SELECT 1 AS C FROM L0 AS A CROSS JOIN L0 AS B), -- 4 rows

L2 AS(SELECT 1 AS C FROM L1 AS A CROSS JOIN L1 AS B), -- 16 rows

L3 AS(SELECT 1 AS C FROM L2 AS A CROSS JOIN L2 AS B), -- 256 rows

L4 AS(SELECT 1 AS C FROM L3 AS A CROSS JOIN L3 AS B), -- 65,536 rows

L5 AS(SELECT 1 AS C FROM L4 AS A CROSS JOIN L4 AS B), -- 4,294,967,296 rows

Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS N FROM L5)

INSERT INTO Number SELECT TOP 10000 N FROM Nums ORDER BY N;

END

This particular example can be use to populate a table that has number values from 1 to 4,924,967,296. All you need to do is change the “SELECT TOP” value in the INSERT statement to control the maximum count of numbers you want to generate. I think creating a 4 billion-row numbers table would be more than any one might need, but then again maybe not. Therefore, if you needed to create one with more the 5 billion rows just add a “L6” that does a CROSS JOIN to “L5”.

Now that we have a simple method to create a number table, let me go through a couple of examples that use this method to populate a number table, then use the table to quickly resolve a business problem using a set based solution.

Finding the Missing Dates

This example deals with how to find the missing dates in a set of records using a Number table. To provide some context to this example let us assume this is your business problem:

You are running a room rental business, known as “Rooms for Rent”. Each room is rented based on a specific date. Your database has a table named “RoomRent” that contains individual records for each day a room is rented. This table contains the name of the room and the date the room is rented. A new customer is trying to rent your “QuarterMaster” room and wants to know which dates are available, between 3/14/2011 and 3/18/2011. One last requirement is that you have is your can only make room rental reservations up to a year out (365 days).

To support this scenario let me first populate our RoomRent table and our Number table for this example. The RoomRent table will contain three different room rentals during the above-mentioned timeframe, two for the “QuarterMaster” room (3/15/2011, and 3/14/2011), and one for a room named “Commencement” (3/14/2011).

In order to solve this problem I will need a number table. The number table will make it easy to find days when the “Quartermaster” room is not rented. Your business requirement will dictate how many numbers you will need in your number table. Since rooms are only rented 365 days in advance, I will only need to populate my number table with 365 different number values. Here is the code to populate both the RoomRent table and the Number table:

SET NOCOUNT ON; IF OBJECT_ID('RoomRent') IS NULL

BEGIN

CREATE TABLE RoomRent (

RoomName VARCHAR(50)

,ReserveDate DATETIME);

INSERT INTO RoomRent VALUES ('QuarterMaster','3/15/2011')

INSERT INTO RoomRent VALUES ('QuarterMaster','3/17/2011')

INSERT INTO RoomRent VALUES ('Commencement','3/14/2011')

END

IF OBJECT_ID('Number') IS NULL

BEGIN

CREATE TABLE Number (

N INT CONSTRAINT Number_PK PRIMARY KEY CLUSTERED(N)

);

WITH

L0 AS(SELECT 1 AS C UNION ALL SELECT 1 AS C), -- 2 rows

L1 AS(SELECT 1 AS C FROM L0 AS A CROSS JOIN L0 AS B), -- 4 rows

L2 AS(SELECT 1 AS C FROM L1 AS A CROSS JOIN L1 AS B), -- 16 rows

L3 AS(SELECT 1 AS C FROM L2 AS A CROSS JOIN L2 AS B), -- 256 rows

L4 AS(SELECT 1 AS C FROM L3 AS A CROSS JOIN L3 AS B), -- 65,536 rows

Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS N FROM L4)

INSERT INTO Number SELECT TOP 365 N FROM Nums ORDER BY N;

END

Now that we have our tables to support our example, let me jump in and show you how to find the available rental dates for the “Quartermaster” room between 3/14/2010 and 3/18/2010. Here is the code to do that:

DECLARE @BeginDate DATETIME DECLARE @EndDate DATETIME

SET @BeginDate = '3/14/2011'

SET @EndDate = '3/18/2011'

SELECT * FROM

(SELECT * FROM RoomRent

WHERE RoomName = 'QuarterMaster'

AND ReservationDate BETWEEN @BeginDate AND @EndDate ) R

RIGHT OUTER JOIN

(SELECT DATEADD(DD,N-1,@BeginDate) AS CheckDate FROM Number

WHERE N <= DATEDIFF(DD,@BeginDate,@EndDate)+1) A

ON ReservationDate = CheckDate

WHERE R.ReservationDate IS NULL;

Now let me walk through this code.

To determine the available reservations dates, I needed to determine the number of days between the starting and ending period of the reservation request. In my scenario, the begindate is 3/14/2010 the enddate is 3/18/2010. To help to drive my query above I defined two local variables (@BeginDate and @EndDate) and set them to these dates. I then used these variables to select records from my Number table where the number of records returned was the number of dates between the @BeginDate and @EndDate, including the begin and ending dates in that calculation. In this example, that is 5 days, and the number values for the records returned start from 1 and go to 5. Here is the snippet of code from the query above that returned those Number table records:

1 2	SELECT DATEADD(DD,N-1,@BeginDate) AS CheckDate FROM Number WHERE N <= DATEDIFF(DD,@BeginDate,@EndDate)+1

Here you can see that I used the value of the N column (1, 2, 3, 4 or 5) in the Number table to calculate all the possible reservation dates, which I call CheckDate. I did this by calling the DATEADD function and using the N value from the Number table and the @BeginDate variable. To select only the first sequential five numbers from the Number table I used a WHERE clause. In that WHERE clause, I calculated the number of dates between @BeginDate and @EndDate and then selected a Number table records only if N was less than or equal to the number of dates between these dates. Which I already mentioned in this example is 5 days.

Now that I have all the possible reservation dates, it was a simple matter of performing a RIGHT OUTER JOIN with the RoomRent records. Anytime this join operation produced a NULL value for the “ReservationDate“, this was a day when the “QuarterMaster” room was not rented. Here are the results of the complete query above:

RoomName ReservationDate CheckDate -------- --------------- -----------------------

NULL NULL 2011-03-14 00:00:00.000

NULL NULL 2011-03-16 00:00:00.000

NULL NULL 2011-03-18 00:00:00.000

By using a number table to generate a few dates for the reservation date range in this scenario it became quite an easy task to identify the available dates when the “QuarterMaster” room would be available. A Number table can very useful in lots of different date range calculation. Let me move on and explore another situation where a number table helps to provide a simple solution.

Splitting a String based on Delimiter

If you have been writing T-SQL code for a while, you probably have come across a time when you need to take a string and split it apart into two different strings or a series of strings. In this example, I will be taking a string of people’s names and split them apart based on a delimiter character, a semicolon (“;”). Once again, let me put this example in context of the following business problem:

You have an event system that records events. For each event, your system records the individuals affected by the event. Each event might have one, two, or three individuals affected by an event. The individuals affected by an event are stored, unfortunately, in a single denormalised column within an event record, separated by a semicolon. Without being able to change the existing design of the database, you need to write a SELECT statement to display the event information where each individual is displayed on a separate row along with the event information.

This is extremely easy to do by using a Number table. The number values will be used to identify the positions of the semicolons. In order to set up this example the following script will be used to create three different event records. The first event record has 3 individual associated with it, the second only two individuals and the last event only has a single individual attached to it:

SET NOCOUNT ON; IF OBJECT_ID('Event') IS NULL

BEGIN

CREATE TABLE Event (

ID INT IDENTITY

,ImpactedIndividuals VARCHAR(150)

,EventDate DATETIME

,EventComment VARCHAR(MAX));

INSERT INTO Event (ImpactedIndividuals, EventDate, EventComment)

VALUES ('Stan Smith;Doris Russell;Don Catalina','5/15/2010','Received CPR training');

INSERT INTO Event (ImpactedIndividuals, EventDate, EventComment)

VALUES ('Ryan Jone;Sam Hayden','5/18/2010','Attended SSIS presentation');

INSERT INTO Event (ImpactedIndividuals, EventDate, EventComment)

VALUES ('Dan Johnson','5/22/2010','Received 10 year Service Award');

END

In addition to creating my event records, I also need to create a number table to support my business case. My Number table will be used to find semicolons. Since my code will need to use the Number table to look at each character in the “ImpactedIndividuals” column I need to have enough numbers to support the size of this column. Since this column is 150 characters in size, I will be generating a Numbers table that holds 200 different numbers, in reality I could have gotten by with a with only 151 numbers. Here is the code to generate my Number table:

IF OBJECT_ID('Number') IS NULL

BEGIN

CREATE TABLE Number (

N INT CONSTRAINT Number_PK PRIMARY KEY CLUSTERED(N)

);

WITH

L0 AS(SELECT 1 AS C UNION ALL SELECT 1 AS C), -- 2 rows

L1 AS(SELECT 1 AS C FROM L0 AS A CROSS JOIN L0 AS B), -- 4 rows

Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS N FROM L1)

INSERT INTO Number SELECT TOP 200 N FROM Nums ORDER BY N;

END

To split apart the “ImpactedIndividuals” column, the Number table will be CROSS JOIN‘d with my Event records. I will only return Number records where the number value represents the column offset of the semicolon in the “ImpactedIndividuals” column of the Event record. Rather than show you the final code for this example, I will show you fragments of code that I will build upon to create the final example that will meet our requirement, as identified in the business case above.

This first code block will show you how to associate a number with the beginning position of each name in the “ImpactedIndividuals” column:

SELECT ID AS EventID ,ImpactedIndividuals

,N AS BeginName

FROM Event CROSS JOIN Number

WHERE SUBSTRING(';'+ImpactedIndividuals ,N,1) = ';'

When I run this code, I get the following output:

EventID ImpactedIndividuals BeginName ------- ------------------------------------------------ -----------

1 Stan Smith;Doris Russell;Don Catalina 1

1 Stan Smith;Doris Russell;Don Catalina 12

1 Stan Smith;Doris Russell;Don Catalina 26

2 Ryan Jone;Sam Hayden 1

2 Ryan Jone;Sam Hayden 11

3 Dan Johnson 1

The first character of each name was identify by performing a CROSS JOIN between my Event and Number table then using a WHERE constraint to only return those rows where “N” identified the positional offset of a semicolon. Note that I had to add a semi-column to the beginning of the “ImpactedIndividuals” column, so I could identify the character offset of the first name in this column, which will be 1 of course. This also allowed my code to identify the offset of each subsequent name as well. Keep in mind the value identified is the true column offset values for the semi-colon, without the concatenated semi-colon at the beginning.

Let me dig into this code a little closer so you can understand what is going on. First, the CROSS JOIN operation associated all 200 numbers, in my Number table, with each row in my Event table. Here is a sample of the first 15 rows returned from that CROSS JOIN operation:

EventID ImpactedIndividuals N ------- ---------------------------------------------- -----------

1 Stan Smith;Doris Russell;Don Catalina 1

1 Stan Smith;Doris Russell;Don Catalina 2

1 Stan Smith;Doris Russell;Don Catalina 3

1 Stan Smith;Doris Russell;Don Catalina 4

1 Stan Smith;Doris Russell;Don Catalina 5

1 Stan Smith;Doris Russell;Don Catalina 6

1 Stan Smith;Doris Russell;Don Catalina 7

1 Stan Smith;Doris Russell;Don Catalina 8

1 Stan Smith;Doris Russell;Don Catalina 9

1 Stan Smith;Doris Russell;Don Catalina 10

1 Stan Smith;Doris Russell;Don Catalina 11

1 Stan Smith;Doris Russell;Don Catalina 12

1 Stan Smith;Doris Russell;Don Catalina 13

1 Stan Smith;Doris Russell;Don Catalina 14

1 Stan Smith;Doris Russell;Don Catalina 15

If you looked at all the records, you would see that all 200 numbers would be associated with each one of my rows in my Event table. Thus, causing the result set of this CROSS JOIN operation to have 600 rows. Remember now, I only need to identify the starting offset of each name. Therefore in my query above I used a WHERE constraint to only return rows where the character offset identify by N (the value from the Number table) is a semicolon. This constraint was performed using the following SUBSTRING function:

1	SUBSTRING(';'+ImpactedIndividuals ,N,1) = ';'

Having this WHERE constraint allowed me to create a record set that had one row for each name, and the value of N, which identified the starting offset of the name.

Now by modifying the above query slightly and adding a CHARINDEX function I can identify the ending offset of each name. Here is my slightly modified query:

SELECT ID AS EventID ,ImpactedIndividuals, N AS BeginName

,CHARINDEX(';',ImpactedIndividuals + ';', N)-1 AS EndName

FROM Event CROSS JOIN Number

WHERE SUBSTRING(';'+ImpactedIndividuals ,N,1) = ';'

In this query, I have identified the starting offset of each name by using the column alias of “BeginName” for the value “N”. To identify the ending offset of the name, I used the CHARINDEX function. In that CHARINDEX function I used N to identify where to start searching for the semi-colon. The position of “N” in the “ImpactedIndividuals” column just happens to be to first character of the next name immediately following the semi-colon. Since the last name in the “ImpactedIndividuals” column does not have a semicolon after it, I had to concatenate a semicolon to the end of this column value. This allowed me to find the ending offset of the last name. When this CHARINDEX function executes it will return the location of the semicolon after each name. Therefore, I subtract 1 from this value to identify the actual offset for the ending of the name, and then assign a column alias name of “EndName” to identify this value. Now we have both the beginning and ending offset of the impacted individuals.

I can now use the code above within my final query to meet our business requirement. This requirement was to return event information where each individual was identified on a separate row. Of course I had to modify the code above slightly so I could return the EventDate and EventComment columns. Here is my final query, which meets my business requirement:

SELECT EventID ,SUBSTRING(ImpactedIndividuals,BeginName,EndName-BeginName+1) AS ImpactedIndividual

,EventDate

,EventComment

FROM (

SELECT ID AS EventID

,ImpactedIndividuals, N AS BeginName

,CHARINDEX(';',ImpactedIndividuals + ';', N)-1 AS EndName

,EventDate

,EventComment

FROM Event CROSS JOIN Number

WHERE SUBSTRING(';'+ImpactedIndividuals ,N,1) = ';') A;

Here is the output from this query:

EventID ImpactedIndividual EventDate EventComment -------- ----------------------------- ----------------------- ---------------------------------

1 Stan Smith 2010-05-15 00:00:00.000 Received CPR training

1 Doris Russell 2010-05-15 00:00:00.000 Received CPR training

1 Don Catalina 2010-05-15 00:00:00.000 Received CPR training

2 Ryan Jone 2010-05-18 00:00:00.000 Attended SSIS presentation

2 Sam Hayden 2010-05-18 00:00:00.000 Attended SSIS presentation

3 Dan Johnson 2010-05-22 00:00:00.000 Received 10 year Service Award

By reviewing this query, you can see that I used the prior query as a table alias named “A“. I also used the “BeginName” and “EndName” columns within a SUBSTRING function to parse apart the “ImpactedIndividuals” column value to identify each impacted individual.

Using a number table made it quite simple to parse apart the “ImpactedIndividuals” column based on the presence of a semicolon. Without knowing this technique, it would be easy to fall into trying to use procedural logic to loop through the “ImpactedIndivdual” one character and records one at a time using a cursor.

Note: in SQL Server 2016, there is a built-in function to do this.

1 2	DECLARE @tags VARCHAR(200) = 'Yan,Tyan,Tethera,Methera,Pimp,Sethera,Lethera,Hovera,Dovera,Dik,Yanadik,Tyanadik,Tethera dik,Methera dik,Bumfitt,Yanabumfit,Tyanabumfitt,Tetherabumfitt,Metherabumfitt,Giggot' SELECT value FROM STRING_SPLIT(@tags, ',')

The Value of a Number Table

There are many creative uses for a Number table. I have demonstrated just two of those many uses. Using a number table can eliminate the complicated looping mechanisms needed to support your application logic with a set based solution. My simple examples show the value that a number table can bring to your arsenal of T-SQL tricks. Next time you are puzzled with how to identify, count and/or parse a set of records, see if using a number table provides you a more elegant solution.

Creative Solutions by Using a Number Table

The Number Table

Finding the Missing Dates

Splitting a String based on Delimiter

The Value of a Number Table

Subscribe for more articles

Rate this article

Greg Larsen

Related articles

Database Subsetting and Data Extraction

What’s new in T-SQL in SQL Server 2022

Snake draft sorting in SQL Server, part 1

Tags