T-SQL Tuesday #104: Code You Would Hate To Live Without (Relative Positioning in Date Table)

T-SQL Tuesday

Today’s blog post is in response to Bert Wagner’s invitation here: https://bertwagner.com/2018/07/03/code-youd-hate-to-live-without-t-sql-tuesday-104-invitation/. For me, the answer was pretty simple. There is one piece of code that I have written that I find very useful, and even more… it was my (as far as I know, my own idea!)  In my life, I have obviously learned a tremendous amount about building and designing databases on a relational engine. I have written books about it, as you may know. While I am proud about this, all it generally means is that I took the work of so many other people and arranged it into 700 pages.

The solution is part of my calendar/date dimension code, and it is used to do relative positioning over date periods. For example, say you have the need to get data from the 10 days. You can definitely use a simple between to filter the rows, and a bunch of date functions to group by year, month, etc., generally all of the “normal” groupings. But using a calendar table allows you to prebuild a set of date calculations that make the standard values easier to get, and non-standard groupings possible. The technique I will cover makes moving around in the groupings more easily accessible. Like if you want data from the last 3 complete months. The query to do this isn’t rocket science, but it isn’t exactly straightforward either.

For the example, I will use the calendar table that I have on my website here: https://drsql.org/code in the download SimpleDateDimensionCreateAndLoad, and will load it with data up until 2020. Here is that structure:

You can join this table to your data, either on the CalendarId (which is a smart key like YYYYMMDD), or the actual date value (and you can change the key structure to meet your needs very simply. This is code I could not live without because I, and people I work with, do a lot of queries where we compare time periods, such as “compare this year’s sales period with 1, 2, and/or 3 years ago.” and sales period does not correspond to any calendar that any other organization has created. We also commonly need to do this via a selectable parameter, where the user can decide the number of periods to compare.

Now say you want to get the dates for the last 5 days. A typical (and definitely not wrong) thing you can do, is do date math on the current date and time using SYSDATETIME() or GETDATE(), though SYSDATETIME() is the newer alternative with extra precision that means nothing in this example:

This returns:

dateValue
----------
2018-07-01
2018-07-02
2018-07-03
2018-07-04
2018-07-05

Which works well enough (though it is a bit concerning in a quick look that DateValue is a date and we are comparing it do datetime2(7) values, as shown, it will work as expected and include the current day and 4 previous days.) Now you can join to your table, such as a sales table:

It’s a small table, so joining on the function, while not ideal, will be better than the alternative of trying to calculate the various factors in the Calendar table row by row. Now you can filter and group by the year, month, fiscal month, etc.

But now, what if you want something more complex. Like the previous last 3 full months? Using EOMONTH(), we could put together some criteria to make that happen for sure. Now consider any period of time where “real world” date math does not work at all. Like your corporate fiscal year, which doesn’t start on Jan 1. Or perhaps a fiscal month, which doesn’t start/end on the first and last day of a month. You might even have time periods that are not contiguous, like sales periods (which is what I will give as example code).

This is where my bit of code comes into play. Notice in the code it loaded:

Looking at the past 14 days, which cross a month and week boundaries because of when I am writing this (July 5):

This returns:

DateValue  RelativeDayCount RelativeWeekCount RelativeMonthCount
---------- ---------------- ----------------- ------------------
2018-06-22 96961            13851             3185
2018-06-23 96962            13851             3185
2018-06-24 96963            13852             3185
2018-06-25 96964            13852             3185
2018-06-26 96965            13852             3185
2018-06-27 96966            13852             3185
2018-06-28 96967            13852             3185
2018-06-29 96968            13852             3185
2018-06-30 96969            13852             3185
2018-07-01 96970            13853             3186
2018-07-02 96971            13853             3186
2018-07-03 96972            13853             3186
2018-07-04 96973            13853             3186
2018-07-05 96974            13853             3186

Notice the RelativeDayCount is a unbroken sequence. The week counts include 7 days for 13852 (this started counting from 1753-01-01 for some very arbitrary, long forgotten, reason), and the month boundary changes on the first. By using these groupings, which are sortable, we can use simple integer comparisons, typically using equality or between. Like if we want the days of the previous week:

The current date is July 5, and the first day of this month was July 1. So you can see the previous week ended on the 30th.

DayName    DateValue
---------- ----------
Sunday     2018-06-24
Monday     2018-06-25
Tuesday    2018-06-26
Wednesday  2018-06-27
Thursday   2018-06-28
Friday     2018-06-29
Saturday   2018-06-30

But as I said, the beauty of this technique is dealing with periods that cannot be done with simple date math. Lets set up a scenario of a company that needs to report on activity during given sales periods. I will add a couple of columns to set up this scenario. This company has a weekly sale that regularly lasts from Tues-Thurs, most weeks (on the week of June 17, the sale went from Thursday-Sunday for Father’s day instead.) You want to join to your data table to see sales on the day of the sale, as well as the days between each sale to see if a sale influenced following sales.

So I am going to add 3 columns, the first is the relative positioning, and the last two are the kind of columns you would regularly wish to have in a calendar table to indicate a time period.

Now, I will load the data, in a very manual manner, but if there was an algorithm that could easily be formulated, this technique probably would never have been born. 

So now, with this data you can query the data pretty easily and get the most recent sale by getting the value that is currently set, or get previous sale periods by moving to the previous boundary. For example, show me the dates for the previous two sales periods:

This returns the following:

From here, you should be able to pick out the data you want in your summaries. You just want to sum sales for days of the sale: ActualSaleDayFlag = 1. Non-sales days: ActualSaleDayFlag = 0. But getting a number of periods back is done simply by varying an integer factor.