Temporal Data Techniques in SQL

In the first part of this series on Temporal Data, Joe explained how it is that the Common Era calendar is irregular and mentioned that, although there are ANSI/ISO Standards for temporal operations in SQL, every vendor has something different. Now, he discusses other factors to take into account when using temporal data such as Holidays, and discusses a few techniques using Calendar, Report Usage and History tables

Holidays

Nobody agrees on holidays. There are the obvious international differences – the Fourth of July doesn’t have the same appeal to the UK that it does to the US, and most Americans have no idea what Bank holidays are, and why the banks deserve them. This can be a real problem in certain industries. The ocean freight and maritime industries publish a list of all the non-working holidays for every seaport on Earth. Air freight is not so bothered by this, but strong dock worker unions mean that if you show up in a port on the wrong day, you can pay extra docking fees. 

Holidays fall into two types. Some can be computed from a rule and some are simply declared by an authority. The declared holidays are usually civic ones and they might change.  The computed ones that involve lunar and/or solar cycles are mostly religious. You also have holidays that appear only once, such as a “National Day of Mourning” for the death of an important political figure. 

But even the familiar holidays are harder to compute than you think.  Every programmer is aware that Easter has a pretty elaborate formula. But nobody asks “Which Easter?” we are to compute.  Orthodox and Catholic (which is used by Protestants) dates do not agree.  For example, in 2008 the Catholic Easter was on March 10 and the Greek Orthodox Easter on April 14; but by 2010 they will both be on March 22.  You can get a side by side comparison at Side-by-side Easter calendar reference for the 21st century

Some holidays are longer than others. The obvious example is the Weekends.  That usually means Saturdays and Sundays in Western countries.  In the Islamic world, it means Fridays.  And probably in my lifetime, we will probably see the four day work week in some countries.  Then there is Hanukkah and Eid ul-Fitr, which last for several days. 

In short, holidays are just too complicated and arbitrary to put in a function, VIEW or a query.  I wish that there was even an ISO Standard for just classifying types of holidays.  What this means is that you cannot help but use a Calendar table for holidays and you have to do most of the work for yourself.  We will get to that in another section. 

Leap Seconds and Trivia  

Be grateful that most commercial work is done at the level of days and perhaps time of day, and not to the accuracy of decimal fractions of seconds. That leads to the choice of atomic versus the astronomical clocks, if you have to co-ordinate events at the nanosecond level.  Scientific work has to cope with leap-seconds.  These are positive or negative corrections defined by the International Earth Rotation and Reference System Service to keep the atomic clocks within 0.9 seconds of the planet’s actual rotation.  They announced on 2008 July 4 that the world’s atomic clocks will be adding another second at the end of this year.  I hope this does not spoil your New Years Eve plans. 

Calendar Tables

The simplest Calendar is just a single column with the dates for a particular range.  Since it is all key, we know that It is as normalized as it can get.  Since there are 365.2422 days per year, a 100 year calendar is pretty small and would fit into main storage in most modern machines.  It is also of pretty limited use without other data that we want to look up. 

Since we already talked about holidays, the first impulse is to add a second column with a flag for “work day” or “holiday” to the table, something like this skeleton. 

The most common calculations you need are the number of business days between two dates or the nearest business day to a given single date.  You can do this with the integer flag and earlier SQLs with a skeleton like this:

This interferes with the ability to get other data out of Foobar without adding GROUP BY clauses.  Today you can use the windowed aggregates functions – that is the new ones with an OVER() clause. 

This is still not as good as we can make it.  Let me take time out to discuss the Julian date, which is often misused.  Julian date is technically a count of days used by astronomers for observations (http://en.wikipedia.org/wiki/Julian_date).  It is a large integer number and has some rules for computing it.  Only astronomers care about it, and the rest of us get it confused with a Julian-IZED date,

The Julianized date is a count of days from a base day, used in business.  You see it as the “day within year” number on the bottom of the pages of a desk calendar.  If your time frame is a calendar year, it will be between 1 and 365 or 366 days.  It is called the ordinal day format (“yyyy-ddd”) in ISO-8601.  But you can Julianize any time frame you wish such as the entire range covered by your Calendar table.  Instead of a flag, you simply repeat the Julianized number for holidays and use integer math to get the counts.  Here is a skeleton. 

To compute the business days from ‘2007-04-05’, Thursday before Easter in 2007 to ‘2007-04-10’, the next Tuesday:

This saves having to scan all of the rows within the range to get a sum.  The -1 is to avoid counting the current day as one of the business days, but you might have a different rule in your company.  

Report Range Tables

A common application is have special reporting periods, perhaps based on the fiscal calendar or business defined events (“Annual Going out Of Business Sale!” was one of my favorites). 

These reports can overlap; a fiscal quarter will be contained in the range of its fiscal year.  There can be gaps between them; we have to wait a year between each “Annual Going out Of Business Sale!” and there might be long stretches of time without any special sales.  But we want to know their ranges so that the table is fairly constant once it is created. 

The basic skeleton for use with these tables is

The <<report name list>> would probably be events nested inside each other, like fiscal quarters inside a fiscal year.  While this is a useful and portable programming trick, you need to consider replacing it with the newer OLAP extensions to the GROUP BY clause such as ROLLUP and CUBE. 

History Tables

Another form of calendar table records the history of events.  You will see history tables used in Data Warehouses and queries dealing with things that have duration. If you go to the French Quarter in New Orleans, there is a lamp post with a historical plaque on it which gives the years of the Spanish domination, French domination, American domination, Confederate domination and finally, the second American domination.  The last entry has a blank space to be filled in later and empty room on the plaque beneath that line. 

Now look at the NULL-able end_date in the skeleton that follows.  It is useful because SQL does not have a token for “Eternity” to show that something is not yet finished like New Orleans.  Using a dummy date in the distant future for “Eternity” has portability problems.  And it can mess up temporal math if you don’t catch it in CASE expressions.  There should be, at most, one NULL for each item in this inventory history to model the current price.  It is also possible that we know a future ending data for an item – look at the expiration dates on food.  

To make this easier to see, let me use a history table for price changes.  So a basic history table looks like this skeleton:

The first three uniqueness constraints are fairly obvious.  The EAN and the start of a price have to be unique and not NULL-able so we have a natural primary key.  The ending dates, current and previous, might have NULLs, so we need to use a UNIQUE constraint. 

The next constraints give an ordering to each event, namely, price_prev_date is on or before price_start_date, which is on or before price_end_date.  The reason for not putting this into a single BETWEEN predicate is that each constraint will have a name in production code that will show up in error messages so we want to be exact. 

The self-referencing constraint is a trick from Alex Kuznetsov.  It says that all the previous price ending dates were really ending dates for some time period.  You will want to play with options to get them to fit your own business rules. 

It is also a good idea to have a VIEW with the current data:  

You use a BETWEEN predicate to get the appropriate price for a given order on a particular date. 

You can also enforce the “one NULL per item” with a trigger.  If your SQL has CREATE ASSERTION or a full implementation of the CHECK() constraints in SQL-92, you can do it with declarative code.  But there is a sneaky trick that uses the WITH CHECK OPTION on a VIEW to fake it; here is the skeleton:

This forces you to update through the VIEW so that the WITH CHECK OPTION is always used. However, if you go directly to the base table, it can be circumvented.  Now your problem is to write a stored procedure that will update the table and insert a new row.  You can do this with a single MERGE statement, or with a short block of procedural code:

This will make the price change go into effect tomorrow.

In his next article, Joe gets distracted from temporal data by reading the Manga Guide to Databases, and decides, instead, to write about relational division.