Converting a History Table into a System-Versioned Temporal Table

Temporal tables were introduced with SQL Server 2016. To take advantage of this feature, you might start fresh with a brand new table. It is possible, however, to convert an existing table with accumulated history to the new functionality. In this article, Mala Mahadevan explains how to create temporal tables for both scenarios.

SQL Server has had many different methods to track changes to data. There has been old-fashioned trigger-based logging, Change Data Capture, and Change Tracking. All of these features allow you to see how data has changed with varying degrees of how far you can go back historically to see the changes. However, there has never been the ability to see how the entire table looked at any given point in time. That is what Temporal Tables do – they log every change that happens to the table. When you want to query it for any specific period of time, SQL Server will do the hard work of getting the snapshot of how the data in the entire table looked at that time.

A great introduction on how to set up temporal tables in different ways with various limitations can be found here. In this article, you will learn how to set up versioning when creating a new table and how to convert a table with an existing history to a system-versioned temporal table.

Creating a System-Versioned Temporal Table

Table versioning can be created on entirely new tables, or on existing tables. The history table, or the table where changes are logged, can be:

  • An entirely new, ‘anonymous’ table with no name specified in which case SQL Server creates a table and assigns a name,
  • A ‘default’ table with a name as desired by you,
  • An existing table with data that you now want to use as a history log.

To get started, create an entirely new table and version it first.

It is easy to see that the column StartDateTime is populated with current date and time in UTC, and EndDateTime is the max value that can be specified for a datetime2 data type. These are not specified in the insert statements and do not have defaults defined, they get autopopulated. Notice the syntax in the CREATE TABLE statement, generated always as row start and generated always as row end.

Now take a look at what was logged in history table:

This returns nothing. This is because logging is limited to updates and deletes and does not log inserts.

Now, if you run an update and then look at the history table, you will see that the previous row has been logged. The StartDateTime and EndDateTime values specify exactly when this row was active.

If you look at the main table for the same row, you can see that it has a new start date that matches the date when previous version of the row was retired.

Deleting a row also works similarly. When the end date of the deleted row matches date when it was deleted and there is no matching row in the main table.

Adding Versioning an Existing Table

The next scenario is to transition an existing history table to versioning. Trigger-based change tracking is still a very common and easy-to-implement process used at many places. This example explores a simple way this was implemented and how to use the same table, without changing or deleting any data to implement versioning.

There are two simple triggers, one for updates and one for deletes, to track changes to the table.

Intentionally change the same records several times so that the history table has a decent volume of data. This script will take about 10 minutes to run as you are recreating a history table with several updates with different timestamps on them.

Also, delete a couple of records from the main table.

You’ll see 702 rows in the history table.

The goal is to transition these two tables to temporal tables by keeping this data intact and allowing for traditional querying as well as querying using temporal table methodology.

As a first step, add start and end dates to both tables:

The structures of the history table and main table must be identical for turning versioning on. Since there is one column, regionhistoryid, in the history table that is not in the main table, you can either get rid of it in the history table or add it to the main table. Getting rid of it will mean the history table has no key per the older method. This is not ideal if you want to query older data with that method. Instead, add it to the main table. You won’t be using it, just add it to ensure conformance for this purpose.

The next step is to add the period to connect two new fields in the main table and then attempt to enable versioning.

This returns an error as shown below:

SQL Server will not allow identity columns in a history table. The identity property must be removed, but the data in this column is needed. To solve this, create another column, move data there, drop this column and rename the new column to the old name.

Now that the identity column is removed from the history table, try to turn versioning on again. This time you’ll get another error.

The data consistency check runs DBCC CHECKCONSTRAINT under the hood and comes up with issues if constraints do not validate. The default value of the new EndDate column is the maximum date of the system which, of course, is in the future.

There are several ways to resolve this problem. One way is to enable versioning but to skip the checks. Don’t run this code, but here it is for your reference:

This essentially means the data that is currently in the history table cannot be queried on with methods used to query temporal tables. You may also run into issues with querying any new data because older data is bad and has time period overlaps. This method is not recommended since it carries a lot of risk.

Instead, it is better to fix the data for time periods to match what is expected when using temporal table methodologies. Each history record must have a start and end date in the past during which the row was valid. The start date of each history record must match the end date of the one before it and the end date should be the start date of the next one and so on. The start date of the main table record should equal the last end date of its history. Cleaning up the data in this way will ensure that there are no time gaps.

To perform the cleanup, follow these three steps:

Step 1: Find the first history record for each row in the main table and set start date to equal the create date and end date to equal update date.

Step 2: Find the records that are dated after the first one, and update them in sequence, the start date of each record should be equal to end date of the previous one.

Step 3: The last date of the very last history record should equal the start date in main table of the same record. Remember that you have old triggers still enabled, so any changes you make to the main table will be logged again. So first, you have to drop those triggers. You also have to temporarily remove the period.

Then, run an update to bridge the history on the history table and main table.

You may, if you choose to, drop the columns createuser, createdate, updatedate, updateuser, and regionhistoryid from both tables at this point. If you have older queries using these columns, though, this might not be desirable to do.

Now, when you add the period back in and set versioning on, it works like a charm. You have also covered all time gaps involved so that querying using both the older method and the new method in versioning will work the same.

Gives results as below:

(The results returned may depend on when you run this query – in comparison to when the data was created, so use the right date for the variable @Adayago).

Converting Your Data

Cleaning up the data to make it conform to the system-version temporal tables can be quite tricky, and your scenario may be even more complex. Here are a few things to keep in mind:

  • The start date should always be less than the end date in both tables.
  • If you have multiple history records for a single parent record in main table, the start and end dates should also be sequential in ascending order with no period overlaps.
  • The end date for the last row in the history table should match the start date for the active record in the parent table.

Deleting data that does not obey these conditions is also a possible solution. Since this destroys the purpose of even having an existing table converted to history, this is not recommended. Instead you could keep that table as is and use a brand-new table to store versioning history.

It is also noteworthy that table versioning does not capture who made the change. That is something you may have to do manually, if you have a need to get this information. This trigger based workaround suggested by MVP Aaron Bertrand is a good way to incorporate this.

Removing Versioning

Eventually you may have e a scenario where you need to drop the tables or remove versioning entirely. Reasons might be that the table is gathering too much history or the footprint is not affordable.

To do this, you need set system versioning off and drop period for system_time. You can remove the date columns too since they are not of much relevance if the table is not using them but this optional.

There are a few steps to go through for this process and the following script can come in handy.

Summary

Data Versioning is complex and there are no two ways of versioning that work exactly the same. There are many situations you may run into if you are transitioning from an older method to temporal tables. Knowing what SQL Server expects would help this transition to happen smoothly. Temporal tables are a great feature and very easy to use, once we cross the hurdle of setting them up.