So way back in June of last year, when I started this series on Temporal Tables: Part 1 – Simple Single Table Example, Part 2 – Changing history; and even in Part 3 – Synchronizing Multiple Modifications; I only referenced one table. In this entry, I want to get down to what will actually be a common concern. I want my objects to be consistent, not just at the current point in time, but throughout all points in time. I won’t even try to mix this concern with changing history, but I imagine that it could be a major undertaking depending on the data you already have from any change log tables you have created, if you were not concerned with viewing data at previous points in time.
In part 3, I looked at what happens when the same row, or two rows in the same table would behave. 2 tables will behave quite the same, and there is a compelling reason to be cognizant of the temporal timestamps when you are dealing with multiple tables and want to see all tables as they existed at a given point in time.
If you were writing you own versioning, you might use a trigger and write the change. Each change you would write would have a timestamp as of the time the change was written. I won’t cover the code (in this blog, but I will in the book) that you need for keeping history of changes here, but it is a widely used pattern. When a row changes, make a copy of the deleted rows in the trigger in a table that looks like the primary table, and set the time when the change was made.
A problem with this, if you want to see how the database looks at any point in time, you would see the progression of changes over time, which could lead to inconsistent views of the data at any given point in time. So say you have a SalesOrder and SalesOrderLineItem table, and you have a ControlTotal on the SalesOrder that needs to match the sum of the line item values for the SalesOrder. There really is no way to enforce a relationship between columns in different tables as this because SQL Server does not have delayed constraints or triggers. I will use this (what could be deemed a denormalization, depending on how it is viewed in the requirements) scenario, but not all issues will be quite so obvious.
While we must trust the client to get the math correct or reject it, if we are going to keep temporal versions of data, it is highly important that we make sure that the views across time are consistent (even down to the microsecond if we are using a very high granularity in our start and end time columns,) in order to make coding easier. By making sure that our modifications are packaged in transactions, we can do this.
First, let’s create our tables (and I will drop the SalesORder table we had previously and add the ControlTotal column. Note that you can’t just drop a table with versioning on, you need to turn off versioning and drop both tables.):
ALTER TABLE Sales.SalesOrder
SET (SYSTEM_VERSIONING = OFF);
go
DROP TABLE Sales.SalesOrder;
DROP TABLE Sales.SalesOrderHistory;
GO
Then I will recreate the table with a change from Data to ControlTotal, and then a LineItem table that has the line total, as well as foreign key constraint to the SalesOrder table:
SET NOCOUNT ON;
CREATE TABLE Sales.SalesOrder
(
SalesOrderId int NOT NULL CONSTRAINT PKSalesOrder PRIMARY KEY,
ControlTotal DECIMAL(10,2) NOT NULL,
ValidStartTime datetime2 (7) GENERATED ALWAYS AS ROW START,
ValidEndTime datetime2 (7) GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME (ValidStartTime, ValidEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = Sales.SalesOrderHistory)); –Another future thought, put in a different schema? I think not, but, it intrigues me nonetheless.
GO
CREATE TABLE Sales.SalesOrderLineItem
(
SalesOrderLineItemId INT NOT NULL CONSTRAINT PKSalesOrderLineItem PRIMARY KEY,
SalesOrderId int NOT NULL CONSTRAINT FKSalesOrderLineItem$ContainsDetailsFor$SalesSalesOrder REFERENCES Sales.SalesOrder(SalesOrderId),
LineItemNumber INT NOT NULL,
LineItemTotal DECIMAL(10,2) NOT NULL,
CONSTRAINT AKSalesORderLineItem UNIQUE (SalesOrderId, LineItemNumber),
ValidStartTime datetime2 (7) GENERATED ALWAYS AS ROW START,
ValidEndTime datetime2 (7) GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME (ValidStartTime, ValidEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = Sales.SalesOrderLineItemHistory)); GO
Now let’s do a progression of data, leaving a two second gap between operations, slowing down time a little bit like can happen in reality.
INSERT INTO Sales.SalesOrder (SalesOrderId, ControlTotal)
VALUES (1, 100);
WAITFOR DELAY ’00:00:02′;
INSERT INTO Sales.SalesOrderLineItem (SalesOrderLineItemId, SalesOrderId, LineItemNumber, LineItemTotal)
VALUES (1, 1, 1, 50);
WAITFOR DELAY ’00:00:02′;
INSERT INTO Sales.SalesOrderLineItem (SalesOrderLineItemId, SalesOrderId, LineItemNumber, LineItemTotal)
VALUES (2, 1, 2, 50);
Now let’s take a look at the data, just after the insert:
SELECT *
FROM Sales.SalesOrder
WHERE SalesOrderId = 1;
SELECT *
FROM Sales.SalesOrderHistory
WHERE SalesOrderId = 1;
SELECT *
FROM Sales.SalesOrderLineItem
WHERE SalesOrderId = 1;
SELECT *
FROM Sales.SalesOrderLineItemHistory
WHERE SalesOrderId = 1;
We have 3 rows in the base tables, starting at three different times, and no history yet:
SalesOrderId ControlTotal ValidStartTime ValidEndTime
———— ————————————— ————————— —————————
1 100.00 2016-05-01 20:36:51.1670200 9999-12-31 23:59:59.9999999
SalesOrderId ControlTotal ValidStartTime ValidEndTime
———— ————————————— ————————— —————————
SalesOrderLineItemId SalesOrderId LineItemNumber LineItemTotal ValidStartTime ValidEndTime
——————– ———— ————– ————————————— ————————— —————————
1 1 1 50.00 2016-05-01 20:36:53.2000286 9999-12-31 23:59:59.9999999
2 1 2 50.00 2016-05-01 20:36:55.2452606 9999-12-31 23:59:59.9999999
SalesOrderLineItemId SalesOrderId LineItemNumber LineItemTotal ValidStartTime ValidEndTime
——————– ———— ————– ————————————— ————————— —————————
Now, even before we have any version history, the timestamps start to matter. Looking at current data, no problem
SELECT ControlTotal, LineItemTotal
FROM Sales.SalesOrder
JOIN Sales.SalesOrderLineItem
ON SalesOrderLineItem.SalesOrderId = SalesOrder.SalesOrderId
The results look just like you expect. But what if we are doing temporal queries on the table?
ControlTotal LineItemTotal
————————————— —————————————
100.00 50.00
100.00 50.00
At the time the first operation:
DECLARE @asOfTime datetime2(7) = ‘2016-05-01 20:36:52’;
SELECT ControlTotal, LineItemTotal
FROM Sales.SalesOrder FOR SYSTEM_TIME AS OF @asOfTime
LEFT OUTER JOIN Sales.SalesOrderLineItem FOR SYSTEM_TIME AS OF @asOfTime
ON SalesOrderLineItem.SalesOrderId = SalesOrder.SalesOrderId;
The data looks wrong:
ControlTotal LineItemTotal
————————————— —————————————
100.00 NULL
And at 20:36:54:
DECLARE @asOfTime datetime2(7) = ‘2016-05-01 20:36:54’;
SELECT ControlTotal, LineItemTotal
FROM Sales.SalesOrder FOR SYSTEM_TIME AS OF @asOfTime
LEFT OUTER JOIN Sales.SalesOrderLineItem FOR SYSTEM_TIME AS OF @asOfTime
ON SalesOrderLineItem.SalesOrderId = SalesOrder.SalesOrderId;
Things are better, but not quite right:
ControlTotal LineItemTotal
————————————— —————————————
100.00 50.00
Naturally, the likelihood of coming up against such a condition in the case of one row over two seconds is pretty slim, But the more concurrent operations using temporal, the more likely that you get back some weird totals that you don’t want. And even more of a concern is that users often need to fix some data that is broken when you have these inter-table dependencies. If you are fixing data, you may need to fix history as well now (fixing history is an interesting topic that I may discuss some day. The central question will be based on requirements. It really depends if you want to see the data through time, or the information through time. Information should be correct. Data is what it was.
This time, for the second SalesOrder, I will make sure that all of the data is inserted and updated in the same transaction to ensure that the view of the data remains consistent:
BEGIN TRANSACTION;
INSERT INTO Sales.SalesOrder (SalesOrderId, ControlTotal)
VALUES (2, 100);
WAITFOR DELAY ’00:00:02′;
INSERT INTO Sales.SalesOrderLineItem (SalesOrderLineItemId, SalesOrderId, LineItemNumber, LineItemTotal)
VALUES (3, 2, 1, 50);
WAITFOR DELAY ’00:00:02′;
INSERT INTO Sales.SalesOrderLineItem (SalesOrderLineItemId, SalesOrderId, LineItemNumber, LineItemTotal)
VALUES (4, 2, 2, 50);
COMMIT TRANSACTION
Then checking the data:
SELECT *
FROM Sales.SalesOrder
WHERE SalesOrderId = 2;
SELECT *
FROM Sales.SalesOrderHistory
WHERE SalesOrderId = 2;
SELECT *
FROM Sales.SalesOrderLineItem
WHERE SalesOrderId = 2;
SELECT *
FROM Sales.SalesOrderLineItemHistory
WHERE SalesOrderId = 2;
It is clear to see that no matter what the AS OF time used, you will not have the issue with illogical results, since all of the start and end times are the same to 7 decimal places:
SalesOrderId ControlTotal ValidStartTime ValidEndTime
———— ————————————— ————————— —————————
2 100.00 2016-05-01 20:48:30.0154948 9999-12-31 23:59:59.9999999
SalesOrderId ControlTotal ValidStartTime ValidEndTime
———— ————————————— ————————— —————————
SalesOrderLineItemId SalesOrderId LineItemNumber LineItemTotal ValidStartTime ValidEndTime
——————– ———— ————– ————————————— ————————— —————————
3 2 1 50.00 2016-05-01 20:48:30.0154948 9999-12-31 23:59:59.9999999
4 2 2 50.00 2016-05-01 20:48:30.0154948 9999-12-31 23:59:59.9999999
SalesOrderLineItemId SalesOrderId LineItemNumber LineItemTotal ValidStartTime ValidEndTime
——————– ———— ————– ————————————— ————————— —————————
Finally, let’s update the line item 1 row to 25, and the total to be 75
BEGIN TRANSACTION;
UPDATE Sales.SalesOrder
SET ControlTotal = 75
WHERE SalesOrderId = 2;
UPDATE Sales.SalesOrderLineItem
SET LineItemTotal = 25
WHERE SalesOrderId = 2
AND LineItemNumber = 1;
COMMIT TRANSACTION;
Looking at the data and history:
SELECT *
FROM Sales.SalesOrder
WHERE SalesOrderId = 2;
SELECT *
FROM Sales.SalesOrderHistory
WHERE SalesOrderId = 2;
SELECT *
FROM Sales.SalesOrderLineItem
WHERE SalesOrderId = 2;
SELECT *
FROM Sales.SalesOrderLineItemHistory
WHERE SalesOrderId = 2;
We see that the SalesOrder and line time 1 start times match, but not the 4th one, as you would expect since we did not apply any modification statement to that row:
SalesOrderId ControlTotal ValidStartTime ValidEndTime
———— ————————————— ————————— —————————
2 75.00 2016-05-01 20:52:41.6210321 9999-12-31 23:59:59.9999999
SalesOrderId ControlTotal ValidStartTime ValidEndTime
———— ————————————— ————————— —————————
2 100.00 2016-05-01 20:48:30.0154948 2016-05-01 20:52:41.6210321
SalesOrderLineItemId SalesOrderId LineItemNumber LineItemTotal ValidStartTime ValidEndTime
——————– ———— ————– ————————————— ————————— —————————
3 2 1 25.00 2016-05-01 20:52:41.6210321 9999-12-31 23:59:59.9999999
4 2 2 50.00 2016-05-01 20:48:30.0154948 9999-12-31 23:59:59.9999999
SalesOrderLineItemId SalesOrderId LineItemNumber LineItemTotal ValidStartTime ValidEndTime
——————– ———— ————– ————————————— ————————— —————————
3 2 1 50.00 2016-05-01 20:48:30.0154948 2016-05-01 20:52:41.6210321
Now we can check the data as of a few times, and see that things are consistent:
At the original time of insert:
DECLARE @asOfTime datetime2(7) = ‘2016-05-01 20:48:30.0154948’;
SELECT ControlTotal, LineItemTotal
FROM Sales.SalesOrder FOR SYSTEM_TIME AS OF @asOfTime
LEFT OUTER JOIN Sales.SalesOrderLineItem FOR SYSTEM_TIME AS OF @asOfTime
ON SalesOrderLineItem.SalesOrderId = SalesOrder.SalesOrderId
WHERE SalesOrder.SalesOrderId = 2;
Two rows returned, total matches line item totals:
ControlTotal LineItemTotal
————————————— —————————————
100.00 50.00
100.00 50.00
At the time of the update:
DECLARE @asOfTime datetime2(7) = ‘2016-05-01 20:52:41.6210321’;
SELECT ControlTotal, LineItemTotal
FROM Sales.SalesOrder FOR SYSTEM_TIME AS OF @asOfTime
LEFT OUTER JOIN Sales.SalesOrderLineItem FOR SYSTEM_TIME AS OF @asOfTime
ON SalesOrderLineItem.SalesOrderId = SalesOrder.SalesOrderId
WHERE SalesOrder.SalesOrderId = 2;
This returns:
ControlTotal LineItemTotal
————————————— —————————————
75.00 25.00
75.00 50.00
I will leave it to you to try other times for yourself.
One quick note, if I had updated SalesOrderLineItemId = 4, even to the same value, I would get version rows. Be really careful not to just update rows repeatedly if there is
no change. You will want to do what you can to avoid it, or you could get this to occur:
UPDATE Sales.SalesOrder
SET SalesOrderId = SalesOrderId
WHERE SalesOrderId = 2
GO 10
Beginning execution loop
Batch execution completed 10 times.
So now the row has been updated 10 times with no change to the data. The version history is now considerably larger:
SELECT *
FROM Sales.SalesOrderHistory
WHERE SalesOrderId = 2;
SalesOrderId ControlTotal ValidStartTime ValidEndTime
———— ————————————— ————————— —————————
2 100.00 2016-05-01 20:48:30.0154948 2016-05-01 20:52:41.6210321
2 75.00 2016-05-01 20:52:41.6210321 2016-05-01 20:59:53.9903127
2 75.00 2016-05-01 20:59:53.9903127 2016-05-01 20:59:54.0059626
2 75.00 2016-05-01 20:59:54.0059626 2016-05-01 20:59:54.0215896
2 75.00 2016-05-01 20:59:54.0215896 2016-05-01 20:59:54.0372406
2 75.00 2016-05-01 20:59:54.0372406 2016-05-01 20:59:54.0372406
2 75.00 2016-05-01 20:59:54.0372406 2016-05-01 20:59:54.0528342
2 75.00 2016-05-01 20:59:54.0528342 2016-05-01 20:59:54.0528342
2 75.00 2016-05-01 20:59:54.0528342 2016-05-01 20:59:54.0528342
2 75.00 2016-05-01 20:59:54.0528342 2016-05-01 20:59:54.0528342
2 75.00 2016-05-01 20:59:54.0528342 2016-05-01 20:59:54.0684602
Very little change in the start times, but some never the less (you can see the start and end times do change a little bit over time.) This could be a horrible feature to turn on if you have such an interface
(as most of us probably do) where if nothing has changed and the user can press save over and over, causing update after update. So definitely watch your history tables after you turn this feature on to a new table to make sure of what is occurring.
Load comments