Checking on the Differences Between Two Versions of a Database Build

One of the common problems when you are developing databases is knowing what’s changed and when. You want to detect changes and to work out what has changed in a particular build. You’d have thought it would be an easy problem, when you consider that every database object in SQL Server’s system views has a date when the object was last modified. If you want to know what’s been changed since a particular date you would need to just list all those objects that have a ‘last-modified’ date greater than that date.

This actually works fine if you just want to be know whether objects have changed, but you are likely to want more than that. It is nice to know what’s been deleted too. While were about it, it is nice to see if something has been renamed or added.

This can all be achieved reasonably easy by querying the metadata, but unlike a database comparison tool, it can do little more than tell you what has changed rather than the way it has changed. It has a different objective. It gives you just the overview to track changes in a particular build. If you want to compare different builds, you can ‘t use it because the object_ids will be different. You’ll need to use a database comparison tool because the information you want to use is lost.

The problems with this technique come from the fact that some database components such as indexes, columns, parameters and distribution statistics aren’t considered to be objects, because they are parts of other objects, tables or routines in this case. When, for example, a column changes, the whole table changes and its last-modified date is changed to the time that the column was modified. However, some objects that are parts of other objects such as primary keys and constraints are considered to be objects in their own right. Sometimes they roll their updated modification date back to the parent object. Sometimes, rarely, they don’t. When a foreign key constraint is changed, both participating tables are flagged as being modified, which isn’t entirely intuitive.

Nowadays it is easier to use Extended Events to track database changes, but this requires a session to be running at the time to capture the event. If this isn’t possible, then the default trace may have the information. On the other hand, this may have been switched off, or the data may have got lost. This happens if the background noise of maintenance tasks drowns out the database modifications you want to track by pushing them out of the current default trace log. It is useless if you want to study the history of changes in a build.

In order to track database changes, you need to understand the thinking behind the way that the various database components are recorded in the system views. Objects that don’t have parents are typically views, table valued functions, stored procedures, service queues, inline table valued functions, tables and scalar functions. On the other hand, check constraints, default constraints, foreign key constraints, primary key constraints, sql triggers, and unique constraints have parents, and have no independent existence beyond their parent object. When the parent is deleted, so are the children. However Columns, parameters and indexes aren’t considered to be objects, just attributes of objects, so can only be tracked by their ids together with the object IDs of their associated objects.

Why bother with all this? If you maintain a single build, merely making changes to it for every internal release, then you can preserve just the values of the objects name, object_id, modify_date and parent_object_id in a database somewhere. If you do that for every integration, then you have the means to list out all the changes between any two releases. Hmm. I wonder what has changed since the beginning of May? (click click click.) Hmm. OK, it doesn’t tell you how it has changed, but it narrows it down! Another use: At the beginning of a day’s work, you save those four columns from the sys.objects table. At the end of the day, you run the query which identifies the changes. This allows you to script out all the changed objects into source control. With SMO (sqlserver), you can do it all automatically once you’ve devised a script to do it.

I started on this routine in order to monitor changes being made to clones, using SQL Clone. In this case, it is easy to save a json file with the required columns to disk when creating the image and use that to check on what has changed with each clone. One could even report on when the changes were made.

The table value you need in order to do comparisons is easily illustrated by using Adventureworks, but any database will do, obviously.

You can create a table to store these in, or maybe store them on disk. We’ll show the former method. Here is an example of such a table

… then you can take an ‘insta-record’ of the state of a database (Adventureworks2016 in this case)

… and reference them later. You could, if you want, just get a count of just the modifications

This will tell you the number of database objects that have changed. However, I guess you’d want more than that. This involves several comparisons between the two table values of objects so you either need a CTE or some table variables.

I’ve given a CTE version of the code in an article I’ve written elsewhere that describes how I went about the chore of testing the development of the code for this. However, that is only capable of comparing two databases on the same server. It is more useful if could compare with a JSON string, or file-based data. The best performance and versatility comes from an inline table-valued function. Here we first create the table type and table-valued function before preparing the data and passing it to the function. OK, it may look slightly cumbersome, but it is quick.

I’ve taken the original from a version stored in a DatabaseObjectReadings table. I made a copy of AdventureWorks 2016, and ran a few deletions and modifications on it to test it out. Naturally, you can compare two versions of the same build of a database when you have neither of them, just a record of relevant columns in sys.objects.

Now we can use this function do check on the difference between any two saved versions of the database or the current state of the database.

We can sort this another way, but this is the most obvious way because it groups the members of each schema together, and lists each object together with the child object that has changed, the parent first followed by the children.

You could, conceivably, do better by adding the details of the columns, indexes and so on that are related to these. However, that’s not the objective here because I just want to know what object is affected by the modification and I already know that. If you want all that stuff, you can buy a SQL Comparison tool!