
Database Deployments and Rocky Rollback Horrors: Where Does the New Data Go?

Phil Factor describes the problem of preserving data that is left 'in limbo' when you need to revert the database to a previous version that has no way of accommodating it.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

A migration script is used to move a database from one version to another. By far the most common route is to migrate ‘forward’ from one release version to the next, but the cautious development team will also create a ‘rollback’ migration script that will, if necessary, revert a database to the previous version.

With these two types of script, it is theoretically possible to move from one release version to any other, from v1 to v3, say, simply by chaining together the v1-v2 and v2-v3 migration scripts. Of course, it is usually simpler to create a single script to move directly from v1 to v3 (or in the reverse direction). However, script chaining becomes necessary when each migration script needs to provide the means of preserving existing data from one version to the next, specifying precisely how data must be moved to accommodate it on the new version.

SQL Compare or SQL Change Automation will do the hard work of preparing these scripts for you. Whether you’re doing a migration, build, or rollback, the SQL Compare engine will produce a script that creates or removes objects in the correct dependency order, within a transaction, and with all the necessary error handling. It is a task that is best left to the resolute, methodical nature of the computer; human error generally makes us a distant second-best at it. SQL Compare deals with many common issues in preserving data between versions, but if it spots a problem while doing this chore, it tells you which tables are affected, and you can then fine-tune the script it creates to move the data in line with the developer’s intentions. We humans are still occasionally required.

Mostly, this is simple to do, as I’ve described elsewhere. However, it can get complicated: occasionally, you release a new version, users make subsequent changes to the data, and then you spot a problem and need to roll back.

When delayed rollback happens

There are three cases where a database change must be rolled back. I’ve described them in detail in a previous article, but will summarize them here. Only the third type features the problem we’re describing, which arises when there is no obvious place to put the new data added since the release went online and live.

Transactional rollback of migration scripts

Migration scripts must account for what happens if they fail during execution. It is a bad idea to leave a database in an indeterminate state between versions. By far the easiest way to prevent this is to run them in a transaction, with error handling that rolls back the entire transaction if an error happens. Build scripts have a related but simpler requirement: they start with an empty database or schema, unlike migration scripts, so they just have to leave a clean, unoccupied database if they fail, but even these scripts are best run within a transaction.
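As a minimal sketch of such a wrapper (the details, such as the use of XACT_ABORT, are habits of mine rather than anything prescribed by the tools):

SET XACT_ABORT ON; -- any run-time error dooms and rolls back the transaction
BEGIN TRY
  BEGIN TRANSACTION;
  -- ... the migration's DDL changes and data movement go here ...
  COMMIT TRANSACTION;
END TRY
BEGIN CATCH
  IF @@TranCount > 0 ROLLBACK TRANSACTION;
  THROW; -- re-raise the error so the deployment tool sees the failure
END CATCH;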

Rollback scripts without subsequent data changes

If a migration script is executed outside a transaction, but offline and in single-user mode, there must be a twin script that can be executed to undo any changes that were made before the script failed or hit an error. This must be idempotent, so that it checks each change and undoes it only if it still exists. Because users have had no opportunity to access the system, there is no special problem with migrating data.

Rollback scripts after data changes

If a migration to a new production version of a database fails after users have been able to make subsequent changes, we may be required to roll back to the previous version. However, we must not lose any data in doing so. With trivial changes in a release, it isn’t likely, but if you’ve made more radical changes to the database to allow more sophisticated data structures, there is nowhere logical in the previous version to preserve all the new data that the users will have been keenly feeding into the system and using.

The data problem with rollback scripts

Let’s imagine a far-fetched example. You are using the venerable PUBS database to run your wholesale book business. Suddenly, you become aware of the need to deal with audiobooks. The developers obligingly add the extra tables and columns. The new release of the Pubs application and database goes live, and the staff fall upon it and enter all the audiobooks on the market, along with wonderful images, videos, blurb and so on. After two weeks’ work, the system increasingly shows a horrible intermittent bug. Who knows; maybe the developers used TEXT fields and their associated special functions, and a lot of COM components. In any case, you must roll back to the previous release while you locate and eliminate the problem. Rollbacks must preserve existing data, of course. However, there is a lot of data about audiobooks that has no location in the old version of the PUBS database.

What you want to do is to roll back to the previous version, fix the problem quickly, and then release the healed database as a subsequent version, with the extra data in place. You can’t expect the business to use the increasingly fragile current release while you develop, test, and release the new version with all the fixes because the organization has to maintain its existing book wholesaling business.

The seasoned DBA, faced with this problem, will probably BCP out the new data from the tables, and squirrel the resulting BCP files away somewhere in the file system. This DBA will then have to devise a way of replacing the data within the new table design, when the new healed version is released, and before it goes live. The problem here will be keeping all the materials for the release together as part of the new deployment package and remembering the point in the sequence of the database development where this rollback happened in case it must be repeated in a subsequent migration. It also means that movement between versions can no longer be done using just the appropriate rollback and migration scripts.

There may be no possible place in the previous version to put the data that was added in the current ‘broken’ version. If you solve the problem by adding tables in the previous version to store the data in them, then it’s no longer the previous version.

The ‘Limbo’ solution

Limbo refers to the place where those who die in original sin are left, if they have not been assigned to the Hell of the Damned. I refer to a database Limbo as being the place where data is stored if it cannot, in this version of the database, be assigned to one or more relational tables. This was difficult to do before XML and JSON.

With this strategy, every database is provided with a single ‘Limbo’ table that stores as many JSON collections as are necessary. Only a Sysadmin can access these collections. This allows you to store the data and associated metadata for as many tables or views as you need without any metadata changes at all. If a rollback script is faced with removing a table without having any place for the data in the target version of the database, the data is stored in ‘Limbo’. This means that the data stays within the production database: it simply isn’t easily accessible from that version. In our case, the version that we regress to has no understanding of audiobooks yet can store the data in a format that is easily restored when necessary.
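As a minimal sketch (the table and column names here are assumptions of mine, not a prescribed design), such a table might look like this:

-- A single 'Limbo' table, holding one JSON document per saved
-- table-source, with enough metadata to identify it later:
CREATE TABLE dbo.Limbo
  (
  Limbo_ID INT IDENTITY PRIMARY KEY,
  TheTables NVARCHAR(200) NOT NULL, -- the table(s) whose data is held
  TheVersion NVARCHAR(20) NOT NULL, -- the version the data came from
  InsertionDate DATETIME2 NOT NULL DEFAULT SysDateTime(),
  TheData NVARCHAR(MAX) NOT NULL    -- the JSON document itself
  );
DENY SELECT, INSERT, UPDATE, DELETE ON dbo.Limbo TO public; -- Sysadmin only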

The data is all held within the access control of the database, so it presents no security issues. Also, the data is backed up with the rest of the database, so there isn’t a recovery problem. An upgrade to the next version just requires a post-deployment script to re-insert the existing data.

It is possible to provide a JSON schema for each reverted table, but I think this is probably overkill. There is little danger of forgetting the table design because it will be in source control. It will, however, need enough attached information to be certain of selecting the right table-source for each table, at the right version.

Storing data in a limbo table is easy. As an example, we’ll save all the data from the Prices table into just one row in the limbo table:
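Something like this would do it (a sketch, assuming the Limbo table above and my guessed column names for Prices):

-- Save the entire Prices table as a single JSON array in one limbo row:
INSERT INTO dbo.Limbo (TheTables, TheVersion, TheData)
  SELECT 'Prices', '1.1.0',
    (SELECT Price_id, Edition_id, Price, PriceStartDate, PriceEndDate
       FROM dbo.Prices
       FOR JSON AUTO, INCLUDE_NULL_VALUES);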

To restore it into a relational form, we just need a query like this:
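Again as a sketch, using OPENJSON with an explicit schema and the same assumed column names:

-- Turn the saved JSON back into a table-source:
SELECT Price_id, Edition_id, Price, PriceStartDate, PriceEndDate
  FROM dbo.Limbo
    CROSS APPLY OpenJson(TheData)
    WITH
      (
      Price_id INT, Edition_id INT, Price MONEY,
      PriceStartDate DATETIME2, PriceEndDate DATETIME2
      )
  WHERE TheTables = 'Prices' AND TheVersion = '1.1.0';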

Why not just create a table to store the data? This would cause pursed lips for any database developer, because adding a table is a metadata change, so the result would no longer be the same version. This is frowned on, and rightly so, because of the complications involved in keeping track of anomalies of this sort, and because it would cause false positives with auditing software. A lot of IT shops simply don’t allow it, having learned from bitter experience.

Refactoring PUBS for audiobooks and other book formats

So, what changes did we need to make to PUBS so that it accommodates audiobooks? In the existing design, the Titles table represents each publication as a single book, with a single type and a single price.

To resolve various design flaws, and deal with the lack of any way to deal with audiobooks or any other sort of publication, we make the following alterations to the base tables:

  • Enable Titles to have tags for two or more different subjects (e.g. romance and comedy)
  • Split out Editions from Titles
  • Split out Prices so that editions can have several date-related (temporal) prices
  • Rename what was left of the Titles table as Publications

The new table structure comprises the Publications, Editions, Prices, EditionType, TagName and TagTitle tables.
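To make the rest of the article easier to follow, here is a minimal sketch of these tables. The column definitions are my assumptions, pieced together from the names used later on; the real build script is in the GitHub repository.

-- A sketch of the refactored tables (assumed column definitions):
CREATE TABLE dbo.Publications
  (
  Publication_id VARCHAR(6) NOT NULL PRIMARY KEY, -- assigned by the business
  Title VARCHAR(80) NOT NULL,
  Pub_id CHAR(4) NOT NULL,
  Notes VARCHAR(200) NULL,
  PubDate DATETIME NOT NULL
  );
CREATE TABLE dbo.EditionType
  (
  EditionTypeName VARCHAR(20) NOT NULL PRIMARY KEY -- e.g. Book, Audiobook
  );
CREATE TABLE dbo.Editions
  (
  Edition_id INT IDENTITY PRIMARY KEY,
  Publication_id VARCHAR(6) NOT NULL REFERENCES dbo.Publications,
  EditionType VARCHAR(20) NOT NULL REFERENCES dbo.EditionType
  );
CREATE TABLE dbo.Prices
  (
  Price_id INT IDENTITY PRIMARY KEY,
  Edition_id INT NOT NULL REFERENCES dbo.Editions,
  Price MONEY NULL,
  PriceStartDate DATETIME2 NOT NULL,
  PriceEndDate DATETIME2 NULL -- NULL for the price currently in force
  );
CREATE TABLE dbo.TagName
  (
  TagName_id INT IDENTITY PRIMARY KEY,
  Tag VARCHAR(30) NOT NULL UNIQUE
  );
CREATE TABLE dbo.TagTitle
  (
  Publication_id VARCHAR(6) NOT NULL REFERENCES dbo.Publications,
  TagName_id INT NOT NULL REFERENCES dbo.TagName,
  PRIMARY KEY (Publication_id, TagName_id)
  );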

Finally, we also add a view that replicates the old Titles table, with the same name, but referencing the new table structure. This means existing stored procedures will still work:
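A simplified sketch of such a view (the repository version reproduces every column of the original table; here I show just the idea):

CREATE VIEW dbo.Titles
AS
SELECT p.Publication_id AS title_id,
       p.Title AS title,
       -- the old single 'type' column becomes the first tag:
       Coalesce(
         (SELECT TOP 1 tn.Tag
            FROM dbo.TagTitle AS tt
              INNER JOIN dbo.TagName AS tn ON tn.TagName_id = tt.TagName_id
            WHERE tt.Publication_id = p.Publication_id),
         'UNDECIDED') AS type,
       p.Pub_id AS pub_id,
       -- the old single price becomes the latest price of the book edition:
       (SELECT TOP 1 pr.Price
          FROM dbo.Editions AS e
            INNER JOIN dbo.Prices AS pr ON pr.Edition_id = e.Edition_id
          WHERE e.Publication_id = p.Publication_id
            AND e.EditionType = 'Book'
          ORDER BY pr.PriceStartDate DESC) AS price,
       p.Notes AS notes,
       p.PubDate AS pubdate
  FROM dbo.Publications AS p;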

Download the source code

If you want to play along, you can grab the scripts from my public GitHub repository, here: https://github.com/Phil-Factor/PubsRevived. The build script for my current version of the expanded database is in the ‘source’ folder, and the build script for the original version is in the ‘scripts’ folder.

The migration script

The team decide to release this and so make a migration script using SQL Compare. It creates the deployment script, but warns us that we also need to provide a custom deployment script that looks after the data migration for a list of tables. We need to provide pre- and post-deployment scripts and add them to the script folder.

We need both a pre- and post- deployment script because we need to stock each table within the new table structure from the contents of the Titles table. The pre- script is attached to the start of the actual deployment script, when it is created, and the post-script to the end. Because the Titles table gets destroyed as part of the reorganization, we keep the data as a temporary table before the main work of changing the database starts, so we can stock the new table structure after the deployment of the new tables has finished.

More on SQL Compare pre- and post-deployment scripts

See Using Custom Deployment Scripts with SQL Compare or SQL Change Automation for more details on how to work with these scripts, their requirements and limitations.

Here is the pre-deployment script:
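In outline, it need only copy the Titles data to a temporary table that survives until the post-deployment script runs on the same connection (a sketch; the temporary table name is my own invention):

-- Pre-deployment: preserve the current Titles data before the tables change
IF Object_Id('tempdb..#TitlesData') IS NOT NULL DROP TABLE #TitlesData;
SELECT title_id, title, type, pub_id, price, advance, royalty,
       ytd_sales, notes, pubdate
  INTO #TitlesData
  FROM dbo.titles;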

Then, after we’ve executed the SQL Compare deployment script, the attached post-deployment script gets executed.

We also, in this script, stock the tag tables that allow us to apply several tags to a publication:
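A sketch of that post-deployment script, using the table design sketched earlier (the mappings are assumptions that follow the bullet list above):

-- Post-deployment: stock the new tables from the preserved Titles data
INSERT INTO dbo.Publications (Publication_id, Title, Pub_id, Notes, PubDate)
  SELECT title_id, title, pub_id, notes, pubdate FROM #TitlesData;

INSERT INTO dbo.EditionType (EditionTypeName)
  VALUES ('Book'), ('Audiobook'), ('Hardback'), ('Paperback'), ('Map');

-- every existing title is, at this point, a conventional book edition:
INSERT INTO dbo.Editions (Publication_id, EditionType)
  SELECT title_id, 'Book' FROM #TitlesData;

-- the single price of each title becomes its first temporal price:
INSERT INTO dbo.Prices (Edition_id, Price, PriceStartDate)
  SELECT e.Edition_id, t.price, GetDate()
    FROM #TitlesData AS t
      INNER JOIN dbo.Editions AS e ON e.Publication_id = t.title_id;

-- the old 'type' column provides the first tag for each publication:
INSERT INTO dbo.TagName (Tag)
  SELECT DISTINCT RTrim(type) FROM #TitlesData;
INSERT INTO dbo.TagTitle (Publication_id, TagName_id)
  SELECT t.title_id, tn.TagName_id
    FROM #TitlesData AS t
      INNER JOIN dbo.TagName AS tn ON tn.Tag = RTrim(t.type);

DROP TABLE #TitlesData;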

The rollback script

We can revert to the previous, audiobook-less version of the database quite easily because we have SQL Compare to do all the detail work for us. However, it warns us that some tables will be deleted and so, without evasive action, we will lose the following data: EditionType, TagName, TagTitle, Editions, Prices and Publications. It is easy to preserve the data in some of these tables, but others are more of a problem. There will be data that would be lost because there is nowhere that we can store it within the previous version’s design. From the time of the deployment to the time we need to roll back the release, the organisation has been busy with the new version, and they’ve added all sorts of new editions of the existing publications, including eBooks, audiobooks, maps, paperbacks and so on. Each edition has had different prices, and the business has made some price changes. The old prices are kept in order to allow the business to generate copies of each invoice with the price agreed at the time.

All this data would be lost unless we find places to save it. We can easily recreate the data in the Titles table because there is a place for it in the previous version, in the form of the Titles compatibility view, shown earlier. We can use this view in a pre-deployment script to create a temporary table that will contain all the books that are currently published at the time of the rollback. This data will then be restored to the old Titles table, in the post-deployment script.

However, the other tables have data that can’t be placed in the previous version, so we must store it in Limbo. For example, we have an EditionType table, used just to prevent you mistyping the type of an edition (e.g. Book, Audiobook, Map, Hardback, Paperback).
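In the rollback script, the whole table can be tucked away in a single limbo row, just as we did with Prices (a sketch, using the assumed Limbo table from earlier):

INSERT INTO dbo.Limbo (TheTables, TheVersion, TheData)
  SELECT 'EditionType', '1.1.0',
    (SELECT EditionTypeName FROM dbo.EditionType FOR JSON AUTO);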

It will be very easy to restore this type of data in the future whenever you need to do so. Just to demonstrate how we’d do it, this is the code to extract it as a table-source:
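A sketch of that extraction, with the same assumptions:

SELECT EditionTypeName
  FROM dbo.Limbo
    CROSS APPLY OpenJson(TheData)
    WITH (EditionTypeName VARCHAR(20))
  WHERE TheTables = 'EditionType' AND TheVersion = '1.1.0';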

Where a table has foreign keys, we need to hesitate before saving it by itself, and work out whether the key values could be different when, in the future, we come to restore the table.

If we can be certain that none of the primary keys (Edition_ID, in this case) have been reassigned between the saving and restoring of the data, then we can save each table as we did with EditionType and, when we want to restore the data, import the tables in much the same way as we would with BCP, by using SET IDENTITY_INSERT <TableName> ON and inserting the data. Otherwise, we would need to save the data from a group of related tables in a way that preserves the relationships.

For this example, we’ll assume that there is no guarantee that the IDENTITY fields we’re using will be the same when we come to restore the tables. By saving the data from the related tables in hierarchical, rather than tabular, form, using JSON, we can preserve the relationships without relying on the keys. This way, we can deal with all the other tables that SQL Compare warned us about in one JSON array of documents:
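A sketch of such a query, using FOR JSON PATH with correlated subqueries (the column names are the assumed ones from the table sketches above):

INSERT INTO dbo.Limbo (TheTables, TheVersion, TheData)
  SELECT 'Publications, Editions, Prices, Tags', '1.1.0',
    (SELECT p.Publication_id, p.Title, p.Pub_id, p.Notes, p.PubDate,
       Tags =
         (SELECT tn.Tag
            FROM dbo.TagTitle AS tt
              INNER JOIN dbo.TagName AS tn ON tn.TagName_id = tt.TagName_id
            WHERE tt.Publication_id = p.Publication_id
            FOR JSON PATH),
       Editions =
         (SELECT e.Edition_id, e.EditionType,
            Prices =
              (SELECT pr.Price_id, pr.Price,
                      pr.PriceStartDate, pr.PriceEndDate
                 FROM dbo.Prices AS pr
                 WHERE pr.Edition_id = e.Edition_id
                 FOR JSON PATH)
            FROM dbo.Editions AS e
            WHERE e.Publication_id = p.Publication_id
            FOR JSON PATH)
       FROM dbo.Publications AS p
       FOR JSON PATH);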

This will store the data in a more natural JSON format. Unfortunately, the FOR JSON AUTO format gets confused by the SQL used to get the tag list, which is why we had to go for the more complicated correlated subquery. However, the output is straightforward. Here is an example of one publication; I’ve kept the original keys just to provide several alternative ways to restore the data:
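An illustrative fragment (the prices, dates and identity values are invented for the example; the title is one of PUBS’s sample books):

{
  "Publication_id": "BU1032",
  "Title": "The Busy Executive's Database Guide",
  "Pub_id": "1389",
  "PubDate": "1991-06-12T00:00:00",
  "Tags": [{"Tag": "business"}],
  "Editions": [
    {"Edition_id": 1, "EditionType": "Book",
     "Prices": [{"Price_id": 1, "Price": 19.99,
                 "PriceStartDate": "2021-01-01T00:00:00"}]},
    {"Edition_id": 23, "EditionType": "Audiobook",
     "Prices": [{"Price_id": 40, "Price": 24.99,
                 "PriceStartDate": "2021-02-14T00:00:00"}]}
  ]
}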

The next migration script should be usable whether the previous release was reached ‘naturally’ via a build script, or a series of migration scripts, or ‘unnaturally’ via a rollback script. I like to do this by a simple batch that only restores the data from limbo if it is there, and deletes it from limbo, or flags it as redundant, once the data has been merged with the existing data in the database.

Restoring the data from Limbo

Now, in development, you create another migration script that deploys a new, ‘fixed’ audiobook-friendly version of PUBS, restoring the necessary data from limbo in the process.

We can import data into relational tables from a hierarchical JSON data document like the one we’ve just generated. The complication is that we need to add the foreign keys that are appropriate, rather than assume that the old ones will still work. In this example, we can assume that the publication_id of each publication (formerly ‘title’) is assigned by the business and so we can safely use that without fear of duplicating one that is already assigned. However, the edition_id and price_id are IDENTITY fields and we have to assume that they will have been reused by the time we do our data restoration (it’s not necessarily true but let’s keep things tricky just in case).

In this example, we need to iterate through the array of documents (publications) within the JSON document. For each publication, we first save the general facts about the titles into the new publications table, and then add its tags. Then we can add the editions that the publication has, using the publication_id. Then we need to add every price, but in order to add each price for the one or more editions of each title, we need to retrieve the IDENTITY value used for the primary key of each edition and add that as the foreign key to each of the associated prices.

Here’s the code to do all this, which you’d run as a post-deployment script for the migration (it replaces the pre- and post-deployment scripts used for the previous migration):
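What follows is a sketch only, assuming the Limbo table and the JSON shape used above, and that the Editions and Prices tables start empty, so every edition in the document is inserted afresh. It also follows the advice above by doing nothing if there is no limbo data, and by clearing the limbo row once the data is merged:

DECLARE @json NVARCHAR(MAX);
SELECT @json = TheData
  FROM dbo.Limbo
  WHERE TheTables = 'Publications, Editions, Prices, Tags'
    AND TheVersion = '1.1.0';
SELECT @json = IsNull(@json, N'[]'); -- nothing in limbo? Then do nothing

-- the general facts about each publication, skipping any that exist:
INSERT INTO dbo.Publications (Publication_id, Title, Pub_id, Notes, PubDate)
  SELECT Publication_id, Title, Pub_id, Notes, PubDate
    FROM OpenJson(@json)
    WITH (Publication_id VARCHAR(6), Title VARCHAR(80), Pub_id CHAR(4),
          Notes VARCHAR(200), PubDate DATETIME)
    WHERE Publication_id NOT IN
      (SELECT Publication_id FROM dbo.Publications);

-- the tags, adding any new tag names first:
INSERT INTO dbo.TagName (Tag)
  SELECT DISTINCT t.Tag
    FROM OpenJson(@json) WITH (Tags NVARCHAR(MAX) AS JSON) AS p
      CROSS APPLY OpenJson(p.Tags) WITH (Tag VARCHAR(30)) AS t
    WHERE t.Tag NOT IN (SELECT Tag FROM dbo.TagName);
INSERT INTO dbo.TagTitle (Publication_id, TagName_id)
  SELECT p.Publication_id, tn.TagName_id
    FROM OpenJson(@json)
      WITH (Publication_id VARCHAR(6), Tags NVARCHAR(MAX) AS JSON) AS p
      CROSS APPLY OpenJson(p.Tags) WITH (Tag VARCHAR(30)) AS t
      INNER JOIN dbo.TagName AS tn ON tn.Tag = t.Tag;

-- each edition in turn, capturing the new IDENTITY value so that it
-- can be used as the foreign key of that edition's prices:
DECLARE @Publication_id VARCHAR(6), @EditionType VARCHAR(20),
        @Prices NVARCHAR(MAX), @NewEdition_id INT;
DECLARE EditionCursor CURSOR LOCAL FAST_FORWARD FOR
  SELECT p.Publication_id, e.EditionType, e.Prices
    FROM OpenJson(@json)
      WITH (Publication_id VARCHAR(6), Editions NVARCHAR(MAX) AS JSON) AS p
      CROSS APPLY OpenJson(p.Editions)
      WITH (EditionType VARCHAR(20), Prices NVARCHAR(MAX) AS JSON) AS e;
OPEN EditionCursor;
FETCH NEXT FROM EditionCursor
  INTO @Publication_id, @EditionType, @Prices;
WHILE @@Fetch_Status = 0
  BEGIN
  INSERT INTO dbo.Editions (Publication_id, EditionType)
    VALUES (@Publication_id, @EditionType);
  SELECT @NewEdition_id = Scope_Identity();
  INSERT INTO dbo.Prices (Edition_id, Price, PriceStartDate, PriceEndDate)
    SELECT @NewEdition_id, Price, PriceStartDate, PriceEndDate
      FROM OpenJson(@Prices)
      WITH (Price MONEY, PriceStartDate DATETIME2, PriceEndDate DATETIME2);
  FETCH NEXT FROM EditionCursor
    INTO @Publication_id, @EditionType, @Prices;
  END;
CLOSE EditionCursor;
DEALLOCATE EditionCursor;

-- finally, remove (or flag as redundant) the merged limbo row:
DELETE FROM dbo.Limbo
  WHERE TheTables = 'Publications, Editions, Prices, Tags'
    AND TheVersion = '1.1.0';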

Conclusion

The use of a table that doesn’t change from version to version seems, from a distance, an unnatural and rather dubious way of storing data. However, we’re not referencing the data, just preserving it, because the version to which the database is migrating has no other way of accommodating it.

If we are obliged to preserve data within the database, and there is no relational, table-level means of storing it, and it is done in line with all the disciplines in place within the development, such as insisting on atomicity in the columns of all tables, then it is OK. A delayed rollback of a database to a previous version is one of the few cases where storing document data such as JSON or XML might be appropriate. It is, though, another thing to remember, especially if you must rebuild a database and restore the data.

Rollback scripts are always an irritation because they take some time to write, and even more to test. They are even more irritating if it becomes essential to preserve all the data, even if there is no logical place for it. Mercifully, they are rarely required, but when they are used, they must work. You could say the same about fire extinguishers, seat belts and parachutes. I create rollback scripts conscientiously nowadays because things can go wrong: I’ve seen the pain when a database deployment went bad and there was a loss of data because there was no tested rollback script.
