Non-Breaking Online Database Deployments

For an experienced database developer, the idea of making non-breaking database deployments while the database is still online commands respect but holds little fear. If your test procedures are good, you attend to the detail, and the application interfaces are properly managed, it can be achieved. Ed Elliott explains some of the issues to tackle.

DATABASE LIFECYCLE MANAGEMENT PATTERNS & PRACTICES LIBRARY

Stage 4: Automated Deployment

Databases often present a set of challenging requirements when it comes to releases. Business requirements mean that we need to think about how the changes we make can be deployed without taking an application offline, and how we can support the different applications that may simultaneously use the same database.

Whether we write manual upgrade scripts or have a fully automated deployment pipeline, we need to understand these deployment issues, so that we know when changes should and should not be made online, and when we need to ensure backwards compatibility with running applications.

Although such changes require rigorous testing and precautionary code, it is perfectly practical to apply them to a live database system without “breaking” its dependent applications. There are many reasons for deploying updates while the application is in use and, by ensuring that you always use non-breaking deployments where you can, you gain the flexibility to deploy on a more regular basis.

This article offers guidance on when and how you should upgrade your database using non-breaking changes, whilst keeping the database, and therefore the application, online.

What are non-breaking database deployments?

Although the prime struggle is to deploy database changes without breaking the database, this article moves on to the next level: deploying database changes without breaking or impacting the dependent applications and services, such as ETL and downstream analysis.

A non-breaking database change is an upgrade to the schema of the database (the tables, stored procedures and functions) or to the actual data, made while keeping the existing applications working. Those applications may be multiple versions of a single application or, in some cases, several different applications.

The traditional approach is to deploy the database and application changes together. We stop the application, deploy the database changes, deploy the associated application changes and then bring the application online. This works well when the application has regular maintenance windows where we can take it offline, where we perform scheduled deployments during these windows, and where every application that uses the database can be upgraded at once.

However, it does not work if we have an application that must be available 24 x 7, or a database that requires ad-hoc, irregular deployments. In such cases, we need a way to make only the database change, without breaking the applications, which implies that we can make the database change without taking the application offline. The end goal of non-breaking database deployments is to be able to treat the database as a self-contained component, or public API, that we can upgrade without needing to upgrade the applications that use it, or at least not immediately.

Online versus offline changes

There is a certain class of changes that are low risk and can be made with the database online, such as adding a new table or procedure. The act of adding schema items will cause a schema lock for the duration of the change, but this is perfectly fine in the majority of cases.

However, for any database change that affects how an application queries the data, or affects the data it receives back, we either need to take the application offline, or find a way to guarantee that the database change on its own will be non-breaking.

Certain schema changes, such as altering a column name or data type, should be relatively easy to make online. For more complex changes, such as table splits, it may still be possible but it will be more challenging. Ultimately, if a database change is likely to take a long time, due to the size of the tables, or to be constantly blocked due to users holding locks on the table, then there may well be no choice but to do it offline.
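
For example, renaming a column in SQL Server is a quick, metadata-only operation (the table and column names here are purely illustrative):

EXEC sp_rename 'dbo.Person.PostCode', 'PostalCode', 'COLUMN';

The rename itself completes almost instantly, but any view, procedure or application code that references the old name must be updated at the same time, which is exactly the kind of dependency the rest of this article deals with.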

Sometimes, it is tempting, in the spirit of optimism, to make an online change simply because you can’t see what could go wrong, or because it “normally works fine”. However, you should really consider whether you have truly tested the change in staging, under the same sort of load you will see in production at the time of the deployment, and whether you truly understand the effects of locking and how the modifications will affect the performance of the application while the change is going on. It is also important to understand what will need to happen if you roll back the change: will a five-second deployment turn into a one-hour rollback with the database offline? If you do not know the answer, or the rollback time is unacceptable, then you should probably make the change offline.

If applications access the underlying tables directly, rather than going via stored procedures, you will find it much more challenging to make online changes, especially for large databases.

The practicalities of non-breaking database changes

Let’s say you have a 100-million row Person table that looks as shown in Listing 1.

Listing 1
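(The original listing is not reproduced here, so this is a plausible reconstruction; the exact column definitions are assumed, with the address details embedded in the Person table.)

CREATE TABLE dbo.Person
(
    PersonId     INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Person PRIMARY KEY,
    FirstName    NVARCHAR(50)  NOT NULL,
    LastName     NVARCHAR(50)  NOT NULL,
    AddressLine1 NVARCHAR(100) NULL,
    AddressLine2 NVARCHAR(100) NULL,
    City         NVARCHAR(50)  NULL,
    PostCode     NVARCHAR(10)  NULL
);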

You need to normalize the design, splitting the address information into a separate table, as shown in Listing 2.

Listing 2
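(Again, the exact definitions are assumed; the address columns move to their own table, keyed back to Person.)

CREATE TABLE dbo.Person
(
    PersonId  INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Person PRIMARY KEY,
    FirstName NVARCHAR(50) NOT NULL,
    LastName  NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.Address
(
    AddressId    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Address PRIMARY KEY,
    PersonId     INT NOT NULL CONSTRAINT FK_Address_Person
                     REFERENCES dbo.Person (PersonId),
    AddressLine1 NVARCHAR(100) NULL,
    AddressLine2 NVARCHAR(100) NULL,
    City         NVARCHAR(50)  NULL,
    PostCode     NVARCHAR(10)  NULL
);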

How do we perform this table split, and data migration, without either breaking dependent applications, or taking them offline? It depends on whether your applications are tightly or loosely coupled to the underlying database tables.

Case 1: Tight Coupling of Application and Database

If your applications have direct access to the underlying tables, then performing this table split online will be challenging. You need to use devices such as views and INSTEAD OF triggers to support the old and the new schema concurrently. You could (a sketch in T-SQL follows the list):

  1. Create the new table definitions
  2. Rename the existing table
  3. Create a view that performs a union between the old, now renamed, table and the new table
  4. Create an INSTEAD OF trigger on the view for inserts and updates, which writes into the new table structure
  5. Start inserting the rows into the new format in small batches
  6. When all the data is in the new structure, change the view to remove the old table, and then drop the old table
  7. Leave the trigger in place until you have changed your applications to deal with the new table format
  8. When all the applications have been upgraded, remove the view and the trigger
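
Here is a minimal sketch of steps 1 to 4, assuming the tables from the listings above. The object names are illustrative, and error handling, permissions and full identity management are omitted:

-- Step 1: the new tables; named Person_New here so that the view below
-- can take over the original dbo.Person name (dbo.Address as in Listing 2)
CREATE TABLE dbo.Person_New
(
    PersonId  INT NOT NULL CONSTRAINT PK_Person_New PRIMARY KEY,
    FirstName NVARCHAR(50) NOT NULL,
    LastName  NVARCHAR(50) NOT NULL
);
GO

-- Step 2: move the existing table out of the way (a metadata-only change).
-- In production, do steps 2 and 3 in a single transaction, so there is no
-- window in which dbo.Person does not exist
EXEC sp_rename 'dbo.Person', 'Person_Old';
GO

-- Step 3: a view with the original name and shape, so existing queries
-- keep working; rows not yet migrated still come from the old table
CREATE VIEW dbo.Person
AS
SELECT n.PersonId, n.FirstName, n.LastName,
       a.AddressLine1, a.AddressLine2, a.City, a.PostCode
FROM dbo.Person_New n
LEFT JOIN dbo.Address a ON a.PersonId = n.PersonId
UNION ALL
SELECT o.PersonId, o.FirstName, o.LastName,
       o.AddressLine1, o.AddressLine2, o.City, o.PostCode
FROM dbo.Person_Old o
WHERE NOT EXISTS (SELECT * FROM dbo.Person_New n
                  WHERE n.PersonId = o.PersonId);
GO

-- Step 4: route new inserts into the new structure. This sketch assumes
-- the caller supplies PersonId; an IDENTITY column would need a SEQUENCE.
-- An INSTEAD OF UPDATE trigger (not shown) would handle updates similarly
CREATE TRIGGER dbo.Person_InsteadOfInsert ON dbo.Person
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.Person_New (PersonId, FirstName, LastName)
    SELECT PersonId, FirstName, LastName FROM inserted;

    INSERT INTO dbo.Address (PersonId, AddressLine1, AddressLine2, City, PostCode)
    SELECT PersonId, AddressLine1, AddressLine2, City, PostCode
    FROM inserted
    WHERE AddressLine1 IS NOT NULL;
END;

Once all the applications target the new structure, the view and trigger can be dropped and Person_New renamed to Person.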

This would work most of the time, but you really need to bear in mind how much data you will be migrating and how the migration will affect the performance of the database. This will vary considerably on a case-by-case basis.
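
A sketch of the batched backfill (step 5), again with illustrative names; small batches keep each transaction short, so locks and transaction log growth stay manageable:

DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- take the next slice of rows that have not yet been migrated
    SELECT TOP (1000) o.PersonId, o.FirstName, o.LastName,
           o.AddressLine1, o.AddressLine2, o.City, o.PostCode
    INTO #batch
    FROM dbo.Person_Old o
    WHERE NOT EXISTS (SELECT * FROM dbo.Person_New n
                      WHERE n.PersonId = o.PersonId)
    ORDER BY o.PersonId;

    SET @rows = @@ROWCOUNT;

    INSERT INTO dbo.Person_New (PersonId, FirstName, LastName)
    SELECT PersonId, FirstName, LastName FROM #batch;

    INSERT INTO dbo.Address (PersonId, AddressLine1, AddressLine2, City, PostCode)
    SELECT PersonId, AddressLine1, AddressLine2, City, PostCode
    FROM #batch
    WHERE AddressLine1 IS NOT NULL;

    DROP TABLE #batch;
    COMMIT TRANSACTION;

    -- optionally add a WAITFOR DELAY here to yield to other workloads
END;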

Case 2: Loose Coupling of Application and Database

The best way to exploit the possibility of making non-breaking database changes is to follow the best practice of defining an application interface, consisting of stored procedures and functions, within the database schema. It’s a controversial topic, one that attracts much heated debate, and teams have various reasons for not using such an interface, including flaky support for stored procedures in their ORM tool (although this argument is now outdated; most modern ORMs offer full support for calling stored procedures).

From the perspective of database deployments, I think the potential benefits of such a layer are actually pretty exciting. Stored procedures and functions provide a layer of abstraction from the underlying database schema, which can allow us to make changes to that schema without affecting the application.

If we take the previous example of splitting a table, then with a well-designed stored procedure interface, rather than needing to introduce views, triggers and so on to support both versions of the schema, we simply change the stored procedure to return the data we want. Of course, sometimes a schema change will mean that you need to add or remove parameters from stored procedures, and these changes can easily break an application. For example, let’s say we have a stored procedure called GetEmployeeById that takes a single parameter, PersonId, and returns the name and age of the employee (this is a contrived example, purely to illustrate the point, and not a best practice of any sort!). In the next version of the application, you want to identify the company as well, and so pass in the CompanyId alongside the PersonId, so that the procedure returns the employee name and company name, but no longer the age.

If we simply updated the stored procedure with the extra parameter, this would be a breaking change. Firstly, the stored procedure would not execute, as the application is not passing the CompanyId. If we can provide a sensible default value for this parameter, for example if we know the application is only interested in employees from a single company, then we can solve this problem quite easily. The second problem, however, is that a column has been removed, so the application would be expecting a value in the resultset that is no longer available. Even worse, the application might be relying on the columns being returned in a precise order, and could end up working with incorrect data, which could lead to serious issues.
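
To make this concrete, here is a sketch of the contrived procedure described above; the Employee and Company tables and their columns are assumed purely for illustration:

-- The original procedure: one parameter, returns name and age
CREATE PROCEDURE dbo.GetEmployeeById
    @PersonId INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT FirstName, LastName, Age
    FROM dbo.Employee
    WHERE PersonId = @PersonId;
END;
GO

-- Defaulting the new parameter keeps old callers executing...
ALTER PROCEDURE dbo.GetEmployeeById
    @PersonId  INT,
    @CompanyId INT = 1  -- assumes a sensible single-company default exists
AS
BEGIN
    SET NOCOUNT ON;
    -- ...but dropping Age and adding CompanyName still breaks any
    -- application that expects the old resultset shape or column order
    SELECT e.FirstName, e.LastName, c.CompanyName
    FROM dbo.Employee e
    JOIN dbo.Company c ON c.CompanyId = @CompanyId
    WHERE e.PersonId = @PersonId;
END;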

The easiest way to deal with this is to give the new version of the stored procedure a different name, or place it in a different schema, and maintain both procedures until such time as it is possible to update the application to accept the new parameter. The process for this would be (a sketch follows the list):

  • Create a new procedure, GetEmployeeById_ReleaseIdentifier (or some other name)
  • Change the existing GetEmployeeById to either call the new procedure or to return the correct data – ideally the logic would be in one place
  • Migrate all applications to the new procedure
  • Remove the old procedure
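
A sketch of that process, with an illustrative release suffix. Because the two versions return different resultset shapes here, the old procedure keeps its own SELECT rather than delegating:

-- The new interface, versioned by name; dbo.GetEmployeeById is left
-- untouched (still returning name and age) until every caller has
-- migrated, at which point it can be dropped
CREATE PROCEDURE dbo.GetEmployeeById_Release2
    @PersonId  INT,
    @CompanyId INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT e.FirstName, e.LastName, c.CompanyName
    FROM dbo.Employee e
    JOIN dbo.Company c ON c.CompanyId = @CompanyId
    WHERE e.PersonId = @PersonId;
END;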

There are different variations on this; instead of renaming the procedures, you could create a new schema for each release.

If you are upgrading just a handful of procedures, then it might be worthwhile simply to rename them, but the benefit of using different schemas is that you can easily see which set of procedures belongs to each release.
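
A sketch of the schema-based variation (the release schema name is illustrative):

CREATE SCHEMA Release2;
GO

-- Identical logic to the renamed version, but grouped so that everything
-- belonging to this release can be found (and eventually dropped) together
CREATE PROCEDURE Release2.GetEmployeeById
    @PersonId  INT,
    @CompanyId INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT e.FirstName, e.LastName, c.CompanyName
    FROM dbo.Employee e
    JOIN dbo.Company c ON c.CompanyId = @CompanyId
    WHERE e.PersonId = @PersonId;
END;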

Tests for non-breaking changes

Now that you understand how to physically make non-breaking changes, be aware that manually testing applications and changes is difficult, time-consuming and error-prone. To really make this work, what you need is a suite of automated tests that covers the database and each version of each application that uses it.

Without a complete suite of tests, you will find that you need a full manual regression test between each change, which quickly limits your ability to make lots of small changes quickly.

What’s needed is a set of unit tests (I would highly recommend tSQLt) covering the original version of the stored procedures. When you create new stored procedures that mimic the old interface, the same unit tests should still pass against the new code. Next, you create a new set of tests for the new interfaces, and both sets should pass at the same time. Finally, over time, you remove the old code and the old tests, and the new tests should still all pass.
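
As a flavour of what such a test looks like, here is a minimal tSQLt sketch against the contrived procedure from earlier (the test class and test names are illustrative):

EXEC tSQLt.NewTestClass 'EmployeeProcs';
GO

CREATE PROCEDURE EmployeeProcs.[test GetEmployeeById returns name and age]
AS
BEGIN
    -- isolate the procedure from real data
    EXEC tSQLt.FakeTable 'dbo.Employee';
    INSERT INTO dbo.Employee (PersonId, FirstName, LastName, Age)
    VALUES (1, 'Jane', 'Doe', 42);

    CREATE TABLE #Expected (FirstName NVARCHAR(50), LastName NVARCHAR(50), Age INT);
    INSERT INTO #Expected VALUES ('Jane', 'Doe', 42);

    CREATE TABLE #Actual (FirstName NVARCHAR(50), LastName NVARCHAR(50), Age INT);
    INSERT INTO #Actual
    EXEC dbo.GetEmployeeById @PersonId = 1;

    EXEC tSQLt.AssertEqualsTable '#Expected', '#Actual';
END;
GO

EXEC tSQLt.Run 'EmployeeProcs';

Because the test exercises the procedure's contract (parameters in, resultset shape out) rather than its implementation, it keeps passing while you rework the schema underneath, and fails loudly the moment the interface breaks.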

Only as good as the process

Non-breaking changes are great, in that they give you the opportunity to keep development going and keep increasing the business value of an application, rather than letting it become a burden with long release cycles.

However, you really need a process to track and remove the old code and the temporary fixes you put in place, so you should be using some sort of work tracker – be it post-it notes, a backlog, Microsoft Project or, my personal favorite, a Kanban board. When you add some temporary code, create a future piece of work to remove it again, and make sure you do so as soon as you can. If you let the changes pile up without doing anything about them, you are heading for a really bad time!

This article is part of our Database Lifecycle Management patterns & practices.

Return to the library to find more articles on DLM, or visit red-gate.com/dlm for more information about SQL Server products for database lifecycle management.