Using Migration Scripts in Database Deployments

Comments 0

Share to social media

DevOps, Continuous Delivery & Database Lifecycle Management
Automated Deployment

What is a Migration Script?

Whereas a build script creates a database, a migration script, or ‘change’ script, alters a database. It is called a migration script because it changes all or part of a database from one version to another. It ‘migrates’ it between versions. This alteration can be as simple as adding or removing a column to a table, or a complex refactoring task such as splitting tables or changing column properties in a way that could affect the data it stores.

For every likely migration path between database versions, we need to store in version control the migration scripts that describe precisely those steps required to perform the change and, if necessary, moving data around and transforming it in the process

Migration scripts are of two types:

  • Automated Migration Script – this is a script generated by a synchronization tool such as SQL Compare and normally checked, and perhaps altered, by a developer or DBA.
  • Manual Migration Script – this is a script written by a database programmer. Sometimes it is impossible for a tool to create a migration script that will preserve the data in the correct place. Although it knows what the ‘before’ and ‘after’ versions of the database look like, there is sometimes ambiguity regarding the correct transition path. The classic example is that of renaming a table from A to B. We know it’s the same table with a different name, but a tool would likely perform the transition by dropping A, and any data it contains, and creating B. Of course, there are other, more subtle examples where there is not enough evidence in the source and target databases for the tool to be certain of where the data should go without human intervention.

Migration scripts can be forward or “up” migrations that go to a newer version of a database, or backward, or “down” migrations to fall back to a previous version of a database. Any manual migration script is generally used to move between consecutive versions of a database.

A migration script must be idempotent, in that it will have no additional effect if run more than once on the same database. It isn’t wise to change a script once it has been run successfully, and the same set of migration scripts should be run in every server environment where the deployment is taking place.

With this discipline in place, we have a much greater chance of error-free deployments. To bring a database up-to-date, for example, all that is necessary is to run in the correct order the “up” migration scripts that will transform the target database from its current version to the latest version in source control. A ‘build script’ is then merely a special case of a migration script that migrates the database from nowhere to the first version.

It is a good precaution to write integration tests to check that the migration scripts have worked, and it is also essential to be able to roll back to a known version if an error occurs during the build process.

Why some refactoring tasks can’t be automated

Although a synchronization tool can describe most migrations, there are some database changes that can’t be inferred by comparing the beginning and end states of the database, because the tool has insufficient information to know how to migrate the existing data.

Let’s consider a few examples.

Table renaming

A table named People in one database might be named Customers in another. Although they’re the same table and contain the same data, it is impossible to tell whether the two are the same table with any certainty. In assessing the differences between the databases, any automated process will interpret them as two different tables. The People table only exists in the target database, and the Customers table only exists in the source database, and an automatically-generated comparison script would drop the People table and create a new, empty table named Customers. When the script is run, the data in the People table is lost. To get around a great deal of work, it is better to replace this with a manual migration script that does the rename using sp_rename.

Splitting or merging columns

When we split a column, an automated migration script would likely create two or more new columns and drop the original column. Likewise, a column merge would create a new column and drop the original columns.

In either case, any data in the dropped column(s) will be lost.

Instead, we can create a manual migration script to create the new column(s), run custom SQL to move the data, and then drop the original column(s).

Changing the data type or size of a column

When we change a column’s data type (for example, from INT to SMALLINT), data may be truncated during deployment if the new data type doesn’t accommodate some of the rows. Similarly, changes to the size of some columns can result in truncation; for example, VARCHAR(50) to VARCHAR(20). We can create a manual migration script to modify rows appropriately that would otherwise be truncated.

Dealing with large data movements

Regardless of how we generate the migration script, we may not be able to run it “as is” on a large database, especially one that is in production use.

An ALTER statement on a heavily used OLTP table that is being actively updated can have serious repercussions. Any operation, for example, that acquires a schema-modification lock will prevent user access to the table or its metadata during the modification. Alterations to a table are fully logged and so I/O-intensive. If they require table locks, then all other active users of the table will be forced to wait.

Operations such as dropping a column, or adding a NOT NULL column with a default value, can take a long time to complete and will generate many log records.

Just as we might break into smaller batches INSERT, UPDATE, or DELETE statements against large active tables, that affect many rows, so large-scale data migrations in such circumstances are best done incrementally in discrete batches. However this doesn’t apply to metadata. An individual column insertion is a single operation and can’t be broken down and so, to avoid taking the database offline, would require a series of steps involving the creation of a new table, population with data,  and a subsequent rename, or ideally a synonym change. This process is usually done with indirection using changes of Synonyms because of their late-binding, or with partition-switching (Enterprise Edition only).

Writing Migration Scripts

Every migration script needs a way to check the preconditions in the environment in which it will run to make sure the script has the right and that it is idempotent, meaning that running it repeatedly will have no further effect. For this purpose, we include in every script a guard clause that will, for example, check to see whether the migration task has already been accomplished before continuing to run it.

We might also write a clause that checks whether the migration script is valid for the version of the target database. For example, if we know that a migration script is intended for version 1.5, we can have the script start by interrogating the current version number of the target database, and aborting if it isn’t “1.5”.

Unfortunately, there is no standard way of attaching a version to a SQL Server Database. The obvious place to store the current version is in the extended properties of the database, but these are not implemented in Windows Azure SQL Database (formerly SQL Azure, SQL Server Data Services, and later SQL Services). Therefore, it is probably safest to store them in a special-purpose table-valued function or view in the target database if your database has to be portable to Azure SQL.

The following sections review some typical IF EXISTS-style guard clauses.

Guard Clauses

Every migration script should include a guard clause that will stop the migration script making changes if it doesn’t need to be run. It is dangerous to assume that a script will only be run in the right preconditions. If you don’t include a guard clause, the migration script might fail or create unexpected changes in the database.

Guard clauses are standard practice for SQL Server scripting. When, for example, developing a stored procedure, one can use CREATE PROCEDURE if the procedure doesn’t exist, or the ALTER PROCEDURE if it does. The former will fail with an error if it already exists, and the latter will fail if it doesn’t. To prevent errors, a guard clause is generally inserted to delete the procedure first if it exists. After that, it is safe to use CREATE PROCEDURE. Since SQL Server 2008, it has been possible to do it a more elegant way that preserves existing permissions on the procedure, having the guard clause run SET NOEXEC ON if the procedure already exists, which then parses, but does not execute the CREATE statement that creates a blank procedure. This avoids deleting an object in a running system, and preserves all the object’s existing attributes when making a change to it. It also gets round the problem of doing a conditional ‘CREATE’ when it must be at the start of a batch. see Listing 1 in ‘Automating SQL Server Database Deployments: Scripting Details’.

In other circumstances, a guard clause can be used to abort a script if it determines that it has already been run, or to perform different logic depending on the database context in which the script is being run. In the following examples, print statements are used to show where the SQL Statements would be.

To check that a table exists

To check that a function exists

To check that a procedure exists

To check that a particular column exists in a table exists

To check whether a column has any sort of constraint

To check that a column has a check constraint

To check that a column has a foreign key constraint

To check that a column participates in a primary key

Examples of migration scripts

The following section provides typical examples of manual migration scripts for various common database refactoring tasks.

Updating a Stored Procedure

This technique preserves permissions on a routine (procedure or function) whilst doing the update. (SQL 2008 or above only)

Renaming a table

This migration renames the Person table to Customer. The script includes a guard clause to check that the table Person exists in the database. If the table doesn’t exist, no changes are made.

Updating a column definition

This migration updates the column definition in the table Widgets with a default value for existing rows before adding a NOT NULL constraint. It includes a guard clause to check that the Widgets table contains a Description column. If the column doesn’t exist, no changes are made.

Splitting a column

In this example, The Customer table has a column Address with the postcode embedded, which means that reports that determine regions based on postcode are difficult.

2002-1-bca8930d-567b-473f-89aa-e397b9802

Figure 1: An Address column with embedded postcodes

The manual migration script splits Address into the columns StreetAddress and PostCode, and updates existing rows. Fortunately, the extraction of the postcode is relatively easy since it is separated from the address by a semicolon character ‘;’

The script includes a guard clause to check that the Customer table contains an Address column. If the column doesn’t exist, then no changes are made. If the column does exist, the guard clause checks that the StreetAddress and PostCode columns also exist. If they don’t exist, the script creates the columns before updating the rows.

This would have the following affect when run:

2002-1-c60b496d-38ea-468a-8ef0-7387d5867

Figure 2: The refactored Customer table

It it was inadvertently run again, the output would be …

Moving Data

In this example, a table, PersonData, has two email columns in it, to try to accommodate the fact that some customers have two email addresses. It needs to be normalized, of course.

2002-1-2df7a4f0-17e5-4e64-99f1-9a41a1fe0

Figure 3: The PersonData table has two email columns

The migration script moves data from the table PersonData to the tables Person and Email. It includes a guard clause to check that the table PersonData exists in the database. If the table doesn’t exist, no changes are made. If the table exists, the guard clause checks that the Person and Email tables exist. If they don’t exist, the script creates the tables before updating the rows.

Here is the resulting table filled with the email addresses:

2002-1-c6f2457f-cfdc-4d18-b7c1-31cc722ee

Figure 4: The refactored PersonData table

Table refactoring

Imagine that we have inherited a database that hasn’t been properly normalized and needs refactoring. Data for customers is held in table named PersonData, which also contains all their contact details.

To normalize the table, we’ll break it into different tables in a new schema named Contacts:

We now want to migrate Email1 and Email2 from PersonData into Contacts.Email, creating two separate rows for a person having both Email1 and Email2. Similarly, we migrate the phone and address details.

After running the migration script, we can drop the original PersonData table as a separate commit.

Conclusions

Although much of the chore of synchronizing databases for a deployment can be done using a tool such as SQL Compare, there are times when a step has to be done by hand, either because it is a large table, or it isn’t unambiguously obvious how the existing data should be placed in the target revision of the database, in such a way as to be sure of preserving it.

For the automated deployment of databases, hand-crafted SQL migration scripts will occasionally have a place, and will be stored in version control to run before any automated process for deploying changes to the database. They should be written so that, if it is run at the wrong time, or run repeatedly, it will have no effect. Guard Clauses are a good way of ensuring that Murphy’s law will have no consequences.

DevOps, Continuous Delivery & Database Lifecycle Management
Go to the Simple Talk library to find more articles, or visit www.red-gate.com/solutions for more information on the benefits of extending DevOps practices to SQL Server databases.

Load comments

About the author

Jonathan Hickford

See Profile

Jonathan Hickford is a product manager at Red Gate, working on creating tools to support automation of the database delivery process. Before he joined Red Gate he worked running the web technology team of an international broadcaster, who provided new features and the operational management for the consumer facing TV network websites.

Jonathan Hickford's contributions