How to build multiple database versions from the same source: pre-deploy migration scripts
The final part of Alex Yates's three-part series tackles the complicated issue of automating deployments when the same table might have a different structure, in different production versions of the database.
This is the third post in a three-part series that explains how to maintain a single declarative source of truth in version control for a SQL Server database, which can be deployed to multiple environments, despite the fact that some environments have known customizations.
In part 1, I explained how to manage objects that exist in some environments but not all of them using filters. In part 2, I explained how post-deploy scripts can handle any objects, other than tables, which exist in multiple environments but are different in each. In this final part, I’ll discuss the complicated issue of maintaining a specific table, or set of tables, with different structures in different production versions of the database, using a combination of a migrations tool and a pre-deploy script. We’ll also need to take another look at the filters we set up in part 1.
I’ll conclude by discussing, briefly, how to handle more than two customized environments and then sum up the key limitations associated with the overall approach, plus some of the alternative solutions we considered.
Setup
To help demonstrate the concepts, I’ll continue to use my sample databases, Twilight Sparkle and Fluttershy, which represent two customized production instances of the same database (ponies). We’ll also continue to use the ‘master development database’, Ponies_dev, and we’ll need modified versions of some of the SQL Compare filters that we created in part 1.
You can create the Twilight Sparkle and Fluttershy databases by executing this script. You can create the Ponies_dev database, source control repository and filter files by following the instructions in part 1, and you can add the necessary post-deploy script by following part 2.
You’ll need a SQL Server instance and a copy of the Redgate SQL Toolbelt.
Dealing with multiple production versions of the same table
Deploying from a single source to a range of environments, each with differently customized tables, is a nasty problem, and there is no simple solution. I will repeat my disclaimer from Part 1: it is much better, wherever possible, to solve this problem by standardizing table structures in all environments, before creating a ‘canonical source’ database in source control.
If, however, it is not possible to standardize your table structures, the approach I describe will at least allow you to keep the tables for each customized environment in source control, and offer a viable solution for deployment.
The migrations approach
We’re faced with the dilemma of deploying tables that need to be maintained in source control, but which are expected to exist with a customized schema in different environments. For example, perhaps a table in one environment has different columns, or data types that cannot easily be made to match.
In such situations, the strategies we employed in parts 1 and 2 won’t work. Filters won’t work if the table exists in both databases but is different. Post-deploy drop and create scripts don’t work on tables because we’ll lose data.
The most effective way to manage our tables, where differences exist, is with a migrations tool, the idea being that we have a custom migrations script, per environment, which deploys changes to the custom tables only, before we deploy any changes to the other database objects.
Migrations tools are often a cheap and effective way to automate database deployments, and they are fairly easy to understand. Typically, they allow a user to add a list of scripts to source control and deployments are handled by running the scripts in order.
Usually these frameworks will add a DeploymentHistory (or similar) table to a target database to keep track of which scripts have already been run, so that a simple query can establish if there are any new scripts that need to be deployed.
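For illustration, here is a minimal sketch of the kind of history table such a framework might maintain. The table and column names are hypothetical, not taken from any particular tool:

```sql
-- Hypothetical example: the kind of history table a migrations framework
-- might create in the target database (names are illustrative only)
CREATE TABLE dbo.DeploymentHistory
(
    ScriptName NVARCHAR(255) NOT NULL PRIMARY KEY,
    AppliedAt  DATETIME2(0)  NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

-- At deploy time, the framework lists what has already been run and then
-- executes, in order, any scripts in source control not recorded here
SELECT ScriptName
FROM dbo.DeploymentHistory
ORDER BY ScriptName;
```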
While these frameworks are simple and effective, without very careful management they don’t scale well with large projects that have many contributors and frequent updates, because script ordering and conflicts become difficult to manage. They are also not an effective way to manage programmable objects such as stored procedures, views and functions.
This is why many people prefer tools, such as those provided by Redgate, which store the current state of each object in the VCS and only use migrations scripts as required for tricky table migrations. It’s also why I’ve built this solution primarily around Redgate SQL Source Control (see parts 1 and 2). Trying to build this solution using entirely migrations tools would have been very challenging.
I’ve discussed the relative pros and cons of migrations and declarative, or state-based, approaches in this blog post.
That said, for our example, a lightweight migrations tool will come in very handy. We will only be using it to manage a small subset of our tables, so it’s unlikely we’ll be making regular changes or suffering these issues.
Choosing a migrations tool
In many ways, the most obvious choice for a migrations tool would have been to use the migrations feature built into SQL Source Control, since this is a tool I already use in this solution. It allows users to provide a specific upgrade script to override the default-generated code, in cases where the software struggles to create reliable upgrade scripts (column splits, new NOT NULL columns, and so on).
Had we been able to solve this problem the ‘right way’, by standardizing tables across all environments first, then I would almost certainly have done so. However, this is a ‘non-standard’ solution. SQL Source Control expects the database to be in a specific state after running the migration. Given that we expect our different environments to be in a customized state after the migration, the migrations feature of SQL Source Control will not solve our problem.
A more advanced migrations-first hybrid tool from Redgate called ReadyRoll provides a lot of features that would be very useful, such as auto-generation of upgrade scripts and simpler integration with deployment tools like Octopus Deploy.
However, it offers other features (e.g. drift checks, deployment reports) that simply won’t work in our non-standard solution. It also introduces a dependency on Visual Studio as our IDE for particular objects and MSBuild to run builds/deployments. Finally, there is not an easy way to permanently filter out all the standard (non-customized) objects from the project.
For non-standard requirements, such as exist for this problem, I favor simplicity and flexibility above all else. For these reasons, in this example, I chose to use a much simpler (and more limited) open source tool called Flyway.
One of the other open source migrations tools, such as DbUp or DBDeploy, could easily have worked too, and if you use one of those you can probably adopt the same principles I describe here with that tool.
How does the pre-deployment migration solution work?
We’ll add a separate instance of the migrations tool to each environment that contains tables that have been customized, and so are different from the master image in source control. At deployment time we will use a pre-deploy script to run the migrations for the particular target environment to get those customized tables up to the latest version. We’ll also filter these tables out of our main deployment with SQL Compare.
The result will be a process that works as follows:
1. The migrations tool updates the customized tables to the latest version
2. SQL Compare syncs only the standard objects (the ones that are the same in all environments), using filters to exclude the objects covered by steps 1 and 3, as well as any objects that should not exist in the target environment
3. A post-deploy script drops and recreates all customized programmable objects to enforce the desired state
Setting up the migrations framework for SQL Server
We need to add a separate instance of Flyway to the source code for each database that has tables with known differences from the ‘master’ source control version. Download the Flyway command line tool and extract the zip to: Documents\Ponies_dev\customTableMigrations\FluttershyMigrations
To see the Flyway application, look inside: Documents\Ponies_dev\customTableMigrations\FluttershyMigrations\flyway-<version_number>
From now on, I’ll refer to this directory as the <FlywayHome> directory, for brevity. Full documentation for Flyway is available here, but the three important things to know are that you:
- Add your migration scripts to the <FlywayHome>/sql directory
- Configure the target database in the config file <FlywayHome>/conf/flyway.conf
- Run the scripts by calling <FlywayHome>/flyway.cmd with various arguments
Flyway needs the Microsoft JDBC driver to connect to SQL Server. Download and extract it to the recommended directory (Program Files\Microsoft JDBC Driver 6.0 for SQL Server) and then copy the file sqljdbc4.jar to <FlywayHome>/drivers.
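Putting those pieces together, the extracted folder ends up looking something like this (the exact contents vary by Flyway version):

```
<FlywayHome>
├── conf
│   └── flyway.conf      <- configure the target database here
├── drivers              <- sqljdbc4.jar goes here
├── sql                  <- your migration scripts go here
└── flyway.cmd           <- run this to execute the scripts
```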
Open <FlywayHome>/conf/flyway.conf in Notepad++ or similar. You’ll see a long list of parameters, but you only need to set one: flyway.url. You’ll see examples for various database platforms, including SQL Server. Since I’m using a non-default instance, I used flyway.url=jdbc:jtds:sqlserver://WIN2012R2:1433/Fluttershy;instance=DEV (it took me ages to work out that my problem was that WIN2012R2\DEV:1433 doesn’t work).
Finally, follow these instructions to enable TCP/IP for your SQL Server instance.
Creating the pre-deploy migration scripts for deploying customized tables
Create a script for the initial creation of all the tables you plan to manage with Flyway and save it to <FlywayHome>/sql/V1__Baseline.sql. Note that the naming convention is important: <VersionNumber><DoubleUnderscore><Description>.sql. Typically, V1 is used for baselining; after that, timestamps are a handy convention if in doubt. Since we have just one customized table in our target database, Fluttershy, our migration script is straightforward: simply script the table as CREATE from SSMS, dropping the USE statement:
```sql
SET ANSI_NULLS ON;
GO
SET QUOTED_IDENTIFIER ON;
GO
CREATE TABLE [dbo].[tblOnBothButDifferent]
(
    [Numbers] [INT] NULL,
    [Words] [NVARCHAR](50) NULL,
    [SpecialFluttershyColumn] [NVARCHAR](50) NULL
) ON [PRIMARY];
GO
```
Listing 1: V1__Baseline.sql
Now we need to tell Flyway that the Fluttershy database has already been baselined (we do not need to recreate the table). To do this, open a command prompt and run <FlywayHome>/flyway baseline. You’ll be prompted for a user name and password.
Figure 1
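If you’d rather not type credentials interactively (for example, when scripting this step), Flyway also accepts them as command-line arguments, as we’ll rely on later in Listing 3. A quick sketch, assuming the folder layout described above; the credentials are placeholders:

```powershell
# Resolve the versioned Flyway folder (our <FlywayHome>) to a concrete path;
# the call operator (&) will not expand the flyway-* wildcard itself
$myDocuments = [Environment]::GetFolderPath("MyDocuments")
$flyway = (Resolve-Path "$myDocuments\Ponies_dev\customTableMigrations\FluttershyMigrations\flyway-*\flyway.cmd").Path
& $flyway baseline -user=YOUR_USERNAME -password=YOUR_PASSWORD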
Now, whenever you need to make a change to the Flyway version of a table, you can simply add a new script in <FlywayHome>/sql and run <FlywayHome>/flyway migrate.
For example, let’s say we need to add a new column to our customized table, as shown in Listing 2:
```sql
ALTER TABLE dbo.tblOnBothButDifferent
ADD newCol VARCHAR(20) NULL
```
Listing 2: A migration script for Fluttershy database
Just copy that script into a file called <FlywayHome>/sql/V201611241507__NewColumn.sql and then run the command <FlywayHome>/flyway migrate, and you’ll deploy the new column to Fluttershy.
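If you want to verify what has been applied, Flyway’s info command prints the status of every migration script it knows about, which makes for a quick sanity check after a migrate:

```powershell
# Assumes $flyway was resolved as in the baseline example above
& $flyway info -user=YOUR_USERNAME -password=YOUR_PASSWORD
```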
Use SQL Compare to sync non-customized objects
Open up Redgate SQL Compare and update your Fluttershy filter again to ignore all the tables handled by Flyway. You’ll also need to exclude the schema_version table that was added by Flyway.
Now, of course, we’re handling deployments of each object in different ways:
- Our ‘in both, but different’ tables are managed with pre-deploy migration scripts in Flyway
- Our ‘in both, but different’ code objects are managed with post-deploy scripts (run with Invoke-Sqlcmd)
- Our ‘only in one database’ objects are managed by filters in SQL Compare
- All ‘identical’ objects are managed by the default SQL Compare behavior
We can automate our complete deployment of Fluttershy as follows:
```powershell
# This script assumes the user has followed the instructions listed in part 1 and part 2
# of this blog post:
# http://www.red-gate.com/blog/database-lifecycle-management/how-to-build-multiple-database-versions-from-the-same-source-using-sql-compare-filters

$ErrorActionPreference = "stop"

# Required variables
$user = "YOUR_USERNAME"
$password = "YOUR_PASSWORD"
$serverInstance = "YOUR_SERVER\YOUR_INSTANCE"

# Derived variables
$myDocuments = [Environment]::GetFolderPath("MyDocuments")
# Resolve the versioned flyway-* folder to a concrete path, since the
# call operator (&) will not expand wildcards itself
$flyway = (Resolve-Path "$myDocuments\Ponies_dev\customTableMigrations\fluttershyMigrations\flyway-*\flyway.cmd").Path
$masterSourceCode = "$myDocuments\Ponies_dev\state"

# Pre-deploy table migration
$arguments = @(
    "migrate",
    "-user=$user",
    "-password=$password"
)
& $flyway $arguments

# Regular SQL Compare deployment step (using filters)
$fluttershyDb = New-DlmDatabaseConnection -ServerInstance $serverInstance -Database Fluttershy |
    Test-DlmDatabaseConnection
$filterPath = "$myDocuments\Ponies_dev\filters\fluttershy.scpf"
Sync-DlmDatabaseSchema -Source $masterSourceCode -Target $fluttershyDb -FilterPath $filterPath

# Post-deploy script
Invoke-Sqlcmd -InputFile "$myDocuments\Ponies_dev\postDeploys\fluttershyPostDeploy.sql" -ServerInstance $serverInstance -Database Fluttershy -Username $user -Password $password
```
Listing 3: DeployFluttershy.ps1
Note that to deploy our master database (Twilight Sparkle) we do not need the pre-deploy Flyway step or the post-deploy script.
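For comparison, here is a sketch of what the master deployment might look like: just the SQL Compare step, with whichever filter you created for Twilight Sparkle in part 1. The database name and filter filename below are assumptions; adjust them to match your setup:

```powershell
# A sketch of the master deployment: no Flyway pre-deploy step, no post-deploy script.
# The database name and filter filename are assumptions; adjust to match your part 1 setup.
$ErrorActionPreference = "stop"
$myDocuments = [Environment]::GetFolderPath("MyDocuments")
$masterSourceCode = "$myDocuments\Ponies_dev\state"
$targetDb = New-DlmDatabaseConnection -ServerInstance "YOUR_SERVER\YOUR_INSTANCE" -Database TwilightSparkle |
    Test-DlmDatabaseConnection
Sync-DlmDatabaseSchema -Source $masterSourceCode -Target $targetDb -FilterPath "$myDocuments\Ponies_dev\filters\twilightSparkle.scpf"
```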
When you have more than two versions of a database
What if I also have a Princess Luna database? And an Applejack database? Well, now things very quickly become much more complicated. Twilight Sparkle remains our master version, and dev includes all the objects for all four databases. However, you now need four filters (one for each database) and three post-deploy scripts and Flyway projects (one each for Fluttershy, Princess Luna, and Applejack).
Also, while it feels like you have three relationships to maintain (Twilight Sparkle vs. each of the other three), you actually have six:
Figure 2
Even though you are not maintaining an upgrade script from Fluttershy to Applejack, for example, consider the case where Fluttershy and Applejack share a customization: there is now something unique to those two databases. You need to duplicate your change in two places (in both applejackPostDeploy.sql and fluttershyPostDeploy.sql, for example), and if the object changes you need to update the source code in two places – so you do need to consider the relationship between them, which makes your life more complicated.
As the number of versions grows, the complexity scales with the triangular numbers: n versions give n(n−1)/2 pairwise relationships. Once you get beyond three or four versions, it very quickly becomes practically impossible. With 15 production versions, for example, you have 105 relationships.
The client I was working with recently initially had six production versions, although a few were very similar. They were able to bring some of the databases in line fairly quickly, meaning they only had to worry about three production versions moving forwards. This reduced their complexity from 15 relationships down to three – and they were able to standardize all their tables, meaning they could cut the additional complexity of a migrations tool from their source code.
Eventually they do plan to remove all the customizations on their production databases but that will not be an easy or fast exercise. In the meantime, by following the process described in part 1 and part 2 of this series (as discussed above, they did not have customized tables, hence part 3 was not relevant to them), they are able to apply some sort of change management while solving the complicated issues involved in standardizing their schemas.
My favorite consequence of the exercise, however, is that having experienced the pain of having to manage their customizations, they have accelerated their plans to standardize to a single production version. A decision I wholeheartedly approve of.
Limitations
There are several obvious drawbacks with the approach I’ve described in this article.
- You may have ‘It works on my machine’ issues in development, since the master database (Twilight Sparkle) contains a hybrid of objects that will exist, not exist, or be different in the various customized environments. We mitigated this by setting up automated deployments from our build server to a set of environments that matched each of our customized states, so broken dependencies were spotted within minutes of checking in the code.
- As described in this series, deployments will occur in three separate transactions, so it’s worth extending the solution to wrap all the commands/scripts into a single transaction.
- If you make a mistake with your pre-deploy migrations, it might not be easy to spot or fix.
- Maintaining objects handled in the pre- and post-deploy phases is unintuitive. On the other hand, having all the customizations coded out like this makes everyone more aware of the problem and encourages people to avoid making it worse.
- This solution doesn’t scale very well beyond three or four production versions. But then again, it’s a fundamentally complicated problem, so not many solutions will.
Other solutions
It’s worth mentioning some of the other solutions we considered:
- 100% of non-table customizations handled with post-deploy scripts (no filters): This is fundamentally more complicated. It’s much easier to deal declaratively with objects, especially programmable objects, wherever possible. This is especially true with large, complicated projects with many developers making many changes.
- 100% of non-table customizations handled with filters (no pre/post-deploy): This would not handle the ‘exists in both, but different’ scenario.
- Branches in source control: It was suggested we could use a different branch for each production version. This would effectively create a huge amount of code duplication and involve very complicated merges. Inevitably, the different versions would grow more divergent over time.
- Maintain each version separately: This was effectively the white flag option, if we accepted that these versions were never going to be realigned and would forevermore be managed separately. It would save us the complications of the filters, pre- and post-deploy steps, but it’s very unlikely it would work long term.
- Just get rid of the customizations already: Setting up source control like this is complicated and in some scenarios it was easier just to fix the customization than to codify it in source control – and that’s great. Wherever possible, fix the underlying problem. Unfortunately, in our case there were some customizations that did not have a quick or simple solution, so in the meantime it was easier for us to at least get the database in source control.
Summary
What I’ve detailed is an effective way to work around and visualize a problem. It’s a bit dirty, it’s not something I’ll proudly tell the grandkids about, but it works.
As data professionals, we should aim to avoid getting into this scenario and the best way to solve this problem is to fix the root cause and get our databases in sync. However, when that is not possible or practical in the short term, employing strategies such as those discussed in this series can allow us to apply some sort of version control.
After all, if we are going to take on the complicated task of refactoring our databases back to a standard version, it’s probably a good idea to get the databases under source control before we start hacking away at production.
Read part 1 of this series if you’d like to know how to use SQL Compare object filters to restrict deployments to a certain set of database objects.
Read part 2 if you’d like to learn how to use post-deployment scripts to deal with deployments where code objects are different in source and target.