Product articles Flyway Database Versioning
Flyway How-tos: a User’s…

Flyway How-tos: a User’s Perspective

Flyway provides a database-independent way for a team to track, manage and apply database changes, while maintaining strict control of database versions. It updates a database by running a series of versioned migration scripts, in order, and keeps track of all the changes in a special "schema history" table. It sounds simple, but it is easy to derail this team discipline if you don't find the right answers to the following questions…

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

How can I tell which objects changed between versions? There must be an easier way than inspecting a long chain of migration files…?

If you’re using Flyway Community, then you can generate a build script for the database after every migration run. To make this automatic, you can run the code to generate the build script as an AfterVersioned callback. For an example of generating build scripts for PostgreSQL, using the pg_dump and psql utilities, see: Getting an Overview of Changes to a PostgreSQL Database using Flyway.

If you inspect these build scripts, you’ll get a good idea what is in the database, especially if you include the DDL-based documentation (comments in PostgreSQL or Oracle, extended properties in SQL Server). You can save these build scripts for each version, in source control. Unfortunately, it can be disappointing to try to compare versions with File DIFF tools, because you’ll get false positives whenever the order in which objects are scripted changes.

An easier approach, if you have access to a schema comparison tool for your RDBMS, is to save for every version a directory of individual build scripts for each database object. This allows for a quick comparison within scripting and can be used by a Redate Schema Compare tool as a database source or target for comparing versions of the database. For an example, see Creating Flyway Migration Files using Redgate Schema Comparison Tools.

How do I exclude the Flyway schema history table from scripts and comparisons?

Most relational databases provide ways of excluding tables from backups. For example, if you’re generating scripts for PostgreSQL then you can exclude tables from the pg_dump backup using the --exclude-table="tablename" switch. MySQL’s mysqldump uses --ignore-table=tablename.

If you are using a Redgate schema comparison tool, either you’ll need to use the /exclude:table:NameOfTable switch, or use the GUI to define a filter that exclude flyway_schema_history table and then save it as an XML file (.scpf) that you can specify in all schema comparisons. See Creating Flyway Migration Files using Redgate Schema Comparison Tools.

Unfortunately, this flyway schema history table can be renamed within a project or even as a parameter when running Flyway, so you’ll have to change the filter as well, if you want to avoid this table getting into the source control for the database. For more information see: Exploring the Flyway Schema History Table.

I need static data in the database. How do I do this?

I think the best way to handle static data in a SQL database is to use views. Static data includes all the data that the database needs to function, such as countries, currencies and so on. We simply use the view as if it were a read-only table.

If we then wish to seed an ordinary table, such as a currency table, with some static data that can be expanded, you just have a second view that UNIONs this view with a conventional table that is initially empty.

These ‘static data views’ are DDL and will be included in all build scripts, whereas data inserts aren’t. They are read-only because they don’t reference any underlying table, and so a hacker cannot add a spurious currency, for example. Here is a rather impractical example that allows you to lookup the words used in the Brythonic language for counting up to twenty.

As you see, the data is in the definition of the view and is part of the DDL, so will be part of the schema, not the data.

If your relational database doesn’t yet support this multi-row value syntax, just about every relational database will support this UNION ALL syntax. For a view with static read-only data in it:

Is it OK to make changes to the database’s metadata outside of Flyway?

In general, no. To ensure that metadata in the database always corresponds to a particular version, and thereby ensure that changes can be reliably deployed, all database changes must be made by Flyway.

That said, a developer working in an isolated branch might wish to make changes using tools other than Flyway. In this case, you need to create a ‘branch’, at an exact version of a database, and when you’re happy with the changes and have tested them, then merge your work into the development database via a migration. My Flyway Teamwork PowerShell framework makes this a lot easier and Creating Flyway Migration Files using Redgate Schema Comparison Tools shows to capture development changes on a branch into a migration, retrospectively.

How do I fix a fault with an existing migration?

The only way to change a development database, or to be precise, the database in the develop or main branch, is to add migrations. There isn’t really a safe alternative. You can’t dive in and run DDL statements outside of a migration to correct a bug.

If you’re using Flyway Teams, you might consider using an UNDO script but beware that UNDOs run a risk of version-confusion. For example, let’s say the latest development version is V1.3. You then undo back to V1.2 version (by running the U1.3 script) and make a change to the V1.3 migration. However, in the meantime another developer has already created a branch off the original V1.3 version. Now you have two V1.3 databases that are, in fact, different.

For more information see Flyway Gotchas.

How can I test a migration script to make sure it works before I run “migrate”?

It depends on your team and your facilities. If you have a database system that can run DDL scripts within transactions, such as PostgreSQL, DB2, Derby and SQL Server, then the easiest way to test a migration is simply to run the script in your query editor within a transaction but without a COMMIT. You can test it and make sure that there are no obvious errors and then roll back. Obviously, this sort of work is best isolated in a branch, because it is likely to block other subsequent access to the database while the transaction is maintained, but this is where you’d test migrations. I explain how to do this for SQL Server in Testing Flyway Migrations Using Transactions.

Some database systems can’t support this technique. This generally doesn’t cause a problem because Flyway’s party trick is to build databases rapidly, reliably and accurately so it is easy to get into a build-up/tear-down way of working that allows you to do a migration run, test, and then clear the database. In a branch, you have no need to run a complete set of historic migrations, because you can easily baseline the database within a branch. There are also Clone and container technologies that can be used to do this sort of test.

Flyway Teams supports the use of UNDO migrations that are useful for rapid development work, but they must be developed and tested alongside the versioned migration. They are very handy in a branch but not usable where there are any risks to integrity of a version in the main or develop branches. See Secrets of Flyway Teams Undo.

How do I incorporate database unit tests into Flyway migrations?

Unit tests, for databases, ensure that isolated, deterministic components of the database such as functions and procedures, treated independently, produce the correct output for every test input. Usually, they are automated.

Typically, unit tests are stored with the source of the component being tested. A component can be one or more functions, procedures, views or tables. Take a table, or group of tables, for example: the constraints and triggers of a table must be tested. The indexes and keys of a group of tables that together form a component need to be checked for performance, and the datatypes will need to be checked for consistency, and to ensure they cope with the predicted edge cases.

In Flyway, you are in luck, because one of the peculiarities of a migration is that it allows DML as well as DDL, so you can incorporate the tests directly into the migration that create or alters the table or code module. Also, it is easy to signal a failure by creating an error, in which case Flyway will roll back the script, assuming the database system allows DDL rollback (DB2, PostgreSQL, Derby, EnterpriseDB and SQL Server). This means we can insert test procedures that are written so that they clean up after themselves.

In my Pubs Sample database project for Flyway, I’ve added a SQL Server procedure in the migration V1.1.11__AddProcedureWithTest.sql that adds a series of tests. If any of test fails, then the migration fails. My article Testing Flyway Migrations Using Transactions shows an example for SQL Server, adding unit tests to your migrations that deliberately raise an SQL error message, with the Raiserror statement, if a test fails.

How do I test a feature I’ve developed using Flyway before I commit it to Development?

Each new version must be tested before the migration script that creates it is committed to the Develop branch, and therefore added to the migrations to be applied by a versioning (forward migration) run. If you test a new feature only after it is added to the development branch, you are likely to bulk up migration runs with lots of “small-change” migrations. This can eventually lead to performance problems.

The idea of running integration tests on a migration before you apply it shouldn’t seem foreign. After all, when you upgrade the production database, you don’t wait until after you’ve done the migration to test it. All migrations are tested in the branch where the database development work is done. This is why branches shouldn’t be routinely updated with alterations to the parent branch. If you find a bug, you need to know what change caused and this isn’t always obvious if you’re continually pulling in changes from the parent. Testing cannot be done on a database that is shifting its version.

With a branch that has no child branch of its own, you can take a lot of liberties, but even here it helps to retain version discipline so that the people doing the testing, usually just the dev doing the work, know exactly what they are testing. Small-change migrations don’t matter in a branch, because they will all be consolidated into the final tested migration that merges the work into the parent branch.

The other part of this question is really about the mechanics of database testing. Each database has its own exact answer, but in general, you must first get agreement on what exactly the feature should do, the calculations made, and the correct result of any data manipulation or aggregation. You then set up a test harness, perhaps just a few SQL Statements, maybe a small application or script, that can be run repeatedly, and then use it to make sure that the feature produces the correct response to every predictable input.

See Testing a Flyway Database Migration.

What should be saved in source control?

I don’t like to make product-specific advice about database source control, so I’ll look at it from the perspective of the requirements.

Every migration must be saved the first time it is applied. Although migrations are supposedly immutable, you occasionally find changes to the file, especially if you are using the UNDO feature, so these changes need to be recorded. In addition, every version must save at least a build script for the database and, ideally, a set of object-level build scripts, or at least a JSON model of the database. My Flyway Teamwork framework does this for you for every branch of development.

It is, I believe, important to be able to detect object-level changes, what objects are being changed, by whom and when, and so it is important to preserve the database in source control in such a way that it is easy for a textual comparison to display changes reliably, and to record who was responsible for each contribution to the migration and when it was first applied. As well as the scripts, I therefore add the information that Flyway provides in its info command. See Database Development Visibility using Flyway and PowerShell for a demo of how to save a JSON-based table of the recent history of how the database was built.

How do I know a database really is at the version Flyway says it is?

Flyway makes it very easy to provision copies of any version of the database. Assuming no developer subsequently applies ‘uncontrolled’ changes, you can be confident that these databases are at the versions that the Flyway Schema History table says they are.

However, developers have a strange, unnatural inclination toward making uncontrolled changes, often for the purest of reasons. The result is called ‘drift’ which will make deployments more likely to fail and will invalidate test runs. See Flyway Database Drift and How it Happens.

The simplest check is to create a JSON model of the database at every version, the first time that version was reached, and save it to source control. Then you compare this model with one generated from the suspect database to see if there are any changes. If you are fortunate enough to have a Redgate schema comparison tool you can do this simply via a GUI or via a script. See Detecting Database Drift during Flyway Database Development.

Our migration runs now have over 200 migration files and are getting slow. What should I do?

Flyway must check every migration file every time it starts to make sure that nothing has changed, so all actions are likely to slow down as the number of migration files increases. In the zeal to preserve every twist and turn of database development, it is possible to lose sight of the purpose of using Migrations. When development work is beginning to disappear into the mists of time, and you no longer refer to them, it is the signal to ‘baseline’ the database. This tells Flyway the historical version number beyond which you’ve lost interest in how the version was reached.

You still need the build script that results from the migrations you no longer require, and you still have the record of old migrations in source control, but you don’t need to recapitulate the whole migration process, because the build script will take you to the baseline. I’ve given more details in Flyway Baselines and Consolidations.

If you are regularly getting large migration runs, I would consider a better strategy for development work, so that migrations can be consolidated. Many interim migrations are better consolidated into few that each correspond to a release. There is little point in retaining all the migration files beyond the branch where they were created. A branch can usually be merged into its parent branch via a single migration file. See Flyway Branching Walkthrough for a demo.

We’re mainly making changed to code modules. Is it OK to use Repeatable migrations to limit the number of migration scripts?

In my opinion, no. The suggested use of Repeatable migrations is for bulk data inserts and for creating (and recreating) packages and procedures, but there are more effective ways of doing this. For example, see Dealing with Database Data and Metadata in Flyway Developments for how I handle bulk data inserts.

Repeatable migrations have no version; they allow you to make ad-hoc changes to a database that aren’t part of a Flyway versioned migration. As such, they are an opportunity for development teams to compromise the whole purpose of Flyway, which is to be able, with confidence, to ascribe a definite version to a database.

For the best answer to this question, I must quote the official Flyway FAQ: “No. One of the prerequisites for being able to rely on the metadata in the database and having reliable migrations is that ALL database changes are made by Flyway. No exceptions. The price for this reliability is discipline. Ad hoc changes have no room here as they will literally sabotage your confidence.” This is referring to allowing structural changes to the database outside of Flyway, but this is effectively what Repeatable migrations allow.

Our CI development environment uses PowerShell scripting, not Java. How would we incorporate Flyway?

Flyway Teams now supports PowerShell scripts as callbacks well as Java. You can even use batch scripts if you need to. Because Flyway is based on a command-line interface, it slots easily into the typical DevOps scripting that is designed for this. There isn’t a Flyway cmdlet because it isn’t necessary. The callback architecture provides all the extensibility and is far easier to customize. See: Scripting with Flyway Teams and PowerShell for an overview of using PowerShell callbacks. For examples, see: Bulk Loading Data via a PowerShell Script in Flyway, A Flyway Teams Callback Script for Auditing SQL Migrations.

What information should go in the Flyway.conf files and what should I pass in as command line parameters or Environment Variables?

My advice is, wherever possible, to use the Flyway.conf files to store information. There are three of these that are read by Flyway when it starts. There is one in the installation directory, in which you put installation configuration items, another in your home directory for personal information and a third in the current working directory for all the project-based information. See A Programmer’s Guide to Flyway Configuration for more details.

If you can provide all the information needed in a config file, then you need only type in “flyway” from a console followed by the operation you require. This really helps when you are busy with a repetitive task. See, for example, Flyway Without the Typing. The same applies if you are using an application, such as a text editor, that only allows a simple CLI interface with external apps such as Flyway.

There is a snag, though. You can’t store UserIDs and passwords in the Flyway.conf of the current working directory, especially if it is a shared drive, or likely to be archived or placed in source control.

You can pass a UserID and password as parameters and leave the rest of the information in the appropriate Flyway.conf file. Of course, the downside of this is the repetitive typing of UserID and passwords. Credentials spoil a nice simple system. Alternatively, if you only ever access one database server, then you can keep default credentials in the Flyway.conf file that is sited more securely in your user area. However, you’re need quite a few of these credentials to be stored securely if you work with more than one database server.

You can place credentials, or any Flyway parameter, in session-based environment variables that are then automatically used by Flyway, so this is a good alternative for volatile information such as credentials. These environment variables are intrinsic to DOS batch and Bash scripting and essential when you are using this type of script instead of PowerShell.

The Flyway Teamwork framework adds the password as an environment variable for you from an encrypted store in your user area, so you only need to type it once. These variables are reasonably secure on windows. These environment variables will persist when the batch file, session or script finishes, unless you explicitly clean them out at the end of the batch. Environment variables that are declared in a session or batch “belong” to the copy of CMD that runs the Command Prompt window and any batch files in it. This means that they are visible to subsequent batch files on the same machine. You can never assume that such environment variables are undefined before the batch file starts.

If you predominantly use scripting, and I’m a keen scripter, you have a bit more flexibility in your approach, but it really helps in avoiding mistakes if you have a rule that your current location in the filesystem determines the location in the database project you’re working on. In the framework, I gather up all the possible sources of information into a single hashtable, so it becomes easy to know what is going on.

To sum up: keep information about the project, such as the server, database, locations and placeholders in the directory that is the working directory for the current branch of the project. Place credentials as environment variables for Flyway Community (deleted at the end of the session) and investigate using ‘secrets management’ in Flyway Teams.

Tools in this post

Flyway

DevOps for the Database

Find out more