Flyway Gotchas

Explaining some of the 'gotchas' that can trip up the unwary Flyway user, and how to avoid them. One or two of these you'll encounter quickly, such as the case-sensitivity of parameters and arguments. Others, such as potential problems with undo scripts or running scripted callbacks, only when you are tackling more complex development processes.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Every application, especially such an open-ended one as Flyway, has the occasional quirk. There are very few that one could call a bug, but they can catch the unwary.

Avoiding Flyway Gotchas

I’ll cover the following common ‘gotchas’ that will cause immediate errors or eventual problems, when using Flyway:

Flyway migration files aren’t encoded as UTF-8

Windows-based applications will generally read a variety of text files, no matter how they are encoded. The complications that enable them to do so are buried deep in the NET framework and allow for different writing systems and cultures to be accommodated. It also allows Windows to comply with internationals standard such as ISO for text encoding as well as the five Unicode standards. There are around 130 basic standards.

Out of the box, however, Flyway reads only UTF-8, or Windows 1252, reliably and you can encounter strange ‘checksum’ errors for other encodings:

Flyway : ERROR: Unable to calculate checksum of V1.0.0__FirstRelease.sql
At line:3 char:1
+ Flyway $pword info
+ ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo: NotSpecified: (ERROR: Unable t...irstRelease.sql:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandErrorInput length = 1
Caused by: java.nio.charset.MalformedInputException: Input length = 1

Flyway can make an educated guess at character encoding and get it wrong. Sometimes Flyway will think it has read a .conf file, for example, and misinterpret it. It can then get so confused that it produces an inappropriate error such as needing Flyway Teams or not being able to reach a database. In the heat of the moment, it is very easy to forget that files need to be converted.

Checking the encoding

Here is a little bit of code I use to check for migration and config files. It lists all the files that are not ‘UTF8 with BOM’. In this case, I’m checking a Flyway Teams installation with branches. If it does not have a UTF8 BOM, then it is listed as a potential issue. It could, of course, still be UTF-8 without a BOM.

Setting or fixing the encoding…

When you create your own migration files, they are best encoded as UTF-8, as this is what Flyway expects.

…In Flyway

You can use the -configFileEncoding and -encoding parameters to specify US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16. You will find that SQL Files, script files and .conf files are best written as UTF-8 Unicode. If you hit problems, using the ISO-8859-1 option will fix most problems.

You can specify the encoding in the Flyway.conf file but you need to be aware that if you change the encoding after migrations have been run, the calculated checksums will be invalidated, and will then require a flyway repair.

…In your RDBMS’s query editor

SQL Server has particular problems with encoding. SSMS saves script files as UTF-16 little-endian. It will edit UTF-8 files but save them as utf16 – little endian. To change its behavior, you need to change the encoding for the sample SQLFile.sql stored in various places according to version. This is where my user and machine defaults are stored. Other than doing this, there seem to be no supported way of changing the default encoding for SSMS.

Similarly, most utilities that output SQL files can be configured to produce UTF8 output. PostgreSQL’s pg_dump, for example, allows you to create UTF8 files with the –encoding=UTF8 parameter

…In Redgate tools

Redgate utilities such as SQL Compare and SQL Change Automation also save these script files as UTF-16 little-endian. With SQL Compare and SQL Data Compare users can specify the output type for scripts in the Tools > Applications-General window:

Set UTF8 encoding in SQL Compare

…Using Notepad

Microsoft recommend using Notepad to convert a file to UTF-8. Open the migration file in Notepad, and select Notepad’s ‘Save as’ menu. This has an Encoding’ combo box. If it isn’t showing UTF8, then select it and save. It is very easy to check the codes of all your Flyway files using a byte-order marker (BOM) at the start of a file. This is a pretty good indication that your files are UTF8 though the standard doesn’t actually recommend a BOM.

Set UTF8 encoding in Notepad

…In PowerShell

In PowerShell, you will always get UTF-8 if you use the [IO.File]::WriteAllLines() string method. If you run …

…in your PowerShell script, then all output redirected to file will be UTF8.

Forgetting about Flyway case-sensitivity

Flyway commands, parameters, and configuration items are case-sensitive.

PS \\Pubs\branches\develop> Flyway $pword Info
Flyway : ERROR: Invalid argument: Info
At line:1 char:1
+ Flyway $pword Info
+ ~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (ERROR: Invalid argument: Info:Str 
   ing) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

This includes schema names and the name of the flyway_schema_history table. An exception to this rule are flyway.placeholders: these custom placeholders are recognized whatever their case.

This case-sensitivity (and accent-sensitivity) is a surprising survival in an age that tries to be inclusive of all cultures. It is mainly due to the Unix culture of Java, which was originally designed at Sun Microsystems. Unix was developed at a time when the complex issues of representing text in all cultures, and sorting text by collation, was being kicked down the road.

Nowadays, the idea that capital letters are significant is distinctly odd. After all, you wouldn’t think that calling your children Child, cHild, child and chilD would help you call them to dinner. However, case-sensitivity is an insidious problem that we tend to avoid by trying to stick to lower-case letters.

It’s not just with Flyway that you can get caught out unless you stick to using lowercase. The SQL language is case-insensitive (not case-sensitive). It gets more complicated. macOS is Unix-based but it uses a default file system type (HFS+) that is not case-sensitive. However, macOS also supports UFS volumes, which are case-sensitive just as on any other Unix system. Some SQL implementations get very confusing. Oracle word queries text comparisons are, by default, case sensitive. In MySQL, table aliases are case-sensitive on Unix, but not so on Windows or macOS. In SQL Server, you can opt for a case-sensitive database.

Confusion with schemas

Different database systems attach a different meaning to the term ‘schema’ and ‘default schema’. In MySQL, for instance, the terms ‘database’ and ‘schema’ are both used to mean ‘database’ whereas SQL Server, PostgreSQL and Oracle allows you to have several schemas, each of which is a ‘namespace’ associated with, or ‘owned by’, a user or role. This is designed to make it easier to implement role-based access to database objects and data. Flyway needs, and expects to have, at least a comma-separated list of schemas supplied in the config, or as a parameter. The first schema in the list is assumed to be the schema that Flyway uses to store the Flyway schema history table. To avoid any problems, it is best to provide all your schemas in this list, with your default schema as first in the list.

The most obvious reason that Flyway needs a list of schemas is so that it knows which schemas are in your project. There may be other schemas that are maintained separately. It can then effectively use the CLEAN command. If you leave out a schema then CLEAN will leave that schema out of the CLEAN operation. This is fine, but then, if you run a CLEAN followed by a MIGRATE that makes a change to an ‘unlisted’ schema, such as creating a table, on the assumption that the CLEAN operation cleaned out what you consider to be the entire project (the whole database), you’ll get an error saying that the table has already been created. In fact, you can have several different errors because, basically, the cleaning of that schema is outside Flyway’s control.

This might seem an unnecessarily strict system, but it has advantages. It means that Flyway can support several projects on the one database, for any RDBMS that supports schemas. You’d want this if part of your database is maintained by a third-party vendor or remote team. You could really only coordinate this if the shared base tables in the default schema are already stable in their design, both in terms of columns and constraints. Otherwise, coordination and testing activities are likely to expand greatly in complexity

Bad UNDOs

SQL allows you to leave out the names of constraints such as primary keys and foreign keys. When you do this, the database system makes up a name based on the objects involved. You can only have one primary key, so it is quite safe for the database system to default to a name like PK_MyTableName. A column can only have one default so again it is safe to call it DF_MyTable_MyColumn. However, a column can have several check constraints, or be involved in several foreign key constraints, so their names will need an extra ‘uniqueifier’. Most RDBMSs get around this by adding a random string to these generated names. So far, so good.

However, if you let the RDBMS generate names for your constraints, rather than naming them explicitly, then you can’t easily create reliable undo (U) scripts in Flyway. If you are adding, deleting or altering constraints in existing tables, within a migration, and if you then want a companion UNDO script, then you have a problem; the undo script will need to specify the name of the constraints and this name will change every time you perform the UNDO and then subsequently rerun the migration. In other words, each undo script will apply only to the one instance of the DO (versioned-V) script; it will work once but never again.

I’ve seen cases where running certain undo scripts that use the original, generated name of a constraint, to rollback table changes, can cause a subsequent CLEAN operation to crash:

Flyway : ERROR: Unable to clean schema [dbo]
At line:31 char:1
+ Flyway $pword clean
+ ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (ERROR: Unable to clean schema [dbo]:String) [] 
   , RemoteException
    + FullyQualifiedErrorId : NativeCommandError
 
----------------------------
SQL State  : S0000
Error Code : 3725
Message    : The constraint 'PK__Word__9CBB9A4F1E216A15' is being referenced by table 
'WordOccurence', foreign key constraint 'FK__WordOccure__Item__5224328E'.
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException:

You can get around this crash by deleting and recreating the database, but that’s awkward.

Leaving out the constraint names will cause other problems too:

  • You can’t temporarily disable foreign key constraints by name when you insert data.
  • Your automatically generated build scripts will be different every time you rebuild the database.
  • If you compare databases built from the same script, but at different times, you will find them to be different. In both cases, it is because the uniqueifiers will be different every time that database object is built.

If you use a schema comparison tool, you can get around the last problem by setting the option to ‘Ignore system named constraint and index names‘ (See my earlier article, Handling System-named Constraints in SQL Compare). In effect, this SQL Compare treats system-named constraints as “unnamed”. If the constraint is there is both scripts and does the same thing, then it is the same constraint. If you are comparing scripts by using string-based diff tools, then you are out of luck.

There are solutions to each problem, but the easiest solution is to give explicit names to every UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, and DEFAULT constraint. Most RDBMSs provide easy access to the metadata, so it is easy to spot the rogue constraints when you are testing a migration.

Using undos in the development or main branch

If you need to rely on the integrity of the metadata in the database, and if you need reliable migrations, then all database changes in the development (or main) branch of a database must be made by versioned migrations only. Undos are fine in  short-lived feature branches, but nowhere else.

The problem is that if you undo a migration, you can then alter the code for the migration before repeating it. The version will have changed. What if a copy of this version has already been used elsewhere, maybe for a branch, or for testing, perhaps? A change to a migration that has, at some point, been successfully applied can introduce problems. Even using an Undo to go back and add an index can trip up a subsequent migration. See Flyway Database Drift and How it Happens for more on this topic.

Apparent lock-up of Flyway due to database blocking

Where possible, Flyway will do a migration within a transaction and because a migration will be executing DDL SQL statements, it is likely to be putting a lot of locks on database objects. This won’t be a problem because all the work is done within the connection, and you’ve ensured that the database isn’t dealing with requests from other connections. Once the connection commits the transaction, then all is well for other connections.

If, however, you create a scripted ‘afterEach …’ callback (Java, PowerShell, batch and so on), and if that script, when it is called, creates a new connection to the database, or runs a program that does so, then you could be in trouble. You can get Flyway to ‘lock up’.

SQL Callbacks are never a problem because they are executed in the same connection as the migration itself, within the same transaction. However, a new connection created within a script such as PowerShell is likely to want access a database object that is locked by the migration’s connection. No problem, one might think, because it can wait. Unfortunately, both connections are waiting on the other. Flyway cannot close the transaction before the callback finishes, and the callback is blocked. There is no easy way out of this. Flyway appears to lock up, but it has no way of knowing that the two processes are engaged in an endless ‘after you’, ‘No, after you’ dialog.

The immediate solution is to kill one of the processes, so that the transaction ends and rolls back the migration. Unfortunately, the ODBC connectionTimeout value refers only to the time spent in connecting to the database so doesn’t help. Anyway, if it is a program such as a scripting tool that you’re running, then you have no control over the nature of the database connection.

For this reason, I never execute scripted migrations that create a connection to the database as afterEach callbacks. However, avoiding their use can cause certain problems because sometimes, you really do need a process to take place after each migration. For example, you might need to keep a record of all changes to the database, or update source control so that source control reflects the state of the reference database.

In most cases such as these, you really don’t need to do this within a transaction. Instead, of using a callback, you can explicitly do all the migrations that haven’t yet been applied one after another and execute the process you need that requires a connection, as each migration completes.

You can simply provide a list ….

Or, you can see what migrations are available and you can merely provide the start and end migration.

Conclusions

My best advice for when you are using Flyway is:

  • Don’t use anonymous constraints.
  • Ensure that every file used by Flyway is in UTF-8 character encoding.
  • Be careful and consistent with capitalization of all parameters and values in Config files and the commandline. Lower-case generally keeps you out of trouble.
  • Only use UNDO migrations in feature branches
  • Don’t make SQL Calls in ‘afterEach’ callbacks, which use a separate connection to the database.
  • Always provide Flyway with a complete list of all the schemas within your project, with the schema hosting the Flyway schema history table listed first

There is far more culture shock when moving between different relational database than there are in moving to a Java-based database migration system, from a conventional database build system. However, it is worth knowing in advance what the possible problems are likely to be.

The only two problems that you are likely to see when you are beginning to adopt Flyway is the need to save all files as UTF-8 or being caught out by the case-sensitivity of parameters and arguments. You’ll only face other issues once you are heavily into scripting, but it is worth bearing them in mind when you are tackling more complex development processes.

Tools in this post

Flyway

Version control for your database. Robust schema evolution across all your environments and technologies.

Find out more