Product articles Flyway Database migrations
Reviewing SQL Migration Files Before a…

Reviewing SQL Migration Files Before a Flyway Migration

Your finger is hovering over the 'enter' key to set off the Flyway "migrate" command, but you hesitate. There is a large stack of migration files for this project: don't all these files need to be checked first? Yes, they do, but how? This article demonstrates how, once armed with the file path locations of all the scripts, you can use PowerShell to search them for various purposes such to review them for potentially disruptive changes, or run code quality checks, or to verify documentation standards.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

There could be a good reason for a little unease, when you’re about to embark on a Flyway migration run, especially if there are a lot of migration files, scattered across various directories listed in the Flyway locations configuration item. Does the migration run include any potentially disruptive or destructive changes that you need to know about such as dropping indexes or tables? Has the migration code been checked to ensure it conforms to corporate standards, uses the correct naming conventions, doesn’t include deprecated code, nor too many unreviewed code smells, and is it properly documented?

If you detect ‘danger terms’ like “drop” or “rebuild”, or code that falls well short of standards, you’ll want to stop the migration run to review it further.

So, yes, all these files need to be checked – but how do you do that if there are hundreds of files and you’re not sure exactly which ones will be applied? If you decide to create a script to run the required checks, you will need to find all the migration files for the database. These files may be in the Java classpath, a network location or in the cloud. The locations of these migration files are defined as a list of ambiguous paths in one of three flyway.conf files, or possibly an environment variable, or maybe even one or more extra configuration files specified in an environment variable…I think I’ve said enough to convince you that it can be more complicated than it sounds. Next, you’d have to work out which files have been successfully applied to the database and are listed in the Flyway schema history table. We need to eliminate these from the list because we can no longer modify these.

Fortunately, there is a simple alternative. Flyway’s info command can provide you with a list of the paths of all the pending and successfully applied files.

The Secrets of Flyway Info

Normally, when you use the flyway info command, it returns a list of pending and applied files, as an ASCII table, together with a whole lot of text. The information includes, for each file, the columns: Category, Version, Description, Type, Installed On, State and Undoable. If the file’s State is ‘pending’, then it hasn’t been applied.

Unfortunately, this output doesn’t tell you the name or path of the file. However, you can get these details by specifying JSON output (-outputType=json). It’s less pretty, but easier to use in an automated process and with more information. The list of files is in a ‘migrations’ array. Each entry includes the full path of the file. There are other nuggets of information, such as how long it took to execute and who did it, but it is the file path that we need.

Searching through your SQL migration Files using simple string searches

You can, armed with the JSON list of filepaths, search through all the migration files for strings, either simple or Regex.

The following PowerShell demonstrates how to do it, in this case performing a simple search for files that contain the text “raiserror” or “password“. This example uses a sample migration run within my Flyway Teamwork PowerShell framework. Here’s the sample project on GitHub.

Using this simple technique, we review the code in each of the ‘pending’ migration files for ‘danger terms’ (like “drop” or “rebuild”) and stop the migration run if detected, or we can investigate the history of changes to a particular object, or perhaps runs some basic code quality checks. Let’s look at an example.

Example: reviewing changes for code quality

Let’s say we want to run some basic code quality checks before allowing a migration run to proceed. For that we need a SQL code reviewer. SQL Code Guard is an obvious choice: it has a good command-line interface but is necessarily specific to SQL Server, and we’ve already demonstrated its use with Flyway (see Running SQL Code Analysis during Flyway Migrations).

Instead, we’ll use SQLFluff, which is a generic ‘lint’ for a range of dialects of SQL, more concerned with enforcing cosmetic rules on your code than ensuring correct syntax. However, it will also spot some types of “problematic” code. These are ‘code smells’ where code that isn’t invalid but might indicate an issue in the logic: such things as variables defined but never used, redundant imports and shadowed variables.

Installing and testing SQL Fluff

SQLFluff is a python app so must be installed within Python using pip. Make sure you have a recent Python. Before you install SQLFluff, you need to make sure that your PATH includes the python’s installation’s \Scripts path. Otherwise, you’ll get a warning when pip runs. As you can see, my PATH should include ‘c:\users\Phil\appdata\local\programs\python\python310-32\Scripts

Installing collected packages: iniconfig, appdirs, typing-extensions, tomli, toml, tblib, regex, pyyaml, pyparsing, Pygments, py, pluggy, pathspec, MarkupSafe, colorama, chardet, attrs, tqdm, packaging, Jinja2, click, pytest, diff-cover, sqlfluff
  WARNING: The script pygmentize.exe is installed in 'C:\Users\phil\AppData\Local\Programs\Python\Python310-32\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script chardetect.exe is installed in 'C:\Users\phil\AppData\Local\Programs\Python\Python310-32\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script tqdm.exe is installed in 'C:\Users\phil\AppData\Local\Programs\Python\Python310-32\Scripts' which is not on PATH.

Running code checks in PowerShell

Once this is done, you can then run SQLFluff directly in PowerShell, but you’ll first need to make sure that the PowerShell session is restarted after you added the path.

PS c:\Users\Phil> sqlfluff.exe
Usage: sqlfluff [OPTIONS] COMMAND [ARGS]...
  SQLFluff is a modular SQL linter for humans.
Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.
Commands:
  dialects  Show the current dialects available.
  fix       Fix SQL files.
  lint      Lint SQL files via passing a list of files or using stdin.
  parse     Parse SQL files and just spit out the result.
  rules     Show the current rules in use.
  version   Show the version of sqlfluff.
  __Examples:
  sqlfluff lint --dialect postgres .
  sqlfluff lint --dialect postgres --rules L042 .
  sqlfluff fix --dialect sqlite --rules L041,L042 src/queries
  sqlfluff parse --dialect sqlite --templater jinja src/queries/common.sql

OK we’re good to go! We’ll use SQLFluff to check a trivial “foo-bar” SQL query:

You should see this result…

== [C:\Users\phil\AppData\Local\Temp\test.sql] FAIL
L:   1 | P:   1 | L034 | Select wildcards then simple targets before calculations
                       | and aggregates.
L:   1 | P:   1 | L036 | Select targets should be on a new line unless there is
                       | only one select target.
L:   1 | P:   9 | L006 | Missing whitespace before +
L:   1 | P:   9 | L006 | Missing whitespace after +
L:   1 | P:  11 | L039 | Unnecessary whitespace found.
L:   2 | P:   1 | L003 | Expected 1 indentation, found 0 [compared to line 01]
L:   2 | P:  10 | L010 | Keywords must be consistently upper case.
All Finished!

All these niggles are trivial and can be ‘auto-fixed’. We can get SQLFluff to fix whatever problems it can by using the fix command, but in PowerShell we need to use the –force option to prevent it sending ‘are you sure?‘ messages to the console, which PowerShell ISE ignores, and which then locks up the process.

sqlfluff.exe fix --dialect tsql --force  "$env:Temp\test.sql"
type "$env:Temp\test.sql"
==== finding fixable violations ====
== [C:\Users\phil\AppData\Local\Temp\test.sql] FAIL
L:   1 | P:   1 | L034 | Select wildcards then simple targets before calculations
                       | and aggregates.
L:   1 | P:   1 | L036 | Select targets should be on a new line unless there is
                       | only one select target.
L:   1 | P:   9 | L006 | Missing whitespace before +
L:   1 | P:   9 | L006 | Missing whitespace after +
L:   1 | P:  11 | L039 | Unnecessary whitespace found.
L:   2 | P:   1 | L003 | Expected 1 indentation, found 0 [compared to line 01]
L:   2 | P:  10 | L010 | Keywords must be consistently upper case.
==== fixing violations ====
7 fixable linting violations found
FORCE MODE: Attempting fixes...
Persisting Changes...
== [C:\Users\phil\AppData\Local\Temp\test.sql] PASS
Done. Please check your files to confirm.
PS C:\users\phil> type "$env:Temp\test.sql"
SELECT
    c AS bar,
    a + b AS foo 
FROM my_table

Having now seen that it works, we’ll try it on our sample Flyway migration.

Checking all your Flyway migration code

Chocks away! The following example checks any ‘pending’ migrations for a branch of our Pubs database project, using SQL Fluff.

The result is too long to display, but a good attempt. We have a few migration files in this project that are for importing data and are way too long for SQLFluff’s file size limit. There may be other migration files that cause warnings too, such as files that contain placeholders that aren’t valid SQL.

Where the file is too long, it warns with…

 'Length of file FlywayTeamwork\\Pubs\\Branches\\Develop\\Variants\\Rollback\\migrations\\V1.1.
2__Pubs_Original_Data.sql' is 116949 bytes which is over the limit of 20000 bytes. Skipping to avoid parser lock. Users can
 increase this limit in their config by setting the 'large_file_skip_byte_limit' value, or disable by setting it to zero.

To get around this, I just cheated a bit and switched to a variant without migrations that insert data (…Variants\NoData) and now we get every file reviewed with an output something like this…

Code Quality check of flyway migration files using SQL Fluff

You will not really last long with screenfuls of this sort of information when you have an enterprise-scale database to review. Soon, you will want individual reports for each migration file. Flyway will allow this. Here we are using the Teamwork framework to tell us where the reports should go (into the Reports folder for each version), but it is simple to alter to your preferred destination. Here, again, we are filtering the list of files for just those that are SQL and pending.

Once you have this sort of PowerShell script working to your satisfaction, you can add it to a beforeMigrate PowerShell callback if you have Flyway Teams and want a regular check on the pending files.

This places a single report for each Flyway version in its version directory, along all the other reports for that version. In a working system, you’ll want a summary report that lists all errors, warnings and faults that cannot be auto corrected. This will require parsing the SQLFluff output for each version, so we’ll leave that for another article!

Summary

Flyway has built-in support for providing a JSON list of filepaths that can be used for inspecting, reviewing, searching or making bulk changes to the migration files of the project. I’ve illustrated the use of the list in PowerShell, but it is just as easy in Java or Python.

 

Tools in this post

Flyway

DevOps for the Database

Find out more