Reviewing SQL Migration Files Before a Flyway Migration
Your finger is hovering over the 'enter' key to set off the Flyway "migrate" command, but you hesitate. There is a large stack of migration files for this project: don't all these files need to be checked first? Yes, they do, but how? This article demonstrates how, once armed with the file path locations of all the scripts, you can use PowerShell to search them for various purposes such to review them for potentially disruptive changes, or run code quality checks, or to verify documentation standards.
code qThere could be a good reason for a little unease, when you’re about to embark on a Flyway migration run, especially if there are a lot of migration files, scattered across various directories listed in the Flyway locations
configuration item. Does the migration run include any potentially disruptive or destructive changes that you need to know about such as dropping indexes or tables? Has the migration code been checked to ensure it conforms to corporate standards, uses the correct naming conventions, doesn’t include deprecated code, nor too many unreviewed code smells, and is it properly documented?
If you detect ‘danger terms’ like “drop” or “rebuild”, or code that falls well short of standards, you’ll want to stop the migration run to review it further.
So, yes, all these files need to be checked – but how do you do that if there are hundreds of files and you’re not sure exactly which ones will be applied? If you decide to create a script to run the required checks, you will need to find all the migration files for the database. These files may be in the Java classpath, a network location or in the cloud. The locations of these migration files are defined as a list of ambiguous paths in one of three flyway.conf files, or possibly an environment variable, or maybe even one or more extra configuration files specified in an environment variable…I think I’ve said enough to convince you that it can be more complicated than it sounds. Next, you’d have to work out which files have been successfully applied to the database and are listed in the Flyway schema history table. We need to eliminate these from the list because we can no longer modify these.
Fortunately, there is a simple alternative. Flyway’s info command can provide you with a list of the paths of all the pending and successfully applied files.
The Secrets of Flyway Info
Normally, when you use the flyway
info
command, it returns a list of pending and applied files, as an ASCII table, together with a whole lot of text. The information includes, for each file, the columns: Category, Version, Description, Type, Installed On, State and Undoable. If the file’s State is ‘pending’, then it hasn’t been applied.
Unfortunately, this output doesn’t tell you the name or path of the file. However, you can get these details by specifying JSON output (-outputType=json
). It’s less pretty, but easier to use in an automated process and with more information. The list of files is in a ‘migrations’ array. Each entry includes the full path of the file. There are other nuggets of information, such as how long it took to execute and who did it, but it is the file path that we need.
Searching through your SQL migration Files using simple string searches
You can, armed with the JSON list of filepaths, search through all the migration files for strings, either simple or Regex.
The following PowerShell demonstrates how to do it, in this case performing a simple search for files that contain the text “raiserror” or “password“. This example uses a sample migration run within my Flyway Teamwork PowerShell framework. Here’s the sample project on GitHub.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
<# go to the right working directory with the right flyway.conf file #> cd S:\work\Github\FlywayTeamwork\Pubs\Branches\Develop\Variants\NoData . .\Preliminary.ps1 <#if you use the framework. Otherwise use a different way to get your password #> <# now query flyway to get the migrations info #> $Migrations = Flyway info -outputType=json | convertfrom-json if ($Migrations.error -ne $null) { write-warning $Migrations.error.message } else { # get each file in turn. You might want to select just the pending migrations *> $migrations.migrations | where { ![string]::IsNullOrEmpty($_.filepath) } | select filepath | foreach{ # I've given it a list of simple strings but it could be regex Select-String -Path $_.filepath -Pattern @("Raiserror", "password") -Context 1 }; } |
Using this simple technique, we review the code in each of the ‘pending’ migration files for ‘danger terms’ (like “drop” or “rebuild”) and stop the migration run if detected, or we can investigate the history of changes to a particular object, or perhaps runs some basic code quality checks. Let’s look at an example.
Example: reviewing changes for code quality
Let’s say we want to run some basic code quality checks before allowing a migration run to proceed. For that we need a SQL code reviewer. SQL Code Guard is an obvious choice: it has a good command-line interface but is necessarily specific to SQL Server, and we’ve already demonstrated its use with Flyway (see Running SQL Code Analysis during Flyway Migrations).
Instead, we’ll use SQLFluff
, which is a generic ‘lint’ for a range of dialects of SQL, more concerned with enforcing cosmetic rules on your code than ensuring correct syntax. However, it will also spot some types of “problematic” code. These are ‘code smells’ where code that isn’t invalid but might indicate an issue in the logic: such things as variables defined but never used, redundant imports and shadowed variables.
Installing and testing SQL Fluff
SQLFluff
is a python app so must be installed within Python using pip. Make sure you have a recent Python. Before you install SQLFluff, you need to make sure that your PATH includes the python’s installation’s \Scripts path. Otherwise, you’ll get a warning when pip runs. As you can see, my PATH should include ‘c:\users\Phil\appdata\local\programs\python\python310-32\Scripts
Installing collected packages: iniconfig, appdirs, typing-extensions, tomli, toml, tblib, regex, pyyaml, pyparsing, Pygments, py, pluggy, pathspec, MarkupSafe, colorama, chardet, attrs, tqdm, packaging, Jinja2, click, pytest, diff-cover, sqlfluff
WARNING: The script pygmentize.exe is installed in 'C:\Users\phil\AppData\Local\Programs\Python\Python310-32\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script chardetect.exe is installed in 'C:\Users\phil\AppData\Local\Programs\Python\Python310-32\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script tqdm.exe is installed in 'C:\Users\phil\AppData\Local\Programs\Python\Python310-32\Scripts' which is not on PATH.
Running code checks in PowerShell
Once this is done, you can then run SQLFluff directly in PowerShell, but you’ll first need to make sure that the PowerShell session is restarted after you added the path.
PS c:\Users\Phil> sqlfluff.exe Usage: sqlfluff [OPTIONS] COMMAND [ARGS]... SQLFluff is a modular SQL linter for humans. Options: --version Show the version and exit. -h, --help Show this message and exit. Commands: dialects Show the current dialects available. fix Fix SQL files. lint Lint SQL files via passing a list of files or using stdin. parse Parse SQL files and just spit out the result. rules Show the current rules in use. version Show the version of sqlfluff. __Examples: sqlfluff lint --dialect postgres . sqlfluff lint --dialect postgres --rules L042 . sqlfluff fix --dialect sqlite --rules L041,L042 src/queries sqlfluff parse --dialect sqlite --templater jinja src/queries/common.sql
OK we’re good to go! We’ll use SQLFluff to check a trivial “foo-bar” SQL query:
1 2 3 4 5 6 7 8 9 |
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8' # make sure we output in UTF8 for the Linux folks #use the SQLfluff test sample @' SELECT a+b AS foo, c AS bar from my_table '@> "$env:Temp\test.sql" #and run SQL Fluff sqlfluff.exe lint --dialect tsql "$env:Temp\test.sql" |
You should see this result…
== [C:\Users\phil\AppData\Local\Temp\test.sql] FAIL L: 1 | P: 1 | L034 | Select wildcards then simple targets before calculations | and aggregates. L: 1 | P: 1 | L036 | Select targets should be on a new line unless there is | only one select target. L: 1 | P: 9 | L006 | Missing whitespace before + L: 1 | P: 9 | L006 | Missing whitespace after + L: 1 | P: 11 | L039 | Unnecessary whitespace found. L: 2 | P: 1 | L003 | Expected 1 indentation, found 0 [compared to line 01] L: 2 | P: 10 | L010 | Keywords must be consistently upper case. All Finished!
All these niggles are trivial and can be ‘auto-fixed’. We can get SQLFluff to fix whatever problems it can by using the fix
command, but in PowerShell we need to use the –force
option to prevent it sending ‘are you sure?‘ messages to the console, which PowerShell ISE ignores, and which then locks up the process.
sqlfluff.exe fix --dialect tsql --force "$env:Temp\test.sql" type "$env:Temp\test.sql"
==== finding fixable violations ==== == [C:\Users\phil\AppData\Local\Temp\test.sql] FAIL L: 1 | P: 1 | L034 | Select wildcards then simple targets before calculations | and aggregates. L: 1 | P: 1 | L036 | Select targets should be on a new line unless there is | only one select target. L: 1 | P: 9 | L006 | Missing whitespace before + L: 1 | P: 9 | L006 | Missing whitespace after + L: 1 | P: 11 | L039 | Unnecessary whitespace found. L: 2 | P: 1 | L003 | Expected 1 indentation, found 0 [compared to line 01] L: 2 | P: 10 | L010 | Keywords must be consistently upper case. ==== fixing violations ==== 7 fixable linting violations found FORCE MODE: Attempting fixes... Persisting Changes... == [C:\Users\phil\AppData\Local\Temp\test.sql] PASS Done. Please check your files to confirm. PS C:\users\phil> type "$env:Temp\test.sql" SELECT c AS bar, a + b AS foo FROM my_table
Having now seen that it works, we’ll try it on our sample Flyway migration.
Checking all your Flyway migration code
Chocks away! The following example checks any ‘pending’ migrations for a branch of our Pubs database project, using SQL Fluff.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
cd <path to>\FlywayTeamwork\Pubs\Branches\Develop\Variants\Rollback . .\Preliminary.ps1 $Migrations = Flyway info -outputType=json | convertfrom-json if ($Migrations.error -ne $null) { write-warning $Migrations.error.message } else { $migrations.migrations | where { ![string]::IsNullOrEmpty($_.filepath) -and ($_.type -ieq 'SQL') -and ($_.state -ieq 'Pending') } | foreach{ write-verbose "Checking $($_.version)-$($_.description) ($($_.state))" sqlfluff.exe lint --dialect tsql "$($_.filepath)" } } |
The result is too long to display, but a good attempt. We have a few migration files in this project that are for importing data and are way too long for SQLFluff’s file size limit. There may be other migration files that cause warnings too, such as files that contain placeholders that aren’t valid SQL.
Where the file is too long, it warns with…
'Length of file FlywayTeamwork\\Pubs\\Branches\\Develop\\Variants\\Rollback\\migrations\\V1.1. 2__Pubs_Original_Data.sql' is 116949 bytes which is over the limit of 20000 bytes. Skipping to avoid parser lock. Users can increase this limit in their config by setting the 'large_file_skip_byte_limit' value, or disable by setting it to zero.
To get around this, I just cheated a bit and switched to a variant without migrations that insert data (…Variants\NoData
) and now we get every file reviewed with an output something like this…
You will not really last long with screenfuls of this sort of information when you have an enterprise-scale database to review. Soon, you will want individual reports for each migration file. Flyway will allow this. Here we are using the Teamwork framework to tell us where the reports should go (into the Reports folder for each version), but it is simple to alter to your preferred destination. Here, again, we are filtering the list of files for just those that are SQL and pending.
Once you have this sort of PowerShell script working to your satisfaction, you can add it to a beforeMigrate
PowerShell callback if you have Flyway Teams and want a regular check on the pending files.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
cd <My Path to>\FlywayTeamwork\Pubs\Branches\Develop\Variants\NoData . .\Preliminary.ps1 $Migrations = Flyway info -outputType=json | convertfrom-json if ($Migrations.error -ne $null) { write-warning $Migrations.error.message } else { $migrations.migrations | where { ![string]::IsNullOrEmpty($_.filepath) -and ($_.type -ieq 'SQL') -and ($_.state -ieq 'Pending') } | foreach{ write-verbose "Checking $($_.version)-$($_.description) ($($_.state))" $ReportLocation="$($dbDetails.reportLocation)\$($_.version)\reports" if (-not (Test-Path "$ReportLocation")) { New-Item -ItemType Directory -Path "$ReportLocation" -Force } sqlfluff.exe lint --dialect tsql "$($_.filepath)" > "$ReportLocation\SQLFluff.rpt" } } |
This places a single report for each Flyway version in its version directory, along all the other reports for that version. In a working system, you’ll want a summary report that lists all errors, warnings and faults that cannot be auto corrected. This will require parsing the SQLFluff output for each version, so we’ll leave that for another article!
Summary
Flyway has built-in support for providing a JSON list of filepaths that can be used for inspecting, reviewing, searching or making bulk changes to the migration files of the project. I’ve illustrated the use of the list in PowerShell, but it is just as easy in Java or Python.
Tools in this post
Flyway Teams
Ideal for organizations looking to improve collaboration and fine tune their processes during development and the deployment of database changes.