Cross-RDBMS Code Quality Reports in Flyway
Before you commit your Flyway migration files, you may want to run some automated checks for style or 'code smells'. This article demonstrates how to run basic cross-RDBMS code quality checks using SQL Fluff. We analyze the results in PowerShell to produce reports and analytics on the number or types of issues found.
When you’re initiating a database development of any consequence, one of the first matters that needs to be nailed down hard is that of coding style and standards. Without these, databases can prove very difficult to maintain or extend. Even you, the developer, can sometimes forget the full nature of that ‘very clever code’ you wrote a while back.
One should pause for thought, though, before implementing any sort of ‘rules’ on SQL coding style and usage. SQL was originally designed as a language to be as close as possible to spoken English. The idea was that it could be used by any lay person such as a business analyst, or an accountant. A SQL Statement is like a sentence. Programmers brought up on a language such as Python often feel instinctively uncomfortable with real declarative code.
You can, of course, use whitespace to make SQL easier to understand. However, your helpful whitespace additions, when turned into inflexible rules, may help understand some statements but will obscure others. There isn’t, and can never be, a single, ‘correct’ way of laying out SQL, just as there is no single correct way of laying out this text you’re reading.
Even more of a culture-shock to an application programmer is the fact that any SQL declarative code, such as a query, is merely a description of the result wanted, not a precise instruction of how it is carried out. Just as there is no ‘correct’ way of styling SQL, there is no single, correct way of using it, and no hard and fast set of rules that determine what is or isn’t a ‘code smell’
Life is short and is best spent on more creative work than lining up columns and rows in SQL code or scanning line by line through someone’s code looking for ‘smells’. If you find yourself often tapping the space bar, it is time for a career-change. Some SQL Coding tasks must be automated, even when you’re using a text editor or IDE.
If your team decide on a SQL Code standard for layout, and usage, you need to find a way to get the tools that will automate these tasks, and will, when the team changes its mind, make the changes effortlessly!
What sort of SQL code tasks can be automated?
There are a few automated tasks that need to be considered to help maintain a coding standard. There are SQL Minifiers out there, though it isn’t clear why this would ever be necessary. There are tools for inserting comments where necessary.
Then there are ‘prettifiers’. I once wrote one in SQL for rendering colorized code in HTML for documentation or publishing on a blog. There are also tools with the more radical ambition of formatting code. The most ambitious tools will also do SQL Code analysis, for picking up potential issues or ‘code smells’ though it is almost impossible to find a way of automating the correction of the code.
There are plenty of other tasks that could be automated. I would go further and look for code generators to do routine stuff, such as writing views. I also advocate using an ER Diagramming tool for the chore of creating or altering tables.
Code Formatting (aka ‘linting’)
Formatting or ‘prettifying’ code means making changes that affect only whitespace, such as line breaks, tabs and spaces. It doesn’t change the meaning of that code in any way.
It might seem a trivial task and in some cases it is, but not for SQL. The nature of SQL code is such that no two developers can ever agree on what is the correct format. To be useful, SQL Formatters must be easily configured, and operate with quite complicated rules, involving capitalization, line-breaks, handling of semicolons, and commas, indenting and so on. The rules must be varied according to where lines are folded and how long a SQL expression or clause may be.
There are several SQL Formatters, but SQL Prompt is the best-known one for SQL Server. I’ve written previously about how I use it as a layout tool: SQL Prompt as a Layout Tool: A Survival Guide. You can also check out it’s formatting capabilities using a free online version.
SQL Prompt is interesting for our use of Flyway with SQL Server because there is an accompanying CLI-based formatter that can take your formatting spec from the version embedded in SSMS. Unfortunately, it cannot format any other dialects of SQL
Code analysis
Whereas a SQL formatter is concerned with whitespace, a SQL Code Analysis tool will pick up issues that require a code change, such as using TOP
/FIRST
without an ORDER
BY
clause, using unnecessary aliases, or using deprecated code. A SQL Code Analysis tool is, in some respects, more difficult than a formatter, and there are still differences in opinion about what constitutes a ‘code smell’.
SQL Prompt also come with a built-in code analysis engine that I’ve written about extensively. You can establish all your rules in the UI and then run them automatically using the command line component. I’ve already shown how to use this in Flyway (but again, it’s SQL Server only).
When to check coding styles and standards?
I like to format code while I’m working on it. However, without a suitable SQL Formatter within the IDE, such as SQL Prompt, it makes sense to fine-tune your layout after you’ve tested the code.
It is possible, if you’re using Git, to use pre-commit githooks that are triggered just before a commit is made. With a CLI-based tool, it is possible to do bulk style changes to your entire source, or even abort the commit if a code analysis finds one of the more egregious code quality issues.
If you want to run these checks in Flyway, it must be before the migration is applied to the database. You can’t reformat any successfully applied migrations, or fix any smells, unless you clean the database and re-run the entire migration run because even whitespace changes will cause checksums to disagree.
Fluffing your SQL Code in Flyway
We’ll use SQL Fluff to run the code quality checks for formatting and ‘code smells’, merely because runs in a CLI, and it professes to understand several SQL dialects, each associated with a relational database system. It is positioned halfway between code analysis and formatting and offers to fix some of the formatting ‘problems’ as well. It has potential but is at an early stage of its development.
I’ve already shown how we can use it with Flyway in a previous article, Reviewing SQL Migration Files Before a Flyway Migration. In this example, we will go rather further than just running the formatter. With JSON output, we can import the data so that we can run reports and analytics on the number or types of issues found.
Again, we only want to check out the pending migrations because we can’t alter any code that has been successfully applied by Flyway because it will detect the change and complain. Fortunately, it is easy to get a list of pending migrations from Flyway using the flyway info
command.
Running the reports
We don’t need to run this check at any particular point, so I’ve also added it to the Flyway Teamwork utilities as a cmdlet ($CheckFluffInPendingFiles
) that can be run whenever convenient.
Details of how to install the Flyway Teamwork PowerShell framework are here: What is the Flyway Teamwork Framework? With that done, here is the simple code to run the SQL Fluff report:
1 2 3 |
cd <myPathTo>\develop\Variants\NoData . '.\preliminary.ps1' Process-FlywayTasks $dbDetails $CheckFluffInPendingFiles |
In the Flyway project I used, the database was at version 1.1.4, with all subsequent migrations ‘pending’. This produces the report files for each pending migration file, each in its own version directory.
Process-FlywayTasks $dbDetails $PostMigrationTasks Executed CheckFluffInPendingFiles For the CheckFluffInPendingFiles, we saved the report in <MyPathTo>\develop\Variants\NoData\Versions\FluffProblems.json in CheckFluffInPendingFiles, checked file for version 1.1.5 checked file for version 1.1.6 checked file for version 1.1.7 checked file for version 1.1.8 checked file for version 1.1.10 checked file for version 1.1.11
And here is an example of the JSON report for the migration file that takes us to version 1.1.11:
Analyzing report data in PowerShell
In the framework we just save the report files to disk, but we can also examine the results within PowerShell. I ran this for this demonstration on the same NoData variant as above. However, as long as the preliminary.ps1 file can find the Resources folder, it should all be OK in a simpler installation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 |
cd <pathToTheProject> . .\Preliminary.ps1 #these are the post-shredding tidy-up substitutions in Regex form $ValueAlterations = @(('(?m:^)\s{1,40}?\|', ''), ("`n", ''), ("`r", '')) <#The Regex for slicing up each SQLFluff record. I've commented it to make it easier to read #> $TheRegex = @' (?m:^)(?#Get the line number )L: {1,5}(?<Line>\d{1,4}) {0,10}(?# Get the position (column] )\| {1,4}P: {1,10}(?<Position>\d{1,10}) {1,4}(?# Get the problem number )\| {1,4}(?<Problem>\w\S{1,5}) {1,4}(?# Find the description )\| {1,4}(?<Description>(?s:.){1,200}?(?=\n\w|\z)) '@ $Warnings = @() #an array to collect up all the warnings $ParsingErrors=@() #an array to store all the parsing errors $CompleteProblemData=@() #an array to store all the issues # to get the dialects, use sqlfluff.exe dialects $Dialect = switch -Regex ($dbDetails.RDBMS) { 'sqlserver' { 'tsql' } 'postgresql' { 'Postgres' } 'sqlite' { 'sqlite' } 'mysql|mariadb' { 'mysql' } default { 'Error' } } # check to make sure you have installed SQLFluff if ($Dialect -eq 'Error') { Write-error "No SQL dialect specified by $($dbDetails.RDBMS)" } if ((Get-Command -WarningAction SilentlyContinue 'sqlfluff.exe').name -ne 'sqlfluff.exe') {Write-error "please install SQLFluff.exe using Python"} <# now we get from flyway a list of all the migration files from the info command and turn it into a PowerShell object #> $Migrations = Flyway info -outputType=json | convertfrom-json if ($Migrations.error -ne $null) { # something wrong within Flyway. Need to deal with it write-warning $Migrations.error.message } else { <# work through the list of files, using just the SQL Flies that are pending. We wont do the successfully-applied files because it would upset Flyway if we were to alter them #> $migrations.migrations | ` where { ![string]::IsNullOrEmpty($_.filepath) -and ($_.type -ieq 'SQL') -and ($_.state -ieq 'Pending') } | ` foreach{ # of the right type of file. $The_warning=''; $TheVersion=$_.version; # the versio attached to the file # we'll put each file into the version folder. You might want them in a different plce $ReportLocation = "$($dbDetails.reportLocation)\$TheVersion\reports" if (-not (Test-Path "$ReportLocation")) { New-Item -ItemType Directory -Path "$ReportLocation" -Force } <# you might need to provide other configuration information here fix Fix SQL files. lint Lint SQL files via passing a list of files or using stdin #> $report = sqlfluff.exe lint --dialect $dialect "$($_.filepath)" #collect any warnings you want listed $Warnings += $report|where {$_ -ilike 'warning*'} #write out the raw report $report > "$ReportLocation\SQLFluff.rpt" #Slice up the rather odd formatting and read it into powershell #ConvertFrom-Regex is in the resources. It is for text-based data $ThisFileAnalysis = ConvertFrom-Regex -source ($report -join "`r`n") ` -TheRegex $TheRegex ` -ValueAlterations $ValueAlterations #add the current version so we know which file it was in etc. $ThisFileAnalysis|foreach{ $_|Add-Member -MemberType NoteProperty -Name 'Version' -Value $TheVersion } #Write out this list of psCustomObjects as a JSON file for later use $ThisFileAnalysis | ConvertTo-Json >"$ReportLocation\SQLFluff.json" #and build up a complete list for reporting $CompleteProblemData +=$ThisFileAnalysis } } #Report the complete list of issues $CompleteProblemData|Out-GridView #extract the parsing errors $parsingErrors = $CompleteProblemData|where {$_.Problem -notlike 'L*'} #Display parsing Errors. $parsingErrors|Out-GridView |
You’ll see the summary reports, though these are also written to the versions\<VersionNumber>\reports directory:
Sadly, in its current state, SQL Fluff trips over some perfectly valid SQL Server DDL, so I had to filter out the parsing errors into a separate report:
Conclusions
I’ve demonstrated how one might do formatting and code analysis as part of a Flyway development. I’ve used SQL Fluff because it can, like Flyway, be used with many different RDBMSs. SQL Fluff’s parser isn’t ready for SQL Server. It is interesting, and it has potential, but like a child with a stethoscope, one doesn’t really trust it to wield the scalpel on you. However, it is ideal for demonstrating how one can extract data from even an unconventional output format. For this, we provide a cmdlet that is in the resources of the Flyway Teamwork framework and in my PowerShell library.
Tools in this post
Flyway Teams
Ideal for organizations looking to improve collaboration and fine tune their processes during development and the deployment of database changes.