Product articles Flyway Database migrations
Cross-RDBMS Code Quality Reports in…

Cross-RDBMS Code Quality Reports in Flyway

Before you commit your Flyway migration files, you may want to run some automated checks for style or 'code smells'. This article demonstrates how to run basic cross-RDBMS code quality checks using SQL Fluff. We analyze the results in PowerShell to produce reports and analytics on the number or types of issues found.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

When you’re initiating a database development of any consequence, one of the first matters that needs to be nailed down hard is that of coding style and standards. Without these, databases can prove very difficult to maintain or extend. Even you, the developer, can sometimes forget the full nature of that ‘very clever code’ you wrote a while back.

One should pause for thought, though, before implementing any sort of ‘rules’ on SQL coding style and usage. SQL was originally designed as a language to be as close as possible to spoken English. The idea was that it could be used by any lay person such as a business analyst, or an accountant. A SQL Statement is like a sentence. Programmers brought up on a language such as Python often feel instinctively uncomfortable with real declarative code.

You can, of course, use whitespace to make SQL easier to understand. However, your helpful whitespace additions, when turned into inflexible rules, may help understand some statements but will obscure others. There isn’t, and can never be, a single, ‘correct’ way of laying out SQL, just as there is no single correct way of laying out this text you’re reading.

Even more of a culture-shock to an application programmer is the fact that any SQL declarative code, such as a query, is merely a description of the result wanted, not a precise instruction of how it is carried out. Just as there is no ‘correct’ way of styling SQL, there is no single, correct way of using it, and no hard and fast set of rules that determine what is or isn’t a ‘code smell’

Life is short and is best spent on more creative work than lining up columns and rows in SQL code or scanning line by line through someone’s code looking for ‘smells’. If you find yourself often tapping the space bar, it is time for a career-change. Some SQL Coding tasks must be automated, even when you’re using a text editor or IDE.

If your team decide on a SQL Code standard for layout, and usage, you need to find a way to get the tools that will automate these tasks, and will, when the team changes its mind, make the changes effortlessly!

What sort of SQL code tasks can be automated?

There are a few automated tasks that need to be considered to help maintain a coding standard. There are SQL Minifiers out there, though it isn’t clear why this would ever be necessary. There are tools for inserting comments where necessary.

Then there are ‘prettifiers’. I once wrote one in SQL for rendering colorized code in HTML for documentation or publishing on a blog. There are also tools with the more radical ambition of formatting code. The most ambitious tools will also do SQL Code analysis, for picking up potential issues or ‘code smells’ though it is almost impossible to find a way of automating the correction of the code.

There are plenty of other tasks that could be automated. I would go further and look for code generators to do routine stuff, such as writing views. I also advocate using an ER Diagramming tool for the chore of creating or altering tables.

Code Formatting (aka ‘linting’)

Formatting or ‘prettifying’ code means making changes that affect only whitespace, such as line breaks, tabs and spaces. It doesn’t change the meaning of that code in any way.

It might seem a trivial task and in some cases it is, but not for SQL. The nature of SQL code is such that no two developers can ever agree on what is the correct format. To be useful, SQL Formatters must be easily configured, and operate with quite complicated rules, involving capitalization, line-breaks, handling of semicolons, and commas, indenting and so on. The rules must be varied according to where lines are folded and how long a SQL expression or clause may be.

There are several SQL Formatters, but SQL Prompt is the best-known one for SQL Server. I’ve written previously about how I use it as a layout tool: SQL Prompt as a Layout Tool: A Survival Guide. You can also check out it’s formatting capabilities using a free online version.

SQL Prompt is interesting for our use of Flyway with SQL Server because there is an accompanying CLI-based formatter that can take your formatting spec from the version embedded in SSMS. Unfortunately, it cannot format any other dialects of SQL

Code analysis

Whereas a SQL formatter is concerned with whitespace, a SQL Code Analysis tool will pick up issues that require a code change, such as using TOP/FIRST without an ORDER BY clause, using unnecessary aliases, or using deprecated code. A SQL Code Analysis tool is, in some respects, more difficult than a formatter, and there are still differences in opinion about what constitutes a ‘code smell’.

SQL Prompt also come with a built-in code analysis engine that I’ve written about extensively. You can establish all your rules in the UI and then run them automatically using the command line component. I’ve already shown how to use this in Flyway (but again, it’s SQL Server only).

When to check coding styles and standards?

I like to format code while I’m working on it. However, without a suitable SQL Formatter within the IDE, such as SQL Prompt, it makes sense to fine-tune your layout after you’ve tested the code.

It is possible, if you’re using Git, to use pre-commit githooks that are triggered just before a commit is made. With a CLI-based tool, it is possible to do bulk style changes to your entire source, or even abort the commit if a code analysis finds one of the more egregious code quality issues.

If you want to run these checks in Flyway, it must be before the migration is applied to the database. You can’t reformat any successfully applied migrations, or fix any smells, unless you clean the database and re-run the entire migration run because even whitespace changes will cause checksums to disagree.

Fluffing your SQL Code in Flyway

We’ll use SQL Fluff to run the code quality checks for formatting and ‘code smells’, merely because runs in a CLI, and it professes to understand several SQL dialects, each associated with a relational database system. It is positioned halfway between code analysis and formatting and offers to fix some of the formatting ‘problems’ as well. It has potential but is at an early stage of its development.

I’ve already shown how we can use it with Flyway in a previous article, Reviewing SQL Migration Files Before a Flyway Migration. In this example, we will go rather further than just running the formatter. With JSON output, we can import the data so that we can run reports and analytics on the number or types of issues found.

Again, we only want to check out the pending migrations because we can’t alter any code that has been successfully applied by Flyway because it will detect the change and complain. Fortunately, it is easy to get a list of pending migrations from Flyway using the flyway info command.

Running the reports

We don’t need to run this check at any particular point, so I’ve also added it to the Flyway Teamwork utilities as a cmdlet ($CheckFluffInPendingFiles) that can be run whenever convenient.

Details of how to install the Flyway Teamwork PowerShell framework are here: What is the Flyway Teamwork Framework? With that done, here is the simple code to run the SQL Fluff report:

In the Flyway project I used, the database was at version 1.1.4, with all subsequent migrations ‘pending’. This produces the report files for each pending migration file, each in its own version directory.

Process-FlywayTasks $dbDetails $PostMigrationTasks
Executed CheckFluffInPendingFiles
For the CheckFluffInPendingFiles, we saved the report in <MyPathTo>\develop\Variants\NoData\Versions\FluffProblems.json
in CheckFluffInPendingFiles, checked file for version 1.1.5 checked file for version 1.1.6 checked file for version 1.1.7 checked file for version 1.1.8 checked file for version 1.1.10 checked file for version 1.1.11

And here is an example of the JSON report for the migration file that takes us to version 1.1.11:

SQL Fluff JSON report

Analyzing report data in PowerShell

In the framework we just save the report files to disk, but we can also examine the results within PowerShell. I ran this for this demonstration on the same NoData variant as above. However, as long as the preliminary.ps1 file can find the Resources folder, it should all be OK in a simpler installation.

You’ll see the summary reports, though these are also written to the versions\<VersionNumber>\reports directory:

Summary report of code issues

Sadly, in its current state, SQL Fluff trips over some perfectly valid SQL Server DDL, so I had to filter out the parsing errors into a separate report:

SQL fluff parsing errors

Conclusions

I’ve demonstrated how one might do formatting and code analysis as part of a Flyway development. I’ve used SQL Fluff because it can, like Flyway, be used with many different RDBMSs. SQL Fluff’s parser isn’t ready for SQL Server. It is interesting, and it has potential, but like a child with a stethoscope, one doesn’t really trust it to wield the scalpel on you. However, it is ideal for demonstrating how one can extract data from even an unconventional output format. For this, we provide a cmdlet that is in the resources of the Flyway Teamwork framework and in my PowerShell library.

Tools in this post

Flyway

DevOps for the Database

Find out more