Product articles Redgate Flyway Code quality checks
Getting Started with Flyway SQL Code…

Getting Started with Flyway SQL Code Analysis Rules

Before you apply pending SQL migration scripts, Flyway can automatically check them for a range of code smells. The check -code command can perform a static code analysis on these scripts to check that they comply with a set of rules designed to encourage good coding practices. You can opt to use SQL Fluff, provide your own Regex rules, or use both. This article explains how Flyway's regex-based code checks work and how to start running checks on Flyway-managed databases.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Assessing SQL style, SQL habits, and code quality

Development teams generally aim to adopt a consistent SQL style that meets common standards for coding practices, formatting, and usage. However, it is not straightforward to define what we mean by “good SQL”. Coding standards vary depending on the database system, the team’s preferences, and the performance and maintainability requirements of the application.

In addition, while static code analysis helps enforce good practices, developers tend to resist rigid, one-size-fits-all rules. SQL offers plenty of flexibility in how queries can be structured, and what works well for one team or database system might not suit another.

In my commercial programming experience, two broad types of SQL code checks are useful in development (in addition to usual functional and performance tests):

  • Dynamic code analysis – examines a live database, querying the system catalog views, functions, tables, and properties to uncover potential issues such as missing indexes, disabled constraints, or inefficient queries. This type of analysis is typically done using a monitoring tool or by running targeted checks manually.
  • Static code analysis – uses either a parser or regex strings to inspect the code rather than the live database. We can run static analysis on build scripts, migration scripts, the code within functions or even the code being used by applications to query the database

Both static and dynamic analysis are required. Static analysis will alert teams to faults in the code, before they get into the ‘main’ or ‘develop’ branches of the database. However, there is only so much you can check with static rules and if you rely on these alone, you will miss many of the issues that creep into a live database, such as disabled constraints and indexes. See Running Dynamic Code Quality Checks on Flyway Databases: Table Smells for more details.

Parser or Regex-based static code analysis

There are two main ways to perform static code analysis. The first, and most reliable, method is to use a SQL parser or tokenizer, which analyzes the structure of the SQL and enforces predefined rules. However, this approach is relatively inflexible: you can enable or disable rules, but it is difficult to modify them to match team conventions.

The second approach, regex-based code analysis, offers greater flexibility. It allows you to define custom rules without relying on external tools, making it easy to enforce team-specific SQL conventions. For example, you may have a team rule that GROUP BY statements either must, or should never, have numbers, depending on the RDBMS. However, regex analysis is less accurate. Unlike a SQL parser, regex treats SQL as plain text rather than a structured language. This means it cannot distinguish between keywords, identifiers, literals, and comments, so there’s a greater risk of false positives. Despite its limitations, regex-based code checks are useful for basic SQL hygiene, and for catching patterns that would be difficult to detect with a parser, such as:

  • The use of SELECT * instead of explicit column selection
  • A missing WHERE clause in DELETE or UPDATE statements.
  • Deprecated SQL syntax, even when buried inside comments or string literals.
  • Debug and test artifacts, such as TODO comments or leftover print/debug statements.

Running Flyway code analysis checks

Flyway integrates with a SQL linter called SQL Fluff for a parser-based approach. I’ve previously explained how to run its checks through PowerShell in Reviewing SQL Migration Files Before a Flyway Migration. However, it is now fully integrated with Flyway, and once installed and configured, Flyway’s check -code command will run its rules automatically on any pending migrations.

The check -code command will also analyze the text of SQL migration files, executing both built-in and custom regex rules. It works much like a traditional GREP tool (e.g., PowerGREP), but with the added advantage that Flyway only analyzes those scripts that haven’t been applied. This makes it particularly useful for enforcing consistent SQL practices across a team.

Flyway static code analysis

The Regex rule format in Flyway

Each of Flyway’s regex rules are kept in a separate TOML file within a folder within your project. Each file can have several regexes per rule. If you don’t specify the location using the Check.rulesLocation configuration item, then Flyway looks in the ‘rules’ subdirectory in the Flyway installation for the built-in rules.

The filenames are in the format: A__B.toml (that’s two underscores) Where A is the rule identifier (e.g. MyRule01) and B is a short rule description. In the ‘code smell report’ B will be replaced by the more detailed description field in the TOML file, if supplied.

Here is an example built-in TOML rule file, to catch any use of SELECT *:

The rule file contains four main fields:

  • Name – a unique identifier for the rule. This is embedded in the filename.
  • Dialects – a list of applicable SQL dialects (e.g., TSQL, PostgreSQL, MySQL). If it is useful for all, you need to write TEXT rather than ALL.
  • Rules – a list of the regex pattern(s) used to detect violations.
  • passOnRegexMatch – this is set to false by default. It determines whether a rule is considered broken when a regex matches (false) or when it does not match (true).
  • Description – an explanation of the rule and why it may be important.

Flyway’s default regex rules

Flyway stores the default rules in the Flyway installation, within the ‘rules’ directory. This is a collection of TOML files with sample rules in them. You can find the full list in the documentation, but broadly they include rules that seek to:

  • Prevent destructive commands with a risk of data loss – such as dropping or truncating a table, or dropping a column
  • Guard against security misconfigurations – detecting commands that grant excessive privileges or make unauthorized changes to roles, users or passwords
  • Encourage best practices – finding tables with no Primary key or that lack documentation.

These built-in rules are useful for learning, but if you wish to improve them, it is best to copy them over to your project’s ‘rules’ directory and use the copy, along with any new rules you add to them. Otherwise, all your projects must share the same default rules, and you lose any improvements you make to the built-in rules the next time you update Flyway.

Adding custom regex rules

Fyway’s built-in regex rules focus mostly on security and data protection. It’s a good starting point, but you’ll soon want to add more rules to enforce or encourage SQL best practices and code hygiene standards. Redgate maintains a small library of custom rules on GitHub, which you can copy over to your project’s rules folder, but you’ll also want to create your own custom rules.

I demonstrate how to create, test and maintain custom regex rules in the second article in this series, and provide a small regex library for seeking out code smells that could at some point cause performance problems and unpredictable behavior. These smells are selected from my SQL Code Smells booklet, which documents what I consider the worst offenders. There is also a GitHub site, and a GitHub version of the book. At the time of writing, if you’re working on Windows or using PowerShell, then all new rules had to be written explicitly in the Windows-1252 file format, or else UTF-8 without BOM, to avoid an inscrutable Java error. This problem is fixed in Flyway 11.4.1 and later.

For this article, however, we’ll get started using just the default rules.

Running the regex rules

The check -code command runs static code analysis over the pending SQL migrations, running all the rules defined in filesystem:rulesLocation. You need to provide an environment so that Flyway can determine what dialect of SQL you are using, and you can specify the dialects for which the regex can be used in the rules file.

You can provide parameters to alter the way that the check runs

  • check.rulesLocation – where Flyway looks for rules. If you specify a location as a parameter, that is used instead of the ‘rules’ directory of the Flyway installation
  • check.majorRules – list of rules considered to be major
  • check.majorTolerance – the number of major rules violations to be tolerated before throwing an error.
  • check.minorRules – list of rules considered to be minor
  • check.minorTolerance – the number of minor rules violations to be tolerated before throwing an error
  • reportFilename – the output path of the generated report.
  • workingDirectory – the directory to consider the current working directory. All paths will be considered relative to this.

Make sure that there are no files in your rulesLocation other than your TOML files. Flyway will try to read any file, whatever its filetype and then ‘error-out’. A simple check -code command might look like this:

The best and most useful output of check -code is the JSON output:

JSON output from flyway code analysis

There are HTML reports for users of Flyway Desktop, but these aren’t useful within an automated pipeline.

Testing the regex rules

I’ve provided a SQL style assault course, geared for SQL Server, to test out the collection of built-in Regex rules and any custom rules you add subsequently (more on this in the next article). This test script should be saved into one of the project’s locations as a pending migration. It is also useful for developing Regex expressions. Just add to the pending script an example of a SQL Code smell that you are keen to discover.

You can run your regex code checks like this:

You’ll get a report like this:

File                                 Line Col SQL Smell          -                                 ---- --- ---------            
V1.2__SecondRelease1-1-3to1-1-11.sql 9674   1 RX001 (DROP TABLE statement)                                             
V1.3__ThirdRelease1-1-11to1-1-15.sql 1468   9 RX001 (DROP TABLE statement)                                             
V99__DummySQLSmells.sql                 2   1 RX014 (A table has been created but has no MS_Description property added)
V99__DummySQLSmells.sql                50   1 RX001 (DROP TABLE statement)                                             
V99__DummySQLSmells.sql                52   1 RX003 (TRUNCATE statement)                                               
V99__DummySQLSmells.sql                55   1 RX004 (DROP COLUMN statement)                                            
V99__DummySQLSmells.sql                59   1 RX006 (GRANT WITH GRANT OPTION statement)                                
V99__DummySQLSmells.sql               101   1 RX002 (Attempt to change password)                                       
V99__DummySQLSmells.sql               109   1 RX005 (GRANT TO PUBLIC statement)                                        
V99__DummySQLSmells.sql               113   1 RX007 (GRANT WITH ADMIN OPTION statement)                                
V99__DummySQLSmells.sql               115   1 RX008 (ALTER USER statement)                                             
V99__DummySQLSmells.sql               117   1 RX009 (GRANT ALL statement)                                              
V99__DummySQLSmells.sql               119   1 RX010 (CREATE ROLE statement)                                            
V99__DummySQLSmells.sql               121   1 RX011 (ALTER ROLE statement)                                             
V99__DummySQLSmells.sql               123   1 RX012 (DROP PARTITION statement)                                         
V99__DummySQLSmells.sql               125   1 RX013 (CREATE TABLE statement without a PRIMARY KEY)

The test script

In case you don’t have access to GitHub, or the script gets moved or lost at some point, here it is in full:

Conclusion

Flyway’s regex rules provide a configurable, lightweight approach to SQL code analysis. While not as powerful as the checks that use full SQL parsers, they catch many common mistakes and enforce some best practices. Their great advantage is that they are configurable.

A static code analysis is always useful because it checks what is in the SQL Script rather than the resulting metadata. As such, it provides a quick, lightweight safety net to prevent bad practices in migration scripts, or to ensure that code inserted only for testing or debugging is removed before the script gets into the migration chain.

This article gets you started with Flyway’s built-in rules but very soon you’ll want to start adding your own. In the next article, I describe how I develop and test new rules using a tool such as Regex Buddy and point out a few unique ‘gotchas’ around the Java regex syntax. I’ll also explain what I think is a more maintainable way of managing the rules, as your library grows, using a single JSON rules file and then auto-extracting from it the individual rule definition (.toml) files.

Tools in this post

Redgate Flyway

DevOps for the Database

Find out more