Product articles Flyway Database Testing and Quality
Running Dynamic Code Quality Checks on…

Running Dynamic Code Quality Checks on Flyway Databases: Table Smells

How to extend the range of SQL code analysis, during database development, to include dynamic analysis of the database metadata. This will allow regular checks for problems with the design of your tables and indexes ("table smells") that can affect database performance and data integrity.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

What is dynamic code analysis?

When you’re doing a routine build of a database, it is very useful to run a static code analysis. This is a SQL syntax check for mistakes or problems in the source code used to build or alter a database, which could reduce the performance and reliability of that database. Static code analysis will detect many common ‘code smells’ in your databases, but on its own it isn’t enough, because it will miss many of the mistakes or deficiencies that cause performance problems in a live database. You must also access the database metadata, through a live database connection, to check for missing indexes, duplicate indexes, and poorly designed tables, including tables that have no unique constraints at all, and tables that have no primary key (called ‘heaps’). These dynamic checks are best done using SQL to access the metadata functions.

You’ve all got favorite SQL scripts to find these sorts of problems, or ‘table smells’, and I have mine. The PowerShell automation I’ll demonstrate in this article will automatically run my SQL ‘metadata sniffer’ script on a Flyway-managed SQL Server database. I like to save the report in a JSON file because it is more easily processed by PowerShell. This article will demonstrate how to do that.

Why do we need dynamic code analysis?

If you do a regular full build as part of your database development cycle, you can be pretty sure that the result will be what the code intended. If, however, the database is being continuously migrated between versions, then this won’t necessarily be the case. For example, a fresh build of a database will have all constraints enabled. However, for an existing database that you are migrating, you may find out that, at some point, a developer disabled all constraints for a table, or for all tables, to do a data import, and then forgot to switch them back on. That sort of change could linger in the migrated database, undetected by Flyway or by static code analysis, and can sometimes make its way into migrations run on the production database.

To get a report of all the ‘table smells’ you’ve forgotten about in designing your database, or that have crept into a previous version of a migration, you need to run a dynamic analysis of the live database. This sort of dynamic code quality check is particularly important with a Flyway migration. If you don’t do it and, in our example, constraints aren’t switched on, then performance tests will be puzzlingly slow for a migrated database compared to a fresh build of the same version. Worse, if a trigger is switched off, say during debugging, and isn’t switched back on, then a process may fail, and data can get lost or mangled. I could do on: please believe me, this sort of test is like a fire alarm. It is valuable even if the alarm never goes off!

Running code quality checks in a Flyway development

What I’ll be doing is creating a script block task ($ExecuteTableSmellReport) that runs a set of code quality checks, using various metadata functions, and produces a ‘table smells’ report, in JSON format. You can run this as a standalone task, but it can also slot into the PowerShell framework that I use to do Flyway development with Redgate Deploy.

There are two problems in demonstrating how to do the various chores in building and scripting databases. The first is that every development and deployment system for databases I’ve experienced or come across are different. If there is a single yellow brick road to stress-free database deployment out there, I haven’t found it. The second problem is that any build system can get complex unless you actively resist it. To get around both these problems, I’ve used, in these articles on Flyway, a rather unconventional approach to scripting that is more akin to Lego modelling. I use script blocks to represent each task and chain them in an array to define a process. In this way you can adapt it to the way that you prefer to work and that suits your team, business and application. If you’re thinking of using Flyway, this system will get you airborne.

Which table smells are reported?

The idea is that every version of the developing database should have an easily accessible report on any issues to do with tables and their indexing. As each database version is immutable, it only needs one Table Smells report, done once. I find it a good memory-jogger. There is nothing worse than having a heap that works perfectly well with test data but slows to a crawl when the data pours in, once the database is in production. This will tell you if a table has one or more of the following issues that need investigation or might raise eyebrows. Some of these won’t be seen in a newly built table but can creep into any existing database that is being successively migrated from version to version, rather than being built from source.

This report will tell you if a table…

  • is a wide table (set this to the number of columns you consider to be ‘wide’)
  • is an undocumented table (no Extended property explaining what the table is for)
  • is a Heap (An unordered table, which in SQL Server means a table without a clustered index)
  • Has a GUID in a clustered Index (performance)
  • Has no Primary Key (no obvious way of retrieving even a single row)
  • Has a disabled Foreign Key (will cause performance problems)
  • Has untrusted FK (won’t be used in indexes)
  • Has a foreign key that has no supporting index (it generally helps performance)
  • Is unrelated to any other table (an abandoned feature?)
  • No candidate key (no unique constraint on column(s))
  • Has no index at all (High performance-risk-effectively write-only once it grows in size)
  • Has disabled Index(es)
  • Has leftover fake index(es). A ‘hypothetical index’ is used in debugging and such an index can be neither considered nor used by the query optimizer
  • Can’t be indexed (an index can be created on a table only if there is a column that isn’t a string of more than 900 bytes.)
  • Has disabled constraint(s) (was that intentional?)
  • Has untrusted constraint(s) (won’t be used in indexes)
  • Has ANSI_NULLs set to OFF (can cause problems in SQL server)
  • Has a column collation different from the database (Did you mean to do that?)
  • Has a surprisingly low Fill Factor (can affect performance)
  • Is not referenced by any procedure, view or function (is it an abandoned feature?)
  • Has a deprecated LOB datatype (used back when we wore flares and had mullet hairstyles)
  • Has unintelligible column names (probably done to try to make it look more scientific)
  • Has non-compliant column names (unnecessary, and can complicate development)
  • Has a trigger that hasn’t got NOCOUNT ON (can result in cluttering the message stream)
  • Has a disabled trigger (needs investigating. Can happen by accident)

Generating database code quality reports

The $ExecuteTableSmellReport script block task that runs these SQL checks and generates the report is included in a PowerShell file called DatabaseBuildAndMigrateTasks.ps1. It contains a host of other tasks, as described in my previous article Running SQL Code Analysis during Flyway Migrations, along with a helper cmdlet that I’ll describe shortly, called Process-FlywayTasks, which makes it much simpler to execute the chain of tasks you need, before or after you run a Flyway migration.

You can find this file in my GitHub repo and once you have it, you can include it at the start of your script that connects to the database and runs the list of required tasks. The simplest way to do this is to put it in the same directory as your Flyway migration scripts, and then pull it in with code at the start of each script, like this (I’ve included a sample script called FlywayCodeSmells.ps1 in the same GitHub directory):

Each task writes to the hash table that is passed between them and will let you know the directories that it uses. It also tells you about errors, exceptions and warnings.

An ad-hoc code quality report

Purely to test out the report, we can just supply the parameter set that we need to locate the database we want to check, supply our credentials, and so on, on the command line. Of course, doing this isn’t a good idea; you really don’t want passwords, User IDs, server names or database names in PowerShell scripts unnecessarily.

The result should be a TableIssues.json report, saved in ..TheProjectName\TheVersionNumber\Reports. It will be immediately apparent that here you have a task that can be easily adapted to provide any SQL-based report, particularly if it produces a JSON output. This report is to get you started: you can adapt it to your own purposes.

Generating a code quality report during a Flyway migration

Here is a version that runs after a Flyway migrate:

Saving the parameter set securely

To avoid supply having to supply these credentials every time, and in an insecure way, we instead use the $FetchAnyRequiredPasswords task to fetch and save the password from the secure user area, and the $FetchOrSaveDetailsOfParameterSet task to save a named parameter set for each database, again saving the details on disk in the user area. I’ve described how the latter works in a previous article.

To execute this task, along with other tasks, we can now use code like this:

Using the Process-FlywayTasks helper cmdlet

This can all be done much more simply by executing the Process-FlywayTasks cmdlet that is provided as a helper function in the task file (DatabaseBuildAndMigrateTasks.ps1) .

Or, if you need to access the hash table afterwards you can do it a slightly more complicated way that preserves the hash table but is otherwise the same. You first create the $DatabaseDetails hash table, so that the reports can be referenced after the tasks have executed.

Table                     Problem                                      
-----                     -------                                      
dbo.flyway_schema_history Undocumented table                           
dbo.flyway_schema_history unrelated to any other table                 
dbo.flyway_schema_history not referenced by procedure, view or function
dbo.stores                not referenced by procedure, view or function
dbo.discounts             not referenced by procedure, view or function
dbo.pub_info              not referenced by procedure, view or function
dbo.roysched              not referenced by procedure, view or function
dbo.sales                 not referenced by procedure, view or function 

The Table Smell Report task

We’ve seen the report it produces, and all that remains to explain is the code that executes the ‘table smells’ task. It uses SQLCMD, so the first thing you must do before using the task is to provide the alias for it. I use SQLCMD because it is quick, has some wonderful ‘extra’ features for doing complicated things, and because it requires no extra libraries. It will be immediately apparent that this sort of task can be adapted to all sort of SQL-based reports, particularly those that have JSON output.

Conclusions

Although it is, hopefully, useful to have a report about Table Smells, this article was also aimed at showing how one could go about creating any task that got a JSON report from a SQL query. This represents many possibilities for tasks that are designed to make checks on versions of databases, including running integration tests and unit tests. You’d probably want to pull in a whole directory of SQL-based integration and unit tests, and execute them one after another, with a standard JSON result.

Now, with this sort of task provided for to use along with a Flyway migration, you now have the basis for the build/migration itself together with the added scripts that give you integration with existing release systems as well as the basic routine quality reports and the tests.

The system may seem complicated compared with doing it ‘by hand’, but when you factor the time, the boredom and the inherent inaccuracy of the manual approach, and the horrible possibility that it just won’t get done routinely because of the pain, you’ll see why I’ve adopted the automated approach. Daily build and release? Yeah. Why not?

Tools in this post

Flyway

DevOps for the Database

Find out more