Product articles Flyway Database migrations
Reporting on Changes Made by Flyway…

Reporting on Changes Made by Flyway Migration Scripts

Phil Factor demonstrates some PowerShell tasks that will produce a high-level overview, or narrative, of the main differences in the metadata between two versions of a database, during Flyway Teams migrations.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Often, during development, it’s useful to be able to see, at a glance, which objects have been changed, between successive versions of a database. It is handy for such things as merges, after creating a branch, or for generating reports of what changed in what version.

I’ve previously shown a cross-RDBMS way to generate these change reports, by creating generic models of the database from ODBC metadata, in Discovering What’s Changed by Flyway Migrations. It works well enough and can be used by most databases that have JDBC connectivity, but it lacks some of the details you often need, especially with indexes and keys.

In this article, I’ll show an alternative approach for those who are using predominantly a single RDBMS (SQL Server) with Flyway and who need some of the extra detail that just isn’t available in the ODBC metadata. It’s based on my “Gloop” method of saving the details of the metadata of your database, including tables and routines, in JSON files. It is intended to be used with Flyway, SMO or Redgate Deploy. It works hand-in-glove with SQL Compare, providing the reports that SQL Compare can then use to home in on the details.

Keeping track of database design changes

Even if you are solely responsible for a database, it is useful to know when changes were made. If you are doing teamwork, it can prevent collisions taking place. If you are adopting branching and merging teamwork strategies, it becomes essential.

If you can keep a record of your database in a structured form, then you can track changes in the design of the database, over time. Traditionally, we used to do this by saving build scripts into source control and seeing the differences in the script via the ‘diff’ tools provided with the source control system. After all, what is a build script if not an accurate description of the structural schema of your database? Predictably, though, the method breaks down as the database grows. It is also a very manual process. To make matters worse, database developers won’t always work directly from a script, but prefer to use ER modelling tools, table-builders and so on: this means that scripts are likely to be generated. These are migration scripts, and they must be applied to the database, from which you then generate a build script. You then can’t track changes because the script may use slightly different syntax or script the objects in a different order, so you’ll inevitably find changes flagged-up where no change to a table or view is made.

At some point, you adapt to using object-level build scripts for each table, view, procedure and so on, which are better able to tell you where the changes happened. That’s fine as far as it goes, but tables, especially, can become remarkably complex and so you need to be able to drill deeper the details of what changed. Even a simple change to a constraint can cause a database release to fail.

Is there a better way of comparing databases than by using a build script? Sure. Relational databases can be represented as hierarchical networks. Databases have schemas that have tables that have columns, that have all sorts of attributes. In truth they are more like a network than a hierarchy if you add in dependencies. By creating a model of a database, rather than a build script, it is much easier to compare different versions of the same database to see what has changed. This is the way that any database comparison tool works, including SQL Compare. It allows database developers to get a summary of what has changed between versions of a database.

Recording database metadata in JSON

The Gloop technique works by saving the state of the database metadata at various points, in JSON form, so that we can use it to verify that a database is at the version that we think it is and to produce a narrative of changes or detect drift.

Unlike SQL Compare, it only checks the main objects, ignores SQL Server-specific features such a query store, and doesn’t check scripts within procedures. It has no visual interface and doesn’t generate any scripts. It’s not the point, because its use is merely to provide a general narrative of changes between versions to allow database developers to get an overview.

Getting a database’s metadata as a JSON document

I want a hierarchy that can be represented by JSON and is lean enough to be easily stored and scanned. SQL Server can be used to produce hierarchies by means of outer joins and the FOR JSON AUTO keywords. I’ve used this technique elsewhere to create and maintain a ‘data dictionary’ for your databases, in JSON format.

It is a good start, but it cannot produce lean hierarchies. When you think about it, object names, column names, type names and index names in relational databases are keys, so this would allow us to assume that they are unique and so to use the names as hashtable keys, where possible. In this way, we can make hierarchical paths closer to the RDBMS naming conventions.

We can do this ‘lean’ conversion in PowerShell, after getting the JSON document from a single SQL query. The SQL Query can get us a long way towards what we want, but it needs extra help to get to this level of compactness, which is intelligible to most database professionals at a glance because it uses database names and database terms as keys.

You can see that this is quite an accessible record of what’s in the database. We can compare objects like this easily in PowerShell to see changes in the database objects we’ve chosen to return. I must emphasize that you need a comparison tool to get a definitive comparison: here we just want the overview for our model because we only want to track changes in the objects in which we’re interested, for development work.

The single query that produces the database model, in a JSON document, is quite long, so I’ll just point you to the GitHub Repository where I keep it, TheGloopDatabaseModel.sql.

Download the file and we’ll run it in from the following PowerShell test routine, which transforms the JSON model into its’ sparse form. Under the covers, I’m converting it to PowerShell object notation which is then executed. I do this to be certain of the form of the resulting object. Doing it in a pipeline can be like nailing jelly to the ceiling.

Generating database models during Flyway migrations

Once we’ve satisfied ourselves that it works, we just need to transfer it into a form that allows us to add it to the list of tasks in our DatabaseBuildAndMigrateTasks.ps1 PowerShell library.

When we execute a migration run, in Flyway, it will trigger a script callback that detects the version number and fills in our credentials, before calling the $SaveDatabaseModelIfNecesary task to generate a database model for the new version.

The afterversioned callback script

This is the body of the afterVersioned callback, which I’ve called afterVersioned__Narrative, and which I’ve included within the Scripts folder of the GitHub project. It is called after the final versioned migration script in a migration run completes. I can’t use an afterEachVersion callback because all the ‘Each’ callbacks take place within the same transaction as the migrate, which gives Flyway its rollback abilities. Any metadata SQL call that happens in this callback is likely to deadlock the database.

When we run the Flyway migrate command (later) we need to pass into the callback the flyway ‘schemas’ list, via a placeholder. It isn’t strictly necessary for SQL Server, but it is essential for other database systems.

Trying it out

To see this in action, I’ve provided a test harness that, first, runs a series of Flyway migrations, generating a database model for each new version, and then produces a summary report of the changes made in each successive version.

It’s all one script, stored here as CreateNarrative.ps1, but I’ll demo it in those two parts, first generating the models then producing the change report.

Generating a database model for each flyway migration

To make this happen, you’ll need to copy the sample Pubs project from GitHub, here in PubsAndFlyway/PubsFlywayTeamsMigration/. You’ll also need to install Diff-Objects and Display-Object from my Phil-Factor/PowerShell-Utility-Cmdlets.

Normally, if you were astute enough to already be running the afterVersioned__Narrative.ps1 file in your Flyway Teams projects, then every version you get to will already have that lean JSON model of the database, so you can move smugly on to the report-generating part of this article.

Otherwise, you’ll need to run the following harness. It runs the sample Pubs database through eleven versioned migrations, one by one. For each version, the call to the $SaveDatabaseModelIfNecessary task, from the script callback, writes out its JSON database model. Make sure that you have afterVersioned__Narrative.ps1 in the Scripts directory (it may have a ‘DontDo’ prefix; if it has, remove it).

If all works well, your copy of the Pubs database should now be at v1.1.11 and in the specified location for your project reports ($MyDatabasePath in the DatabaseBuildAndMigrateTasks.ps1 PowerShell library) you’ll find a subdirectory for your database, and in that, a JSON model has been generated for every version:

A JSON model of the database metadata

Getting the “diff” reports

What we want is a report of the changes for every version as a markdown file that looks nice in GitHub. The second part of the test harness reads every model, in order, and compares it with the previous version. It then generates and publishes a markdown report providing a narrative pf the changes between each version.

This part of CreateNarrative.ps1 requires my Diff-Objects and Display-Object PowerShell cmdlets (the latter is used by the former) to perform the “diff”, both of which you can get from my Phil-Factor/PowerShell-Utility-Cmdlets. I explain how it works in a Simple-talk blog, and used the same cmdlet in my Discovering What’s Changed by Flyway Migrations article, which demos the alternative, cross-RDBMS technique of generating models of the database from ODBC metadata.

If all went well, each version would have three new files, which should provide you with enough flexibility to generate a report to taste. There will be a narrative.md that explains in what way the version changed the metadata. There is a CSV report of the changes and their location, and a JSON version of these metadata changes:

Change reports for a Flyway migration

The narrative.md report for each version describes what changed. In GitHub (or Typora, or similar), this will look like this, for one migration chosen at random:

What changed in each database version

In the root of the output folder, you’ll find the NarativeOfVersions.md report, which details what changed across all versions, the start of which should look something like this…

Summary report of changes across all database versions

Conclusions

It may be hitting you that if you make the build or migration process the heart of your database development work, you have a much better chance of providing the means of controlling and documenting what is happening and giving the whole development process better visibility. This is essentially what I’m trying to illustrate here, with the production of the narrative. With script callbacks, a lot of work can happen entirely automatically, producing analysis, code reviews, alerts or documentation. OK, it is a fair amount of up-front work, but once it is bedded down and working, it grids away to liberate you from the stultifying chores that would otherwise conspire to make your job a dull one.

Tools in this post

Flyway

Version control for your database. Robust schema evolution across all your environments and technologies.

Find out more