Product articles Flyway Database migrations
Database Development Visibility using…

Database Development Visibility using Flyway and PowerShell

The payback of DevOps is not simply in automation but in using that automation to increase the visibility of the development processes. This article demonstrates way to make Flyway developments more visible, regardless of your RDBMS, such as by providing a detailed migration history, and change reports that reveal detail of what is going on to all involved.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

For the shy database developers among us, participating in an organizational database-driven application can be a bit of a culture shock. Suddenly, a lot of people within and beyond the IT department need to know what you’re up to. It isn’t just your development colleagues who are involved with your work, especially in a DevOps setting: it is also the customers of your database, and those who use it for reporting. They need to be sure that it is on track to deliver what they need, that it complies with the legislation, and is secure.

In these circumstances, database development, if it is to be done quickly and well, needs to be visible.

Why should database development be so public?

The changes you make to an ‘organizational’ database, and the time it takes you to add features, will likely affect several applications, and many people; you need to give as much visibility as possible into impending changes, as early as possible.

Those responsible for downstream reporting, for example, need plenty of notice of impending changes and additions, especially if it affects their reports, as will application developers, specializing in interfaces and reporting. Testers will need to get advanced warning on what they’ll need to test and when their services are needed. Team coordinators and managers will need to know what aspects of the database development are in danger of slipping their schedules or might be spiraling into technical debt. There are a host of other experts that need to get a broad view of what is going on so they can advise at a time where a design is easily changed.

Of course, of all the colleagues that take an interest in database development, it is the other database developers that most need to be able to ‘see’ what is going on. This is because database development is even more prone to collisions and dependency dilemmas than application development. Ideally, you, as a developer, would want some sort of indication within the IDE that you’re using, that someone else is sneaking up on that table you’re depending on in that process you’re developing. You might also need to see what other work is going on, especially if, for example, you’re helping part-time with a long-term feature that is due to be released in a while.

Many database teams still use the old ‘shared model’ of database development because such problems manifest instantly but never become a festering issue. Brief cries of pain at integration test time, but quickly repaired. The other great advantage is that, depending on the relational database system you’re using, you know from the metadata, or from system messages, that a database object has changed, and when it happened. That’s important for team-working on a database, so database developers generally soon become experts in metadata queries.

Nowadays, branch-based development is more efficient, and allows for the separation of discrete features to the extent that their release can be timed appropriately. However, this approach will lead to collisions and conflict at merge time, unless team processes and disciplines are in place to help maintain the sort of visibility and immediacy that comes naturally with the shared model. This article will describe how to provide this, by saving all the information you need in JSON and from it producing the necessary change reports, migrations histories and so on.

Making Flyway developments more visible

In any Flyway project involving several developers, the task of integrating all the strands of work can get quite complicated. There are several different ways of integrating the work. Some developers, particularly those who are more used to Java or C# development, will take a workstation-based approach and leave integration and coordination to be accomplished within source control. This is what I call “branching by stealth”. It usually involves an individual developer creating a branch in source control and then working away in private on their workstation till there is a sudden merge request. It puts a lot of faith in the source control system to take care of the merge and help resolve conflicts.

Due to my experience with application development in large organizations, I prefer to take a network-based approach to database development that ensures that all project resources including branch databases are on a shared network and available to the whole team. I’d call it “branching upfront”, rather than branching by stealth and it encourages team members to work cooperatively on feature branches, spot and resolve potential issues early on, and leave source control do what it does best.

Understanding the current branching

A Flyway development, when done as a single stream of migrations, doesn’t scale. You must branch and merge, and you do this with a separate migration chain for each branch. These branches need to be network-based to allow team members to collaborate and to be able to quickly move between branches and participate in them. This means working with a hierarchy of Flyway branches, each branch supported by its own database. You cannot do it by working in isolation on individual workstations because you’ll get collisions from changes to dependent objects and objects that are depended on, and subtle time-wasting bugs from uncontrolled changes.

Flyway works smoothly when you change your current working directory to a branch folder that contains a Flyway.conf file. This file basically gives flyway the credentials, locations, settings and placeholders to maintain a database. I’ve explained the ‘Flyway branching’ model I use in previous articles; see Flyway Branching Walkthrough and What is the Flyway Teamwork Framework?

The examples in this article using the pubs project folder, in my Flyway teamwork project on GitHub.

What is the branching hierarchy?

It is easy to see the Flyway branching hierarchy. I just run a PowerShell function:

This produces:

[
    {
        "main":  "Pubs",
        "Variants":  {
                         "variant":  "NoData"
                     },
        "Branches":  {
                         "branch":  "develop",
                         "Variants":  {
                                          "variant":  "NoData"
                                      },
                         "Branches":  [
                                          {
                                              "branch":  "Accounting"
                                          },
                                          {
                                              "branch":  "Reporting"
                                          },
                                          {
                                              "branch":  "search"
                                          }
                                      ]
                     }
    }
]

It is a bit easier on the eye in YAML:

…and even easier when viewed in PlantUML. Just pop a @startyaml and @endyaml header and footer round the above code and paste it into PlantUML server:

Flyway branching hierarchy in YAML

Investigating branch configuration

So now we know what branches there are, but perhaps not all these branches are active and contain flyway.conf files. Also, for those that are active, what are the settings? For which database is each branch configured? We can use a function for this.

This produces a scripting object that saves all the settings being used by each branch. Once we have this information, we can save it as JSON or query it to inspect each branch’s settings. This is how we can see what is in the flyway.url settings, using the above cmdlet, which is included in the framework.

This will list out the flyway.url configuration for each active branch that has a configuration file in it. This tells us the RDBMS, server and database being used in each active branch.

Saving a detailed migration history

The trickiest aspect of Flyway is knowing when a migration was first applied and who did it. This is particularly important with the two permanent branches, main and develop because it can help in tracking bugs. It can help with source control. It also helps us to know who to ask or blame.

Flyway relies on the Flyway schema history table for this information, but this table can easily be removed by an injudicious Clean. You can, of course, disable the ‘Clean’ command in the Main or Develop branches, but in other branches, it is wise to ‘never say never’. It is better, I think, to write all out this Flyway migration information every time after every migration run, but only if the files you write to don’t already exist. The following code does just that. When we run it as a post-migration task in Flyway (more on this shortly), it will write out all the migration information the first time each version is created, along with the name of the developer who did it.

All the existing versions that don’t already have it, will now have the ApplyInfo JSON report in the Report directory for that version, explaining who did it, when and how long it took. This makes the information less volatile than it is in the Flyway Schema History table. The report would have to be deleted before it could be replaced, but it should already be saved to source control anyway when first created. Here is a YAML version of the JSON report for each version produced by the code, which is rather easier to read.

Flyway Applyinfo report

Each time a migration is done, the following information from the flyway info command is also put in a Migrationinfo.json file in the current directory:

Flyway Migrationinfo report

finally, we also have the information about the start of the branch, such as when it was started, who did it and with what script. We store this Creationinfo report in the root of the Versions folder.

With the framework, all this can now be done automatically, using a scriptblock task called $SaveFlywaySchemaHistoryIfNecessary, which works for any RDBMS. It can be used either in a PowerShell script callback (Teams only), like this:

Or in a conventional script, as part of a batch of scriptblocks that are called after a migration run:

And every version will have an account of the first time every migration was done:

Applyinfo JSON report

Detecting changes

If you adopt the idea of storing a ‘model’ of the major parts of the database (Schemas, columns, tables, views, procedures, functions, constraints, indexes and triggers) on every change, then it becomes a lot easier to detect change. You just compare the two models.

There are already tasks in the framework that store a JSON model for every version ($SaveDatabaseModelIfNecessary), and one that provides a narrative of changes between versions ($CreateVersionNarrativeIfNecessary), with a report in a markdown file. My article What is the Flyway Teamwork Framework? provides a demo of this for a PostgreSQL database, writes a complete version-by-version story of what is happening with an individual branch, for every migration run.

flyway version narrative report

This is just done by interpreting the result of the comparison, but there are plenty of ways of doing that. To make it easy to write custom reports, the framework also writes out the results of the comparison as a CSV file …

metadata changes csv report

… and a JSON report

metadatachanges JSON report

You can use the information in these reports to create your own messaging and reporting system based on what has changed. You may use an internal website or wiki or use messaging systems such as Slack.

The obvious application would be one that reads the latest metadata changes from all the branches of the project, and either alerts team members or does a daily ‘bulletin’ of changes. Any PowerShell process that uses the framework will have a Hashtable available that gives all the necessary information.

Eyeballing the migration code

You might well think that you could more easily find out what changed by just looking at the migration code. I had to do exactly that to check that the metadata code worked. I’d forgotten that I’d included in a few SQL handbrake turns, and the result was that it isn’t at all obvious from the migration what is going on. I think I must have been feeling bored. However, it serves to illustrate the fact that a SQL script isn’t always self-explanatory. A quick glance will convince you that no parser could ever detect what had changed.

Summary

For me, the heart of the effectiveness of DevOps is in automating as much as possible of the development processes and using automation to increase the visibility of the detail of what is going on. This visibility becomes essential as the number of people involved in, and with responsibility for, the development of a database increases.

Database development tools can no longer be IDEs that take you from soup to nuts through requirement gathering, design, implementation and deployment. To extend the cookery analogy, the support and tooling of database development is becoming increasingly like fusion cuisine in its flavor, with a wide range of specialized ingredients for messaging, bug-tracking, code quality, source control, database comparison, monitoring and provision.

Flyway’s great strength is that it fits into this more eclectic way of supporting development work. It is a command-line tool that is easy to integrate, especially if you can use a framework to do the basics, and it is an ideal way of eliminating the horrors of version-confusion in testing and deployment.

Tools in this post

Flyway

DevOps for the Database

Find out more