Product articles
Flyway
Database Testing and Quality
Automating Flyway Development Chores…

Automating Flyway Development Chores using Database Diagrams

If you can produce a quick Entity-Relationship diagram for any version of a database, you'll have a simple way to 'sanity check' it for unreferenced tables, missing keys and other design flaws. This article shows how to auto-generate these diagrams when running a migration, using Flyway Teams and PowerShell.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

You might think that the only purpose of Flyway is to apply migration scripts to a database, in the right order, to create whatever version of the database you wish. In fact, Flyway lends itself to being the centre of a development and deployment system and can help you automate many other development chores.

Developing databases is different to developing procedural code. Developers need more information and need to automate far more of the routine ‘chore’ work. Luckily Flyway, with its architecture of placeholders, callbacks and configuration files, is designed to make automation easy.

Catering for ad-hoc reports

Whenever Flyway does a migration run, it inspects the ‘locations’ that you’ve specified in the locations configuration parameter for any callback script that it must execute. If you’re using the community version of Flyway, you can use either SQL or Java callbacks. With Flyway Teams, you can also use PowerShell.

The sort of task that you’d generally want to script out at this point would be creating a build script, doing a backup of the new version or creating a database model. These are all-or-nothing tasks that don’t require input from you. You would never want a light backup, a well-grilled backup or a single-source backup: just a backup.

There are of course many other routine tasks to be done during a Flyway development, such as bug fixing, database code reviews, performance tuning, investigating potential data issues (e.g., causing problems or inconsistencies in business reports) or doing a security check. I like to call these the ‘canteen’ tasks because they feed on the results of the all-or-nothing tasks.

For each canteen task, you need a particular report to answer specific questions. For example, when performing a database code review, you’ll want to answer questions such as:

  1. What tables aren’t referenced by any other table?
  2. Are there any duplicate or near-duplicate indexes?
  3. Have you a table without any candidate keys?
  4. Are there any tables with columns incorrectly allowing NULLs?

You can extract the answers from the database model, for the version of database you’re investigating. It is all about making it easier to maintain database quality standards and, with this approach, you don’t even need access to the database itself. That is neat, because you then don’t need to learn the metadata calls for that particular RDBMS.

The FlywayTeamwork PowerShell framework that I use generates these models if you request them, for any of the more common RDBMSs (SQL Server, PostgreSQL, MariaDB, MySQL and SQLite). My article, Simple Reporting with Flyway and Database Models, shows how to generate the model, produced by a script called by Flyway, and then use it for simple reporting. In this article we’ll take it to a more advanced level.

Producing the code for a very simple Entity-Relationship diagram

Mapping and documenting the Foreign Key relationships between tables is one of those many rather tedious routine tasks that tend to be neglected when a team gets under pressure, which is a shame because it is so easy to spot mistakes in a diagram, whereas errors tend to get lost in lists or code. Far more time is wasted tracking down foreign key problems than are saved by abandoning diagramming. It is part of the greater task of working out the dependencies of a table and the consequences of making a change. So why not automate the process?

Databases are easier to develop if you have Entity-Relationship diagrams covering the tables within the domain of your work. I usually create the UML diagrams for designing databases and processes from PlantUML. You can design a database change, and communicate it to the team, by generating a diagram. These diagrams are scripts, so they can easily be placed in source control, and a change can automatically update the graphic.

Here is a small sample of the sort of thing you can produce:

A simple Entity-Relationship diagram

PlantUML will create Entity-Relationship diagrams as well as classes, states, deployment, or Gantt charts. I’ve already demonstrated how it can produce a Gantt chart from the contents of the Flyway Schema History table, to track the progress of a Flyway project.

Although it will produce Entity Relationship diagrams, you generally wouldn’t want a diagram for an entire database: no paper-size is big enough for a typical corporate database. You’d be more interested in groups of inter-related tables but even then, you’d probably want to exclude certain relationships from the diagram.

Generate the PUML code for the E-R diagram from the database model

Your first task is to locate the model for the version of the database for which you want to generate the E-R diagram.

I’ll refer you to the Simple Reporting with Flyway and Database Models article for details of how to generate the model using Flyway and my PowerShell framework. If you are already using the project directory as your working directory, and have set up all your working parameters, you can read it in like this…

Alternatively, I’ve provided some sample models for various RDBMSs, so you’ve something to play with. You can read in any of these models, like this:

Having done that, we can create the code for the diagram. Here is the script that will generate the PUML code from the model that we just read in:

This will produce the following PUML:

Viewing and editing the E-R diagram

We can start very simply with PUML. All you need is the web-based PlantUML editor to render the above PUML code into an E-R diagram. Simply paste in the code and hit “Submit“, which gives our initial diagram (we have to shrink even this database to view it):

Basic E-R diagram for pubs database

We can do a bit better than that, though. Firstly, we can make it a lot prettier by editing the PlantUML code:

My additions give the code hints on displaying the diagram. This has to be done by hand. The use of a special ‘handwriting’ font and ‘hand-drawn’ lines is handy for people who are struggling with process and don’t realize that this is a ‘working’ diagram rather than a splendid design proposition. It must be done on a PC because it uses a font that isn’t available to online diagram generators.

Here is the resulting “prettified” E-R diagram:

E-R diagram with styling

This will immediately flag up missing relationships between tables, of course, but we could really do more, and we will…

Which keys are being used?

We’d like to know which keys are being used to reference other tables, and which keys are being referenced. In the model, we already pick up these details for the comments or documentation that are applied to a table, so we can add them to our diagram. We could do with a title and date too, so let’s add those as well.

This script will produce the following PUML code. You can prettify it quite a lot more, but I am trying to keep this relatively simple.

In my case, for the pubs the sample database I’ve provided, you get this diagram:

E-R diagram showing Primary and Foreign Keys

Investigating relationships for a subset of tables

Well, this is fine, but it still doesn’t go quite far enough. Even with our small, sample database the diagram starts to get cramped. We can do help a bit by altering the direction of the relations but, actually, we probably don’t even want the entire database plotted out, just those tables that are directly involved in our investigation.

This leads us to a central theme of this article, and more generally of what’s required to support a DevOps approach to team-based development: you cannot predict all the requirements that the team will have for information. You’ll never provide everything necessary in advance. It is much better to provide the ingredients (the model and the code to extract information from it) than the eventual meal. If you just served up an ER diagram of the entire database on every revision, it would make anyone’s eyeballs swivel even if they had the necessary eyesight. It is much better just to provide the basic information you need, by automation, and at that point allow you to find out just what you need.

In this case, we’ll improve the code so you can specify a group of tables for which you need to investigate the relationships. We’ll do it just by passing the name of a member of the group of tables that you need to examine. In this case, we just specify a member of the group such as ‘publications’ to get just a collection of linked tables.

Diagram Description automatically generated

I can’t show in the article all the code to allow you to do this, so I’ve prepared it as a Cmdlet on Github called Create-PUMLEntityDiagram.ps1. However, it really is just a slightly more complex version of the code I’ve shown here.

Either you can generate the model for the whole database or, to get the list of tables, you choose one the tables from the group that you wish to investigate:

In this case, I chose TagTitle, but it could be any of the main group.

Summary

Once you have a model of a database, you have the opportunity for gaining a lot more information about the database you are developing. A model is merely a handy way of taking a snapshot of the main features of a database. It in turn facilitates a lot of reports, quality checks, diagrams and programming aids.

It isn’t always possible to extract all the information you need just from the model. With performance work, for example, you need the live database. However, for a lot of the time, information derived from a model can give you many insights into things that aren’t quite right with a database.

Fortunately, Flyway evolved with the advice and input of teams who wanted to automate as many of the processes around database development as possible. It makes all this much more convenient: you can generate a model from a callback and then kick off whatever further actions and reporting from that model that you need.

Tools in this post

Flyway

DevOps for the Database

Find out more