Product articles Redgate Flyway Database Build tasks
Comparing Databases and Generating…

Comparing Databases and Generating Schema Models with Flyway

This article explains how to compare databases and generate schema models using Flyway Enterprise CLI with PowerShell. It allows developers and DBAs to identify schema differences between environments and create schema models that capture the current state of a database, making it easier to track and review changes over time.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Development work can be nerve-wracking without an easy way to compare databases for differences. One day, you’re sure that the database in the ‘Test’ environment matches the one in ‘Develop’, only to find out later that someone made a manual change, and now you’re wasting your time testing an old version. Or maybe you need to bring the shared development branch up to date with your new feature branch, but you’re not entirely sure what’s changed in the meantime.

In the past, with no better alternative, we used to compare the text of generated SQL build scripts, but for that to work they had to be generated in the same way, by the same application. The real breakthrough came when schema comparison tools became available letting us directly compare two database versions, identify differences, and generate ‘synchronization’ scripts.

Flyway Enterprise’s built-in schema comparison takes this a step further. It tells you what’s changed and makes it easier to automate the process of generating migration scripts that will update a database between known Flyway versions. It can also compare a database to a schema model, a directory of object-level scripts representing a database’s structure, making it much simpler to track schema changes over time as you commit them to version control.

In this article, I’ll walk through how to use Flyway Enterprise CLI to compare databases and generate schema models using PowerShell. These models help track and review schema changes and form the foundation for generating migration scripts, which we’ll cover in detail in another article.

Basics of schema comparison

Database schema comparison will compare two versions of a database, a source and a target, for schema differences. Often, the goal is to generate a script that captures those differences. When the script is executed on the target database, the script will update the metadata of the target version to be the same as the source version, preserving the existing data wherever possible. It will create any objects that exist only in the source, drop any objects that exist only in the target, and update any object that is in both so that its definition matches that in the source. You can use various schema comparison options and filters to control how the comparison is done, and what objects it ignores.

Source and target options

Flyway’s schema comparison engine works with a variety of database representations, so you’re not limited to comparing live databases. Options include:

  • environment – you can compare databases in two environments, such as dev or test, simply by specifying each environment, defined in the TOML configuration file, and referencing it as source or target (e.g. -environment=dev).
  • schemaModel – the schema model is a directory of object-level scripts representing a database’s state.
  • migrations – the contents of the Migrations folder, which Flyway ‘materializes’ in the build, or shadow, database referenced by buildEnvironment in the TOML configuration file.
  • empty – an empty database
  • snapshot – a file capturing a database’s schema at a specific point in time.

These options allow us to compare and synchronize environments, even without direct access to the source or target databases.

What tasks can you automate using schema comparison?

Once you can compare two database schemas, especially without necessarily needing direct access to the live databases, then suddenly all sorts of development tasks become much easier. For example, you can:

  • Update a target database so it is the same as the source – e.g. compare Dev and Test database environments, to find and capture the differences and synchronize Test to Dev.
  • Create or update a schema model from a database – individual developers can capture their latest development changes to their local schema model, incrementally, and commit them to shared version control for object-level change tracking.
  • Generate a migration script – for example capture the changes to a branch database in a migration script that will merge the changes into the shared development branch
  • Create or update a database build or teardown script – Compare a database to an empty database to get a build or ‘baseline migration’ script. In reverse, you can get a deletion or ‘teardown’ script for the target
  • Compare a database with a snapshot, or two snapshots, to find out what’s changed – then save the changes for review or generate a synch script.

If you use a tool, such as an ER diagramming tool, to make changes to an existing version of a database or perhaps add a contribution from an offline version of the database, you can ‘mop up’ by creating a retrospective migration that encapsulates the changes you made. Or, if you are doing a lot of small ‘tidying-up’ jobs, you can consolidate the resulting ‘mess’ of hand-coded migrations into a single auto-generated script that goes from the previous version to the current version.

All these tasks are made easier by the fact that you don’t need to have the live version of either the source or target database; you can just associate each version with a ‘schema model’ or snapshot. If, as I do, you save a model for every version then you can compare any two versions of a database just by comparing their ‘models’. This will give you a helpful running narrative of the changes for every migration. It also means you can generate a migration script to jump between any two database versions.

Flyway Concepts with the Comparison Engine

Traditionally, when using a tool such as SQL Compare, Redgate’s comparison and ‘synchronization’ process was mainly done visually, so you could review the list of objects and then check, annotate, document or refine before use.

Flyway has adapted the process to allow it to be more easily automated. To do this, Flyway has separated out the ‘diff’ process that detects and reports the differences between two versions of the database, from the processes that generate scripts and apply changes. By doing so, it allows for a review stage between automatically generating the SQL scripts and committing the version change.

The comparison phase (diff) compares two versions of the database, the source and target, and produces a diff artefact, effectively a list of differences. The settings and preferences for performing the comparison using Flyway can be refined for your project from the desktop version of SQL Compare or Oracle Compare and then ported into Flyway TOML configuration for use in the automated process.

Once you have generated a diff artefact, from the diff command, you can:

  • Use the diffText command – to get a change report that will show you which objects need to be created, dropped or altered.
  • Use the model command – to apply the changes in the diff artifact to the schema model, for example to update it to reflect recent database changes
  • Use the generate command – to create a Flyway migration script for version changes (e.g., V2 → V3).

In this article, I’ll focus only on ‘diffing’ two databases to produce the artefact and then using it to create or update schema models. I’ll cover change reports and generating scripts in detail in later articles.

flyway diff command

Running Flyway schema comparison with PowerShell

The various configuration parameters that are relevant to these Flyway commands have an optional prefix that indicates the command to which they are relevant. This allows parameters such as ‘source’ and ‘target‘ to be defined with a different value for different commands, in the comparison process.

If you are using PowerShell to compare databases and ‘synchronize’ them, or generate Flyway migrations, or to manage the various comparison artefacts, it is best to always use the dotted notation and to provide parameters as strings. To be more precise, under some circumstances, you dispense with the dotted prefix, because Flyway can apply ‘namespace short-circuiting’ if Flyway can be certain that parameters belong to a particular command.

Preliminaries: creating the Flyway project using INIT

If your project is still using the old CONF configuration format, you’ll need to convert to using TOML, because that’s how the schema comparison engine stores its preferences. If you try to use the new commands while with a CONF configuration, Flyway complains that there is no such thing as an ‘environment’.

To convert your configuration to TOML on a project-by-project basis, the easiest option is to use the INIT command. It converts the project-level CONF configuration file in the local ‘working directory’ to TOML and creates an empty ‘schema-model’ configuration setting that defines where you keep the schema models.

Defining a source or target database environment

Any live database is defined, in Flyway’s terminology, as an ‘environment’. The parameters in a TOML ‘environment’ are mainly concerned with connecting to a database (see Environments Namespace documentation for a list).

Generally, you’re unlikely to need to use more than user, password and url. However, you can only sensibly use a password or user, and often the URL, if it is associated with a resolver that gets the values from environment variables or a secure location. Right now, there is no way to associate a user placeholder value, which rather limits the usefulness of the system.

As an example, here is how the main and dev environments might be defined in the flyway.user.toml configuration file. I am using resolvers for the url and user and supplying the values as environment variables. I am also using a ‘configuration override’ for dev to tell Flyway where to find the migration files for that branch:

We would then run Flyway commands on each branch database, like this:

Many organizations will require their database development teams to keep database connection details protected as individual ‘secrets’, such as individual, encrypted files, one for each database environment, rather than placing them in a single TOML file in the project area.

This suggests that the connection details and credentials are best passed securely as parameters. We’ll illustrate this in the next example

Diffing two databases

We compare two schemas using the Flyway diff command, specifying what type of source and target we want to use: a database environment, schema model, migrations (a build environment), an empty database, or a snapshot.

The changes detected are stored in a diff artefact, which we use to generate a script or produce a model of the database and so on. Unless you specify otherwise, the artefact is stored in a temporary file location and is updated every time the diff command runs a fresh comparison.

Here we create the diff artifact file. For simplicity in this demo, I’m just storing all the sensitive connection information and credentials in CONF files in the secure user area. We then turn them into main and dev environments and compare them.

The diff artifact records the changes that were present at the time that the diff command was run. They are a volatile record and aren’t for permanent use. They are intended to be the basis for the output that you require. If you need a permanent record of the schema of a version, you’ll want to generate a flyway snapshot or a schema model instead.

Creating and updating schema models

Here, we are going to use the diff artifact to create and update a schema model, where the current ‘state’ of every object in the database is represented by a script that will create it.

Creating and updating the current schema model for a database

The TOML configuration has a default location for the schema model and the init command creates a sub-directory for you, in the Flyway project folder, called schema-model, to which the schemaModelLocation in the TOML file points. When you refer, in code, to the schemaModel, this location is used by default.

This is designed to allow us to create the initial schema model and then keep it up-to-date with the latest development changes. For example, in the following code, we chain the diff and model commands to create the temporary diff artifact and then immediately use it to create the initial schema model that represents the main environment:

In the Flyway Desktop development workflow, the source is generally the latest development version of the database, and the target is the schema model folder. With a similar command, Flyway will update the existing model, to reflect the latest development version.

Storing a schema model for each version

From experience, I like to have a Versions directory that keeps models, build scripts, snapshots, and other artifacts for each relevant version. All the artifacts for each database version go into an individual subdirectory:

saving the schema model

Having these artifacts allows me to run ad-hoc development tasks quickly, such as building a particular version from scratch, or quickly comparing any two versions to see the differences and do some debugging.

By default, the Flyway configuration is designed to maintain only the current model, but it is flexible enough to adapt to other teamwork processes. To save a model for each individual version of the database, we need a slightly more complex routine that determines the current version, works out the location where the model should be stored, creates the directory, if necessary, and then cleans out the current contents, if there are any, and uses this location as the schemaModelLocation.

In this example, we are using the $main array as before to specify the main environment. I’ve shown in the previous code an example of how this array is created by reading my connection details, settings, and credentials from the appropriate secure file and applying it to Flyway via a PowerShell array. This technique is described here in Pipelining Configuration Information Securely to Flyway.

With all that in place, you just need to define the path to the root of your working folders $env:FlywayWorkPath and run the following code:

Conclusions

This is the first glimpse at the power of Flyway Enterprise in running schema comparison (‘diffs’) and then generating a schema model. I’ve tackled this topic first because we can use the model to track changes at the object level, between Flyway versions. Developers can also save schema models (or snapshots) per database version, and use them retrospectively to generate a lot of other useful scripts, like baseline scripts and undo scripts, or to do some debugging.

Until now, I have done all these tasks using tools like SQL Compare and Oracle Compare, and it entailed a lot of overhead. Having worked out the intricacies of the TOML format, and the new Flyway interface to the Redgate schema compare engines, it is a much smoother process.

Tools in this post

Redgate Flyway

DevOps for the Database

Find out more