Using Flyway and Data Containers for Database Testing: Rapid Setup and Teardown

This article demonstrates how to use the rgclone CLI in Redgate Test Data Manager to automatically save each new version of a database created by Flyway as a data container revision, tracking which Flyway version maps to which container revision. We can then, from a single data container, load any version of the database during test setup, as well as quickly 'reset' a container during test teardown.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

There are several good reasons for wanting to save a new revision for a data container. We can use rgclone load to load any version of the database, to inspect it, run tests on it, or compare it to another version. We can use the reset command to return the data container to the previous saved revision, removing all subsequent changes, both schema and data. If we need something more permanent, we can graduate a revision into a new image, from which we can spin up disposable data containers whenever we need them. I demonstrated all of this in my previous article, Data Container Revisions Resets and Graduations.

Data containers for test-driven database development

Why is any of this exciting for a developer? Imagine you need to run a series of tests on the latest version of a Flyway database. You’ll need an automated way to run each test so that it establishes the correct version of the database, with the required data, runs the test actions and reports the results then tears-down the test, resetting the database schema and data back to the starting version, ready for the next run.

A teardown script, rather like an undo migration, can be tricky to write and a considerable effort to test. Either you are lucky enough to have a database schema comparison tool that does a lot of the work for you, or the usual alternative is to restore a backup of the database version under test, but this takes a lot more time, and often requires the help of a DBA.

By comparison, data containers are a gift for any developer faced with dealing with the teardown of a test, or needing to undo the effects of a failed migration that made changes that can’t be automatically rolled back within a transaction. In either case, all you need to do is reset the data container to return to the previous saved revision. Each time you use rgclone to save a revision, it captures any changes that have been made to the live, containerized database in a lightweight filesystem snapshot. Resetting, or reverting, to a previous revision (‘snapshot’) is simple, fast and economic with network-based resources.

This is one of the reasons why rgclone is an essential part of Test Data Management. There are other ways of doing it but nothing quite as easy. The joy of using just the one container is that the connection details remain the same, and there is the minimum of use of network resources

Demo: automatically saving Flyway versions as data container revisions

Although there is some interest at first in saving revisions from the command-line, the novelty eventually wears off. If we want to record every Flyway database version as an rgclone revision this requires an automated process. For example, we might consider automatically saving a revision in an afterMigrate callback script.

We want to save each ‘versioned state’ of the Flyway-managed database as a data container revision, so that we can easily revert or reset the database to specific states. We’ll need to keep a record of what revisions have been saved, and to what Flyway version they correspond, because we want to save a revision once, for each data container. We’ll subsequently repeat a migration, but unless we are trying to defeat the system, it will always produce the same database schema/design.

The most obvious strategy would be to apply a tag to each data container revision to indicate the Flyway version. Unfortunately, there is currently no way of doing this. We could track which revisions correspond to which Flyway versions in a table within the database, but revision points apply only to the container rather than the database, and it is quite likely that more than one container will be in use, probably with different revision points.

It is time to think about the unpalatable, and have a file, such as a JSON file, that stores the revision tracking information. We’d probably best put it in the current working directory because the Flyway user is very likely to run the system from there.

 

Autosave flyway database versions as data container revisions

So, here is the script for a Apply-RGCloneRevisionToContainer function that automates the process of saving the new Flyway version as a data container revision, writing to a Record.json file that tracks Flyway versions to data container revisions:

Where and how can I use this?

This routine can be done in PowerShell at any point in the Flyway database development process. It can also be done in a Flyway callback. To use it in a callback, you’ll need to provide to the callback the name of the ‘Clone project‘, either in an environment variable or a Flyway placeholder. I follow a Database-Project-RDBMS-Branch (-Version) naming convention for Clone projects.

You’ll need your admin team to set up the TDM Clone cluster. On your local workstation, you’ll need to have installed the Cloning CLI (rgclone), provided the API endpoint for the Clone server, and authenticated to it. The rgcloneauth authentication process will, in practice, require an authentication token rather than the OIDC identity provider, though the latter can be made to work for a script despite its timeout, if you either feel lucky or if you take care to re-authenticate before running the script. All of these basic setup details are explained in my previous article, Getting Started with Cloning in Redgate Test Data Manager.

You will need the Flyway project to be your current working directory, and you will need to supply the name of your data container, which is just the “Clone project” name with a “-container” suffix. You will also need to have executed a script that saves the connection details and credentials for the container to your user directory, either as a Flyway configuration (.conf) file (see Provisioning Clones for Flyway Projects) or as JSON file (see Data Container Revisions Resets and Graduations).

An easy way to test it out would be use the example code from my previous article to establish a correctly-named data container holding an empty Pubs database, then use this Flyway project to migrate it to V1.3. Having done this, you can use the Apply-RGCloneRevisionToContainer function to save this new Flyway version as a data container revision (rev.1, in this case):

In the work directory you should find the revision tracking file:

tracking flyway database versions as data container revisions

Tracking changes to each revision: this can get complicated

In this article, we are just playing with the data container for the main branch of a Flyway Project, in the assumption that we’ll just reset to the starting revision. All we are trying to achieve is to save every migration at a particular Flyway version just once. The most likely reason for doing this is for test teardown or to recover from a failed migration where we cannot use the database’s built-in transactional rollback. It is a relatively neat and simple process that discards all the work after the saved revision point.

For this simple task, we don’t really need our JSON revision tracking file, that relates revisions to Flyway versions. However, things can get more complicated. If, instead, we use the load command to revert to any older revision, then rgclone must perform a more complex function that allows it to deal with several ‘alternate realities’ that arise from any changes made to those older revisions.

The first time a data container is created from a data image, a new timeline is created by default. This becomes like a ‘Main’ branch in version control, and the changes to the containerized database can be recorded along this timeline, as a series of revisions (rev.0, rev.1, rev.2 and rev.3 for example). When you load a previous revision, rev.1 say, a new timeline is created that behaves somewhat like a new branch. The data container for this branch is assigned a randomly allocated ‘branch-identifier’ suffix. You’ll notice its name is recorded as:

Any subsequent changes are saved to this separate ‘branch’ as rev1.xxxx.1, rev1.xxxx.2, and rev1.xxxx.3, where xxxx is the branch identifier. Now, our record.json tracking file would need to relate all these branch revisions to their Flyway version! This would allow a developer to keep several working revisions of a data container and to switch between them during development.

The system you adopt to keep track of these ‘branches’ in the container is a topic for another article. Basically, much depends on the database development methodology you are using. You could, of course, switch to the Flyway work directory for the correct branch of the project, and then save the relationship between Flyway versions within the branch and the data container revisions, but there are other possibilities.

The best way of using clones

It is best to understand clones you use as being volatile. They should be short-lived, and never considered a permanent record of your work. A volatile clone is just fine for testing, because you can create it from an image at a known database version, run any sort of test on it, and dispose of it.

A clone is economic on resources when young, because only the local changes are saved. However, operations such as re-indexing or intensive OLTP operations can turn an advantage into a disadvantage. There is a good reason for working databases being designed the way they are.

If clones are disposable, they must be easy to create. If your testing requires you to use revisions, then you need a scripted routine that runs unattended and turns a newly-created container into the version of the database that you need for testing. To do this, I’ve provided an automated system that runs with Flyway, for creating data container revisions at certain points in the migration system.

However, I would argue more radically than this. When you use Flyway in the spirit that was intended, the schema of your database is determined only by a series of migrations that, once run, are never altered. This means that it is only the migrations that must be preserved as the source of the metadata. It isn’t hard to preserve an immutable file. If you need to see what has changed between two arbitrary versions you can do a text-based comparison between build scripts, use a comparison tool, or compare an object model of the two versions.

The loading of the data is another matter and presents its own challenges. A database requires several different datasets to allow for complete test coverage. If, say, you needed to maintain four different datasets, you would need to have an image for each dataset. You could then apply a migration on these all datasets. A data image can therefore be chosen at any previous version and migrated to the version you want – but that’s a story for another article…

Tools in this post

Flyway

DevOps for the Database

Find out more

Redgate Test Data Manager

Reliable and secure test data provisioning

Find out more