Product articles Flyway Database migrations
Flyway’s Clean Command Explained…

Flyway’s Clean Command Explained Simply

The Clean command resets a database to its initial state, before any Flyway migrations were applied. In other words, it empties the database. This can be useful for any development task that requires that you recreate the database structure, or for tearing down a test harness. It also allows you to try out experiments and alternative strategies within an isolated feature branch, and then reverse out of them.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

How Flyway Clean works

On issuing the Clean command, Flyway connects to the target database and drops all objects, in any schemas under its management, in the correct dependency order. This will include deleting the Flyway schema history table, erasing all records of applied migrations. If this is a single-database project, you can now safely make changes to existing migration scripts without violating Flyway’s validation checks.

In the following example, the Flyway project manages the dbo and people schemas, as defined in the flyway.schemas configuration item. When we connect the project to the existing database, it clears out all the objects in these two schemas, in dependency order. In the process it will remove the Flyway schema history table, which by default would have been in dbo, the first schema listed in flyway.schemas. Utility objects in the utils schema are maintained by a separate project, which will manage the utils schema only, and are left in place.

Flyway Clean in action

With the next Flyway action you perform, after a Clean, Flyway will recreate the Schema History table in your default schema, gather all the files from your list of migration paths, and use the filenames if finds to create a chain of files, ordered by the version numbers within the filenames.

Uses for Flyway Clean

While the Clean command’s wanton destruction should be disabled for production databases (see later), it can be very useful in the earlier, more ‘experimental’ or creative stages of database development. It can allow a rapid database “reset” in development and testing work and gives a team a means of consolidating a messy series of migrations, in a feature branch, into a single clear and logical ‘merge’ migration.

Rapid database reset in development and testing

In the early stages of database development, developers need the freedom to experiment, often trying out and testing several different design strategies in a ‘sandbox’ database, before deciding the best one. There is little need to preserve these “early drafts” for posterity. After all, even Harper Lee took several years, and many drafts, to produce the masterpiece ‘To Kill a Mockingbird’. Very few people would want to read the early drafts. The same is true of the dead ends and failed experiments that are often produced by even the best database developers in the early stages of design.

In the early stages a developer will just need a migration script to build the latest development version from clean (or to the version at which a branch was created) and a quick means to load any test data. With this, they can adopt a “clean-rebuild-fill” approach where they can test theories and make mistakes on their sandbox or branch database, without affecting the work of others, and then immediately clean the database, rebuild it at the initial version and reload the data (or you restore a backup, if you took one before running Clean). This clean-rebuild-fill approach will give developers more confidence that they can always quickly reproduce any version of the database.

When to use Undo instead?

If you are repeatedly testing a tricky migration that affects existing data, in a branch database, you can consider using Undo migrations instead, which will return the database schema and data back to the current version. By testing undo migrations alongside them migration itself you also have the making of a rollback script, should the migration go wrong in deployment.

Similarly, an automated test run, for regression testing, will need to create the database at the right version, load the test database, run the tests, check the results are correct and then ‘tear down’ and reset, ready for the next test. The Clean command is ideal for the tear-down phase.

Branch-based database development: correcting or consolidating migration files

Flyway automatically runs validation checks for the integrity of version files, verifying that the checksums for applied migrations, stored in the schema history table, remain unchanged and warning you if they have. This process will detect any attempted changes to existing migration files that are shared with other branches and have been applied to other databases, when a command other than Clean is executed. If you wish to modify a shared migration file, it should only be done on team agreement, since it will require all team members to Repair, or sometimes to clean and rebuild, any development and test database that use these files.

However, after running the Clean command, but before running any further commands, there is a brief period where the connected database will not have a Flyway schema history table, and so Flyway cannot run these validation checks. When working in isolated branch database, this provides an opportunity to modify any existing SQL migration scripts that are ‘private’ to the branch, correcting errors and mistakes before rebuilding the current branch version, often using a single, simpler migration.

By doing this, a developer can provide a simple ‘narrative’ that excludes the reality of the twists and turns, mistakes, and intuitive leaps that underly progress. In a long career as a database developer, I’ve often discarded a way of implementing a feature, rewritten it, and then decided I was right first time. Source control saves you from any tiresome consequences, and serves as an audit trail of changes, but I’d hate to see this sort of “volte-face” preserved in a chain of migrations. After all, at least beyond the main branch the migration files are the means of understanding the database, not of understanding the struggle to get there.

Similarly, the Clean command gives the database team member an important opportunity, within a branch database, to concatenate a messy series of branch changes into a single, logical migration. Once tested, this single migration can be used to promote the changes to parent branches, for release. This will make it much easier to maintain the integrity and coherence of the database schema over time

Taking precautions

One some environments you’ll want to ‘ban’ the use Clean, and in others put in place some safety nets that make it easier and quicker to recover the last saved state of the database.

Preventing accidental cleans

In Production environments, or any others where use of Clean would be unsafe or undesirable, we can disable it at the project or installation level. Preventing accidental execution of the Clean command elsewhere is best done by adding the line flyway.cleanDisabled=true to the user-level flyway.conf file. This would not stop you using the Clean command, but it would apply a ‘sanity check’ because you’d then have to add a command-line parameter to the Clean command before it would work.

‘beforeClean’ backups and build scripts

We can use a beforeClean callback to automatically back up a database, or perhaps also generate a build script for the current version of the database, before the Clean command itself is executed. This provides a sort of “undo” for a Clean if it was done in error. More generally, it also makes it quicker for a developer, working on a branch database, to ‘back out’ of a mistake, or tear down and reset a database test harness.

The intricacies of cleaning a database

Under the covers, the task of deleting all the objects of a database isn’t straightforward. Some RDBMSs don’t provide an easy way of doing this. To do this manually, you must produce a dependency graph of the database that plots out the dependency chains. Then you delete the objects that aren’t referenced by any object. Then you repeatedly delete objects that are only referenced by deleted objects.

Flyway attempts to do this for you, for any schemas it manages, when you run Clean, but dependencies can be subtle. For example, the Adventureworks sample database has a table that has a CHECK constraint that references another table, and this can occasionally cause the Clean command to fail.

How well did the cleanup go?

If you are running Flyway in a script, you’ll know about errors, but you could easily miss reports and warnings. Flyway will pass back the success or failure of the operation, but you can get warnings from JSON output.

To specify that the output of the ‘clean’ command that would otherwise show as text should be written to a JSON file you would provide the JSON output of the clean command (flyway clean -outputType=json). This will tell you which schemas were dropped and list any warnings in an array.

‘afterClean’ mop up

Occasionally, a Clean command will leave a residue of database objects that can’t be deleted or are deemed by the process to be too risky to delete on account of complex data dependencies. This is where the afterClean.sql callback is so useful – you can add custom code to clean any objects that Flyway cannot do.

Conclusion

The Clean command might seem slightly scary, particularly for anyone who is more familiar with the traditional build-based database development where it isn’t always clear what version a database is at, or how it got there. With Flyway, it is always easy to build a database to the version you want, and a clean command will allow you to become confident that a database can be easily reproduced at any version you need. It also provides the speed of creating test environments and running scripts with them, as well as the freedom to experiment with different strategies in feature branches.

 

Tools in this post

Flyway

DevOps for the Database

Find out more