Product articles
Flyway
Database Dependencies
Undoing Actions on Groups of Database…

Undoing Actions on Groups of Database Tables

During development you'll occasionally need an undo script that drops a group of tables, or you might need to truncate a group of tables and then insert fresh data in order to run some tests. Unless you perform the required actions in the correct dependency order, you'll be tripped up by foreign key constraint violations. This article provides a SQL function that returns the list of tables in the correct dependency order.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

One of the good things that comes with Flyway Teams is the undo file. It is intended for developers who are working on database features, usually in a branch of the database development work, and who want to try out different ways of creating a new feature to see which works best. If you find yourself doing this, you will soon need to undo changes, not only to single objects but to whole group of tables.

If your undo script needs to drop a group of tables, for example, then you’ll need to do it in the correct dependency order. The same requirement exists for other tasks too: if you need to run tests on a group of tables then you’ll probably need to empty them of all existing data first and then insert fresh test data, and these operations must also be done in the correct dependency order.

The basic requirement for these tasks is a manifest that lists the correct dependency order of tables. Initially, I showed how to create a manifest in SQL, so that I could build a database directly from object-level scripts. I then adapted the $SaveDatabaseModelIfNecessary script task in my Flyway Teamwork framework so that it generates the table manifest for a range of RDBMSs, from the database JSON model. I’ve demonstrated how to use it to load and delete datasets.

Here, however, we want to work with only a defined group of tables, and we don’t relish the dependency of needing to import a current manifest. The goal is to have a SQL function: we provide it a parameter specifying a list of tables that we want to drop, for example, and it provides our list of tables in the correct order, as a result. We can then use it to do the DROP commands in the right order in our Flyway undo script. The example in this article is SQL Server specific though the technique should be easily adaptable to most other RDBMSs.

Why have UNDO scripts?

My initial reaction to the idea of UNDO files was that it was a lot of work. My next reaction was that it would be just the sort of thing that would be easier to do with a tool that produced a synch script. I’ve now recovered from both reactions and have come to like the discipline of creating UNDO files for every migration file. It is just a minor chore, like documentation, that is an immense help to other team-members, or even myself in the future.

I’m sure I’m not the only one who looks at my old code and will immediately think of a much better way of doing it. If I can undo, and then redo, a part of the work before it hits the development branch, then the whole process feels a lot more relaxed. Maybe I’m just impatient, but my pleasure isn’t in building things but getting good quality results rapidly, and ditching changes quickly the moment you find a better way. Without this sort of culture, the building of a database can become too much like a sort of mystical quasi-religious experience.

The problem with this more relaxed approach is when you extend it beyond the sweet isolation of a branch. What you can do may quickly affect other work. To avoid this, you need an unchanging or ‘versioned’ interface for each feature in the form of views and routines, to allow you to make changes, or even switch the feature on and off. It isn’t a new or radical idea, but probably requires some ingenuity if using SQLite or MySQL.

Tackling an UNDO script for a group of tables in a database

The undo script isn’t challenging to do. However, undoing changes to a group of tables can become a chore because you have to do it in a particular order, so that you never try to drop a table that is being referenced by an existing table. In undo scripts you are more likely to be required to drop tables altogether.

Your first impulse might be to list out the tables you need, and deal with each in turn. Bad idea: you may get lucky, but you’re more likely to get an error when you try to, for example, delete all the rows from a table and the table you’re cleaning out is being referred to by another table that you haven’t cleaned out yet.

No worries, you think, just DROP the FOREIGN KEY constraints first, and then reinstate them afterwards. You just list their names and drop each one in turn, and after the operation use the list to reinstate then. The risk here is that FOREIGN KEY constraints don’t need names, but instead can be, and usually are, assigned random names generated by the relational database system you’re using. Your hand-crafted artisanal routine needs the name of the constraints. They will work now but won’t work next time you need to perform the same operation because all the FOREIGN KEY constraints have been assigned different names!

The obvious answer is to name all your constraints explicitly but good luck with that one, especially if you’re faced with those colleagues who seem to find it to be an intolerable invasion of their human rights to be compelled to use constraints at all.

How about just disabling all the constraints? Well, DISABLE isn’t part of a SQL Standard, and most RDBMSs don’t seem to allow it. SQL Server is an exception in the way that it allows all constraints in a table to be disabled with a single command. However, when you need to drop tables, SQL Server won’t let you drop the referenced table even if you disable the constraints in all the referencing tables.

A “Dependency Order” function: populate, truncate or drop a group of tables

The perfect requirement for this sort of task arose recently, I was developing a set of accounting tables, as an extension to the venerable Pubs database.

Accounting tables

You can find the Flyway migration script to build the group of accounting tables, and the undo script to remove them all in the right order, in the develop branch of my Pubs project on GitHub.

In the early stages, I found I constantly needed to tear down all the tables, make a few changes to my Flyway migration script and try again. At other times I might need to delete or insert all or some of the data from a group of tables. This is a common task, and one that you’ll face when running any serious integration test on a development database. For work of this nature, you first need to clear out the existing data for all the tables, before loading them with your test data.

I decided to write a SQL function that could perform, or just print out the commands for, any of these tasks. It determines the table dependency order, meaning the ‘natural order’ in which we need to drop them, or delete all their data or insert into them and lists them in that order. First to be dealt with are the tables that no other tables reference via FOREIGN KEY constraints; then the tables that are no longer referenced because you’ve just dealt with the ones that reference them…and so on until all your tables are done.

What if a table still can’t be processed? This will happen if you have tables that reference each other. I’ve never experienced this and hope never to do so, but the solution here would be to determine the name of all the constraints in the circle and drop them.

Here is a SQL Server routine that creates a script that you can use as a basis for the function or batch that does the work. In this example, we provide the function with the list of tables, either a specific schema.table list or just a wildcard list such as "accounting.%", and it returns the list of tables in the order in which we need to process them in order avoid any foreign key constraint errors. Once you have a procedure of function that works for your particular requirements, the logical place for it is in your Utilities schema.

And here is how we’d use this function to prepare strings that can be executed in a Flyway UNDO file to drop all the accounting tables:

The text output is as follows:

If, instead, we want the commands to empty these tables of all existing data, then just change the DROP TABLE command to DELETE FROM.

If you just want a list of all the tables in the database, in the correct dependency order, then you can use:

A Cross-RDBMS Solution?

If you wish to create a direct translation to other RDBMSs, you will see the tricks used for the SQL Server solution and roll your eyes. They are unlikely to translate.

As I mention earlier, the $SaveDatabaseModelIfNecessary script block task in the Flyway Teamwork (which you can find in the DatabaseBuildAndMigrateTasks.ps1 scriptblock collection) provides a cross-RDBMS solution for creating a table manifest for each version of the database.

This manifest is equivalent to using the routine I’ve provided here without a parameter – you list out all the tables in the dependency order. All you need to do is to select just the tables you need from the list. However, you must make sure that there are no other tables outside your selection that are dependent on them, otherwise you’ll get an error.

The problem is that, however you do it, it is an iterative process; and SQL doesn’t naturally allow this. Also, you would need to generate the manifest for the version you are undoing, and import it into the SQL Script. However, it is easy to do in a script such as PowerShell, but to cover the process for several RDBMSs in this article would make it rather unwieldy.

Conclusions

Doing things to the data in a lot of tables can be a daunting task unless you do it in a particular order. Ironically, it is one of those good things that you add to the table structures to protect the data, namely the foreign key constraints, that are likely to trip you up.

If you add a row to a table that refers to data in another table, and it isn’t there, then the constraint will fire. If you delete a row in a table when a foreign key is referencing the row, the constraint that guards that relationship will let you know. If you drop a table, when it is being referenced by another table, likewise the database won’t let you. For almost all the time, these constraints save you from bad data, which is the worst evil in a database. Just occasionally, it proves awkward, but it proves easy once you know the right order of doing things.

 

Tools in this post

Flyway

DevOps for the Database

Find out more

Flyway Teams

Ideal for organizations looking to improve collaboration and fine tune their processes during development and the deployment of database changes.

Find out more