18 January 2023

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Phil Factor

18 January 2023

Undoing Actions on Groups of Database Tables

During development you'll occasionally need an undo script that drops a group of tables, or you might need to truncate a group of tables and then insert fresh data in order to run some tests. Unless you perform the required actions in the correct dependency order, you'll be tripped up by foreign key constraint violations. This article provides a SQL function that returns the list of tables in the correct dependency order.

Guest post

One of the good things that comes with Flyway Teams is the undo file. It is intended for developers who are working on database features, usually in a branch of the database development work, and who want to try out different ways of creating a new feature to see which works best. If you find yourself doing this, you will soon need to undo changes, not only to single objects but to whole group of tables.

If your undo script needs to drop a group of tables, for example, then you’ll need to do it in the correct dependency order. The same requirement exists for other tasks too: if you need to run tests on a group of tables then you’ll probably need to empty them of all existing data first and then insert fresh test data, and these operations must also be done in the correct dependency order.

The basic requirement for these tasks is a manifest that lists the correct dependency order of tables. Initially, I showed how to create a manifest in SQL, so that I could build a database directly from object-level scripts. I then adapted the $SaveDatabaseModelIfNecessary script task in my Flyway Teamwork framework so that it generates the table manifest for a range of RDBMSs, from the database JSON model. I’ve demonstrated how to use it to load and delete datasets.

Here, however, we want to work with only a defined group of tables, and we don’t relish the dependency of needing to import a current manifest. The goal is to have a SQL function: we provide it a parameter specifying a list of tables that we want to drop, for example, and it provides our list of tables in the correct order, as a result. We can then use it to do the DROP commands in the right order in our Flyway undo script. The example in this article is SQL Server specific though the technique should be easily adaptable to most other RDBMSs.

Why have UNDO scripts?

My initial reaction to the idea of UNDO files was that it was a lot of work. My next reaction was that it would be just the sort of thing that would be easier to do with a tool that produced a synch script. I’ve now recovered from both reactions and have come to like the discipline of creating UNDO files for every migration file. It is just a minor chore, like documentation, that is an immense help to other team-members, or even myself in the future.

I’m sure I’m not the only one who looks at my old code and will immediately think of a much better way of doing it. If I can undo, and then redo, a part of the work before it hits the development branch, then the whole process feels a lot more relaxed. Maybe I’m just impatient, but my pleasure isn’t in building things but getting good quality results rapidly, and ditching changes quickly the moment you find a better way. Without this sort of culture, the building of a database can become too much like a sort of mystical quasi-religious experience.

The problem with this more relaxed approach is when you extend it beyond the sweet isolation of a branch. What you can do may quickly affect other work. To avoid this, you need an unchanging or ‘versioned’ interface for each feature in the form of views and routines, to allow you to make changes, or even switch the feature on and off. It isn’t a new or radical idea, but probably requires some ingenuity if using SQLite or MySQL.

Tackling an UNDO script for a group of tables in a database

The undo script isn’t challenging to do. However, undoing changes to a group of tables can become a chore because you have to do it in a particular order, so that you never try to drop a table that is being referenced by an existing table. In undo scripts you are more likely to be required to drop tables altogether.

Your first impulse might be to list out the tables you need, and deal with each in turn. Bad idea: you may get lucky, but you’re more likely to get an error when you try to, for example, delete all the rows from a table and the table you’re cleaning out is being referred to by another table that you haven’t cleaned out yet.

No worries, you think, just DROP the FOREIGN KEY constraints first, and then reinstate them afterwards. You just list their names and drop each one in turn, and after the operation use the list to reinstate then. The risk here is that FOREIGN KEY constraints don’t need names, but instead can be, and usually are, assigned random names generated by the relational database system you’re using. Your hand-crafted artisanal routine needs the name of the constraints. They will work now but won’t work next time you need to perform the same operation because all the FOREIGN KEY constraints have been assigned different names!

The obvious answer is to name all your constraints explicitly but good luck with that one, especially if you’re faced with those colleagues who seem to find it to be an intolerable invasion of their human rights to be compelled to use constraints at all.

How about just disabling all the constraints? Well, DISABLE isn’t part of a SQL Standard, and most RDBMSs don’t seem to allow it. SQL Server is an exception in the way that it allows all constraints in a table to be disabled with a single command. However, when you need to drop tables, SQL Server won’t let you drop the referenced table even if you disable the constraints in all the referencing tables.

A “Dependency Order” function: populate, truncate or drop a group of tables

The perfect requirement for this sort of task arose recently, I was developing a set of accounting tables, as an extension to the venerable Pubs database.

Accounting tables

You can find the Flyway migration script to build the group of accounting tables, and the undo script to remove them all in the right order, in the develop branch of my Pubs project on GitHub.

In the early stages, I found I constantly needed to tear down all the tables, make a few changes to my Flyway migration script and try again. At other times I might need to delete or insert all or some of the data from a group of tables. This is a common task, and one that you’ll face when running any serious integration test on a development database. For work of this nature, you first need to clear out the existing data for all the tables, before loading them with your test data.

I decided to write a SQL function that could perform, or just print out the commands for, any of these tasks. It determines the table dependency order, meaning the ‘natural order’ in which we need to drop them, or delete all their data or insert into them and lists them in that order. First to be dealt with are the tables that no other tables reference via FOREIGN KEY constraints; then the tables that are no longer referenced because you’ve just dealt with the ones that reference them…and so on until all your tables are done.

What if a table still can’t be processed? This will happen if you have tables that reference each other. I’ve never experienced this and hope never to do so, but the solution here would be to determine the name of all the constraints in the circle and drop them.

Here is a SQL Server routine that creates a script that you can use as a basis for the function or batch that does the work. In this example, we provide the function with the list of tables, either a specific schema.table list or just a wildcard list such as "accounting.%", and it returns the list of tables in the order in which we need to process them in order avoid any foreign key constraint errors. Once you have a procedure of function that works for your particular requirements, the logical place for it is in your Utilities schema.

CREATE OR ALTER FUNCTION [dbo].[TablesInDependencyOrder] (@TheList NVARCHAR(400))

RETURNS @TheOrderOfDoingIt TABLE /* if possible, we end up with a list of the tables

with the order in which to delete them so that there are no foreign key constraint

errors. */

(TheOrder INT IDENTITY,

Tablename sysname NOT NULL

UNIQUE,

TheObject_id INT NOT NULL,

comment NVARCHAR(300) NULL)

BEGIN

IF (@TheList IS NULL OR Trim(@TheList) = '' ) SELECT @TheList ='["%.%"]'

DECLARE @TablesToBeTackled TABLE /* This is what is inout, A list of the tables

that we want to delete */

(Tablename sysname

UNIQUE NOT NULL,

TheObject_id INT NOT NULL);

INSERT INTO @TablesToBeTackled (Tablename, TheObject_id)

SELECT DISTINCT Schema_Name (schema_id) + '.' + name, object_id

FROM

sys.tables

INNER JOIN OpenJson (@TheList)

ON Object_Schema_Name (object_id) + '.' + name LIKE Value;

/* we start by inserting into the list all those tables from the list

that we want to delete from or drop that aren't referenced by any other

table */

INSERT INTO @TheOrderOfDoingIt (Tablename, TheObject_id)

SELECT TheDeleted.Tablename, TheDeleted.TheObject_id

FROM

@TablesToBeTackled AS TheDeleted

LEFT OUTER JOIN sys.foreign_key_columns AS TheReferenced

ON TheReferenced.referenced_object_id = TheObject_id

WHERE TheReferenced.referenced_object_id IS NULL;

/* we now iterate through the list, assuming the ones in the list

are deleted or dropped, adding to the list those tables not would no longer

be referenced by any surviving table. Normally, unless you are

doing something clever, evil or both. This will end up with two

temporary tables with the identical tables, but in a different

order, with the @TheOrderOfDoingIt tables containing the order

in which they should be deleted or dropped

DECLARE @ii INT = 0;

WHILE (@ii < 12) --execute this loop up to twelve times before giving up

BEGIN

INSERT INTO @TheOrderOfDoingIt (Tablename, TheObject_id)

SELECT f.TheTable, f.referenced_object_id

FROM

(SELECT fks.referenced_object_id,

Object_Schema_Name (fks.referenced_object_id) + '.'

+ Object_Name (fks.referenced_object_id) AS Thetable,

Object_Schema_Name (fks.parent_object_id) + '.'

+ Object_Name (fks.parent_object_id) AS referencedBy,

CASE WHEN ref.TheOrder IS NULL THEN 1 ELSE 0 END

AS ReferencedIsntInTheOrderTable,

CASE WHEN tab.TheOrder IS NULL THEN 1 ELSE 0 END

AS TableIsntInTheOrderTable

FROM

sys.foreign_key_columns fks

INNER JOIN @TablesToBeTackled DelSquad

ON fks.referenced_object_id = DelSquad.TheObject_id

LEFT OUTER JOIN @TheOrderOfDoingIt ref

ON ref.TheObject_id = fks.parent_object_id

LEFT OUTER JOIN @TheOrderOfDoingIt tab

ON tab.TheObject_id = fks.referenced_object_id)

f(referenced_object_id, TheTable, ReferencedBy,

ReferencedIsntInTheOrderTable, TableIsntInTheOrderTable)

GROUP BY

f.referenced_object_id, f.TheTable

HAVING

Sum (f.ReferencedIsntInTheOrderTable) = 0

AND Sum (f.TableIsntInTheOrderTable) > 0;

IF @@RowCount = 0 SELECT @ii = 12;

SELECT @ii = @ii + 1;

END;

IF EXISTS

(SELECT 1

FROM

@TheOrderOfDoingIt TOODI

FULL OUTER JOIN @TablesToBeTackled TTBT

ON TTBT.Tablename = TOODI.Tablename

WHERE

TTBT.Tablename IS NULL OR TOODI.Tablename IS NULL)

BEGIN

DELETE FROM @TheOrderOfDoingIt;

INSERT INTO @TheOrderOfDoingIt (Tablename, TheObject_id, comment)

VALUES

('', -- Tablename - sysname

0, -- TheObject_id - int

'Sorry, but there is either a mutual reference or a table outside the list is referencing one of these tables so they can''t be deleted this way.' -- comment - nvarchar(300)

);

END;

RETURN;

END;

And here is how we’d use this function to prepare strings that can be executed in a Flyway UNDO file to drop all the accounting tables:

SELECT 'PRINT N''Dropping ' + Tablename + ''';

DROP table ' + Tablename + ';

GO'

FROM TablesInDependencyOrder ('["accounting.%"]')

ORDER BY TheOrder;

The text output is as follows:

PRINT N'Dropping accounting.Invoice_Lines';

DROP table accounting.Invoice_Lines;

PRINT N'Dropping accounting.Received_Money_Lines';

DROP table accounting.Received_Money_Lines;

PRINT N'Dropping accounting.Bill_Lines';

DROP table accounting.Bill_Lines;

<…output cropped for brevity…>

If, instead, we want the commands to empty these tables of all existing data, then just change the DROP TABLE command to DELETE FROM.

If you just want a list of all the tables in the database, in the correct dependency order, then you can use:

1	SELECT * FROM TablesInDependencyOrder (null) ORDER BY TheOrder

A Cross-RDBMS Solution?

If you wish to create a direct translation to other RDBMSs, you will see the tricks used for the SQL Server solution and roll your eyes. They are unlikely to translate.

As I mention earlier, the $SaveDatabaseModelIfNecessary script block task in the Flyway Teamwork (which you can find in the DatabaseBuildAndMigrateTasks.ps1 scriptblock collection) provides a cross-RDBMS solution for creating a table manifest for each version of the database.

This manifest is equivalent to using the routine I’ve provided here without a parameter – you list out all the tables in the dependency order. All you need to do is to select just the tables you need from the list. However, you must make sure that there are no other tables outside your selection that are dependent on them, otherwise you’ll get an error.

The problem is that, however you do it, it is an iterative process; and SQL doesn’t naturally allow this. Also, you would need to generate the manifest for the version you are undoing, and import it into the SQL Script. However, it is easy to do in a script such as PowerShell, but to cover the process for several RDBMSs in this article would make it rather unwieldy.

Conclusions

Doing things to the data in a lot of tables can be a daunting task unless you do it in a particular order. Ironically, it is one of those good things that you add to the table structures to protect the data, namely the foreign key constraints, that are likely to trip you up.

If you add a row to a table that refers to data in another table, and it isn’t there, then the constraint will fire. If you delete a row in a table when a foreign key is referencing the row, the constraint that guards that relationship will let you know. If you drop a table, when it is being referenced by another table, likewise the database won’t let you. For almost all the time, these constraints save you from bad data, which is the worst evil in a database. Just occasionally, it proves awkward, but it proves easy once you know the right order of doing things.

Redgate Test Data Manager

Flyway

Redgate Monitor

Overview

Test Data Management

Automate

Monitor

Product articles Flyway Undo migrations
Undoing Actions on Groups of Database…

Guest post

Undoing Actions on Groups of Database Tables

Guest post

Why have UNDO scripts?

Tackling an UNDO script for a group of tables in a database

A “Dependency Order” function: populate, truncate or drop a group of tables

A Cross-RDBMS Solution?

Conclusions

Tools in this post

Flyway

Flyway Teams

Flyway Teams

Product Articles

University

Events

Forums

Community

Simple Talk

Redgate Test Data Manager

Flyway

Redgate Monitor

Overview

Test Data Management

Automate

Monitor

Product articles Flyway Undo migrations Undoing Actions on Groups of Database…

Guest post

Guest post

Why have UNDO scripts?

Tackling an UNDO script for a group of tables in a database

A “Dependency Order” function: populate, truncate or drop a group of tables

A Cross-RDBMS Solution?

Conclusions

Tools in this post

Flyway

Flyway Teams

Flyway Teams

You may also like

Redgate Hub

Product Articles

University

Events

Forums

Community

Simple Talk

Product articles Flyway Undo migrations
Undoing Actions on Groups of Database…