Product articles
Flyway
Database Dependencies
The Uses of Dependency Information in…

The Uses of Dependency Information in Database Development

Dependency information will allow you to avoid errors during a database build or tear-down, by ensuring you create or remove objects in the right order. It will also help you to avoid future 'invalid object' errors, because it will allow you to check that no database alterations have introduced broken references, during Flyway migrations.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Whether you are using Flyway, or a more state-based approach, it is easy to become exasperated with the task of running scripts that create or alter databases. In particular, the database’s ‘safety net’ errors, triggered to prevent you breaking references between objects, are a common source of frustration. These errors will stop the build script and, unless you run the entire script within a transaction, you will be faced with a mopping-up task to roll back any changes. One of the virtues of Flyway is that it will run each script in a transaction, if the RDBMS you use allows it, and force a rollback on any error. This leaves the result at the previous version.

I’ll show how to use database dependency information (sys.sql_expression_dependencies system catalog view) to avoid errors during database build, database tear-down or object-level builds, and to avoid accidentally breaking references during Flyway migrations, leading to ‘invalid object’ errors.

Avoiding execution errors in Data-definition Language (DDL) scripts

When you’re building a database, or making changes to it, you’ve got to do things in the right order. For example, whenever SQL Server executes a CREATE or ALTER statement for a table, it checks all references that it makes via constraints to any other objects. It will raise an “invalid object” error on the referencing object if the referenced object doesn’t yet exist.

This most often happens if the relationship between tables is enforced by a foreign key constraint. Even though you know that you’re about to define that missing table that is referred to in a constraint, the database doesn’t, so it gives an error. Databases, annoyingly, deal with objective reality rather than ‘your truth’. Things must be done in the correct sequence: You can’t, for example, do much else in a database until you have your schemas created.

When working out the correct place in a build for tables to be created, isn’t just foreign key constraints that can trip you up. You might have, in SQL Server, a computed column or check constraint that uses a function. If that function isn’t yet created, your build is toast. The same goes for user-defined types.

You’ll face a similar problem in the process of dropping database objects: you can’t drop an object if there are other objects still referring to it. I use the word ‘objects’ because it isn’t just a matter of creating or dropping each type of object in the right order, so you do ‘user types’ first, then tables, then views and so on. It will often work out, but not always. Some database systems, such as SQL Server, will allow tables to have constraints that contain schema-bound user functions. That means that the user function can’t be deleted until the table is deleted.

All of us who develop relational database will have experienced the difficulties of getting the order of making changes right. Those of us who are old enough to have created build scripts without handy tools will know that you only add a table when all the tables that it references via foreign keys are already created. As a more general rule, don’t create an object until you’ve created the objects to which it refers.

Many RDBMSs allow you to break this rule, but it is still a good way of working that makes it easier to read a script. Otherwise, the experience of reading a script becomes like watching an irritating film that gets into the action without giving the audience the slightest clue as to who the various characters are, and how they relate.

Another way of breaking the rule in build scripts is to create all the tables first and then subsequently alter them to add the foreign key constraints. That works fine but if you are used to ‘reading’ table creation scripts, it is confusing to have to look outside of the CREATE TABLE statement to find all the ALTER statements that complete the ‘story’ about the table. To write code that is easy to maintain, all keys and constraints should be contained within the table definition.

Don’t even think about leaving out constraints because they cause hassle. Foreign key constraints are essential for databases. Although they complicate CREATE and DROP DDL code, they ‘shift-left’ the errors into development where they are, perhaps irritating, but harmless. Above all else they are our best defense against bad data, but they also help performance by informing the query optimizer and they cause the supporting indexes to be created.

Detecting broken ‘soft references’ (a.k.a Avoiding ‘invalid object’ errors)

We must clarify what is meant by ‘hard’ and ‘soft’ dependencies. Hard dependencies occur in tables and are defined explicitly by a foreign key constraint. Soft dependencies happen when you refer, within an object, to another object. You’ll get an immediate error from a breaking a hard dependency, or hard reference, because it is enforced by a constraint. However, broken soft references can sneak into non-schema-bound objects, such views, table-valued functions, triggers or stored procedures, without you noticing.

Let’s say you’ve created a module (e.g., a view, procedure or function) that references a table, and then someone removes or renames an object, such as a column, that the module references. Depending on your RDBMS, you may not hit the “invalid object” errors until the module is next used.

Some RMBMSs have bespoke ways of checking for broken soft references, in SQL Server’s case by using the sp_RefreshSQModule system stored procedure, as I explain in Checking for Missing Module References in a SQL Server Database Using Flyway. PostgreSQL has metadata that will do the same thing (see here)

Some also allow you to protect these ‘soft references’ by using schema binding for a routine. If you have a SQL Server routine that relies on the existence of, or attributes of, a column in a table, you can use WITH SCHEMDABINDING to ensure that the referenced table can’t be removed, or altered in a way that would break these references. If you attempt to do so, you’ll get a ‘binding’ error. Schema binding will protect all objects that use the schema-bound routine. Where you can refer to a schema-bound user-defined function in a computed column of a table (PostgreSQL’s generated columns don’t allow this), that function must be deleted after the table.

If you accidentally prevent a SQL expression from working in, say, a view, the RDBMS will know about it, but is assuming that you’ll put things right before the view is next used. If not, then the view will error out. If you are executing SQL directly from an application, then you’re at a huge disadvantage because you can’t be certain that the object you’ve changed or deleted isn’t being occasionally used in SQL DML from the application. It is much better for an application to use just an interface to the database, using views, procedures, and functions, so that you know for certain the objects that can be accessed).

We can, in SQL Server, detect these missing ‘soft dependencies’ using the sys.sql_expression_dependencies system catalog view (we can do the equivalent in other RDBMSs). We can illustrate what can go wrong and show how to detect these ‘invalid objects’ when we create them. We’ll create a table and a function that references it, and then do some unkind things to see what happens.

This gives the following results

---- both objects created. DeleteThisTestFunction is dependent on DeleteThisTestTable
---- We execute the function
(0 rows affected)
---- Hmm. That works. We now test to make sure there are no broken dependencies
(0 rows affected)
---- We now drop the table so that the function will produce an error
---- Has SQL Server found the broken dependency?
(1 row affected)
---- We re-execute the function. This should produce a binding error
Msg 208, Level 16, State 1, Procedure DeleteThisTestFunction, Line 8 [Batch Start Line 41]
Invalid object name 'dbo.DeleteThisTestTable'.
Msg 4413, Level 16, State 1, Line 42
Could not use view or function 'dbo.DeleteThisTestFunction' because of binding errors.

---- Yup. Error. We now drop the function
---- The broken 'soft' dependency has disappeared.
(0 rows affected)
Now we create the same function but with schema binding
Msg 3729, Level 16, State 1, Line 74
Cannot DROP TABLE 'dbo.DeleteThisTestTable' because it is being referenced by object 'DeleteThisTestFunction'.
Msg 3729, Level 16, State 1, Line 72
Cannot DROP TABLE 'dbo.DeleteThisTestTable' because it is being referenced by object 'DeleteThisTestFunction'.
Msg 3705, Level 16, State 1, Line 79
Cannot use DROP FUNCTION with 'dbo.DeleteThisTestTable' because 'dbo.DeleteThisTestTable' is a table. Use DROP TABLE.
Completion time: 2022-08-03T15:11:14.7396017+01:00(0 rows affected)

As you'll see from the result, the RDBMS knows when there is an unsatisfied reference but isn't going to rear up on its hind legs unless you attempt to use it (delayed compilation). You'll also see the effects of adding schema binding. It means that the table cannot be deleted while the function is referring to it.

If one of our Flyway migrations scripts accidentally breaks a soft reference made by a non-schema-bound object then, even though SQL Server knows about it, we won't find out about it until a user runs the broken function. To avoid this, it's best to run a check for broken soft references such as the one for SQL Server described here, as part of the migration, and roll it back if any are detected.

The tear-down

Take, as an example, a problem where you need to tear down a database that you’ve used for a test. Real-life examples are often more complicated because you may need to allow certain schemas or objects such as tables to be retained. Again, we’ll use SQL Server, because it is easy to find soft dependencies, either schema-bound or non-schema-bound, using sys.sql_expression_dependencies. In my DropAllObjects procedure, below, the first objects to be deleted are those to which no other object refers. Having done so, we find that there are now more objects to which no other object refers (because we just deleted the referring objects). We just repeat until all the objects are deleted. This stored procedure will even delete itself, which is very thorough. To try it out, restore a copy of AdventureWorks, or whatever sample database you have, and double-check that you are logged into the right database and not accidentally logged into the company’s payroll database as admin.

Flyway does its best to perform the same trick with its Clean command and it works well with most of my sample databases. However, as I write this, it fails with AdventureWorks, SQL Server’s sample database, due to SQL Server’s useful schema-binding feature.

Building from object-level scripts

One of the most important uses for dependency information about a database is for building databases from object-level scripts. For just building a database, it is easier to generate a single-file build script, but sometimes it is more convenient to do it from object-level source.

The migration-based approach to building databases, used by Flyway, has many advantages, but you lose the useful feature of committing object-level source to source control, which means that you can no longer track changes to individual objects over time. This is mainly useful for table, which will often get altered, in several places, in a sequence of migrations. Where, for example, a table has a change of index, this will then be picked up as a change in source control.

Redgate’s UI that combines Flyway and schema comparison functionality, called Flyway Desktop, also maintains the object-level source as well as migration scripts. Flyway command line allows you to get over this by supporting callbacks that allow you to save an object-level source for the entire database, after every migration run. I’ve included a $CreateScriptFoldersIfNecessary task in my Flyway Teamwork framework that generates the object-level scripts (see Creating Database Build Artifacts when Running Flyway Migrations).

When you write out an object-level script using the tools provided to go with the RDBMS, it is usual with SQL Server to provide a manifest. You don’t need this if you have Schema Compare for Oracle or SQL Compare for SQL Server because they can work with this source directly, to build a database – but for other RDBMSs, you would need to do it manually. There are plenty of uses for a having a manifest generated, every time you migrate a database to a new version. I’ll explore these in a later article.

A manifest is just a list of all the files that must be executed to build the database from the object-level scripts. Normally, the manifest is generated via SMO, the programmatic interface into SSMS. However, it is a lot quicker and easier to do this in SQL. Unless you have a Redgate Compare tool, you are forced to prepare your manifest from the live database. My Flyway Teamwork has a script task ($SaveDatabaseModelIfNecessary) that generates a table manifest for you from the JSON model of the database.

Unlike a schema compare tool, this type of manifest cannot deal with a build if you add object script files or make changes that change the dependencies. Here is the code to generate a manifest to go with your object-level source. It relies on you creating the schemas and types first, as it is only concerned with order of building schema-based objects.

When calling this in AdventureWorks, using the following expression …

…it will give …

A manifest for building a database from object-level scripts

Summary

The way that relational databases police dependencies and references can strike fear into the heart of a database novice. It is the same terror with which a child might face a life-saving inoculation. To the experienced database developer, the terror of bad data and broken references or dependencies is far, far greater, and we therefore apply constraints to guard against even the most unlikely of events. If you neglect this, fate has a way of illustrating your carelessness with a public and humiliating disaster.

The disentangling of the complex interrelationships between database objects is therefore a soothing activity, like mending your safety net. A slight inconvenience during a build, deletion or alteration, involving the rollback of a migration, is a small price to pay for a powerful defense against bad data or broken processes.

Manifests, which are ordered lists of database objects, originally were just used for building databases from object-level script files. However, they have a surprising number of uses. Flyway Teamwork can generate table manifests, which just include tables, for all the RDBMSs that it supports. These are valuable for any tasks that involve all or most of your tables, such as importing data, exporting data, clearing out data, finding out how many rows there are in each table and so on. From this introduction, we’ll explain all this in more detail when we demonstrate how to change your datasets.

 

Tools in this post

Flyway

DevOps for the Database

Find out more