Product articles Flyway Tracking development changes
Using Database Models and Flyway for…

Using Database Models and Flyway for Automating Routine Development tasks

Database models have all sorts of useful applications during Flyway development to help us automate those repetitive development tasks that otherwise slow down delivery. This article shows how models can help us automate mundane tasks such as generating a build script for any version of a database or deleting the data from every table.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

A model of SQL database is a hierarchical representation of its metadata, providing details of the structure of the tables, views, procedures and functions. We build these models by extracting and collating information from the RDBMS’s metadata views and saving it in markup such as XML or YAML, or in a language such as JSON.

A database model is mostly useful for scripting processes, or for recording, and later referencing, the design of a database at any particular version. If you use a hierarchical model in JSON, it can be accessed by any scripting language such as PowerShell or Python to find out the structure of the database. Even DOS batch, with the help of JQ.exe, can read from it.

I’ve exploited this fact, in my Flyway Teamwork Framework, to provide PowerShell tasks that can generate a JSON model for each new version of the database, and then compare the current model to the previous one to produce a summary of what changed. I’ve described the various ways in which this can help team-based development in Tracking Development Changes using Database Models with Flyway.

As well as allowing us to compare database versions, a model also has more general uses in helping to automate database development tasks. For example, the model provides the information we need to write routines for tasks such as checking that naming conventions are being followed, or that tables are complete, or for extracting documentation. Because the model also contains all the objects dependencies, we can also use it to automate tasks that rely on knowing the correct dependency order. For example, we can use a model to generate a build script that ensures all objects are created in the correct dependency order, or to produce a script that deletes all data from a database in reverse-dependency order.

To illustrate the point, I’ll demo a few of these uses for a model, using the Flyway Teamwork framework. They provide a standardized way to automate this sort of mundane, but necessary, development task, regardless of RDBMS.

Using database models

Generating a SQL Script that does a SQL task for every table

Unfortunately, there is no SQL method in common between the major RDBMSs to provide a simple way of executing a given Transact-SQL statement against each table in the database. Even if there was, tasks that affect a whole set of tables often need to happen in the correct dependency order. For example, if you need to build a database, you can’t create a table that has foreign key references to tables that you haven’t yet created. Conversely, if the task is to delete all the data, you can’t delete data from any table to which other tables with existing data still refer.

Even SQL Server ‘s sp_MSforeachtable can’t ensure a task is done in dependency, or reverse-dependency, order. It is certainly possible to achieve similar functionality for other RDBMSs by using their respective system views, cursors, and procedural languages. Fortunately, though, we don ‘t have to do this. Instead, we can steal the idea behind sp_MSforeachtable and use our database model to provide the dependency order.

As an example, I’ve provided a utility that generates code that does the action that you specify for every table, but in the right dependency order for the operation. It is called Script-ForEachTable. You’ll find it in the Resources directory and is just one of the resources that is installed by ‘Preliminary.ps1‘.

All you need to do is provide the SQL for the operation you want to perform as a template, using ‘??‘ as a placeholder for the actual table names. The utility will read the contents of the JSON model, sort the objects into the required dependency order and then iterate through the tables, in that order, replacing the placeholder with the actual table names. The result is a script that you can simply execute to perform the task. It is only the SQL template we use that needs to change for the RDBMS.

Using this utility, or alternatively by reading the table manifest that the $SaveDatabaseModelIfNecessary task in the Flyway Teamwork framework already generates, we can run these operations much more simply, as well as performing more general tasks, such as listing the number of rows in each table.

Deleting the data from every table

The following code uses Script-ForEachTable to auto-generate a script that will delete the contents of all the tables in a SQL Server database, without having to disable or delete any constraints:

This will provide the following SQL for every table, doing each table in reverse dependency order:

The reason that we are deleting data in smaller batches is that it is useful where large deletions might cause lock contention and other performance issues. The optimal batch size for your RDBMS and data is likely to be different.

Other RDBMSs will have different syntax. PostgreSQL doesn’t allow a LIMIT in a DELETE statement, so it is a bit more awkward.

In Oracle, you can achieve a similar result by using PL/SQL and the BULK COLLECT feature to fetch and delete data in chunks.

Count the number of rows in every table

Once you have a utility like this, it comes in handy for other database chores. How about this, for example, as a template for generating SQL scripts to count the number of rows in each table of a database:

A fragment of the script it generates would look like this:

And it would produce a report like this ..

dbo.titleauthor has 1499 rows
dbo.TagTitle has 1000 rows
dbo.sales has 21 rows
dbo.roysched has 86 rows
dbo.editions has 2600 rows
…etc…

Generating build scripts from a model

If you need a build script for each new database version, you can set a Flyway callback to generate the script for you, automatically, using whatever tool the RDBMS provides. The Flyway Teamwork Framework, along with a $SaveDatabaseModelIfNecessary task that provides the manifest that tells you the order for doing the work, also has a $CreateBuildScriptIfNecessary task that provides object-level build scripts for each migration.

Together, this give you everything you need to generate a build script but, of course, it requires a connection to a live database. If you don’t have access to the required version of the database, you can instead generate the build script from the database model. The Convert-modelToScript cmdlet, included with the Framework, will parse a model, and create from it a build script. You’ll find it in the Resources directory.

The following example generates a build script from the model for V1.3 of the Pubs database:

I get a script that I can then check against my build script to see what’s missing, or I can use to create a database that I can compare with the original. Here is a sample of the code generated…

At the moment, it is really just an elaborate test harness for the framework, to verify or inspect a model. For example, I can generate a build script from the model and use the script to create a database. Then, using SQL Compare, I compare the new database to the one that generates the model to ensure they’re identical. Because the code of all DML and procedural code of the routines and views is in the model, this is easier than it sounds, but it is one of those background maintenance jobs. I haven’t yet attempted to conform with individual dialects of all supported RDBMSs because I’ve not got independent schema comparison tools to check against.

Although JSON database object models are very easy to read, they will inevitably still contain a few SQL expressions. Many database objects, such as views, procedures, functions and some constraints cannot be represented without them. This means, for example, that we can’t interrogate the JSON definition of a view to work out the tables on which it depends. However, there are simple ways round these sorts of problems. To determine what tables are used in SQL expressions or batches, the Framework provides a generic SQL Tokenizer that can be used to give you a list of tables referenced within the scripted object such as a view, function or procedure.

Creating a style checker for tables

I’ve also created a Find-TableSmells utility, which is essentially a table style checker for database models. At the moment it only runs with SQL Server, but that is only because I needed it for a SQL Server database that needed tidying up. I’ve yet to update all RDBMSs to the new standard.

On the original Pubs Database, it is rather sniffy about the tables that have no primary keys but otherwise it finds little to criticize.

The dbo.roysched heap has no suitable candidate for a primary key
The dbo.roysched table has a foreign Key name  'FK__roysched__title___15502E78 ' that looks auto-generated
The dbo.pub_info table has a foreign Key name  'FK__pub_info__pub_id__20C1E124 ' that looks auto-generated
The dbo.titles table has no documentation
The dbo.titles table has a foreign Key name  'FK__titles__pub_id__08EA5793 ' that looks auto-generated
The dbo.discounts heap has no index
The dbo.discounts table has a foreign Key name  'FK__discounts__stor___173876EA ' that looks auto-generated
The dbo.sales table has a column name  'qty ' that is incomprehensible

Conclusion

One problem I had when changing from its SQL Server origins to a multi-RDBMS system was the work of scaling out the facilities that are needed for database development. Although it was relatively easy, and quite intriguing to do the first three RDBMSs, it then became a chore. All the various tasks seem to boil down to getting information about the database. If information can be put in a standard model and then that model is used as a source of information for the tasks, the complexity drops away. The same is true of database development, even though the context is different. If diagramming, change-reports, quality-control, searching and so on can be done from a model rather than just the current development ‘state’ how much easier the work can become, and how much easier to understand how and when changes were made.

Tools in this post

Flyway

DevOps for the Database

Find out more