Using Database Models and Flyway for Automating Routine Development tasks
Database models have all sorts of useful applications during Flyway development to help us automate those repetitive development tasks that otherwise slow down delivery. This article shows how models can help us automate mundane tasks such as generating a build script for any version of a database or deleting the data from every table.
A model of SQL database is a hierarchical representation of its metadata, providing details of the structure of the tables, views, procedures and functions. We build these models by extracting and collating information from the RDBMS’s metadata views and saving it in markup such as XML or YAML, or in a language such as JSON.
A database model is mostly useful for scripting processes, or for recording, and later referencing, the design of a database at any particular version. If you use a hierarchical model in JSON, it can be accessed by any scripting language such as PowerShell or Python to find out the structure of the database. Even DOS batch, with the help of JQ.exe, can read from it.
I’ve exploited this fact, in my Flyway Teamwork Framework, to provide PowerShell tasks that can generate a JSON model for each new version of the database, and then compare the current model to the previous one to produce a summary of what changed. I’ve described the various ways in which this can help team-based development in Tracking Development Changes using Database Models with Flyway.
As well as allowing us to compare database versions, a model also has more general uses in helping to automate database development tasks. For example, the model provides the information we need to write routines for tasks such as checking that naming conventions are being followed, or that tables are complete, or for extracting documentation. Because the model also contains all the objects dependencies, we can also use it to automate tasks that rely on knowing the correct dependency order. For example, we can use a model to generate a build script that ensures all objects are created in the correct dependency order, or to produce a script that deletes all data from a database in reverse-dependency order.
To illustrate the point, I’ll demo a few of these uses for a model, using the Flyway Teamwork framework. They provide a standardized way to automate this sort of mundane, but necessary, development task, regardless of RDBMS.
Generating a SQL Script that does a SQL task for every table
Unfortunately, there is no SQL method in common between the major RDBMSs to provide a simple way of executing a given Transact-SQL statement against each table in the database. Even if there was, tasks that affect a whole set of tables often need to happen in the correct dependency order. For example, if you need to build a database, you can’t create a table that has foreign key references to tables that you haven’t yet created. Conversely, if the task is to delete all the data, you can’t delete data from any table to which other tables with existing data still refer.
Even SQL Server ‘s sp_MSforeachtable
can’t ensure a task is done in dependency, or reverse-dependency, order. It is certainly possible to achieve similar functionality for other RDBMSs by using their respective system views, cursors, and procedural languages. Fortunately, though, we don ‘t have to do this. Instead, we can steal the idea behind sp_MSforeachtable
and use our database model to provide the dependency order.
As an example, I’ve provided a utility that generates code that does the action that you specify for every table, but in the right dependency order for the operation. It is called Script-ForEachTable
. You’ll find it in the Resources directory and is just one of the resources that is installed by ‘Preliminary.ps1‘.
All you need to do is provide the SQL for the operation you want to perform as a template, using ‘??
‘ as a placeholder for the actual table names. The utility will read the contents of the JSON model, sort the objects into the required dependency order and then iterate through the tables, in that order, replacing the placeholder with the actual table names. The result is a script that you can simply execute to perform the task. It is only the SQL template we use that needs to change for the RDBMS.
Using this utility, or alternatively by reading the table manifest that the $SaveDatabaseModelIfNecessary
task in the Flyway Teamwork framework already generates, we can run these operations much more simply, as well as performing more general tasks, such as listing the number of rows in each table.
Deleting the data from every table
The following code uses Script-ForEachTable
to auto-generate a script that will delete the contents of all the tables in a SQL Server database, without having to disable or delete any constraints:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
$DeletionRoutine=@" --delete the data from ??, in chunks of 10,000 rows if necessary Print 'Deleting all the data from ?? ' DECLARE @SoFarDeleted INT=1; WHILE (@SoFarDeleted > 0 and @@Error=0) BEGIN -- Delete a chunk of rows at a time DELETE TOP (10000) ??; SET @SoFarDeleted = @@ROWCOUNT; END go "@ $TheJSONModelLocation= 'S:\work\Github\FlywayTeamwork\Pubs\Versions\current\Reports\DatabaseModel.JSON ' Script-ForEachTable $DeletionRoutine $TheJSONModelLocation; |
This will provide the following SQL for every table, doing each table in reverse dependency order:
1 2 3 4 5 6 7 8 9 |
Print 'Deleting all the data from dbo.authors ' DECLARE @SoFarDeleted INT=1; WHILE (@SoFarDeleted > 0 and @@Error=0) BEGIN -- Delete a chunk of rows at a time DELETE TOP (10000) dbo.authors; SET @SoFarDeleted = @@ROWCOUNT; END go |
The reason that we are deleting data in smaller batches is that it is useful where large deletions might cause lock contention and other performance issues. The optimal batch size for your RDBMS and data is likely to be different.
Other RDBMSs will have different syntax. PostgreSQL doesn’t allow a LIMIT
in a DELETE
statement, so it is a bit more awkward.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
--delete the data from dbo.prices, in chunks of 10,000 rows if necessary DO ' DECLARE ChunkSize INT := 10000; RowsDeleted INT; BEGIN LOOP DELETE FROM dbo.prices WHERE ctid IN ( SELECT ctid FROM dbo.prices ORDER BY ctid LIMIT ChunkSize ); GET DIAGNOSTICS RowsDeleted = ROW_COUNT; IF RowsDeleted = 0 THEN EXIT; END IF; END LOOP; END '; |
In Oracle, you can achieve a similar result by using PL/SQL and the BULK
COLLECT
feature to fetch and delete data in chunks.
Count the number of rows in every table
Once you have a utility like this, it comes in handy for other database chores. How about this, for example, as a template for generating SQL scripts to count the number of rows in each table of a database:
1 2 3 4 5 6 7 |
@" DECLARE @count INT, @Feedback Nvarchar(80) SELECT @count=Count(*) FROM ?? SELECT @Feedback= '?? has '+Convert(VARCHAR(10), @Count)+ ' rows ' PRINT @Feedback Go "@ | Script-ForEachTable -PathToModel $TheJSONModelLocation; |
A fragment of the script it generates would look like this:
1 2 3 4 5 6 7 8 9 10 |
DECLARE @count INT, @Feedback Nvarchar(80) SELECT @count=Count(*) FROM dbo.publishers SELECT @Feedback= 'dbo.publishers has '+Convert(VARCHAR(10), @Count)+ ' rows ' PRINT @Feedback Go DECLARE @count INT, @Feedback Nvarchar(80) SELECT @count=Count(*) FROM dbo.Publication_Types SELECT @Feedback= 'dbo.Publication_Types has '+Convert(VARCHAR(10), @Count)+ ' rows ' PRINT @Feedback Go |
And it would produce a report like this ..
dbo.titleauthor has 1499 rows dbo.TagTitle has 1000 rows dbo.sales has 21 rows dbo.roysched has 86 rows dbo.editions has 2600 rows …etc…
Generating build scripts from a model
If you need a build script for each new database version, you can set a Flyway callback to generate the script for you, automatically, using whatever tool the RDBMS provides. The Flyway Teamwork Framework, along with a $SaveDatabaseModelIfNecessary
task that provides the manifest that tells you the order for doing the work, also has a $CreateBuildScriptIfNecessary
task that provides object-level build scripts for each migration.
Together, this give you everything you need to generate a build script but, of course, it requires a connection to a live database. If you don’t have access to the required version of the database, you can instead generate the build script from the database model. The Convert-modelToScript
cmdlet, included with the Framework, will parse a model, and create from it a build script. You’ll find it in the Resources directory.
The following example generates a build script from the model for V1.3 of the Pubs database:
1 2 |
'<MyPathTo> Pubs\Versions\1.3\Reports\DatabaseModel.JSON' | Convert-ModelToScript > 'MyBuildScript.SQL ' |
I get a script that I can then check against my build script to see what’s missing, or I can use to create a database that I can compare with the original. Here is a sample of the code generated…
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 |
CREATE TYPE people.PersonalName FROM nvarchar (80) NOT NULL; CREATE TYPE people.PersonalPhoneNumber FROM varchar (20) NOT NULL; CREATE TYPE people.PersonalAddressline FROM varchar (60) NULL; CREATE TYPE people.PersonalPostalCode FROM varchar (15) NOT NULL; CREATE TYPE people.PersonalNote FROM nvarchar (MAX) NOT NULL; CREATE TYPE people.PersonalCVC FROM char (3) NOT NULL; CREATE TYPE people.PersonalTitle FROM nvarchar (20) NOT NULL; CREATE TYPE people.PersonalPaymentCardNumber FROM varchar (20) NOT NULL; CREATE TYPE people.PersonalLocation FROM varchar (20) NULL; CREATE TYPE people.PersonalEmailAddress FROM nvarchar (80) NOT NULL; CREATE TYPE people.PersonalSuffix FROM nvarchar (20) NULL; CREATE TYPE dbo.empid FROM char (9) NOT NULL; CREATE TYPE dbo.tid FROM varchar (6) NOT NULL; CREATE TYPE dbo.Dollars FROM numeric (9,2) NOT NULL; CREATE TYPE dbo.id FROM varchar (11) NOT NULL; GO CREATE TABLE dbo.authors ( /* The authors of the publications. a publication can have one or more author */ au_id dbo.id -- The key to the Authors Table CONSTRAINT UPKCL_auidind PRIMARY KEY (au_id), au_lname nvarchar(80), -- last name of the author au_fname nvarchar(80), -- first name of the author phone nvarchar(40) -- the author's phone number CONSTRAINT AssumeUnknown DEFAULT ('UNKNOWN'), address nvarchar(80), -- the author=s firest line address city nvarchar(40), -- the city where the author lives state char(2), -- the state where the author lives zip char(5), -- the zip of the address where the author lives contract bit -- had the author agreed a contract? ); CREATE TABLE dbo.jobs ( /* These are the job descriptions and min/max salary level */ job_id smallint IDENTITY(1,1) -- The surrogate key to the Jobs Table CONSTRAINT PK__jobs__6E32B6A51A14E395 PRIMARY KEY (job_id), job_desc varchar(50) -- The description of the job CONSTRAINT AssumeANewPosition DEFAULT ('New Position - title not formalized yet'), min_lvl tinyint, -- the minimum pay level appropriate for the job max_lvl tinyint -- the maximum pay level appropriate for the job ); CREATE TABLE dbo.Publication_Types ( /* An edition can be one of several types */ Publication_Type nvarchar(20) -- CONSTRAINT PK__Publicat__66D9D2B37E8D5751 PRIMARY KEY (Publication_Type) ); CREATE TABLE dbo.publishers ( /* this is a table of publishers who we distribute books for */ pub_id char(8) -- The surrogate key to the Publishers Table CONSTRAINT UPKCL_pubind PRIMARY KEY (pub_id), pub_name nvarchar(100), -- The name of the publisher city nvarchar(100), -- the city where this publisher is based state char(2), -- Thge state where this publisher is based country nvarchar(80) -- The country where this publisher is based CONSTRAINT AssumeItsTheSatates DEFAULT ('USA') ); CREATE TABLE dbo.stores ( /* these are all the stores who are our customers */ stor_id char(4) -- The primary key to the Store Table CONSTRAINT UPK_storeid PRIMARY KEY (stor_id), stor_name nvarchar(80), -- The name of the store stor_address nvarchar(80), -- The first-line address of the store city nvarchar(40), -- The city in which the store is based state char(2), -- The state where the store is base zip char(5) -- The zipt code for the store ); CREATE TABLE dbo.TagName ( /* All the categories of publications */ TagName_ID int IDENTITY(1,1) -- The surrogate key to the Tag Table CONSTRAINT TagnameSurrogate PRIMARY KEY (TagName_ID), Tag nvarchar(80) -- the name of the tag CONSTRAINT Uniquetag UNIQUE (Tag) ); |
At the moment, it is really just an elaborate test harness for the framework, to verify or inspect a model. For example, I can generate a build script from the model and use the script to create a database. Then, using SQL Compare, I compare the new database to the one that generates the model to ensure they’re identical. Because the code of all DML and procedural code of the routines and views is in the model, this is easier than it sounds, but it is one of those background maintenance jobs. I haven’t yet attempted to conform with individual dialects of all supported RDBMSs because I’ve not got independent schema comparison tools to check against.
Although JSON database object models are very easy to read, they will inevitably still contain a few SQL expressions. Many database objects, such as views, procedures, functions and some constraints cannot be represented without them. This means, for example, that we can’t interrogate the JSON definition of a view to work out the tables on which it depends. However, there are simple ways round these sorts of problems. To determine what tables are used in SQL expressions or batches, the Framework provides a generic SQL Tokenizer that can be used to give you a list of tables referenced within the scripted object such as a view, function or procedure.
Creating a style checker for tables
I’ve also created a Find-TableSmells
utility, which is essentially a table style checker for database models. At the moment it only runs with SQL Server, but that is only because I needed it for a SQL Server database that needed tidying up. I’ve yet to update all RDBMSs to the new standard.
1 |
Find-TableSmells '<path-to>\FlywayTeamwork\Pubs\Versions\1.1\Reports\DatabaseModel.JSON ' |
On the original Pubs Database, it is rather sniffy about the tables that have no primary keys but otherwise it finds little to criticize.
The dbo.roysched heap has no suitable candidate for a primary key The dbo.roysched table has a foreign Key name 'FK__roysched__title___15502E78 ' that looks auto-generated The dbo.pub_info table has a foreign Key name 'FK__pub_info__pub_id__20C1E124 ' that looks auto-generated The dbo.titles table has no documentation The dbo.titles table has a foreign Key name 'FK__titles__pub_id__08EA5793 ' that looks auto-generated The dbo.discounts heap has no index The dbo.discounts table has a foreign Key name 'FK__discounts__stor___173876EA ' that looks auto-generated The dbo.sales table has a column name 'qty ' that is incomprehensible
Conclusion
One problem I had when changing from its SQL Server origins to a multi-RDBMS system was the work of scaling out the facilities that are needed for database development. Although it was relatively easy, and quite intriguing to do the first three RDBMSs, it then became a chore. All the various tasks seem to boil down to getting information about the database. If information can be put in a standard model and then that model is used as a source of information for the tasks, the complexity drops away. The same is true of database development, even though the context is different. If diagramming, change-reports, quality-control, searching and so on can be done from a model rather than just the current development ‘state’ how much easier the work can become, and how much easier to understand how and when changes were made.