Product articles Flyway
Simple Reporting with Flyway and…

Simple Reporting with Flyway and Database Models

If you can generate a file-based (JSON) model for each new version of a database, produced by a Flyway migration, then you have an easy way to run simple reports to help you search, list, and understand the structure of these databases. I'll show how to produce the models using PowerShell and then run some queries against them to generate the reports.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

I developed the FlywayTeamwork PowerShell framework to help users automate various of the necessary database development tasks that must happen as part of a Flyway migration. If you’ve taken the leap, and already started to use the framework, then you’ll probably have noticed that it can automatically generate a file-based (JSON) model of each version of the database, for any one of the major RDBMSs (it currently supports PostgreSQL, MariaDB, MySQL, SQL Server and SQLite).

This is necessary for supporting many development tasks but was done primarily to check for database ‘drift’ to ensure that no ‘uncontrolled’ changes were ever made to a database. However, there is also a lot of other useful information for the working database developer in this model.

What is a database model?

A model of a SQL database is a hierarchical representation of its metadata, the structure of inter-related tables and views, its procedures and functions. It can be represented easily in markup such as XML or YAML, or in a language such as JSON or even PowerShell. I’ve used JSON.

The most obvious use of a JSON model of a database is that you can use it to compare, search or list parts of your database or even to create SQL-92 build scripts for such purposes as transferring tables. It isn’t my idea: similar devices have been used for some time by DevOps groups in Facebook and Netflix to try to speed up delivery, and there are several open-source applications that use a database model rather than a SQL build script for design work.

There are a huge number of models that you can use for this purpose. After years of experiment, I’ve settled on a sparse and economical model that, wherever possible, sticks to SQL conventions and uses the names of objects as the keys. The model must be lean and economical. Columns are defined entirely with legal and consistent SQL syntax so that they are easily understood and easily parsed.

With Flyway, I can save a model for every version, so I know what was in it without having to recreate the live database. If I compare a version with the previous one, I then know what the migration did. A lot of routine tasks can be performed quickly by querying the model rather than the database. One can get lists of tables, search for strings or column names, list dependencies that are enforced by constraints, and answer a host of other questions that crop up when you’re developing a database. Database diagrams can easily be generated as long as the database contains keys and constraints.

It is time to demonstrate some of this.

Generating the database model

The easiest way to test out some of the simple reporting that uses the JSON model is to grab one of the sample JSON models that I’ve provided (one for PostgreSQL and one for MySQL or MariaDB) and save it locally. If so, then you can skip this section and move straight to the one after, where we read the model into PowerShell and start doing some simple reporting.

However, if you want to try out generating the JSON model, it requires use of Flyway and my FlywayTeamwork PowerShell framework. For each database we want to manage with Flyway, we set up a Flyway project folder, within the framework. The framework currently supports PostgreSQL, MariaDB, MySQL, SQL Server and SQLite. In the demos for this article, I’m using both the PubsMySQL project and the development branch of the PubsPostgresSQL project. However, I’ve also provided sample versions of the old ‘Pubs’ database for the other RDBMSs.

For every successful Flyway migration run, we invoke a scriptblock task, included with my framework, called $SaveDatabaseModelIfNecessary, which generates the JSON models.

Set up the Flyway project

I’ve described how to set up the Flyway project, within the framework, in previous articles (see for example What is the Flyway Teamwork Framework?) so won’t go into detail here.

I have one root folder (in my case called FlywayDevelopments) that holds all my Flyway projects plus the Resources folder, containing all the required framework tasks and files. In this example, my Flyway project folder (PubsFlyway) looks like this:

Set up Flyway project

This is the ‘main’ branch of the project (the only one we’ll use in the example). All branches have a standard structure. Make sure you create the sub-folders as shown; we’ll only use Migrations and Versions in this example, but the Branches directory needs to exist.

You’ll also need a copy of preliminary.ps1, and you’ll need to create a flyway.conf file that contains all the required project details for your database. Each of the projects in my GitHub repo contains sample config files for various databases.

Run the Flyway migration to generate the JSON model

We use the following code. It starts by making the project folder the current working directory and then runs the ‘preliminary’ script that fetches all the relevant information you need for scripting and automating a Flyway development. It announces the result in a Write-Output message just to make sure it has all worked, that I’m working with the right branch and to prevent myself doing something stupid.

You’ll have spotted that all the details you need were created for you by this script and are in a hashtable called $dbDetails. If something goes awry, such as files ending up in unexpected locations, check the contents of $dbDetails (e.g., just highlight it and execute it, in a PowerShell scripting window).

You can then run all the migrations, to take the database up to a version 1.1.7.

And then run the required post-migration tasks. You can find all the currently supported tasks in DatabaseBuildAndMigrateTasks.ps1, but in this case we just need to do enough to generate the model:

When you run the $SaveDatabaseModelIfNecessary task you’ll find two new sub-folders in the Versions folder, one for the new version (1.1.7) and one called current. Within the 1.1.7 folder you’ll find a reports folder containing the database model and a model folder containing object-level JSON scripts, organized by object type. It also puts a copy of all model and reports folders into the current folder.

The scriptblock currently only creates all these JSON file once, the first time you create the version. If you want to refresh it, you must delete the current files.

Simple Reporting using the JSON model

If you are trying this out with an existing JSON model file, then it is simple to read it into PowerShell:

If you want to read the model created at the end of the last migration, then this is also simple.

Peeping at the model

Let’s have a look at what’s inside. JSON isn’t that easy to read directly. If you are a database person, it all looks rather neater, and easier to inspect, using YAML.

We show just part of the result – a procedure, view and table – just to show you the idea.

---
dbo:
  procedure:
    byroyalty:
      hash: 896a29ca1a4eadd58d230709e111b12b
      definition: > 
        Begin  select au_id from titleauthor  where titleauthor.royaltyper = percentage;
        end
  view:
    reptq1:
      columns:
        - pub_id varchar(8) NULL DEFAULT (NULL)  
        - avg_price decimal(23,8) NULL DEFAULT (NULL)  
      hash: 589bcc3a39d8e3cf287332da8cdb9f7d
      definition: > 
        select coalesce('dbo'.'publications'.'pub_id','ALL') AS 'pub_id',avg('dbo'.'prices'.'price')
        AS 'avg_price' from ((('dbo'.'publishers' join 'dbo'.'publications' on('dbo'.'p
        ublications'.'pub_id' = 'dbo'.'publishers'.'pub_id')) join 'dbo'.'editions' on('
        dbo'.'editions'.'publication_id' = 'dbo'.'publications'.'Publication_id')) join
        'dbo'.'prices' on('dbo'.'prices'.'Edition_id' = 'dbo'.'editions'.'Edition_id'))
        where 'dbo'.'prices'.'PriceEndDate' is null group by 'dbo'.'publications'.'pub_id'
        with rollup
  table:
     employee:
      comment: An employee of any of the publishers
      primary key:
        PRIMARY: emp_id
      foreign key:
        fk_Employee_Pubs_JobID:
          Cols: job_id
          "Foreign Table": dbo.jobs
          Referencing: job_id
        fk_Employee_JobID:
          Cols: job_id
          "Foreign Table": dbo.jobs
          Referencing: job_id
        fk_Employee_publishers_pub_id:
          Cols: pub_id
          "Foreign Table": dbo.publishers
          Referencing: pub_id
      columns:
        - emp_id char(9) NOT NULL  
        - fname varchar(20) NOT NULL  
        - minit char(1) NULL DEFAULT (NULL)  
        - lname varchar(30) NOT NULL  
        - job_id smallint(6) NOT NULL DEFAULT (1)  
        - job_lvl smallint(6) NULL DEFAULT (10)  
        - pub_id char(8) NOT NULL  
        - hire_date datetime NOT NULL DEFAULT (curdate())  
      index:
        Jobid_index:
          def: 
          Indexing: job_id
        pub_id_index:
          def: 
          Indexing: pub_id
 …etc etc …

I hope that this gives a feel for the type of information that’s there. The model is a recursive object. It has to be because of the hierarchical nature of database metadata. For example, Schemas contain tables, which contain columns.

If we use a handy PowerShell cmdlet provided with the framework that I’ve written called Display-Object, you can take quite a shortcut to finding out what is in an object. This allows us to query and filter it. What Display-Object does is to find every value in the JSON and gives a ‘path’ or address that tells you where it is in the model. You can reference any value easily via its path. You can specify the depth to which you want to dig to find the data. Schemas are shallow, (a depth of 1), categories of base objects (tables/views/types/functions and so on) are next at depth 2, and each object is then listed by name at level 3. Child objects are visible by name at level 4.

List all the schemas in the database

We just list the first level of the hierarchy, the schema level. We’ll just leave it as an object path for the time being (if you’ve used XML, you’ll know what I mean)

Path 
---- 
$.dbo

List object types within each schema

What happens if we raise the level by an increment? We get the objects that are present in each schema. In this MariaDB example we’ve only one schema:

Path           
----           
$.dbo.procedure
$.dbo.table    
$.dbo.view

List object names and their type and schema

We can just increment the depth to three, to see the object names as well as types:

Path                                  
----                                  
$.dbo.procedure.byroyalty             
$.dbo.procedure.reptq1                
$.dbo.procedure.reptq2                
$.dbo.procedure.reptq3                
$.dbo.table.authors                   
$.dbo.table.discounts                 
$.dbo.table.editions                  
$.dbo.table.employee                  
$.dbo.table.jobs                      
$.dbo.table.prices                    
$.dbo.table.publications              
$.dbo.table.publishers                
$.dbo.table.pub_info                  
$.dbo.table.roysched                  
$.dbo.table.sales                     
$.dbo.table.stores                    
$.dbo.table.tagname                   
$.dbo.table.tagtitle                  
$.dbo.table.titleauthor               
$.dbo.view.byroyalty                  
$.dbo.view.publishersbypublicationtype
$.dbo.view.reptq1                     
$.dbo.view.reptq2                     
$.dbo.view.reptq3                     
$.dbo.view.titles                     
$.dbo.view.titleview

However, if we are serious about reporting, we need to do better in terms of clarity.

Name                            Type     
----                            ----     
dbo.byroyalty                   procedure
dbo.reptq1                      procedure
dbo.reptq2                      procedure
dbo.reptq3                      procedure
dbo.authors                     table    
dbo.discounts                   table    
dbo.editions                    table    
dbo.employee                    table    
dbo.jobs                        table    
dbo.prices                      table    
dbo.publications                table    
dbo.publishers                  table    
dbo.pub_info                    table    
dbo.roysched                    table    
dbo.sales                       table    
dbo.stores                      table    
dbo.tagname                     table    
dbo.tagtitle                    table    
dbo.titleauthor                 table    
dbo.byroyalty                   view     
dbo.publishersbypublicationtype view     
dbo.reptq1                      view     
dbo.reptq2                      view     
dbo.reptq3                      view     
dbo.titles                      view     
dbo.titleview                   view

Listing the (view and table) object names and their child objects (e.g., constraints)

Just by going to a depth of 4, we can see the child objects:

Path                                          
----                                          
$.dbo.Table.authors.columns                   
$.dbo.Table.authors.index                     
$.dbo.Table.authors.primary key               
$.dbo.Table.discounts.columns                 
$.dbo.Table.discounts.foreign key             
$.dbo.Table.discounts.index                   
$.dbo.Table.discounts.primary key             
$.dbo.Table.editions.columns                  
$.dbo.Table.editions.index                    
$.dbo.Table.editions.primary key              
$.dbo.Table.employee.columns                  
$.dbo.Table.employee.foreign key              
$.dbo.Table.employee.index                    
$.dbo.Table.employee.primary key              
$.dbo.Table.jobs.columns                      
$.dbo.Table.jobs.primary key                  
$.dbo.Table.prices.columns                    
$.dbo.Table.prices.index                      
$.dbo.Table.prices.primary key                
$.dbo.Table.publications.columns              
 … etc, etc …

Filtering the data

We can easily put in a filter to list just the tables, procedures, functions or views. Here we just want the Views. We use a simple wildcard to do the filtering.

Path                                  
----                                  
$.dbo.View.byroyalty                  
$.dbo.View.publishersbypublicationtype
$.dbo.View.reptq1                     
$.dbo.View.reptq2                     
$.dbo.View.reptq3                     
$.dbo.View.titles                     
$.dbo.View.titleview

Listing out columns

We can list out all the columns in the database, together with its table. I’ll just show the first two tables:

TableName                       Column                          
---------                       ------                          
dbo.authors                     au_id varchar(11) NOT NULL      
dbo.authors                     au_lname varchar(80) NOT NULL   
dbo.authors                     au_fname varchar(80) NOT NULL   
dbo.authors                     phone varchar(40) NOT NULL DEFAULT ('UNKNOWN')                                     
dbo.authors                     address varchar(80) NULL DEFAULT (NULL)                                            
dbo.authors                     city varchar(40) NULL DEFAULT (NULL)                                               
dbo.authors                     state char(2) NULL DEFAULT (NULL)                                                  
dbo.authors                     zip char(5) NULL DEFAULT (NULL) 
dbo.authors                     contract smallint(6) NOT NULL   
dbo.discounts                   discounttype varchar(40) NOT NULL                                                  
dbo.discounts                   stor_id char(4) NULL DEFAULT (NULL)                                                
dbo.discounts                   lowqty smallint(6) NULL DEFAULT (NULL)                                             
dbo.discounts                   highqty smallint(6) NULL DEFAULT (NULL)                                            
dbo.discounts                   discount decimal(4,2) NOT NULL  
dbo.discounts                   discount_id int(11) NOT NULL auto_increment                                        
dbo.editions                    Edition_id int(11) NOT NULL auto_increment                                         
dbo.editions                    publication_id varchar(6) NOT NULL                                                 
dbo.editions                    Publication_type varchar(20) NOT NULL DEFAULT ('book')                             
dbo.editions                    EditionDate datetime NOT NULL DEFAULT (curdate())                                  
dbo.employee                    emp_id char(9) NOT NULL         
dbo.employee                    fname varchar(20) NOT NULL      
dbo.employee                    minit char(1) NULL DEFAULT (NULL)                                                  
dbo.employee                    lname varchar(30) NOT NULL      
dbo.employee                    job_id smallint(6) NOT NULL DEFAULT (1)                                            
dbo.employee                    job_lvl smallint(6) NULL DEFAULT (10)                                              
dbo.employee                    pub_id char(8) NOT NULL         
dbo.employee                    hire_date datetime NOT NULL DEFAULT (curdate())

You are more likely to use a filter to search out the table you are interested in. Here is a wildcard search for columns in the Sales table (you could search for keys as well):

Value                          
-----                          
stor_id char(4) NOT NULL       
ord_num varchar(20) NOT NULL   
ord_date varchar(50) NOT NULL  
qty int(11) NOT NULL           
payterms varchar(12) NOT NULL  
title_id varchar(6) NOT NULL

Listing other table attributes

Now, as far as the model is concerned, a column is merely an ordered list of ‘column’ attributes. So are primary keys and foreign keys. To illustrate this, here are a list of all the primary keys in the sample database. They are all given the automatically generated name 'PRIMARY' by MariaDB, in case you wondered!

Path                                         Value         
----                                         -----         
$.dbo.Table.authors.primary key.PRIMARY      au_id         
$.dbo.Table.discounts.primary key.PRIMARY    discount_id   
$.dbo.Table.editions.primary key.PRIMARY     Edition_id    
$.dbo.Table.employee.primary key.PRIMARY     emp_id        
$.dbo.Table.jobs.primary key.PRIMARY         job_id        
$.dbo.Table.prices.primary key.PRIMARY       Price_id      
$.dbo.Table.publications.primary key.PRIMARY Publication_id
$.dbo.Table.publishers.primary key.PRIMARY   pub_id        
$.dbo.Table.pub_info.primary key.PRIMARY     pub_id        
$.dbo.Table.sales.primary key.PRIMARY        ord_num       
$.dbo.Table.stores.primary key.PRIMARY       stor_id       
$.dbo.Table.tagname.primary key.PRIMARY      TagName_ID    
$.dbo.Table.tagtitle.primary key.PRIMARY     title_id      
$.dbo.Table.titleauthor.primary key.PRIMARY  au_id

Well, this is only a small practice database I’m using but there are other attributes besides columns and primary keys. In fact, we can tell you precisely what other attributes exist:

columns
comment
definition
foreign key
hash
index
primary key
unique

Refining the result (to support further processing)

This is OK for ad-hoc queries, assuming you don’t mind doing a bit of filtering by eye; but what if you want to generate a list of tables for further processing, such as to import or export data?

We need to have a more user-friendly result. There are several different ways of doing this. One is to use a Regex expression, exploiting the fact that the Select-Object cmdlet allows a PowerShell expression:

TableName       
---------       
dbo.authors     
dbo.discounts   
dbo.editions    
dbo.employee    
dbo.jobs        
dbo.prices      
dbo.publications
dbo.publishers  
dbo.pub_info    
dbo.roysched    
dbo.sales       
dbo.stores      
dbo.tagname     
dbo.tagtitle    
dbo.titleauthor

You can use Select-String to get the same effect if you like Regex expressions:

However, I prefer a much simpler way of doing it, relying on the fact that the path expression is dot delimited. Again, the result is the same:

Searching for strings

You can easily do global searches for a string within the metadata, via a wildcard or regex search. You can also apply filters to focus the search on a particular schema, object type or attribute. Here’s a wildcard search for title_id:

Here’s the results in grid view:

Search for a string in a JSON database model

Regex searches are just as easy:

Path                                              Value     
----                                              -----     
$.dbo.View.publishersbypublicationtype.columns[4] Ebook decimal(22,0) NULL DEFAULT (NULL)

We can easily restrict our searches just to columns, for example:

Path                               Value                         
----                               -----                         
$.dbo.Table.roysched.columns[0]    title_id varchar(6) NOT NULL  
$.dbo.Table.sales.columns[5]       title_id varchar(6) NOT NULL  
$.dbo.Table.tagtitle.columns[0]    title_id varchar(6) NOT NULL  
$.dbo.Table.titleauthor.columns[1] title_id varchar(6) NOT NULL  
$.dbo.View.titles.columns[0]       title_id varchar(6) NOT NULL

Or to foreign keys:

Path                                                        Value   
----                                                        -----   
$.dbo.Table.roysched.foreign key.roysched_ibfk_1.Cols       title_id
$.dbo.Table.sales.foreign key.sales_ibfk_2.Cols             title_id
$.dbo.Table.tagtitle.foreign key.tagtitle_ibfk_2.Cols       title_id
$.dbo.Table.titleauthor.foreign key.titleauthor_ibfk_2.Cols title_id

Exploring references

Probably the most common question asked by developers within a team is about references. If they are enforced by a foreign key, then the answer is easily fetched. You can approach it from the view of the table doing the referencing …

TableName       Key                           References      
---------       ---                           ----------      
dbo.discounts   fk_Discounts_Stores_Stor_id   dbo.stores      
dbo.employee    fk_Employee_JobID             dbo.jobs        
dbo.employee    fk_Employee_publishers_pub_id dbo.publishers  
dbo.employee    fk_Employee_Pubs_JobID        dbo.jobs        
dbo.pub_info    fk_Pubinfo_publishers_pub_id  dbo.publishers  
dbo.roysched    roysched_ibfk_1               dbo.publications
dbo.sales       sales_ibfk_1                  dbo.stores      
dbo.sales       sales_ibfk_2                  dbo.publications
dbo.tagtitle    tagtitle_ibfk_1               dbo.tagname     
dbo.tagtitle    tagtitle_ibfk_2               dbo.publications
dbo.titleauthor titleauthor_ibfk_1            dbo.authors     
dbo.titleauthor titleauthor_ibfk_2            dbo.publications

…or what tables are being referenced.

TableName        key                           ReferencedBy   
---------        ---                           ------------   
dbo.authors      titleauthor_ibfk_1            dbo.titleauthor
dbo.jobs         fk_Employee_JobID             dbo.employee   
dbo.jobs         fk_Employee_Pubs_JobID        dbo.employee   
dbo.publications roysched_ibfk_1               dbo.roysched   
dbo.publications sales_ibfk_2                  dbo.sales      
dbo.publications tagtitle_ibfk_2               dbo.tagtitle   
dbo.publications titleauthor_ibfk_2            dbo.titleauthor
dbo.publishers   fk_Employee_publishers_pub_id dbo.employee   
dbo.publishers   fk_Pubinfo_publishers_pub_id  dbo.pub_info   
dbo.stores       fk_Discounts_Stores_Stor_id   dbo.discounts  
dbo.stores       sales_ibfk_1                  dbo.sales      
dbo.tagname      tagtitle_ibfk_1               dbo.tagtitle

Listing references

What we’ve done so far is to get paths to objects and, if they are simple values like numbers or strings, we merely provide their values. We don’t provide the objects because that would mean overweight objects. If you need an object such as an index, table, procedure, or key, then it is sometimes easier to use the path to get the data from the object. Here we get a more comprehensive report of foreign keys by listing out the columns used in both the referring table and the referenced table. Unfortunately, we don’t have a multi-column primary key/foreign key in the sample database, but the columns become comma-delimited lists if necessary.

table           Foreign Key                   Key Cols   Referenced Table Referenced keys
-----           -----------                   --------   ---------------- ---------------
dbo.discounts   fk_Discounts_Stores_Stor_id   stor_id    dbo.stores       stor_id        
dbo.employee    fk_Employee_Pubs_JobID        job_id     dbo.jobs         job_id         
dbo.employee    fk_Employee_JobID             job_id     dbo.jobs         job_id         
dbo.employee    fk_Employee_publishers_pub_id pub_id     dbo.publishers   pub_id         
dbo.pub_info    fk_Pubinfo_publishers_pub_id  pub_id     dbo.publishers   pub_id         
dbo.roysched    roysched_ibfk_1               title_id   dbo.publications Publication_id 
dbo.sales       sales_ibfk_2                  title_id   dbo.publications Publication_id 
dbo.sales       sales_ibfk_1                  stor_id    dbo.stores       stor_id        
dbo.tagtitle    tagtitle_ibfk_2               title_id   dbo.publications Publication_id 
dbo.tagtitle    tagtitle_ibfk_1               TagName_ID dbo.tagname      TagName_ID     
dbo.titleauthor titleauthor_ibfk_2            title_id   dbo.publications Publication_id 
dbo.titleauthor titleauthor_ibfk_1            au_id      dbo.authors      au_id

Conclusions

I created the Flyway Teamwork framework to make it easier to automate Flyway with PowerShell. Although it seemed most important to integrate Flyway with other development teamwork processes such as source control, bug management, testing, deployment and code review, it was the problem of keeping track of the database objects that were being changed that drove me to create database models. With a database model, it is far easier to compare versions routinely. However, I soon began to appreciate how useful it was to be able to use the models, instead of the live database, when you’re just looking at the design and implementation, making sure tables are properly indexed, searching for strings and so on. It is a by-product of their primary value but surprisingly useful, nonetheless.

It is possible to do more. You can create Entity Relationship Diagrams, validate datasets and so on. However, this article is just intended as an introduction to the many uses of a database model that can be used by scripts. Sure, database models can be very useful.

Tools in this post

Flyway

DevOps for the Database

Find out more