Product articles Flyway Database Builds and Deployments
Managing Database Documentation during…

Managing Database Documentation during Flyway-based Development

How to create and maintain a 'data dictionary' for your databases, in JSON format, which you can then use to add and update the table descriptions, whenever you build a new version using Flyway.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

I’ll provide a JSON query that will extract descriptions of all your tables, from the SQL Server extended properties, into a JSON document. This JSON report is single source of database documentation, providing details and descriptions of all existing tables and their columns, in a format that is both very easy to read, and extendable. I’ve also written a PowerShell task for Flyway, to automatically produce this report, for each version.

The idea is that the team will maintain this SQL ‘data dictionary’ during development, adding documentation for each new object, and improving existing descriptions. I also provide a query they can use to apply the JSON document to the database, for each new version. This can be executed independently, or as a ‘callback’ script, during every Flyway deployment. The query uses sp_addextendedproperty to add details of each new table and column, including an MS_description to describe the purpose of each one, and updates documentation for all existing objects.

The aim is for a single source of documentation for each database, useful for quickly familiarizing new developers with the schema, and them helping them maintain comprehensive documentation throughout development. It should allow them, and the Ops team, to instantly see what’s in the database, for every version.

A single ‘source of truth’ for documenting database objects

The process of building and modifying a database isn’t always as simple as many of the advocates of a controlled system would have us believe. There are times when you need to change the ways you do things to get the job done in reasonable time and effort. To provide a flexible system that accommodates the way you need to work, your tools need to be versatile and flexible. I must admit that my source control systems in the past have occasionally had all the order and systematization of a teenager’s bedroom, mainly due to my liking for Entity-Relationship modelling tools and other visual tools for helping with database development.

Database documentation is an example of the sort of task that doesn’t really fit into any methodology. You need to be able to view the complete story. Imagine, for example, that you need to provide an up-to-date description of your base tables for developers, business analysts or technical architects. If this is unnecessary for your project, then you’re lucky and probably in a parallel universe.

A migration approach to database development can make the task of producing a complete data documentation more difficult because you will have tables that have been migrated in several versions via several files, with the descriptions in source code comments. You need to scoop these comments up in a way that can be easily read.

Aha, you say, just create a build script for each table, liberally sprinkled with helpful comments. Nope, this won’t wash. Unlike routines such as views, functions and procedures, comments aren’t preserved in the metadata, for tables. Even if they were, there is just too much detail to be scanned quickly to get the overview.

I believe that the best ‘deliverable’ to de-mystify the purpose of tables and their columns for other co-workers, and even yourself in the future, is a markup document, either XML or JSON, that describes each table. When I say ‘table’, I mean any object that is a table source, such as a view, table, or table-valued function. I’ve assumed that, for most of your team, you wouldn’t want to document your constraints, indexes or any other child object-types of a table. It is perfectly possible, though at this stage slightly scarier.

Here is a sample table from our extended sample ‘Pubs’ database. The extra work was in making it readable, and extendable. This is something you can publish to co-workers, and which the developers can then alter and improve.

You will want this document to become the ‘source of truth’, so that you can add or amend the documentation here. For this to be practical, you will, at various times, want to get this information in or out of a database, and ideally, extract comments from a script and save them into a database, or insert them into a document like this. In this article, I’ll be showing a SQL script that takes the JSON document and uses it to update the descriptions for all the tables and table sources in the database, whatever the version of the database.

Documenting individual tables using database properties

In the various relational database systems, there is a way of adding and updating descriptions to various database objects like columns, indexes, tables, and functions. SQL Server has extended properties. PostgreSQL and Oracle have the COMMENT command. This is a much better, less volatile, way of commenting tables than annotating the source code. It also allows you to separate comments out from DDL code so that it isn’t registered as a metadata change.

Parsing comments in source code

If you have a lot of comments embedded in source code, and you need to turn them into documentation, it is possible to use a parser to find all the /*comment blocks */ and –inline comments, but it isn’t ideal. See How to document SQL Server Tables.

None of these methods produce easily readable comments in your source-code as you can see:

Extended properties for a table in SQL Server

Documenting all tables in JSON

We’d like to collect all these table and table source descriptions into a structured document, such as JSON, which our developers can then embellish, and then publish it all as more readable and versatile documentation, as well as use it to update the extended property descriptions for each table.

It makes sense, then, to be able to generate the documentation in JSON form, from the database, and to be able update the database from a JSON document. It is therefore better, I reckon, to maintain just one ‘master’ script, throughout development, that documents the database in JSON and that we can use to insert or update the documentation that is preserved within the database properties. You’d be able to use it at the end of every build because it will only add documentation to objects that exist at the time. Whenever new objects, with or without documentation, are generated you can generate fresh JSON documentation and use it to update the master script.

The big advantage of this approach is that it can easily be accommodated by both a migration-based approach such as Flyway, and a static approach. You’d still be able to maintain an object-level source with the descriptions included with them, but the ultimate source of all documentation would be the ‘data dictionary’ we’ve created.

For this to work, the master script that we maintain for the documentation needs to work on any version of the database, or even on an empty database, or an entirely different database. Without triggering an error or warning, or in any other way becoming over-excited, it would resolutely refuse to touch any object that wasn’t exactly as specified but adds documentation to tables and columns that match.

Generating the JSON Document

The task of generating a JSON document that lists out tables and their columns, along with their descriptions, if any, is best done in SQL. It’s useful to refer to, so it is important to provide a JSON structure that is easy to read. I’ve already provided a sample output, above. To produce this for all the tables in a database, I use this code. Recent versions of SQL Server (2017 and later) can use the first query (which uses string_agg)and earlier versions will use the XML variant to produce the same result:

In the DatabaseBuildAndMigrateTasks.ps1 PowerShell task that does this, and saves the result to disk, I show how to run either of these routines based on the compatibility level of the database.

Applying the JSON document to the database

Once you’ve taken this JSON document and added lots of helpful descriptions, you’ll want to add or update these descriptions as extended properties to the database (in Oracle or PostgreSQL you’d use COMMENTs). This script ought really to be completely idempotent, but to achieve this, you’d have to add complexity to the JSON to determine the difference between a blank comment and a non-existent comment. If you allow blank comments to be inserted into the database, they will appear in generated scripts as a mute reminder that you’ve yet to fill it in, and generally you can easily alter the script to ‘fill in the blank’.

Here is an example in which I’ve severely reduced the number of tables to two. The full source is on GitHub.

Using a query for updating or adding documentation after every migration

If you are using Flyway, you can add to your migration files a SQL afterMigrate callback script that is an expanded version of the script I’ve just shown. It will add or update the documentation for all existing objects. This call back script will be your ‘live’ data dictionary, as in this example, but with all the tables and table-sources added.

The script is called after each migration, so all you need to do is keep this version up-to-date, and it is applied, where possible, to every version.

Maintaining your JSON table-documentation

If you generate a new JSON document of your documentation from the latest version of the database, any ‘blank’ documentation strings will immediately highlight missing documentation. You will also see all new objects that have been properly created with their documentation. You can then use this to update the afterMigrate callback script.

Automatically generating the JSON table-documentation

I’ve written a PowerShell task $ExecuteTableDocumentationReport to produce the JSON report of the documentation that is in place for each version. It is here in DatabaseBuildAndMigrateTasks.ps1.

Conclusion

A documentation of a database is something you need to see, as a whole, so it fits uneasily in a purely migration-based database development. It is so much easier to see all the documentation for a database in a single place. The major relational databases have ways of applying notes and descriptions to database objects, and by using them, you can ensure that they are to hand for the database developer. By placing these annotations into a JSON report, they become accessible to other ways of publishing documentation, such as in PDF, eBook or on a website. They are also eminently searchable. In short, it is good practice to have a data catalog like this readily available. I have used SQL Server in this example, but this applies for Oracle and PostgreSQL with only minor modifications.

Tools in this post

Flyway

Version control for your database. Robust schema evolution across all your environments and technologies.

Find out more

Flyway

Version control for your database. Robust schema evolution across all your environments and technologies.

Find out more