Product articles Flyway Database Testing and Quality
Transferring Database Documentation…

Transferring Database Documentation into SQL DDL Scripts

We all love having documentation in source code, if not writing it. We just want to ensure that it gets propagated and retained so you and your team members can read it if they need to. This article demonstrates a cross-RDBMS PowerShell task that will extract the comments from a primary JSON source of database documentation and add it to a set of SQL DDL source files.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Why is documenting the migration or build script a problem?

Database documentation is a problem. No, wait, no sniggering please. The obvious place to document the source is in the code. That’s fine until you come to tables. Whereas routines such as procedures and functions are stored as-is in the database, and so persist in the source dumps, object-level scripts, migration scripts and synchronization scripts, the same isn’t true of tables which aren’t, and so don’t.

Table documentation in a DDL script is not retained in the live database.

If you just document the source of a table, or one of its components, then it gets stripped out from the live object. This causes irritations when, for example, the delivery item of a sprint is a generated synchronization script that is entirely innocent of any table documentation. For example, when you’re saving your work to source control, it causes confusions when you see that the object-level source that has been generated by a build, or a successful migration run, hasn’t got its comments. It may cause grief when these comment blocks in your scripts with your lavish explanations don’t get into the metadata where database developers like me want them.

Basically, we want comments in generated code and the metadata, not just the original hand-cut source code.

Surely, the RDBMSs already allow table documentation?

The most obvious solution to preserving table documentation, theoretically, is to assign your documentation and comments to the table’s attributes in the metadata. This is fine because you can then extract the documentation from the metadata for reports and summaries. Smart IDEs will even provide you this information in help text. There is a problem though: in some RDBMSs, it requires yards of code that obscures the table declaration. MySQL is an exception:

Sadly, in MySQL, adding or altering a column comment requires you to recreate the column. However, despite the awkwardness, it means that the documentation persists in an object-level source, or in synch scripts. I’ve shown you how to do this is in MySQL.

Another problem is that there is no common SQL standard across databases for COMMENTs. SQL Server has gone entirely its own way, with extended properties, but even amongst the other RDBMSs the COMMENT attribute is implemented inconsistently. With SQL Server and PostgreSQL, you must add a series of separate SQL Statements to the CREATE TABLE script to decorate the created object with comments. SQLite is very different, and probably has the best solution in that it preserves the source of the table along with its comments.

Let’s talk about solutions

Although the SQL Scripts are the best place to add the documentation of the design of the database, it suits team-working better if they are in a separate data dictionary as well. This is particularly true if you are working on migration scripts because the documentation of the table script is unlikely to be in the same script as the one you are working on. The challenge is to allow developers a simple way to generate and maintain a primary source of database documentation (or ‘data dictionary’) during development, while ensuring the documentation then gets propagated into the database metadata and into any DDL scripts generated from it. Ideally, we’d also be able to update our primary source from our DDL scripts, and vice versa.

I would hate to dictate to any other development team as to what constitutes the primary source of documentation. I know what suits me, and that is to work in a fairly haphazard way until a database project requires some structure. At that point, I like to maintain documentation in a JSON file that can then be quickly turned into something readable or applied to any copy of the DDL for the database, whether it be object-level source directories, build scripts, synch files or migration files.

An example of a database object in a JSON document to do this is:

The idea is that the team extract this JSON document from the database metadata, and then maintain it as the primary source of documentation throughout development.

I’ve written a scriptblock task within the Flyway Teamwork Framework called $ExecuteTableDocumentationReport. It can be run for every Flyway version created, and it extracts the documentation of every table or view and its columns from the database metadata into the JSON document (1).

The team add any missing comments, such as to columns and other table attributes (2). When we run Flyway migrate (3), we update the database directly from the JSON document as an ‘afterMigrate’ callback (4). An example of how this can be applied to a MySQL Database is here in afterMigrate__ApplyTableDescriptions.sql and also for SQL Server here.

Writing documentation to and from a database and a JSON source.

I’ve explained steps 1-4 in the above diagram in previous articles. See Taming Database Documentation with Flyway and MySQL and Managing SQL Server Database Documentation during Flyway Development for the details.

A missing bit of the puzzle is the ability to update generated source code files from the JSON document (5). I’ll tackle that in this article, providing a cmdlet that uses Regexes to do the task. In another article, I’ll show how to use a simple SQL parser to do this, and thereby avoid the obvious shortfalls of this approach.

It is also possible to update the JSON document from a documented DDL source file (6). Sadly, though it is possible to reliably add CREATE TABLE documentation from a SQL file to a JSON document, and vice versa, you will need a parser-based approach to reliably add documentation to its columns, constraints or indexes.

Updating the DDL source files with comments

I’ve encapsulated into a cmdlet all the logic of adding documentation to a SQL DDL source file, from our primary JSON source. It is a bit long as a batch and can be used as a cmdlet in several different contexts. It also involves Regex. The source code for Write-DocumentationTo SQLDDLFile is on the Github site for my Powershell Utility cmdlets, and in the resources for the Flyway Teamwork Framework. However, it will work happily by itself without the framework.

An example of a complete JSON document for table and column comments, for the sample Pubs database, is here.

Processing a single SQL file

In the first example we’ll add blocks of documentation to a file. We’ll be cautious and save the new file with an ‘Altered_‘ prefix:

This segment…

…is changed to…

…and so on through the file

Normally, of course, there would be a lot more in the comment block, but this is just a sample.

Processing an object-level directory

We can provide the cmdlet with a whole list of files. Here we convert an object-level source generated from SQL Compare, but it will work with any SQL object-level directory. The Teamwork framework currently generates object-level directories for MySQL, MariaDB, PostgreSQL and SQLite, as well as using SQL Compare to do SQL Server object-level directories.

Processing all the pending files for a Flyway migration

Processing all the pending files for a Flyway migration is easy too, because Flyway provides a JSON list of all the files, which we can filter for the pending SQL files. Flyway will ‘raise an eyebrow’ if you were to change the files that were already successfully applied to the database

Extracting the table documentation from the DDL source code

I’ve written in the past, in How to Document SQL Server Tables about extracting the documentation of tables from a build script by using the SQL Server parser. When working with Flyway, I’ve had to extend this facility to be used with any dialect of RDBMS, not just SQL Server, mainly to pick up the comments that explain each table anywhere in a chain of migrations. I can try to persuade colleagues in a development team to do it a particular way, but an automated task is easier on the blood pressure. So, I can now take a long meandering chain of CREATE and ALTER table statements and pick up any adjacent documentation block. It can be defeated by embedding a valid DDL statement in a string or a comment block, of course, but Regex solutions cannot be as good as using a parser, and a good SQL Parser is outside my price-range. Besides, my main target is generated code which is more consistent. Whereas a parser can pick up comments and constraints, this level of complexity is beyond the casual use of regex, and it wasn’t really the sort of task it was designed for.

Conclusions

This article adds a new ingredient to the method of database documentation I’ve advocated in a couple of previous articles. In the first, I introduced the idea of generating JSON documentation files in SQL Server and then, in the second one, I extended this to other RDBMSs. Now, with this article, I’ve introduced the idea, and the code, for using the JSON documentation report to document SQL DDL code that has been generated from the database, comparing two databases, or had been written without commenting the code. In a separate article, I’ve shown how to extract the documentation from existing scripts. With these tools, you would have a lot more options in the way you document databases.

Tools in this post

Flyway

DevOps for the Database

Find out more

Flyway Teams

Ideal for organizations looking to improve collaboration and fine tune their processes during development and the deployment of database changes.

Find out more