Transferring Database Documentation into SQL DDL Scripts
We all love having documentation in source code, if not writing it. We just want to ensure that it gets propagated and retained so that you and your team members can read it whenever they need to. This article demonstrates a cross-RDBMS PowerShell task that extracts the comments from a primary JSON source of database documentation and adds them to a set of SQL DDL source files.
Why is documenting the migration or build script a problem?
Database documentation is a problem. No, wait, no sniggering please. The obvious place to document the source is in the code. That's fine until you come to tables. Routines such as procedures and functions are stored as-is in the database, and so their comments persist in source dumps, object-level scripts, migration scripts and synchronization scripts. Tables aren't stored as-is, so their comments don't.
If you document a table, or one of its components, only in the source script, the comments get stripped out of the live object. This causes irritation when, for example, the delivery item of a sprint is a generated synchronization script that is entirely innocent of any table documentation. It causes confusion when, saving your work to source control, you see that the object-level source generated by a build, or a successful migration run, hasn't got its comments. It may cause grief when the comment blocks in your scripts, with their lavish explanations, never reach the metadata, where database developers like me want them.
Basically, we want comments in generated code and the metadata, not just the original hand-cut source code.
Surely, the RDBMSs already allow table documentation?
The most obvious solution to preserving table documentation, theoretically, is to assign your documentation and comments to the table's attributes in the metadata. This is fine because you can then extract the documentation from the metadata for reports and summaries. Smart IDEs will even provide you with this information in help text. There is a problem, though: in some RDBMSs, it requires yards of code that obscure the table declaration. MySQL is an exception:
CREATE TABLE dbo.publishers (
  pub_id char(4) NOT NULL COMMENT 'The primary key',
  pub_name varchar(40) COMMENT 'The full name of the publisher',
  city varchar(20) COMMENT 'the postal city of the main office',
  state char(2) COMMENT 'the state, county or region of the main office',
  country varchar(30) DEFAULT 'USA' COMMENT 'the IBAN code of the country',
  PRIMARY KEY (pub_id)
)
ENGINE=INNODB
COMMENT='This table contains the list of publishers that supply the
distributors to the trade. A secondary table pub_info provides a
description of the organisation';
Sadly, in MySQL, adding or altering a column comment requires you to recreate the column. However, despite the awkwardness, it means that the documentation persists in the object-level source and in synch scripts. I've shown you how to do this in MySQL.
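For example, to change just the comment on the city column after the table has been created, you must restate the column's entire definition (a sketch, using the publishers table above):

-- MySQL: altering a column comment means re-specifying the whole column
ALTER TABLE dbo.publishers
  MODIFY COLUMN city varchar(20) COMMENT 'the postal city of the main office';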
Another problem is that there is no common SQL standard across databases for COMMENTs. SQL Server has gone entirely its own way, with extended properties, but even amongst the other RDBMSs the COMMENT attribute is implemented inconsistently. With SQL Server and PostgreSQL, you must add a series of separate SQL statements to the CREATE TABLE script to decorate the created object with comments. SQLite is very different, and probably has the best solution, in that it preserves the source of the table along with its comments.
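To illustrate the inconsistency, here is a sketch of how the table description and one column description from the publishers example above would be attached in PostgreSQL and in SQL Server. The real scripts would, of course, document every column:

-- PostgreSQL: separate COMMENT ON statements after the CREATE TABLE
COMMENT ON TABLE publishers IS
  'This table contains the list of publishers that supply the distributors to the trade';
COMMENT ON COLUMN publishers.pub_id IS 'The primary key';

-- SQL Server: each description becomes an MS_Description extended property
EXEC sys.sp_addextendedproperty N'MS_Description',
  N'This table contains the list of publishers that supply the distributors to the trade',
  N'SCHEMA', N'dbo', N'TABLE', N'publishers';
EXEC sys.sp_addextendedproperty N'MS_Description', N'The primary key',
  N'SCHEMA', N'dbo', N'TABLE', N'publishers', N'COLUMN', N'pub_id';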
Let’s talk about solutions
Although the SQL scripts are the best place to add the documentation of the design of the database, it suits team-working better if the documentation is held in a separate data dictionary as well. This is particularly true if you are working on migration scripts, because the documentation of a table is unlikely to be in the same script as the one you are working on. The challenge is to allow developers a simple way to generate and maintain a primary source of database documentation (or 'data dictionary') during development, while ensuring that the documentation then gets propagated into the database metadata and into any DDL scripts generated from it. Ideally, we'd also be able to update our primary source from our DDL scripts, and vice versa.
I would hate to dictate to any other development team as to what constitutes the primary source of documentation. I know what suits me, and that is to work in a fairly haphazard way until a database project requires some structure. At that point, I like to maintain documentation in a JSON file that can then be quickly turned into something readable or applied to any copy of the DDL for the database, whether it be object-level source directories, build scripts, synch files or migration files.
An example of the entry for a database object in such a JSON document is:
{
  "TableObjectName": "People.NotePerson",
  "Type": "table",
  "Description": "relates a note to a person",
  "Brief": "who wrote the note",
  "TheColumns": {
    "NotePerson_id": "Surrogate primary key for the link table",
    "Person_id": "foreign key to the person who has the address",
    "Note_id": "foreign key to the actual note",
    "InsertionDate": "when the note was inserted",
    "ModifiedDate": "when the association of note with person was last modified"
  }
}
The idea is that the team extract this JSON document from the database metadata, and then maintain it as the primary source of documentation throughout development.
I’ve written a scriptblock task within the Flyway Teamwork Framework called $ExecuteTableDocumentationReport. It can be run for every Flyway version created, and it extracts the documentation of every table or view and its columns from the database metadata into the JSON document (1).
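Under the hood, this sort of report comes straight from the metadata. To give a flavour of what it queries, here is where MySQL keeps table and column comments; the task assembles results like these into the JSON format shown above:

-- MySQL: table and column comments live in the information_schema views
SELECT TABLE_NAME, TABLE_COMMENT
  FROM information_schema.TABLES
  WHERE TABLE_SCHEMA = 'Pubs' AND TABLE_TYPE = 'BASE TABLE';
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_COMMENT
  FROM information_schema.COLUMNS
  WHERE TABLE_SCHEMA = 'Pubs';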
The team add any missing comments, such as to columns and other table attributes (2). When we run Flyway migrate (3), we update the database directly from the JSON document as an ‘afterMigrate’ callback (4). An example of how this can be applied to a MySQL Database is here in afterMigrate__ApplyTableDescriptions.sql and also for SQL Server here.
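To give an idea of what the callback does for SQL Server, here is a minimal sketch that writes one table's description from the JSON document into the metadata. The real callback, linked above, loops through every table and column in the document:

-- A minimal sketch: apply a table description from the JSON document as an
-- MS_Description extended property, adding or updating it as appropriate
DECLARE @doc nvarchar(max) = N'{
  "TableObjectName":"dbo.jobs", "Type":"table",
  "Description":"These are the job descriptions and min/max salary level"}';
DECLARE @description nvarchar(3750) = JSON_VALUE(@doc, '$.Description');
IF EXISTS (SELECT 1 FROM sys.extended_properties
           WHERE major_id = OBJECT_ID(N'dbo.jobs')
             AND minor_id = 0 AND name = N'MS_Description')
  EXEC sys.sp_updateextendedproperty N'MS_Description', @description,
       N'SCHEMA', N'dbo', N'TABLE', N'jobs';
ELSE
  EXEC sys.sp_addextendedproperty N'MS_Description', @description,
       N'SCHEMA', N'dbo', N'TABLE', N'jobs';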
I've explained steps 1-4 of this process in previous articles. See Taming Database Documentation with Flyway and MySQL and Managing SQL Server Database Documentation during Flyway Development for the details.
A missing bit of the puzzle is the ability to update generated source code files from the JSON document (5). I'll tackle that in this article, providing a cmdlet that uses regexes to do the task. In another article, I'll show how to use a simple SQL parser to do this, and thereby avoid the obvious shortcomings of the regex approach.
It is also possible to update the JSON document from a documented DDL source file (6). Sadly, though, while it is possible to reliably transfer CREATE TABLE documentation between a SQL file and a JSON document in either direction, you will need a parser-based approach to reliably pick up the documentation of a table's columns, constraints or indexes.
Updating the DDL source files with comments
I've encapsulated into a cmdlet all the logic of adding documentation to a SQL DDL source file from our primary JSON source. It is a bit long as a batch, and can be used as a cmdlet in several different contexts. It also involves regex. The source code for Write-DocumentationToSQLDDLFile is on the GitHub site for my PowerShell utility cmdlets, and in the resources for the Flyway Teamwork Framework. However, it will work happily by itself, without the framework.
An example of a complete JSON document for table and column comments, for the sample Pubs database, is here.
Processing a single SQL file
In the first example, we'll add blocks of documentation to a file. We'll be cautious and save the new file with an 'Altered_' prefix:
#specify the location of the current documentation file
$DocFile = '<The Path to>\TableDocumentation.json'
# specify the location of the sql file
$TheFilename = '<The Path to>\V1.1.1__Initial_Build.sql'
Write-DocumentationToSQLDDLFile -FileList @("$TheFilename") `
                                -Documentation $DocFile `
                                -FileVersion 'Altered_' -Verbose
This segment…
GO
PRINT N'Creating [dbo].[jobs]'
GO
CREATE TABLE [dbo].[jobs]
(
[job_id] [smallint] NOT NULL IDENTITY(1, 1),
[job_desc] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL
    CONSTRAINT [DF__jobs__job_desc__1BFD2C07]
    DEFAULT ('New Position - title not formalized yet'),
[min_lvl] [tinyint] NOT NULL,
[max_lvl] [tinyint] NOT NULL
) ON [PRIMARY]
GO
GO
PRINT N'Creating primary key [PK__jobs__6E32B6A51A14E395] on [dbo].[jobs]'
GO
ALTER TABLE [dbo].[jobs]
ADD CONSTRAINT [PK__jobs__6E32B6A51A14E395]
    PRIMARY KEY CLUSTERED ([job_id]) ON [PRIMARY]
GO
…is changed to…
PRINT N'Creating [dbo].[jobs]'
GO
CREATE TABLE dbo.jobs /* These are the job descriptions and min/max salary level */
(
[job_id] [smallint] NOT NULL IDENTITY(1, 1),
[job_desc] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL
    CONSTRAINT [DF__jobs__job_desc__1BFD2C07]
    DEFAULT ('New Position - title not formalized yet'),
[min_lvl] [tinyint] NOT NULL,
[max_lvl] [tinyint] NOT NULL
) ON [PRIMARY]
GO
GO
PRINT N'Creating primary key [PK__jobs__6E32B6A51A14E395] on [dbo].[jobs]'
GO
ALTER TABLE dbo.jobs /* Job descriptions */
ADD CONSTRAINT [PK__jobs__6E32B6A51A14E395]
    PRIMARY KEY CLUSTERED ([job_id]) ON [PRIMARY]
GO
…and so on through the file. Normally, of course, there would be a lot more in each comment block, but this is just a sample.
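To give a flavour of the mechanism, here is a much-simplified PowerShell sketch of the sort of regex substitution involved, continuing from the example above and assuming that the JSON file holds an array of objects of the shape shown earlier. It is not the cmdlet's actual implementation, which is far more careful about quoted identifiers, dialects and existing comments:

# a much-simplified sketch of the substitution: plant each table's
# description, as a block comment, straight after its name in the DDL
$docs = Get-Content $DocFile -Raw | ConvertFrom-Json
$sql = Get-Content $TheFilename -Raw
foreach ($table in ($docs | Where-Object { $_.Type -eq 'table' }))
{
    # match the table name with or without its schema and brackets
    $name = [regex]::Escape(($table.TableObjectName -replace '^.*\.', ''))
    $sql = $sql -replace "(?i)(CREATE\s+TABLE\s+[^\s(]*\b$name\b[^\s(]*)",
                         ('$1 /* ' + $table.Description.Trim() + ' */')
}
# save the result alongside the original, with an 'Altered_' prefix
Set-Content -Path (Join-Path (Split-Path $TheFilename) `
    ('Altered_' + (Split-Path $TheFilename -Leaf))) -Value $sql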
Processing an object-level directory
We can provide the cmdlet with a whole list of files. Here, we convert an object-level source generated by SQL Compare, but it will work with any SQL object-level directory. The Teamwork framework currently generates object-level directories for MySQL, MariaDB, PostgreSQL and SQLite, as well as using SQL Compare to produce SQL Server object-level directories.
# Specify the location of the sql files within the object-level directory
$ObjectLevelDirectory = 'S:\work\Github\FlywayTeamwork\Pubs\Branches\develop\Versions\1.1.11\Source\*.sql'
# create a list of SQL files by recursing through the subdirectories
$FileList = dir $ObjectLevelDirectory -Recurse | foreach { $_.FullName }
# change every file in the list by adding the appropriate documentation from
# the JSON documentation. In this example we give each file a 'V1_' prefix
# just to avoid over-writing
Write-DocumentationToSQLDDLFile -FileList $FileList `
                                -Documentation $DocFile `
                                -FileVersion 'V1_' -Verbose
Processing all the pending files for a Flyway migration
Processing all the pending files for a Flyway migration is easy too, because Flyway provides a JSON list of all the files, which we can filter for the pending SQL files. Flyway will 'raise an eyebrow' if you change files that have already been successfully applied to the database.
cd S:\work\Github\FlywayTeamwork\Pubs\Branches\develop\Variants\Documented
<# if not using Flyway Teamwork, you'll need to provide credentials. With the
   framework, you'd run preliminary.ps1, which would manage all that #>
flyway clean
$Migrations = flyway info -outputType=json | ConvertFrom-Json
$DocFile = 'S:\work\Github\FlywayTeamwork\Pubs\Branches\develop\Variants\Documented\Migrations\TableDocumentation.json'
if ($Migrations.error -ne $null)
{ #if getting the migration list was unsuccessful
    Write-Warning $Migrations.error.message
}
else
{
    $FileList = $Migrations.migrations | where {
        ![string]::IsNullOrEmpty($_.filepath) -and #not version zero please
        ($_.type -ieq 'SQL') -and  # just the migration files please
        ($_.state -ieq 'Pending')  # just select the pending files in this case
    } | foreach { $_.filepath }
    Write-DocumentationToSQLDDLFile $FileList $DocFile -Verbose
}
Extracting the table documentation from the DDL source code
I've written in the past, in How to Document SQL Server Tables, about extracting the documentation of tables from a build script by using the SQL Server parser. When working with Flyway, I've had to extend this facility to any dialect of RDBMS, not just SQL Server, mainly to pick up the comments that explain each table anywhere in a chain of migrations. I can try to persuade colleagues in a development team to do it a particular way, but an automated task is easier on the blood pressure. So, I can now take a long, meandering chain of CREATE and ALTER TABLE statements and pick up any adjacent documentation block. This can be defeated by embedding a valid DDL statement in a string or a comment block, of course: regex solutions can never be as good as using a parser, and a good SQL parser is outside my price range. Besides, my main target is generated code, which is more consistent. A parser can also pick up the comments attached to columns and constraints, but that level of complexity is beyond the casual use of regex, which wasn't really designed for this sort of task.
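To give an idea of the extraction, here is a much-simplified PowerShell sketch of the reverse trick: harvesting each table name and the block comment, if any, that directly follows it in a CREATE TABLE statement. Again, this is just an illustration, not the framework's code, which also handles ALTER TABLE, quoting styles and the various comment conventions:

# a much-simplified sketch of the extraction: collect each table name and
# the block comment immediately following it, ready for the JSON document
$ddl = Get-Content '<The Path to>\V1.1.1__Initial_Build.sql' -Raw
$TableDocs = [regex]::Matches($ddl,
    '(?is)CREATE\s+TABLE\s+(?<name>[^\s(]+)\s*/\*(?<comment>.*?)\*/') |
  ForEach-Object {
    [pscustomobject]@{
      TableObjectName = $_.Groups['name'].Value -replace '[\[\]`"]', ''
      Description     = $_.Groups['comment'].Value.Trim()
    }
  }
$TableDocs | ConvertTo-Json #ready to merge into the documentation file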
Conclusions
This article adds a new ingredient to the method of database documentation I've advocated in a couple of previous articles. In the first, I introduced the idea of generating JSON documentation files in SQL Server and then, in the second, I extended this to other RDBMSs. Now, with this article, I've introduced the idea, and the code, for using the JSON documentation report to document SQL DDL code that was generated from a database or from a comparison of two databases, or that was written without comments in the first place. In a separate article, I've shown how to extract the documentation from existing scripts. With these tools, you have a lot more options in the way you document databases.