How to Document SQL Server Tables

When you create a table, SQL Server doesn’t save your comments explaining the table as it does with stored procedures, for example. In this article, Phil Factor explains how you can merge the documentation from your build scripts into the database.

Why do you have to go to the live database to see what objects are in it? It is all in the source code, even if it isn’t always easy to find. It is likely to be nicely commented too. This goes particularly for tables and their columns, constraints and indexes. You may not even have a live database of the version you’re interested in checking out; you’ve just got the source code. What do you do then?

Even if you have access to the live server, information about tables isn’t that intuitive to get. Unlike routines, such as procedures and functions, the actual source of a SQL Server table, with its comments and descriptions, isn’t stored in SQL Server. I can’t think of a good reason for this, beyond that fact that a table represents a complex assemblage of objects and properties rather than a single object.

I’ll be describing a way of getting this information. Although one can do much more, the code in this article can be used merely for the useful task of cataloguing the source code. A PowerShell one-liner, specifying the source code of a file or directory, can produce reports like this using sample database source code as input.

Why Get the Documentation into the Database?

You might argue that all you need is the name of the columns and their datatypes, and these are nicely displayed in SSMS. Nope. One of the handiest ways of understanding a routine is to look at its source in the metadata, especially if it has comments and documentation in it. Just a few well-chosen words can save plenty of time for the developers and DBAs maintaining a system.

It is a great advantage to have this documentation in the live database. It is irritating to have to go to and fro between the source code and the live database just to see what a column or constraint is for, or to understand why a table was designed the way it is. It is much easier to have the information with the actual table in a way that can be queried in SQL. With a large database, it is so much easier to find a particular column if you can search not only by name or by datatype but by comment!

If you are so inclined, you can, using the techniques I describe here, store every table script as an extended property attached to the table, but I’ll be describing how you can, using PowerShell, even tease out all the end-of-line comments and block comments from the source and store them as extended properties with their associated column, table, constraint or index.

With all this information in extended properties, you can get lists of tables or columns complete with explanations. I occasionally use code that reverse-engineers a build script complete with the reconstituted comments. Having the information in place makes security checks easier too, such as ‘Are all tables containing personal data encrypted?’.

Documentation in the Wrong Place

It has always been a problem that documentation in the source, where it should be, is not then passed into the live database when the build script is executed. In a table, you have columns, constraints and indexes that you are likely to document using line-ending comments and block comments. You probably have a big block comment at the start, explaining the table. This information should be available in the live database. Microsoft don’t have a good answer and vaguely go on about adding comments in extended properties. Well, that’s fine but it hasn’t happened, unsurprisingly: Have you ever tried to do it? It is an almost impossible task, even with SQL Doc.

My solution is to execute my finely-documented build script as usual to create the latest version of the database, and then process the same script in PowerShell to add all the comments and documentation as extended properties in the right place in the live database.

I use a Microsoft SQL Server parser for .NET to get the documentation, and the script then checks the live database. If the documentation for the table, column, constraint or index isn’t in the database then it puts it there. Any build script generated from the live database will have the script for the documentation included there as extended properties.

Using a PowerShell Parse-TableDDLScript Function

Here in this article I’ll introduce a PowerShell function that does the hard work. Before we get too far into the details, here it is parsing a table. A string containing a table (it could be a database of tables) is passed to the function. It produces an object that describes the table. We convert it to JSON merely to make it easy to display.

If we look at the jSON, we can see what it makes of the Create script

But you’d want to process an entire table build script to find out what’s in it. Here is the script I used to get the list of tables I displayed earlier (you need to put the correct path in the filespec), and you can get the file I used from GitHub.

And, of course I ought to include the one that listed the columns. The content variable is re-used from the previous snippet.

The Parse-TableDDLScript

Here is the PowerShell script. It uses a State Machine, because the script is working on the streamed output from a parser. The meaning of any token extracted from the code in sequence depends on what preceded it. Occasionally, it has more than one possible meaning dependent on what follows it. All this can be managed naturally in a state machine, even though the code looks unsettlingly different. Never mind, because you just execute it rather than look at it. (If you have trouble importing the sqlserver module, check here for more information.)

The current version of this code is contained in the GitHub project here

A state machine will read a series of inputs. When it reads an input that belongs to a particular set of values associated with that state, it will switch to a different state. Each state specifies which state to switch to, for a given input. This means that you have a script where complexity can be limited to a single place and the script is very easy to amend. It is curious that, as you better-understand and develop the algorithm, one can pull code out, and the code shrinks.

The parsing routine ends up with an object that represents a set of tables. Basically, you just want the name and type of the property or object so you can use the information. I suspect there are other uses for it, but I couldn’t resist getting datatypes for columns.

If the code finds something that isn’t a table script, it skids through it naturally, because it never finds that ‘table’ token. Hopefully, it recovers to the correct state to proceed on to the next statement to see if it is a Table script. (Hopefully because I haven’t yet tested it with every possible CREATE statement).

The output of the function is a list of table objects, or maybe just one. Each object consists of its name, the type, and any comment blocks or end-of-line comments.

Each object has three lists. The columns, the constraints and the inline indexes. These all contain the name, the type and the comment. It is a PowerShell object but you can easily convert this to JSON, YAML or XML for export. With a bit of tweaking you can convert it to CSV, save it to disk and put it in Excel or whatever you wish.

Merging the Documentation to a Database.

It is very easy to pass an object like the one produced by this PowerShell Script to a procedure. This procedure can be used to update the extended properties of the table, column, constraint or index. I use temporary procedures because I don’t like adding utility procedures in databases. We start with the code for the procedure. Sadly, it uses JSON so it will only work with versions of SQL Server that have this implemented.

The latest version of this code is included in the Github project here

Putting It All Together

We now have all the components we need to add the documentation from the source code into the live database. As you’ll see in the comments, you need to first install the SqlServer module (it is in the PowerShell Gallery) and make sure you have the Microsoft.SqlServer.Management.SqlParser (it is part of SMO and the sqlserver module). You will, of course, need to provide the path to both the build script and the source for the stored procedure and the connection string properties.

(this code is included in the GitHub project together with a sample table script)

We run this on our test database stable script and lo, the documentation has been added (just an extract from a sample table). This is a screenshot from Redgate’s SQLDoc.

 

Summary

I’m hoping that, if you’ve got this far in the article, I’ve convinced you that not only should source build scripts be documented but this documentation should be included in the actual built database. Database Documentation needs the light. I’ve provided several ways of using this documentation from extended properties using SQL.

There are, however, many other ways of using the information gleaned by the PowerShell function. You can, for example, create reports and PDF documentation. There are plenty of ways of using this general technique. I use a similar technique to take structured headers for tables or functions and parse them into YAML documents for various team documentation resources. By combining what you can glean from the source, with information you can get from the live database, such as dependencies, you have some very useful information.