Implementing a Database Versioning System
Database versioning brings more control to team-based database development and helps avoid many of the errors that often creep into database delivery. This article explains all the requirements of a semantic versioning system for databases, from specifying the format of the version numbers, to deciding where to store them and how to process and compare them.
Database versioning means that each alteration to the metadata, rather than the data, results in a new ‘state’ that is assigned either a unique version name or unique version number.
Traditionally, though, a database has been a moving target, much to the exasperation of the developers and users of downstream applications. Most database systems don’t even have a standard way of applying versions to databases. If an application wants to check for breaking changes in the database interface, there is no industry standard query or function call to do so. Even organizations that commit to international standards on Configuration control and change management strategy (e.g., CMMI) don’t always ‘version’ their databases.
Despite all this, relational databases have thrived over the years, but the practice of maintaining strict database versioning can make life easier for a database developer, and lead to a more rapid delivery of database functionality.
How to specify a database version number
One advantage of database versioning is that it can help prevent version-mismatch between the database and dependent applications. The database can report its version number so that the application is notified and can deal with any ‘breaking changes’.
An associated challenge, though, is in avoiding too many “false positives”. The applications and users of a database only need to know about changes that affect the interface that they use. If they exclusively access certain data tables, then they need only know about changes to the metadata/structure of these tables. Changes to an internal messaging system that prevents overload at times of peak demand, for example, will be important to operations and DBAs but are of no interest to application users. To deal with this, it is possible to separately version each defined application interface and protect the application from as many incompatibilities as possible by keeping the interface as stable as possible.
In applications, the standards for the conventions of versions are weak, but the most common practice to use a semantic version number. Each component in a sematic version has a meaning, indicating the nature of the change, such as whether it is a bugfix, minor change, major change or breaking change. Whichever semantic versioning system you use, it must also be easily accessible and allow versions to be ordered consistently and compared reliably.
The best solution would seem to be a number-based semantic version that is supported by Microsoft. This is a .net library that can be used in PowerShell scripts, so sorting and ordering is easy. These semantic versions, in their basic form, have major, minor, build and revision parts. You may see simpler, 3-part numbers in use (major.minor.revision), but equally the Microsoft standard can go up to 5 parts, splitting revision into MinorRevision or MajorRevision.
In Flyway, for example, every migration script has the target version number in its file name. So, using this system, we might have a file called v1.2.4.20__description.sql (revision 20, build 4, minor version 2, major version 1). When the migration is applied to a database, the version number is ‘stamped’ into a special table, in the database. The article Versioning Database Development using Flyway explains in more detail how Flyway’s cross-RDBMS versioning system works.
With this system in place, and the convention that the database will increment the ‘major’ component of the version for every interface-breaking change, an application can easily do intelligent version-checks. When it connects to a database, it just looks for a change in the ‘major’ number of the version.
Where to store the version information?
The next problem for database developers is where to store the version information. Unfortunately, there is no standard that extends across different database systems. The DDL COMMENT ON statement is found in a few RDBMSs for adding general comments, but some database vendors provide no means of attaching comments, let alone version numbers. SQL Server has extended properties that will allow version numbers to be attached to a database, or any other database object, and scripted with them. They are easily read by a script, and visible in SSMS.
In choosing how you attach a version to a database, schema or object, the most subtle difficulty is access rights. The version number should be easily accessed, and access rights need to be at the guest level: otherwise, getting the version to the application can be tortuous.
The only way of applying a version to a database that can be guaranteed to work across a whole range of different relational database systems is to use a function, view or table that returns the version. The safest option is to store the version information in a table. After all, all relational database systems, however minimal, must have tables.
Flyway does it this way, using a ‘version history’ table called flyway_schema_history
in which it records a history of the schema changes to arrive at any version, detailing which versioned migration files were applied, when and by whom.
However, there are two potential problems with storing the versions in a base table. Firstly, any task that requires a list of all the base tables, such as for scripting, inserting or extracting data, needs to explicitly ignore the ‘version history’ table. Flyway doesn’t make this any easier by making the name and schema of its table configurable (see Exploring the Flyway Schema History Table). Secondly, if access permissions are implemented, it is likely that access to base tables is restricted, whereas version information shouldn’t be restricted to the same degree. This means we still need to make sure that applications and users with only ‘guest’ permissions have a way to find the version of a database.
Using version numbers in development
With a version number attached to a database, as well as the means of reliably creating a database at that version, and of migrating it to that version from any previous version, we have more chance of a clean release process that has a much higher chance of an error-free deployment. To do this it must be easy to establish the version of any copy of the database.
Applying version numbers in development work
I’ve described in a previous article, The Database Development Stage, a general approach to using version numbers in database development work, using a SQL Server database as an example. It stores the versions in an extended property and in a version file in source control.
Finding the version of a Flyway-managed database
A database version number should be easily readable by the team responsible for the development and operation of the database, whatever way you are accessing or administering a database and whatever RDBMS is being used. A database may be cloud-based or in a container. It may be in production and subject to strict access controls.
One of the most frustrating problems for the database developer is that none of the major cross-RDBMS IDEs have a consistent way of handling version numbers in databases, mainly because of the lack of a standard. Flyway is unusual in the fact that it supports versioning so comprehensively.
Using Flyway “info”
If you’re using Flyway and have access to both the database and the Flyway project folder (because the Flyway configuration may have specified a different name or schema for the database), you can use Flyway’s info
command.
This will give you the version along with the history of how it was created, but to get the information in a script you will need to use the -outputType=json
parameter to extract it.
1 |
$Report = Flyway info -outputType=json |
You can extract the current version from the report by specifying the value of schemaVersion
1 |
$Report.schemaVersion |
Using a Flyway SQL callback
If you need to find the database version, but aren’t using Flyway, you can use SQL to get the current version from the schema history table. The logic is slightly more complicated than you might think if there have been any UNDO runs which would roll back to a previous version.
In SQL Server, MySQL, or MariaDB, the query would be:
1 2 3 4 5 6 7 |
/*We can calculate the version. This is slightly complicated by the possibility that you've done an UNDO. */ SELECT version FROM flyway_schema_history WHERE installed_rank = (SELECT Max(installed_rank) FROM flyway_schema_history WHERE success = 1) |
To get this to run in PostgreSQL, just change the Success = 1
to Success = true
.
Unfortunately, this query is not always reliable because Flyway allows the location and name of the table to be configurable. Instead, you’d need to run the script within the Flyway system as a SQL Callback, using the built-in placeholders to provide the name and schema of the table:
1 2 3 4 5 |
SELECT version FROM ${flyway:defaultSchema}.${ flyway:table} WHERE installed_rank = (SELECT Max(installed_rank) FROM ${flyway:defaultSchema}.${ flyway:table} WHERE success = 1) |
Making the database version accessible to other users and applications
As discussed earlier, we also need to provide a way for allowing applications and users to access the version, without having base table access. There are a few ways to do it.
Extended properties (SQL Server only)
When I’m working with SQL Server, I use an afterVersioned or afterMigrate SQL callback, which runs the previous SQL to extract the new version number of a database, after a successful migration run, and then add or updates an extended property with the latest value. I’ve provided a sample here, afterMigrate_Add_Version_EP.SQL and I explain how it works in Customizing Database Deployments using Flyway Callbacks and Placeholders.
Once you’ve done this, the application now doesn’t need to go ferreting around to find the Flyway history table and then laboriously finding the current version number from the history; it just reads it from the extended property.
Using a view
Other RDBMSs would need their solution to this problem, which is likely to involve a view, procedure of function. In several database systems, for example, it is possible to create a schema holding a view that contains nothing but the version number, extracted from the Flyway schema history table. If the Flyway Schema history table changes its name and location, only the code in the view needs to change.
This view would need to be created in the schema by a user with access rights to the table and this would then allow the table to be accessed by anyone with access to the schema. The great advantage of this is that it is metadata, not data, and so will be firmly attached to any generated build script, however it is produced.
The other advantage over the extended property or comments solution is that the history of changes can also be made visible (assuming this information is deemed suitable for ‘guest’ access).
Processing version numbers in scripts
Version numbers can be tricky if mishandled and make a nonsense of your attempts to read the history table. If you try to sort version numbers, it can go wrong.
1 |
@('1.12.13','2.5.4.14','13.1.5.140','4.6.0001.60','2.5.4.127')|sort-object |
…which gives…
1.12.13 13.1.5.140 2.5.4.127 2.5.4.14 4.6.0001.60
Whereas, if you sort them as version
objects, like this:
1 |
@('1.12.13','2.5.4.14','13.1.5.140','4.6.0001.60','2.5.4.127')| foreach{[version]$_}|sort-object|foreach{"$($_)"} |
…you get the results you’d expect:
1.12.13 2.5.4.14 2.5.4.127 4.6.1.60 13.1.5.140
The version
object consists of the four separate components:
1 |
@('1.12.13','2.5.4.14','13.1.5.140','4.6.0001.60','2.5.4.127')| foreach{[version]$_}|sort-object |
So, extracting the part you want is trivial:
1 |
([version]'1.73.3.14').Minor |
73
If your brain is just too big for the restrictions of four separate numbers. You can pack in another, giving you a majorRevision
and minorRevision
. These are packed into the upper and lower halves of a 32-bit integer
1 |
[uint32]"0x00210004" |
2162692
1 |
([version]'1.73.3.2162692').minorRevision |
4
1 |
([version]'1.73.3.2162692').majorRevision |
33
0x0021 is 33 decimal, as every geek knows.
Summary
If you can make it as easy as possible to update copies of a database, by being certain of the version of a database before and after you update it, then many database development tasks lose their terror, and the deployment process, in particular, becomes far easier.
If the SQL scripts that you use to change a database, whether they are migration scripts or synchronization scripts, are clearly designed to go from one version to another, then you need fewer of them, and they are likely to be better tested. If your database build scripts are to a specific version, then the same applies.
For versioning to be most effective, it requires cultural changes for the development process. How does one deal with the introduction of new features when their release-date isn’t yet known? How do you handle variants of what is obviously a single version of a database? How do you manage versioning when an application is spread across several databases and servers? How do you deal with independent branches when you can have duplicate version numbers?
Versioning by itself isn’t a panacea, but it is an excellent way of allowing you to design a system that avoids most of the errors that creep into database delivery.