Product articles Flyway Database migrations
Managing Static Data in Flyway Database…

Managing Static Data in Flyway Database Development

If your database application requires 'static data' to function, then the best way to manage that data is using a view based on a derived table. This article demonstrates ways to create these views, depending on your RDBMS's capabilities, and how to build and manage them in development work, using Flyway and PowerShell.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

There is a quandary lying in wait for anyone planning a migrations-based development. Changing the data shouldn’t change the version of your database. Versions have different design of tables, perhaps, or different procedures, functions or views, but a data change doesn’t change the version. It would just be a variant of the same version. Although it is easy to do so, you shouldn’t include data INSERTs within migrations, because migrations are likely to be applied to production databases to update them. It is scarily easy, though never career-enhancing, to accidentally introduce test data into production systems (see the Mailshot in the Foot).

However, this leaves us with two problems. Firstly, a database, even just to be testable, must have data. The developers must ensure that any migration preserves the existing data because it is likely to be eventually applied to a production system, and you can’t test this without data. To provide test data in development, you can, and should, use sets of entirely fake data. Test datasets should be introduced to the database in a separate process and Flyway callbacks are ideal for this as they are, by their nature, not version-specific (see Managing Datasets for Database Development Work using Flyway).

The second problem is how to load and manage data that is required for the basic functioning of the database and dependent applications. This ‘seed’ or ‘static’ or ‘reference’ data needs to be in the both the development and production system. This article will concentrate on the problem of creating and managing ‘static data’ in views or, for ‘slowly-changing’ data, read-only tables.

Static and ‘slowly-changing’ data

Databases usually require what is often referred to as ‘static’, or ‘reference’, data. This is unchanging data that will include such things as the names for the days of the week, error messages, names of geographical locations, currency or tax. This type of data must be in place before the database can be used at all.

Static data needs to be in source control because, like the code, it can cause chaos if it suddenly changes; you need to know why and when, and who made the change. When the database is released and is in the deployment pathway, we remove the callbacks that introduce the development data or test data, leaving just the static or reference data within the migrations, ready for staging.

Whereas procedural languages like C have long supported enumerations, there is no tradition of ‘static data’ or ‘reference data’ in relational databases and so there is no standard way of making a table read-only. Relational databases do not distinguish between static and dynamic data.

However, static data has very different requirements to normal development data in that it changes rarely, if at all, and needs to be in source control. It also needs to be part of a language package if you are writing a multi-language application that must be checked and signed off by a translator. If this data is essential for a functioning database, then it needs to be part of the migration or build and is read-only. A view, based on a derived table, is the best and easiest way to implement static data.

There is another category of ‘required’ data, that of ‘slow-moving’ data. This might be the sort of information you used to find in an encyclopedia. It could be major categories of widgets, a list of major cities, types of coffee, or world currencies. It is the kind of information you’d expect to see in a dropdown list. Slow-moving data must outlive the release of the database without change, otherwise it is too fast changing to be handled this way. The point is that it must be there for the database application to function. Slow-moving data is likely to require more complicated handling, with the use of tightly access-restricted read-only tables.

Managing slow-moving data in read-only tables

While a view, based on a derived table, is the obvious way to implement read-only data, there are times when only a read-only table will do the job, such as when there is such a large volume of static data that it must be indexed (you cannot index a view based on a derived table). A read-only table is also the best way to manage data that is slowly changing rather than static, as it allows us to update only the data, without changing the DDL (and therefore without affecting the database version).

Once you’ve loaded the table with data, you then make it read-only. Occasionally, an RDBMS will allow you to create a read-only table. You can also make a table read-only without getting involved with DCL and access rights, if your RDBMS has features that allow it. Most RDBMSs support triggers that can fire before the event to prevent the insertion, instead of the event to just fire the error, or even after the event. An ‘after event’ trigger can roll back the transaction and provide and error message. That’s fine but triggers can usually be disabled! Alternatively, you can put all the static data in a separate read-only database or, in SQL Server, store the static data table in a read-only filegroup within the same database.

However, the most sensible long-term approach is access control. In other words, we execute the CREATE TABLE statement, ‘initialize’ the table with data using DML statements, and then use Data Control Language (DCL) to deny object-level INSERT, UPDATE, DELETE permissions to the table for everyone other than the god-like ‘table-creator’, which will be a login that has System Administrator or Database Owner privileges. For automating changes to the data, such as when, say, interest rates change, you can use a special account for the process that can do absolutely nothing to the database except change the values.

If a table such as TypeOfPhone, Honorifics or Salutation needs to be filled in with small amounts of reference data, then you could do it using a multi-row INSERT statement, or an INSERT from a UNION query. If the volume of required data is significant, then all relational databases also have a ‘Bulk Import’ device (BCP in Sybase and SQL Server) to deal with this.

Managing static data in views

A view, based on a derived table, is by far the most convenient place for most static data (see Static Data and Database Builds). With every relational database I know about, static data can be stored in a view as read-only, and this requires just a view statement with no separate loading of data.

Unlike normal data, where a change doesn’t affect the version, a change in the static data does and should change the database version. In effect, this means that the static data must be managed as part of the DDL, created and modified using versioned migration scripts.

A view based on a VALUES table-source

A table-value constructor, using a VALUES statement (or a CTE in MariaDB or SQLite), will create a table source with the data you specify. This can then be used in a view. Because we are not using the VALUES statement directly in an INSERT clause, there is no restriction on the number of rows.

Here is a very simple example in SQL Server:

Static data in a view based on a VALUES table-source

SQL Server and PostgreSQL both use the above syntax, but MySQL and SQLite both use:

And you can test this out with…

You may wonder how I can smugly assert which syntax works with which RDBMSs. I just use the PowerShell library of my Flyway Teamwork framework to test it out on four different RDBMSs. This is handy if you make a tweak somewhere and wish to guarantee that it works!

A view based on a UNION ALL table-source

You may find a relational database system, especially an old version of one, that will give you difficulties because it doesn’t support the multi-row VALUES statement. No worries because you can use the UNION ALL clause.

Most RDBMSs implement this in a similar way to the multi-row VALUES statement, so it is usually as quick in its performance. Be warned though, there is generally a limit on the number of UNIONs you are allowed in one statement.

Scripting out ‘static data views’ using SQL Prompt

Many database GUIs have ways of scripting out multi-row VALUES statements. With SQL Server and SQL Prompt, for example, you can use the Script as INSERT feature will take a grid result as input and from it generate a multi-row VALUES expression for an INSERT INTO statement. I’ve given a demo of this previously.

Scripting out multi-row VALUES clauses using SQL Prompt

We then edit the output into the form we need. In this case, we just need to use the generated VALUES expression within our CREATE VIEW statement.

Building static data views using Flyway

For a more generic solution, it is handy to create a build for a static data view. My Flyway Teamwork PowerShell framework already has a ConvertTo-InsertStatements cmdlet to produce INSERT statements from JSON files. I’ve modified that slightly to create a convertTo-View cmdlet that produces a static data view from JSON data, using the table value constructor (TVC) method, if supported, and UNION ALL otherwise. The cmdlet is here on GitHub in my PowerShell-Utility-Cmdlets as ConvertTo-View.ps1, and is also in the Resources folder of my FlywayTeamwork GitHub project.

First, let’s create a JSON file of some static data:

This would give us the SQL Server/PostgreSQL version of the view…

…which is this…

Whereas, if we were working with MySQL, MariaDB or SQLite, for example we might do this.

…which would give you this….

If all else fails, you can do this…

…which gives you code which almost any RDBMS will execute:

Conclusions

One of the great advantages of Flyway is that, if you use it properly, you know the version of any database. Alongside its other advantages, this allows you to easily manage static or enumeration data. If you place enumeration data in a special type of view, as I’ve described, you can opt to make it part of a database version. If not, then you can, of course, use callbacks to load or synchronize development data that you don’t want to have tied to a particular version, but maybe a range of versions. For a database build, rather than a versioned migration, this can cause quite a bit of head-scratching to get right, but even with a build, the read-only view can be a godsend if used together with source control.

 

Tools in this post

Flyway

Version control for your database. Robust schema evolution across all your environments and technologies.

Find out more