Building Databases: One Man's Best Practice?

In software development, one man’s meat is often another man’s poison. We come across a technique and it works for us. We assume it will work in all contexts and publish it as a ‘best practice’. It fails to live up to its promise in other circumstances.

As an example of this, a migrations-based approach to database version control, popular in teams that use Agile development techniques, is provided in six database versioning best practices. Every change to the database schema and reference data is stored explicitly in the VCS, as a separate, immutable script. The team migrate a database from one version to another by running the right set of migration scripts in the right order.

It’s certainly a straightforward and easily-understood approach that will work with any database system, since it does not rely on the availability of a schema “diff” engine to generate an upgrade script. If your team eschews much upfront schema design, and therefore frequently needs to refactor tables, then you’ll probably need a very careful migrations-based approach in order to ensure you can apply the changes to staging and production, while preserving data.

Migrations are fine for developing certain types of database, but why would it be unwise to generalize and make it a best-practice? I can think of a few reasons:

  • It reduces the VCS to a mere document store – if every table change is a new script then there is effectively nothing to “version”; the VCS’s diff/delta capabilities become redundant
  • Migrations are only relevant to tables – routines such as stored procedures don’t hold data and are entirely overwritten on every change. Tracking every change to a routine is a return to the bad old days…before we had version control
  • Merging and branching will be difficult – and only achieved by adding scripts
  • Unless you migrate one table per script, it’s likely you won’t immediately spot conflicting changes from two developers, without rooting through all the migration scripts submitted previously to ensure you’re not making an incompatible change

Ultimately, many teams will struggle to make the ‘Migrations’ approach scale well to enterprise databases, and large development teams. The essence of the problem is that of recapitulation (in the biological sense). If the source scripts are immutable, then one is doomed to repeat every evolutionary development change, mistakes and all, during database deployment. If one developer mistakenly drops a column, then the team must add another script to restore it, and ensure that running both scripts, during a deployment, won’t result in any data loss.

For a database of any size, the migration will inevitably take longer and longer, the greater the number of migration scripts that need to be run, until eventually most teams resort to a “rebase”, often using a schema diff tool to generate a single upgrade script that will replace a vast swathe of migration scripts.

Where a database is used as a small-scale, table-based data repository without routines or other complications, the migration approach does well by avoiding build problems whilst preserving existing data. However, it’s dangerous to extend this approach as a general “best-practice” for enterprise-scale databases.