What is database continuous integration?

Have you ever longed for a way of making the delivery of databases more visible, predictable and measurable? Do you ever wish that they would be of better quality, quicker to change, and cost less? Grant Fritchey explains some of the secrets of doing Continuous Integration for Databases to relieve some of the pain-points of the Database Delivery process.

Continuous Integration (CI) is an essential step of application development for DevOps organizations. Whenever a developer commits code, a build is kicked off. The code must compile and pass automated testing before it is merged into the main branch. Developers have been working this way for years, but including the database in CI has lagged behind. Leaving the database out of CI has made it the bottleneck that slows down the delivery of new features. Even though database CI is more difficult to execute, many organizations have implemented it successfully with the right tools.

The purpose of Database CI is exactly the same as for application CI. The development team establishes a working version of the database very early in the development cycle, and then continues to verify regularly that it remains in a working state as the team expands and refactors the schema and database code objects. Developers integrate new and changed code into a shared version control repository several times a day. Development proceeds in small steps. Developers first write the tests that, if passed, will prove that a small new piece of functionality works. They then implement the code to make the tests pass. When the tests pass, they commit the code to “trunk” in the shared VCS, and their “commit tests” are added to the broader suite of tests for the application. Each commit, or check-in, is then verified by an automated database build or migration, and subsequent testing, allowing teams to detect problems early.

Where do you start with database CI? The sections below outline the main requirements.

Maintain the database in version control

The first prerequisite for Database CI is that the source of “truth” for the database CI process and all subsequent database deployments must be the build and migration scripts in a version control system (VCS), such as Git or Subversion. In other words, the database CI process must always be triggered from the VCS. There must be no ad-hoc builds or database modifications that bypass this formal process.

There are several CI management software services available. Each one I’ve worked with has hooks into various version control systems. With these hooks you can set up mechanisms for automating your database CI processes.

Automate database builds

Before you attempt database CI, you should already be able to automate a build of your database from the command line (i.e. separately from your CI system), using a set of DDL scripts stored in the VCS. A well-documented build script for every object makes it very easy for the team to see the exact state of an object at any given time, and to understand how changes to that object are likely to affect other objects. I recommend getting a complete build of an empty database set up as soon as you have your database moved into version control. There are a number of reasons for this:

  • It provides you with information on the effectiveness of your database version control implementation
  • You can immediately identify gaps, such as missing objects, in your version control management
  • You get feedback on your ability to build your database out of source control

Once you’ve established independently that you can successfully build a database from scratch, from its component scripts, then triggering an automated database build through your CI process, in response to database modifications, is a great first step on the road to database CI.
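As a rough illustration of building a database from scratch from its component scripts, here is a minimal sketch in Python. It makes several assumptions that are not part of the process described above: it uses SQLite (via the standard sqlite3 module) as a stand-in for your DBMS, it assumes one .sql file per object, and it assumes that sorting the file names gives a valid build order. The function names build_database and list_objects are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_database(script_dir: Path) -> sqlite3.Connection:
    """Build a fresh database by running every .sql script in file-name order."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real DBMS
    for script in sorted(script_dir.glob("*.sql")):
        try:
            conn.executescript(script.read_text(encoding="utf-8"))
        except sqlite3.Error as exc:
            conn.close()
            # Name the failing script so the build log points at the culprit.
            raise RuntimeError(f"build failed in {script.name}: {exc}") from exc
    return conn


def list_objects(conn: sqlite3.Connection) -> list:
    """Return the names of the tables and views that the build created."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Stand-in for a working copy checked out from version control.
    with tempfile.TemporaryDirectory() as tmp:
        scripts = Path(tmp)
        (scripts / "001_customer.sql").write_text(
            "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);",
            encoding="utf-8",
        )
        (scripts / "002_customer_order.sql").write_text(
            "CREATE TABLE customer_order ("
            " id INTEGER PRIMARY KEY,"
            " customer_id INTEGER NOT NULL REFERENCES customer(id));",
            encoding="utf-8",
        )
        connection = build_database(scripts)
        print(list_objects(connection))
        connection.close()
```

Comparing the output of list_objects with the objects you expect is one simple way to spot objects that are missing from version control.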

Commit early and frequently

The idea of database CI is to have a fast and continuous set of validations of your database changes, so my general advice is to integrate like you’re voting in Chicago: early and often (an old joke; you’re only supposed to vote once per election). In other words, rather than hold onto functionality for days or even weeks while you’re working on it, segment the work into discrete sets of changes, such as just one stored procedure, or just the changes necessary to modify a single table, and commit those changes into version control as often as possible. Ideally, you’ll have a “commit-based database CI” process set up that builds a new database, and runs basic tests, on each commit of a database change. Even if this isn’t achievable, you’ll still want to perform as many successful integrations of your database changes as possible. You’ll then have the benefit of integrating each small change as you work on it, rather than having a single massive set of code to integrate, which is likely to result in multiple failures.

One proviso, of course, is that you should only ever commit a database change to version control after you’ve validated that it’s working on your development machine, via unit testing. If you don’t know if the code is working, don’t commit it into version control. You’re likely to cause the database CI process to fail, but even worse, you might be supplying broken code to other developers.
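To make the idea of validating a change locally before committing more concrete, here is a small sketch of a unit test for a single database object. It is an illustration under stated assumptions, not a recommended framework: it uses Python's unittest module with SQLite as a stand-in DBMS, and the order_line table, the order_total view, and the run_commit_tests helper are all hypothetical.

```python
import sqlite3
import unittest

# Hypothetical change under test: a view that totals the lines of each order.
SCHEMA = """
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL,
    quantity   INTEGER NOT NULL,
    unit_price REAL    NOT NULL
);
CREATE VIEW order_total AS
    SELECT order_id, SUM(quantity * unit_price) AS total
    FROM order_line
    GROUP BY order_id;
"""


class OrderTotalViewTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own throwaway database built from the schema.
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SCHEMA)

    def tearDown(self):
        self.conn.close()

    def test_total_sums_all_lines_of_an_order(self):
        self.conn.executemany(
            "INSERT INTO order_line VALUES (?, ?, ?)",
            [(1, 2, 5.0), (1, 1, 2.5), (2, 4, 1.0)],
        )
        rows = self.conn.execute(
            "SELECT order_id, total FROM order_total ORDER BY order_id"
        ).fetchall()
        self.assertEqual(rows, [(1, 12.5), (2, 4.0)])


def run_commit_tests() -> bool:
    """Run the tests and report whether the change is safe to commit."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTotalViewTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    # A non-zero exit code lets a script or pre-commit check stop the commit.
    raise SystemExit(0 if run_commit_tests() else 1)
```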

Isolate the CI server environment

The CI Server should be running the same version of your DBMS as production, on a machine with no other service on it, and with restricted logins. Don’t run your database CI processes on a server that also runs additional services, or where people are attempting to get other work done. Your CI processes are likely to eat server resources, as they drop and recreate databases, load data, run tests, and so on. If you then add in the need to have this process for more than one development team, for more than one database or set of databases, all this points to the requirement for an isolated environment.

An isolated environment helps ensure that CI testing doesn’t interfere with the work of others. Isolation also ensures that failures are related either to the CI process itself, or to problems in the code you’ve modified within version control. Build or migration failures within the CI process should not occur because someone was mistakenly reading data from the CI database or attempting to run code manually against it. Spurious failures of this kind will not help you improve your CI process or the quality of your code.

Choose a CI server

While it is possible to build your own CI server process, it’s a lot of work. Unless you have specialized requirements, it will likely be easier to take advantage of an established CI Server such as Jenkins, AWS Developer Tools, TeamCity, or Azure DevOps.

Each of these CI servers offers specialist functionality that may make one of them more attractive than another within your environment. Plan for an evaluation period to test two or more of these servers in order to identify the one that works best for you.

At a minimum, your CI Server process should:

  • Integrate with your version control environment – including the ability to deal with labels, branching and versions
  • Allow granular control over the workflow that determines how the database is built – the order in which actions occur, and the ability to add additional functions such as calls to tests
  • Maintain a record of builds – in order to know the success or failure of any given run
  • Offer connectivity to email and other alerting mechanisms – so that when the process is fully automated, you can get reports on the successful or unsuccessful builds

Build a testing framework

Every time a developer commits a database change to the VCS, we will want to trigger a database CI process that performs a build and runs a small suite of tests to verify the basic behavior of the database structure and code. For our nightly integration builds, we’ll run more extensive integration and acceptance tests with a realistic data load. These tests need to relay information back to the build server, probably via NUnit-style XML output, which any modern CI solution can understand.
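As a sketch of what relaying test results to the build server as XML can look like, the following Python function turns a list of test outcomes into a small XML report. Note the assumptions: the element and attribute names follow a simplified JUnit-like layout chosen for illustration, not the actual NUnit schema, and the results_to_xml function and the test names are hypothetical. In practice your test framework will normally produce this file for you, and you should check which report formats your CI server accepts.

```python
import xml.etree.ElementTree as ET


def results_to_xml(suite_name, results):
    """Serialize test outcomes as XML.

    `results` is a list of (test_name, error_message) pairs, where
    error_message is None for a passing test.
    """
    failures = sum(1 for _, error in results if error is not None)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
    )
    for test_name, error in results:
        case = ET.SubElement(suite, "testcase", name=test_name)
        if error is not None:
            # A failing test carries the reason, so the build log can show it.
            ET.SubElement(case, "failure", message=error)
    return ET.tostring(suite, encoding="unicode")


if __name__ == "__main__":
    report = results_to_xml(
        "commit-tests",
        [
            ("customer_table_exists", None),
            ("order_total_view_sums_lines", "expected 12.5, got 10.0"),
        ],
    )
    print(report)
```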

Depending on the DBMS you’re using and the language in which you’re programming, you may have varying levels of testing frameworks available. Research your DBMS to track these down.

Employ a bug-tracking, workflow, and messaging system

Failed builds and integrations must be written to an issue-tracking system with as much detail as possible regarding the potential cause. The issue-tracking system should be closely integrated with the VCS, and with an associated workflow system such as that available through Azure DevOps, to allow the team to manage and assign issues quickly and efficiently.

For example, we should never issue further commits on a broken build, for reasons we discussed earlier in this article, so a typical workflow might, in response to a failed build, put the VCS into a “locked” state that will prevent checking in any more work. The issue will be assigned to a developer, whose task is to check in a fix that resolves the issue as quickly as possible and returns the VCS to an “unlocked” state. The team will also need a developer messaging system such as Slack or Gitter, to receive instant notification of build issues.
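The locking workflow described above can be sketched as a tiny state machine. This is only an illustration of the logic, not the API of any VCS or CI server: the BuildGate class and its methods are hypothetical, and it assumes that the issue is assigned to whoever committed the change that first broke the build, and that only the assignee may check in while the VCS is locked.

```python
class BuildGate:
    """Tracks whether check-ins are allowed, based on the latest build result."""

    def __init__(self):
        self.locked = False
        self.assignee = None  # developer responsible for fixing a broken build

    def record_build(self, succeeded: bool, committer: str) -> None:
        if succeeded:
            # A passing build returns the VCS to the "unlocked" state.
            self.locked = False
            self.assignee = None
        elif not self.locked:
            # First failure: lock, and assign the issue to the committer
            # (an assumption; your workflow may assign it differently).
            self.locked = True
            self.assignee = committer

    def may_commit(self, developer: str) -> bool:
        # While locked, only the assignee may check in (the fix).
        return not self.locked or developer == self.assignee


if __name__ == "__main__":
    gate = BuildGate()
    gate.record_build(succeeded=False, committer="ben")
    print(gate.may_commit("ana"))  # False: the build is broken
    print(gate.may_commit("ben"))  # True: ben is checking in the fix
    gate.record_build(succeeded=True, committer="ben")
    print(gate.may_commit("ana"))  # True: the VCS is unlocked again
```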

Conclusion

The ability to automate testing early in your development process clearly applies as much to databases as it does to any other code or architecture within your system. To make this happen, you will have to get your database into a source control system and create the automation necessary to leverage that system. However, as we’ve shown here, these processes can be put together to support your own Continuous Integration process, even for your databases.
