DLM S3 – Continuous Integration

In this post, we take a big step forward in engineering the database development process by looking at Continuous Integration (CI) for your database and the characteristics of a software pipeline that begins to implement a repeatable process. This is the S3 stage, named for, and characterized by, the use of CI.

S3 – Continuous Integration

In order to reach this stage in your development maturity, your team must be using a Version Control System (VCS). That S2 maturity level, which I talked about in my last post, is where all changes are captured and stored, and it is crucial to the success and integrity of your CI process.

If you are manually storing scripts in a folder or a VCS, then the success of your CI process will be tightly coupled to how reliably you capture every change. In all likelihood, some changes will slip through a manual process, which means your CI process will not accurately reflect the state of your development.

A CI process is designed to “build” your database often and perform automated testing. This implies two things: you can build a database, and you have tests written for your database code. Neither of these is hard to implement, but both take some time and practice to ensure they work reliably and provide benefits for your developers.

The goal of a CI process is to feed issues back to developers immediately. A good CI process should complete in about 10 minutes, so that any problem code is still fresh in the mind of the developer who checked it in. A CI process should also isolate builds so that the number of changes being tested is limited to a single check-in.

Building a Database

Much of the time, when we execute code against a development database, we are “building” objects for use by our application. However, the accumulation of all these objects isn’t necessarily sound, because as objects are refactored over time, other, dependent objects may no longer work correctly. This will be the case even though SQL Server may not return an error when we refactor the code; the problems are only detected later, when the dependent code is called.
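
As a small illustration (the object names here are invented), SQL Server will happily let you refactor a column that a stored procedure depends on, and the breakage only shows up when the procedure is next executed:

    CREATE TABLE dbo.Contact (ContactID int, FullName varchar(100));
    GO
    CREATE PROCEDURE dbo.GetContactName @ContactID int
    AS
        SELECT FullName FROM dbo.Contact WHERE ContactID = @ContactID;
    GO
    -- The rename succeeds, even though it breaks the dependent procedure
    EXEC sp_rename 'dbo.Contact.FullName', 'DisplayName', 'COLUMN';
    GO
    -- Only now is the problem detected: Msg 207, Invalid column name 'FullName'
    EXEC dbo.GetContactName @ContactID = 1;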

A build process for your database should involve rebuilding all objects, allowing dependencies to be resolved and checked by SQL Server. You can build all these objects from scratch in a new database, which is what we recommend at Redgate. You can also start with a database in a known state (likely the previous production version of all code) and then apply changes while executing tests against all other objects.
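
A minimal sketch of the build-from-scratch approach, assuming each object lives in its own script file in the VCS (the file layout and database name here are assumptions, not a prescribed convention); run through sqlcmd, it creates a clean database and applies every object script so that SQL Server resolves all the dependencies:

    -- build.sql: executed by the CI server with  sqlcmd -i build.sql
    CREATE DATABASE CIBuild;
    GO
    USE CIBuild;
    GO
    -- :r pulls in each object script from the working copy of the VCS
    :r .\Tables\Contact.sql
    :r .\Views\ContactSummary.sql
    :r .\Procedures\GetContactName.sql
    -- Any syntax error or missing dependency fails the script, and with it the CI build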

In either case, the process should detect unresolved references, such as tables or views whose schema no longer matches the references made from other views or stored procedures, and fail the CI run when it finds them. If the build succeeds, the developers should have confidence that the database is structurally intact and can be rebuilt from scratch.
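
One way to check for unresolved references after the build is to ask SQL Server which objects reference something it could not bind to; a sketch, which a dedicated build tool would normally do for you:

    -- Objects in the freshly built database that reference something that does not exist
    SELECT OBJECT_NAME(d.referencing_id) AS referencing_object,
           d.referenced_entity_name      AS missing_reference
    FROM sys.sql_expression_dependencies AS d
    WHERE d.referenced_id IS NULL      -- SQL Server could not resolve the reference
      AND d.is_ambiguous = 0;
    -- Any rows returned here should fail the CI step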

Automated Tests

The other part of a mature CI process involves running automated tests. These tests may be no different to the tests individual developers run on their machines to check code. In fact, there may not be any tests in the CI process that developers have not already executed.

The key with using a CI process to run these tests is that the tests are always executed. There are no human frailties, no forgetfulness on the part of a busy database developer. The CI process ensures all tests deemed necessary are run every time against all the code. This is in contrast to most manual testing, which often limits the testing to the scope of the objects being changed, ignoring the potential cascading effects on other pre-existing objects.

Unit testing is an art and, while tooling can make it easier to manage, store, version, and execute tests, the individual developer’s skill in writing tests determines the value of those tests for your application. We highly recommend you view testing as part of your ongoing process and continually refine the way you write tests, as well as the set of tests you find valuable. Like many programming skills, test writing should be a skill your developers share with each other and constantly work on.

There are numerous unit testing frameworks suitable for database code. DBUnit, tSQLt, Microsoft’s Unit Testing framework, and more can all be used. The key is to ensure your tests are also captured in your VCS and incorporated into your CI process. Most of the frameworks allow programmatic execution of a series of tests.
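
As a brief sketch with tSQLt (the test class, test name, and objects are hypothetical), a test is simply a stored procedure in a test class schema, and the CI server can execute the whole suite with a single call:

    EXEC tSQLt.NewTestClass 'testContact';
    GO
    CREATE PROCEDURE testContact.[test ContactSummary returns one row per contact]
    AS
    BEGIN
        -- Arrange: isolate the underlying table so the test controls its contents
        EXEC tSQLt.FakeTable 'dbo.Contact';
        INSERT INTO dbo.Contact (ContactID, FullName)
        VALUES (1, 'Jane Doe'), (2, 'Sam Roe');

        -- Act
        DECLARE @Actual int = (SELECT COUNT(*) FROM dbo.ContactSummary);

        -- Assert
        EXEC tSQLt.AssertEquals 2, @Actual;
    END;
    GO
    -- The CI server runs every test in the database in one step
    EXEC tSQLt.RunAll;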

Tests can also be grouped into sets that progressively exercise larger and more complex areas of functionality. All CI servers will allow you to schedule separate testing steps that execute each progressively more complex set of tests. In this way you ensure that basic functionality is tested first and, if failures occur, resources are not expended on more complex tests.
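
For example, if the tests live in tSQLt test classes (names again hypothetical), the CI server can be configured with one build step per class, running the cheapest checks first:

    -- CI step 1: fast, isolated unit tests; a failure stops the pipeline here
    EXEC tSQLt.Run 'testContact';
    -- CI step 2: slower tests that exercise several objects together
    EXEC tSQLt.Run 'testReportingIntegration';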

Capturing Upgrade Scripts

Whether you work with software systems that are installed over and over for customers, or deal with a database that is constantly modified, you will need to produce scripts that “upgrade” an existing database. The output of your CI process should be a script that is considered a “Release Candidate” for future environments.

Depending on your database development philosophy, you might store migration, or change, scripts in your VCS as a way of making changes to your database schema. You might also use a hybrid approach that runs a comparison operation between the desired state of a database and an existing database, and then merges these results with other scripts (pre, post, or substitution scripts).
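
If you follow the migration-script approach, each change script in the VCS is typically written so that running it twice is harmless and so that it records the fact it has been applied; a minimal sketch (the SchemaVersion table and the naming convention are assumptions, not a requirement):

    -- 0042_add-contact-email.sql: one change, guarded so the script is safe to re-run
    IF NOT EXISTS (SELECT 1 FROM dbo.SchemaVersion
                   WHERE ScriptName = '0042_add-contact-email')
    BEGIN
        ALTER TABLE dbo.Contact ADD Email varchar(254) NULL;

        -- Record the migration so later deployments know it has already been applied
        INSERT INTO dbo.SchemaVersion (ScriptName, AppliedAt)
        VALUES ('0042_add-contact-email', SYSUTCDATETIME());
    END;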

In either case, your CI process needs a final step that produces a package that can be executed against test, QA, staging, UAT and, ultimately, production environments. This package should be captured and stored, without changes, for all later database upgrades. If there are problems with the script, a new one should be produced by changing the code in development (or the CI process itself) and re-running the entire CI process.

There are numerous ways to produce this package, but the assembly of the script should be through some automated process that ensures the package is produced in the same way each time the CI process runs.

Repeatability

The purpose of the CI process is to ensure your build, test, and upgrade package process is performed in a known, repeatable fashion. Every change checked into your VCS results in a CI process execution that provides rapid feedback to developers if there are problems. This allows fixes to occur immediately.

If no problems occur, the output of the CI process is a package you could potentially run against your production (and other) environments. In most cases you will not promote every package to your testing environments, but at all times the CI process produces a software package that is “potentially” releasable. Whether you choose to release a given package is a decision your organization will make; the others remain available for further testing until they are eventually removed by some archival process.

When you achieve the S3 stage, your software process will be well developed and the quality of your software should be better than it was before you implemented CI. Your software delivery should also be more efficient than in the past, as developers learn about issues more quickly, which prevents future work from building on flawed past work.

There is still work to do: your CI process should be used to improve your developers’ knowledge and habits, as well as to extend the test coverage of your code.

The next stage after CI is S4, the Release Management stage, where the deployment of code from one environment to the next is automated and repeatable.

You can read more about that, and the other stages in Redgate’s Database Lifecycle Management maturity model, by dipping into my other posts:

An Overview of Database Lifecycle Management

Stage 1 – The Manual Stage

Stage 2 – Automated Version Control

Stage 3 – Continuous Integration

Stage 4 – Release Management