A Version Control Strategy for Database DevOps
Our goal was to transform a 'serial' and manual development and release process into one that supported parallel development work and automated releases. This article explains step one: migrating our monolithic, centralized repos to Git, then implementing a Release Flow branching strategy for parallel development work streams, and a Pull Request workflow to control an automated build and release process.
My previous article gave a broad overview of the path we took to transforming our legacy database development and release process into an automated and reliable DevOps pipeline, using Flyway (Enterprise Edition). The legacy process was manual and error prone. Developers worked on a shared development database, with all changes committed to a ‘monolithic’, centralized version control repository. The dependencies this introduced, and likelihood of conflicting changes, meant that a talented team were restricted to a ‘serial’ development process, tackling their backlog one task at a time.
The transformation towards parallel workflows and DevOps automation, with Flyway, required two major changes:
- Migrating our centralized, ‘monolithic’ repos over to Git, a collaborative version control system (VCS) with strong support for topic-based development workstreams
- Shifting from shared databases to dedicated, per-developer databases – this change went hand-in-glove with the shift to a distributed VCS, since we could not fully exploit Git’s potential to help us split the development work into isolated, task-based branches unless we could also provision isolated databases, per branch.
The most significant process change, from a database development perspective, is from shared databases to personal databases, and I’ll cover how we did that, using disposable ‘clones’, in the next article. However, the migration to a different VCS should not be underestimated, as there is a lot more involved than just switching tools, and that’s what we’ll cover here.
First, the team needed to migrate the monolithic repos over to Git, while also splitting them down into more manageable per customer-per-project repositories, with fewer dependencies. With the migration complete, we needed to make sure we had in place a branching strategy and workflow that would support the team with its completely new way of working and take the full potential of the new VCS.
Legacy process: centralized VCS, monolithic repos, shared dev databases
The development teams comprised both .NET and database developers. In the legacy system, all of the bespoke application code, for each customer, was stored in a single, monolithic repo. Within each customer’s folder were multiple folders containing that customer’s various projects (Visual Studio solutions). There was a separate repo containing all shared components.
Similarly, the database code was stored in a separate monolithic repo, with folders per customer and within each of customer folder was a folder per database containing the code (a SQL Source Control project).
Initially, all these repos were hosted in Subversion (SVN), although later the team switched to Team Foundation Version Control (TFVC) for their application code. Both are so-called centralized VCSs, where all historical data is maintained on the server and developers only have one version (usually latest) locally. Before making changes, the developers check out files, and operations require a connection to the server.
All developers made changes to files in the same monolithic repo. There was also no branching. This caused maintenance and performance problems (such as long commit times) as well as dependency problems. For example, when several developers needed to make changes to shared application components there was a high risk of altering code in a way that would disrupt the work of others. Often developers had to wait around for other developers to finish a task, when more than one user story was being developed in a repo.
Similar problems affected the database development work. A developer working on database A1 would see their own work mixed up with commits to other customer databases. The lack of branching and use the shared development model for database work meant there was no way for a developer to work in an isolated fashion, free from the fear of disrupting the work of others. Overall, it was very hard for developers to work in parallel; they were restricted to working on one story at a time from backlog.
When the time came to release the required changes, there were other problems. There was no concept of releasing a known and immutable version of an application component or of a database. In fact, with all developers working in one repo, with no branching, there was no real visibility into what code base was released to customer. To release, a developer simply copied what was built on their computer to the customer’s environment. The same applied to the database changes, a manual, state-based comparison from Test to Prod was the source for a release of database code.
All this was the long way from the DevOps ideal of isolated and parallel development workflows that result in well-defined, immutable and thoroughly tested release artifacts.
DevOps: distributed VCS, dedicated repos, isolated dev databases
For all the above reasons, it was soon clear that no centralized VCS system, coupled with use of shared databases, was ever going to support the isolated, parallel development workflows that the team needed.
Git on the other hand is a distributed VCS. That means that you don’t check out from a server, you clone the server to your local machine. Developers can commit each set of changes locally, in a branch, and perform version control operations such as history and compare without a network connection. And local is local until you decide to push your changes to the server. If you create a branch to experiment, commit code to it, but eventually realize it’s not going to work, you can just delete the branch; nobody ever noticed it because it was all local!
Git supports very lightweight branching. Creating a branch is nearly instantaneous and switching back and forth between branches is a piece of cake. To the team, it seemed that if they could understand and master this feature it would unlock the door to a much more controlled and efficient development and release workflow.
Migrating to Git: breaking down the monoliths
Before the team could start to exploit Git’s branching capabilities, they knew they needed to deal with the problem of the oversized repos, and so remove some of the inefficiencies and unnecessary dependencies that hampered their development work.
The application code was split into per-project repositories, with each shared component also in its own repo. Similarly, the databases repo was split down into one repo per customer, containing only that customer’s databases, with each Database folder containing the code for that database (a Flyway Desktop project).
Thanks to this info on GitHub the team achieved to split up dozens of repositories with the preservation of all commit history. Other useful tools for getting the migration done, technically, are:
- Git-Svn is a tool that provides bidirectional flow of changes between a SVN and Git, so super easy for migration
- Git-TFS does the same as above, but then for TFVC and Git, also bidirectional
Improved development and release processes
For the application code, the team had now split the codebase into much smaller functional pieces, where every project (Visual Studio solution) had its own Git repo. This had immediate benefits. It became much easier to maintain each repo, and performance improved. There was much less ‘inference’ between developers’ changes as only developers working on a specific project or component were committing to that repo, instead of the big repo that everyone was using.
For the database code, there were multiple dependent databases per customer, so each repo now represented a “SQL instance” containing all the required databases. When a developer creates a branch, the ‘isolated database’ to support that branch actually consists of an “instance” of interconnected databases, so avoiding build problems caused by missing dependent objects.
For both application and database code, the use of dedicated repos gave much better visibility into exactly what was in a release, and what had change from the previous release, and made it much easier to write automation logic for the build and release process.
The team can now build, test and release each application component as an immutable build artifact. Each build is a new versioned artifact, so there’s never any doubt exactly what is being released, and customers can use the latest or older versions of the same shared component, as required. Similarly, there is much better visibility into exactly what database changes are being released, which objects in which databases are affected and how, since the team can track these changes object-by-object in the version control system, via the schema model. These schema model changes are linked to the release and can therefore be used to see the changes between releases.
But it doesn’t stop there, developers can also link work items while committing changes, or when running one of the checks that take place as part of the pull request workflow (covered shortly). This means that the version control system can also show what work items went into a specific release.
Git branching Strategy
If you adopt a branching strategy for your team, you collaborate better and spend less time managing version control, and your team spends more time developing. You could say that adopting a Git branching strategy is a minimum requirement, for team-based development.
So, the team need to decide on a branching strategy, including a branch naming strategy, and then on the release workflow that will allow changes tested changes to be merged back to the parent branch, and deployed to production.
The pros and the cons of different strategies are discussed widely elsewhere, see this link for a comparison. Of course, you could also develop your own strategy with the help of this page on the Microsoft website. The team adopted the Release Flow branching strategy, as used by the VSTS (now Azure DevOps) team at Microsoft. It is a scaled trunk-based branching strategy, with short lived ‘topic’ branches and a main branch that is always in a releasable state. The release flow branching strategy has a main branch plus three branch categories (topic, release, hotfix):
- main: all topics start and end in the main branch. It must always be in a releasable state, and so represents the latest release candidate. All changes must be thoroughly tested before merging and any conflicts dealt with immediately.
- topic: we create a new topic branch any and every time we want to write some code, whether to implement a feature or develop a fix a bug. A topic branch is always created from and merged back to main. The only exception to this rule is for hotfixes to production – see below.
- release: the release branch is created from main’ at the end of a sprint and is a frozen state of the main, at the point of branching, which is the deployed to production.
- hotfix: a hotfix for a production-disrupting issue is always started from the ‘release’ that is in the Production environment. Hotfix changes are merged back into release but also cherry-picked into main.
Naming your branches
it is important that the team members agree on and adopt a standard way to name branches. Although the branch categories are part of the branching strategy, the exact naming of each branch is not, and if the team can adopt a naming convention, then it makes collaborative working with branches so much easier.
Our branch names are constructed as follows:
- BranchCategory – is the type of branch. In the release flow strategy this will be topic, release or hotfix.
- ReleaseId – is a unique identifier for each release, in this case comprised of the sprint and iteration of a release. For example, the second release of sprint 97 will be named
ReleaseIdis only included in Release and Hotfix branches, as follows:
- A Release branch is simply named as
- A Hotfix branch is always named according to the release from which it was created. So, the name of a hotfix branch to fix release
s97.2will start with
- A Release branch is simply named as
- ReferenceId – is an identifier from an external system, to identify the work item or bug to which the branch relates, for example
ext10658(all lower case).
- PascalCasedDescription – is an optional description written in Pascal Casing (e.g.,
Choose your release identifiers based on what works best for your team. For example, it might make sense for your team to have a
ReleaseId that reflects a location, such as Atlanta, Brussels and so on. However, it’s worth bearing in mind that an Azure DevOps repo sorts the branches alphabetically on name, so choose carefully – you won’t want to see branches ordered illogically in your branch overview!
Of course, the same naming convention is applied to both the .NET repos and database repos. This also helps the team coordinate the release of application changes with the required database changes. If there is a user story with database and .NET changes, there will be a topic/a branch for both the database as well as the .NET repo. If a user story requires only database changes, then topic/a will only exist for the database repo, and in testing the team will just deploy the latest software release. Conversely, for a user story with only .NET changes there is a topic/a only for the .NET repo and the latest database release is used in dev and test.
The Pull Request workflow
The other half of your branching strategy is understanding the process by which the code for a new topic, bug or hotfix is merged back into the main branch and eventually into a release to production. To control the flow of code changes between branches, we issue a Pull Request (PR), which a formal administrative process in the version control system (in our case Azure DevOps-hosted Git). We can build review and test steps into the PR workflow such that code from a topic, for example, can only be merged into main if it passes the required tests and meets certain other criteria.
After a few iterations, each time incorporating feedback and improvements from retrospectives, the team arrived at a PR workflow based on used of Pull Request triggered artifacts, with the ‘merge criteria’ established in the branch policies of the target branch. The optimized PR workflow now looks as follows. You’ll notice there are relatively few manual steps in the optimized workflow. Azure DevOps takes care of a lot of the process steps automatically:
For a tester to do their job, the developer must first create a Pull request, essentially a request to ask if the topic branch changes are ‘good enough’ to get merged in to main. This is the point in the process where a formal code review must be made, build / test validation can be mandatory and linking to a work item from the board is possible for a better provenance. All these checks are done because the target branch, main, has them set up in its branching policies.
This PR is sent for code review. When the reviewer approves the changes, the developer generates the Flyway migration script, and the pull request is approved. With this approval, one of the first steps of completion is met, but we still have a few more boxes to check. The reviewer can enroll the PR for “auto complete” which means that the PR is allowed to complete automatically, and be merged into main, but only if the code passes all subsequent checks and tests.
To test the new release candidate, containing the latest changes, we must build the new version and release it to the Test environment. To do this, we queue the release candidate for a pre-merging build. The key point about a “pre-merging build” is that it will include any changes committed to main in the meantime, so it’s a build of the latest version in main, plus the changes from the topic branch (it roughly means “build as if the merge already is done“). The main advantage is that it reveals merge conflicts immediately, rather than later when they can do more damage.
If the branch has been long running, it would be useful for the developers to review the changes made to main, since the topic branch was created, to check for possible conflicts, such as by generating a Flyway ‘dry run’ script containing all the changes to main between the version at time of branching and the latest version.
If the build of the new version is successful, it is released to Test, by deploying a clone of the latest database release in main, and then migrating it to the version available in the pre-merge build, using the Flyway engine (we’ll return to those details on the provisioning of those clones in the next article). The new release is evaluated by the tester. If it all checks out, the tester accepts the release and, thanks to the auto-complete option, the code is now automatically merged into main. From there, a release branch is created for the deployment to production.
These steps are also highlighted in the diagram below depicting the end-to-end DevOps pipeline that we built using Flyway Enterprise and Azure DevOps (which I described in the previous article):
Summary and outlook
As promised in the first article, the ‘devil is in the detail’, and there are a lot of details to consider when migrating to new VCS. It involved a big process change for the team as well, but this switch is mandatory to make full use of Flyway migrated databases in production and automatically provisioned dedicated databases for the team members. And one of those topics will be the subject of our next article.
Was this article helpful?
Tools in this post
DevOps for the Database