Automated Provisioning of Multi-Database Environments in a Flyway Development
This article describes how we implemented lightweight database provisioning, using clones, within an automated Flyway release process. It allows users to self-serve personal copies of a complex multi-database system in a way that is transparent, supports flexible Git branching and merging strategies and agile releases, and neither compromises data security nor adds significant admin overhead.
My previous article in this series explained how we implemented a new and automated database development strategy, using Git “Release Flow” branching for parallel development work streams, and a Pull Request workflow to control an automated database build and release process.
The new system relied on developers and testers being able to “self-provision” personal development and test copies of the production database system, safely. Our challenge was to find a way to do this for a complex system of more than ten interdependent databases, where no copies of the data were allowed to leave the customer’s servers. We’d learned previously that while provisioning the schemas was straightforward, generating good test data was not. The team needed a solution that included both schema and data.
Our answer was to implement database provisioning using clones, as part of our automated Flyway workflow. It allows developers and testers to “self-serve” personal copies, or clones, of the entire set of production databases, while ensuring the company’s data remains secure and without incurring significant additional administrative overhead. This article describes, strategically, how we achieved all these objectives, and my next article will fill in the technical details.
The result is a system that has successfully delivered at least one automated deployment per 2-week sprint, 50 in total since its inception. Previously, once per month was the maximum cadence. Each deployment saves a minimum of one hour of manual scripting compared to the old system. Perhaps most remarkable is an 85% reduction in the number of deployment failures. Under the old system it had become ‘normal’ to have to roll back a deployment and try again, or else apply hotfixes to correct any subsequent issues it caused. Under the new system only one deployment so far has required a rollback.
Providing per-developer copies of complex database systems
By implementing database provisioning using Clones, developers were able to “self-serve” personal copies, or clones, of the database system, each time they created a branch for a new work item, and then refresh and reset them on demand.
How do Clones work?
When creating clones, we create one complete, read-only image of the database files (a data image) and then exploit data virtualization technology to allow many database instances to share access to this image. Each clone can be created and reset very quickly because it requires minimal initial storage space; it reads data from the image and stores locally only the data blocks for any changes made subsequently to the clone.
Each weekend we capture multiple images of the production instance, containing all the interdependent databases. Since the data is related, we do this operation outside of production hours, so that database dependencies remain in place. A user then simply requests a fresh copy (‘clone’) of the required database, from the provided images, uses it to develop or test, and then disposes of it once the work is complete and committed. Clones are designed to be short-lived and ‘disposable’.
The current underlying technology is Redgate’s SQL Clone, which uses disk virtualization services built into any 64-bit Windows (7 and higher) operating system. Other implementations are available, including Redgate Clone (which is Kubernetes-based and cross-RDBMS) or you can even build your own DIY Clones, if you’re feeling brave! In any case, we aimed to use automation and workflow to make the implementation details transparent to the users: we should be able to change the implementation without affecting the established development workflow.
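For illustration only, here is a minimal sketch of the two basic operations (capturing an image, then creating a clone from it) using the SQL Clone PowerShell cmdlets; the server URL, machine, instance and image names are hypothetical placeholders rather than our real configuration:

```powershell
# Connect to the SQL Clone management server (hypothetical URL)
Connect-SqlClone -ServerUrl 'http://sqlclone.example.local:14145'

# Capture a read-only image of one production database (run outside production hours)
$sourceInstance   = Get-SqlCloneSqlServerInstance -MachineName 'PRODSQL01' -InstanceName 'PROD'
$imageDestination = Get-SqlCloneImageLocation -Path '\\fileserver\SqlCloneImages'
New-SqlCloneImage -Name 'MeasurementDb_Weekly' -SqlServerInstance $sourceInstance `
    -DatabaseName 'MeasurementDb' -Destination $imageDestination | Wait-SqlCloneOperation

# A developer's personal instance then receives a lightweight clone of that image;
# the clone reads unchanged data blocks from the image and stores only its own changes
$devInstance = Get-SqlCloneSqlServerInstance -MachineName 'DEV-RDP01' -InstanceName 'DEV_JANE'
$image       = Get-SqlCloneImage -Name 'MeasurementDb_Weekly'
New-SqlClone -Name 'MeasurementDb' -Location $devInstance -Image $image | Wait-SqlCloneOperation
```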
Database provisioning requirements
For our new database provisioning system to be successful, it had to meet four important design and implementation criteria:
- Data protection – the solution could not compromise the security of the company’s data
- Automation and transparency – the underlying database provisioning technology should be transparent to the developer. They should not have to learn a set of manual steps to create or reset their database, or to deliver their changes into the test and delivery pipeline.
- Flexible workflow – as well as working on their own branch, a developer should be able to switch priorities and collaborate on other branches, without risk of losing changes.
- Housekeeping – the ability to spin up multiple clones of any version of a database, on demand, is great, but it can get messy quickly without some automated clean-up!
Data protection: external team, local data usage
One of the customer’s requirements was that, for security reasons, the data may never leave the company. The size of the databases also made it obvious that the images of the production system needed to be co-located with the clones, to prevent slow queries. A database clone works exactly like a normal database, but it requires a connection back to the “image” of the source database, so unless that connection is fast and reliable, we can hit performance problems:
To meet these requirements, we created a development and test environment which was physically located on a server hosted by the customer, technically isolated from the production environment, but easily accessible for the team:
Developers and testers performed development work, using SSMS and Flyway Desktop, on an RDP Server, which is hosted within a so-called demilitarized zone (DMZ).
Each developer and tester can access the RDP Server via a secure VPN connection. Another firewall separates this from the customer’s production network, which hosts the images and clones. Each developer and tester only has SQL access to the clones on their respective personal SQL instances. This way the team can work with the SQL Clone data securely, while also avoiding any issues with the speed and bandwidth of the connection between image and clones.
Automated and transparent workflow
A key aspect of the design of our system was that the implementation details must be ‘invisible’ to the users. They simply follow a standard process that results in automated provisioning of the environment they need to do their work. For each work item, each branch, each change, the process, as described in the first article in this series, will always be the same:
- Pick up a work item
- Create a branch in Git and link it to this work item. This automatically triggers the following two actions:
  - build the latest database system
  - provision a personal development environment
- Develop feature, run tests
- Dispose of environment.
By following this standard process, everybody in the team knows that starting new work will provision a fresh set of databases. With support for per-developer and per-tester database instances, the team were now able to work on user stories in parallel, rather than serially, each one in an isolated branch, auto-provisioned with a fresh build of the latest production system. When work on a branch is committed to the version control system (Git), an automated pull-request workflow, using an Azure DevOps pipeline and Flyway, ensures that it is tested, reviewed and approved before being merged and released. This improved the quality of the released code and contributed significantly to the big reduction in deployment failures.
To achieve this, the database provisioning process must be automated and ‘on-demand’. Our team created PowerShell scripts that automate both clone provisioning and Flyway migrations and can be run from any team member’s workstation. The goals were to hide the “complexity” of using clones and to minimize the need for manual steps, which can easily be forgotten when provisioning clones and running Flyway migrations.
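Purely as an illustration (the real scripts are the subject of the next article), the core of such a provisioning step might look something like the sketch below; the parameter names, instance names, connection settings and migration locations are assumptions, not our actual values:

```powershell
param(
    [string] $BranchName,   # e.g. 'topic-mms25562-ScanWithoutMeasure'
    [string] $DevMachine,   # developer's personal SQL Server machine
    [string] $DevInstance   # developer's personal SQL Server instance
)

# 1. Provision a fresh clone of the latest weekly image on the developer's instance
Connect-SqlClone -ServerUrl 'http://sqlclone.example.local:14145'
$image    = Get-SqlCloneImage -Name 'MeasurementDb_Weekly'
$location = Get-SqlCloneSqlServerInstance -MachineName $DevMachine -InstanceName $DevInstance
New-SqlClone -Name 'MeasurementDb' -Location $location -Image $image | Wait-SqlCloneOperation

# 2. Bring the clone up to the branch version with the Flyway CLI (assumed to be on the PATH)
& flyway migrate `
    "-url=jdbc:sqlserver://$DevMachine;instanceName=$DevInstance;databaseName=MeasurementDb;integratedSecurity=true" `
    "-locations=filesystem:./migrations"
```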
Now, as long as we maintain this documented and automated process, we can switch out the current cloning technology for another, if needed, without disrupting the team. For example, if we later decide to switch out SQL Clone (Windows-based, SQL Server only) for its successor, Redgate Clone (Kubernetes clusters and containers, cross-RDBMS), then the team should barely notice; they just continue to follow the same process as before.
Flexible workflow: branch switching and database stashing
The reality of database development is that the team do not always get to follow their standard process, as described above. A developer is assigned to a task, creates a branch, auto-provisions their database environment, and starts work. Before they are ready to commit the change, the manager changes priorities and the developer is asked to work on a fix for an urgent bug! At this point, the disrupted developer needs a quick way to ‘stash’ their current work, so they can pick it up exactly where they left off, once the other urgent job is complete.
It is not just managers who mess up processes and priorities. There are plenty of other reasons why finishing the process in one go is not possible, so switching branches is a common requirement. Development work can get stuck due to vague requirements, bad test data or other external conditions.
The solution is branch switching. The idea is that when a developer working on a new feature branch has to switch priorities to an urgent hotfix, the current ‘state’ of their topic branch should be saved. The complicating factor with database development is that, to achieve this, we must also save the current state of the topic branch database. This becomes much simpler when provisioning with clones because we can quickly ‘stash’ the clone for each branch. This can be implemented locally using git hooks, as described in this article. However, what we are after is, again, a standardized approach, one that does not run on a user’s PC but is instead embedded within the PowerShell scripts that support the standard process.
The following diagram summarizes, and somewhat simplifies, how this works. When the developer switches priorities to the urgent hotfix, the current copy of their topic branch database, in this example called MeasurementDb, is automatically renamed and stashed, and a fresh copy of the database is provisioned for the hotfix branch (only one database is shown here, but this logic is executed for all the databases in the personal instance):
To get this to work, each time our PowerShell process provisions a clone, it populates or updates a database extended property (called SQLCloneReleaseName) with the name of the release that instigated the process. For example, the clone produced by the topic branch release might have an extended property value of 23.9.53.topic-mms25562-ScanWithoutMeasure.r1:
When the developer then creates a hotfix branch, linked to the work item for the urgent bug, our PowerShell scripts make sure that, just before the new databases (clones) are created, the existing databases in the personal environment of the team member are renamed and stashed, so that no work is lost. In this example, the new name of the MeasurementDb database for the topic branch will be MeasurementDb_23.9.53.topic-mms25562-ScanWithoutMeasure.r1.
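As a simplified sketch of this tagging and stashing step (it assumes the SqlServer module’s Invoke-Sqlcmd and a plain ALTER DATABASE rename, rather than our actual scripts, which the next article covers):

```powershell
$release     = '23.9.53.topic-mms25562-ScanWithoutMeasure.r1'
$devInstance = 'DEV-RDP01\DEV_JANE'   # developer's personal instance (hypothetical)

# At provisioning time, tag the clone with the release that created it
Invoke-Sqlcmd -ServerInstance $devInstance -Database 'MeasurementDb' -Query "
    EXEC sys.sp_addextendedproperty @name = N'SQLCloneReleaseName', @value = N'$release';"

# When switching to the hotfix branch, stash the existing clone by renaming it
# after its release, just before a fresh clone is provisioned for the hotfix
Invoke-Sqlcmd -ServerInstance $devInstance -Database 'master' -Query "
    ALTER DATABASE [MeasurementDb] MODIFY NAME = [MeasurementDb_$release];"
```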
Once the hotfix is complete, the developer can return to their original task in minutes, simply by redeploying the release 23.9.53.topic-mms25562-ScanWithoutMeasure.r1:
When the developer redeploys the initial release, the PowerShell process detects that a stashed database is available for that release, and renames it back to the original database name, in this example MeasurementDb.
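Again, purely as a rough sketch, the corresponding ‘unstash’ check might be as simple as this:

```powershell
$release     = '23.9.53.topic-mms25562-ScanWithoutMeasure.r1'
$stashName   = "MeasurementDb_$release"
$devInstance = 'DEV-RDP01\DEV_JANE'   # developer's personal instance (hypothetical)

# Is there a stashed database for this release on the personal instance?
$stashed = Invoke-Sqlcmd -ServerInstance $devInstance -Database 'master' -Query "
    SELECT name FROM sys.databases WHERE name = N'$stashName';"

if ($stashed) {
    # Yes: rename it back and skip provisioning a fresh clone
    Invoke-Sqlcmd -ServerInstance $devInstance -Database 'master' -Query "
        ALTER DATABASE [$stashName] MODIFY NAME = [MeasurementDb];"
}
else {
    # No: provision a fresh clone and run the Flyway migrations, as in the normal workflow
}
```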
Housekeeping: automated cleanup
Last, but not least, is the important task of housekeeping! With all team members able to self-serve database clones for the various branches, who keeps track of all those stashed databases? Our answer is an automated “cleanup” process. Clones are intended to be ephemeral: if a pull request is completed and a branch is closed, the databases related to the release can be deleted.
This “cleanup process” was already in place for stale .NET releases. With the implementation of Database DevOps, we simply extended this periodic task so that it also cleans up clones of finished work. In addition, we run checks every night to see if there are any unused SQL Clone images (images that aren’t referenced by any clones). The storage space of images is often referred to as the “clone tax”. While paying the clone tax of 1.5 TB for each set of relevant databases is OK, no one wants to pay the tax bill for unused images.
This cleanup routine is part of the automated process that regularly stocks new images. We created an Azure pipeline that is scheduled as a “cron job”. Each weekend, at night, old unused images are cleared out and a fresh set of the production databases is captured as images, ready for the team to use on the next working day.
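As a simplified outline of the nightly clone-and-image check, again assuming the SQL Clone cmdlets (the logic that decides whether a release is actually finished is omitted, and the ImageId property on the clone resource is an assumption):

```powershell
Connect-SqlClone -ServerUrl 'http://sqlclone.example.local:14145'

# Remove clones that belong to a finished (merged or closed) release
$finishedRelease = '23.9.53.topic-mms25562-ScanWithoutMeasure.r1'   # hypothetical example
foreach ($clone in (Get-SqlClone | Where-Object { $_.Name -like "*_$finishedRelease" })) {
    Remove-SqlClone -Clone $clone | Wait-SqlCloneOperation
}

# Drop any image that no longer has clones attached, so we stop paying
# the 'clone tax' for storage that nobody is using
# (assumes each clone resource exposes its parent image via an ImageId property)
$allClones = Get-SqlClone
foreach ($image in Get-SqlCloneImage) {
    if (-not ($allClones | Where-Object { $_.ImageId -eq $image.Id })) {
        Remove-SqlCloneImage -Image $image
    }
}
```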
Conclusion
In my experience, it is difficult to sustain frequent and reliable delivery of database changes if teams are restricted to using a single, shared database for all development and testing work. The cost involved in shifting, safely, to dedicated, per-developer databases can seem high, for the seemingly intangible reward of “increased productivity”. However, this project proved to the team that the rewards are substantial and worth the effort. Our transition from a manual to an automated workflow has revolutionized our database development and testing. We use Git version control, SQL Clone and Flyway to enable parallel workflows without directly exposing the techniques to the team.
The next article will dive into these technical details: the PowerShell scripts and YAML that drive this process. We will also be looking at the critical housekeeping scripts that keep our pipeline running smoothly and personal database instances clean.