How frequently do you integrate your code? Let’s refine that question even further. How frequently do you integrate your code and your database? If you’re working on a team of developers and you do not yet have a method for continuous integration of your application code, database code, and related infrastructure, there’s a pretty strong chance that you’re frequently encountering a painful step at some point in your development lifecycle when that integration occurs.
In order to avoid this pain, you should be looking to create a point in the lifecycle where you are integrating your changes across the team and the application. Because this could be a time-consuming process, you're also going to want to find as many ways as possible to automate it.
The longer you travel down the development path without integrating the changes you or your teammates have made, the more divergent the code becomes. This divergence makes it harder and more time-consuming to integrate those changes. Many teams will put off this pain as long as possible, sometimes not even attempting to integrate until they're putting together a release candidate, causing a descent into hell just when the deadlines are at their most inflexible.
In order to avoid this hell, a better approach is to integrate early and often. The more frequently you integrate your code, the smaller a unit of change you’ll be working with. A small unit of change makes it much easier to identify and address errors as they arise. If you integrate more frequently you’ll also be addressing issues with code that you just put together, so it’ll be much more familiar, again, adding to the ease with which issues can be addressed. The longer you wait, the larger the change set and the more distant those changes become in your memory. All this leads to the need to integrate your code as frequently as possible.
This problem goes back to the beginning of team development. The evolution of development and the evolution of resources such as source code control and build services have pretty much occurred together. Twenty years ago, you had to design and build your own build service by hand. Today we have mature services such as TeamCity, Bamboo, and Jenkins which make this job much easier. Combining your source control system, which maintains the versions of your application, with a build service will allow you to start building a continuous integration process.
But, it all starts with the ability to automate your builds. And before you can automate your builds, you have to have your database code and structures in a format that you can use in automation. That means getting your database into source control.
Source Control Your Database
Developers have been using some form of source control in order to manage code and deployments within a team environment for several decades, almost since the beginning of programming. Because of the difficulties around data retention, databases have not been maintained in the same fashion. But, over the last five years a number of tools have come out that make it possible to get your database into source control management in the same fashion as application code. This opens up a number of opportunities.
First, you will have a known state that you can use to help maintain your systems. You know what structures and code you currently have in your production environment, but you can’t know exactly what code has been deployed to any of your development or test servers unless you just finished making that deployment. With your database builds coming out of source control, you now have another known state. Depending on your source control system, you have labels or branches that tell you the exact state of the database. You can then use these in deployments and testing in order to know exactly which version of the database is being deployed.
Next, putting your database into source control allows you to more tightly couple the state of the database with the state of the application code. If you version and branch the two together, you’ll always know that a particular version of the database corresponds to a precise version of the application code and vice versa. This makes troubleshooting and deployments much easier since you’ll always be deploying from a common starting point using the known versions maintained by source control.
With your database in source control you also get the ability to audit changes to the databases. Since all deployments will come through source control, you'll always know who changed which piece of database structure and when it was changed. This helps in troubleshooting issues. But it also helps the database administrators to review changes prior to deployments. They'll be able to quickly understand exactly which changes have been made and spend time reviewing them instead of having to first investigate and learn the objects that have been changed. Finally, this auditing helps to satisfy certain legal requirements built into regulations like Sarbanes-Oxley and others.
Most importantly, once you have your database in source control, and all changes to the database start in your source control system, you’ll be able to use this known set of scripts to build your databases.
Automating the Database Build
There are two kinds of database builds you can perform, complete and incremental. A complete database build entails dropping the existing database and recreating it from the scripts in source control, or, if no database exists, creating it from scratch, again from the scripts in source control. An incremental build is one where the database, and the data, are left intact and only the changes for a given version are deployed to the database. Each of these types of builds is useful, but ultimately, for your production deployments, you'll rely on incremental builds the most.
The easiest database build is the complete build. Taking away the need to retain the data in an existing database means that you only have to have a mechanism in place that ensures the scripts are run in the correct order. There is almost no need for any kind of tooling to get a complete database build. The best approach to a complete build, if you're not using a tool that can understand the scripts and run them in the correct order, is to build a roster of the scripts you have and the order that they should be run in. That roster will have to be maintained manually. But, once you have the roster in place, you can use almost any means at all to run the scripts, from application code to PowerShell scripts.
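The roster-driven complete build can be sketched in a few lines. This is a minimal illustration, not a production tool: it uses Python with SQLite as a stand-in for a real database server, and the roster filename and script names are invented for the example.

```python
import sqlite3
from pathlib import Path

def complete_build(db_path, roster_path, script_dir):
    """Complete build: run every script named in the roster, in order.

    The roster is a plain text file, one script filename per line --
    a hand-maintained stand-in for a tool that knows the correct order.
    """
    conn = sqlite3.connect(db_path)
    try:
        for line in Path(roster_path).read_text().splitlines():
            name = line.strip()
            if not name or name.startswith("#"):
                continue  # allow blank lines and comments in the roster
            # Run the whole script file as one batch.
            conn.executescript(Path(script_dir, name).read_text())
        conn.commit()
    finally:
        conn.close()
```

Because the build always starts from an empty database, the same roster produces the same result every time, which is exactly what makes complete builds so easy to automate.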
Complete builds are useful in a couple of areas. If you are deploying a system for the first time, you’ll be doing a complete build on the database. If you’re a new developer or you are setting up a new server, you’ll be performing a complete build. But the most useful area for complete builds would be to couple a continuous integration process with your source control system. Then, you have the ability to validate your database scripts at the basic level each time a change is made. Because the complete build is so simple to do, and it’s quick because it’s not maintaining data, you can run complete builds very frequently as part of an automated testing process.
But, most of your work in automating your build processes will be in automating incremental builds.
There are two kinds of changes you can have within your database:

- a change that doesn't affect data in any way, or is completely benign to the data
- a change that can lead to data loss
Once you have your database in source control, you can use different mechanisms to generate scripts from your source control system. The easiest we already talked about, which is a complete build. Just drop the database and then rebuild everything using a tool set or a roster. The next easiest is when you have code that won't affect data, like adding a new stored procedure or creating a non-unique index. Once more, you can simply pull the scripts out of your source control system and then run them. If you're using a tool, it will ensure the scripts are run in the right order. Otherwise, you'll again have to create a roster of scripts to be run.
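For benign changes, the usual pattern is to track which change scripts a database has already seen and apply only the new ones. The sketch below assumes a version-tracking table and numerically prefixed script names (e.g. `002_add_view.sql`); both conventions are inventions for this example, again using SQLite as a stand-in.

```python
import sqlite3
from pathlib import Path

def incremental_build(db_path, script_dir):
    """Apply only the change scripts this database hasn't seen yet.

    Assumes scripts are named like 002_add_view.sql, where the numeric
    prefix is the version the script brings the database up to.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
        current = conn.execute(
            "SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
        for script in sorted(Path(script_dir).glob("*.sql")):
            version = int(script.name.split("_")[0])
            if version <= current:
                continue  # already applied in an earlier deployment
            conn.executescript(script.read_text())
            # Record the new version so the script is never run twice.
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
        conn.commit()
    finally:
        conn.close()
```

Running this repeatedly against the same database is safe: scripts at or below the recorded version are skipped, so each deployment applies only the delta.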
But the hard part is when you’re introducing a change that can lead to data loss. This can be from modifying data types on a column or having to load data in support of columns that are not nullable. In these cases, most tools break down. This is the one area where, even if you’re using a tool to generate your builds, you’re going to have to manually intervene. You’ll need to build a script that retains the data and then that has to go into your source control system in order to be run at the appropriate moment.
But, this doesn't mean you can't automate your deployments. You just need to take advantage of the functionality offered by your tool set. Some tools allow for pre- or post-deployment scripts, scripts to be run in addition to the generated scripts. Adding scripts that contain the necessary commands for dealing with breaking changes to the pre- or post-deployment step means that you can still automate your builds. And, because you're maintaining your code within source control, you can ensure that a given script runs with one version of the database deployment, but not others. Other tools provide mechanisms for simply replacing generated scripts with manual scripts at the appropriate change point. Again, these tools work off the functionality provided by maintaining the code in source control, where you can identify particular changes to objects within your database and replace the generated script appropriately.
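The shape of such a hand-written, data-preserving script is usually the same: create the new structure, copy the data across, drop the old structure, and rename. A minimal sketch, assuming an invented `customer` table whose `id` column is being changed from text to integer, and using SQLite in place of the article's T-SQL:

```python
import sqlite3

# Hand-written migration for a breaking change: changing customer.id
# from TEXT to INTEGER while retaining the existing rows. Table and
# column names are hypothetical, for illustration only.
MIGRATION = """
BEGIN;
CREATE TABLE customer_new (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
INSERT INTO customer_new (id, name)
    SELECT CAST(id AS INTEGER), name FROM customer;  -- retain the data
DROP TABLE customer;
ALTER TABLE customer_new RENAME TO customer;
COMMIT;
"""

def run_migration(conn):
    """Run the data-preserving migration as a single scripted batch."""
    conn.executescript(MIGRATION)
```

Wrapping the whole change in a transaction means a failure partway through leaves the original table, and its data, untouched. This is the kind of script that goes into source control to be run at exactly one point in the version history.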
Because of the functionality offered through the tool sets in working with source control, you can still automate even incremental builds prior to a deployment to production.
With this automation, you can generate a single artifact: usually a T-SQL script, but it could be a Data Tier Application (DACPAC) file, a NuGet package, or some other form. This single artifact means that you can test the deployment against a database, ensure that everything worked, and then use that same, tested artifact for your production deployment. Automating the generation of this artifact from your source control system will ensure smoother, more accurate deployments.
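At its simplest, generating that artifact can mean concatenating the versioned scripts into one file and fingerprinting it, so the exact artifact you tested is verifiably the one you deploy. A sketch, with the naming convention and hash choice as assumptions of the example:

```python
import hashlib
from pathlib import Path

def build_artifact(script_dir, out_path):
    """Concatenate the versioned scripts into one deployment artifact.

    Returns a SHA-256 hash of the artifact, which ties the tested file
    to the production deployment: same hash, same artifact.
    """
    parts = []
    for script in sorted(Path(script_dir).glob("*.sql")):
        # Keep a comment marker so each section is traceable to its source.
        parts.append(f"-- source: {script.name}\n{script.read_text()}")
    artifact = "\n".join(parts)
    Path(out_path).write_text(artifact)
    return hashlib.sha256(artifact.encode()).hexdigest()
```

Comparing the hash recorded at test time with the hash at deployment time gives you a cheap guarantee that nothing changed in between.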
But, the automation doesn’t stop with your builds.
Once you have your code in source control and you’re building your deployment scripts automatically, you can consider other forms of automation. First, as part of a development process, you can implement continuous integration as we’ve talked about throughout this article. Further, you can start to put automated testing in place that can help to ensure that your code does what it’s supposed to do as well as ensure that the deployments themselves are functioning correctly. You’ll also be able to add in additional functionality such as backing up your databases or taking a snapshot prior to deployment. Once you start automating your builds, lots of opportunities open up.
The goal is simple: better, safer, faster deployments of your code and database to your production environment. But to meet that goal, you have to start with the fundamentals, and that means getting your database into source control. Once you've done that, you can start automating your build processes in order to generate a tested artifact that will have been deployed, successfully, many times before it ever gets near production. This approach will help you speed up your processes while still protecting your production server.